Issue
I am trying to find an object that is downloaded into the browser during the loading of a website.
This is the website: https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en
I'm not very good with web technology and such.
I am trying to save the request and response headers and the actual response using only the link to the website.
If you look at the network traffic, you can see an object, jobsearch.ftl?lang=en, that loads towards the end, and you can see its response and headers.
Here are screenshots of the network event log showing the request and response headers, and the actual response.
These are the objects that I want to save. How can I do that?
I have tried
import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, TimeoutException, StaleElementReferenceException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
chromepath = "~/chromedriver/chromedriver"
caps = DesiredCapabilities.CHROME
caps['goog:loggingPrefs'] = {'performance': 'ALL'}
driver = webdriver.Chrome(executable_path=chromepath, desired_capabilities=caps)
driver.get('https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en')
def process_browser_log_entry(entry):
    response = json.loads(entry['message'])['message']
    return response
browser_log = driver.get_log('performance')
events = [process_browser_log_entry(entry) for entry in browser_log]
events = [event for event in events if 'Network.response' in event['method']]
But I only get some of the headers. They look like this:
{'method': 'Network.responseReceivedExtraInfo',
'params': {'blockedCookies': [],
'headers': {'Cache-Control': 'private',
'Connection': 'Keep-Alive',
'Content-Encoding': 'gzip',
'Content-Security-Policy': "frame-ancestors 'self'",
'Content-Type': 'text/html;charset=UTF-8',
'Date': 'Mon, 27 Sep 2021 18:18:10 GMT',
'Keep-Alive': 'timeout=5, max=100',
'P3P': 'CP="CAO PSA OUR"',
'Server': 'Taleo Web Server 8',
'Set-Cookie': 'locale=en; path=/careersection/; secure; HttpOnly',
'Transfer-Encoding': 'chunked',
'Vary': 'Accept-Encoding',
'X-Content-Type-Options': 'nosniff',
'X-UA-Compatible': 'IE=edge',
'X-XSS-Protection': '1'},
'headersText': 'HTTP/1.1 200 OK\r\nDate: Mon, 27 Sep 2021 18:18:10 GMT\r\nServer: Taleo Web Server 8\r\nCache-Control: private\r\nP3P: CP="CAO PSA OUR"\r\nContent-Encoding: gzip\r\nVary: Accept-Encoding\r\nX-Content-Type-Options: nosniff\r\nSet-Cookie: locale=en; path=/careersection/; secure; HttpOnly\r\nContent-Security-Policy: frame-ancestors \'self\'\r\nX-XSS-Protection: 1\r\nX-UA-Compatible: IE=edge\r\nKeep-Alive: timeout=5, max=100\r\nConnection: Keep-Alive\r\nTransfer-Encoding: chunked\r\nContent-Type: text/html;charset=UTF-8\r\n\r\n',
'requestId': '1E3CDDE80EE37825EF2D9C909FFFAFF3',
'resourceIPAddressSpace': 'Public'}},
{'method': 'Network.responseReceived',
'params': {'frameId': '1624E6F3E724CA508A6D55D556CBE198',
'loaderId': '1E3CDDE80EE37825EF2D9C909FFFAFF3',
'requestId': '1E3CDDE80EE37825EF2D9C909FFFAFF3',
'response': {'connectionId': 26,
They don't contain all the information I can see in Chrome's web inspector.
I want to get the complete request and response headers and the actual response body. Is this the correct way? Is there a better way that uses only requests instead of Selenium?
Solution
You can use the selenium-wire library if you want to do this with Selenium. However, if you only care about one specific request, you can skip Selenium, use the requests library to fetch that URL directly, and then print the request and response headers.
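For the requests route, a minimal sketch might look like the following. It assumes the page can be fetched with a plain GET and does not depend on JavaScript or on cookies set by earlier requests, which may not hold for this site. The helper names summarize_exchange and fetch_and_save and the output filename exchange.json are made up for illustration.

```python
import json


def summarize_exchange(response):
    """Collect request headers, response headers and body from a response.

    Works with a requests.Response, or any object exposing the same
    attributes (url, status_code, headers, text, request.headers).
    """
    return {
        "url": response.url,
        "status_code": response.status_code,
        "request_headers": dict(response.request.headers),
        "response_headers": dict(response.headers),
        "body": response.text,
    }


def fetch_and_save(url, path="exchange.json"):
    """Fetch url with requests and write the summary to a JSON file."""
    # Imported here so the helper above can be used without requests installed.
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=30)
    summary = summarize_exchange(response)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(summary, fh, indent=2)
    return summary


# Usage (performs a real network request):
# fetch_and_save("https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en")
```

Note that the request headers recorded this way are the ones requests itself sends, so they will differ from what Chrome sends (for example the User-Agent).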
Since you asked about the former, the Selenium way, one option is the selenium-wire library. It captures every background request the page makes, so you will need to filter the results afterwards, either in your code or after piping the output to a text file.
Install selenium-wire using pip install selenium-wire
Install webdriver-manager using pip install webdriver-manager
Install Selenium 4 using pip install selenium==4.0.0.b4
Then use this code:
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service

# Download a matching chromedriver and start Chrome through selenium-wire
svc = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=svc)
driver.maximize_window()
# selenium-wire records every request the browser makes while loading the page
driver.get("https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en")
for request in driver.requests:
    # Skip requests that never received a response
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.headers,
            request.response.headers
        )
This prints details for every request. The relevant one is copied here:
https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en 200
Host: epco.taleo.net
Connection: keep-alive
sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Date: Tue, 28 Sep 2021 11:14:14 GMT
Server: Taleo Web Server 8
Cache-Control: private
P3P: CP="CAO PSA OUR"
Content-Encoding: gzip
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
Set-Cookie: locale=en; path=/careersection/; secure; HttpOnly
Content-Security-Policy: frame-ancestors 'self'
X-XSS-Protection: 1
X-UA-Compatible: IE=edge
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html;charset=UTF-8
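To keep only the jobsearch.ftl entry and also capture the actual response body, the loop above could be narrowed along these lines. This is a sketch, not part of selenium-wire: collect_exchange is a hypothetical helper, it assumes each captured item has the url, headers and response attributes used above plus selenium-wire's response.body (raw bytes), and it only handles gzip or uncompressed bodies using the standard library.

```python
import gzip


def collect_exchange(captured, needle):
    """Return headers and decoded body for the first request whose URL contains needle."""
    for request in captured:
        response = request.response
        # Skip unanswered requests and requests for other resources.
        if response is None or needle not in request.url:
            continue
        body = response.body
        # response.body holds raw bytes; undo gzip if the server applied it.
        if (response.headers.get("Content-Encoding") or "").lower() == "gzip":
            body = gzip.decompress(body)
        return {
            "url": request.url,
            "status_code": response.status_code,
            "request_headers": list(request.headers.items()),
            "response_headers": list(response.headers.items()),
            "body": body.decode("utf-8", errors="replace"),
        }
    return None


# Usage with the driver from the code above:
# result = collect_exchange(driver.requests, "jobsearch.ftl")
```

The returned dictionary can then be written to a file with json.dump. Other encodings such as br are not handled here; selenium-wire's documentation describes its own decoding helper, which may be a better fit in that case.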
Answered By - demouser123