Issue
I am trying to scrape information with Requests + BeautifulSoup from a page that requires a log-in.
My idea was to enter my credentials via Selenium and, once logged in, run r = requests.get(url) and then soup = bs(r.text, "html.parser") to perform my scraping.
But even though I manage to enter my credentials and reach the target URL, the HTML I get from Requests is still the log-in page.
In detail (with placeholder data rather than the real values):
```python
import requests
from bs4 import BeautifulSoup as bs
from selenium import webdriver

headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder headers
browser = webdriver.Chrome()

url = 'https.place_holder'  # the page from which I want to scrape data
browser.get(url)  # the browser gets redirected to the log-in page

# I add my credentials via Selenium
user_name = browser.find_element('name', 'os_username')
user_name.send_keys('Donald_Duck')
pwd = browser.find_element('name', 'os_password')
pwd.send_keys('I_love_Mickey')
log_in_button = browser.find_element('name', 'login')
log_in_button.click()
print('\nLOGIN SUCCESSFUL!\n\n')
# at this point I can see that via Selenium I got access to the page from which I want to access data

current_page = browser.current_url  # URL the browser ended up on after logging in
r = requests.get(current_page, headers=headers)
soup = bs(r.text, "html.parser")
# at this point I would expect to be able to scrape from the target page,
# but the HTML in r clearly shows that I am still on the log-in page
```
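A small helper can make the check from the last comment explicit instead of eyeballing the HTML. This is only a sketch: it assumes the log-in form keeps the input names used above (os_username and os_password), and the function name looks_like_login_page is hypothetical.

```python
from bs4 import BeautifulSoup


def looks_like_login_page(html: str) -> bool:
    """Return True if the HTML still contains the log-in form fields.

    Assumes the form uses the input names from the question
    ('os_username' and 'os_password'); adjust for other sites.
    """
    soup = BeautifulSoup(html, "html.parser")
    # attrs= is needed because find() already uses `name` for the tag name
    has_user = soup.find("input", attrs={"name": "os_username"}) is not None
    has_pwd = soup.find("input", attrs={"name": "os_password"}) is not None
    return has_user and has_pwd
```

Comparing looks_like_login_page(r.text) with looks_like_login_page(browser.page_source) shows whether Requests and the Selenium browser really received different pages.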
How can I solve this issue?
Solution
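If you are still using Selenium, note that requests.get() sends its own request without the session cookies the Selenium browser received when it logged in, so the server sees an anonymous visitor and serves the log-in page again. There are two options in my opinion:
- Scrape the elements you need with Selenium, in the same way you already located the input fields.
- Simply convert browser.page_source into a bs4 object to work with BeautifulSoup, so there is no need to use requests in your use case:

```python
soup = bs(browser.page_source, "html.parser")
```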
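The page_source route can be sketched as follows. The target markup is an assumption: this hypothetical extract_table_rows function supposes the data sits in table rows, so the selectors need to match whatever the real page uses.

```python
from bs4 import BeautifulSoup


def extract_table_rows(page_source: str) -> list[list[str]]:
    """Parse HTML taken from Selenium's browser.page_source.

    Hypothetical target: the data lives in <table> rows; swap the
    selectors for whatever the real page uses.
    """
    soup = BeautifulSoup(page_source, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        # Collect header and data cells in document order, trimming whitespace
        cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows
```

After log_in_button.click() this would be called as rows = extract_table_rows(browser.page_source). Since page_source is a snapshot of the current DOM, it may be worth waiting (for example with Selenium's WebDriverWait) until the target page has loaded before reading it.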
If you really need to use requests, check the following question: How to "log in" to a website using Python's Requests module?
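A different technique from the one in the linked question (which logs in with requests alone) is to hand the cookies Selenium received over to a requests.Session. The sketch below assumes the cookie list comes from Selenium's browser.get_cookies(); the helper name session_from_selenium_cookies is hypothetical, and whether this works depends on the site, since some also tie a session to headers or tokens.

```python
import requests


def session_from_selenium_cookies(cookies: list[dict]) -> requests.Session:
    """Build a requests.Session carrying the cookies of a Selenium browser.

    `cookies` is expected in the shape returned by Selenium's
    get_cookies(): a list of dicts with at least 'name' and 'value'.
    """
    session = requests.Session()
    for c in cookies:
        # Only pass fields that requests' cookie jar accepts; Selenium also
        # returns keys such as 'httpOnly' or 'sameSite' that it would reject.
        session.cookies.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),
            path=c.get("path", "/"),
        )
    return session
```

Used with the code from the question, this would be session = session_from_selenium_cookies(browser.get_cookies()) followed by r = session.get(current_page, headers=headers). Reusing one Session also keeps any cookies the server sets on later responses.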
Answered By - HedgeHog