Issue
I'm scraping a page in PyCharm and want to confirm that the scraper works before going any further. When I run the code, though, nothing is printed to the console. Here is the code:
Thank you!!!
from bs4 import BeautifulSoup
import requests
url = "https://www.zillow.com/philadelphia-pa/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22Philadelphia%2C%20PA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-75.2058476090698%2C%22east%22%3A-75.17623602154539%2C%22south%22%3A39.9520661821946%2C%22north%22%3A39.97380838759173%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A13271%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Afalse%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22fsbo%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22auc%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A15%7D"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('section', class_="list-card-info")
for card in lists:
    title = card.find('a', class_="list-card-addr")
    price = card.find('div', class_="list-card-price")
    info = [title, price]
    print(info)
Solution
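The request is most likely being answered with a captcha page rather than the listings, so find_all returns an empty list and the loop body never runs, which is why nothing is printed. I think you can bypass the captcha by adding headers to the request. Note also that the address sits in an address tag, not an a tag: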
# Browser-like headers, so the request does not look like a script
header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "referer": "https://www.google.com/"
}
page = requests.get(url, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all(class_="list-card-info")
for card in lists:
    title = card.find('address', class_="list-card-addr")
    price = card.find('div', class_="list-card-price")
    info = [title, price]
    print(info)
It should return:
[<address class="list-card-addr">Good Food Flats - Student Housing | 4030 Baring St, Philadelphia, PA</address>, <div class="list-card-price">$825+<abbr class="list-card-label"> <!-- -->1 bd</abbr></div>]
[<address class="list-card-addr">Vue32 | 3201 Race St, Philadelphia, PA</address>, <div class="list-card-price">$2,037+<abbr class="list-card-label"> <!-- -->1 bd</abbr></div>]
[<address class="list-card-addr">N40 Apartments, 44 N 40th St, Philadelphia, PA 19104</address>, <div class="list-card-price">$3,221+/mo</div>]
[<address class="list-card-addr">Fairmount North | 2601 Poplar St, Philadelphia, PA</address>, <div class="list-card-price">$1,560+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">The HUB on Chestnut | 3945 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$1,680+<abbr class="list-card-label"> <!-- -->1 bd</abbr></div>]
[<address class="list-card-addr">Korman Residential at 3737 Chestnut | 3737 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$3,800+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">2116 Chestnut | 2116 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$2,115+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">Arrive University City | 3601 Market St, Philadelphia, PA</address>, <div class="list-card-price">$2,200+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">Chestnut Hall | 3900 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$1,325+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[None, None]
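The final [None, None] line comes from a card that has the list-card-info class but contains neither of the two tags, and the other lines print whole tags rather than their text. A minimal sketch of one way to handle both, assuming the class names shown above; SAMPLE_HTML and extract_listings are hypothetical names used only for illustration, and the sample markup stands in for the real response so the snippet runs without a network request:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the class names in the output above.
# In the real script this would be page.content from the request.
SAMPLE_HTML = """
<ul>
  <li class="list-card-info">
    <address class="list-card-addr">Vue32 | 3201 Race St, Philadelphia, PA</address>
    <div class="list-card-price">$2,037+<abbr class="list-card-label"> 1 bd</abbr></div>
  </li>
  <li class="list-card-info"></li>
</ul>
"""


def extract_listings(html):
    """Return (address, price) text pairs, skipping cards with missing tags."""
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.find_all(class_="list-card-info")
    listings = []
    for card in cards:
        title = card.find("address", class_="list-card-addr")
        price = card.find("div", class_="list-card-price")
        # find() returns None when nothing matches, so skip incomplete cards
        if title is None or price is None:
            continue
        listings.append(
            (title.get_text(strip=True), price.get_text(" ", strip=True))
        )
    return listings


listings = extract_listings(SAMPLE_HTML)
if not listings:
    # An empty result usually means the page did not contain the expected cards
    print("No listing cards found; check the response status and body.")
for address, price in listings:
    print(address, "-", price)
```

Printing a message when the result is empty also makes the original symptom (no output at all) easier to diagnose, since a loop over an empty list is otherwise silent.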
Answered By - IMB