Issue
I have a script that scrapes a Zillow search page. For each listing it collects the number of bedrooms, the number of bathrooms, and the size (in sqft). The list I get looks like this:
['1 bd2 ba982 sqft', '3 bds2 ba1,462 sqft', etc.]
but I want it to look like this:
['1bd 2ba 982sqft', '3bds 2ba 1,462sqft', etc.]
What should I change in my code?
import requests
from bs4 import BeautifulSoup
# import gspread
URL = "https://www.zillow.com/san-francisco-ca/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.52499667529297%2C%22east%22%3A-122.34166232470703%2C%22south%22%3A37.662044543503555%2C%22north%22%3A37.88836615784793%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22days%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A12%7D"
header = {
    "User-Agent": "YOUR AGENT",
    "Accept-Language": "YOUR LANGUAGE"
}
response = requests.get(URL, headers=header)
web_page = response.text
soup = BeautifulSoup(web_page, 'lxml')
# Bedrooms and bathrooms
quantity_dirty = soup.find_all("ul", class_="StyledPropertyCardHomeDetailsList-c11n-8-84-3__sc-1xvdaej-0 eYPFID")
quantity_list_clean = [quantity.getText() for quantity in quantity_dirty if not quantity.getText().startswith('--')]
print(quantity_list_clean)
Solution
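Calling getText() on the whole <ul> concatenates the text of its <li> items with no separator, which is why the values run together. Instead, extract the text of each <li> separately and join the pieces with a space. Passing strip=True strips each text fragment, which also removes the space between a number and its unit: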
import requests
from bs4 import BeautifulSoup
# import gspread
URL = "https://www.zillow.com/san-francisco-ca/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.52499667529297%2C%22east%22%3A-122.34166232470703%2C%22south%22%3A37.662044543503555%2C%22north%22%3A37.88836615784793%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22days%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A12%7D"
header = {"User-Agent": "YOUR AGENT", "Accept-Language": "YOUR LANGUAGE"}
response = requests.get(URL, headers=header)
web_page = response.text
soup = BeautifulSoup(web_page, "lxml")
# Bedrooms and bathrooms
quantity_dirty = soup.select(
    "ul.StyledPropertyCardHomeDetailsList-c11n-8-84-3__sc-1xvdaej-0.eYPFID"
)
quantity_list_clean = [
    " ".join(
        q.getText(strip=True)
        for q in quantity.select("li")
        if not q.getText().startswith("--")
    )
    for quantity in quantity_dirty
]
print(quantity_list_clean)
Prints:
[
"1bd 2ba 982sqft",
"3bds 2ba 1,462sqft",
"1bd 1ba 1,310sqft",
"2bds 3ba 2,860sqft",
"2bds 1ba 682sqft",
"2bds 2ba 835sqft",
"3bds 2ba 1,550sqft",
"1bd 1ba 819sqft",
"4bds 4ba 2,568sqft",
]
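To see why the per-<li> extraction helps without hitting the live site, here is a minimal offline sketch. The HTML snippet, its tag names, and the `details` class are assumptions made up for illustration; Zillow's real markup is different and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one property card. The tag and class
# names are assumptions for illustration, not Zillow's actual HTML.
SAMPLE_HTML = (
    '<ul class="details">'
    "<li><b>1</b> <abbr>bd</abbr></li>"
    "<li><b>2</b> <abbr>ba</abbr></li>"
    "<li><b>982</b> <abbr>sqft</abbr></li>"
    "</ul>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
details = soup.select_one("ul.details")

# Text of the whole <ul>: neighbouring <li> items run together.
whole_list_text = details.get_text()

# Text of each <li> on its own, each fragment stripped, then joined with spaces.
per_item_text = " ".join(li.get_text(strip=True) for li in details.select("li"))

print(whole_list_text)  # 1 bd2 ba982 sqft
print(per_item_text)    # 1bd 2ba 982sqft
```

The first result reproduces the unwanted format from the question, and the second matches the desired one. If you would rather keep the space between the number and the unit, drop strip=True and call .strip() on each item's text instead.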
Answered By - Andrej Kesely