Issue
I wrote the code below using Selenium and BeautifulSoup to scrape home price information from Zillow by zip code. I can get the home details by using find_all to pull each ul with the class list-card-details; my current code and its output are shown below. What I would like to do is further parse the find_all output into a dict, with the home detail labels as keys and their values as values. I've included an example of the desired output below. Can anyone suggest how to do this?
code:
import os
import time
#re allows for matching text with regular expressions (including through BeautifulSoup)
#dateutil.parser provides .parse() to convert plain text dates in a variety of formats into datetime objects
import re, dateutil.parser
import pandas as pd
import numpy as np
#BeautifulSoup provides a model for the source HTML
from bs4 import BeautifulSoup
#Webdriver is the interface to the selected browser (Chrome)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common import action_chains, keys
#Ability to select values in HTML <select> tags
from selenium.webdriver.support import select
chrome_options = Options()
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), chrome_options=chrome_options)
#zip_url must be defined before it is passed to driver.get()
zip_url = 'https://www.zillow.com/homes/for_sale/95536_rb/'
driver.get(zip_url)
#Name the parser explicitly so BeautifulSoup does not have to guess one
tstsoup = BeautifulSoup(driver.page_source, 'html.parser')
tstsoup.find_all('ul',{'class':'list-card-details'})
output:
<ul class="list-card-details"><li class="">4<abbr class="list-card-label"> <!-- -->bds</abbr></li><li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li><li class="">2,172<abbr class="list-card-label"> <!-- -->sqft</abbr></li><li class="list-card-statusText">- House for sale</li></ul>
desired output:
{'bds': 4, 'ba': 2, 'sqft': 2172}
Solution
I took the data as HTML and found each li element with an empty class. The text of each li is split on the space to get a key and value pair for the dictionary d, which is appended to the list lst. The result is a list of dictionaries, one per ul.
from bs4 import BeautifulSoup
html="""<ul class="list-card-details"><li class="">4<abbr class="list-card-label"> <!-- -->bds</abbr></li><li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li><li class="">2,172<abbr class="list-card-label"> <!-- -->sqft</abbr></li><li class="list-card-statusText">- House for sale</li></ul>
"""
soup = BeautifulSoup(html, 'html.parser')
main_data=soup.find_all('ul',{'class':'list-card-details'})
lst=[]
for data in main_data:
    # each li text looks like "4 bds": the label is the key, the number is the value
    d={ i.text.split(" ")[1] : i.text.split(" ")[0] for i in data.find_all("li",class_="") }
    lst.append(d)
print(lst[0])
Output:
{'bds': '4', 'ba': '2', 'sqft': '2,172'}
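The values above are strings, while the desired output in the question looks numeric. The following is a minimal sketch of one way to get numbers instead, not part of the original answer. It takes the key from the abbr tag rather than splitting on a space, and skips any li without an abbr (such as the status line). The helper names to_number and parse_card_details are hypothetical, chosen here for illustration.

```python
from bs4 import BeautifulSoup

html = """<ul class="list-card-details"><li class="">4<abbr class="list-card-label"> <!-- -->bds</abbr></li><li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li><li class="">2,172<abbr class="list-card-label"> <!-- -->sqft</abbr></li><li class="list-card-statusText">- House for sale</li></ul>
"""

def to_number(text):
    """Hypothetical helper: turn '2,172' into 2172; return text unchanged if not numeric."""
    cleaned = text.replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        try:
            return float(cleaned)
        except ValueError:
            return text

def parse_card_details(ul):
    """Hypothetical helper: build {label: value} from one list-card-details ul."""
    details = {}
    for li in ul.find_all("li"):
        abbr = li.find("abbr")
        if abbr is None:
            continue  # no label, e.g. the "- House for sale" status line
        label = abbr.get_text(strip=True)
        # first non-empty string inside the li is the value that precedes the abbr
        value = next(li.stripped_strings, "")
        details[label] = to_number(value)
    return details

soup = BeautifulSoup(html, "html.parser")
homes = [parse_card_details(ul) for ul in soup.find_all("ul", class_="list-card-details")]
print(homes[0])  # {'bds': 4, 'ba': 2, 'sqft': 2172}
```

With the Selenium code from the question, the same function could presumably be applied to every ul returned by tstsoup.find_all, assuming the live page still uses the list-card-details markup shown here.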
Answered By - Bhavya Parikh