Issue
I'm currently working on some web scraping using python and selenium, and I can't seem to pull the link information from a href in an anchor tag for a specific class. for reference, its from zillow (specifically, this url : https://www.zillow.com/homes/for_rent/San-Francisco,-CA_rb/ ).
I've tried a few different options in order to select the anchor tag listed but can't seem to return the information i need :
links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
-- returns
None
also tried
links = driver.find_elements(By.CLASS_NAME, "list-card-top")
print(links[0].get_attribute('href'))
-- returns
None
links = driver.find_elements(By.CLASS_NAME, "list-card-link list-card-link-top-margin")
print(links[0].get_attribute('href'))
-- returns
None
and lastly
links = driver.find_elements(By.CSS_SELECTOR, "list-card-info.a")
print(links[0].get_attribute('href'))
I know I can pull all the anchor tags, but certainly there is a step im missing here to get the nested anchor tag value? or am i pulling the wrong class? not sure where im going wrong?
Solution
To print the value of the href attribute you have to induce WebDriverWait for the visibility_of_all_elements_located() and using list slicing you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D') print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[class='list-card-top'] > a[href]")))])
Using XPATH in a single line:
driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D') print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='list-card-top']/a[@href]")))])
Console Output:
['https://www.zillow.com/homedetails/San-Francisco-CA-94134/15166498_zpid/', 'https://www.zillow.com/b/avery-450-san-francisco-ca-BTfktx/', 'https://www.zillow.com/b/solaire-san-francisco-ca-65g7KK/', 'https://www.zillow.com/homedetails/117-Saint-Charles-Ave-San-Francisco-CA-94132/15195262_zpid/', 'https://www.zillow.com/homedetails/433-40th-Ave-San-Francisco-CA-94121/15092586_zpid/', 'https://www.zillow.com/homedetails/123-Carl-St-San-Francisco-CA-94117/2078490576_zpid/', 'https://www.zillow.com/b/fifteen-fifty-san-francisco-ca-BdnYPc/', 'https://www.zillow.com/b/l-seven-san-francisco-ca-9NJtD7/', 'https://www.zillow.com/homedetails/4642-18th-St-San-Francisco-CA-94114/332858409_zpid/']
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC
Answered By - undetected Selenium
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.