Issue
I'm trying to automate searching for ads in the Facebook Ads Library. To do that, I use Selenium to load the page and BeautifulSoup to parse its HTML.
soup.find_all returns a bs4.ResultSet, which as I understand it behaves like a list.
I loop through that list, and for each element found, I want to check whether it contains a specific string.
However, my code isn't working as expected: the condition in the if statement inside the for loop always evaluates to False.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
# Assumption: the question did not show how options was created
options = webdriver.ChromeOptions()
# Using chrome driver
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)
# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
ads_list = []
for tag in soup.find_all('div', class_='_99s5'):
    if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag):
        ads_list.append(tag)
    else:
        None
Solution
As mentioned before, relying on class names is not the best strategy, because they can be very dynamic; it is better to stick to an id, a tag, or perhaps text. Sometimes, though, there is no alternative.
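To illustrate why matching on a raw class string is fragile, here is a minimal sketch on made-up HTML. The class names a, b, c and the markup are hypothetical, not Facebook's real ones. A substring test against str(tag) depends on the exact order and spelling of the class attribute, while a CSS selector matches the same classes in any order. Whether this is the actual cause of the failing check in the question is only an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the inner <span> lists its classes in a different
# order than the string we search for.
html = '<div class="_99s5"><span class="c a b">12 ads</span></div>'
soup = BeautifulSoup(html, 'html.parser')
card = soup.find('div', class_='_99s5')

# Substring test: sensitive to the order of the class names, so it misses.
substring_hit = 'a b c' in str(card)

# CSS selector: matches an element carrying all three classes, in any order.
selector_hit = card.select_one('span.a.b.c') is not None

print(substring_hit, selector_hit)
```

If the class attribute on the real page differs even slightly from the copied string (order, an extra class, a renamed class after a redeploy), the substring test fails silently.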
To select only the cards with a <span> containing the information that it has been used in ads, you can work with CSS selectors.
The following line searches for your outer <div> with class _99s5 that has a <span> containing your text, and creates a ResultSet of these outer <div> elements:
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
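The selector can be tried without a browser on a small static document. The sketch below uses hypothetical markup that only mimics the structure described above (an outer div._99s5 with a nested <span>); the real Facebook HTML is more deeply nested and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for two ad cards; only the first contains the text.
html = """
<div class="_99s5"><div><span>3 ads use this creative and text</span></div></div>
<div class="_99s5"><div><span>Started running on 1 Jan 2022</span></div></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# :has(...) keeps an outer div only if some descendant matches;
# :-soup-contains(...) matches elements whose text contains the string.
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

print(len(ads_list))  # 1
```

Note that :-soup-contains is provided by Soup Sieve, the selector library that recent BeautifulSoup versions use for select.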
Example
Note: The language of your browser/driver should be English; otherwise you have to change the text you expect to find.
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)
# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
Alternatively, though I am not that happy with it, and only to give you an orientation: select the <span> that is a direct child of a <div> and contains your text, then move up the structure with .parent:
ads_list = []
for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    ads_list.append(tag.parent.parent.parent.parent.parent.parent)
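A chain of .parent calls breaks as soon as the nesting depth changes. As a possible variation (not part of the original answer), BeautifulSoup's find_parent can walk up to the nearest ancestor matching a tag and class, whatever the depth. The markup below is again hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical cards with the <span> nested a few levels deep.
html = """
<div class="_99s5"><div><div><span>3 ads use this creative and text</span></div></div></div>
<div class="_99s5"><div><div><span>Other</span></div></div></div>
"""
soup = BeautifulSoup(html, 'html.parser')

ads_list = []
for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    # Walk up to the nearest enclosing card instead of counting .parent hops.
    card = tag.find_parent('div', class_='_99s5')
    if card is not None:
        ads_list.append(card)

print(len(ads_list))  # 1
```

This still depends on the _99s5 class, so it shares the fragility of class-based selection; it only removes the dependency on the exact nesting depth.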
Answered By - HedgeHog