Issue
I am scraping the URLs of headlines and their dates from here. This gives me all the URLs and dates that are present on the page. The dates shown for the respective URLs, i.e. the day they were published, look like June 24th 2021, June 21st 2021 and so on. I would like to scrape only those URLs that were published within the last 6 days. For example, if the page has 20 headline URLs published on various dates, I want to scrape only those published from 4th August 2021 until today. So my output should only contain dates from 4th August up to today.
Here's my code for extracting all the headline URLs and their dates from the website:
websites = ['https://www.thespiritsbusiness.com/tag/rum/']
for spirits in websites:
    browser.get(spirits)
    time.sleep(1)
    news_links = browser.find_elements_by_xpath('//*[@id="archivewrapper"]/div/div[2]/h3')
    n_links = [ele.find_element_by_tag_name('a').get_attribute('href') for ele in news_links]
    dates = browser.find_elements_by_xpath('//*[@id="archivewrapper"]/div/div[2]/small')
    n_dates = [ele.text for ele in dates]
    print(n_links)
    print(n_dates)
How can I do this? Please help me! Thanks in advance.
Solution
I would first collect all the links irrespective of their dates and then group them by date. The code below does that.
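from selenium.webdriver.common.by import By

# `driver` is assumed to be an already-created Selenium WebDriver instance.
driver.implicitly_wait(10)
driver.get("https://www.thespiritsbusiness.com/category/news/")
# One element is returned per news item on the page.
news = driver.find_elements(By.XPATH, "//div[@id='archivewrapper']")
newsdata = {}
for ne in news:
    # Strip spaces so that e.g. "August 4th, 2021" becomes the key "August4th,2021".
    datechk = ne.find_element(By.TAG_NAME, "small").get_attribute("innerText").replace(' ', '')
    # The leading "." keeps the search inside `ne`; a bare "//" searches the whole page.
    link = ne.find_element(By.XPATH, ".//h3/a").get_attribute("href")
    if datechk in newsdata:
        newsdata[datechk].append(link)
    else:
        newsdata[datechk] = [link]
# Print the links for August 4th to 11th (all of these days take the "th" suffix).
dates = "August{}th,2021"
for i in range(4, 12):
    key = dates.format(i)
    if key in newsdata:
        print("{} : {}".format(key, newsdata[key]))
driver.quit()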
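And the output from my run. Each list repeats one URL because that run looked up the link with an XPath starting with "//", which searches the whole page rather than the current item; a relative ".//h3/a" returns each item's own link: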
August4th,2021 : ['https://www.thespiritsbusiness.com/2021/08/cannabis-drinks-market-to-reach-us6bn-by-2031/', 'https://www.thespiritsbusiness.com/2021/08/cannabis-drinks-market-to-reach-us6bn-by-2031/', 'https://www.thespiritsbusiness.com/2021/08/cannabis-drinks-market-to-reach-us6bn-by-2031/']
August6th,2021 : ['https://www.thespiritsbusiness.com/2021/08/cannabis-drinks-market-to-reach-us6bn-by-2031/', 'https://www.thespiritsbusiness.com/2021/08/cannabis-drinks-market-to-reach-us6bn-by-2031/']
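The date range above is hard-coded for August. A more general option is to parse each date string and compare it against a cutoff of "today minus 6 days". The following is only a minimal sketch under some assumptions: the date text looks like "August 4th, 2021" or "June 21st 2021", month names are in English (the `%B` directive is locale-dependent), and the helper names `parse_headline_date` and `recent_links` as well as the example.com URLs are hypothetical, made up for illustration.

```python
import re
from datetime import date, datetime, timedelta

# Matches an ordinal suffix that directly follows a digit, e.g. the "th" in "4th".
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)")


def parse_headline_date(text):
    """Parse strings such as 'August 4th, 2021' or 'June 21st 2021' into a date."""
    cleaned = _ORDINAL.sub("", text).replace(",", " ")
    cleaned = " ".join(cleaned.split())  # collapse repeated whitespace
    return datetime.strptime(cleaned, "%B %d %Y").date()


def recent_links(links, date_texts, days=6, today=None):
    """Return (link, date) pairs published between today - days and today, inclusive."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    kept = []
    for link, text in zip(links, date_texts):
        published = parse_headline_date(text)
        if cutoff <= published <= today:
            kept.append((link, published))
    return kept


if __name__ == "__main__":
    # Placeholder data standing in for the scraped n_links / n_dates lists.
    sample_links = [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ]
    sample_dates = ["August 6th, 2021", "August 4th, 2021", "June 24th 2021"]
    # `today` is pinned to 10 August 2021 so the cutoff is 4 August 2021.
    for link, published in recent_links(sample_links, sample_dates, today=date(2021, 8, 10)):
        print(published.isoformat(), link)
```

With the question's code, `n_links` and `n_dates` could be passed in place of the sample lists, leaving `today` unset so that the current date is used. If the site formats its dates differently, the format string in `parse_headline_date` would need adjusting.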
Answered By - pmadhu