Issue
I'm trying to scrape content from a page similar to this one (https://www.newsweek.pl/nwpl_2018002_20181231). It has a "More" (Polish: Więcej) button at the bottom of the page that dynamically loads further articles. I would prefer to use Scrapy for the task, because my other spiders use it, but first I need all of the article URLs, so I'm trying to click() this button with Selenium as follows:
def parse_issue(self, response):
    self.logger.info('Parse function called parse_issue on {}'.format(response.url))
    self.driver.get(response.url)
    while True:
        try:
            more_button = self.driver.find_element_by_xpath('//div[@class="showMoreBtn"]')
            time.sleep(2)
            more_button.click()
            time.sleep(5)
            print('clicked.')
        except Exception as e:
            print(e)
            break
    articles_elements = self.driver.find_elements_by_xpath('.//div[@class="pure-u-1-1 pure-u-md-1-4 smallItem"]/a')
    articles_url = [element.get_attribute("href") for element in articles_elements]
    print(articles_url, response.url)
Unfortunately, I only get the URLs of the articles that are already in the initial page source. Can someone suggest what I'm doing wrong?
Solution
You need to change the logic: read the element values inside the infinite loop, while the page is still loading more articles, rather than once after the loop has finished.
- Use WebDriverWait() with element_to_be_clickable() to click the button.
- Use WebDriverWait() with visibility_of_all_elements_located() to get all the article elements.
- Declare a list before the while loop and, on each pass before clicking the More button, check whether each link is already in the list; append it only if it is not.
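The de-duplication step described above can be sketched on its own, without a browser. The helper name add_new_urls and the example.com URLs are illustrative assumptions, not part of the original answer. A set is kept alongside the list so that the membership check stays cheap, while the list preserves the order in which links were first seen.

```python
def add_new_urls(collected, seen, hrefs):
    """Append hrefs not seen before to `collected`; return how many were added.

    `add_new_urls` is a hypothetical helper name used only for this sketch.
    """
    added = 0
    for href in hrefs:
        # get_attribute("href") can return None for anchors without an href.
        if href is None or href in seen:
            continue
        seen.add(href)
        collected.append(href)
        added += 1
    return added


# Made-up example data: the second batch repeats one link and adds one new link.
collected, seen = [], set()
first = add_new_urls(collected, seen, ["https://example.com/a", "https://example.com/b"])
second = add_new_urls(collected, seen, ["https://example.com/b", "https://example.com/c"])
print(first, second, collected)
```

A return value of 0 after a click could also serve as an extra stop condition for the loop, alongside the timeout on the button.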
Code:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.newsweek.pl/nwpl_2018002_20181231")
# Click the "accept all" button of the consent dialog once it is clickable.
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.cmp-button_button.cmp-intro_acceptAll"))).click()

articles_url = []
while True:
    try:
        # Collect the links that are visible so far, skipping ones already stored.
        articles_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.elemRelative")))
        for element in articles_elements:
            href = element.get_attribute("href")
            if href not in articles_url:
                articles_url.append(href)
        # Click the Show More button; the wait raises TimeoutException
        # if the button does not become clickable within 10 seconds.
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='showMoreBtn']"))).click()
        time.sleep(2)
    except Exception as e:
        # Any failure (typically the timeout above) ends the loop.
        print(e)
        break

print(articles_url)
print(len(articles_url))
Output on Console:
['https://www.newsweek.pl/dariusz-cwiklak-kosciol-i-rzad-szczuja-seksem-felieton/8cljc0t', 'https://www.newsweek.pl/tomasz-lis-kosciol-i-panstwo-pis-felieton/jj6jnh1', 'https://www.newsweek.pl/marcin-marczak-testuje-samochody/qbv7f1m', 'https://www.newsweek.pl/biznes/marcin-marczak-testuje-samochody/q7g4gn2', 'https://www.newsweek.pl/notowania-rafal-bohenek-i-martyna-wojciechowska/61b04bd', 'https://www.newsweek.pl/krzysztof-materna-rzad-i-kosciol-drza-przed-seksem-felieton/59s9cvz', 'https://www.newsweek.pl/wydarzenie-tygodnia-zamachy-terrorystyczne-w-nowej-zelandii/bjy6wjz', 'https://www.newsweek.pl/henryk-sawka-episkopat-i-pedofilia-rysunek/rft1lz7', 'https://www.newsweek.pl/kartka-z-kalendarza-slowo-ok-debiutuje-w-prasie/gge77hw', 'https://www.newsweek.pl/z-bliska-wydra-olbrzymia-i-czapla-czarnobrzucha-zobacz-zdjecie/xqpy3hj', 'https://www.newsweek.pl/zbigniew-holdys-sedziowie-przysiegli-i-glosne-zbrodnie-felieton/z7tjc4x', 'https://www.newsweek.pl/marcin-meller-walka-ze-strachem-przed-homopropaganda-felieton/63t1d5m', 'https://www.newsweek.pl/biznes/gadzety-sluchawki-do-ucha-bez-kabli-nowe-technologie/91kjww0', 'https://www.newsweek.pl/gadzety-sluchawki-do-ucha-bez-kabli-nowe-technologie/lybxzqr', 'https://www.newsweek.pl/pis-straszy-osobami-lgbt-tak-partia-chce-wygrac-wybory/1ksj5s9', 'https://www.newsweek.pl/agata-bielik-robson-o-tym-dlaczego-prawica-boi-sie-seksu-wywiad/n30lmzt', 'https://www.newsweek.pl/polskie-nastolatki-i-seks-jak-strasza-nim-dorosli/mdey5s5', 'https://www.newsweek.pl/jacek-saryusz-wolski-sylwetka-polityka-i-znawcy-ue/lkkt1ed', 'https://www.newsweek.pl/madeleine-albright-o-przyszlosci-usa-i-trumpie-wywiad/66bffm3', 'https://www.newsweek.pl/polska/spoleczenstwo/slawomir-swierzynski-polityczne-ambicje-lidera-bayer-full/0nmhzme', 'https://www.newsweek.pl/wiedza/psychologia/psychologia-zlosc-jak-opowiadac-o-niej-dzieciom/r4k75fb', 'https://www.newsweek.pl/polska/spoleczenstwo/uprzedzenia-polakow-zalezne-od-pogladow-politycznych/v2724h3', 
'https://www.newsweek.pl/wiedza/psychologia/psychologia-agnieszka-stein-jak-rozmawiac-z-dziecmi-o-seksie/jkt2yj6', 'https://www.newsweek.pl/wiedza/psychologia/michal-czernecki-o-zaufaniu-relacjach-i-swojej-ksiazce-wywiad/r711jgv', 'https://www.newsweek.pl/wiedza/historia/harry-kessler-sylwetka-niemieckiego-zolnierza-i-dyplomaty/j4h7yl7', 'https://www.newsweek.pl/wiedza/psychologia/psychologia-po-co-nam-wstyd-emocje-ochronne/5zjh7lg', 'https://www.newsweek.pl/swiat/real-madryt-tajemnice-najslynniejszego-klubu-pilkarskiego/0p9mz56', 'https://www.newsweek.pl/swiat/brexit-rozbije-jednosc-ue-tego-chcieliby-putin-i-murdoch/5b6pem5', 'https://www.newsweek.pl/swiat/bernie-sanders-pierwszy-socjalista-ameryki-i-kandydat-na-prezydenta/7th7vd1', 'https://www.newsweek.pl/wiedza/nauka/wyscig-plemnikow-jak-rajd-paryz-dakar-nowe-odkrycia-naukowcow/s0s996b', 'https://www.newsweek.pl/biznes/zakaz-handlu-w-niedziele-polacy-maja-dosc-czy-rzad-zniesie-zakaz/zyl36b1', 'https://www.newsweek.pl/zakaz-handlu-w-niedziele-polacy-maja-dosc-czy-rzad-zniesie-zakaz/yz7dzzm', 'https://www.newsweek.pl/gospodarka-i-obietnice-wyborcze-pis-kto-za-to-wszystko-zaplaci/r87v3kn', 'https://www.newsweek.pl/biznes/gospodarka-i-obietnice-wyborcze-pis-kto-za-to-wszystko-zaplaci/8xrz977', 'https://www.newsweek.pl/kultura/magdalena-i-borys-lankoszowie-o-filmie-ciemno-prawie-noc-wywiad/fjqj597', 'https://www.newsweek.pl/kultura/japonia-i-rodzina-w-filmie-zlodziejaszki-hirokazu-koreedy/r7kmsyn', 'https://www.newsweek.pl/wiedza/nauka/najstarsze-tatuaze-swiata-wielka-moc-kaktusowej-igly/pvppbzw', 'https://www.newsweek.pl/kultura/serhij-zadan-o-konformizmie-w-nowej-powiesci-recenzja/ntgrnc3', 'https://www.newsweek.pl/kultura/ellen-page-o-nowej-roli-w-serialu-hollywood-i-sytuacji-osob-lgbt/sp9zy0n', 'https://www.newsweek.pl/kultura/love-death-robots-nowy-serial-netflixa-o-przyszlosci/54bjyv6', 'https://www.newsweek.pl/kultura/grzyby-i-ludzie-czyli-nasladowanie-natury-recenzja-wystawy/v2bnhyv', 
'https://www.newsweek.pl/kultura/powrot-the-cinematic-orchestra-na-bardzo-udanej-plycie-recenzja/c0pbept', 'https://www.newsweek.pl/kultura/ola-bilinska-i-konrad-kucz-razem-na-nowej-plycie-recenzja/7efdxdn', 'https://www.newsweek.pl/kultura/american-dream-po-polsku-w-ksiazce-doroty-malesy-recenzja/bv2d8lb', 'https://www.newsweek.pl/kultura/spektakl-nogi-syreny-to-znakomity-kabaret-recenzja/p002s9b', 'https://www.newsweek.pl/kultura/starosc-tez-radosc-recenzja-ksiazki-nasze-dusze-noca/sj0de13', 'https://www.newsweek.pl/wiedza/psychologia/psychologia-milosc-zlapana-w-pulapke-jak-dawac-przyklad/mq55r7p', 'https://www.newsweek.pl/opinie/tomasz-lis-kosciol-i-panstwo-pis-felieton/wgbkt0m']
48
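One possible refinement of the code above, offered as a sketch rather than as part of the original answer: instead of the fixed time.sleep(2) after each click, WebDriverWait.until() can be given a custom callable that returns a truthy value once more links are present than before the click. The name link_count_exceeds and the FakeDriver stand-in are assumptions, introduced so that the example runs without a browser. With a real driver the call would look like WebDriverWait(driver, 10).until(link_count_exceeds("a.elemRelative", previous_count)).

```python
def link_count_exceeds(css_selector, previous_count):
    """Build a wait condition that succeeds once more than `previous_count`
    elements match `css_selector`. Hypothetical helper for this sketch."""
    def condition(driver):
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", css_selector)
        return elements if len(elements) > previous_count else False
    return condition


class FakeDriver:
    """Stand-in for a WebDriver that only mimics find_elements(by, value)."""

    def __init__(self, links):
        self.links = links

    def find_elements(self, by, value):
        return list(self.links)


driver = FakeDriver(["a1", "a2"])
condition = link_count_exceeds("a.elemRelative", 2)
before = condition(driver)   # still two links, so the condition is not met
driver.links.append("a3")    # simulate the page loading one more article
after = condition(driver)    # three links now, so the elements are returned
print(before, after)
```

Because until() keeps polling until the callable returns something truthy or the timeout expires, the loop would move on as soon as new articles appear and would still end with a timeout when nothing more loads.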
Answered By - KunduK