Issue
I am working on crawling Google search results.
Here is my code:
def parse(self, response):
    all_page = response.xpath('//*[@id="main"]')
    for page in all_page:
        title = page.xpath('//*[@id="main"]/div/div/div/a/h3/div/text()').extract()
        link = page.xpath('//*[@id="main"]/div/div/div/a/@href').extract()
        print('title', title)
        print('link', link)
The output is:
title ['iPhone - Compare Models - Apple', 'iPhone - Compare Models - Apple (MY)', ...]
link ['https://www.apple.com/iphone/compare/&sa=U&ved=2ahUKEwiKvsnnmLDxAhWZIDQIHXEdA60QFjAGegQIBRAB&usg=AOvVaw1FCyWoMh1LcbM65W6l8ypN', '/url?q=https://www.apple.com/my/iphone/compare/&sa=U&ved=2ahUKEwiKvsnnmLDxAhWZIDQIHXEdA60QFjAHegQICBAB&usg=AOvVaw3i33ED_sBrbAuNLAJsOlxe', ...]
I want the output to look like this:
title : 'iPhone - Compare Models - Apple'
title : 'iPhone - Compare Models - Apple (MY)'
How can I do that?
Thank you
Solution
Your title and link results are lists with several items, so you need to loop over them in parallel with zip, assuming you get one link per title.
That appears to be the case in your example, but make sure it holds for every page you crawl.
def parse(self, response):
    all_page = response.xpath('//*[@id="main"]')
    for page in all_page:
        # Each extract() call returns a list of strings, one per match.
        titles = page.xpath('//*[@id="main"]/div/div/div/a/h3/div/text()').extract()
        links = page.xpath('//*[@id="main"]/div/div/div/a/@href').extract()
        # zip pairs the n-th title with the n-th link.
        for title, link in zip(titles, links):
            print(f"title: '{title}'\n\n"
                  f"Link: {link}")
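On the "one link per title" caveat: zip stops at the end of the shorter list without raising an error, so a mismatch between the two XPath results would silently drop or misalign entries. Below is a minimal sketch of a guard for that, using plain lists in place of the extract() results. The helper name pair_titles_with_links and the shortened sample links are made up for illustration and are not part of the original answer.

```python
def pair_titles_with_links(titles, links):
    """Pair each title with its link, refusing lists of unequal length.

    Hypothetical helper: zip() alone would silently truncate to the
    shorter list, which hides a mismatch between the two XPath results.
    """
    if len(titles) != len(links):
        raise ValueError(
            f"got {len(titles)} titles but {len(links)} links; "
            "the two XPath expressions did not match the same results"
        )
    return list(zip(titles, links))


# Sample data shaped like the output in the question (links shortened).
titles = [
    "iPhone - Compare Models - Apple",
    "iPhone - Compare Models - Apple (MY)",
]
links = [
    "/url?q=https://www.apple.com/iphone/compare/",
    "/url?q=https://www.apple.com/my/iphone/compare/",
]

for title, link in pair_titles_with_links(titles, links):
    print(f"title : '{title}'")
    print(f"link : {link}")
```

Inside parse you would pass the two extract() results to the helper instead of the sample lists. On Python 3.10 or later, zip(titles, links, strict=True) raises ValueError on a length mismatch as well, which may be enough without a separate helper.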
Answered By - Trevis