Issue
In the script below, if I replace the "return" statement with "print", I get all the results. However, if I run it as is, I get only the first item. How can I get all the results while still using "return"? What is the right approach?
Here is the script:
import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        return title, title_link

print(abacus_scraper(main_link))
Result:
('2017 - Volume 53 Abacus', '/journal/10.1111/(ISSN)1467-6281/issues?activeYear=2017')
Solution
A "return" statement ends the function call immediately, so the for loop never gets past its first iteration. That is why only the first item comes back.
Keep a list inside abacus_scraper, append to it on each iteration, and return the list only after the loop has finished.
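A minimal sketch of that behaviour, using a hard-coded list instead of the scraped page (the function name `first_only` and the sample values are made up for illustration):

```python
def first_only(items):
    for item in items:
        # return ends the whole function call, not just this iteration,
        # so the remaining items are never visited
        return item

print(first_only(["2017", "2016", "2015"]))  # prints 2017
```

If the list is empty, the loop body never runs and the function falls through to the end, returning None.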
For example:
import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    results = []  # collects one [title, link] pair per matching anchor
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        results.append([title, title_link])
    # return sits outside the loop, so it runs once after every item is collected
    return results

print(abacus_scraper(main_link))
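An alternative that is not part of the answer above is to turn the function into a generator with "yield", which hands back one pair at a time without leaving the loop. The sketch below is only an illustration under stated assumptions: `iter_pairs` and `sample_rows` are hypothetical names, and the sample data stands in for the scraped anchors so the example runs without a network request.

```python
def iter_pairs(rows):
    """Yield (title, link) pairs one at a time instead of building a list."""
    for title, link in rows:
        # yield pauses the function here and resumes on the next request,
        # so the loop continues where a return would have ended it
        yield title, link

# Hypothetical placeholder data, not real scraped results
sample_rows = [
    ("Issue A", "/issues/a"),
    ("Issue B", "/issues/b"),
]

# list() drains the generator when all pairs are needed at once
pairs = list(iter_pairs(sample_rows))
print(pairs)
```

The list-based version is simpler when the caller always wants every result. A generator is mainly useful when the results are consumed one by one or the caller may stop early.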
Answered By - Solaxun