Issue
I am a Python beginner. I am using BeautifulSoup to scrape the details (title, quantity in stock) of every book on the first page of books.toscrape.com. To do that, I first need the links to all the individual books, so I wrote the function page1_url for that purpose. The problem is that the list it returns contains only the first link. Please help me identify the error, or provide alternative code that uses only BeautifulSoup. Thanks in advance!
import requests
from bs4 import BeautifulSoup

def page1_url(page1):
    response= requests.get(page1)
    data= BeautifulSoup(response.text,'html.parser')
    b1= data.find_all('h3')
    for i in b1:
        l=i.find_all('a')
        for j in l:
            l1=j['href']
            books_urls=[]
            books_urls.append(base_url + l1)
            books_urls=list(books_urls)
            return books_urls

allPages = ['http://books.toscrape.com/catalogue/page-1.html',
            'http://books.toscrape.com/catalogue/page-2.html']
base_url= 'http://books.toscrape.com/catalogue/'
bookURLs= page1_url(allPages[0])
print(bookURLs)
Solution
You are re-creating the books_urls list for every link, and the return statement sits inside the for j in l loop, so the function exits after the first link. Create the list once before the loops and return it after they finish:
import requests
from bs4 import BeautifulSoup

def page1_url(page1):
    response= requests.get(page1)
    data= BeautifulSoup(response.text,'html.parser')
    b1= data.find_all('h3')
    # you were rewriting this list for each link
    books_urls = []
    for i in b1:
        l=i.find_all('a')
        for j in l:
            l1=j['href']
            books_urls.append(base_url + l1)
    # these lines had too many indents
    books_urls=list(books_urls)
    return books_urls

allPages = ['http://books.toscrape.com/catalogue/page-1.html',
            'http://books.toscrape.com/catalogue/page-2.html']
base_url= 'http://books.toscrape.com/catalogue/'
bookURLs= page1_url(allPages[0])
print(bookURLs)
['http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html', 'http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html', 'http://books.toscrape.com/catalogue/soumission_998/index.html', 'http://books.toscrape.com/catalogue/sharp-objects_997/index.html', ... 'http://books.toscrape.com/catalogue/its-only-the-himalayas_981/index.html']
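As a more compact alternative, the same links can probably be collected with a CSS selector and a list comprehension, which avoids the nested loops where the original bug was hiding. This is only a sketch: the function name extract_book_urls and the sample_html fragment are made up for illustration, and the fragment assumes each book link is an a tag inside an h3, as in the code above. It also uses urljoin from the standard library in place of the global base_url.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_book_urls(html, page_url):
    """Return absolute URLs for every <a href> found inside an <h3>."""
    soup = BeautifulSoup(html, "html.parser")
    # "h3 a[href]" matches anchors with an href attribute nested in an <h3>.
    # urljoin resolves each relative href against the URL of the listing page.
    return [urljoin(page_url, a["href"]) for a in soup.select("h3 a[href]")]


# Hypothetical fragment that mimics the assumed h3 > a structure of the page.
sample_html = """
<article><h3><a href="a-light-in-the-attic_1000/index.html">A Light in the Attic</a></h3></article>
<article><h3><a href="tipping-the-velvet_999/index.html">Tipping the Velvet</a></h3></article>
"""

page_url = "http://books.toscrape.com/catalogue/page-1.html"
urls = extract_book_urls(sample_html, page_url)
print(urls)
```

Against the live site you would pass requests.get(page_url).text as the html argument instead of the sample fragment.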
Answered By - chemicalwill