Issue
I am using BeautifulSoup to scrape data. There are multiple URLs, and I have to save the data I scrape from all of them in the same CSV file. When I scrape from separate files and save to the same CSV file, only the data from the last URL I scraped ends up in the CSV file. Below is the code I use to scrape the data.
images = []
pages = np.arange(1, 2, 1)
for page in pages:
    url = "https://www.bkmkitap.com/sanat"
    results = requests.get(url, headers=headers)
    soup = BeautifulSoup(results.content, "html.parser")
    book_div = soup.find_all("div", class_="col col-12 drop-down hover lightBg")
    sleep(randint(2, 10))
    for bookSection in book_div:
        img_url = bookSection.find("img", class_="lazy stImage").get('data-src')
        images.append(img_url)
books = pd.DataFrame(
    {
        "Image": images,
    }
)
books.to_csv("bkm_art.csv", index=False, header=True, encoding='utf-8-sig')
Solution
The main issue in your example is that you never request the second page: np.arange(1, 2, 1) yields only 1, and the URL contains no page parameter, so those results are never fetched. Iterate over all pages first and then create your CSV.
The second issue, appending data to an existing file, was worked out by @M B.
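The append approach itself is not reproduced in this answer, so the following is only a minimal sketch of one way to do it, not necessarily what @M B proposed. It uses Python's standard csv module so it runs without pandas; the helper name append_rows and the example.com URLs are hypothetical and chosen purely for illustration. The idea is to open the file in append mode and write the header only when the file is new or empty.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append dict rows to a CSV file, writing the header only once."""
    # The header (and the BOM from utf-8-sig) belongs only at the start of the file.
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    encoding = "utf-8-sig" if is_new else "utf-8"
    # Mode "a" adds to the end of the file instead of overwriting it.
    with open(path, "a", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


# Demo with placeholder data: two separate "scrapes" writing to the same file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bkm_art.csv")
    append_rows(path, [{"Image": "https://example.com/a.jpg"}], ["Image"])
    append_rows(path, [{"Image": "https://example.com/b.jpg"}], ["Image"])
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows_read = list(csv.DictReader(f))

print(rows_read)
```

With pandas, a comparable effect should be achievable by passing mode="a" to to_csv and setting header=False whenever the file already exists.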
Note: Try to avoid selecting elements by class, because classes tend to be more dynamic than ids or the HTML structure.
Example
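import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumption: headers is not shown in the question; this User-Agent is only a placeholder
headers = {"User-Agent": "Mozilla/5.0"}

data = []
# Request pages 1 and 2 and collect every result before writing the CSV
for page in range(1, 3):
    url = f"https://www.bkmkitap.com/sanat?pg={page}"
    results = requests.get(url, headers=headers)
    soup = BeautifulSoup(results.content, "html.parser")
    # Select the product containers by id fragment rather than by class
    for bookSection in soup.select('[id*="product-detail"]'):
        data.append({
            'image': bookSection.find("img", class_="lazy stImage").get('data-src')
        })

# Build the DataFrame once, after the loop, so rows from all pages are included
books = pd.DataFrame(data)
books.to_csv("bkm_art.csv", index=False, header=True, encoding='utf-8-sig')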
Output
image
0 https://cdn.bkmkitap.com/sanat-dunyamiz-190-ey...
1 https://cdn.bkmkitap.com/sanat-dunyamiz-189-te...
2 https://cdn.bkmkitap.com/tiyatro-gazetesi-sayi...
3 https://cdn.bkmkitap.com/mavi-gok-kultur-sanat...
4 https://cdn.bkmkitap.com/sanat-dunyamiz-iki-ay...
... ...
112 https://cdn.bkmkitap.com/hayal-perdesi-iki-ayl...
113 https://cdn.bkmkitap.com/cins-aylik-kultur-der...
114 https://cdn.bkmkitap.com/masa-dergisi-sayi-48-...
115 https://cdn.bkmkitap.com/istanbul-sanat-dergis...
116 https://cdn.bkmkitap.com/masa-dergisi-sayi-49-...
117 rows × 1 columns
Answered By - HedgeHog