Issue
I have a problem: I cannot take the data I have extracted with Selenium and store it somewhere so that I can manipulate it later.
I am grabbing the data like so:
try:
    books = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.ID, "titleitem"))
    )
finally:
    driver.quit()
Inside the try block, I extract the data like this:
for i, book in enumerate(books):
    splited = books[i].text.split("\n")
    writer = str(splited[0])
    title = str(splited[1])
    publiser = str(splited[2])
    country = str(splited[3])
    ISBN = str(splited[4])
So in the end I have this code to extract exactly the data I want:
try:
    books = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.ID, "titleitem"))
    )
    for i, book in enumerate(books):
        splited = books[i].text.split("\n")
        writer = str(splited[0])
        title = str(splited[1])
        publiser = str(splited[2])
        country = str(splited[3])
        ISBN = str(splited[4])
finally:
    driver.quit()
Those variables are the things I want to grab. When I print them, they look normal (exactly as they appear on the website). But then I try to insert them into a pandas DataFrame, like this (fake_books is declared as a pd.DataFrame()):
tmp = pd.Series({'title' : title, 'author': writer, 'publiser': ekdoths})
fake_books = fake_books.append(tmp)
I have also tried a list of dictionaries:
books = [{}]
...
for i, book in enumerate(books):
    splited = books[i].text.split("\n")
    books[i]['writer'] = str(splited[0])
    books[i]['title'] = str(splited[1])
    books[i]['ekdoths'] = str(splited[2])
    books[i]['polh'] = str(splited[3])
    books[i]['ISBN'] = str(splited[4])
Neither of these approaches works: the program just "lags" and prints an empty DataFrame or list.
Solution
I always use this method: I build a list of dictionaries, then pass it to pd.DataFrame.
# create an empty list at the beginning of the code
df_list = []
for book in books:
    splited = book.text.split("\n")
    writer = str(splited[0])
    title = str(splited[1])
    publiser = str(splited[2])
    country = str(splited[3])
    ISBN = str(splited[4])
    # put the scraped data into a dictionary, then append it to df_list
    df_list.append({"writer": writer, "title": title, "publiser": publiser, "country": country, "ISBN": ISBN})
# at the end of your code, after scraping everything you want
df = pd.DataFrame(df_list)
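To try the list-of-dictionaries pattern without opening a browser, here is a minimal, self-contained sketch. The sample strings stand in for the `.text` of each scraped element, and the names `sample_texts`, `FIELDS` and `rows_from_texts` are made up for illustration; they are not part of Selenium or pandas. It also assumes that every entry has at least five newline-separated lines and simply skips any that do not, which may or may not match the real page.

```python
import pandas as pd

# Hypothetical stand-ins for the `.text` of each scraped element
sample_texts = [
    "Author One\nFirst Title\nPublisher A\nGreece\n111-1111111111",
    "Author Two\nSecond Title\nPublisher B\nFrance\n222-2222222222",
]

# Column names, in the same order as the lines of each text block
FIELDS = ["writer", "title", "publiser", "country", "ISBN"]


def rows_from_texts(texts):
    """Turn newline-separated text blocks into a list of dicts, one per book."""
    rows = []
    for text in texts:
        parts = text.split("\n")
        if len(parts) < len(FIELDS):
            # Assumption: incomplete entries are skipped rather than padded
            continue
        # zip pairs each field name with the line at the same position
        rows.append(dict(zip(FIELDS, parts)))
    return rows


# Build the DataFrame once, after all rows have been collected
df = pd.DataFrame(rows_from_texts(sample_texts))
print(df)
```

With real Selenium results, the same function could be called as `rows_from_texts(book.text for book in books)` inside the try block, before `driver.quit()` runs. Building the DataFrame once from the finished list also avoids growing it row by row.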
Answered By - Alberto Hanna