Issue
I have a list of headers and values scraped from a web page. But once I pass them through a for loop I get a much bigger list than expected. In the code I iterate over the headers and the web elements and store them in a list, and I'm not sure why it grows.
header = ['h1','h2'......'h60']
Scraped list; in the code I get the values like below
row.find_elements(By.TAG_NAME, "td")[i].text -> gives values something like [1,2.........5]
Here 'i' goes up to 60, and there are also 60 headers, so I use the same 'i' for both the key and the value.
Expected size: the actual web page table is [60 cols * 9 rows], but I get a pandas DataFrame of around [60 cols * 580+ rows].
Code I tried
    value_list = []
    header_list = ['h1','h2','h3',.......,'h60']
    for i in range(0,len(header_list)):
        for row in table_trs:
            value_list.append({
                header_list[i]:row.find_elements(By.TAG_NAME, "td")[i].text
            })
    df = pd.DataFrame(value_list)
Solution
I think you meant to do this:
    value_list = []
    header_list = ['h1','h2','h3',.......,'h60']
    for row in table_trs:
        value_list.append({
            header: row.find_elements(By.TAG_NAME, "td")[i].text
            for i, header in enumerate(header_list)
        })
    df = pd.DataFrame(value_list)
i.e. each element in value_list should be a complete dictionary representing one row of the table, whereas your original code made a separate dictionary per column per row, and pandas turned every one of those single-key dictionaries into its own DataFrame row.
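To see the difference in entry counts without a browser, here is a minimal sketch that swaps the Selenium rows for plain lists of strings. The `fake_rows` data and the 3-column table size are made up for illustration; they are not from the question.

```python
# Stand-in for the scraped table: 9 rows of 3 cells each (made-up data,
# used in place of row.find_elements(By.TAG_NAME, "td")).
header_list = ["h1", "h2", "h3"]
fake_rows = [[f"r{r}c{c}" for c in range(len(header_list))] for r in range(9)]

# Original approach: one single-key dict per column per row.
per_cell = []
for i in range(len(header_list)):
    for cells in fake_rows:
        per_cell.append({header_list[i]: cells[i]})

# Corrected approach: one complete dict per row.
per_row = []
for cells in fake_rows:
    per_row.append({header: cells[i] for i, header in enumerate(header_list)})

print(len(per_cell))  # 27 (3 columns * 9 rows)
print(len(per_row))   # 9
```

Passing `per_cell` to `pd.DataFrame` would give one DataFrame row per dictionary, each with a single filled column and NaN everywhere else. With 60 headers the same pattern yields 60 entries for every `tr` in `table_trs`.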
I think you could also do it like:
    value_list = []
    header_list = ['h1','h2','h3',.......,'h60']
    for row in table_trs:
        value_list.append([
            row.find_elements(By.TAG_NAME, "td")[i].text
            for i in range(0, len(header_list))
        ])
    df = pd.DataFrame(value_list, columns=header_list)
This avoids repeating the headers in every row (more efficient if you have a lot of rows to build).
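A possible further tweak, offered as a sketch rather than part of the original answer: the snippets here call `find_elements` once per column, and the cells could instead be looked up once per row and reused. The `FakeCell` and `FakeRow` classes below are hypothetical stand-ins for Selenium elements so the example runs without a browser; with real Selenium you would pass `By.TAG_NAME` instead of the literal string.

```python
class FakeCell:
    """Hypothetical stand-in for a <td> element exposing .text."""
    def __init__(self, text):
        self.text = text


class FakeRow:
    """Hypothetical stand-in for a <tr> element; counts lookups."""
    def __init__(self, texts):
        self._cells = [FakeCell(t) for t in texts]
        self.calls = 0

    def find_elements(self, by, value):
        self.calls += 1
        return self._cells


header_list = ["h1", "h2", "h3"]
table_trs = [FakeRow(["a", "b", "c"]), FakeRow(["d", "e", "f"])]

value_list = []
for row in table_trs:
    # One lookup per row instead of one per column.
    cells = row.find_elements("tag name", "td")
    value_list.append([cell.text for cell in cells[:len(header_list)]])

print(value_list)                        # [['a', 'b', 'c'], ['d', 'e', 'f']]
print([row.calls for row in table_trs])  # [1, 1]
```

`value_list` can then go to `pd.DataFrame(value_list, columns=header_list)` as before. One difference to be aware of: slicing with `cells[:len(header_list)]` silently yields a shorter list for a row with fewer cells, where indexing with `[i]` would raise an `IndexError`.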
Answered By - Anentropic