Issue
all I'm attempting to pull table data from a site that is displayed from a search box. I'm having trouble cleaning up the data in excel. Are there any tricks to get this a little neater? here is the code and screenshot below.
I need it to display the headers of Case Number, Style/Defendant, File Date and then the case numbers below, description of the defendant case, and file date.
soup = BeautifulSoup(driver.page_source,'html.parser')
df = pd.read_html(str(soup))[0]
df.dropna(inplace=True)
print(df)
datatoexcel = pd.ExcelWriter('Case_Files.xlsx')
df.to_excel(datatoexcel)
datatoexcel.save()
print('DataFrame is Written to Excel Successfully.')
This is what the table looks like from the print(df) screen
Unnamed: 0_level_0 Name Date of Birth
Case Number Style / Defendant File Date
Case Number Style / Defendant File Date
2 Case Number Style / Defendant File Date
3 08A575873 In the Matter of Petition for Compromise of Mi... 11/17/2008
6 Case Number Style / Defendant File Date
7 08A575874 In the Matter of Petition for Compromise of Mi... 11/17/2008
DataFrame is Written to Excel Successfully.
Solution
IIUC, drop the first two levels of column index then filter out your rows:
>>> df.droplevel(level=[0, 1], axis=1).query("`Case Number` != 'Case Number'")
Case Number Style / Defendant File Date
3 08A575873 In the Matter of Petition for Compromise of Mi... 11/17/2008
7 08A575874 In the Matter of Petition for Compromise of Mi... 11/17/2008
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.