Issue
I want to scrape the holiday date table from several HTML pages into a CSV file, but the dates come out garbled, as if they were read with the wrong encoding.
I am using Beautiful Soup with Python 3, and I open the file with encoding utf-8. The table I am trying to import is on the page https://www.timeanddate.com/holidays/india/2010
Sample code:
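import csv

# `table` is the BeautifulSoup element for the holidays table
rows = table.find_all('tr')
csvFile = open("test12.csv", "w+", newline='', encoding="utf-8")
try:
    writer = csv.writer(csvFile)
    for row in rows:
        csvRow = []
        # Collect the text of every data and header cell in this row
        for cell in row.find_all(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)
finally:
    csvFile.close()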
I am getting the following result. The dates are not imported in the proper format:
Date Â
1 जनवरी रविवार 5 जनवरी गà¥à¤°à¥à¤µà¤¾à¤° 14 जनवरी शनिवार 15 जनवरी रविवार 23 जनवरी सोमवार 26 जनवरी गà¥à¤°à¥à¤µà¤¾à¤° 28 जनवरी शनिवार
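The stray "Â" after "Date" and the "à¤" / "à¥" runs in the output above are the usual signature of UTF-8 bytes being decoded with a single-byte Western encoding. The post does not say where that happens, so the following is only a minimal sketch of the effect, assuming Latin-1 as the wrong encoding. The sample word is chosen for illustration and is not taken from the scraping code.

```python
# A non-breaking space (U+00A0), common in otherwise empty HTML table cells,
# is two bytes in UTF-8: 0xC2 0xA0.
nbsp_bytes = "\u00a0".encode("utf-8")

# Decoding those bytes as Latin-1 instead of UTF-8 gives "Â" followed by a
# non-breaking space, which looks like the stray "Â" in the header row.
misread = nbsp_bytes.decode("latin-1")

# Devanagari characters take three UTF-8 bytes each, so a misread turns
# every character into three Latin-1 characters (some of them invisible).
word = "गुरुवार"  # example word only ("Thursday" in Hindi)
garbled = word.encode("utf-8").decode("latin-1")

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(misread))
print(repaired == word)
```

If the CSV is correct on disk and only looks wrong in a spreadsheet program, writing it with encoding="utf-8-sig" adds a byte order mark that such programs commonly use to detect UTF-8. If the text is already garbled before it is written, the place to check is how the downloaded HTML bytes were decoded.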
Solution
Let Pandas do all that work:
import pandas as pd
url = 'https://www.timeanddate.com/holidays/india/2010'
# Gets all tables from site and stores as list of dataframes
table = pd.read_html(url)
# Get the dataframe in index position 0
table = table[0]
# Drop the rows with nulls
table = table.dropna(axis=0)
# Write to file
table.to_csv('file.csv', index=False)
And this can be condensed into 1 line:
pd.read_html('https://www.timeanddate.com/holidays/india/2010')[0].dropna(axis=0).to_csv('file.csv', index=False)
Output:
print (table.head(10).to_string())
      Date  Unnamed: 1_level_0                                  Name                Type
      Date  Unnamed: 1_level_1                                  Name                Type
 0   Jan 1              Friday                        New Year's Day  Restricted Holiday
 1   Jan 5             Tuesday             Guru Govind Singh Jayanti  Restricted Holiday
 2  Jan 14            Thursday                                Pongal  Restricted Holiday
 3  Jan 20           Wednesday                       Vasant Panchami  Restricted Holiday
 4  Jan 26             Tuesday                          Republic Day    Gazetted Holiday
 6   Feb 8              Monday  Maharishi Dayanand Saraswati Jayanti  Restricted Holiday
 7  Feb 12              Friday            Maha Shivaratri/Shivaratri    Gazetted Holiday
 8  Feb 14              Sunday                      Chinese New Year          Observance
 9  Feb 14              Sunday                       Valentine's Day          Observance
10  Feb 19              Friday                       Shivaji Jayanti  Restricted Holiday
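The output above has a two-level header in which both levels repeat the same labels, so the CSV would start with two header rows. One possible way to tidy that up is sketched below. The small frame is a hand-built stand-in for the result of pd.read_html so the example runs without network access, and the column name "Weekday" is an assumption made for this example, not a label from the page.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_html: a two-level header
# whose levels repeat each other (two sample rows from the output above).
columns = pd.MultiIndex.from_tuples([
    ("Date", "Date"),
    ("Unnamed: 1_level_0", "Unnamed: 1_level_1"),
    ("Name", "Name"),
    ("Type", "Type"),
])
table = pd.DataFrame(
    [
        ["Jan 1", "Friday", "New Year's Day", "Restricted Holiday"],
        ["Jan 26", "Tuesday", "Republic Day", "Gazetted Holiday"],
    ],
    columns=columns,
)

# Keep only the top header level, then label the unnamed column.
# "Weekday" is a hypothetical name chosen for readability.
table.columns = table.columns.get_level_values(0)
table = table.rename(columns={"Unnamed: 1_level_0": "Weekday"})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = table.to_csv(index=False)
print(csv_text)
```

The same two lines that reset and rename the columns can be applied to the real frame just before the to_csv call.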
Answered By - chitown88