Issue
I have been struggling with this, and nothing I tried worked. I need to apply several replace calls to the same value, but Python seemed to let me use replace only once.
This is my CSV output: https://i.stack.imgur.com/HtBSn.png
First, some values show up as N/A. They should be 0 or something similar; in short, they should be a string value.
Second, some country names contain a space, like North Macedonia. The space shouldn't be there.
import csv
import requests
from bs4 import BeautifulSoup
from csv import QUOTE_NONE
from csv import writer

response = requests.get('https://www.worldometers.info/coronavirus/#news').content
soup = BeautifulSoup(response,'lxml')
tbody=soup.find('table', id='main_table_countries_today').find('tbody').find_all('tr')[100:110]

with open('corona1.csv','w', newline='') as csv_file:
    csv_writer = writer(csv_file, escapechar=' ', quoting=csv.QUOTE_NONE)
    csv_writer.writerow(['countries','total_cases','total_deaths','total_recovered','active_cases','total_cases_in_1m','deaths_in_1m','population'])
    for value in tbody:
        countries = value.find_all('td')[1].text.replace(",", "").strip()
        total_cases= value.find_all('td')[2].text.replace(",", "").strip()
        total_deaths=value.find_all('td')[4].text.replace(",", "").strip()
        total_recovered=value.find_all('td')[6].text.replace(",", "").strip()
        active_cases=value.find_all('td')[8].text.replace(",", "").strip()
        total_cases_in_1m=value.find_all('td')[10].text.replace(",", "").strip()
        deaths_in_1m=value.find_all('td')[11].text.replace(",", "").strip()
        population=value.find_all('td')[14].text.replace(",", "").strip()
        csv_writer.writerow([countries,total_cases,total_deaths,total_recovered,active_cases,total_cases_in_1m,deaths_in_1m,population])
This is my current Python code. What should I change?
I would like to have something like:
total_recovered=value.find_all('td')[6].text.replace(",", "").replace("N/A","0").replace(" ","").strip()
Solution
Edit: This code works for me. I moved the repetitive work into a function and call it inside csv_writer.writerow.
import csv
import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.worldometers.info/coronavirus/#news').content
soup = BeautifulSoup(response,'lxml')
tbody=soup.find('table', id='main_table_countries_today').find('tbody').find_all('tr')[100:110]

# Substrings to replace in every cell (old -> new)
replacement = {
    ",": "",
    "N/A": "0",
    "\n": "",
    " ": ""
}

def cleanup(webcontent, indices):
    """Return the cleaned text of the <td> cells at the given indices."""
    cells = webcontent.find_all('td')
    out = []
    for index in indices:
        content = cells[index].text
        # Apply every replacement in turn to the same string
        for old, new in replacement.items():
            content = content.replace(old, new)
        out.append(content)
    return out

# newline='' keeps the csv module from writing blank rows on Windows
with open('corona1.csv','w', newline='') as csv_file:
    csv_writer = writer(csv_file, escapechar=' ', quoting=csv.QUOTE_NONE)
    csv_writer.writerow(['countries','total_cases','total_deaths','total_recovered','active_cases','total_cases_in_1m','deaths_in_1m','population'])
    for value in tbody:
        csv_writer.writerow(cleanup(value, [1,2,4,6,8,10,11,14]))
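To check what the replacement table does without requesting the website, here is a minimal sketch that runs on hard-coded strings. The helper name clean_text and the sample values are made up for illustration and are not part of the code above. It also shows that chaining several .replace() calls on one string, as attempted in the question, is valid Python.

```python
# Hypothetical helper and sample data, for illustration only.
REPLACEMENTS = {
    ",": "",      # drop thousands separators
    "N/A": "0",   # treat missing values as "0"
    "\n": "",     # drop line breaks
    " ": "",      # drop spaces, including those inside country names
}

def clean_text(text, replacements=REPLACEMENTS):
    """Apply every old -> new substitution in turn and return the result."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

samples = ["North Macedonia", "1,234,567", "N/A", "\n 42 \n"]
cleaned = [clean_text(s) for s in samples]
print(cleaned)

# str.replace returns a new string, so the calls can also be chained directly.
chained = "N/A".replace(",", "").replace("N/A", "0").replace(" ", "").strip()
print(chained)
```

The dictionary version is easier to extend than a long chain, since a new rule is one extra entry. The order of the entries matters whenever one replacement could affect the input of a later one.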
Note: If you open the file in Excel, it may not be displayed correctly, although most other programs and APIs read it fine. You have to change the separator in Excel. See "Import or export text (.txt or .csv) in Excel".
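As an alternative to changing the setting in Excel, the file itself can be written with a different separator. The sketch below assumes an Excel installation whose regional settings expect semicolons, which is not true everywhere, and uses made-up rows written to an in-memory buffer instead of corona1.csv.

```python
import csv
import io

# Made-up rows standing in for the scraped table.
rows = [
    ["countries", "total_cases"],
    ["NorthMacedonia", "1234"],
]

# Write to an in-memory buffer; with a real file, open it with newline=''.
buffer = io.StringIO()
semicolon_writer = csv.writer(buffer, delimiter=";")
semicolon_writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

Whether a semicolon is the right choice depends on the list separator configured on the machine that opens the file, so it is worth checking that first.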
Answered By - JPudel