Friday, November 11, 2022

[FIXED] Using multiple replace function in Beautifulsoup


Issue

I have been struggling with this, and nothing I tried has worked. I need to call replace several times, but Python seemed to let me use it only once.

This is my CSV output: https://i.stack.imgur.com/HtBSn.png

First, some values show up as N/A. They should be 0 or something similar; in short, they should be a plain string value.

Second, some country names contain a space, such as North Macedonia. The space shouldn't be there.

import csv
import requests
from bs4 import BeautifulSoup
from csv import QUOTE_NONE
from csv import writer


response = requests.get('https://www.worldometers.info/coronavirus/#news').content

soup = BeautifulSoup(response,'lxml')

tbody=soup.find('table', id='main_table_countries_today').find('tbody').find_all('tr')[100:110]

with open('corona1.csv','w', newline='') as csv_file:
    csv_writer = writer(csv_file, escapechar=' ', quoting=csv.QUOTE_NONE)
    csv_writer.writerow(['countries','total_cases','total_deaths','total_recovered','active_cases','total_cases_in_1m','deaths_in_1m','population'])



    for value in tbody:
        countries = value.find_all('td')[1].text.replace(",", "").strip()
        total_cases = value.find_all('td')[2].text.replace(",", "").strip()
        total_deaths = value.find_all('td')[4].text.replace(",", "").strip()
        total_recovered = value.find_all('td')[6].text.replace(",", "").strip()
        active_cases = value.find_all('td')[8].text.replace(",", "").strip()
        total_cases_in_1m = value.find_all('td')[10].text.replace(",", "").strip()
        deaths_in_1m = value.find_all('td')[11].text.replace(",", "").strip()
        population = value.find_all('td')[14].text.replace(",", "").strip()

        csv_writer.writerow([countries,total_cases,total_deaths,total_recovered,active_cases,total_cases_in_1m,deaths_in_1m,population])

This is my current Python code. What should I change?

I would like to have something like:

total_recovered=value.find_all('td')[6].text.replace(",", "").replace("N/A","0").replace(" ","").strip()

Solution

Edit: This code works for me. I moved the repetitive work into a function and call it inside the writerow call.

import csv
import requests
from bs4 import BeautifulSoup
from csv import writer


response = requests.get('https://www.worldometers.info/coronavirus/#news').content

soup = BeautifulSoup(response,'lxml')

tbody=soup.find('table', id='main_table_countries_today').find('tbody').find_all('tr')[100:110]

# Each key is replaced by its value, in the order listed here.
replacement = {
    ",": "",
    "N/A": "0",
    "\n": "",
    " ": ""
}

def cleanup(webcontent, indices):
    """Return the cleaned text of the <td> cells at the given positions."""
    cells = webcontent.find_all('td')
    out = []
    for index in indices:
        content = cells[index].text
        for old, new in replacement.items():
            content = content.replace(old, new)
        out.append(content)
    return out

# newline='' keeps the csv module from writing blank lines between rows on Windows.
with open('corona1.csv', 'w', newline='') as csv_file:
    csv_writer = writer(csv_file, escapechar=' ', quoting=csv.QUOTE_NONE)
    csv_writer.writerow(['countries','total_cases','total_deaths','total_recovered','active_cases','total_cases_in_1m','deaths_in_1m','population'])

    for value in tbody:
        csv_writer.writerow(cleanup(value, [1,2,4,6,8,10,11,14]))
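
The replacement-table idea can also be tried without fetching the page. The following is only a minimal sketch: the sample strings stand in for the text of individual table cells and are made up for illustration, and the names REPLACEMENTS and clean_text are hypothetical, not part of the answer above. It also shows that chaining several replace calls on one string works, because each call returns a new string.

```python
# Replacement table: each key is swapped for its value, in insertion order.
REPLACEMENTS = {",": "", "N/A": "0", "\n": "", " ": ""}


def clean_text(text):
    """Apply every replacement in turn and return the cleaned string."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


# Made-up cell texts, similar in shape to what td.text might return.
sample_cells = ["North Macedonia", "1,234", "N/A", "\n5,678 \n"]
cleaned = [clean_text(cell) for cell in sample_cells]
print(cleaned)

# Chaining replace calls directly gives the same result for a single value.
chained = " N/A ".replace(",", "").replace("N/A", "0").replace(" ", "")
print(chained)
```

Note that replacing every space also joins multi-word country names together, which is what the question asked for; drop the " " entry if the names should keep their spaces.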

Note: If you open the file in Excel, it may not be formatted correctly, although most other programs and APIs read it fine. In that case you have to change the separator in Excel. Have a look at "Import or export text (.txt or .csv) in Excel".
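
If changing the separator inside Excel is not convenient, another option is to write the file with a different delimiter in the first place. This is a sketch under an assumption: which delimiter Excel expects depends on its regional settings, and the semicolon used here is only a common guess. The helper name to_csv and the sample rows are hypothetical and only serve the illustration.

```python
import csv
import io

# Made-up rows for illustration; real rows would come from the scraper.
rows = [
    ["countries", "total_cases"],
    ["NorthMacedonia", "1234"],
]


def to_csv(rows, delimiter=";"):
    """Serialize rows to CSV text using the given delimiter."""
    buffer = io.StringIO()
    csv_writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    csv_writer.writerows(rows)
    return buffer.getvalue()


output = to_csv(rows)
print(output)
```

To write to disk instead, the same csv.writer arguments can be passed when opening a real file with newline=''.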

Answered By - JPudel

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
