Sunday, January 7, 2024

[FIXED] How to remove \xa0 from soup in beautifulsoup python

Labels: beautifulsoup, encoding, parsing, python

Issue

I am currently using BeautifulSoup to parse the HTML of a webpage.

To get the text from an element, I use the ".text" attribute:

soup.find('p', {'class': 'example'}).text

The problem is that the result sometimes contains "\xa0":

"some text «\xa0text\xa0»"

I tried using the "replace" function on the page source before parsing, but the "\xa0" characters still show up:

soup = BeautifulSoup(driver.page_source.replace('\xa0', ' '), "lxml")

NOTE: I don't want to call a function on every single string I parse; I would like the soup to be purged of those characters from the beginning.

Solution

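The problem is that the HTML source probably contains the entity &nbsp;, not the literal \xa0 character. The parser decodes the entity to \xa0 only after your replace has already run. Try replacing the entity instead, or as well.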

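from bs4 import BeautifulSoup

# Replace both the &nbsp; entity and the literal \xa0 character before parsing.
html = driver.page_source.replace('&nbsp;', ' ').replace('\xa0', ' ')
soup = BeautifulSoup(html, "lxml")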
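To see why replacing only the literal character is not enough, here is a minimal sketch that uses just the standard library. It is not part of the original answer: the helper name strip_nbsp and the pattern NBSP_PATTERN are hypothetical, and html.unescape stands in for the entity decoding a parser performs. The sketch also assumes the page might use the numeric forms &#160; or &#xA0;, which the original answer does not mention.

```python
import html
import re

# Hypothetical helper pattern (an assumption, not from the original answer):
# the named entity, its decimal and hex numeric forms, and the literal
# U+00A0 character.
NBSP_PATTERN = re.compile(r"&nbsp;|&#0*160;|&#[xX]0*[aA]0;|\xa0")


def strip_nbsp(source: str) -> str:
    """Return the HTML source with every non-breaking space replaced by a plain space."""
    return NBSP_PATTERN.sub(" ", source)


raw = "<p class='example'>some text «&nbsp;text&nbsp;»</p>"

# Replacing only the literal character leaves the entities untouched,
# so decoding them afterwards brings \xa0 back.
only_literal = raw.replace("\xa0", " ")
print("\xa0" in html.unescape(only_literal))  # True

# Replacing the entity forms as well removes the character for good.
cleaned = strip_nbsp(raw)
print("\xa0" in html.unescape(cleaned))  # False

# The cleaned string would then be handed to the parser, for example:
# soup = BeautifulSoup(strip_nbsp(driver.page_source), "lxml")
```

Because the replacement happens once on the raw source, every string later read from the soup is already free of the character, which is what the question asks for.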

Answered By - tripleee

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
