Issue
I am currently using BeautifulSoup to parse the HTML of a webpage.
To get the text from an element, I use the ".text" attribute:
soup.find('p', {'class': 'example'}).text
But the problem is that sometimes I get "\xa0" in the result:
"some text «\xa0text\xa0»"
I tried using the "replace" function:
soup = BeautifulSoup(driver.page_source.replace('\xa0', ' '), "lxml")
NOTE: I don't want to call a function on every single string I parse. I would like the soup to be purged of those characters from the beginning.
Solution
The problem is that the HTML source probably contains the entity &nbsp;, not the literal \xa0. Try replacing that instead, or as well.
soup = BeautifulSoup(
    driver.page_source.replace('&nbsp;', ' ').replace('\xa0', ' '),
    "lxml")
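To illustrate why replacing only the literal '\xa0' in the page source has no effect, here is a minimal sketch using just the standard library. It assumes the source uses the &nbsp; entity (or one of its numeric forms); html.unescape stands in for the entity decoding that the parser performs, and normalize_nbsp is a hypothetical helper name, not part of BeautifulSoup.

```python
import html


def normalize_nbsp(source: str) -> str:
    """Replace non-breaking spaces, escaped or literal, with plain spaces.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    for token in ("&nbsp;", "&#160;", "&#xa0;", "&#xA0;", "\xa0"):
        source = source.replace(token, " ")
    return source


# Sample markup (an assumption) that uses the entity, not the raw character.
raw = '<p class="example">some text «&nbsp;text&nbsp;»</p>'

# Replacing only the literal character changes nothing, because the raw
# source does not contain it yet.
only_literal = raw.replace("\xa0", " ")

# html.unescape stands in for the parser's entity decoding.
decoded = html.unescape(only_literal)
cleaned = html.unescape(normalize_nbsp(raw))

print("\xa0" in decoded)   # the entity survived and was decoded to \xa0
print("\xa0" in cleaned)   # both forms were replaced before decoding
```

With this approach the cleanup happens once, on the whole page, by passing normalize_nbsp(driver.page_source) to BeautifulSoup instead of cleaning each extracted string.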
Answered By - tripleee