Issue
This is the code that scrapes a particular section of an article on a website.
soup.find("div", {"id": "content_wrapper"}).text
I am supposed to replace each newline ('\n') in the body text with a space (' '). I have done this with:
soup.find("div", {"id": "content_wrapper"}).text.replace("\n", " ").strip()
But I still need to replace each '\xa0' and '\u200a' character in the body text with a space (' ') and strip all leading and trailing whitespace.
How do I do this, please?
Thank you!
Solution
You can chain further replace calls after the first one, since each call returns a new string.
text = soup.find('div', {'id': 'content_wrapper'}).text
modified_text = text.replace('\n', ' ').replace('\xa0', ' ').replace('\u200a', ' ').strip()
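As the number of characters to replace grows, the chain of replace calls can get long. One possible alternative is a single pass with str.translate. This is only a sketch: raw_text is a made-up sample standing in for the scraped .text value, and SPACE_MAP and normalize_whitespace are hypothetical names, not part of the original answer.

```python
# Hypothetical sample standing in for soup.find(...).text
raw_text = "\n First\xa0line\nsecond\u200aline \n"

# Translation table: map each unwanted character to a single space.
SPACE_MAP = str.maketrans({"\n": " ", "\xa0": " ", "\u200a": " "})


def normalize_whitespace(text: str) -> str:
    """Replace newline, no-break space and hair space with ' ', then strip both ends."""
    return text.translate(SPACE_MAP).strip()


cleaned = normalize_whitespace(raw_text)
print(repr(cleaned))
```

The result should match what the chained replace calls produce; the table just keeps the list of characters in one place.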
If I understood correctly, you want to remove these whitespace characters as well. In that case, don't replace them with a space (" "); replace them with an empty string ("") instead.
text = soup.find('div', {'id': 'content_wrapper'}).text
modified_text = text.replace('\n', '').replace('\xa0', '').replace('\u200a', '').strip()
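One thing to watch with the empty-string version: words that were separated only by one of these characters get joined together. A small comparison, assuming a made-up sample string, shows the difference next to a variant that collapses every run of whitespace into a single space with split and join (str.split() with no arguments treats '\xa0' and '\u200a' as whitespace too).

```python
# Hypothetical sample; the real input would be the scraped .text value.
sample = "Hello\nworld\xa0and\u200amore"

# Removing the characters outright glues neighbouring words together.
removed = sample.replace("\n", "").replace("\xa0", "").replace("\u200a", "").strip()

# Splitting on any Unicode whitespace and re-joining keeps one space between words
# and also drops leading and trailing whitespace.
collapsed = " ".join(sample.split())

print(repr(removed))
print(repr(collapsed))
```

Note that the split/join variant also collapses tabs and repeated spaces, which may or may not be what the task wants.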
Answered By - f1nch