Issue
from bs4 import BeautifulSoup
# Current output:
# DOMINGUEZ, JONATHAN D. VS. RAMOS,
# SILVIA M
#
# Desired output:
# DOMINGUEZ, JONATHAN D. VS. RAMOS, SILVIA M
x = """<td width="350px" valign="top"
style="padding:.5rem;">
DOMINGUEZ, JONATHAN D. VS. RAMOS,
SILVIA M
</td>"""
soup = BeautifulSoup(x, 'lxml')
print(soup.select_one('td').get_text(strip=True, separator='\n'))
I checked the docs and I believe get_text() can do this,
but I am not sure how.
Solution
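get_text(strip=True) only trims whitespace at the start and end of each text node, and separator is only inserted between separate text nodes. The newline here sits inside a single text node, so neither option removes it. A regular expression can collapse it, along with any runs of extra spaces, into a single space: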
from bs4 import BeautifulSoup
import re
x = """<td width="350px" valign="top"
style="padding:.5rem;">
DOMINGUEZ, JONATHAN D. VS. RAMOS,
SILVIA M
</td>"""
soup = BeautifulSoup(x, 'lxml')
text = re.sub(r'\s+', ' ', soup.select_one('td').get_text(strip=True))
print(text)
Giving:
DOMINGUEZ, JONATHAN D. VS. RAMOS, SILVIA M
Answered By - Martin Evans
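If you would rather avoid a regular expression, the same whitespace collapsing can be sketched with plain string methods. This is only an alternative under stated assumptions: the helper name normalized_text is hypothetical, and the standard-library 'html.parser' backend is swapped in for 'lxml' so the snippet has no extra dependency beyond bs4.

```python
from bs4 import BeautifulSoup

html = """<td width="350px" valign="top"
style="padding:.5rem;">
DOMINGUEZ, JONATHAN D. VS. RAMOS,
SILVIA M
</td>"""


def normalized_text(tag):
    # Hypothetical helper, not part of BeautifulSoup.
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and discards empty pieces, so joining
    # the pieces with one space collapses all of it.
    return " ".join(tag.get_text().split())


# 'html.parser' is an assumption here; 'lxml' should behave the same
# for this fragment.
soup = BeautifulSoup(html, "html.parser")
text = normalized_text(soup.find("td"))
print(text)
```

Because split() also drops leading and trailing whitespace, strip=True is not needed in this variant.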