Issue
I am scraping a website and it breaks the page up into multiple <p>
tags. How can I join them together into one string?
article_wrapper = article.find('div', class_ = 'column column--full article__content')
article_content = article_wrapper.find_all('p')
for element in article_content:
    print(element.text)  # prints article content
I tried creating a list, data = [], and appending each element to it, but that doesn't seem to work.
data = []
for element in article_content:
    data.append(element)
    print(element.text)  # prints article content
print('DATA')
print(data)
I want it all in one string so that I can pass it along as a single value instead of as separate pieces.
Solution
You were basically there. You printed the text of the tag, but appended the whole tag to your list. If you had appended element.text, you would have had it. If you want it as one string instead of a list of strings, you can do this:
txt = ""
for element in article_content:
    if not element.find('a'):  # to filter out the extra text
        txt += element.text
If you want to keep each paragraph separate in a list, just change your append line to this:
data = []
for element in article_content:
    if not element.find('a'):  # to filter out the extra text
        data.append(element.text)
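Once the paragraph texts are in a list, str.join can combine them into one string with a separator of your choice, so the paragraphs do not run into each other. Here is a minimal sketch of that idea. The function name join_paragraphs and its separator and skip_links parameters are made up for illustration; it only assumes that each item behaves like a BeautifulSoup tag, meaning it has find() and get_text().

```python
def join_paragraphs(paragraphs, separator="\n", skip_links=True):
    """Join the text of several <p> tags into a single string.

    `paragraphs` is expected to be what find_all('p') returns,
    for example the article_content list from the question.
    """
    parts = []
    for p in paragraphs:
        # Same filter as above: skip paragraphs that contain a link.
        if skip_links and p.find('a') is not None:
            continue
        text = p.get_text(strip=True)  # text with surrounding whitespace removed
        if text:  # ignore paragraphs that are empty after stripping
            parts.append(text)
    # Build the final string in one step instead of repeated +=.
    return separator.join(parts)
```

With the variables from the question, full_text = join_paragraphs(article_content) would give you one string with a newline between paragraphs. Pass separator=" " if you would rather have everything on one line.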
Answered By - goalie1998