Issue
I'm looking to extract the two numeric values from each of these bs4 elements.
forecast = [<div class="cell "><span>1.2</span><span class="m-unit"></span> - <span>2.0</span><span class="m-unit"></span></div>,
<div class="cell "><span>1.5</span><span class="m-unit"></span> - <span>2.6</span><span class="m-unit"></span></div>,
Do you know how to load them directly into a DataFrame? I tried indexing into a single element:
forecast[1].contents[3]
But this is not a robust way to extract all the numeric values from the forecast bs4 elements.
Solution
If the pattern is always the same and no deviations occur, you can split the text of each element on the hyphen:
pd.DataFrame([e.text.split('-') for e in forecast])
Note: For reliable results, more detailed information is needed in the question.
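If the whitespace or the separator might vary, a slightly more tolerant sketch is to pull the numbers out with a regular expression and convert them to floats. The helper name `extract_range` and the column names `low` and `high` are assumptions made for illustration, not part of the question, and the pattern only covers plain unsigned decimals such as those shown.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

html = '''<div class="cell "><span>1.2</span><span class="m-unit"></span> - <span>2.0</span><span class="m-unit"></span></div>
<div class="cell "><span>1.5</span><span class="m-unit"></span> - <span>2.6</span><span class="m-unit"></span></div>'''

forecast = BeautifulSoup(html, 'html.parser').select('div.cell')

# Unsigned decimals only: the hyphen is the range separator here, not a minus sign
NUMBER = re.compile(r'\d+(?:\.\d+)?')


def extract_range(element):
    """Return every number found in the element's text as a float (hypothetical helper)."""
    return [float(n) for n in NUMBER.findall(element.get_text())]


# 'low' and 'high' are assumed column names
df = pd.DataFrame([extract_range(e) for e in forecast], columns=['low', 'high'])
print(df)
```

Because the values are converted with `float`, the resulting columns are numeric rather than strings padded with whitespace.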
Example
from bs4 import BeautifulSoup
import pandas as pd

html = '''<div class="cell "><span>1.2</span><span class="m-unit"></span> - <span>2.0</span><span class="m-unit"></span></div>
<div class="cell "><span>1.5</span><span class="m-unit"></span> - <span>2.6</span><span class="m-unit"></span></div>'''

# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')
forecast = soup.select('div')

# Each element's text looks like "1.2 - 2.0"; splitting on the hyphen yields two string columns
pd.DataFrame([e.text.split('-') for e in forecast])
Output
| | 0 | 1 |
|---|---|---|
| 0 | 1.2 | 2.0 |
| 1 | 1.5 | 2.6 |
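Another option, assuming the values always sit in `<span>` tags that do not carry the `m-unit` class, is to select those spans directly so the separator text never matters. This is only a sketch under that assumption; the column names `low` and `high` are again invented for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

html = '''<div class="cell "><span>1.2</span><span class="m-unit"></span> - <span>2.0</span><span class="m-unit"></span></div>
<div class="cell "><span>1.5</span><span class="m-unit"></span> - <span>2.6</span><span class="m-unit"></span></div>'''

soup = BeautifulSoup(html, 'html.parser')

# One row per div; keep only the spans that hold a value (skip the empty unit spans)
rows = [
    [float(span.get_text(strip=True)) for span in div.select('span:not(.m-unit)')]
    for div in soup.select('div.cell')
]

# 'low' and 'high' are assumed column names
df = pd.DataFrame(rows, columns=['low', 'high'])
print(df)
```

If a cell ever contains a span with non-numeric text, `float` raises a `ValueError`, which makes an unexpected layout visible instead of silently producing wrong values.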
Answered By - HedgeHog