Issue
I'm still new to Python, and thanks to everyone for the earlier help. I am trying to parse a web-scraped bs4 element that contains no tables into a DataFrame. The data I need sits inside a 'pre' tag. I thought read_html with the right attrs would work, but I'm getting a None value from the bs4 element.
Code:
import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
url = 'https://www.usbr.gov/pn-bin/instant.pl?parameter=CHRO%20q&syer=2022&smnth=7&sdy=12&eyer=2022&emnth=7&edy=19&format=2'
response = requests.get(url, headers=headers)  # reply from the website
soup = BeautifulSoup(response.text, 'html5lib')  # parse the HTML with the html5lib parser
data = soup.select('pre')[1]  # select the second <pre> block, which holds the needed data
# print(data.text.strip())  # prints the data
tables = pd.read_html(data, attrs={'pre': 'table'})  # fails: the <pre> holds plain text, not a <table>
df1 = pd.DataFrame(tables, index=None)
Solution
Since the <pre> block is plain text rather than an HTML table, read_html cannot parse it. Instead, wrap the text in io.StringIO and feed it to pandas read_csv():
import io
from bs4 import BeautifulSoup
import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
url = 'https://www.usbr.gov/pn-bin/instant.pl?parameter=CHRO%20q&syer=2022&smnth=7&sdy=12&eyer=2022&emnth=7&edy=19&format=2'
response = requests.get(url, headers=headers)  # reply from the website
soup = BeautifulSoup(response.text, 'html5lib')  # parse the HTML with the html5lib parser
data = soup.select('pre')[1]  # select the second <pre> block, which holds the needed data
# print(data.text.strip())  # prints the data
df = pd.read_csv(io.StringIO(data.text))  # read_csv accepts any file-like object
df = df.xs('BEGIN DATA', axis=1, drop_level=True)  # keep the column headed by the 'BEGIN DATA' marker
print(df.iloc[:-1])  # drop the trailing 'END DATA' row
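The core idea can be tested without the network call: read_csv accepts any file-like object, so io.StringIO turns the scraped text into one. A minimal offline sketch, using a made-up snippet that mimics the feed's BEGIN DATA / END DATA wrapper (the column names and values here are assumptions for illustration, not the real feed):

```python
import io
import pandas as pd

# Hypothetical stand-in for data.text scraped from the <pre> block
pre_text = """BEGIN DATA
DATE       TIME   CHRO Q
07/12/2022 00:00  10.60
07/12/2022 00:15  10.60
END DATA"""

# Skip the marker and header lines, drop the END DATA footer,
# and let runs of whitespace delimit the three columns
df = pd.read_csv(io.StringIO(pre_text),
                 skiprows=2, skipfooter=1,
                 sep=r"\s+", engine="python",
                 names=["DATE", "TIME", "FLOW"])
print(df)
```

Naming the columns explicitly (names=) sidesteps the feed's two-word "CHRO Q" header, which whitespace splitting would otherwise break into two columns; skipfooter requires the python engine.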
Output:
DATE TIME CHRO Q
07/12/2022 00:00 10.60
07/12/2022 00:15 10.60
07/12/2022 00:30 10.60
07/12/2022 00:45 10.60
...
07/19/2022 22:45 9.36
07/19/2022 23:00 9.36
07/19/2022 23:15 9.36
07/19/2022 23:30 9.36
07/19/2022 23:45 9.36
Length: 769
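Once the rows are in a DataFrame, the DATE and TIME text columns can be combined into a proper DatetimeIndex for time-series work. A hedged sketch on a hypothetical frame shaped like the output above (the FLOW column name is an assumption):

```python
import pandas as pd

# Hypothetical frame shaped like the parsed output above
df = pd.DataFrame({
    "DATE": ["07/12/2022", "07/12/2022"],
    "TIME": ["00:00", "00:15"],
    "FLOW": [10.60, 10.60],
})

# Combine the date and time strings into a DatetimeIndex
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"],
                          format="%m/%d/%Y %H:%M")
df = df.drop(columns=["DATE", "TIME"])
print(df)
```

With a DatetimeIndex in place, resampling (e.g. df.resample("h").mean()) and date-range slicing work directly.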
Answered By - F.Hoque