Issue
I am trying to extract a specific part of an HTML document with BeautifulSoup.
Although I can already produce a result, it's not what I intend.
Here is a snippet of what I would like to capture:
<span class="dotted produktPreview" data-toggle="popover" data-placement="left" title="" data-content="0.34 $/100 g" data-original-title="Baseprice">0.87</span>
I am using BeautifulSoup, and my code so far is this:
import lxml
import requests
from bs4 import BeautifulSoup
URL = "1234"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'lxml')
produktPreview = soup.find_all('span', attrs={'class':'produktPreview'})
print(produktPreview)
So I basically want to get one level "deeper" into the span elements, but I can't get it done; the result is more or less the exact line I pasted at the beginning.
The expected output is two values from each span, which I want to put into a dataframe. The values are:
0.34 $/100 g
0.87
I want to extract these from the HTML; I just don't know how to use soup.find_all to get them.
Thank you all for your help, rued
Solution
soup.find_all returns a list of matching tags, so produktPreview is a list. Pick the first element, then read data-content as an attribute and the visible price as the tag's text.
print(produktPreview[0]["data-content"])
print(produktPreview[0].text)
If you have many span tags, use a for loop:
for item in produktPreview:
    print(item["data-content"])
    print(item.text)
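Since the question asks for the two values in a dataframe, here is a minimal sketch of collecting them as rows instead of printing them. It makes a few assumptions that are not in the original post: the HTML is an inline sample (the second span is made up for illustration), the built-in "html.parser" is used instead of lxml so nothing extra needs installing, and the function name extract_rows and the column names base_price and price are hypothetical.

```python
from bs4 import BeautifulSoup

# Inline sample HTML. The first span is taken from the question;
# the second one is invented here so the loop has two items to process.
HTML = """
<span class="dotted produktPreview" data-toggle="popover" data-placement="left" title="" data-content="0.34 $/100 g" data-original-title="Baseprice">0.87</span>
<span class="dotted produktPreview" data-content="1.20 $/100 g">2.49</span>
"""


def extract_rows(html):
    """Return one dict per matching span (hypothetical helper)."""
    # "html.parser" ships with Python; the original code used "lxml".
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for span in soup.find_all("span", class_="produktPreview"):
        rows.append({
            # .get() returns None instead of raising if the attribute is missing
            "base_price": span.get("data-content"),
            # get_text(strip=True) returns the visible text without whitespace
            "price": span.get_text(strip=True),
        })
    return rows


rows = extract_rows(HTML)
print(rows)
```

If pandas is installed, passing this list of dicts to pandas.DataFrame(rows) should give a dataframe with one row per span and the two columns named above.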
Answered By - Gnanavel