Issue
Hi everyone, I am parsing an HTML document with BeautifulSoup. However, there is one area of information I can't seem to parse.
The HTML:
<small>
<span class="label label-primary">CVE-2019-11198</span>
<span class="label label-warning">6.1 - Medium</span>
- August 05, 2019
</small>
I am parsing this whole block, but I want to get CVE-2019-11198, 6.1, Medium, and August 05, 2019 as separate values. Instead I'm getting the whole block under <small> with the following code:
Original:
cves = soup.find_all("div", class_="cve_listing")
for cve in cves:
    # CVE, vuln numeric rating, vuln sev cat, vuln date
    vulninfo = cve.find("small").text
Updated:
cves = soup.find_all("div", class_="cve_listing")
for cve in cves:
    vulncve = cve.find("span", class_="label-primary")
    vulninfo = cve.select_one('span.label').parent
    vulninfores = [x.get_text(strip=True) for x in vulninfo.contents if len(x.text) > 1]
Output:
AttributeError: 'NavigableString' object has no attribute 'text'
Any thoughts on how to parse this efficiently?
Solution
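You need to modify your question a bit: you selected "div", class_="cve_listing" but didn't show that part of the HTML.
The error comes from iterating over .contents, which yields bare text nodes (NavigableString objects) as well as tags. As the traceback shows, a NavigableString has no .text attribute here, so x.text fails on the first text node. Try the code below: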
Example:
cves = soup.find_all("div", class_="cve_listing")
for cve in cves:
    vulncve = cve.find("span", class_="label-primary")
    vulninfo = cve.select_one('span.label')
    # Text of the last <small> under any .cve_listing element, as one concatenated string
    vulninfores = [x.get_text(strip=True) for x in soup.select(".cve_listing small")][-1]
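If you want the four values separately, as the question asks, one possible approach is to walk the text nodes of <small> with stripped_strings and split them yourself. This is only a minimal sketch: the wrapping <div class="cve_listing"> is an assumption (that part of the page was never shown), and the variable names cve_id, score, severity, date and rows are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup from the question. The outer div is an assumption,
# since the question did not show the cve_listing element itself.
html = """
<div class="cve_listing">
<small>
<span class="label label-primary">CVE-2019-11198</span>
<span class="label label-warning">6.1 - Medium</span>
- August 05, 2019
</small>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for cve in soup.find_all("div", class_="cve_listing"):
    small = cve.find("small")
    if small is None:
        continue  # listing without a <small> block

    # stripped_strings yields every non-empty text node with surrounding
    # whitespace removed, so tags and bare text are handled the same way.
    parts = list(small.stripped_strings)

    cve_id = parts[0]
    # "6.1 - Medium" -> "6.1" and "Medium"
    score, severity = [p.strip() for p in parts[1].split("-", 1)]
    # "- August 05, 2019" -> "August 05, 2019"
    date = parts[2].lstrip("-").strip()

    rows.append((cve_id, score, severity, date))

print(rows)
```

This relies on every listing having the same three text pieces in the same order. If some entries are missing a rating or a date, check len(parts) before unpacking.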
Answered By - F.Hoque