Issue
I am trying to scrape the information from this page:
https://www.philips.com/a-w/security/security-advisories
I want each advisory article to become one row of a DataFrame.
For example: first article --> first row of the DataFrame, second article --> second row of the DataFrame, and so on.
To start, I am trying the following code:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.philips.com/a-w/security/security-advisories"
# get the html content of the url
html_content = requests.get(url).text
# parse the html content
soup = BeautifulSoup(html_content, 'html.parser')
span_content = []
for span in soup.find_all("span", class_="p-body-copy-02"):
    span_content.append(span.text)
The span_content list contains entries in the following form, where each new advisory starts after the publication date and update date entries:
['Publication Date:\xa02022 August 25',
'Update Date: 2022 August 25',
'Philips is currently monitoring... specific to their Philips’ products.',
'Publication Date:\xa02022 August 18',
'Update Date: 2022 August 18',
'Philips is currently monitoring...specific to their Philips’ products.',
etc]
I am trying the following code to get rid of the publication date and update date:
def delete_date(span_content):
    for i in range(len(span_content)):
        if span_content[i] == 'Publication Date:' or span_content[i] == 'Update Date:':
            span_content.pop(i)
            break
    return span_content

delete_date(span_content)
However, this is not working.
So how do I get rid of the publication date and update date entries and put the remaining information into a DataFrame like this?
index Info
0 Philips is currently monitoring... specific to their Philips’ products
1 Philips is currently monitoring...specific to their Philips’ products
... ...
etc etc
Solution
The following should work on your setup:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

url = 'https://www.philips.com/a-w/security/security-advisories'
big_list = []
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
# each advisory is a <dt> (title) followed by a <dd> (body) inside the accordion
parent_div = soup.select_one('dl.p-accordion')
titles = parent_div.select('dt')
for t in titles:
    advisory = t.find_next_sibling('dd')
    # the date lines may sit in a <span>, <strong> or <p>; find() returns None
    # when nothing matches, so .decompose() raises and the next tag is tried
    try:
        advisory.find('span', string = re.compile("Publication Date")).decompose()
        advisory.find('span', string = re.compile("Update Date")).decompose()
    except Exception as e:
        try:
            advisory.find('strong', string = re.compile("Publication Date")).decompose()
            advisory.find('strong', string = re.compile("Update Date")).decompose()
        except Exception as e:
            advisory.find('p', string = re.compile("Publication Date")).decompose()
            advisory.find('p', string = re.compile("Update Date")).decompose()
    # what is left of the <dd> is the advisory text without the date lines
    big_list.append((t.text, advisory.get_text(strip=True)))
df = pd.DataFrame(big_list, columns = ['Title', 'Description'])
print(df)
Result:
Title Description
0 Realtek Advisory (CVE-2022-27255) - (2022 August 25) Philips is currently monitoring developments and updates related to the Realtek AP-Router SDK Advisory (CVE-2022-27255). Realtek has confirmed that their eCos SDK-based routers, the ‘SIP ALG’ module is vulnerable to buffer overflow.Successful execution of this vulnerability could allow a crash or achieve the remote execution code. Realtek has released a patch that remediate this vulnerability.At this time, no Philips products are known to be impacted. In accordance with Philips’ Global Security Policy, Philips continues to analyze the matter, and further information will be posted on the Philips Product Security Advisory page as appropriate. Philips is committed to ensuring the safety, security, integrity, and regulatory compliance of our products to be deployed and to operate within Philips approved product specifications. Therefore, in accordance with Philips’s policy and regulatory requirements, all changes of configuration or software to Philips’ products (including operating system security updates and patches) may be implemented only in accordance with Philips’s product-specific, verified & validated, authorized, and communicated customer procedures or field actions. If a product does require operating system security updates, configuration changes, or other actions to be taken by our customer or by Philips Customer Services, product-specific service documentation will be produced by Philips’s product teams and made available to Philips service delivery platforms such as the Philips InCenter Customer Portal.Contract-entitled customers may use Philips InCenter and are encouraged to request Philips InCenter access and reference product-specific information posted. 
If customers still have questions, all customers (contract-entitled or otherwise) are encouraged to contact their local service support team or regional product service support as appropriate for up-to-date information specific to their Philips’ products.
1 Cisco Advisory (CVE-2022-20866) - (2022 August 18) Philips is currently monitoring developments and updates related to the recently released Ciscoadvisory. Cisco has confirmed a critical vulnerability (CVE-2022-20866) exists in the handling of RSA keys on devices running Adaptive Security Appliance (ASA) Software and Firepower Threat Defense (FTD) Software.Successful execution of this vulnerability could allow an unauthenticated, remote attacker to retrieve an RSA private key. Cisco has released software updates that help remediate this vulnerability.At this time, no Philips products are known to be impacted. In accordance with Philips’ Global Security Policy, Philips continues to analyze the matter, and further information will be posted on the Philips Product Security Advisory page as appropriate.Philips is committed to ensuring the safety, security, integrity, and regulatory compliance of our products to be deployed and to operate within Philips approved product specifications. Therefore, in accordance with Philips’s policy and regulatory requirements, all changes of configuration or software to Philips’ products (including operating system security updates and patches) may be implemented only in accordance with Philips’s product-specific, verified & validated, authorized, and communicated customer procedures or field actions.If a product does require operating system security updates, configuration changes, or other actions to be taken by our customer or by Philips Customer Services, product-specific service documentation will be produced by Philips’s product teams and made available to Philips service delivery platforms such as the Philips InCenter Customer Portal.Contract-entitled customers may use Philips InCenter and are encouraged to request Philips InCenter access and reference product-specific information posted. 
If customers still have questions, all customers (contract-entitled or otherwise) are encouraged to contact their local service support team or regional product service support as appropriate for up-to-date information specific to their Philips’ products.
[...]
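As a side note on the list-based attempt from the question: delete_date most likely has no effect because every entry also contains the date itself ('Publication Date:\xa02022 August 25'), so an equality test against 'Publication Date:' never matches. The break would also stop after the first removal, and popping by index while looping over the original length shifts the remaining items. A minimal sketch of a prefix-based filter is below. The names drop_date_entries and DATE_PREFIXES are made up for illustration, and the sample list is a shortened stand-in for the real span_content:

```python
import pandas as pd

# Labels that mark the entries to drop (hypothetical helper constant).
DATE_PREFIXES = ("Publication Date", "Update Date")


def drop_date_entries(items):
    """Return a new list without the 'Publication Date' / 'Update Date' entries."""
    cleaned = []
    for text in items:
        # Normalize non-breaking spaces (\xa0) and surrounding whitespace first.
        text = text.replace("\xa0", " ").strip()
        # str.startswith accepts a tuple, so both labels are checked at once.
        if not text.startswith(DATE_PREFIXES):
            cleaned.append(text)
    return cleaned


# Shortened stand-in for the scraped span_content list from the question.
span_content = [
    "Publication Date:\xa02022 August 25",
    "Update Date: 2022 August 25",
    "Philips is currently monitoring... (first advisory)",
    "Publication Date:\xa02022 August 18",
    "Update Date: 2022 August 18",
    "Philips is currently monitoring... (second advisory)",
]

# One remaining entry per row, in a single 'Info' column.
df = pd.DataFrame({"Info": drop_date_entries(span_content)})
print(df)
```

This only gives one row per advisory if each advisory's text sits in a single span. If an advisory is split over several spans, grouping by the dt/dd pairs as in the code above is the safer route.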
HedgeHog's solution is more elegant though, and it should work once you install or update soupsieve. Documentation for soupsieve: https://facelessuser.github.io/soupsieve/
And for BeautifulSoup: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
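For reference, here is a rough sketch of what a selector-based variant could look like. This is not HedgeHog's code (that answer is not reproduced here); it is only an illustration of soupsieve's :-soup-contains-own() pseudo-class, which needs a reasonably recent soupsieve (2.1 or later). The sample markup, DATE_SELECTOR and extract_advisories are assumptions made up for the example, and the real page structure may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the dt/dd accordion layout.
SAMPLE_HTML = """
<dl class="p-accordion">
  <dt>Example Advisory A</dt>
  <dd>
    <span>Publication Date: 2022 August 25</span>
    <span>Update Date: 2022 August 25</span>
    <p>First advisory body.</p>
  </dd>
  <dt>Example Advisory B</dt>
  <dd>
    <p><strong>Publication Date: 2022 August 18</strong></p>
    <p><strong>Update Date: 2022 August 18</strong></p>
    <p>Second advisory body.</p>
  </dd>
</dl>
"""

# One CSS selector list covering span/strong/p elements whose own text
# contains either date label.
DATE_SELECTOR = ", ".join(
    f'{tag}:-soup-contains-own("{label}")'
    for tag in ("span", "strong", "p")
    for label in ("Publication Date", "Update Date")
)


def extract_advisories(html):
    """Return (title, description) tuples with the date lines removed."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for title in soup.select("dl.p-accordion > dt"):
        body = title.find_next_sibling("dd")
        if body is None:
            continue  # title without a body: nothing to extract
        # Remove every element matched by the date selector.
        for node in body.select(DATE_SELECTOR):
            node.decompose()
        rows.append((title.get_text(strip=True), body.get_text(" ", strip=True)))
    return rows


rows = extract_advisories(SAMPLE_HTML)
print(rows)
```

The resulting list of tuples can be passed to pd.DataFrame(rows, columns=['Title', 'Description']) exactly as in the code above. Using a single selector avoids the nested try/except, at the cost of depending on the soupsieve version.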
Answered By - platipus_on_fire