Issue
I am trying to scrape the release date and number of downloads from the HTML below:
<p><i class="no-flip-over">Release date</i> : <span class="no-flip-over">2022-06-02</span></p>
<p><i class="no-flip-over">Downloads</i> : <span class="no-flip-over" data-times-funtouch="">703</span></p>
Here's my function to scrape it:
import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    data = {
        "Release_Date": sp.select_one('i.no-flip-over').text.strip().replace('\n', ' '),
        "Downloads": sp.select_one('i.no-flip-over').text.strip().replace('\n', ' '),
    }
    print(data)

phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132')
Here's my output:
{'Release_Date': '', 'Downloads': ''}
I am unable to see the values next to the keys in the dictionary.
Solution
The solution provided by @QHarr is the one I would also recommend, since you know exactly which facts you want to scrape. This is just an alternative that approaches it from the other side and may fit the title of the question a bit better.
Simply iterate over all the specs and build a dict of key/value pairs:
data = dict(e.text.split(' : ',1) for e in sp.select('.msg h1 ~ p:has(i+span)'))
Sure, you will scrape more than these two facts, but you also get a very good overview of all the .keys(); maybe some of them have typos, ... and you can pick and adjust them in post-processing.
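A post-processing step like the one described above could look roughly like this. It is only a sketch: `normalize_key` and `pick_facts` are hypothetical helper names, the sample input is copied from the output shown in this answer, and converting Downloads to an int is an assumption about what you want to do with it.

```python
import re


def normalize_key(key: str) -> str:
    """Turn a scraped label like 'Release date' into 'release_date'."""
    # Collapse each run of non-alphanumeric characters into a single underscore.
    return re.sub(r"[^0-9a-z]+", "_", key.strip().lower()).strip("_")


def pick_facts(specs: dict) -> dict:
    """Hypothetical helper: keep only the two wanted facts from the scraped dict."""
    normalized = {normalize_key(k): v.strip() for k, v in specs.items()}
    downloads = normalized.get("downloads", "")
    return {
        "Release_Date": normalized.get("release_date"),
        # Downloads is scraped as text; convert it only when it is purely digits.
        "Downloads": int(downloads) if downloads.isdigit() else None,
    }


# Sample input copied from the output shown in this answer.
specs = {'Release date': '2022-02-25', 'File size': '1.87M',
         'Downloads': '3545', 'Support system': 'Windows'}
print(pick_facts(specs))
```

Normalizing the keys first means a label with odd spacing or capitalization on some other page still maps to the same field name.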
Example
import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    data = dict(e.text.split(' : ', 1) for e in sp.select('.msg h1 ~ p:has(i+span)'))
    return data

phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132')
{'Release date': '2022-02-25',
'File size': '1.87M',
'Downloads': '3545',
'Support system': 'Windows'}
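To try the `' : '` split offline, without requesting the live page or installing BeautifulSoup, here is a minimal sketch that swaps in the standard library's `html.parser` for the CSS selector and runs it on the snippet from the question. `SpecParser` and `parse_specs` are hypothetical names. It keeps every `<p>` that contains both an `<i>` and a `<span>`, which only roughly mirrors `p:has(i+span)` (it does not check that the span directly follows the i), and it does not reproduce the `.msg h1 ~` scoping.

```python
from html.parser import HTMLParser


class SpecParser(HTMLParser):
    """Collect the text of every <p> that contains an <i> label and a <span> value."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished paragraph texts
        self._buf = None    # text pieces of the <p> being read, None outside a <p>
        self._tags = set()  # child tags seen inside the current <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf, self._tags = [], set()
        elif self._buf is not None:
            self._tags.add(tag)

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            if {"i", "span"} <= self._tags:
                self.rows.append("".join(self._buf))
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def parse_specs(html: str) -> dict:
    parser = SpecParser()
    parser.feed(html)
    parser.close()
    # Same idea as the answer: split each "label : value" text once on " : ".
    return dict(row.split(" : ", 1) for row in parser.rows if " : " in row)


snippet = '''<p><i class="no-flip-over">Release date</i> : <span class="no-flip-over">2022-06-02</span></p>
<p><i class="no-flip-over">Downloads</i> : <span class="no-flip-over" data-times-funtouch="">703</span></p>'''
print(parse_specs(snippet))
```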
Answered By - HedgeHog