Issue
I am a Python / BeautifulSoup newbie here.
I am trying to get an attribute value from an <option> tag. The HTML snippet is below. Specifically, I am trying to retrieve the value of the first "data-inventory-quantity" attribute (in this case, 60).
import requests
import bs4
import lxml
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
import csv

def getTitle(soup):
    return soup.find('title').text

def getInventory(soup):
    # The body of this function is missing from the post as captured.
    # Per the text below, it looked up one specific option "value".
    ...

def getPrice(soup):
    return soup.find("meta", {"property" : "og:price:amount"}).attrs['content']

urlList = []

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title', 'Inventory', 'Price'])
    for url in urlList:
        try:
            html = urlopen(url)
        except HTTPError as e:
            print(e)
        except URLError:
            print("error")
        else:
            soup = bs4.BeautifulSoup(html.read(), 'html.parser')
            row = [getTitle(soup), getInventory(soup), getPrice(soup)]
            print(row)
            csv_output.writerow(row)
However, I need to run this against multiple URLs, each with a unique option "value", and I cannot figure out how to edit my code so that it does not depend on that specific "value". I have tried to soup.find a higher-level tag, e.g. "soup.find('select', id = 'variant-listbox')['data-inventory-quantity']", but that gives me a "KeyError: 'data-inventory-quantity'". Is there any way to find data-inventory-quantity when all the other attribute values within this option tag differ for each URL?
HTML:
<div class="variants ">
  <select id="variant-listbox" name="id" class="medium">
    <option
      data-sku=""
      selected="selected" value="40323576791107"
      data-inventory-quantity="60"
    >
      Regular - $75.00
    </option>
    <option
      data-sku=""
      value="40323576823875"
      data-inventory-quantity="4"
    >
      Variant - $100.00
    </option>
  </select>
</div>
Solution
Try:
from bs4 import BeautifulSoup
html_doc = '''\
<div class="variants ">
  <select id="variant-listbox" name="id" class="medium">
    <option
      data-sku=""
      selected="selected" value="40323576791107"
      data-inventory-quantity="60"
    >
      Regular - $75.00
    </option>
    <option
      data-sku=""
      value="40323576823875"
      data-inventory-quantity="4"
    >
      Variant - $100.00
    </option>
  </select>
</div>'''
soup = BeautifulSoup(html_doc, 'html.parser')
o = soup.select_one('option[data-inventory-quantity]')
print(o['data-inventory-quantity'])
Prints:
60
If you want the currently selected option instead:
o = soup.select_one('option[data-inventory-quantity][selected]')
print(o['data-inventory-quantity'])
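As for the KeyError in the question: the data-inventory-quantity attribute sits on each <option>, not on the parent <select>, so indexing the <select> tag with that key fails. Here is a small sketch of that, using a trimmed copy of the HTML from the question. It also shows one way to scope the search to the variant-listbox select and collect the quantity of every option; whether you want all of them or only the first is up to you.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the snippet from the question
html_doc = '''\
<div class="variants ">
<select id="variant-listbox" name="id" class="medium">
<option data-sku="" selected="selected" value="40323576791107" data-inventory-quantity="60">Regular - $75.00</option>
<option data-sku="" value="40323576823875" data-inventory-quantity="4">Variant - $100.00</option>
</select>
</div>'''

soup = BeautifulSoup(html_doc, 'html.parser')

select = soup.find('select', id='variant-listbox')
# The <select> tag only has id, name and class, so select['data-inventory-quantity']
# raises KeyError. Tag.get() returns None for a missing attribute instead.
print(select.get('data-inventory-quantity'))

# Scope the search to the <select> and read the attribute from each <option>.
options = soup.select('select#variant-listbox option[data-inventory-quantity]')
quantities = [o['data-inventory-quantity'] for o in options]
print(quantities)
```

Prints:

None
['60', '4']

Note that attribute values come back as strings.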
EDIT: To have this as a getInventory(soup) function:
def getInventory(soup):
    o = soup.select_one('option[data-inventory-quantity]')
    return o['data-inventory-quantity']
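Since you are looping over many URLs, some pages may not contain a matching <option> at all, in which case select_one returns None and the subscript raises a TypeError. Below is a rough sketch of a more defensive variant. The name getInventoryOrNone is made up for this example, and returning None on a missing or non-numeric value is just one possible choice (you may prefer an empty string for the CSV).

```python
from bs4 import BeautifulSoup

def getInventoryOrNone(soup):
    # Hypothetical variant of getInventory: returns the quantity as an int,
    # or None when no matching <option> exists or the value is not a number.
    o = soup.select_one('option[data-inventory-quantity]')
    if o is None:
        return None
    try:
        return int(o['data-inventory-quantity'])
    except ValueError:
        return None

# Minimal made-up documents to exercise both paths
with_option = BeautifulSoup(
    '<select><option value="1" data-inventory-quantity="60">Regular</option></select>',
    'html.parser')
without_option = BeautifulSoup('<p>sold out</p>', 'html.parser')

print(getInventoryOrNone(with_option))
print(getInventoryOrNone(without_option))
```

Prints:

60
None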
Answered By - Andrej Kesely