Issue
I am having trouble creating a CSV file from the URL https://rog.asus.com/motherboards/rog-maximus/rog-maximus-xi-formula-model/spec/. I need to download the motherboard specs from that page. So far my Python script only generates the header row, and I don't know how to download the item content. This is the code I have for creating the headers in the CSV:
import requests
from bs4 import BeautifulSoup
import csv
# Target URL
url = "https://rog.asus.com/pl/motherboards/rog-maximus/rog-maximus-xi-formula-model/spec/"
# Fetch the page content
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Find all <div> elements with the matching class
spec_elements = soup.find_all('div', class_='ProductSpecSingle__productSpecItemRow__3sjMJ')
# Header row to write to the CSV file (labels as shown on the Polish page)
header = ['Procesor', 'Chipset', 'Pamięć', 'Grafika', 'Obsługa Multi-GPU',
          'Gniazda rozszerzeń', 'Magazyn danych', 'Sieć LAN', 'Bezprzewodowa sieć']
# Dictionary for storing the specification data
data_dict = {key: '' for key in header}
# For each <div> element
for spec_element in spec_elements:
    # Find the <h2> element with the matching class
    spec_title_element = spec_element.find('h2', class_='ProductSpecSingle__productSpecItemTitle__8gSrN')
    # Find the <span> element with the matching class
    spec_value_element = spec_element.find('span', class_='ProductSpecSingle__descriptionItemValue__lVa0O')
    # Check that both elements were found and that the value contains text
    if spec_title_element and spec_value_element and spec_value_element.text.strip():
        # Get the text from the elements
        spec_title = spec_title_element.text.strip()
        spec_value = spec_value_element.text.strip()
        # Check whether the title matches one of our header names
        for header_name in header:
            if header_name.lower() in spec_title.lower():
                data_dict[header_name] = spec_value
# CSV file name
csv_filename = 'ASUS/specyfikacja_plyty_glownej.csv'
# Write to the CSV file with UTF-8 encoding
with open(csv_filename, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    # Write the header row
    writer.writerow(header)
    # Write the data row
    writer.writerow([data_dict[header_name] for header_name in header])
print(f'Created CSV file: {csv_filename} with specification data.')
I need to generate a CSV with all the details of the Tech Specs using Python.
Solution
There are two main issues here. First, check your response / soup: it seems that the website is blocking requests, so add a user-agent:
response = requests.get(url, headers={'user-agent':'some-agent'})
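To confirm what is going wrong before changing the parsing code, it can help to look at the status code and to check whether the class name from the question appears in the raw HTML at all. The helper below is only a minimal sketch: `diagnose_response` is a hypothetical name, and the status code a blocked request actually returns on this site is an assumption.

```python
def diagnose_response(status_code: int, html: str, marker: str) -> str:
    """Roughly classify a fetched page (hypothetical helper for illustration).

    'blocked' : the server did not answer with 200 (assumed sign of blocking)
    'missing' : the page loaded, but the expected marker is not in the raw HTML
    'ok'      : the marker is present in the raw HTML
    """
    if status_code != 200:
        return "blocked"
    if marker not in html:
        return "missing"
    return "ok"


# Usage with the names from the question (needs network access, so commented out):
# response = requests.get(url, headers={'user-agent': 'some-agent'})
# print(diagnose_response(response.status_code, response.text,
#                         'ProductSpecSingle__productSpecItemRow'))
```

A result of "missing" would suggest that the elements are not part of the HTML that requests receives, so find_all() has nothing to match.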
Second, the content is rendered dynamically by JavaScript, which requests does not support. So there are two options:
- If you know the internal id of the product, use the API to request the additional information.
- Extract the information from the script section in the footer, convert it via json.loads(), and work with it like a dict to pick out your information:
data = json.loads(json.loads(re.search(r'JSON.parse.*("{.*?}")', response.text).group(1)))
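The double json.loads() can look odd at first. The sketch below shows why it is needed, using a tiny made-up page source: the argument of JSON.parse(...) is a quoted JavaScript string that itself contains JSON, so the first call only yields a str and the second call yields the dict. The variable name `window.__DATA__` and the sample content are assumptions for illustration, not the real page.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the data embedded in the page.
payload = {"Spec": {"spec": [{"Spec_content": [
    {"Display_field": "Chipset",
     "Description": [{"Display_description": "Intel Z390"}]},
]}]}}

inner = json.dumps(payload)  # the JSON document
# json.dumps(inner) wraps it in quotes and escapes the inner quotes,
# which imitates the string literal passed to JSON.parse in the script tag.
html = "<script>window.__DATA__ = JSON.parse(" + json.dumps(inner) + ");</script>"

# Same pattern as in the answer: capture the quoted string literal.
literal = re.search(r'JSON.parse.*("{.*?}")', html).group(1)

first = json.loads(literal)  # still a str: the unescaped JSON text
data = json.loads(first)     # now a dict

print(type(first).__name__, type(data).__name__)
```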
Example
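import requests, csv, re, json
# Target URL
url = "https://rog.asus.com/pl/motherboards/rog-maximus/rog-maximus-xi-formula-model/spec/"
# Send a user-agent so the request is not blocked
response = requests.get(url, headers={'user-agent':'some-agent'})
# The script section holds a JSON string inside a string literal, hence two json.loads() calls
data = json.loads(json.loads(re.search(r'JSON.parse.*("{.*?}")', response.text).group(1)))
# Map each spec title to its descriptions joined into one string
specs = {
    e.get('Display_field'): ' '.join([d.get('Display_description') for d in e.get('Description')])
    for e in data.get('Spec').get('spec')[0].get('Spec_content')
}
# Write the titles as the header row and the values as a single data row;
# UTF-8 keeps the Polish characters intact regardless of the platform default
with open('zzz_my_result.csv', 'w', newline='', encoding='utf-8') as f:
    w = csv.writer(f)
    w.writerow(specs.keys())
    w.writerow(specs.values())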
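The dict comprehension above assumes that every key exists and that every description is a string. If an entry is missing a key or a description is None, ' '.join() raises a TypeError. A more defensive variant could look like the sketch below. `flatten_specs` is a hypothetical helper, and the sample data only mimics the structure implied by the code above, so treat both as assumptions.

```python
import csv
import io


def flatten_specs(data: dict) -> dict:
    """Map each spec title to its descriptions joined by spaces.

    Missing keys and empty or None descriptions are skipped instead of raising.
    """
    spec_groups = (data.get('Spec') or {}).get('spec') or []
    if not spec_groups:
        return {}
    specs = {}
    for entry in spec_groups[0].get('Spec_content') or []:
        field = entry.get('Display_field')
        if not field:
            continue
        parts = [d.get('Display_description') or '' for d in entry.get('Description') or []]
        specs[field] = ' '.join(p for p in parts if p)
    return specs


# Made-up sample in the assumed shape of the embedded data
sample = {'Spec': {'spec': [{'Spec_content': [
    {'Display_field': 'Chipset',
     'Description': [{'Display_description': 'Intel Z390'}]},
    {'Display_field': 'Pamięć',
     'Description': [{'Display_description': '4 x DIMM'},
                     {'Display_description': None},
                     {'Display_description': 'Max. 128GB'}]},
]}]}}

specs = flatten_specs(sample)

# Write to an in-memory buffer here; use open(..., encoding='utf-8') for a real file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(specs.keys())
writer.writerow(specs.values())
print(buffer.getvalue())
```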
Answered By - HedgeHog