Friday, December 1, 2023

[FIXED] How to webscrape url and store data in CSV?

Tags: beautifulsoup, csv, python, python-requests, web-scraping

Issue

I am having trouble creating a CSV file from the URL https://rog.asus.com/motherboards/rog-maximus/rog-maximus-xi-formula-model/spec/. I need to download the motherboard specs from that page. So far my Python script only generates the header row, and I don't know how to retrieve the content of each item. This is the code I have for creating the CSV headers:

import requests
from bs4 import BeautifulSoup
import csv

# Target link
url = "https://rog.asus.com/pl/motherboards/rog-maximus/rog-maximus-xi-formula-model/spec/"

# Fetch the page content
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find all <div> elements with the matching class
spec_elements = soup.find_all('div', class_='ProductSpecSingle__productSpecItemRow__3sjMJ')

# Header row to write to the CSV file (Polish titles, as shown on the /pl/ page)
header = ['Procesor', 'Chipset', 'Pamięć', 'Grafika', 'Obsługa Multi-GPU',
          'Gniazda rozszerzeń', 'Magazyn danych', 'Sieć LAN', 'Bezprzewodowa sieć']

# Dictionary for storing the specification data
data_dict = {key: '' for key in header}

# For each <div> element
for spec_element in spec_elements:
    # Find the <h2> element with the matching class
    spec_title_element = spec_element.find('h2', class_='ProductSpecSingle__productSpecItemTitle__8gSrN')

    # Find the <span> element with the matching class
    spec_value_element = spec_element.find('span', class_='ProductSpecSingle__descriptionItemValue__lVa0O')

    # Check that both elements were found and that the value contains text
    if spec_title_element and spec_value_element and spec_value_element.text.strip():
        # Get the text from the elements
        spec_title = spec_title_element.text.strip()
        spec_value = spec_value_element.text.strip()

        # Check whether the title matches one of our header names
        for header_name in header:
            if header_name.lower() in spec_title.lower():
                data_dict[header_name] = spec_value

# CSV file name
csv_filename = 'ASUS/specyfikacja_plyty_glownej.csv'

# Write to the CSV file with UTF-8 encoding
with open(csv_filename, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)

    # Write the header row
    writer.writerow(header)

    # Write the data row
    writer.writerow([data_dict[header_name] for header_name in header])

print(f'Created CSV file: {csv_filename} with specification data.')

I need to generate a CSV with all the details from the Tech Specs section using Python.

Solution

There are two main issues here. First, check your response and soup: the website seems to be blocking the request, so add a user-agent header:

response = requests.get(url, headers={'user-agent':'some-agent'})
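
To tell which of the two problems you are hitting, it can help to look at the status code and check whether the spec rows exist in the returned HTML at all. Below is a minimal sketch; `diagnose` is a hypothetical helper, the class prefix is copied from the question's code, and treating any non-200 status as "blocked" is a simplifying assumption.

```python
def diagnose(status_code: int, html: str) -> str:
    """Roughly classify a response as blocked, client-side rendered, or usable."""
    # Assumption: anything other than 200 means the request was rejected.
    if status_code != 200:
        return "blocked"
    # Class prefix taken from the question's code; the hashed suffix may change.
    if "ProductSpecSingle__productSpecItemRow" not in html:
        return "no static spec rows"  # the markup is probably built by JavaScript
    return "ok"
```

With a real response you would call it as `diagnose(response.status_code, response.text)`.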

Second, the content is rendered dynamically by JavaScript, which requests does not support. So there are two options:

If you know the internal ID of the product, use the API to request the additional information.
Extract the information from the script section in the footer, convert it via json.loads(), and work with it like a dict to pick out the information you need:
```
data = json.loads(json.loads(re.search(r'JSON.parse.*("{.*?}")', response.text).group(1)))
```
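
The double `json.loads()` is needed because the regex captures the quoted argument of `JSON.parse`, which is a JSON document wrapped inside a string literal. Here is a small offline sketch of that idea; `sample_html` is a made-up miniature of the page's script block, not the real page source.

```python
import json
import re

# Hypothetical miniature of the page's script block; the real one is far larger.
sample_html = r'<script>window.__DATA__ = JSON.parse("{\"name\": \"demo\"}");</script>'

# Capture the quoted argument of JSON.parse, including its surrounding quotes.
raw = re.search(r'JSON.parse.*("{.*?}")', sample_html).group(1)

inner = json.loads(raw)   # first pass: quoted string literal -> JSON text (str)
data = json.loads(inner)  # second pass: JSON text -> dict

print(type(inner).__name__, type(data).__name__, data["name"])  # str dict demo
```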

Example

import requests, csv, re, json

# Target link
url = "https://rog.asus.com/pl/motherboards/rog-maximus/rog-maximus-xi-formula-model/spec/"

response = requests.get(url, headers={'user-agent':'some-agent'})

data = json.loads(json.loads(re.search(r'JSON.parse.*("{.*?}")', response.text).group(1)))

specs = {e.get('Display_field'):' '.join([d.get('Display_description') for d in e.get('Description')]) for e in data.get('Spec').get('spec')[0].get('Spec_content')}

with open('zzz_my_result.csv', 'w', newline='', encoding='utf-8') as f:
    w = csv.writer(f)
    w.writerow(specs.keys())
    w.writerow(specs.values())
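
The dict comprehension above can also be unpacked into a small function that is easier to test without hitting the site. This is only a sketch: the key names (`Spec`, `spec`, `Spec_content`, `Display_field`, `Description`, `Display_description`) come from the example, while `flatten_specs` and the `sample` values are hypothetical and written just for illustration.

```python
import csv
import io


def flatten_specs(data: dict) -> dict:
    """Map each spec title to its description lines joined by spaces."""
    spec_list = data.get("Spec", {}).get("spec") or [{}]
    content = spec_list[0].get("Spec_content", [])
    return {
        entry.get("Display_field"): " ".join(
            d.get("Display_description") or "" for d in entry.get("Description", [])
        )
        for entry in content
    }


# Made-up data that only mimics the structure used in the example.
sample = {"Spec": {"spec": [{"Spec_content": [
    {"Display_field": "Chipset",
     "Description": [{"Display_description": "Example chipset"}]},
    {"Display_field": "Memory",
     "Description": [{"Display_description": "4 x DIMM"},
                     {"Display_description": "DDR4"}]},
]}]}}

specs = flatten_specs(sample)

# Write to an in-memory buffer here; swap in open(..., newline='') for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(specs))
writer.writeheader()
writer.writerow(specs)
csv_text = buffer.getvalue()
```

Using `csv.DictWriter` keeps the header and the row tied to the same keys, which may help if you later write one row per product.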

Answered By - HedgeHog

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
