Sunday, January 14, 2024

[FIXED] Web Scraping Help - How can I convert to a CSV?

Labels: beautifulsoup, python, web-scraping

Issue

I'm trying to scrape some statistics for a school project and need to put them into table form for analysis. I've been able to pull the required data, but I haven't found a way to convert it into a clean table. Ideally, the output would be a CSV with one column per field and one row for each "table" group.

i.e., Col 1 = "id", Col 2 = "game_id" ... up to Col 45, and then 1 row for each block of data you can see in the DevTools network response.

I'm very new to Python and not a coder by trade, so any guidance would be appreciated. I've read pretty much every post related to this topic and haven't been able to figure it out. Thank you in advance! :)

See below for my code so far. When I try using .csv or anything similar, I get an error saying that the data is a "None" attribute. I've gotten to the point where I can pull the data, but I can't get it into the table form described above.

import requests
from bs4 import BeautifulSoup

URL = <URL HERE>

headers = {
        'value': '*/*',
        'accept': '*/*',
        'cookie': 'cookie-consent=true'
}

page = requests.get(URL, headers = headers)
soup = BeautifulSoup(page.content, 'html.parser')

Solution

You can try something like this:

import requests

URL = 'https://www.breakingpoint.gg/api/trpc/playerStats.fetchPlayerStatsByGame,modes.fetchModes,playerStats.fetchPlayerStatsByGame,playerStats.fetchPlayerStatsByGame,playerStats.fetchPlayerStatsByGame,games.fetchTeamsGameHistory,search.findAll?batch=1&input=%7B%220%22%3A%7B%22json%22%3A%7B%22gameId%22%3A%2269c37b74-5bbc-45e6-ac2c-6ea8b5270ce4%22%7D%7D%2C%221%22%3A%7B%22json%22%3Anull%2C%22meta%22%3A%7B%22values%22%3A%5B%22undefined%22%5D%7D%7D%2C%222%22%3A%7B%22json%22%3A%7B%22gameId%22%3A%22167d90bb-b1d7-4c94-9653-2ecd3f946635%22%7D%7D%2C%223%22%3A%7B%22json%22%3A%7B%22gameId%22%3A%2235c9700c-57bb-4a34-a775-5a14faaf41ae%22%7D%7D%2C%224%22%3A%7B%22json%22%3A%7B%22gameId%22%3A%22153f7784-d4e1-4f3d-9815-5c0e02a6d0c3%22%7D%7D%2C%225%22%3A%7B%22json%22%3A%7B%22team1Id%22%3A11%2C%22team2Id%22%3A4%7D%7D%2C%226%22%3A%7B%22json%22%3A%7B%22searchTerm%22%3A%22%22%7D%7D%7D'

headers = {
    'value': '*/*',
    'accept': '*/*',
    'cookie': 'cookie-consent=true'
}

response = requests.get(URL, headers=headers)
data = response.json()

import pandas as pd
# Use pandas to normalize the JSON data into a DataFrame
df = pd.json_normalize(data)

# Explode the 'result.data.json' column to have one row per entry
df = df.explode("result.data.json")

# Normalize the exploded data further
normalized_data = pd.json_normalize(df['result.data.json'])

# Print the resulting normalized data
print(normalized_data)

The pd.json_normalize() function flattens the nested JSON structure in data and converts it into a pandas DataFrame (df). Nested keys are joined with dots to form the column names, which is where the result.data.json column comes from.
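To make the flattening step concrete, here is a minimal sketch run on a small hand-written payload rather than the live API. The shape of `sample` and the field names (`id`, `game_id`, `kills`) are assumptions made up for illustration; the real response has different and many more fields.

```python
import pandas as pd

# Hypothetical payload shaped like a batched API response.
# The field names are invented for this example, not the real schema.
sample = [
    {"result": {"data": {"json": [
        {"id": 1, "game_id": "a", "kills": 20},
        {"id": 2, "game_id": "a", "kills": 15},
    ]}}},
    {"result": {"data": {"json": [
        {"id": 3, "game_id": "b", "kills": 9},
    ]}}},
]

# Nested dicts are flattened into dotted column names;
# lists are left untouched as cell values.
flat = pd.json_normalize(sample)

print(flat.columns.tolist())
print(len(flat))
```

At this point each cell of `result.data.json` still holds a whole list of records, which is why a further step is needed before the data looks like a table.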

The explode() function expands the DataFrame so that each element of a list in the specified column (result.data.json) gets its own row. The second pd.json_normalize() call then turns each of those entries into a row with one column per key.
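Since the question asks for a CSV, the following sketch shows the explode step followed by a CSV export, again on made-up data. The payload shape, the field names, the null entry, and the `player_stats.csv` filename are all assumptions for illustration; dropping empty entries with dropna() is an extra precaution that the answer above does not take.

```python
import pandas as pd

# Hypothetical payload; one entry has no data, as a batched API might return.
sample = [
    {"result": {"data": {"json": [
        {"id": 1, "game_id": "a", "kills": 20},
        {"id": 2, "game_id": "a", "kills": 15},
    ]}}},
    {"result": {"data": {"json": None}}},
    {"result": {"data": {"json": [
        {"id": 3, "game_id": "b", "kills": 9},
    ]}}},
]

df = pd.json_normalize(sample)

# One row per list element; non-list values such as None pass through unchanged.
exploded = df.explode("result.data.json")

# Drop the empty entries and keep only the per-row dictionaries.
records = exploded["result.data.json"].dropna().tolist()

# One column per key, one row per record.
table = pd.json_normalize(records)

# With no path given, to_csv returns the CSV as a string.
csv_text = table.to_csv(index=False)
print(csv_text)

# To write a file instead (hypothetical filename):
# table.to_csv("player_stats.csv", index=False)
```

Applied to the answer's code, the same call on normalized_data, for example normalized_data.to_csv("player_stats.csv", index=False), should produce the CSV file the question asks for.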

Answered By - kabooya

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
