Issue
I have a JSON file containing a dictionary, and I have loaded it in my Jupyter Notebook. The dictionary has 3 keys:
- the first key, stats, holds basic stats about the dictionary
- the second key, questions, holds the questions that were asked in the survey
- the third key, responses, holds the answers to those questions.
My problem is that I have converted the JSON file into a DataFrame, but the column names, which are the questions asked, are not in clean form.
I want the questions in clean form.
import numpy as np
import pandas as pd
import json

# Load Json File
filepath = "C:/Users/osmi-survey-2016_1479139902.json"
with open(filepath, "r") as openFile:
    my_json_file_health = json.load(openFile)

# Extract questions and responses
questions = my_json_file_health.get("questions", [])
responses = my_json_file_health.get("responses", [])

# Create a list of dictionaries for responses with column names as keys
response_dicts = [
    {question["question"]: response["answers"].get(question["id"], None) for question in questions}
    for response in responses
]

# Convert the list of response dictionaries to a DataFrame
responses_df = pd.DataFrame(response_dicts)
responses_df
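For readers without the survey file, here is a minimal sketch of the structure described above. The field names ("questions", "responses", "id", "question", "answers") come from the code in the question; every value, including the contents of "stats", is invented for illustration. It stops at the list of dictionaries so that it runs without pandas.

```python
# Hypothetical stand-in for the loaded JSON; all values are invented.
survey = {
    "stats": {"total_responses": 2},
    "questions": [
        {"id": "q1", "question": "<strong>Are you self-employed?</strong>"},
        {"id": "q2", "question": "What is your age?"},
    ],
    "responses": [
        {"answers": {"q1": "No", "q2": "29"}},
        {"answers": {"q1": "Yes"}},  # q2 left unanswered
    ],
}

questions = survey.get("questions", [])
responses = survey.get("responses", [])

# One dictionary per response, keyed by the raw question text.
# A missing answer becomes None because dict.get returns None by default.
response_dicts = [
    {q["question"]: r["answers"].get(q["id"]) for q in questions}
    for r in responses
]

print(response_dicts[1])
```

Passing response_dicts to pd.DataFrame would then give one column per question, with any markup in the question text carried over into the column name.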
Solution
Here you go:
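import re, html

# Matches HTML comments and tags
tag_re = re.compile(r'(<!--.*?-->|<[^>]*>)')

# df is the DataFrame to clean (responses_df in the question)
for col in df.columns:
    # Strip the tags, then decode HTML entities such as &amp;
    df.rename(columns={col: html.unescape(tag_re.sub('', col))}, inplace=True)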
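The same idea can be sketched as a small helper that is easy to test on its own. This is only an illustration: the name clean_label and the sample labels are hypothetical, and it assumes the labels may contain entities such as &amp;, which it decodes with html.unescape before collapsing repeated whitespace.

```python
import html
import re

# Same pattern as above: HTML comments and tags.
TAG_RE = re.compile(r"(<!--.*?-->|<[^>]*>)")


def clean_label(label: str) -> str:
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    without_tags = TAG_RE.sub("", label)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


# Hypothetical column labels, invented for illustration only.
raw_labels = [
    "<strong>How many</strong> employees?",
    "Tech &amp; IT   role?",
    "<!-- note --><em>Country</em>",
]
cleaned = [clean_label(label) for label in raw_labels]
print(cleaned)
```

Because DataFrame.rename accepts a function for its columns argument, the helper could be applied in one call with df = df.rename(columns=clean_label) instead of renaming inside a loop.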
Answered By - gtomer