Issue
I have this project where I pull a json from gitlab, filter its contents, and then spit out a csv report. I've been going in circles putting in different solutions, and I did get it to work on my personal PC using two different methods, but neither work in the gitlab environment.
First issue: The dataset I'm pulling is rather peculiar in that its 2d:
{'id':0, 'entity_type':'Project', 'details':{'author_name':'Billy', 'author_id':02}, 'created_at':10242022}
which translates to:
id | entity_type | details | created_at |
---|---|---|---|
0 | Project | {'author_name':'Billy', 'author_id':02} | 10-24-2022 |
What I've been trying to do is flatten it into this:
id | entity_type | author_name | author_id | created_at |
---|---|---|---|---|
0 | Project | 'Billy' | 02 | 10-24-2022 |
What I have so far is the following, which works on my personal PC, but for some reason not the gitlab:
import pandas as pd
#pull data from the gitlab
df = pd.DataFrame(data)
sub_df = df.details.apply(pd.Series) #pull out details column
sub_df = pd.concat({'details':sub_df}, axis=1, names=['l1','l2']) #
df = pd.concat({'':df}, axis=1, names =['l1','l2'])
df = pd.concat((df, sub_df), axis=1)
#code to delete unnecessary data from df.
Now, on my PC when I run this code, pd.Series helps turn the 'details' column into multiple columns of the headers within. However, when I run it on the GitLab, it instead separates into multiple columns of 'details' with each value stored underneath, ie:
details | details | details | details | ... |
---|---|---|---|---|
author_name | Billy | author_id | 02 | ... |
author_name | Sam | author_id | 05 | ... |
Solution
Try with json_normalize
instead of apply(pd.Series)
df = pd.concat([df, pd.json_normalize(df.pop('details'))], axis=1)
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.