Friday, May 27, 2022

[FIXED] Convert nested JSON to Dataframe with columns referencing nested paths

Labels: dataframe, json, key-value, pandas, python

Issue

I am trying to convert a nested JSON into a CSV file with three columns: the level 0 key, the branch, and the lowest level leaf.

For example, in the JSON below:

{
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}

I want to create output like this (only the first rows are shown, just to get the point across):

level 0	branch	leaf
protein	protein.meat	chicken
protein	protein.meat	beef

I tried using pandas' json_normalize, but the actual file will not have fixed paths that I can use to identify the nested fields, as each dictionary is unique.

json_normalize returns the level 0 fields as columns, but I need them as rows. Any help would be very much appreciated.

I created a function that can unnest the JSON based on key values like this:

import json

with open('path/to/json') as m:
    my_json = json.load(m)


def unnest_json(data):
    for key, value in data.items():
        print(str(key)+'.'+str(value))
        if isinstance(value, dict):
            unnest_json(value)
        elif isinstance(value, list):
            for val in value:
                if isinstance(val, str):
                    pass
                elif isinstance(val, list):
                    pass
                else:
                    unnest_json(val)

unnest_json(my_json)

Solution

This is probably not the cleanest approach, but you can use a recursive function (traverse in the code below) to convert the dictionary into a list of rows, one [level 0, branch, leaf] row per leaf, and then build a pandas DataFrame from that list.

data = {
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}

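def traverse(col_values, dictionary, rows):
    # col_values holds [level 0, branch, leaf] for the path walked so far.
    for key in dictionary:
        new_col_values = list(col_values)  # copy so sibling keys do not share state
        if dictionary[key]:
            # Non-empty dictionary: extend the dotted branch and recurse.
            new_col_values[1] += '.' + key
            traverse(new_col_values, dictionary[key], rows)
        else:
            # Empty dictionary: key is a leaf, so record a finished row.
            new_col_values[2] = key
            rows.append(new_col_values)

rows = []
for key in data:
    # Each top-level key starts as both the level 0 value and the branch.
    traverse([key, str(key), None], data[key], rows)

import pandas as pd

df = pd.DataFrame(rows, columns=["level 0", "branch", "leaf"])
print(df)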
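
Since the question asks for a CSV file, the same traversal can also be sketched without pandas. The version below is only an illustration under a few assumptions, not the answerer's code: iter_rows and sample are hypothetical names, any empty or non-dict value is treated as a leaf, and the rows are written to an in-memory buffer with the standard-library csv module.

```python
import csv
import io


def iter_rows(tree, path=()):
    """Yield (level 0, branch, leaf) tuples for every leaf below the top level."""
    for key, child in tree.items():
        if isinstance(child, dict) and child:
            # Non-empty dict: descend and extend the path with this key.
            yield from iter_rows(child, path + (key,))
        elif path:
            # Empty dict or non-dict value (assumption): treat the key as a leaf.
            # Top-level keys with nothing below them produce no row.
            yield path[0], ".".join(path), key


# Hypothetical, shortened version of the JSON from the question.
sample = {
    "protein": {"meat": {"chicken": {}, "beef": {}}},
    "fat": {"healthy": {"avocado": {}}, "unhealthy": {}},
}

rows = list(iter_rows(sample))

# Swap the buffer for open("out.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["level 0", "branch", "leaf"])
writer.writerows(rows)
print(buffer.getvalue())
```

Passing the path as a tuple keeps each recursive call independent, so no list has to be copied per key, and a generator avoids threading a shared rows list through the recursion.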

Answered By - tax evader

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
