Tuesday, November 21, 2023

[FIXED] convert column of lists of dictionaries to separate columns

Labels: pandas, python

Issue

I have:

import pandas as pd

grp = ["A","B","C","A","C","C","B"]
dictl = ["[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]"]
df = pd.DataFrame({'grp':grp,'dictl':dictl})

I would like to convert it to:

pd.DataFrame({'grp':["A","B","C","A","C","C","B"],
              'TypeID':["0","0","","0","0","","0"],
              'Description':["blah","blah","","blah","blah","","blah"],
              'DateCreated':["2018-08-09T14:00:30.957","2018-08-09T14:00:30.957","","2018-08-09T14:00:31.504","2018-08-09T14:00:31.504","","2018-08-09T14:00:31.504"]})

I tried the suggestions from "Change a column containing list of dict to columns in a DataFrame" and ran into the following issues:

for grp, dictl in df:
    rec = {'Name': grp}
    rec.update(x for d in dictl for x in d.items())
    records.append(rec)

This raised the error: ValueError: too many values to unpack (expected 2)

and

df['dictl'].apply(lambda c:
    pd.Series({next(iter(x.keys())).strip(':'):
               next(iter(x.values())) for x in c})
)

This gave the error: AttributeError: 'str' object has no attribute 'keys'

I have more than 2 million rows, so I would like the method to be fast if possible.

Solution

Going by your stringified input, you can do this:

groups = ["A","B","C","A","C","C","B"]
strings = ["[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]"]

import ast

import pandas as pd

# Parse each string into a real Python list of dictionaries
dictl = [ast.literal_eval(string) for string in strings]

# Values to use for rows whose list is empty
defaults = {'TypeID': 0, 'Description': '', 'DateCreated': ''}

# One record per row: the first dictionary plus its group,
# or the defaults plus the group when the list is empty
dictl_grp = [{**item[0], 'grp': group} if item
             else {'grp': group, **defaults}
             for group, item in zip(groups, dictl)]

df = pd.DataFrame.from_records(dictl_grp)
print(df)

which yields

   TypeID Description              DateCreated grp
0       0        blah  2018-08-09T14:00:30.957   A
1       0        blah  2018-08-09T14:00:30.957   B
2       0                                        C
3       0        blah  2018-08-09T14:00:31.504   A
4       0        blah  2018-08-09T14:00:31.504   C
5       0                                        C
6       0        blah  2018-08-09T14:00:31.504   B

(I renamed a few variables for clarity.)
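If the data already lives in a DataFrame, as in the question, the same idea can be applied to the column directly. The AttributeError in the question arises because each cell of dictl is a string rather than a list, so it has to be parsed first. The following is only a minimal sketch under a few assumptions that are not part of the original answer: every list holds at most one dictionary, missing values should become empty strings (as in the desired output), and the names parsed, empty, records, expanded and result are purely illustrative. The sample data is shortened to three rows.

```python
import ast

import pandas as pd

# Shortened sample in the same shape as the question: each cell of 'dictl'
# is a *string* that looks like a list holding zero or one dictionary.
df = pd.DataFrame({
    'grp': ["A", "B", "C"],
    'dictl': [
        "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
        "[]",
        "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
    ],
})

# Turn every string into a real Python list of dictionaries.
parsed = df['dictl'].map(ast.literal_eval)

# Assumption: empty lists should give '' in every new column.
empty = {'TypeID': '', 'Description': '', 'DateCreated': ''}

# One flat dictionary per row: the defaults, overridden by the first
# dictionary of the list when there is one.
records = [{**empty, **(items[0] if items else {})} for items in parsed]

# Build the new columns on the original index and attach them to 'grp'.
expanded = pd.DataFrame(records, index=df.index)
result = df[['grp']].join(expanded)
print(result)
```

With these defaults the TypeID column holds a mix of integers and empty strings; if it should be text throughout, as in the desired output, result['TypeID'].astype(str) would convert it. Parsing with ast.literal_eval is done row by row in Python, so for millions of rows it is worth timing on a sample before running it on the full data.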

Answered By - 9769953

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
