Issue
I have:
import pandas as pd
grp = ["A","B","C","A","C","C","B"]
dictl = ["[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]"]
df = pd.DataFrame({'grp':grp,'dictl':dictl})
I would like to convert it to:
pd.DataFrame({'grp':["A","B","C","A","C","C","B"],
'TypeID':["0","0","","0","0","","0"],
'Description':["blah","blah","","blah","blah","","blah"],
'DateCreated':["2018-08-09T14:00:30.957","2018-08-09T14:00:30.957","","2018-08-09T14:00:31.504","2018-08-09T14:00:31.504","","2018-08-09T14:00:31.504"]})
I tried the suggestions from "Change a column containing list of dict to columns in a DataFrame" and ran into the following issues:
records = []
for grp, dictl in df:
    rec = {'Name': grp}
    rec.update(x for d in dictl for x in d.items())
    records.append(rec)
raised: ValueError: too many values to unpack (expected 2)
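The ValueError occurs because iterating over a DataFrame yields its column labels ('grp', 'dictl'), not its rows, so the loop tries to unpack the three-character string 'grp' into two names. A minimal sketch of the loop as it was presumably intended, pairing the two columns with zip and parsing each cell with ast.literal_eval first (the 'Name' key is kept from the attempt above; only the first three rows of the sample data are used):

```python
import ast

import pandas as pd

grp = ["A", "B", "C"]
dictl = [
    "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
    "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
    "[]",
]
df = pd.DataFrame({'grp': grp, 'dictl': dictl})

# Iterating a DataFrame directly yields column labels, not rows
print(list(df))  # ['grp', 'dictl']

records = []
# zip the two columns to get one (group, string) pair per row
for g, s in zip(df['grp'], df['dictl']):
    rec = {'Name': g}
    # each cell is a string, so parse it into a list of dicts first
    rec.update(x for d in ast.literal_eval(s) for x in d.items())
    records.append(rec)

# rows whose list was empty get NaN in the dict-derived columns
out = pd.DataFrame(records)
print(out)
```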
and
df['dictl'].apply(lambda c:
pd.Series({next(iter(x.keys())).strip(':'):
next(iter(x.values())) for x in c})
)
raised: AttributeError: 'str' object has no attribute 'keys'
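The AttributeError has the same root cause: each cell of dictl is a string that merely looks like a list of dicts, so "for x in c" walks over characters. The strings also use single quotes, which is Python literal syntax rather than JSON, so json.loads rejects them while ast.literal_eval parses them. A quick check on one cell from the sample data:

```python
import ast
import json

s = "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]"

# The cell is a plain string: iterating over it yields characters, not dicts
print(type(s).__name__)

# Single-quoted keys are valid Python literals but not valid JSON
try:
    json.loads(s)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# literal_eval safely evaluates the literal into a real list of dicts
parsed = ast.literal_eval(s)
print(json_ok)
print(parsed[0]['DateCreated'])
```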
I have more than 2 million rows, so I would like the method to be as fast as possible.
Solution
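Since your input is stringified (each cell is the string representation of a list of dicts), you can parse it with ast.literal_eval and build the records directly: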
groups = ["A","B","C","A","C","C","B"]
strings = ["[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]",
"[]","[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]"]
import ast
dictl = [ast.literal_eval(string) for string in strings]
# In case of missing data
defaults = {'TypeID': 0, 'Description': '', 'DateCreated': ''}
# Merge each parsed dict with its group label; use the defaults when the list is empty
dictl_grp = [{**item[0], 'grp': group} if len(item)
else {'grp': group, **defaults}
for group, item in zip(groups, dictl)]
import pandas as pd
df = pd.DataFrame.from_records(dictl_grp)
print(df)
which yields
   TypeID Description             DateCreated grp
0       0        blah 2018-08-09T14:00:30.957   A
1       0        blah 2018-08-09T14:00:30.957   B
2       0                                       C
3       0        blah 2018-08-09T14:00:31.504   A
4       0        blah 2018-08-09T14:00:31.504   C
5       0                                       C
6       0        blah 2018-08-09T14:00:31.504   B
(I renamed a few variables for clarity.)
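The defaults above leave TypeID as 0 in the empty rows, whereas the frame requested in the question has empty strings there, every value as a string, and grp as the first column. A minimal variant of the same parse-then-build approach that targets that exact shape, assuming string-typed columns are really what you want (the columns list is taken from the question's example, and t1/t2 are just shorthand for the two distinct sample strings):

```python
import ast

import pandas as pd

t1 = "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]"
t2 = "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]"
groups = ["A", "B", "C", "A", "C", "C", "B"]
strings = [t1, t1, "[]", t2, t2, "[]", t2]

columns = ['TypeID', 'Description', 'DateCreated']
# Empty strings for rows whose list is empty, as in the question's desired frame
defaults = dict.fromkeys(columns, '')

rows = []
for group, s in zip(groups, strings):
    items = ast.literal_eval(s)
    # Use the first dict when present, otherwise the defaults; cast values to str
    record = items[0] if items else defaults
    rows.append({'grp': group, **{c: str(record.get(c, '')) for c in columns}})

# columns= fixes the order so grp comes first
df = pd.DataFrame(rows, columns=['grp'] + columns)
print(df)
```

Like the original answer, this keeps only the first dict of each list; rows holding several dicts would need different handling.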
Answered By - 9769953
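Because ast.literal_eval runs once per row in pure Python, it may become the bottleneck at 2M+ rows. If, and only if, every non-empty string has exactly the layout shown (one dict per list, keys in this order, no quotes inside values), a vectorized Series.str.extract with a regular expression is one possible alternative. This is a swapped-in technique, not part of the answer above, and the pattern is an assumption about your data; benchmark it on a sample and check that no rows are silently left unmatched before relying on it:

```python
import pandas as pd

t1 = "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:30.957'}]"
t2 = "[{'TypeID': 0, 'Description': 'blah', 'DateCreated': '2018-08-09T14:00:31.504'}]"
df = pd.DataFrame({'grp': ["A", "B", "C", "A", "C", "C", "B"],
                   'dictl': [t1, t1, "[]", t2, t2, "[]", t2]})

# Assumed fixed layout: one dict, keys in this order, no quotes inside values
pattern = (
    r"'TypeID': (?P<TypeID>\d+), "
    r"'Description': '(?P<Description>[^']*)', "
    r"'DateCreated': '(?P<DateCreated>[^']*)'"
)

# Named groups become column names; rows that do not match (e.g. "[]") give NaN
parts = df['dictl'].str.extract(pattern)

# Replace NaN with empty strings and put grp back in front
out = pd.concat([df[['grp']], parts.fillna('')], axis=1)
print(out)
```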