Issue
I would like to convert my pandas DataFrame to a list of dictionaries. I can do this with the pandas to_dict('records') function. However, any column values that are numpy arrays come back as numpy arrays. I would like the contents of the returned list of dictionaries to be base Python objects rather than numpy arrays.
I understand I could iterate over the resulting dictionaries, but I was wondering if there is a more clever way to do this.
Here is some sample code that shows my problem:
import pandas as pd
import numpy as np
data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
], axis=1)
output = data.to_dict('records')
# this prints <class 'numpy.ndarray'>
print(type(output[0]['codes']))
output, in this case, looks like this:
[{'key': 'a--b', 'code': '123', 'codes': array(['123', '098'], dtype='<U3')},
{'key': 'c--d', 'code': '456', 'codes': array(['000', '999'], dtype='<U3')},
{'key': 'e--f', 'code': '789', 'codes': array(['789', '432'], dtype='<U3')}]
I would like for that print statement to print a list. I understand I could simply do the following:
for row in output:
    row['codes'] = row['codes'].tolist()
# this now prints <class 'list'>, which is what I want
print(type(output[0]['codes']))
However, my dataframe is of course much more complicated than this, and I have multiple columns that are numpy arrays. I know I could expand the snippet above to check which columns are array type and cast them using tolist(), but I'm wondering if there is something snappier or more clever? Perhaps something provided by pandas that is optimized?
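For reference, the expanded loop described above might look roughly like the sketch below. The helper name records_with_lists is hypothetical (it is not a pandas function), and the sketch assumes the only non-base values in the records are numpy arrays.

```python
import numpy as np
import pandas as pd


def records_with_lists(df):
    """Hypothetical helper: call to_dict('records'), then turn ndarray values into lists."""
    records = df.to_dict('records')
    for row in records:
        for key, value in row.items():
            # Only existing keys are reassigned, so the dict size does not change
            # while iterating.
            if isinstance(value, np.ndarray):
                row[key] = value.tolist()
    return records


data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
], axis=1)

output = records_with_lists(data)
print(type(output[0]['codes']))
```

This works on the records after the fact, so it checks every value in every row; that is the per-row cost the question is hoping to avoid.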
To be clear, here is the output I need to have:
print(output)
[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
{'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
{'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
Solution
I ended up creating a list of the numpy-typed column names:
np_fields = ['codes']
and then I replaced each field in place in my dataframe:
for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)
I then called data.to_dict('records') once that was complete.
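If maintaining np_fields by hand becomes tedious, the list could probably be derived from the data instead. The sketch below is one possible way to do that, not something pandas provides: ndarray_columns is a hypothetical helper, and it assumes a column counts as numpy-typed only when every one of its values is an ndarray. It also converts a copy so the original dataframe keeps its arrays.

```python
import numpy as np
import pandas as pd


def ndarray_columns(df):
    """Hypothetical helper: names of columns in which every value is an ndarray."""
    return [
        col for col in df.columns
        if df[col].map(lambda v: isinstance(v, np.ndarray)).all()
    ]


data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
], axis=1)

np_fields = ndarray_columns(data)

# Convert on a copy so that `data` itself still holds the numpy arrays.
converted = data.copy()
for col in np_fields:
    converted[col] = converted[col].map(np.ndarray.tolist)

output = converted.to_dict('records')
print(np_fields)
print(type(output[0]['codes']))
```

Note that an empty column would also pass the .all() check, so a dataframe with no rows may need a separate guard.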
Answered By - Katya Willard