Issue
I'm having trouble turning a column of lists of lists into separate columns. I have a bad solution that works by working on each row independently and then appending them to each other, but this takes far too long for ~500k rows. Wondering if someone has a better solution.
Here is the input:
>>> import pandas as pd
>>> import numpy as np
>>> pd.DataFrame({'feat': [[["str1","", 3], ["str3","", 5], ["str4","", 3]],[["str1","", 4], ["str2","", 5]] ]})
 | feat |
---|---|
0 | [[str1, , 3], [str3, , 5], [str4, , 3]] |
1 | [[str1, , 4], [str2, , 5]] |
Desired output:
>>> pd.DataFrame({'str1': [3, 4], 'str2': [np.nan,5] , 'str3': [5,np.nan], 'str4': [3,np.nan]})
 | str1 | str2 | str3 | str4 |
---|---|---|---|---|
0 | 3 | NaN | 5 | 3 |
1 | 4 | 5 | NaN | NaN |
Update: Solved by @ifly6! Fastest solution by far. For 100k rows and 80 total variables, the total time taken was 8.9 seconds on my machine.
Solution
With your data loaded as df, create df1 as follows:
df1 = pd.DataFrame.from_records(df.explode('feat').values.flatten()).replace('', np.nan)
df1.index = df.explode('feat').index
Set the index on df1 from the original data to preserve row markers (passing index=df.explode('feat').index does not work). (Alternatively, to get to the point where you have separated the lists into columns, you could use df.explode('feat')['feat'].apply(pd.Series). I prefer to avoid apply, however, so I use the DataFrame constructor instead.)
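To make the parenthetical above concrete, here is a minimal sketch comparing the two ways of splitting the exploded lists into columns. It rebuilds the sample frame from the question; the names exploded, via_apply and via_ctor are illustrative only. Note that it is a variation on the code above: it passes a plain list of rows to pd.DataFrame rather than calling from_records, and that constructor does accept index= directly.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'feat': [
    [["str1", "", 3], ["str3", "", 5], ["str4", "", 3]],
    [["str1", "", 4], ["str2", "", 5]],
]})

# One inner list per row; the original row label is repeated for each one.
exploded = df.explode('feat')

# Option A: apply(pd.Series) builds one Series per row
# (simple, but typically slower on large frames).
via_apply = exploded['feat'].apply(pd.Series)

# Option B: hand all inner lists to the constructor in one call,
# reusing the exploded index so the row markers are preserved.
via_ctor = pd.DataFrame(exploded['feat'].tolist(), index=exploded.index)
```

Both frames should hold the same values under columns 0, 1 and 2, with the repeated original row labels as the index.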
Reset the index on df1, then set a multi-index (you cannot set column 0 as the index directly because that overwrites the original index):
df1.reset_index().set_index(['index', 0])
# df1.set_index(0, append=True) # alternatively should work
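As a rough check that the two spellings above line up, the sketch below builds a small stand-in for df1 by hand (an assumption: it mirrors what df1 looks like after the '' values have been replaced with NaN) and compares the resulting indexes. The variable names a and b are illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df1: columns 0, 1, 2 and repeated row labels.
df1 = pd.DataFrame(
    [["str1", np.nan, 3], ["str3", np.nan, 5], ["str4", np.nan, 3],
     ["str1", np.nan, 4], ["str2", np.nan, 5]],
    index=[0, 0, 0, 1, 1],
)

# Spelling 1: move the row labels into a column named 'index', then
# build a two-level index from that column and column 0.
a = df1.reset_index().set_index(['index', 0])

# Spelling 2: append column 0 to the existing index in one step.
b = df1.set_index(0, append=True)

# The (row label, name) pairs are the same; only the first level's name
# differs ('index' in a, unnamed in b).
same_pairs = a.index.tolist() == b.index.tolist()
```

Either form leaves columns 1 and 2 as the data, ready to be unstacked.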
Then unstack. You can drop columns that are all NaN by appending .dropna(how='all', axis=1), yielding:
>>> df1.reset_index().set_index(['index', 0]).unstack().dropna(how='all', axis=1)
2
0 str1 str2 str3 str4
index
0 3.0 NaN 5.0 3.0
1 4.0 5.0 NaN NaN
This solution also largely avoids hard-coding which specific columns to look at or move about.
Answered By - ifly6