Issue
I have a column in a DataFrame that contains lists of tuples of varying lengths, created by zipping two Series together. E.g.:
df['lists']
[(0.0, 0), (1.7, 0.28095163296378495), (7.4, 1.12693228043272953), (18.1, 3.053019684863041594), (1.4, 0.053019684863041594), (1.5, 0.01985536)]
[(7.2, 0.14417851715463678), (0.0, 0), (1.5, 0.013), (6.1, 5.15851278579066022)]
I have also created bins:
bins = [0.1,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3.0,3.2,3.4,3.6,3.8,4.0,4.5,5.0,5.5,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0,20.0,21.0,22.0,23.0,24.0,25.0,30.0,35.0,40.0,50.0,60.0,70.0,80.0,90.0,100.0,110.0,125.0,150.0,175.0,200.0,250.0,500.0]
I want to group by the first element of each tuple according to the bins, excluding any tuple whose first element is zero, so that I can compute the mean (or some other statistic) of the second element within each bin. E.g.:
1.3 NaN
1.4 0.053019684863041594
1.5 0.01642768
1.6 NaN
...
7.0 0.6355553987936832
I can use the explode() method to separate out the lists, but I cannot figure out where to go from there.
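For reference, here is a minimal sketch of what the explode() step produces. The two-row frame is a cut-down, hypothetical stand-in for df['lists'], and col1/col2 are placeholder column names, not anything pandas requires.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the shape of df['lists'].
df = pd.DataFrame({
    "lists": [
        [(0.0, 0), (1.7, 0.28), (1.5, 0.02)],
        [(7.2, 0.14), (0.0, 0)],
    ]
})

# explode() yields one row per tuple; the original row label is repeated.
exploded = df["lists"].explode()

# Split each tuple into two columns (col1/col2 are placeholder names).
pairs = pd.DataFrame(exploded.to_list(), index=exploded.index,
                     columns=["col1", "col2"])
print(pairs)
```

Each tuple ends up on its own row, so the first and second elements can then be filtered and grouped like ordinary columns.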
Help is greatly appreciated.
Solution
I managed to solve this with a little help from @mozway. It needed a small tweak, but that was my fault.
For posterity:
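import pandas as pd
df2 = pd.DataFrame(df['lists'].explode().to_list(), columns=['col1', 'col2'])
# Drop tuples whose first element is zero, bin on the first element, then average the second.
out = (df2.loc[df2['col1'].ne(0)].assign(bin=lambda d: pd.cut(d['col1'], bins=bins))).groupby('bin')['col2'].mean()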
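As a self-contained check, the sketch below runs the same explode / cut / groupby idea on the two sample rows from the question, binning on the first tuple element as the question asks. Three things in it are assumptions rather than part of the original answer: right=False and labels=bins[:-1] are chosen so that each bin is labelled by its left edge, which is what the expected output in the question appears to show, and observed=False is passed so that empty bins stay in the result as NaN.

```python
import pandas as pd

# The two sample rows from the question.
df = pd.DataFrame({
    "lists": [
        [(0.0, 0), (1.7, 0.28095163296378495), (7.4, 1.12693228043272953),
         (18.1, 3.053019684863041594), (1.4, 0.053019684863041594),
         (1.5, 0.01985536)],
        [(7.2, 0.14417851715463678), (0.0, 0), (1.5, 0.013),
         (6.1, 5.15851278579066022)],
    ]
})

bins = [0.1, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4,
        2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.5, 5.0,
        5.5, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0,
        17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 30.0, 35.0,
        40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 125.0, 150.0,
        175.0, 200.0, 250.0, 500.0]

# One row per tuple, split into two columns.
pairs = pd.DataFrame(df["lists"].explode().to_list(), columns=["col1", "col2"])

out = (
    pairs.loc[pairs["col1"].ne(0)]  # exclude tuples whose first element is zero
    .assign(bin=lambda d: pd.cut(
        d["col1"],
        bins=bins,
        right=False,       # assumption: left-closed bins, e.g. [1.4, 1.5)
        labels=bins[:-1],  # assumption: label each bin by its left edge
    ))
    .groupby("bin", observed=False)["col2"]  # keep empty bins as NaN
    .mean()
)
print(out)
```

With the default right=True and no labels, the index would instead be right-closed intervals such as (1.3, 1.4], so a value of exactly 1.5 would land in the bin ending at 1.5 rather than the one starting there. Swapping .mean() for .agg([...]) gives other per-bin statistics.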
Answered By - apk19