Issue
I have the below df
:
ID Number Number 2 Number 3
1 10001 NaN NaN 5
2 10001 25 NaN 12
3 10001 78 4 NaN
4 10002 3 NaN NaN
5 10002 234 201 NaN
6 10002 NaN 510 34
7 10003 NaN 765 NaN
8 10003 NaN 422 NaN
9 10003 NaN 753 56
10 10003 231 7587 2345
I want to structure the data so the first NaN rows are deleted by column.
Resuling df
:
ID Number Number 2 Number 3
1 10001 25 4 5
2 10001 78 NaN 12
3 10001 NaN NaN NaN
4 10002 3 201 34
5 10002 234 510 NaN
6 10002 NaN NaN NaN
7 10003 231 765 56
8 10003 NaN 422 2345
9 10003 NaN 753 NaN
10 10003 NaN 7587 NaN
I'm essentially trying to shift the column data up by n rows depending on where the data starts for that column, so at the first rows of ID
there is always data in at least 1 of the Number
columns.
I've tried first_row_index
but this doesn't work by individual column
I've tried dropna
but I can't find a solution where I'm defining what number of rows to drop per column.
Solution
df1 = df.melt('ID').dropna()
df1['var1'] = df1.groupby(['variable', 'ID']).cumcount()
df1.pivot(['ID', 'var1'], 'variable', 'value').reset_index(0)
variable ID Number Number 2 Number 3
var1
0 10001 25.0 4.0 5.0
1 10001 78.0 NaN 12.0
0 10002 3.0 201.0 34.0
1 10002 234.0 510.0 NaN
0 10003 231.0 765.0 56.0
1 10003 NaN 422.0 2345.0
2 10003 NaN 753.0 NaN
3 10003 NaN 7587.0 NaN
Answered By - onyambu
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.