Issue
My input is a DataFrame:
import pandas as pd

df = pd.DataFrame({'mycol': ['A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'A', 'A']})
print(df)
  mycol
0     A
1     B
2     A
3     B
4     B
5     C
6     A
7     C
8     A
9     A
A appears 5 times, B appears 3 times and C appears 2 times.
Let's say we need to slice the groups so that each one has exactly 3 items. Then indexes 8 and 9 should be removed, because group A exceeds the limit by 2 values, and index 10 should be created to complete group C, because it lacks 1 value.
PS: the order of the original rows must be preserved, and the index is important to me because I need to track the positions (start and end) of each slice.
I wrote the code below, but some indexes are messed up and there are also some undesired NaN rows.
def func(group):
    df2 = pd.DataFrame(None, index=[max(group.index)+1], columns=['mycol'])
    result = pd.concat([group.iloc[:3], df2])
    result['newcol'] = [group.name + str(i+1) for i in range(len(result))]
    return result

final = df.groupby('mycol').apply(func).droplevel(0)
print(final)
   mycol newcol
0      A     A1
2      A     A2
6      A     A3
10   NaN     A4
1      B     B1
3      B     B2
4      B     B3
5    NaN     B4
5      C     C1
7      C     C2
8    NaN     C3
Do you know how to fix my code? Or do you have any other suggestions?
My expected output is this:
   mycol newcol
0      A     A1
1      B     B1
2      A     A2
3      B     B2
4      B     B3
5      C     C1
6      A     A3
7      C     C2
10   NaN     C3
Solution
You can use groupby.cumcount, pivot and stack:
N = 3
c = df.groupby('mycol').cumcount().add(1)
out = (df.assign(newcol=df['mycol']+c.astype(str), c=c)
         .pivot(index='mycol', columns='c', values='newcol')
         .iloc[:, :N].stack(dropna=False)
         .reset_index(0, name='newcol')
      )
Output:
  mycol newcol
c
1     A     A1
2     A     A2
3     A     A3
1     B     B1
2     B     B2
3     B     B3
1     C     C1
2     C     C2
3     C    NaN
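To make the first step easier to follow, here is a minimal sketch of what cumcount contributes on its own, using the same df as in the question. The variable name labels is only for illustration and is not part of the answer's code.

```python
import pandas as pd

df = pd.DataFrame({'mycol': ['A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'A', 'A']})

# cumcount numbers the rows of each group 0, 1, 2, ... in order of appearance;
# add(1) shifts that to 1-based so it can serve as a label suffix.
c = df.groupby('mycol').cumcount().add(1)

# "labels" is an illustrative name: group value + position inside the group.
labels = df['mycol'] + c.astype(str)
print(labels.tolist())
# ['A1', 'B1', 'A2', 'B2', 'B3', 'C1', 'A3', 'C2', 'A4', 'A5']
```

The counter keeps the original row order and index, which is why rows 8 and 9 end up as A4 and A5 and can later be cut off at N.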
Or with a custom groupby.apply:
from itertools import count

N = 3
c = count(len(df))
out = (df
   .groupby('mycol', group_keys=False)
   .apply(lambda g: pd.DataFrame(
              {'mycol': [g.name]*min(N, len(g))+[float('nan')]*(N-len(g)),
               'newcol': [f'{g.name}{x+1}' for x in range(N)],
              }, index=g.index[:N].tolist()+[next(c) for _ in range(N-len(g))])
         )
)
Output:
   mycol newcol
0      A     A1
2      A     A2
6      A     A3
1      B     B1
3      B     B2
4      B     B3
5      C     C1
7      C     C2
10   NaN     C3
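The output above is ordered by group, while the question asks for the original row order. One possible variant, offered only as a sketch and not as part of the answer above, filters with cumcount and appends the padding rows at the end. It assumes the index is an integer index, so that new labels can start at max index + 1; the names pos, kept, sizes, pad_labels and pad are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'mycol': ['A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'A', 'A']})
N = 3

# 1-based position of each row inside its group, in original row order.
pos = df.groupby('mycol').cumcount().add(1)

# Keep at most N rows per group; index and row order are left untouched.
kept = df[pos <= N].assign(newcol=df['mycol'] + pos.astype(str))

# Labels for the rows that are missing in groups with fewer than N members.
sizes = df['mycol'].value_counts(sort=False)
pad_labels = [f'{name}{i}'
              for name, size in sizes.items()
              for i in range(size + 1, N + 1)]

# Assumption: integer index, so padding rows continue after the largest label.
start = int(df.index.max()) + 1
pad = pd.DataFrame(
    {'mycol': [float('nan')] * len(pad_labels), 'newcol': pad_labels},
    index=range(start, start + len(pad_labels)),
)

out = pd.concat([kept, pad])
print(out)
```

With the sample data this keeps rows 0 to 7 in place, drops 8 and 9, and adds a single padding row at index 10 labelled C3. Calling .sort_index() on the groupby.apply result above should give the same ordering.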
Answered By - mozway