Issue
I have a problem. I have articles and these articles have a unique id
. The problem is, however, the article
description aka article
- this is not unique.
I would like to try changing the name of the article
description article
so that there is only one description left. I always want to use the article
name that occurs most often.
I have tried something, however I don't know how to access that with .apply(lambda x: ...
and write the most used item out to me and then set that as the name.
How can I change the description of the article so that only the most frequently mentioned article description is included?
id article cost
0 1 Bendge 15.30
1 1 Bendge 15.30
2 1 Bendge V2 15.30
3 1 Bendge - volumne 2 15.30
4 5 SEF 14.89
5 1 Bendge 15.30
6 2 DFH 4.56
7 2 DFH 4.56
8 2 DFH V2 4.56
9 2 DFH - volumne 2 4.56
10 2 DFH 4.56
Code
d = {'id': [1, 1, 1, 1, 5, 1, 2, 2, 2, 2, 2],
'article': ['Bendge', 'Bendge', 'Bendge V2', 'Bendge - volumne 2', 'SEF', 'Bendge',
'DFH', 'DFH', 'DFH V2', 'DFH - volumne 2', 'DFH'],
'cost': [15.30, 15.30, 15.30, 15.30, 14.89, 15.30,
4.56, 4.56, 4.56, 4.56, 4.56],
}
df = pd.DataFrame(data=d)
display(df)
df.loc[df['id'] == 1, ['id','article']].value_counts().head(1)
[OUT]
id article
1 Bendge 3
dtype: int64
df['article'] = df.apply(lambda x: x[['id','article']].value_counts().head(1))
[OUT]
KeyError: "None of [Index(['id', 'article'], dtype='object')] are in the [index]"
What I want
id article cost
0 1 Bendge 15.30
1 1 Bendge 15.30
2 1 Bendge 15.30
3 1 Bendge 15.30
4 5 SEF 14.89
5 1 Bendge 15.30
6 2 DFH 4.56
7 2 DFH 4.56
8 2 DFH 4.56
9 2 DFH 4.56
10 2 DFH 4.56
Solution
Use GroupBy.transform
with lambda function with Series.value_counts
and first index:
df['article'] = df.groupby('id')['article'].transform(lambda x: x.value_counts().index[0])
print (df)
id article cost
0 1 Bendge 15.30
1 1 Bendge 15.30
2 1 Bendge 15.30
3 1 Bendge 15.30
4 5 SEF 14.89
5 1 Bendge 15.30
6 2 DFH 4.56
7 2 DFH 4.56
8 2 DFH 4.56
9 2 DFH 4.56
10 2 DFH 4.56
Another solution with Series.mode
with first value:
df['article'] = df.groupby('id')['article'].transform(lambda x: x.mode().iat[0])
print (df)
id article cost
0 1 Bendge 15.30
1 1 Bendge 15.30
2 1 Bendge 15.30
3 1 Bendge 15.30
4 5 SEF 14.89
5 1 Bendge 15.30
6 2 DFH 4.56
7 2 DFH 4.56
8 2 DFH 4.56
9 2 DFH 4.56
10 2 DFH 4.56
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.