Issue
I have a dataframe for which I am trying to generate an id (without the complexity of a hash, etc.) from information in its string columns. The code is as follows:
df['id'] = df.City.str[:3] + '-' + df.Name.str[:3] +'-' + df.index.astype(str)
City   Name       Id
Paris  John       Par-Joh-1
Paris  Paul       Par-Pau-2
Paris  Pierre     Par-Pie-3
Paris  Paula      Par-Pau-4
Rome   Riccardo   Rom-Ric-5
Rome   Jean-Paul  Rom-Jea-6
Rome   Franc      Rom-Fra-7
My problem is that the count does not restart when the value in the City column
changes (see above). How can I adapt the code to get the desired output (see below)?
City   Name       Id
Paris  John       Par-Joh-1
Paris  Paul       Par-Pau-2
Paris  Pierre     Par-Pie-3
Paris  Paula      Par-Pau-4
Rome   Riccardo   Rom-Ric-1
Rome   Jean-Paul  Rom-Jea-2
Rome   Franc      Rom-Fra-3
Thank you
Solution
Use GroupBy.cumcount, which numbers the rows within each City group starting from 0, then add 1 and convert to string:
df['id'] = (df.City.str[:3] + '-' + df.Name.str[:3] +'-' +
df.groupby('City').cumcount().add(1).astype(str))
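For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample dataframe is rebuilt from the table in the question, and the intermediate variable name counter is only for illustration; it is not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "City": ["Paris", "Paris", "Paris", "Paris", "Rome", "Rome", "Rome"],
    "Name": ["John", "Paul", "Pierre", "Paula", "Riccardo", "Jean-Paul", "Franc"],
})

# cumcount() numbers the rows 0, 1, 2, ... within each City group;
# add(1) shifts the numbering to start at 1, astype(str) allows concatenation
counter = df.groupby("City").cumcount().add(1).astype(str)

# First three characters of City and Name, followed by the per-city counter
df["id"] = df["City"].str[:3] + "-" + df["Name"].str[:3] + "-" + counter

print(df)
```

Note that the counter is kept per City value, not per consecutive run of rows: if Paris rows appeared again after the Rome rows, their numbering would continue from 5 rather than restart at 1.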
Answered By - jezrael