Issue
I have a dataframe that looks like below. This dataframe contains name and address information.
l_name f_name m_name city_state zip
smith sam auburn,wa 98910
smith sam c auburn,wa 98910
smith sam durham,nc 27659
smith sam eddie auburn,wa 98910
smith sam auburn,wa 98910
:
:
aaron joe chris ocala,fl 58910
aaron joe ocala,fl 58910
aaron joe durham,nc 27659
aaron joe ocala,fl 58910
aaron joe a ocala,fl 58910
I am trying to create a code that can group/tag them into the same group if their l_name, f_name, city_state, and zip columns all match.
Here is what the output dataframe I am hoping to look like.
l_name f_name m_name city_state zip tag
smith sam auburn,wa 98910 1
smith sam c auburn,wa 98910 1
smith sam durham,nc 27659 0
smith sam eddie auburn,wa 98910 1
smith sam auburn,wa 98910 1
clair may seattle,wa 98092 2
clair may a seattle,wa 98092 2
clair may seattle,wa 98092 2
:
:
aaron joe chris ocala,fl 58910 n
aaron joe ocala,fl 58910 n
aaron joe durham,nc 27659 0
aaron joe ocala,fl 58910 n
aaron joe a ocala,fl 58910 n
If the person doesn't belong to any group, it will assign a 0 to it. 4 sam smith meets the conditions so they are all in the same group, which is 1. So tag/group 2 will be may clair, and joe aaron will be the n group, depending on how many names were stored in the dataframe.
I am wondering are there any good ways, methods, or suggestions for me to create a code that can output the results that I want above?
Thanks!
Solution
You can take those groups with > 1 count and get the group number (and +1 ), then for groups of count 1, set tag = 0. Then concatenate.
df1 = df[df.groupby(['l_name', 'f_name', 'city_state', 'zip'])['m_name'].transform('count')>1].copy()
df1['tag'] = df1.groupby(['l_name', 'f_name', 'city_state', 'zip']).ngroup() + 1
df2 = df[df.groupby(['l_name', 'f_name', 'city_state', 'zip'])['m_name'].transform('count')==1].copy()
df2['tag'] = 0
df_final = pd.concat([df1,df2])
df_final
l_name f_name m_name city_state zip tag
0 smith sam x auburn,wa 98910 2
1 smith sam c auburn,wa 98910 2
3 smith sam eddie auburn,wa 98910 2
4 smith sam x auburn,wa 98910 2
5 aaron joe chris ocala,fl 58910 1
6 aaron joe x ocala,fl 58910 1
8 aaron joe x ocala,fl 58910 1
9 aaron joe a ocala,fl 58910 1
2 smith sam x durham,nc 27659 0
7 aaron joe x durham,nc 27659 0
Answered By - Jonathan Leon
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.