Issue
I have a dataset of trust relations between users in a social network. Using a similarity evaluation algorithm based on each user's neighbors in the graph, I generated a second dataset that scores how similar users are to each other. From it I want to build a new dataset that keeps only the most similar users for each "trustor" (e.g. the 3 most similar ones). I sorted the rows in descending order so that, the first time a new "trustor" appears, his/her most similar users come first.
new_trust.sort_values(['truster','value'],ascending=False)
So basically I need to keep only the first 3 appearances of each user in my dataframe.
I tried a loop with for i in range(new_trust.len()): but couldn't get it to work.
Solution
If the user is in your truster column, you can use groupby and take the first 3 appearances of each group with head(3).
import pandas as pd
arr = {'truster':{0:1642,1:1642,2:1642,3:1642,4:1642,5:2,6:2,7:2,8:2,9:2},'trustee':{0:1570,1:524,2:1039,3:1545,4:1360,5:1388,6:658,7:1078,8:1336,9:1157},'value':{0:'0,08',1:'0,0533333',2:'0,04',3:'0,04',4:'0,022857',5:'0,001175',6:'0,001169',7:'0,001169',8:'0,001169',9:'0,000902'}}
df_ = pd.DataFrame.from_dict(arr)
df = df_.groupby(['truster']).head(3)
   truster  trustee      value
0     1642     1570       0,08
1     1642      524  0,0533333
2     1642     1039       0,04
5        2     1388   0,001175
6        2      658   0,001169
7        2     1078   0,001169
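Note that head(3) simply takes the first 3 rows of each group in their current order, so the frame has to be sorted by similarity first. The sketch below puts the sort and the groupby together. It assumes the value column holds comma-decimal strings like the sample above and converts them to floats before sorting; if your column is already numeric, skip that step. The sample rows and the names ranked and top3 are illustrative only.

```python
import pandas as pd

# Small unsorted sample in the same shape as the question's data.
new_trust = pd.DataFrame({
    "truster": [2, 1642, 2, 1642, 1642, 2, 1642, 2],
    "trustee": [1388, 1570, 658, 524, 1039, 1078, 1545, 1336],
    "value": ["0,001175", "0,08", "0,001169", "0,0533333",
              "0,04", "0,001169", "0,04", "0,000902"],
})

# Assumption: scores are stored as strings with a comma as decimal separator.
# Convert them to floats so they sort numerically rather than as text.
new_trust["value"] = (
    new_trust["value"].str.replace(",", ".", regex=False).astype(float)
)

# sort_values returns a new DataFrame, so the result must be assigned.
ranked = new_trust.sort_values(["truster", "value"], ascending=False)

# Keep the first 3 rows of each truster, i.e. the 3 highest scores.
top3 = ranked.groupby("truster").head(3)
print(top3)
```

When two rows of the same truster share a score, which of them ends up in the top 3 depends on the sort's tie order, so add a further sort key if that matters.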
Another solution is to use cumcount, which numbers the rows within each group starting at 0:
df_['tmp_seq'] = df_.groupby(['truster']).cumcount()
df = df_.loc[df_['tmp_seq'] < 3]
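As a rough check that the two approaches select the same rows, here is a minimal sketch on a reduced sample. It also shows that the cumcount result can be used directly as a mask, so the helper column is optional. The names seq, via_cumcount and via_head are illustrative.

```python
import pandas as pd

# Reduced sample, already ordered so each truster's best matches come first.
df_ = pd.DataFrame({
    "truster": [1642, 1642, 1642, 1642, 2, 2, 2, 2],
    "trustee": [1570, 524, 1039, 1545, 1388, 658, 1078, 1336],
})

# cumcount gives each row its position within its truster group: 0, 1, 2, ...
seq = df_.groupby("truster").cumcount()

# Keep rows whose within-group position is below 3, without a helper column.
via_cumcount = df_[seq < 3]

# Same selection using head(3).
via_head = df_.groupby("truster").head(3)

print(via_cumcount.equals(via_head))
```

The cumcount form is a bit more flexible: the same position series can select other ranges, for example seq.between(3, 5) for the 4th to 6th most similar users.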
Answered By - Deusdeorum