Monday, August 1, 2022

[FIXED] Combine values from multiple dataframe rows into lists for every user

August 01, 2022 pandas, python, pytorch No comments

Issue

I have this csv file below (products rated by users) which into pandas dataframe:

--------------------------------
User_id | Product_id | Rating  |
--------------------------------
1       | 00         | 3       |  
1       | 02         | 5       |
2       | 01         | 1       |
2       | 00         | 2       |
2       | 02         | 2       |

I want to change the dataframe so that it has the same number of rows as the source table above, but only two columns:

Column 1: needs to be a list of L length (L = total number of existing kinds of products), and where the n-th value (n = product_id) in the list is the rating given by the user in this row to the product. All all other values in the list need to be zeros
column 2 should be a list of the same L length, where the n-ths values equal to ratings for n-ths products (n = product_id) for all product_ids rated by this user (in the entire table); all other (unrated) values that are not rated by the user need to be zeros

The desired result would be (consistent with the example above):

--------------------------------
User_id | col1       | col2    |
--------------------------------
1       | [3,0,0]    | [3,0,5] |  
1       | [0,0,5]    | [3,0,5] |
2       | [0,1,0]    | [2,1,2] |
2       | [2,0,0]    | [2,1,2] |
2       | [0,0,2]    | [2,1,2] |

I will greatly appreciate any help with this. Please do ask questions if i can make the question & explanation more clear.

Solution

I managed to solve this, however it feels like a lot of expensive code & operations for something relatively simple. If you have any ideas how to simplify this, I'd appreciate it a lot.

df = pd.read_csv('interactionsv21test.csv')

number_of_products = df['product_id'].nunique()

#assign indexes to products https://stackoverflow.com/questions/38088652/pandas-convert-categories-to-numbers
df.product_id = pd.Categorical(df.product_id)
df['product_indx'] = df.product_id.cat.codes
print('source table')
print(df.sort_values(['user_id', 'product_id', 'product_indx'], ascending=True).head(n=3))

df1 = (df.groupby([df['user_id']])
       .apply(lambda x: {int(i):int(k) for i,k in zip(x['product_indx'], x['rating'])})
       .reset_index(name='rating'))

#add blank values for non existing dictionary values https://stackoverflow.com/questions/38987/how-do-i-merge-two-dictionaries-in-a-single-expression
df1['rating_y'] = (df1['rating'].apply(lambda x: {int(k): 0 for k in range(number_of_products )} | x))

df['rating_x'] = df.apply(lambda row: {row['product_indx']:row['rating']}, axis=1)
df['rating_x'] = (df['rating_x'].apply(lambda x: {int(k): 0 for k in range(number_of_products)} | x ))

df = df[['user_id', 'rating_x']].merge(df1[['user_id','rating_y']],how='inner',left_on=['user_id'],right_on=['user_id'])

pd.set_option('display.max_columns', 7)
pd.set_option('display.width', 1000)

print('final result')
print(df.head(n=3))

Output:

source table
    user_id prod_id  rating  is_reviewed  prod_indx
40      198      63       5            1          1
0       198    2590       4            1         41
5       198    6960       4            1         51

final result
    user_id  rating_x                                           rating_y
...
40      198  {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...  {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...
41      198  {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...  {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...
42      198  {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...  {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...

Answered By - Mkaerobus

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, August 1, 2022

[FIXED] Combine values from multiple dataframe rows into lists for every user

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels