Issue
I have the CSV file below (products rated by users), which I load into a pandas DataFrame:
--------------------------------
User_id | Product_id | Rating |
--------------------------------
1 | 00 | 3 |
1 | 02 | 5 |
2 | 01 | 1 |
2 | 00 | 2 |
2 | 02 | 2 |
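To reproduce this, the sample can be loaded from a string. This is a minimal sketch that assumes the file is pipe-delimited with exactly these three columns (the real file may differ). Reading Product_id as a string keeps the leading zeros in "00" and "02", which pandas would otherwise parse as the integers 0 and 2.

```python
import io

import pandas as pd

# The sample table from the question as a pipe-delimited string.
# Assumption: the real file has the same three columns.
raw = """User_id|Product_id|Rating
1|00|3
1|02|5
2|01|1
2|00|2
2|02|2
"""

# dtype=str for Product_id keeps the leading zeros ("00", "02").
df = pd.read_csv(io.StringIO(raw), sep="|", dtype={"Product_id": str})
print(df)
```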
I want to transform the DataFrame so that it keeps the same number of rows as the source table above, but has only two columns (plus User_id):
- Column 1: a list of length L (L = total number of distinct products), where the n-th value (n = product_id) is the rating the user in this row gave to that product. All other values in the list must be zeros.
- Column 2: a list of the same length L, where the n-th value is the user's rating for product n, for every product_id this user rated anywhere in the table; all values for products the user has not rated must be zeros.
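Concretely, each col1 value is a zero vector of length L with the row's rating written at the product's position, and col2 is the element-wise sum of that user's col1 vectors. A small sketch for user 1 from the example, assuming products 00, 01, 02 map to positions 0, 1, 2 and that a user rates each product at most once; `col1_vector` is a hypothetical helper, not part of any library:

```python
import numpy as np

N_PRODUCTS = 3  # L in the question: number of distinct products


def col1_vector(product_index, rating, n_products=N_PRODUCTS):
    """Hypothetical helper: zeros everywhere except the rated product's position."""
    vec = np.zeros(n_products, dtype=int)
    vec[product_index] = rating
    return vec


# User 1 rated product 00 with 3 and product 02 with 5.
user1_rows = [col1_vector(0, 3), col1_vector(2, 5)]

# col2 is the element-wise sum of the user's col1 vectors
# (only valid if the user rated each product at most once).
user1_col2 = np.sum(user1_rows, axis=0)

print([v.tolist() for v in user1_rows])  # [[3, 0, 0], [0, 0, 5]]
print(user1_col2.tolist())               # [3, 0, 5]
```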
The desired result would be (consistent with the example above):
--------------------------------
User_id | col1 | col2 |
--------------------------------
1 | [3,0,0] | [3,0,5] |
1 | [0,0,5] | [3,0,5] |
2 | [0,1,0] | [2,1,2] |
2 | [2,0,0] | [2,1,2] |
2 | [0,0,2] | [2,1,2] |
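The whole table can also be built without per-row dicts: write all ratings into a 2-D NumPy array in one step, then sum the rows per user. A sketch on the sample data, relying on pd.Categorical sorting the distinct product ids so that its codes give positions 0..L-1, and assuming no user rates the same product twice (duplicates would be added together in col2):

```python
import numpy as np
import pandas as pd

# Sample data from the question; Product_id kept as strings.
df = pd.DataFrame({
    "User_id": [1, 1, 2, 2, 2],
    "Product_id": ["00", "02", "01", "00", "02"],
    "Rating": [3, 5, 1, 2, 2],
})

# Categorical sorts the distinct ids, so codes are positions 0..L-1.
cat = pd.Categorical(df["Product_id"])
codes = cat.codes
n_products = len(cat.categories)

# col1: one row per rating, rating placed at its product's position.
col1 = np.zeros((len(df), n_products), dtype=int)
col1[np.arange(len(df)), codes] = df["Rating"].to_numpy()

# col2: sum the col1 rows per user, broadcast back to every row of that user.
col2 = (
    pd.DataFrame(col1, index=df.index)
    .groupby(df["User_id"])
    .transform("sum")
    .to_numpy()
)

result = pd.DataFrame({
    "User_id": df["User_id"],
    "col1": col1.tolist(),
    "col2": col2.tolist(),
})
print(result)
```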
I would greatly appreciate any help with this. Please ask if I can make the question or explanation clearer.
Solution
I managed to solve this, but it feels like a lot of expensive code and operations for something relatively simple. If you have any ideas on how to simplify it, I'd really appreciate it.
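import pandas as pd
# expects lowercase columns: user_id, product_id, rating
df = pd.read_csv('interactionsv21test.csv')
# L: number of distinct products (length of every output list)
number_of_products = df['product_id'].nunique()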
#assign indexes to products https://stackoverflow.com/questions/38088652/pandas-convert-categories-to-numbers
df.product_id = pd.Categorical(df.product_id)
df['product_indx'] = df.product_id.cat.codes
print('source table')
print(df.sort_values(['user_id', 'product_id', 'product_indx'], ascending=True).head(n=3))
# per user: {product index: rating} for every product this user rated
df1 = (df.groupby([df['user_id']])
.apply(lambda x: {int(i):int(k) for i,k in zip(x['product_indx'], x['rating'])})
.reset_index(name='rating'))
#add blank values for non existing dictionary values https://stackoverflow.com/questions/38987/how-do-i-merge-two-dictionaries-in-a-single-expression
# dict union with | requires Python 3.9+
df1['rating_y'] = df1['rating'].apply(lambda x: {k: 0 for k in range(number_of_products)} | x)
# per row: zero-filled dict with only this row's product set to its rating
df['rating_x'] = df.apply(lambda row: {row['product_indx']: row['rating']}, axis=1)
df['rating_x'] = (df['rating_x'].apply(lambda x: {int(k): 0 for k in range(number_of_products)} | x ))
df = df[['user_id', 'rating_x']].merge(df1[['user_id', 'rating_y']], how='inner', on='user_id')
pd.set_option('display.max_columns', 7)
pd.set_option('display.width', 1000)
print('final result')
print(df.head(n=3))
Output:
source table
user_id prod_id rating is_reviewed prod_indx
40 198 63 5 1 1
0 198 2590 4 1 41
5 198 6960 4 1 51
final result
user_id rating_x rating_y
...
40 198 {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ... {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...
41 198 {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ... {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...
42 198 {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ... {0: 0, 1: 5, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: ...
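Note that the output above holds dicts, while the question asked for lists. A pandas-only alternative, sketched under the same assumptions as the code above (lowercase column names, at most one rating per user and product), builds both columns as lists with pivot_table and pivot and avoids the row-wise apply:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "product_id": ["00", "02", "01", "00", "02"],
    "rating": [3, 5, 1, 2, 2],
})

# col2 source: one row per user, one column per product, 0 where unrated.
per_user = (df.pivot_table(index="user_id", columns="product_id",
                           values="rating", aggfunc="sum", fill_value=0)
              .astype(int))
products = per_user.columns  # sorted product ids define the list positions

# col1 source: pivot on the existing row index so each row keeps only its own rating.
per_row = (df.pivot(columns="product_id", values="rating")
             .reindex(columns=products)
             .fillna(0)
             .astype(int))

result = pd.DataFrame({
    "user_id": df["user_id"],
    "col1": per_row.to_numpy().tolist(),
    # look up each row's user to repeat that user's vector on every row
    "col2": per_user.loc[df["user_id"].to_numpy()].to_numpy().tolist(),
})
print(result)
```

If you prefer to keep the dict version, `df['rating_x'].map(lambda d: list(d.values()))` should turn each dict into the requested list, because the zero-filled dict built from range(number_of_products) fixes the key order to 0..L-1 and the `|` update keeps that order.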
Answered By - Mkaerobus