Issue
I have read a csv file and pivoted it to get to following structure:
pivoted = df.pivot('user_id', 'group', 'value')
lookup = df.drop_duplicates('user_id')[['user_id', 'group']]
lookup.set_index(['user_id'], inplace=True)
result = pivoted.join(lookup)
result = result.fillna(0)
Section of the result:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 group
user_id
2 33653 2325 916 720 867 187 31 0 6 3 42 56 92 15 l-1
4 18895 414 1116 570 1190 55 92 0 122 23 78 6 4 2 l-2
16 1383 70 27 17 17 1 0 0 0 0 1 0 0 0 l-2
50 396 72 34 5 18 0 0 0 0 0 0 0 0 0 l-3
51 3915 1170 402 832 2791 316 12 5 118 51 32 9 62 27 l-4
I want to sum across column 0 to column 13 by each row and divide each cell by the sum of that row. I am still getting used to pandas; if I understand correctly, we should try to avoid for loops when doing things like this? In other words, how can I do this in a 'pandas' way?
Solution
Try the following:
In [1]: import pandas as pd
In [2]: df = pd.read_csv("test.csv")
In [3]: df
Out[3]:
id value1 value2 value3
0 A 1 2 3
1 B 4 5 6
2 C 7 8 9
In [4]: df["sum"] = df.sum(axis=1)
In [5]: df
Out[5]:
id value1 value2 value3 sum
0 A 1 2 3 6
1 B 4 5 6 15
2 C 7 8 9 24
In [6]: df_new = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)
In [7]: df_new
Out[7]:
value1 value2 value3
0 0.166667 0.333333 0.500
1 0.266667 0.333333 0.400
2 0.291667 0.333333 0.375
Or you can do the following:
In [8]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)
In [9]: df
Out[9]:
id value1 value2 value3 sum
0 A 0.166667 0.333333 0.500 6
1 B 0.266667 0.333333 0.400 15
2 C 0.291667 0.333333 0.375 24
Or just straight up from the beginning:
In [10]: df = pd.read_csv("test.csv")
In [11]: df
Out[11]:
id value1 value2 value3
0 A 1 2 3
1 B 4 5 6
2 C 7 8 9
In [12]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df.sum(axis=1), axis=0)
In [13]: df
Out[13]:
id value1 value2 value3
0 A 0.166667 0.333333 0.500
1 B 0.266667 0.333333 0.400
2 C 0.291667 0.333333 0.375
Changing the column value1
and the like to your headers should work similarly.
Answered By - NullDev
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.