Issue
I have a data frame that looks like this:
I want to create a matrix that counts, for each 'ID', how many times each fruit value appears in 'col2' and 'col3':
Solution
One way (vectorized):
import numpy as np
import pandas as pd
df = df.set_index('ID')  # keep 'ID' out of the values being counted
new_df = pd.DataFrame(np.sum(df.to_numpy()[:, None] == np.unique(df.to_numpy())[:, None], axis=2), index=df.index, columns=np.unique(df.to_numpy()))
Output:
>>> new_df
     Apple  Orange  Pear
ID
001      0       2     0
002      1       0     1
003      1       0     1
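To see why the one-liner works, the broadcast can be unpacked step by step. This is only a sketch: the question's original data is not reproduced here, so the frame below is hypothetical, made up to be consistent with the output above, and the intermediate names (`values`, `fruits`, `matches`, `counts`) are introduced purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed; chosen to match the output shown above).
df = pd.DataFrame({
    'ID': ['001', '002', '003'],
    'col2': ['Orange', 'Apple', 'Pear'],
    'col3': ['Orange', 'Pear', 'Apple'],
}).set_index('ID')

values = df.to_numpy()       # shape (3, 2): rows x columns
fruits = np.unique(values)   # sorted distinct values, shape (3,)

# values[:, None] has shape (3, 1, 2) and fruits[:, None] has shape (3, 1).
# Broadcasting compares every cell of every row with every fruit,
# giving a boolean array of shape (rows, fruits, columns).
matches = values[:, None] == fruits[:, None]

# Summing over the last axis counts, per row and per fruit, the matching columns.
counts = matches.sum(axis=2)

new_df = pd.DataFrame(counts, index=df.index, columns=fruits)
print(new_df)
```

Note that the intermediate boolean array has rows x distinct values x columns elements, so memory use grows with the number of distinct values.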
If you want to operate on only a subset of the columns:
subset = ['col2', 'col3']
new_df = pd.DataFrame(np.sum(df[subset].to_numpy()[:, None] == np.unique(df[subset].to_numpy())[:, None], axis=2), index=df.index, columns=np.unique(df[subset].to_numpy()))
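The subset form matters when the frame holds columns that should not be counted. A minimal sketch, assuming a hypothetical extra column `col1` with non-fruit values; `count_values` is an illustrative helper name, not a pandas function. It also calls `np.unique` once and reuses the result.

```python
import numpy as np
import pandas as pd

def count_values(df, subset=None):
    """Count, per row, how often each distinct value occurs in the chosen columns.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    values = (df if subset is None else df[subset]).to_numpy()
    labels = np.unique(values)  # computed once, reused for the column labels
    counts = (values[:, None] == labels[:, None]).sum(axis=2)
    return pd.DataFrame(counts, index=df.index, columns=labels)

# Hypothetical data: 'col1' holds something other than fruit.
df = pd.DataFrame({
    'ID': ['001', '002'],
    'col1': ['x', 'y'],
    'col2': ['Orange', 'Apple'],
    'col3': ['Orange', 'Pear'],
}).set_index('ID')

result = count_values(df, subset=['col2', 'col3'])
print(result)
```

Without `subset`, the values 'x' and 'y' from `col1` would appear as extra columns in the result.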
Answered By - richardec