Issue
I have small dataset defined as follows;
df = pd.DataFrame({
'id':['id1', 'id2', 'id3'],
'cmp1':[1,0,0],
'cmp2':[0,1,0],
'cmp3':[1,0,0]})
The goal is to add a new column called col
which'll tell the first occurrence of 1 for every row among the columns, cmp1
,cmp2
,cmp3
. Hence the final output shd look like;
df2 = pd.DataFrame({
'id':['id1', 'id2', 'id3'],
'cmp1':[1,0,0],
'cmp2':[0,1,0],
'cmp3':[1,0,0],
'col':['first', 'second','none']})
As you can see for id=id1
, 1 occurred for the very first time in cmp1, hence we have first in col
.It is irrespective of the fact that we have cmp2=0
, and again cmp3=1
.
Can I get some help on how to define this new column, col
?
Solution
You can use .max(axis=1)
to find those rows with at least one 1
, then np.where
with idxmax
(for the column name):
# the actual data columns
data = df.iloc[:, 1:]
df['col'] = np.where(data.max(axis=1), data.idxmax(axis=1), 'none')
Output:
id cmp1 cmp2 cmp3 col
0 id1 1 0 1 cmp1
1 id2 0 1 0 cmp2
2 id3 0 0 0 none
If you insist on first
, second
,... you would need to do manually
labels = np.array(['first', 'second', 'third'])
df['col'] = np.where(data.max(axis=1), labels[data.to_numpy().argmax(axis=1)], 'none')
and output:
id cmp1 cmp2 cmp3 col
0 id1 1 0 1 first
1 id2 0 1 0 second
2 id3 0 0 0 none
Answered By - Quang Hoang
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.