Issue
I have a dataframe df
like below :
import pandas as pd
data = {'A': ['ABCD_1', 'ABCD_1', 'ABCD_1', 'ABCD_1', 'PQRS_2', 'PQRS_2', 'PQRS_2', 'PQRS_2', 'PQRS_2', 'PQRS_3', 'PQRS_4'], 'B':[2, 3, 5, 6, 7, 8, 9, 11, 13, 15, 17]}
df = pd.DataFrame(data)
df
| A | B |
+------------+----------+
| ABCD_1 | 2 |
| ABCD_1 | 3 |
| ABCD_1 | 5 |
| ABCD_1 | 6 |
| PQRS_2 | 7 |
| PQRS_2 | 8 |
| PQRS_2 | 9 |
| PQRS_2 | 11 |
| PQRS_2 | 13 |
| PQRS_3 | 15 |
| PQRS_4 | 17 |
+------------+----------+
What I want to achieve is that I need to assign 1 to first value of every group of column A
and 0 to the rest of other values. Incase, there is a single group present in Column A
I just assign 1 to that group. So, my expected result should look like below.
Expected Output :
| A | B | P |
+------------+----------+----------+
| ABCD_1 | 2 | 1 |
| ABCD_1 | 3 | 0 |
| ABCD_1 | 5 | 0 |
| ABCD_1 | 6 | 0 |
| PQRS_2 | 7 | 1 |
| PQRS_2 | 8 | 0 |
| PQRS_2 | 9 | 0 |
| PQRS_2 | 11 | 0 |
| PQRS_2 | 13 | 0 |
| PQRS_3 | 15 | 1 |
| PQRS_4 | 17 | 1 |
+------------+----------+----------+
I tried to group them using the below code, however, I wasn't able to get expected results.
Actual Output
df['P'] = pd.factorize(df['A']) + 1
| A | B | P |
+------------+----------+----------+
| ABCD_1 | 2 | 1 |
| ABCD_1 | 3 | 1 |
| ABCD_1 | 5 | 1 |
| ABCD_1 | 6 | 1 |
| PQRS_2 | 7 | 1 |
| PQRS_2 | 8 | 2 |
| PQRS_2 | 9 | 2 |
| PQRS_2 | 11 | 2 |
| PQRS_2 | 13 | 2 |
| PQRS_3 | 15 | 3 |
| PQRS_4 | 17 | 4 |
+------------+----------+----------+
How can I assign 1 and 0 to each of the groups using Pandas ?
Solution
You could achieve this as follows:
- Use
df.duplicated
to get a boolean series withFalse
for each first group entry andTrue
for the others. - Next, use the unary operator (
~
) to switchTrue
andFalse
values, and chainSeries.astype
to turn the booleans into ones (True
) and zeros (False
).
df['P'] = (~df.A.duplicated()).astype(int)
print(df)
A B P
0 ABCD_1 2 1
1 ABCD_1 3 0
2 ABCD_1 5 0
3 ABCD_1 6 0
4 PQRS_2 7 1
5 PQRS_2 8 0
6 PQRS_2 9 0
7 PQRS_2 11 0
8 PQRS_2 13 0
9 PQRS_3 15 1
10 PQRS_4 17 1
Alternatively, but provided that the groups in column A
are sorted, you can also try as follows:
- Compare each row with the next one (see:
Series.shift
) withSeries.ne
. Each row that is unequal to its shift will give usTrue
, all the others will lead toFalse
. Afterwards, again chainastype
.
df['P'] = (df.A.ne(df.A.shift())).astype(int)
print(df)
A B P
0 ABCD_1 2 1
1 ABCD_1 3 0
2 ABCD_1 5 0
3 ABCD_1 6 0
4 PQRS_2 7 1
5 PQRS_2 8 0
6 PQRS_2 9 0
7 PQRS_2 11 0
8 PQRS_2 13 0
9 PQRS_3 15 1
10 PQRS_4 17 1
Answered By - ouroboros1
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.