Issue
I have a small dataframe, like this.
import pandas as pd
import numpy as np
# data's stored in dictionary
details = {
'address_id': [1, 1, 1, 2, 2],
'business': ['verizon', 'verizon', 'comcast', 'sprint', 'att']
}
df = pd.DataFrame(details)
print(df)
I am trying to find out if, and when a person switched to a different cell phone service.
I tried this logic; didn't work.
df['new'] = df.Column1.isin(df.Column1) & df[~df.Column2.isin(df.Column2)]
Basically, given index row 0 and row 1, when the address_id was the same, the business was the same, but the business changed from verizon to comcast in index row 2. Also, given index row 3 and row 4, the address_id was the same, but the business changed from sprint to att in index row 4. I'd like to add a new column to the dataframe to flag these changes. How can I do that?
Solution
UPDATE: Here is an even simpler way than my original answer using join()
(see below) to do what your question asks:
df['new'] = df.address_id.map(df.groupby('address_id').first().business) != df.business
Explanation:
- Use
groupby()
andfirst()
to create a dataframe whosebusiness
column contains the first one encountered for eachaddress_id
- Use
Series.map()
to transform the original dataframe'saddress_id
column into this firstbusiness
value - Add column
new
which isTrue
only if this newbusiness
differs from the originalbusiness
column.
Here is a simple way to do what you've asked using groupby()
and join()
:
df = df.join(df.groupby('address_id').first(), on='address_id', rsuffix='_first')
df = df.assign(new=df.business != df.business_first).drop(columns='business_first')f
Output:
address_id business new
0 1 verizon False
1 1 verizon False
2 1 comcast True
3 2 sprint False
4 2 att True
Explanation:
- Use
groupby()
andfirst()
to create a dataframe whosebusiness
column contains the first one encountered for eachaddress_id
- Use
join()
to add a columnbusiness_first
todf
containing the corresponding first business for eachaddress_id
- Use
assign()
to add a columnnew
containing a boolean indicating whether the row contains a newbusiness
with an existingaddress_id
- Use
drop()
to eliminate thebusiness_first
column.
Answered By - constantstranger
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.