Issue
I am reposting since my last question was poorly worded. I have a table that looks like the following:
+------+------+------+------+------------+----------+
| ID 1 | ID 2 | Type | Year | Identified | Multiple |
+------+------+------+------+------------+----------+
| 100 | 10 | A | 2018 | 12 | |
| 100 | 11 | B | 2019 | | multiple |
| 100 | 12 | C | 2019 | | multiple |
| 100 | 13 | D | 2019 | | |
| 200 | 10 | A | 2018 | | |
| 200 | 11 | B | 2019 | | multiple |
| 200 | 12 | C | 2019 | | multiple |
| 200 | 13 | D | 2019 | | |
+------+------+------+------+------------+----------+
I am trying to delete the "multiple" strings inside the "Multiple" column where the ID
group does not have a Identified
value. For example, the first group of ID 1 == 100
contains a not-null Identified
value so we can leave the "multiple" values. However, the ID 1 == 200
group has no Identified
values, so I would like to remove the "multiple" values that appear in this group, giving us the following dataframe.
+------+------+------+------+------------+----------+
| ID 1 | ID 2 | Type | Year | Identified | Multiple |
+------+------+------+------+------------+----------+
| 100 | 10 | A | 2018 | 12 | |
| 100 | 11 | B | 2019 | | multiple |
| 100 | 12 | C | 2019 | | multiple |
| 100 | 13 | D | 2019 | | |
| 200 | 10 | A | 2018 | | |
| 200 | 11 | B | 2019 | | |
| 200 | 12 | C | 2019 | | |
| 200 | 13 | D | 2019 | | |
+------+------+------+------+------------+----------+
Please let me know if I can rephrase my question.
EDIT: if both Identified
and Multiple
columns are blank, then leave blank.
Solution
Assuming your dataframe can be built with the same types than this one (note for next time how you can provide a sample dataframe as one can't directly copy yours)
df= pd.DataFrame({
'ID 1': [100]*4+[200]*4,
'other_cols':range(8),
'Identified':['12']+['']*7,
'Multiple':['','multiple','multiple','']*2
})
print(df)
# ID 1 other_cols Identified Multiple
# 0 100 0 12
# 1 100 1 multiple
# 2 100 2 multiple
# 3 100 3
# 4 200 4
# 5 200 5 multiple
# 6 200 6 multiple
# 7 200 7
So to do the job, check where Identified if not equal (ne
) to blank. groupby.transform
with any
to get True for all the same ID if one is not blank. Use the reverse (~
) of this mask to select the IDs with only blank and assign blank in the other column.
# greate a mask with True where at least one non blank value per ID
mask = df['Identified'].ne('').groupby(df['ID 1']).transform(any)
# reverse the mask and assign blank
df.loc[~mask, 'Multiple'] = ''
print(df)
# ID 1 other_cols Identified Multiple
# 0 100 0 12
# 1 100 1 multiple
# 2 100 2 multiple
# 3 100 3
# 4 200 4
# 5 200 5
# 6 200 6
# 7 200 7
Answered By - Ben.T
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.