Issue
I have a pd.DataFrame which may look something like this:
import pandas as pd

data = {"col_x": ["1234", "5678", "9876", "1111"],
        "col_y": ["1234", "2222", "3333", "1111"],
        "col_grp": [pd.NA, ["5678", "9999"], ["9876", "5555", "1222"], pd.NA]}
df = pd.DataFrame(data)
I want to make another column, valid, to check if col_x equals col_y or col_x is in col_grp.
I tried this:
def check_validity(row):
    if row["col_x"] == row["col_y"]:
        return True
    if pd.notnull(row["col_grp"]):
        if isinstance(row["col_grp"], list):
            return row["col_x"] in row["col_grp"]
        else:
            return row["col_x"] == row["col_grp"]
    return False

df["valid"] = df.apply(lambda row: check_validity(row), axis=1)
But I get
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I know that a list should probably not be in a pd.DataFrame like this, so I apologize in advance.
Can anybody help me?
Solution
Don't use apply; a list comprehension will be more efficient:
df['valid'] = [x == y or isinstance(g, list) and x in g
               for (x, y, g) in zip(df['col_x'], df['col_y'], df['col_grp'])]
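As for why the original attempt raises the ValueError: the likely cause is that pd.notnull works element-wise when it is handed a list, so it returns a boolean array instead of a single True/False, and an `if` cannot evaluate an array with more than one element. Testing isinstance(g, list) first avoids that. A minimal sketch of the behavior (the names grp, mask and raised are illustrative only, not from the question):

```python
import pandas as pd

# A list cell like the ones stored in col_grp
grp = ["5678", "9999"]

# pd.notnull is applied element-wise to list-likes and returns a boolean array
mask = pd.notnull(grp)
print(type(mask).__name__, mask.tolist())

# Using that multi-element array in a boolean context raises the ValueError
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# isinstance always returns a plain bool, for a list cell and for pd.NA alike
print(isinstance(grp, list), isinstance(pd.NA, list))
```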
If you must use apply:
def check_validity(row):
    x, y, g = row[['col_x', 'col_y', 'col_grp']]
    return x == y or isinstance(g, list) and x in g

df['valid'] = df.apply(check_validity, axis=1)
Output (with some extra rows):
  col_x col_y             col_grp  valid
0  1234  1234                <NA>   True
1  5678  2222        [5678, 9999]   True
2  9876  3333  [9876, 5555, 1222]   True
3  1111  1111                <NA>   True
4  1234  2222                <NA>  False
5  1234  2222              [2222]  False
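To reproduce the output above end to end, here is a self-contained sketch. It assumes the two extra rows (index 4 and 5) hold exactly the values shown in the table; the question's original data only has the first four rows.

```python
import pandas as pd

# Sample data from the question, plus the two extra rows read off the output table
data = {
    "col_x": ["1234", "5678", "9876", "1111", "1234", "1234"],
    "col_y": ["1234", "2222", "3333", "1111", "2222", "2222"],
    "col_grp": [pd.NA, ["5678", "9999"], ["9876", "5555", "1222"],
                pd.NA, pd.NA, ["2222"]],
}
df = pd.DataFrame(data)

# A row is valid if col_x equals col_y, or col_grp is a list containing col_x
df["valid"] = [
    x == y or (isinstance(g, list) and x in g)
    for x, y, g in zip(df["col_x"], df["col_y"], df["col_grp"])
]
print(df)
```

The parentheses around the `and` clause do not change the result, since `and` already binds tighter than `or`; they are only there to make the precedence explicit.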
Answered By - mozway