Issue
I need to write a complicated function that computes a new column for a pandas DataFrame. The function has to use data from many (more than 10) columns of the DataFrame. It won't fit into a lambda that I could easily plug into apply(). I don't want to write a function that takes more than 10 arguments and pass it to apply(), because that would hurt the readability of my code. I would rather not use a for loop to iterate over the rows, as it performs poorly.
Is there a clever solution to this problem?
Solution
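Simply write a function that takes the row as input and pass it to apply() with the axis=1 argument. pandas then calls the function once per row, passing each row as a Series, so every column is available by name inside the function.
For example: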
import pandas as pd

df = pd.DataFrame([[4, 9], ["x", "y"], [True, False]], columns=["A", "B"])
print(df)
#       A      B
# 0     4      9
# 1     x      y
# 2  True  False

def f(row):
    # row is a Series holding one row; columns are accessible as attributes
    if isinstance(row.A, bool):
        return "X"
    else:
        return row.A + row.B

df["C"] = df.apply(f, axis=1)
print(df)
#       A      B   C
# 0     4      9  13
# 1     x      y  xy
# 2  True  False   X
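The same pattern extends to a function that reads many columns, since the function still takes a single row argument. Below is a minimal sketch under assumed data: the column names (price, qty, discount, region), the function name total, and the pricing rule are all hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

# Hypothetical data: the column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "price": [10.0, 20.0, 5.0],
        "qty": [3, 0, 4],
        "discount": [0.1, 0.0, 0.5],
        "region": ["EU", "US", "EU"],
    }
)

def total(row):
    """Compute one value from several columns of a single row."""
    # Look up columns by name, so the signature stays at one argument
    # no matter how many columns the logic needs.
    if row["qty"] == 0:
        return 0.0
    amount = row["price"] * row["qty"] * (1 - row["discount"])
    # Hypothetical rule: rows in the "EU" region get a flat surcharge of 1.0.
    if row["region"] == "EU":
        amount += 1.0
    return amount

# axis=1 makes apply() call total() once per row.
df["total"] = df.apply(total, axis=1)
print(df)
```

Note that apply() with axis=1 still calls the Python function once per row, so on large DataFrames it can be much slower than expressions that operate on whole columns at once.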
Answered By - alec_djinn