Issue
I have a function that receives a dataframe and returns a new dataframe, which is the same but with some added columns. Just as an example:
def arbitrary_function_that_adds_columns(df):
    # In this trivial example I am adding only 1 column, but this function may add an arbitrary number of columns.
    df['new column'] = df['A'] + df['B'] / 8 + df['A']**3
    return df
Applying this function to a whole DataFrame is easy:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
df = arbitrary_function_that_adds_columns(df)
print(df)
How do I apply the arbitrary_function_that_adds_columns function to a subset of the rows? I am trying this:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
rows = df['A'].isin({1,3})
df.loc[rows] = arbitrary_function_that_adds_columns(df.loc[rows])
print(df)
but I get the original DataFrame back. The result I expect is:
   A  B  new column
0  1  2         NaN
1  2  3      10.375
2  3  4         NaN
3  4  5      68.625
Solution
Note that, according to the expected output, you want rows = [1, 3], not rows = df['A'].isin({1,3}). The latter selects the rows whose 'A' value is either 1 or 3, i.e. the rows with index labels 0 and 2.
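A quick way to see what the original attempt does is the sketch below (assuming a reasonably recent pandas; add_new_column is a hypothetical copy-based stand-in for the question's function). The boolean mask selects index labels 0 and 2, and assigning a frame back through df.loc[mask] = ... only writes into the columns df already has, so the added column is dropped and df comes back unchanged.

```python
import pandas as pd


def add_new_column(df):
    # Hypothetical copy-based stand-in for arbitrary_function_that_adds_columns,
    # so this check does not trigger a SettingWithCopyWarning.
    out = df.copy()
    out['new column'] = out['A'] + out['B'] / 8 + out['A'] ** 3
    return out


df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})

# The mask matches the *values* 1 and 3 in column A, i.e. index labels 0 and 2.
mask = df['A'].isin({1, 3})
selected_labels = df.index[mask].tolist()
print(selected_labels)  # [0, 2]

# Assigning a frame through .loc aligns it to df's existing columns,
# so 'new column' has nowhere to go and is dropped.
df.loc[mask] = add_new_column(df.loc[mask])
print(df.columns.tolist())  # ['A', 'B']
```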
import pandas as pd

def arbitrary_function_that_adds_columns(df):
    # make sure that the function doesn't mutate the original DataFrame
    # Otherwise, you will get a SettingWithCopyWarning
    df = df.copy()
    df['new column'] = df['A'] + df['B'] / 8 + df['A']**3
    return df
df = pd.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
rows = [1, 3]
# the function is applied to a copy of a DataFrame slice
>>> sub_df = arbitrary_function_that_adds_columns(df.loc[rows])
>>> sub_df
   A  B  new column
1  2  3      10.375
3  4  5      68.625
# Add the new information to the original df
>>> df = df.combine_first(sub_df)
>>> df
   A  B  new column
0  1  2         NaN
1  2  3      10.375
2  3  4         NaN
3  4  5      68.625
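Since combine_first takes the union of both frames' rows and columns and fills the cells that are missing in df from sub_df, this pattern does not depend on how many columns the function adds. Here is a minimal sketch wrapping it in a helper; apply_to_rows and add_two_columns are hypothetical names for illustration, not pandas API.

```python
import pandas as pd


def apply_to_rows(df, rows, func):
    """Apply a column-adding function to df.loc[rows] only (hypothetical helper).

    Rows not in `rows` end up with NaN in the columns that func adds.
    """
    # func works on a copy, so df itself is never mutated.
    sub_df = func(df.loc[rows].copy())
    # Union of rows and columns: df's non-null values are kept,
    # and cells missing from df are taken from sub_df.
    return df.combine_first(sub_df)


def add_two_columns(df):
    # Hypothetical example of a function that adds more than one column.
    df['new column'] = df['A'] + df['B'] / 8 + df['A'] ** 3
    df['ratio'] = df['A'] / df['B']
    return df


df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})
result = apply_to_rows(df, [1, 3], add_two_columns)
print(result)
```

One design consequence: combine_first only fills cells that are missing in df, so if df already contains non-null values in a column with the same name, those values are kept rather than overwritten.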
Here is another way, which writes the new column directly into the original DataFrame instead of merging a modified copy back into it.
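def arbitrary_function_that_adds_columns(df, rows='all'):
    # rows may be 'all', a list of index labels, or a boolean mask;
    # the isinstance check avoids comparing a mask/array with a string
    if isinstance(rows, str) and rows == 'all':
        rows = df.index
    sub_df = df.loc[rows]
    # Setting a column that doesn't exist yet through .loc creates it,
    # and rows outside `rows` get NaN. Note that df is modified in place.
    df.loc[rows, 'new column'] = sub_df['A'] + sub_df['B'] / 8 + sub_df['A']**3
    return df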
>>> df = arbitrary_function_that_adds_columns(df, rows)
>>> df
   A  B  new column
0  1  2         NaN
1  2  3      10.375
2  3  4         NaN
3  4  5      68.625
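The second approach can also be generalized to a function that adds any number of columns: run the function on the selected rows, then write each column it created back with df.loc[rows, col] = ..., which creates the missing column and leaves NaN in the other rows. A minimal sketch, assuming the function only adds columns (it does not change A or B); add_columns_to_rows and add_two_columns are hypothetical names, and unlike the version above the helper returns a modified copy instead of changing df in place.

```python
import pandas as pd


def add_columns_to_rows(df, rows, func):
    """Return a copy of df with func's new columns filled only for `rows`.

    Hypothetical helper; rows not in `rows` get NaN in the new columns.
    """
    sub_df = func(df.loc[rows].copy())
    # Columns that func created, in the order func added them.
    new_cols = sub_df.columns.difference(df.columns, sort=False)
    out = df.copy()
    for col in new_cols:
        # Setting a missing column through .loc creates it (setting with
        # enlargement); the values are aligned on the index labels.
        out.loc[rows, col] = sub_df[col]
    return out


def add_two_columns(df):
    # Hypothetical example of a function that adds more than one column.
    df['new column'] = df['A'] + df['B'] / 8 + df['A'] ** 3
    df['ratio'] = df['A'] / df['B']
    return df


df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})
result = add_columns_to_rows(df, [1, 3], add_two_columns)
print(result)
```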
Answered By - HarryPlotter