Thursday, March 17, 2022

[FIXED] Apply function to pandas dataframe row using values in other rows

Tags: dataframe, lambda, pandas, python

Issue

I have a dataframe row to perform calculations with, and I need values from the following (and potentially preceding) rows to do those calculations; this is essentially a perfect forecast built from the real data set. Each row comes from an earlier df.apply call, so I could pass the whole df along to the downstream objects, but that seems less than ideal given the complexity of the objects in my analysis.

I found one closely related question and answer [1], but my problem is fundamentally different: I do not need the whole df for my calcs, only the following x rows (which might matter for large dfs).

So, for example:

import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  columns=['PRICE'])
horizon = 3

I need to access the values in the following 3 (horizon) rows from within my row-wise df.apply call. How can I get a naive forecast of the next 3 data points dynamically in my row-wise apply calcs? E.g. for the first row, where PRICE is 100, I need to use [200, 300, 400] as the forecast in my calcs.

[1] apply a function to a pandas Dataframe whose returned value is based on other rows

Solution

Inside the df.apply() call, row.name gives the current row's index label, so you can slice out the 'forecast' data relative to the row you are on. This relies on the default integer index, where each label equals the row's position. It is effectively a preprocessing step that puts the 'forecast' onto the relevant row; it could also be done as part of the initial df.apply() call if the df is available downstream.

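import pandas as pd

df = pd.DataFrame(
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    columns=["PRICE"]
)
horizon = 3

# row.name is the row's index label; with the default RangeIndex it equals
# the row position, so the slice covers the next `horizon` rows.
# Wrapping the slice in a list keeps apply() from expanding the Series into columns.
df["FORECAST"] = df.apply(
    lambda row: [df["PRICE"].iloc[row.name + 1 : row.name + horizon + 1]],
    axis=1
)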

This results in:

   PRICE          FORECAST
0    100   [200, 300, 400]
1    200   [300, 400, 500]
2    300   [400, 500, 600]
3    400   [500, 600, 700]
4    500   [600, 700, 800]
5    600   [700, 800, 900]
6    700  [800, 900, 1000]
7    800       [900, 1000]
8    900            [1000]
9   1000                []

The FORECAST column can then be used in your row-wise df.apply() calcs.
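As a hedged illustration of that downstream use, the sketch below stores each forecast as a plain Python list (via .tolist(), a variation on the code above rather than the answer's exact form) and then consumes it in a second row-wise apply. The FORECAST_MEAN column and the forecast_mean helper are hypothetical names chosen only for this example, and the default RangeIndex is assumed.

```python
import pandas as pd

df = pd.DataFrame(
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    columns=["PRICE"],
)
horizon = 3

# Variation on the answer: .tolist() stores a plain Python list in each cell.
# Assumes the default RangeIndex, so row.name equals the row position.
df["FORECAST"] = df.apply(
    lambda row: df["PRICE"].iloc[row.name + 1 : row.name + horizon + 1].tolist(),
    axis=1,
)


def forecast_mean(row):
    """Hypothetical downstream calc: mean of the forecast, NaN if it is empty."""
    forecast = row["FORECAST"]
    return sum(forecast) / len(forecast) if forecast else float("nan")


# Second row-wise apply that reads the precomputed forecast from each row.
df["FORECAST_MEAN"] = df.apply(forecast_mean, axis=1)
```

Rows near the end of the frame get shorter forecasts, and the last row gets an empty one, so any downstream calc needs to decide how to treat those cases; here the empty case simply yields NaN.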

EDIT: To strip the original index from each resulting 'FORECAST' slice, reset it:

df["FORECAST"] = df.apply(
    lambda row: [df["PRICE"].iloc[row.name + 1 : row.name + horizon + 1].reset_index(drop=True)],
    axis=1
)
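The row.name arithmetic works because the default index labels coincide with row positions. If the DataFrame has some other index (dates, strings, a filtered subset), one option is to slice by position instead of by label. The following is a minimal sketch under that assumption, not part of the original answer; the string index and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"PRICE": [100, 200, 300, 400, 500]},
    index=list("abcde"),  # hypothetical non-integer index
)
horizon = 3

prices = df["PRICE"].tolist()

# Build each forecast from the row's position rather than its label.
forecasts = [prices[pos + 1 : pos + 1 + horizon] for pos in range(len(prices))]

# Aligning on df.index keeps exactly one list per row.
df["FORECAST"] = pd.Series(forecasts, index=df.index)
```

This avoids df.apply entirely, which may also help on large frames, though that has not been measured here.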

Answered By - lukewitmer

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
