Tuesday, April 19, 2022

[FIXED] Adding a new column with the first non-NaN value for each row closest to a chosen column in a dataset (Python)


Issue

Hello, I want to add a new column to a given dataset (called "df" here) that holds, for each row, the first non-NaN value closest to a given column.

For example, I have a data frame with the years 2009, 2010, 2011, 2012, 2013 and 2014. I want to find the first non-NaN value for each row, but starting from the year 2011!

So here's the dataset with NaN and values:

import pandas as pd
import numpy as np
data = np.random.randn(6,6)
mask = np.random.choice([1, 0], data.shape, p=[.1, .9]).astype(bool)
data[mask] = np.nan
df = pd.DataFrame(data=data,columns=['2009','2010','2011','2012','2013','2014'])
df

This produces a 6x6 DataFrame of random values in which some entries are replaced by NaN.

I started to write the following function which gives the first non-NaN values from 2011 to 2009 for each row:

num_row = 0
for row in df.iterrows():
    num_row = num_row + 1
    # Start at 2011 itself, then step back one year at a time while the value is NaN
    year = 2011
    distance_2011 = 0
    indicator = row[1][str(year)]
    while np.isnan(indicator) and year > 2009:
        year = year - 1
        distance_2011 = distance_2011 - 1
        indicator = row[1][str(year)]
    print("row: " + str(num_row) + ", year: " + str(year) + ", value: " + str(indicator))

This prints, for each row, the first non-NaN value found when going from 2011 back to 2009, along with its column year.

But this neither adds a new column to my dataset nor handles the years from 2011 to 2014.

Does anyone here know how to solve this? I want the non-NaN value closest to the year 2011 for each row, added as a new column :) Many thanks!

Solution

Update

Reorder your columns by their distance from 2011 (2011, 2010, 2012, 2009, 2013, 2014), then back-fill along each row and read the '2011' column:

idx = np.argsort(abs(pd.RangeIndex(df.shape[1]) - df.columns.get_loc('2011')))
df['value'] = df.iloc[:, idx].bfill(axis=1)['2011']
print(df)

# Output
   2009  2010  2011  2012  2013  2014  value
0   1.0   2.0   3.0   4.0   5.0   6.0    3.0
1   1.0   NaN   NaN   4.0   5.0   6.0    4.0
2   1.0   2.0   NaN   NaN   5.0   6.0    2.0
3   1.0   NaN   NaN   NaN   5.0   6.0    1.0
4   NaN   NaN   NaN   NaN   5.0   6.0    5.0
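
For a self-contained version of the update above, the following sketch rebuilds the sample frame shown in the output and applies the same reordering. The `kind="stable"` argument and the names `target`, `distance`, `order` and `closest` are additions that are not in the original answer; the stable sort is assumed here only to make the tie between equally distant years (for example 2010 and 2012) deterministic, preferring the earlier column.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data copied from the output table above
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [1.0, nan, nan, 4.0, 5.0, 6.0],
        [1.0, 2.0, nan, nan, 5.0, 6.0],
        [1.0, nan, nan, nan, 5.0, 6.0],
        [nan, nan, nan, nan, 5.0, 6.0],
    ],
    columns=["2009", "2010", "2011", "2012", "2013", "2014"],
)

# Positional distance of every column from the target column
target = df.columns.get_loc("2011")
distance = np.abs(np.arange(df.shape[1]) - target)

# A stable sort keeps the left-most (earlier) column first on ties
order = np.argsort(distance, kind="stable")

# After reordering, the first non-NaN value in each row is the closest one;
# bfill pulls it into the first column of the reordered frame, which is "2011"
closest = df.iloc[:, order].bfill(axis=1)["2011"]
df["value"] = closest
print(df)
```

Because the distance is computed on column positions, this sketch assumes the year columns are already in chronological order with no missing years.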

Old answer

If I understand correctly (this version only looks forward from 2011):

df['value'] = df.loc[:, '2011':].bfill(axis=1)['2011']
print(df)

# Output
   2009  2010  2011  2012  2013  2014  value
0   1.0   2.0   3.0   4.0   5.0   6.0    3.0
1   1.0   2.0   NaN   4.0   5.0   6.0    4.0
2   1.0   2.0   NaN   NaN   5.0   6.0    5.0
3   1.0   2.0   NaN   NaN   NaN   6.0    6.0
4   1.0   2.0   NaN   NaN   NaN   NaN    NaN

Or, more simply: df.bfill(axis=1)['2011']
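
The old answer only searches from 2011 towards 2014. As a rough illustration of the difference, the sketch below contrasts it with a backward-looking counterpart built on `ffill`, which mirrors the 2011-to-2009 loop in the question. The small frame and the names `forward` and `backward` are made up for this example and do not come from the original answer.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical sample data for illustration
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [1.0, 2.0, nan, 4.0, 5.0, 6.0],
        [1.0, 2.0, nan, nan, nan, nan],
    ],
    columns=["2009", "2010", "2011", "2012", "2013", "2014"],
)

# Forward search: first non-NaN value at or after 2011
forward = df.loc[:, "2011":].bfill(axis=1)["2011"]

# Backward search: last non-NaN value at or before 2011
backward = df.loc[:, :"2011"].ffill(axis=1)["2011"]

print(pd.DataFrame({"forward": forward, "backward": backward}))
```

Neither direction alone gives the closest value on both sides, which is what the reordering in the update addresses.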

Answered By - Corralien

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
