Issue
Hello, I want to add a new column to a given dataset (called "df" here) that holds, for each row, the first non-NaN value closest to a given column.
For example, I have a data frame with the years 2009, 2010, 2011, 2012, 2013 and 2014. I want to find the first non-NaN value for each row, but starting from the year 2011!
So here's the dataset with NaN and values:
import pandas as pd
import numpy as np
data = np.random.randn(6,6)
mask = np.random.choice([1, 0], data.shape, p=[.1, .9]).astype(bool)
data[mask] = np.nan
df = pd.DataFrame(data=data,columns=['2009','2010','2011','2012','2013','2014'])
df
I started to write the following loop, which gives the first non-NaN value from 2011 back to 2009 for each row:
num_row = 0
for row in df.iterrows():
    num_row = num_row + 1
    indicator = float("nan")
    distance_2011 = 0
    year = 2011
    # Step back one year at a time until a non-NaN value is found or 2009 is reached
    while np.isnan(indicator) and year > 2009:
        year = year - 1
        distance_2011 = distance_2011 - 1
        #print("while : " + str(row[1][str(year)]))
        indicator = row[1][str(year)]
    print("row : " + str(num_row) + ", year : " + str(year) + ", value : " + str(indicator))
This outputs the first non-NaN value between 2011 and 2009 for each row, along with its column year.
But it neither adds a new column to my dataset nor helps me with the years from 2011 to 2014.
Does anyone know how to solve this? I want the non-NaN value closest to the year 2011 for each row, added as a new column :) Many thanks!
Solution
Update
Reorder your columns by their distance from 2011 (2011, 2010, 2012, 2009, 2013, 2014), then back-fill along each row and read off the '2011' column:
idx = np.argsort(abs(pd.RangeIndex(df.shape[1]) - df.columns.get_loc('2011')))
df['value'] = df.iloc[:, idx].bfill(axis=1)['2011']
print(df)
# Output
   2009  2010  2011  2012  2013  2014  value
0   1.0   2.0   3.0   4.0   5.0   6.0    3.0
1   1.0   NaN   NaN   4.0   5.0   6.0    4.0
2   1.0   2.0   NaN   NaN   5.0   6.0    2.0
3   1.0   NaN   NaN   NaN   5.0   6.0    1.0
4   NaN   NaN   NaN   NaN   5.0   6.0    5.0
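To try the reordering approach end to end, here is a minimal sketch that rebuilds the example frame from the output above and wraps the two lines in a helper. The function name `closest_non_nan` and the `kind="stable"` argument are assumptions added for illustration, not part of the original answer. A stable sort makes the tie-break between equally distant columns (for example 2010 and 2012) deterministic, with the earlier column winning.

```python
import numpy as np
import pandas as pd


def closest_non_nan(df, ref_col):
    """Hypothetical helper: per row, the non-NaN value closest to ref_col."""
    ref_pos = df.columns.get_loc(ref_col)
    # Positional distance of every column from the reference column
    distance = np.abs(np.arange(df.shape[1]) - ref_pos)
    # Stable sort (an assumption): ties keep the original left-to-right order
    order = np.argsort(distance, kind="stable")
    # After reordering, ref_col comes first, so a row-wise back-fill pulls
    # the nearest available value into it
    return df.iloc[:, order].bfill(axis=1)[ref_col]


nan = np.nan
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     [1.0, nan, nan, 4.0, 5.0, 6.0],
     [1.0, 2.0, nan, nan, 5.0, 6.0],
     [1.0, nan, nan, nan, 5.0, 6.0],
     [nan, nan, nan, nan, 5.0, 6.0]],
    columns=['2009', '2010', '2011', '2012', '2013', '2014'],
)
df['value'] = closest_non_nan(df, '2011')
print(df['value'].tolist())  # [3.0, 4.0, 2.0, 1.0, 5.0]
```

Row 2 shows the tie-break: 2010 and 2012 are both one column away from 2011, and the value from 2010 is used because 2012 is NaN anyway; if both held values, 2010 would win under the stable ordering.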
Old answer
If I understand correctly, you want the first non-NaN value from 2011 onwards:
df['value'] = df.loc[:, '2011':].bfill(axis=1)['2011']
print(df)
# Output
   2009  2010  2011  2012  2013  2014  value
0   1.0   2.0   3.0   4.0   5.0   6.0    3.0
1   1.0   2.0   NaN   4.0   5.0   6.0    4.0
2   1.0   2.0   NaN   NaN   5.0   6.0    5.0
3   1.0   2.0   NaN   NaN   NaN   6.0    6.0
4   1.0   2.0   NaN   NaN   NaN   NaN    NaN
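Or, more simply: df.bfill(axis=1)['2011']. A row-wise back-fill only pulls values from columns to the right, so slicing from '2011' first is not needed.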
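As a quick check that the sliced and unsliced forms agree, the sketch below rebuilds the frame from the "Old answer" output and compares the two results. The variable names `sliced`, `whole` and `same` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     [1.0, 2.0, nan, 4.0, 5.0, 6.0],
     [1.0, 2.0, nan, nan, 5.0, 6.0],
     [1.0, 2.0, nan, nan, nan, 6.0],
     [1.0, 2.0, nan, nan, nan, nan]],
    columns=['2009', '2010', '2011', '2012', '2013', '2014'],
)

# Back-fill only the columns from 2011 onwards
sliced = df.loc[:, '2011':].bfill(axis=1)['2011']
# Back-fill the whole frame; columns left of 2011 never feed into 2011
whole = df.bfill(axis=1)['2011']

# Series.equals treats NaNs in the same position as equal
same = sliced.equals(whole)
print(same)
```

Note that this forward-only variant leaves the last row as NaN even though 2009 and 2010 hold values, which is why the updated answer reorders the columns first.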
Answered By - Corralien