Issue
I have a pandas DataFrame (built in the code below).
My goal is to perform some additions and subtractions based on column value conditions and store the results in a new column, pl.
In the DataFrame I want, pl holds one result per date, computed according to the scenarios below.
The first non-NaN value is always in the entry column.
First scenario:
If, after the non-NaN value in "entry", the next non-NaN value is in "tp1" and the one after that is in "tp2", compute (tp1 - entry) + (tp2 - entry).
Second scenario:
If the next non-NaN value after entry is in the "sl1" column, compute sl1 - entry.
Third scenario:
If the next non-NaN value after entry is in the "tp1" column and there is also a non-NaN value in the "sl2" column, compute tp1 - entry.
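The three scenarios can be written down as one small function for a single date. This is only a sketch: the name `trade_pl` and the idea of passing the five values as plain floats (NaN when a level was not hit) are assumptions made for illustration. It checks which levels are present rather than the order in which they appear, and it returns NaN for combinations the question does not describe.

```python
import math

def trade_pl(entry, tp1=math.nan, sl1=math.nan, tp2=math.nan, sl2=math.nan):
    """Profit/loss for one date; NaN marks a level that was never hit."""
    def hit(value):
        return not math.isnan(value)

    if not hit(entry):
        return math.nan
    if hit(tp1) and hit(tp2):          # first scenario
        return (tp1 - entry) + (tp2 - entry)
    if hit(sl1):                       # second scenario
        return sl1 - entry
    if hit(tp1) and hit(sl2):          # third scenario
        return tp1 - entry
    return math.nan                    # assumption: anything else is undefined

print(round(trade_pl(1.2, tp1=1.4, tp2=1.5), 2))   # 0.5
print(round(trade_pl(1.3, sl1=1.15), 2))           # -0.15
print(round(trade_pl(1.2, tp1=1.3, sl2=1.2), 2))   # 0.1
```

The three calls use the values from the three dates in the example data.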
This is my code:
import pandas as pd
tbl = {"date" :["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27",
"2022-02-28", "2022-02-28","2022-02-28", "2022-02-28", "2022-02-01",
"2022-02-01", "2022-02-01", "2022-02-01"],
"entry" : ["NaN", "NaN", 1.2, "NaN", "NaN","NaN", 1.3, "NaN", "NaN", "NaN", 1.2, "NaN",
"NaN",],
"tp1" : ["NaN", "NaN", "NaN", 1.4, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN",
1.3, "NaN"],
"sl1" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", 1.15, "NaN", "NaN",
"NaN", "NaN"],
"tp2" : ["NaN", "NaN", "NaN", "NaN", 1.5, "NaN","NaN", "NaN", "NaN", "NaN", "NaN",
"NaN", "NaN"],
"sl2" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN","NaN", "NaN", "NaN", "NaN", "NaN",
"NaN", 1.2]}
df = pd.DataFrame(tbl)
df = df.replace('NaN', float('nan'))
# This is how I'm trying to achieve what I want.
# This code only computes tp1 - entry or sl1 - entry, which is wrong
# because it does not take the "sl2" and "tp2" columns into account.
group = df['date']
s1 = df['tp1'].fillna(df['sl1']).groupby(group).bfill()
s2 = df['entry'].groupby(group).bfill()
df.loc[~group.duplicated(), 'pl'] = s1-s2
I'm stuck here: I don't understand how to code the other conditions. Any ideas?
Edit: the first value in the pl column is wrong; it should be 0.5, not 0.20.
Solution
You can take advantage of NumPy's ravel() function to flatten the DataFrame without the date column:
import pandas as pd
import numpy as np
tbl = {"date" :["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27",
"2022-02-28", "2022-02-28","2022-02-28", "2022-02-28", "2022-02-01",
"2022-02-01", "2022-02-01", "2022-02-01"],
"entry" : ["NaN", "NaN", 1.2, "NaN", "NaN","NaN", 1.3, "NaN", "NaN", "NaN", 1.2, "NaN",
"NaN",],
"tp1" : ["NaN", "NaN", "NaN", 1.4, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN",
1.3, "NaN"],
"sl1" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", 1.15, "NaN", "NaN",
"NaN", "NaN"],
"tp2" : ["NaN", "NaN", "NaN", "NaN", 1.5, "NaN","NaN", "NaN", "NaN", "NaN", "NaN",
"NaN", "NaN"],
"sl2" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN","NaN", "NaN", "NaN", "NaN", "NaN",
"NaN", 1.2]}
df = pd.DataFrame(tbl)
df = df.replace('NaN', np.nan)
df['date'] = pd.to_datetime(df['date'])
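cols = ["entry", "tp1", "sl1", "tp2", "sl2"]

def transform(x):
    # One value per date group: NaN everywhere except the group's first row.
    arr = np.full(x.shape[0], np.nan)
    # Flatten the value columns row by row and keep the first two non-NaN values.
    flatten = x[cols].to_numpy(dtype=float).ravel()
    flatten = flatten[~np.isnan(flatten)][:2]
    # Second value minus first value (e.g. tp1 - entry or sl1 - entry).
    arr[0] = np.diff(flatten)[0]
    return pd.DataFrame({"p": arr}, index=x.index)

# group_keys=False keeps the original row index so the result aligns with df.
p = df.groupby("date", group_keys=False).apply(transform)
df['p'] = p['p']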
df
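The flattening step inside transform can be looked at in isolation. The sketch below rebuilds the value columns of the first date group as a plain array (the names `group` and `events` are made up for this illustration) and shows what ravel() and the NaN filter leave behind.

```python
import numpy as np

nan = np.nan
# Rows 0-4 of the example (date 2022-02-27); columns: entry, tp1, sl1, tp2, sl2.
group = np.array([
    [nan, nan, nan, nan, nan],
    [nan, nan, nan, nan, nan],
    [1.2, nan, nan, nan, nan],
    [nan, 1.4, nan, nan, nan],
    [nan, nan, nan, 1.5, nan],
])

flat = group.ravel()              # row-major: all of row 0, then row 1, ...
events = flat[~np.isnan(flat)]    # non-NaN values in the order the rows appear
print(events)                     # [1.2 1.4 1.5]
print(np.diff(events[:2]))        # only entry and tp1 are used; 1.5 is dropped
```

Because only the first two values are kept, the tp2 value of 1.5 never enters the calculation for this group.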
The resulting DataFrame is:
          date  entry  tp1   sl1  tp2  sl2      p
 0  2022-02-27    NaN  NaN   NaN  NaN  NaN   0.20
 1  2022-02-27    NaN  NaN   NaN  NaN  NaN    NaN
 2  2022-02-27    1.2  NaN   NaN  NaN  NaN    NaN
 3  2022-02-27    NaN  1.4   NaN  NaN  NaN    NaN
 4  2022-02-27    NaN  NaN   NaN  1.5  NaN    NaN
 5  2022-02-28    NaN  NaN   NaN  NaN  NaN  -0.15
 6  2022-02-28    1.3  NaN   NaN  NaN  NaN    NaN
 7  2022-02-28    NaN  NaN   NaN  NaN  NaN    NaN
 8  2022-02-28    NaN  NaN  1.15  NaN  NaN    NaN
 9  2022-02-01    NaN  NaN   NaN  NaN  NaN   0.10
10  2022-02-01    1.2  NaN   NaN  NaN  NaN    NaN
11  2022-02-01    NaN  1.3   NaN  NaN  NaN    NaN
12  2022-02-01    NaN  NaN   NaN  NaN  1.2    NaN
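Note that the first row shows 0.20, while the question's edit asks for 0.5 there, since the first scenario adds (tp2 - entry). One possible way to cover all three scenarios is sketched below. It is not the answerer's method: it assumes each level is hit at most once per date, checks only which levels are present (not their row order), and leaves pl as NaN for combinations the question does not describe. The names `levels`, `hit` and `first_row` are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "date": ["2022-02-27"] * 5 + ["2022-02-28"] * 4 + ["2022-02-01"] * 4,
    "entry": [nan, nan, 1.2, nan, nan, nan, 1.3, nan, nan, nan, 1.2, nan, nan],
    "tp1":   [nan, nan, nan, 1.4, nan, nan, nan, nan, nan, nan, nan, 1.3, nan],
    "sl1":   [nan, nan, nan, nan, nan, nan, nan, nan, 1.15, nan, nan, nan, nan],
    "tp2":   [nan, nan, nan, nan, 1.5, nan, nan, nan, nan, nan, nan, nan, nan],
    "sl2":   [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 1.2],
})

levels = ["entry", "tp1", "sl1", "tp2", "sl2"]
# first() returns the first non-NaN value of each column within each date.
hit = df.groupby("date")[levels].first()
entry, tp1, sl1, tp2, sl2 = (hit[c] for c in levels)

# np.select uses the first condition that is True for each date.
pl = pd.Series(
    np.select(
        [tp1.notna() & tp2.notna(),   # first scenario
         sl1.notna(),                 # second scenario
         tp1.notna() & sl2.notna()],  # third scenario
        [(tp1 - entry) + (tp2 - entry), sl1 - entry, tp1 - entry],
        default=np.nan,
    ),
    index=hit.index,
)

# Write each date's result on the first row of that date, NaN elsewhere.
first_row = ~df["date"].duplicated()
df["pl"] = np.nan
df.loc[first_row, "pl"] = df.loc[first_row, "date"].map(pl)
print(df.loc[first_row, ["date", "pl"]].round(2))
```

On the example data this puts 0.5, -0.15 and 0.10 on rows 0, 5 and 9 respectively.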
Answered By - adir abargil