Issue
I have one DataFrame:
import pandas as pd
from io import StringIO

# comment="<" tells pandas to ignore the trailing "<-- row N" markers
dfs = pd.read_csv(StringIO("""
datetime ID C_1 C_2 C_3 C_4 C_5 C_6
"18/06/2023 3:51:52" 136 101 2028 61 4 3 18 <-- row 1
"18/06/2023 3:51:53" 24 101 2029 65 0 0 0 <-- row 2
"18/06/2023 3:51:54" 136 102 2045 66 2 3 4 <-- row 3
"18/06/2023 3:51:55" 0 101 2022 89 0 0 0 <-- row 4
"18/06/2023 3:51:33" 136 101 2222 77 0 0 0 <-- row 5
"18/06/2023 3:51:56" 24 102 2022 89 0 0 0 <-- row 6
"18/06/2023 3:51:49" 136 101 2024 90 0 0 0 <-- row 7
"18/06/2023 3:51:57" 24 101 2026 87 0 1 8 <-- row 8
"18/06/2023 3:51:58" 0 102 2045 44 43 42 41 <-- row 9
"18/06/2023 3:51:59" 24 102 2043 33 0 1 8 <-- row 10
"18/06/2023 3:52:88" 136 101 3333 99 0 1 87 <-- row 11
"""), sep=r"\s+", comment="<")
Is there a way to read the previous and next values of the same column using the concat function, while also renaming the columns of the output DataFrame? I am trying the code below, but I'm not sure how to rename the columns:
m = dfs['ID'].eq(0)
m1 = dfs['ID'].isin([0, 24])
m2 = dfs['ID'].isin([0, 136])
cols = ['C_1']
tmp = dfs.mask(m).fillna({'C_1': dfs['C_1']})
out = pd.concat([tmp.loc[m1].groupby(dfs['C_1']).ffill().loc[m, cols + ['datetime']],
                 tmp.loc[m2].groupby(dfs['C_1']).ffill().loc[m, cols + ['C_2', 'C_3']],
                 tmp.loc[m1].groupby(dfs['C_1']).bfill().loc[m, cols + ['datetime']],
                 tmp.loc[m2].groupby(dfs['C_1']).bfill().loc[m, cols + ['C_2', 'C_3']],
                 ]).groupby(level=0).first()
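For the renaming part specifically, one common pattern is to suffix each piece before concatenating, e.g. with DataFrame.add_suffix. An illustrative sketch on two hypothetical lookup results, not the full solution:

# hypothetical prev/next pieces sharing the same row index (m from above)
prev_part = dfs.loc[m, ['datetime', 'C_2']].add_suffix('_prev')
next_part = dfs.loc[m, ['datetime', 'C_2']].add_suffix('_next')
combined = pd.concat([prev_part, next_part], axis=1)
# columns: datetime_prev, C_2_prev, datetime_next, C_2_next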
In the above code I first check the condition ID == 0, which matches rows 4 and 9. For both matches I read the C_1 value (101 and 102 respectively), because the lookups below must stay within the same C_1:
- get the first previous datetime value (relative to rows 4 and 9, within the same C_1) where ID == 24, and the first previous C_2 and C_3 values where ID == 136;
- get the first next datetime value where ID == 24, and the first next C_2 and C_3 values where ID == 136.
Output -
C_1  datetime_prev       C_2_prev  C_3_prev  datetime_next       C_2_next  C_3_next
101  18/06/2023 3:51:53  2028      61        18/06/2023 3:51:57  2222      77
102  18/06/2023 3:51:56  2045      66        18/06/2023 3:51:59  3333      99
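Note that "previous" and "next" here are positional (row order), not chronological: row 5's timestamp (3:51:33) is earlier than row 4's (3:51:55), yet its C_2/C_3 values (2222, 77) are the expected next values for C_1 = 101. A minimal per-group sketch of that positional lookup, assuming exactly one ID == 0 anchor row per C_1 group (positional_lookup and pick are hypothetical helper names):

import numpy as np

def positional_lookup(g):
    # position (within this C_1 group) of the single ID == 0 anchor row
    pos = g["ID"].eq(0).to_numpy().argmax()
    before, after = g.iloc[:pos], g.iloc[pos + 1:]

    def pick(part, id_val, col, take_last):
        # last/first value of `col` among rows of `part` with matching ID
        s = part.loc[part["ID"].eq(id_val), col]
        if s.empty:
            return np.nan
        return s.iloc[-1] if take_last else s.iloc[0]

    return pd.Series({
        "datetime_prev": pick(before, 24, "datetime", True),
        "C_2_prev": pick(before, 136, "C_2", True),
        "C_3_prev": pick(before, 136, "C_3", True),
        "datetime_next": pick(after, 24, "datetime", False),
        "C_2_next": pick(after, 136, "C_2", False),
        "C_3_next": pick(after, 136, "C_3", False),
    })

print(dfs.groupby("C_1").apply(positional_lookup))

For C_1 = 102 this yields NaN for the next C_2/C_3, since no ID = 136 row follows row 9 within that group (row 11's 3333/99 belong to C_1 = 101); the accepted answer below agrees on that point.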
Solution
OK, I'll give it a shot:
def group_fn(dfs):
    # Masks for the anchor rows (ID == 0) and the two lookup IDs.
    mask_0 = dfs["ID"].eq(0)
    mask_24 = dfs["ID"].eq(24)
    mask_136 = dfs["ID"].eq(136)

    # datetime taken from ID == 24 rows: forward-fill gives the previous
    # value, backward-fill gives the next value (relative to row position).
    dfs["prev_datetime"] = dfs.loc[mask_24, "datetime"]
    dfs["prev_datetime"] = dfs["prev_datetime"].ffill()
    dfs["next_datetime"] = dfs.loc[mask_24, "datetime"]
    dfs["next_datetime"] = dfs["next_datetime"].bfill()

    # Same idea for C_2/C_3, taken from ID == 136 rows.
    dfs[["prev_C_2", "prev_C_3"]] = dfs.loc[mask_136, ["C_2", "C_3"]]
    dfs[["prev_C_2", "prev_C_3"]] = dfs[["prev_C_2", "prev_C_3"]].ffill()
    dfs[["next_C_2", "next_C_3"]] = dfs.loc[mask_136, ["C_2", "C_3"]]
    dfs[["next_C_2", "next_C_3"]] = dfs[["next_C_2", "next_C_3"]].bfill()

    # Keep only the anchor rows, with the gathered prev/next values.
    return dfs.loc[
        mask_0,
        [
            "C_1",
            "prev_datetime",
            "next_datetime",
            "prev_C_2",
            "prev_C_3",
            "next_C_2",
            "next_C_3",
        ],
    ]

out = dfs.groupby("C_1", group_keys=False).apply(group_fn)
print(out)
Prints:
   C_1       prev_datetime       next_datetime  prev_C_2  prev_C_3  next_C_2  next_C_3
3  101  18/06/2023 3:51:53  18/06/2023 3:51:57    2028.0      61.0    2222.0      77.0
8  102  18/06/2023 3:51:56  18/06/2023 3:51:59    2045.0      66.0       NaN       NaN
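If the column names and order from the expected output are wanted, the result can be renamed afterwards (a small follow-up sketch on the out produced above):

out = out.rename(columns={
    "prev_datetime": "datetime_prev", "next_datetime": "datetime_next",
    "prev_C_2": "C_2_prev", "prev_C_3": "C_3_prev",
    "next_C_2": "C_2_next", "next_C_3": "C_3_next",
})[["C_1", "datetime_prev", "C_2_prev", "C_3_prev",
   "datetime_next", "C_2_next", "C_3_next"]]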
Answered By - Andrej Kesely