Issue
I have the following pandas DataFrame (first ten rows shown):
index_x time_x total_def_x index_y time_y total_def_y event_time
0 2 2005.25394 15.72761 3 2005.25667 8.66223 2005.254962
1 4 2005.25941 11.31783 5 2005.26215 2.79943 2005.260101
2 11 2005.27858 8.74810 12 2005.28131 8.50871 2005.279085
3 18 2005.29774 6.31637 19 2005.30048 10.0420 2005.297804
4 52 2005.39083 0.18209 53 2005.39357 4.42270 2005.393209
5 65 2005.42642 2.71002 66 2005.42916 2.61663 2005.428290
6 106 2005.53867 -0.86598 107 2005.54141 0.26263 2005.539240
7 173 2005.72211 7.91387 174 2005.72485 -4.00652 2005.724622
8 201 2005.79877 4.09495 202 2005.80151 8.35356 2005.800502
9 217 2005.84257 6.63870 218 2005.84531 -1.81069 2005.843362
...
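To experiment with the code below, the ten rows above can be rebuilt as a DataFrame. This is a minimal sketch that copies the values exactly as shown in the table (including 10.0420 in row 3); the variable name new_df follows the question's code.

```python
import pandas as pd

# First ten rows of the question's DataFrame, copied from the table above
new_df = pd.DataFrame(
    {
        "index_x": [2, 4, 11, 18, 52, 65, 106, 173, 201, 217],
        "time_x": [2005.25394, 2005.25941, 2005.27858, 2005.29774, 2005.39083,
                   2005.42642, 2005.53867, 2005.72211, 2005.79877, 2005.84257],
        "total_def_x": [15.72761, 11.31783, 8.74810, 6.31637, 0.18209,
                        2.71002, -0.86598, 7.91387, 4.09495, 6.63870],
        "index_y": [3, 5, 12, 19, 53, 66, 107, 174, 202, 218],
        "time_y": [2005.25667, 2005.26215, 2005.28131, 2005.30048, 2005.39357,
                   2005.42916, 2005.54141, 2005.72485, 2005.80151, 2005.84531],
        "total_def_y": [8.66223, 2.79943, 8.50871, 10.0420, 4.42270,
                        2.61663, 0.26263, -4.00652, 8.35356, -1.81069],
        "event_time": [2005.254962, 2005.260101, 2005.279085, 2005.297804,
                       2005.393209, 2005.428290, 2005.539240, 2005.724622,
                       2005.800502, 2005.843362],
    }
)
print(new_df.shape)
```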
What I would like to do is select, for each row, whichever time (time_x or time_y) is closest to event_time, together with the corresponding deformation value (total_def_x or total_def_y), and place these values in a new DataFrame. The code I have written so far to achieve this is:
nearest_df = pd.DataFrame(columns=["time", "total_def"])
for et in new_df["event_time"]:
    if abs(et - new_df["time_x"].values) < abs(et - new_df["time_y"].values):
        nearest_df.append(new_df["time_x", "total_def_x"])
    else:
        nearest_df.append(new_df["time_y", "total_def_y"])
However, this code (and every variation of it I have tried) raises this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
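The error comes from the comparison itself: et is a single number, but new_df["time_x"].values is a whole column, so et - new_df["time_x"].values is an array and the < comparison yields one True/False per row. An if statement needs a single truth value, and NumPy refuses to guess which element should decide. A small sketch using the first three rows of the table reproduces this:

```python
import numpy as np

# First three values of time_x and time_y from the table
time_x = np.array([2005.25394, 2005.25941, 2005.27858])
time_y = np.array([2005.25667, 2005.26215, 2005.28131])
et = 2005.254962  # one scalar event time, as in a single loop iteration

# Scalar minus array broadcasts, so the comparison is a boolean array
mask = abs(et - time_x) < abs(et - time_y)
print(mask)  # one True/False per row, not a single bool

caught = None
try:
    if mask:  # ambiguous: which of the three elements should decide?
        pass
except ValueError as exc:
    caught = exc
    print("ValueError:", exc)
```

Wrapping the comparison in .all() removes the error but changes the test to "is time_x closer to this event_time in every row?", which is not the per-row choice the loop is meant to make.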
When I changed the condition to

    if (abs(et - new_df['time_x'].values) < abs(et - new_df['time_y'].values)).all():

I got this error instead:
KeyError: ('time_x', 'total_def_x')
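The KeyError has a different cause: inside single brackets, new_df["time_x", "total_def_x"] is read as one column label, the tuple ('time_x', 'total_def_x'), which does not exist. Selecting several columns needs a list inside the brackets. Note also that DataFrame.append returned a new object rather than modifying nearest_df in place (and it was removed in pandas 2.0), so the loop would have discarded its results anyway. The sketch below uses a three-row stand-in for new_df; it shows the error, the list-based selection, and one possible row-by-row rewrite of the loop (collecting rows, then building the DataFrame once). The rewrite is an illustration, not the answer given below.

```python
import pandas as pd

# Small stand-in for the question's new_df (first three rows of the table)
new_df = pd.DataFrame(
    {
        "time_x": [2005.25394, 2005.25941, 2005.27858],
        "total_def_x": [15.72761, 11.31783, 8.74810],
        "time_y": [2005.25667, 2005.26215, 2005.28131],
        "total_def_y": [8.66223, 2.79943, 8.50871],
        "event_time": [2005.254962, 2005.260101, 2005.279085],
    }
)

# A tuple inside single brackets is looked up as ONE column label
key_error = None
try:
    new_df["time_x", "total_def_x"]
except KeyError as exc:
    key_error = exc
    print("KeyError:", exc)

# A list inside the brackets selects several columns
print(new_df[["time_x", "total_def_x"]])

# Row-by-row version of the loop: compare scalars from the same row,
# collect the chosen values, then build the result DataFrame once
rows = []
for row in new_df.itertuples(index=False):
    if abs(row.event_time - row.time_x) < abs(row.event_time - row.time_y):
        rows.append({"time": row.time_x, "total_def": row.total_def_x})
    else:
        rows.append({"time": row.time_y, "total_def": row.total_def_y})
nearest_df = pd.DataFrame(rows, columns=["time", "total_def"])
print(nearest_df)
```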
The expected output is a DataFrame (nearest_df) like the one below, where for each row the time with the smaller difference from event_time (time_x or time_y) is selected together with its deformation (total_def_x or total_def_y):
time total_def
2005.25667 8.66223
2005.25941 11.31783
2005.27858 8.74810
Any help with this would be greatly appreciated.
Solution
You could try this:
import pandas as pd

# df is the input DataFrame (called new_df in the question)
# Distance of each candidate time from the event time
dist_x = (df["event_time"] - df["time_x"]).abs()
dist_y = (df["event_time"] - df["time_y"]).abs()
# Select the rows where each candidate is nearer (ties go to time_y)
df_x = df.loc[dist_x < dist_y, ["time_x", "total_def_x"]]
df_y = df.loc[dist_x >= dist_y, ["time_y", "total_def_y"]]
# Give both parts the same column names, then stack them back in row order
df_x.columns = df_y.columns = ["time", "total_def"]
nearest_df = pd.concat(objs=[df_x, df_y]).sort_index()
print(nearest_df)
# Outputs
time total_def
0 2005.25394 15.72761
1 2005.25941 11.31783
2 2005.27858 8.74810
3 2005.29774 6.31637
4 2005.39357 4.42270
5 2005.42916 2.61663
6 2005.53867 -0.86598
7 2005.72485 -4.00652
8 2005.80151 8.35356
9 2005.84257 6.63870
Answered By - Laurent
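A vectorized alternative (not part of the answer above) is to build a boolean mask once and let numpy.where pick each value from either the x or the y column. This is a minimal sketch on the first five rows of the table; like the answer, it sends ties to time_y. Because np.where works element-wise, the rows keep their original order and no sort_index step is needed.

```python
import numpy as np
import pandas as pd

# Stand-in data: first five rows of the question's table
df = pd.DataFrame(
    {
        "time_x": [2005.25394, 2005.25941, 2005.27858, 2005.29774, 2005.39083],
        "total_def_x": [15.72761, 11.31783, 8.74810, 6.31637, 0.18209],
        "time_y": [2005.25667, 2005.26215, 2005.28131, 2005.30048, 2005.39357],
        "total_def_y": [8.66223, 2.79943, 8.50871, 10.0420, 4.42270],
        "event_time": [2005.254962, 2005.260101, 2005.279085, 2005.297804, 2005.393209],
    }
)

# True where time_x is strictly closer to event_time (ties go to time_y)
use_x = (df["event_time"] - df["time_x"]).abs() < (df["event_time"] - df["time_y"]).abs()

# np.where chooses element-wise between the x and y columns
nearest_df = pd.DataFrame(
    {
        "time": np.where(use_x, df["time_x"], df["time_y"]),
        "total_def": np.where(use_x, df["total_def_x"], df["total_def_y"]),
    },
    index=df.index,
)
print(nearest_df)
```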