Sunday, February 6, 2022

[FIXED] How do I conditionally select rows in a Pandas data frame by


Issue

I have the following Pandas dataframe (showing top ten rows):

   index_x      time_x  total_def_x  index_y      time_y  total_def_y   event_time
0        2  2005.25394     15.72761        3  2005.25667      8.66223  2005.254962
1        4  2005.25941     11.31783        5  2005.26215      2.79943  2005.260101
2       11  2005.27858      8.74810       12  2005.28131      8.50871  2005.279085
3       18  2005.29774      6.31637       19  2005.30048      10.0420  2005.297804
4       52  2005.39083      0.18209       53  2005.39357      4.42270  2005.393209
5       65  2005.42642      2.71002       66  2005.42916      2.61663  2005.428290
6      106  2005.53867     -0.86598      107  2005.54141      0.26263  2005.539240
7      173  2005.72211      7.91387      174  2005.72485     -4.00652  2005.724622
8      201  2005.79877      4.09495      202  2005.80151      8.35356  2005.800502
9      217  2005.84257      6.63870      218  2005.84531     -1.81069  2005.843362
...

For each row, I would like to select whichever time (time_x or time_y) is closest to event_time, together with the corresponding deformation value (total_def_x or total_def_y), and place the results in a new data frame. The code I have written so far is as follows:

nearest_df = pd.DataFrame(columns=["time", "total_def"])

for et in new_df["event_time"]:

    if abs(et - new_df["time_x"].values) < abs(et - new_df["time_y"].values):

        nearest_df.append(new_df["time_x", "total_def_x"])

    else:
        nearest_df.append(new_df["time_y", "total_def_y"])

However, every attempt to rewrite this raises the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When I changed the condition to if (abs(et - new_df['time_x'].values) < abs(et - new_df['time_y'].values)).all(): I got this error instead:

KeyError: ('time_x', 'total_def_x')
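
For reference, both errors can be reproduced on a tiny frame. The sketch below assumes a hypothetical two-row new_df that reuses the question's column names. The comparison in the if statement yields one boolean per row rather than a single True/False, which is what triggers the ValueError. Indexing with new_df["time_x", "total_def_x"] asks for a single column labelled by a tuple, which is what triggers the KeyError. Selecting several columns needs a list of labels (double brackets).

```python
import pandas as pd

# Hypothetical two-row frame reusing the column names from the question
new_df = pd.DataFrame({
    "time_x": [2005.25394, 2005.39083],
    "total_def_x": [15.72761, 0.18209],
    "time_y": [2005.25667, 2005.39357],
    "event_time": [2005.254962, 2005.393209],
})

et = new_df["event_time"].iloc[0]

# Elementwise comparison: the result holds one boolean per row
mask = abs(et - new_df["time_x"].values) < abs(et - new_df["time_y"].values)

try:
    if mask:  # an array with several elements has no single truth value
        pass
except ValueError as exc:
    first_error = type(exc).__name__

try:
    new_df["time_x", "total_def_x"]  # looks for ONE column labelled by a tuple
except KeyError as exc:
    second_error = type(exc).__name__

# A list of labels selects several columns at once
subset = new_df[["time_x", "total_def_x"]]

print(first_error, second_error, list(subset.columns))
```

Adding .all() removes the ValueError, but the condition then tests a single event time against every row at once, which is not the row-by-row comparison the question asks for.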

The expected output is a data frame (nearest_df) like the one below: for each row, whichever of time_x and time_y differs least from event_time is selected, along with its deformation (total_def_x or total_def_y):

time        total_def
2005.25667  8.66223
2005.25941  11.31783
2005.27858  8.74810

Any help with this would be greatly appreciated.

Solution

You could try this:

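import pandas as pd

# df is the data frame from the question (called new_df there)

# Create temporary distance columns (this adds dist_x and dist_y to df)
df["dist_x"] = (df["event_time"] - df["time_x"]).abs()
df["dist_y"] = (df["event_time"] - df["time_y"]).abs()

# Take the x columns where time_x is closer, the y columns otherwise
df_x = df.loc[df["dist_x"] < df["dist_y"], ["time_x", "total_def_x"]]
df_y = df.loc[df["dist_x"] >= df["dist_y"], ["time_y", "total_def_y"]]

# Rename and combine the results, restoring the original row order
df_x.columns = df_y.columns = ["time", "total_def"]
nearest_df = pd.concat(objs=[df_x, df_y]).sort_index()

print(nearest_df)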
# Outputs
         time  total_def
0  2005.25394   15.72761
1  2005.25941   11.31783
2  2005.27858    8.74810
3  2005.29774    6.31637
4  2005.39357    4.42270
5  2005.42916    2.61663
6  2005.53867   -0.86598
7  2005.72485   -4.00652
8  2005.80151    8.35356
9  2005.84257    6.63870
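
A variant of the same idea, not part of the original answer: numpy.where can pick between the x and y columns row by row, which avoids the temporary columns and the concat/sort step. This is a minimal sketch that assumes the question's column names and uses three of its rows (indices 0, 4 and 7) as sample data. The names use_x and nearest_df are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Three rows taken from the question's data (original indices 0, 4 and 7)
df = pd.DataFrame({
    "time_x": [2005.25394, 2005.39083, 2005.72211],
    "total_def_x": [15.72761, 0.18209, 7.91387],
    "time_y": [2005.25667, 2005.39357, 2005.72485],
    "total_def_y": [8.66223, 4.42270, -4.00652],
    "event_time": [2005.254962, 2005.393209, 2005.724622],
})

# True where time_x is strictly closer to event_time than time_y;
# ties fall through to the y columns, as in the approach above
use_x = (df["event_time"] - df["time_x"]).abs() < (df["event_time"] - df["time_y"]).abs()

# Row-wise choice between the x and y columns
nearest_df = pd.DataFrame(
    {
        "time": np.where(use_x, df["time_x"], df["time_y"]),
        "total_def": np.where(use_x, df["total_def_x"], df["total_def_y"]),
    },
    index=df.index,
)

print(nearest_df)
```

Because the selection is a single vectorized expression, the row order of df is preserved without sorting, and df itself is left unmodified.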

Answered By - Laurent

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
