Issue
In Python, I have a pandas DataFrame and want to filter it down to a single value of column A.
I am looking for the row where column A holds the highest value that is smaller than '5' (so if column A has the values '1', '2', '4', '7', it should be '4'). There is another condition, too.
The following statement does not work. How do I have to change the maximum condition so that it works?
df_new = df[(df['some_other_column'] < XYZ) & max(df['A'] <= '5')]
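The statement above fails because Python's built-in max() applied to a boolean Series just returns a single True or False, not a row filter. One way to repair it is to apply all row-wise conditions as masks first, then take the maximum as a separate step. This is a minimal sketch, assuming column A holds numbers rather than strings (strings compare lexicographically, so '10' < '5' is True) and using a hypothetical value for XYZ; it uses a strict < to match "smaller than", and it does not require the data to be sorted:

```python
import pandas as pd

# Hypothetical data: 'A' holds numbers (not strings), plus a second column to filter on.
df = pd.DataFrame({
    "A": [1, 2, 4, 7, 4],
    "some_other_column": [10, 3, 8, 1, 20],
})
XYZ = 15  # hypothetical threshold for the other condition

# Step 1: apply every row-wise condition as a boolean mask.
candidates = df[(df["some_other_column"] < XYZ) & (df["A"] < 5)]

# Step 2: find the largest remaining value of A.
best = candidates["A"].max()

# Step 3: keep the rows that hold it. If no row passed step 1, max() is NaN
# and this comparison is False everywhere, so df_new is simply empty.
df_new = candidates[candidates["A"] == best]
print(df_new)
```

Here the last row (A == 4, some_other_column == 20) is dropped by the other condition, so only the row at index 2 remains.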
Solution
Use np.searchsorted. It performs a binary search, so it assumes column x is sorted in ascending order:
df
   x
0  1
1  2
2  4
3  7
import numpy as np

df.iloc[(np.searchsorted(df.x.values, 5) - 1).clip(0)]
   x
2  4
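One caveat with the one-liner: when no value is below the threshold, searchsorted returns 0, and .clip(0) then maps -1 to row 0, which returns a row that is not actually below the threshold. Also, df.iloc with a scalar position returns a Series rather than a DataFrame. The following is a sketch of a guarded variant; the helper name row_below is hypothetical, and it still assumes the column is sorted ascending:

```python
import numpy as np
import pandas as pd


def row_below(df: pd.DataFrame, col: str, threshold) -> pd.DataFrame:
    """Return the row holding the largest value of `col` strictly below `threshold`.

    Assumes df[col] is sorted in ascending order, which np.searchsorted requires.
    Returns an empty frame when no value is below the threshold.
    """
    values = df[col].to_numpy()
    # side="left" gives the first position whose value is >= threshold,
    # so the position just before it holds the largest value < threshold.
    pos = np.searchsorted(values, threshold, side="left") - 1
    if pos < 0:
        return df.iloc[0:0]  # nothing below the threshold
    return df.iloc[[pos]]  # double brackets keep a DataFrame


df = pd.DataFrame({"x": [1, 2, 4, 7]})
print(row_below(df, "x", 5))  # the row with x == 4 (index 2)
print(row_below(df, "x", 1))  # empty: no value is below 1
```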
Timings
df = pd.DataFrame({'x' : np.arange(100000)})
%%timeit
x = df.x
g = x[x <= 12345].max()
df[x == g]
1000 loops, best of 3: 1.27 ms per loop
%timeit df.iloc[(np.searchsorted(df.x.values, 12345) - 1).clip(0)]
10000 loops, best of 3: 139 µs per loop
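There's no contest: searchsorted is roughly 9x faster here (139 µs vs 1.27 ms per loop), because it does a binary search on the sorted array instead of scanning the whole column with boolean masks.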
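To rerun the comparison outside IPython, here is a sketch that uses the standard-library timeit module instead of the %timeit magic, on the same 100,000-row sorted frame. Note that the mask version in the timings above uses x <= 12345, which selects 12345 itself, while searchsorted with its default side='left' stops strictly below the threshold; this sketch uses < in the mask version so both functions return the same row. The function names are illustrative, and the numbers it prints will vary by machine:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(100_000)})
threshold = 12345


def mask_max():
    # Boolean-mask approach: keep values below the threshold, take the max,
    # then scan the column again for rows equal to it.
    x = df["x"]
    g = x[x < threshold].max()
    return df[x == g]


def search_sorted():
    # Binary search on the (already sorted) column.
    pos = np.searchsorted(df["x"].to_numpy(), threshold) - 1
    return df.iloc[[max(pos, 0)]]


# Both should pick the same row before we time them.
assert mask_max().index.equals(search_sorted().index)

for fn in (mask_max, search_sorted):
    n = 200
    t = timeit.timeit(fn, number=n)
    print(f"{fn.__name__}: {t / n * 1e6:.1f} µs per call")
```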
Answered By - cs95