Issue
Performance-sensitive Python code that uses pandas/NumPy usually benefits from the methods those libraries implement rather than hand-written code such as explicit loops. That may be a poor introduction to my question, but in the following screenshot I expected (before testing it) the versions that use the Series methods to run faster than the Python builtins. That is not the case, so the intuition I built on this example was wrong, but I have not yet found the reason. So the question is: why do the Python builtins perform better here than the methods applied on the Series? Am I missing something else?
Solution
Pandas has its own functions, which are quite different from Python's built-in functions. When you call Series.max(), you are in fact calling nanops._nanminmax(), which is added via the IndexOpsMixin, instead of builtins.max(). Each behaves differently, so each has a different running time.
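To see the difference for yourself, here is a minimal timing sketch. The helper name compare_max, the Series sizes and the repeat counts are my own assumptions, not taken from the question's screenshot, and the numbers will depend on your machine and pandas version.

```python
import timeit

import pandas as pd


def compare_max(n, repeats=200):
    """Time builtins.max against Series.max on a Series of n integers.

    Hypothetical helper for illustration; returns both timings in seconds.
    """
    s = pd.Series(range(n))
    # Both calls must agree on the result; only the code path differs.
    assert max(s) == s.max()
    t_builtin = timeit.timeit(lambda: max(s), number=repeats)
    t_method = timeit.timeit(lambda: s.max(), number=repeats)
    return t_builtin, t_method


# Sizes chosen arbitrarily to show how the comparison shifts with length.
for n in (10, 1_000, 100_000):
    t_builtin, t_method = compare_max(n, repeats=20)
    print(f"n={n:>7}: builtins.max {t_builtin:.5f}s, Series.max {t_method:.5f}s")
```

Running it over several sizes is worthwhile because the comparison may depend on how long the Series is: the pandas method does some work per call, while the builtin iterates over the elements one by one.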
The same applies to the rest of the methods. If you are curious, check the source code of the Series class and the classes it inherits from for the exact differences between the builtin functions and Pandas' implementations.
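A quick way to start digging, assuming a standard pandas install, is the inspect module from the standard library. This is only a sketch: the file and class names it prints vary between pandas versions, so no output is shown here.

```python
import inspect

import pandas as pd

# builtins.max is implemented in C, so there is no Python source to read.
print(inspect.isbuiltin(max))

# Series.max is an ordinary Python function; this prints the file that defines it.
source_file = inspect.getsourcefile(pd.Series.max)
print(source_file)

# The classes Series inherits from, in method resolution order.
mro_names = [cls.__name__ for cls in pd.Series.__mro__]
print(mro_names)
```

From there, inspect.getsource(pd.Series.max) prints the method body so you can follow the calls it makes.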
Answered By - Peter Badida