Issue
I have defined a class whose __ge__ method returns an instance of itself, and whose __bool__ method is not allowed to be invoked (similar to a Pandas Series).
Why is X.__bool__ invoked during np.int8(0) <= x, but not for any of the other examples? Who is invoking it? I have read the Data Model docs but I haven’t found my answer there.
import numpy as np
import pandas as pd

class X:
    def __bool__(self):
        print(f"{self}.__bool__")
        assert False

    def __ge__(self, other):
        print(f"{self}.__ge__")
        return X()

x = X()

np.int8(0) <= x
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D90>.__bool__
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "<stdin>", line 4, in __bool__
# AssertionError
0 <= x
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5DF0>
x >= np.int8(0)
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D30>
pd_ge = pd.Series.__ge__
def ge_wrapper(self, other):
    print("pd.Series.__ge__")
    return pd_ge(self, other)
pd.Series.__ge__ = ge_wrapper

pd_bool = pd.Series.__bool__
def bool_wrapper(self):
    print("pd.Series.__bool__")
    return pd_bool(self)
pd.Series.__bool__ = bool_wrapper

np.int8(0) <= pd.Series([1,2,3])
# Console output:
# pd.Series.__ge__
# 0 True
# 1 True
# 2 True
# dtype: bool
Solution
TL;DR
X.__array_priority__ = 1000
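As a minimal sketch of that one-line fix, here is the class from the question with the attribute set in the class body. It assumes NumPy is installed. The value 1000 simply mirrors the one pandas uses, and the AssertionError message is added here for illustration only.

```python
import numpy as np


class X:
    # Mirrors the value pandas sets on NDFrame; with this attribute present,
    # the NumPy scalar defers to X instead of handling the comparison itself.
    __array_priority__ = 1000

    def __bool__(self):
        raise AssertionError("X.__bool__ must not be called")

    def __ge__(self, other):
        return X()


x = X()
result = np.int8(0) <= x  # Python ends up calling the reflected X.__ge__
print(type(result).__name__)  # X
```

The comparison returns an X instance and __bool__ is never reached.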
The biggest hint is that it works with a pd.Series.
First I tried having X inherit from pd.Series. This worked (i.e. __bool__ was no longer called).
To determine whether NumPy is using an isinstance check or a duck-typing approach, I removed the explicit inheritance and added (based on this answer):
@property
def __class__(self):
    return pd.Series
The operation no longer worked (i.e. __bool__ was called).
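To make clear what this experiment does and does not change, here is a small pure-Python sketch. Base is a hypothetical stand-in for pd.Series so that the example needs no third-party imports. It shows that a __class__ property satisfies isinstance() while leaving the real type and its attributes untouched.

```python
class Base:
    """Hypothetical stand-in for pd.Series."""


class Spoofed:
    @property
    def __class__(self):
        # Report Base as the class without actually inheriting from it.
        return Base


s = Spoofed()
print(isinstance(s, Base))      # True: isinstance() falls back to __class__
print(type(s) is Base)          # False: the real type is still Spoofed
print(Base in type(s).__mro__)  # False: nothing is inherited from Base
```

Because nothing is inherited, any attribute that Base defines is still missing on the spoofed object, which is consistent with the spoof not changing the outcome.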
So I think we can conclude that NumPy is using a duck-typing approach. Next I checked which attributes are being accessed on X by adding the following to X:
def __getattribute__(self, item):
    print("getattr", item)
    return object.__getattribute__(self, item)
Again instantiating X as x and invoking np.int8(0) <= x, we get:
getattr __array_priority__
getattr __array_priority__
getattr __array_priority__
getattr __array_struct__
getattr __array_interface__
getattr __array__
getattr __array_prepare__
<__main__.X object at 0x000002022AB5DBE0>.__ge__
<__main__.X object at 0x000002021A73BE50>.__bool__
getattr __array_struct__
getattr __array_interface__
getattr __array__
Traceback (most recent call last):
  File "<stdin>", line 32, in <module>
    np.int8(0) <= x
  File "<stdin>", line 21, in __bool__
    assert False
AssertionError
Ah-ha! What is __array_priority__? Who cares, really. With a little digging, all we need to know is that NDFrame (from which pd.Series inherits) sets this value to 1000.
If we add X.__array_priority__ = 1000, it works! __bool__ is no longer called.
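For intuition about why a priority attribute can have this effect, here is a simplified pure-Python model. It is not NumPy's actual code: Scalar, Plain, Prioritized and the priority value 0.0 are all hypothetical. The assumption is that the left operand returns NotImplemented when the other object advertises a higher priority, so that Python falls back to the reflected __ge__, and otherwise coerces the comparison result to a plain bool.

```python
class Scalar:
    """Hypothetical stand-in for a NumPy scalar; a simplified model only."""

    priority = 0.0  # illustrative value, not NumPy's real default

    def __init__(self, value):
        self.value = value

    def __le__(self, other):
        other_priority = getattr(other, "__array_priority__", self.priority)
        if other_priority > self.priority:
            # Defer: Python will try the reflected other.__ge__(self) next.
            return NotImplemented
        # No deferral: compare, then coerce the result to a plain bool,
        # which is the step that invokes __bool__ on the result.
        return bool(self.value <= other)


class Plain:
    def __bool__(self):
        raise AssertionError("__bool__ must not be called")

    def __ge__(self, other):
        return type(self)()


class Prioritized(Plain):
    __array_priority__ = 1000


try:
    Scalar(0) <= Plain()
except AssertionError as exc:
    print("without priority:", exc)

result = Scalar(0) <= Prioritized()
print("with priority:", type(result).__name__)
```

In this model the object without the attribute has its result coerced to bool, while the one with the attribute gets its own __ge__ result back unchanged, matching the behaviour observed above.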
What made this so difficult (I believe) is that the NumPy code didn't show up in the call stack because it is written in C. I could investigate further if I tried out the suggestion here.
Answered By - Mike R