Issue
Checking whether np.nan is in a list that was pulled from a pandas DataFrame does not return True as I would expect. The example below demonstrates this:
from numpy import nan
import pandas as pd
basic_list = [0.0, nan, 1.0, 2.0]
nan_in_list = (nan in basic_list)
print(f"Is nan in {basic_list}? {nan_in_list}")
df = pd.DataFrame({'test_list': basic_list})
pandas_list = df['test_list'].to_list()
nan_in_pandas_list = (nan in pandas_list)
print(f"Is nan in {pandas_list}? {nan_in_pandas_list}")
I would expect the output of this program to be:
Is nan in [0.0, nan, 1.0, 2.0]? True
Is nan in [0.0, nan, 1.0, 2.0]? True
But instead it is:
Is nan in [0.0, nan, 1.0, 2.0]? True
Is nan in [0.0, nan, 1.0, 2.0]? False
What is the cause of this odd behavior or am I missing something?
Edit: Adding on to this, if I run the code:
for item in pandas_list:
    print(type(item))
    print(item)
it has the exact same output as if I were to swap pandas_list with basic_list. However, pandas_list == basic_list evaluates to False.
Solution
TL;DR
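pandas is using a different nan object than np.nan, and the in operator for a list first checks whether the two objects are the same object (identity) before falling back to ==. Since nan == nan is always False, only the identity check can ever match a nan.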
The in operator invokes the __contains__ magic method of list. Here is the source code:
static int
list_contains(PyListObject *a, PyObject *el)
{
    PyObject *item;
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i) {
        item = PyList_GET_ITEM(a, i);
        Py_INCREF(item);
        cmp = PyObject_RichCompareBool(item, el, Py_EQ);
        Py_DECREF(item);
    }
    return cmp;
}
You can see that PyObject_RichCompareBool is called, and its documentation states:
If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQ and 0 for Py_NE.
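In Python terms, the C loop above behaves roughly like the sketch below. The helper name list_contains is reused here only for illustration, and float("nan") stands in for np.nan and for the NaN that comes back from pandas; both are assumptions of this example, not part of the original code.

```python
def list_contains(items, el):
    """Pure-Python sketch of list.__contains__: identity first, then ==."""
    return any(item is el or item == el for item in items)


nan = float("nan")        # one NaN object (stands in for np.nan)
other_nan = float("nan")  # a second, distinct NaN object

print(nan == nan)                            # NaN never compares equal to itself
print(list_contains([0.0, nan], nan))        # matches via the identity check
print(list_contains([0.0, nan], other_nan))  # no identity match, == fails too
```

The built-in in operator gives the same answers on these inputs, which is why the result depends on which nan object happens to be stored in the list.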
So:
basic_list = [0.0, nan, 1.0, 2.0]
for v in basic_list:
    print(v == nan, v is nan)
print(nan in basic_list)
Prints:
False False
False True
False False
False False
True
And:
df = pd.DataFrame({"test_list": basic_list})
pandas_list = df["test_list"].to_list()
for v in pandas_list:
    print(v == nan, v is nan)
print(nan in pandas_list)
Prints:
False False
False False
False False
False False
False
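Evidently, pandas is using a different nan object: the column is stored as a float64 array, and to_list() builds new Python float objects from it, so the NaN that comes back is not the np.nan object itself.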
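If the goal is simply to find out whether a list contains any NaN, a check that does not rely on object identity is safer. The following is a minimal sketch using only the standard library; contains_nan is a hypothetical helper name, and float("nan") is used to mimic the distinct NaN object that comes back from pandas.

```python
import math


def contains_nan(values):
    """Return True if any float in values is NaN, regardless of identity."""
    return any(isinstance(v, float) and math.isnan(v) for v in values)


nan_a = float("nan")  # the NaN stored in the list
nan_b = float("nan")  # a different NaN object, like the one from to_list()
values = [0.0, nan_a, 1.0, 2.0]

print(nan_b in values)                      # identity check misses it
print(contains_nan(values))                 # value-based check finds it
print(values == [0.0, nan_b, 1.0, 2.0])     # list equality has the same quirk
```

The last line shows why pandas_list == basic_list is False in the question: list equality compares elements the same way, so two lists holding different NaN objects are not equal. On the pandas side, df["test_list"].isna().any() answers the same question without converting to a list.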
Answered By - Andrej Kesely