Issue
In pandas or NumPy, I want an element-wise AND with the following semantics: True & NaN == True, False & False == False, NaN & NaN == NaN.
What is the most efficient way to do this? So far I have:
(a.fillna(True) & b.fillna(True)).where(~(a.isna() & b.isna()), None)
Example:
from itertools import product
import pandas as pd

a = pd.DataFrame(product([True, False, None], [True, False, None]))
display(a)
display((a[0].fillna(True) & a[1].fillna(True)).where(~(a[0].isna() & a[1].isna()), None))
The output is:
0 1
0 True True
1 True False
2 True None
3 False True
4 False False
5 False None
6 None True
7 None False
8 None None
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 None
dtype: object
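The requested semantics can be sketched as a plain scalar helper (a hypothetical and3 function, not part of pandas or NumPy): a missing value is ignored unless both operands are missing.

```python
def and3(x, y):
    # Hypothetical helper illustrating the requested truth table:
    # missing values (None) are ignored, unless both operands are
    # missing, in which case the result is missing.
    present = [v for v in (x, y) if v is not None]
    if not present:
        return None       # NaN & NaN -> NaN
    return all(present)   # True & NaN -> True, False & False -> False

print(and3(True, None))    # True
print(and3(False, False))  # False
print(and3(None, None))    # None
```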
I have two cases: A. most rows contain NaN, and B. only a few rows contain NaN. I wonder what the best approach is for each case.
Performance
b = a.sample(int(1e5), weights=[1,1,1,1,1,1,1,1,0.01], ignore_index=True, replace=True)
c = a.sample(int(1e5), weights=[1,1,1,1,1,1,1,1,80], ignore_index=True, replace=True)
display(b.isna().all(axis="columns").sum())
# 117 all-NaN rows
display(c.isna().all(axis="columns").sum())
# 90879 all-NaN rows
import timeit
timeit.timeit(lambda: b.all(axis=1).mask(b.isna().all(axis=1)), number=100)
# 2.4s
timeit.timeit(lambda: c.all(axis=1).mask(c.isna().all(axis=1)), number=100)
# 1.6s
timeit.timeit(lambda: b.stack().groupby(level=0).all().reindex(b.index), number=100)
# 3.3s
timeit.timeit(lambda: c.stack().groupby(level=0).all().reindex(c.index), number=100)
# 0.9s
So yes, as expected: the stack method drops all NaN before computing, which makes it much faster when most rows are NaN.
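That behavior can be shown directly with a small sketch (not the benchmark above): after the NaN cells are removed, all-NaN rows never reach the groupby at all and simply come back as NaN from the reindex. The explicit .dropna() is an assumption added for portability, since newer pandas versions change whether stack() drops NaN by default.

```python
import pandas as pd

df = pd.DataFrame({0: [True, None, None], 1: [None, None, False]})

# stack() flattens the frame into one long Series. Dropping NaN here is
# what lets the groupby skip all-NaN rows entirely (legacy stack() drops
# NaN by itself; the explicit dropna() keeps this version-independent).
stacked = df.stack().dropna()

result = stacked.groupby(level=0).all().reindex(df.index)
print(result)
# row 0 -> True (True & NaN), row 1 -> NaN (all-NaN), row 2 -> False
```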
Solution
Use mask:
df.all(axis=1).mask(df.isna().all(axis=1))
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 NaN
dtype: object
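A brief sketch of why this works, assuming the frame holds object-dtype booleans with None for missing values: DataFrame.all skips NaN by default (skipna=True), so a True & NaN row evaluates to True and an all-NaN row evaluates to True vacuously; the mask then turns only the vacuous rows back into NaN.

```python
import pandas as pd

df = pd.DataFrame({0: [True, None], 1: [None, None]})

# all(axis=1) skips missing values, so both rows evaluate to True here...
print(df.all(axis=1).tolist())  # [True, True]

# ...and the mask restores NaN for the row where everything was missing.
out = df.all(axis=1).mask(df.isna().all(axis=1))
print(out)
```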
Another way is to use stack:
df.stack().groupby(level=0).all().reindex(df.index)
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 NaN
dtype: object
Answered By - Onyambu