Issue
In pandas or NumPy, I want an element-wise AND with the following semantics: True & NaN == True, False & False == False, NaN & NaN == NaN.
What is the most efficient way to do this? So far I have:
(a.fillna(True) & b.fillna(True)).where(~(a.isna() & b.isna()), None)
Example:
from itertools import product
import pandas as pd

a = pd.DataFrame(product([True, False, None], [True, False, None]))
display(a)
display((a[0].fillna(True) & a[1].fillna(True)).where(~(a[0].isna() & a[1].isna()), None))
The output is:
0 1
0 True True
1 True False
2 True None
3 False True
4 False False
5 False None
6 None True
7 None False
8 None None
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 None
dtype: object
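The requested semantics can be sketched as a plain scalar helper (a hypothetical and3 function, not part of pandas or NumPy): a missing value is ignored unless both operands are missing.

```python
def and3(x, y):
    # Hypothetical helper illustrating the requested truth table:
    # missing values (None) are ignored, unless both operands are
    # missing, in which case the result is missing.
    present = [v for v in (x, y) if v is not None]
    if not present:
        return None       # NaN & NaN -> NaN
    return all(present)   # True & NaN -> True, False & False -> False

print(and3(True, None))    # True
print(and3(False, False))  # False
print(and3(None, None))    # None
```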
I have two cases: A. most rows contain NaN, and B. only a few rows contain NaN. I wonder what the best approach is for each case.
Performance
b = a.sample(int(1e5), weights=[1,1,1,1,1,1,1,1,0.01], ignore_index=True, replace=True)
c = a.sample(int(1e5), weights=[1,1,1,1,1,1,1,1,80], ignore_index=True, replace=True)
display(b.isna().all(axis="columns").sum())
# 117 all-NaN rows
display(c.isna().all(axis="columns").sum())
# 90879 all-NaN rows
import timeit
timeit.timeit(lambda: b.all(axis=1).mask(b.isna().all(axis=1)), number=100)
# 2.4s
timeit.timeit(lambda: c.all(axis=1).mask(c.isna().all(axis=1)), number=100)
# 1.6s
timeit.timeit(lambda: b.stack().groupby(level=0).all().reindex(b.index), number=100)
# 3.3s
timeit.timeit(lambda: c.stack().groupby(level=0).all().reindex(c.index), number=100)
# 0.9s
So yes, as expected: the stack method drops all NaN before computing, which makes it much faster when most rows are NaN.
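That behavior can be shown directly with a small sketch (not the benchmark above): after the NaN cells are removed, all-NaN rows never reach the groupby at all and simply come back as NaN from the reindex. The explicit .dropna() is an assumption added for portability, since newer pandas versions change whether stack() drops NaN by default.

```python
import pandas as pd

df = pd.DataFrame({0: [True, None, None], 1: [None, None, False]})

# stack() flattens the frame into one long Series. Dropping NaN here is
# what lets the groupby skip all-NaN rows entirely (legacy stack() drops
# NaN by itself; the explicit dropna() keeps this version-independent).
stacked = df.stack().dropna()

result = stacked.groupby(level=0).all().reindex(df.index)
print(result)
# row 0 -> True (True & NaN), row 1 -> NaN (all-NaN), row 2 -> False
```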
Solution
Use mask:
df.all(axis=1).mask(df.isna().all(axis=1))
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 NaN
dtype: object
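A brief sketch of why this works, assuming the frame holds object-dtype booleans with None for missing values: DataFrame.all skips NaN by default (skipna=True), so a True & NaN row evaluates to True and an all-NaN row evaluates to True vacuously; the mask then turns only the vacuous rows back into NaN.

```python
import pandas as pd

df = pd.DataFrame({0: [True, None], 1: [None, None]})

# all(axis=1) skips missing values, so both rows evaluate to True here...
print(df.all(axis=1).tolist())  # [True, True]

# ...and the mask restores NaN for the row where everything was missing.
out = df.all(axis=1).mask(df.isna().all(axis=1))
print(out)
```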
Another way is to use stack:
df.stack().groupby(level=0).all().reindex(df.index)
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 NaN
dtype: object
Answered By - Onyambu