Issue
In pandas, how can I check how sparse a DataFrame is? Is there a built-in function, or will I need to write my own?
For now, I have this:
import pandas as pd
df = pd.DataFrame({'a':[1,0,1,1,3], 'b':[0,0,0,0,1], 'c':[4,0,0,0,0], 'd':[0,0,3,0,0]})
   a  b  c  d
0  1  0  4  0
1  0  0  0  0
2  1  0  0  3
3  1  0  0  0
4  3  1  0  0
sparsity = sum((df == 0).astype(int).sum())/df.size
This divides the number of zeros by the total number of elements; in this example the result is 0.65.
I would like to know whether there is a better way to do this, and whether there is any function that gives more information about sparsity (for example, counting NaNs or another prominent value such as -1).
Solution
One option is to convert the DataFrame to a NumPy array, compare it with 0, and take the mean of the resulting boolean array:
a = (df.to_numpy() == 0).mean()
print (a)
0.65
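The same comparison can be extended to the other cases the question mentions (NaNs, or a sentinel value such as -1), and `(df == 0).mean()` gives the fraction per column instead of for the whole frame. Below is a minimal sketch; `value_fractions` is a hypothetical helper written for illustration, not a pandas function, and the candidate values `(0, -1)` are an assumption.

```python
import pandas as pd

def value_fractions(df, values=(0, -1)):
    """Fraction of cells equal to each candidate value, plus the NaN fraction.

    Hypothetical helper for illustration; not part of pandas.
    """
    arr = df.to_numpy()
    # Mean of a boolean mask = share of cells where the comparison is True
    out = {v: float((arr == v).mean()) for v in values}
    # NaN never compares equal to itself, so use isna() instead of ==
    out["NaN"] = float(df.isna().to_numpy().mean())
    return out

df = pd.DataFrame({'a': [1, 0, 1, 1, 3], 'b': [0, 0, 0, 0, 1],
                   'c': [4, 0, 0, 0, 0], 'd': [0, 0, 3, 0, 0]})

fractions = value_fractions(df)   # overall share of 0, -1 and NaN
per_column = (df == 0).mean()     # share of zeros in each column
print(fractions)
print(per_column)
```

For the example frame, the share of zeros is 0.65 overall, and neither -1 nor NaN occurs.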
If you want to use sparse dtypes, you can do the following:
#convert each column to SparseArray
sparr = df.apply(pd.arrays.SparseArray)
print (sparr)
   a  b  c  d
0  1  0  4  0
1  0  0  0  0
2  1  0  0  3
3  1  0  0  0
4  3  1  0  0
print (sparr.dtypes)
a    Sparse[int64, 0]
b    Sparse[int64, 0]
c    Sparse[int64, 0]
d    Sparse[int64, 0]
dtype: object
print (sparr.sparse.density)
0.35
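Note that density is the share of stored (non-fill) values, so the sparsity asked about in the question is 1 - density. The fill value does not have to be 0. The sketch below uses `astype` with `pd.SparseDtype` as an alternative to `apply`, and builds a NaN-filled copy of the frame purely for demonstration (that copy is an assumption, not part of the original answer).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1, 1, 3], 'b': [0, 0, 0, 0, 1],
                   'c': [4, 0, 0, 0, 0], 'd': [0, 0, 3, 0, 0]})

# Treat 0 as the fill value; sparsity is the complement of density
sparse_zero = df.astype(pd.SparseDtype("int64", 0))
sparsity = 1 - sparse_zero.sparse.density
print(sparsity)

# Demonstration only: replace zeros with NaN, then treat NaN as the fill value
df_nan = df.astype("float64").mask(df == 0)
sparse_nan = df_nan.astype(pd.SparseDtype("float64", np.nan))
nan_sparsity = 1 - sparse_nan.sparse.density
print(nan_sparsity)
```

Both values should be approximately 0.65 here, because the NaN-filled copy has NaN exactly where the original frame has 0.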
Answered By - jezrael