Issue
I am trying to move my dense matrix operations over to sparse matrices. When the data was dense, I used numpy broadcasting to divide an array of shape (591, 432) by an array of shape (432,), but how do I do this with sparse matrices?
<591x432 sparse matrix of type '<class 'numpy.int64'>'
with 3876 stored elements in Compressed Sparse Column format>
<1x432 sparse matrix of type '<class 'numpy.int64'>'
with 432 stored elements in COOrdinate format>
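For context, the dense version worked with plain broadcasting, something like this (placeholder values; the real matrices hold int64 counts):
import numpy as np
# stand-in dense array with the shapes described above (contents are placeholders)
dense = np.arange(591 * 432, dtype=np.int64).reshape(591, 432)
col_max = dense.max(axis=0)   # shape (432,)
scaled = dense / col_max      # broadcasting divides each column by its maximum
print(scaled.shape)           # (591, 432)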
When I try with this dummy data below...
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
matrix = CountVectorizer().fit_transform(raw_documents=["test sentence.", "test sent 2."]).T
max_w = np.max(matrix, axis=0)
matrix / max_w
I get ValueError: inconsistent shapes. How can I divide these?
Solution
If you really want to, you can divide by multiplying by the reciprocal.
import numpy as np
from scipy.sparse import csc_matrix
A = csc_matrix([[3, 4], [5, 6]])
B = A.max(axis=0)                 # column maxima as a sparse (1 x 2) matrix
res = A.multiply(B.power(-1.))    # multiply by the elementwise reciprocal of the maxima
ref = A / B.todense()             # dense reference result
np.allclose(res.todense(), ref)   # True
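Applied to the dummy CountVectorizer data from the question, the same pattern would look something like this (variable names reused from the question):
from sklearn.feature_extraction.text import CountVectorizer
matrix = CountVectorizer().fit_transform(raw_documents=["test sentence.", "test sent 2."]).T
max_w = matrix.max(axis=0)                  # sparse (1 x 2) matrix of column maxima
scaled = matrix.multiply(max_w.power(-1.))  # divide each column by its maximum
Note that power(-1.) only takes reciprocals of the stored entries, so columns whose maximum is zero simply stay zero instead of producing a divide-by-zero warning.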
But in your case, there may not be a speed advantage compared to simply dividing by B.todense().
import numpy as np
from scipy.sparse import csc_matrix
rng = np.random.default_rng(452349345693456)
# generate arrays like yours
shape = (591, 432)
nnz = 3876
A = rng.random(size=shape)
b = np.partition(A.ravel(), nnz)[nnz]
A[A >= b] = 0
A = csc_matrix(A)
assert A.nnz == nnz
B = A.max(axis=0)
# compare solutions
res = A.multiply(B.power(-1.))
ref = A/B.todense()
np.allclose(res.todense(), ref) # True
%timeit A.multiply(B.power(-1.))
# 1.3 ms ± 734 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit A/B.todense()
# 306 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Answered By - Matt Haberland