Issue
I am trying to calculate the pairwise cosine similarity of the embeddings stored in a pandas DataFrame column. It works on a small dataset (e.g., 100 samples), but it fails once the dataset grows to 190k+ rows. Is there an alternative way to calculate this?
No error message comes up; my kernel just keeps dying.
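import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Stack the per-row embedding lists into one (n_rows, n_dims) float32 array
sentence_embeddings = np.array(df['summary_tokens'].tolist(), dtype='float32')
# Builds the full (n_rows, n_rows) similarity matrix in memory
similarity = cosine_similarity(sentence_embeddings)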
Solution
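The problem went away once I calculated the similarity on smaller NumPy arrays instead of on the whole dataset at once. The kernel was most likely running out of memory: a full 190,000 x 190,000 similarity matrix in float32 needs roughly 144 GB (190,000^2 values at 4 bytes each).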
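The answer does not say how the arrays were split or which results were kept, so the following is only a minimal sketch of one way to do it. It assumes that the k most similar rows per row are enough, which means the full n x n matrix never has to exist in memory. The function name `top_k_similar` and the parameters `k` and `chunk_size` are hypothetical, not from the original post. The rows are normalized once so that a plain dot product gives the cosine similarity; calling `cosine_similarity(chunk, embeddings)` from scikit-learn for each block should work the same way.

```python
import numpy as np


def top_k_similar(embeddings, k=5, chunk_size=1000):
    """For each row, return the indices and scores of its k most similar rows.

    Hypothetical helper: only one (chunk_size, n_rows) block of similarities
    is held in memory at a time. Requires k < n_rows.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    n = embeddings.shape[0]

    # Normalize rows once; the dot product of unit vectors is the cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    unit = embeddings / norms

    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=np.float32)

    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        block = unit[start:stop] @ unit.T  # shape (stop - start, n)

        # Exclude each row's similarity with itself.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf

        # Pick the k largest entries per row (unordered), then sort just those.
        idx = np.argpartition(block, -k, axis=1)[:, -k:]
        sims = np.take_along_axis(block, idx, axis=1)
        order = np.argsort(-sims, axis=1)
        top_idx[start:stop] = np.take_along_axis(idx, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(sims, order, axis=1)

    return top_idx, top_sim
```

With the array from the question, this could be called as `top_idx, top_sim = top_k_similar(sentence_embeddings, k=5, chunk_size=1000)`. A smaller `chunk_size` lowers peak memory at the cost of more loop iterations.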
Answered By - Brian Phelps