Issue
I keep getting this error when importing top2vec.
TypeError Traceback (most recent call last)
Cell In [1], line 1
----> 1 from top2vec import Top2Vec
File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\__init__.py:1
----> 1 from top2vec.Top2Vec import Top2Vec
3 __version__ = '1.0.27'
File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\Top2Vec.py:12
10 from gensim.models.phrases import Phrases
11 import umap
---> 12 import hdbscan
13 from wordcloud import WordCloud
14 import matplotlib.pyplot as plt
File ~\AppData\Roaming\Python\Python39\site-packages\hdbscan\__init__.py:1
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
3 from .validity import validity_index
File ~\AppData\Roaming\Python\Python39\site-packages\hdbscan\hdbscan_.py:509
494 row_indices = np.where(np.isfinite(matrix).sum(axis=1) == matrix.shape[1])[0]
495 return row_indices
498 def hdbscan(
499 X,
500 min_cluster_size=5,
501 min_samples=None,
502 alpha=1.0,
503 cluster_selection_epsilon=0.0,
504 max_cluster_size=0,
505 metric="minkowski",
506 p=2,
507 leaf_size=40,
508 algorithm="best",
--> 509 memory=Memory(cachedir=None, verbose=0),
510 approx_min_span_tree=True,
511 gen_min_span_tree=False,
512 core_dist_n_jobs=4,
513 cluster_selection_method="eom",
514 allow_single_cluster=False,
515 match_reference_implementation=False,
516 **kwargs
517 ):
518 """Perform HDBSCAN clustering from a vector array or distance matrix.
519
520 Parameters
(...)
672 Density-based Cluster Selection. arxiv preprint 1911.02282.
673 """
674 if min_samples is None:
TypeError: __init__() got an unexpected keyword argument 'cachedir'
Python version: 3.9.7 (64-bit)
I have installed MSBuild.
There were no errors when pip installing this package.
Does anyone know a solution to this problem, or has anyone experienced something similar?
Solution
It looks like you are using the latest versions of the hdbscan and joblib packages available on PyPI.
The cachedir argument was removed from joblib.Memory as deprecated in a commit on 2 Feb 2022. The latest joblib version on PyPI is 1.2.0 from Sep 16, 2022, i.e. it incorporates this change.
The relevant part of the hdbscan source code on GitHub was updated on 16 Sept 2022. Unfortunately, the latest hdbscan release on PyPI is ver. 0.8.28 from Feb 8, 2022, which predates that fix and still uses memory=Memory(cachedir=None, verbose=0).
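To confirm that this mismatch is what you are hitting, a small diagnostic along these lines may help. It is only a sketch: accepts_keyword and installed_version are hypothetical helper names introduced here, not part of joblib or hdbscan, and it assumes Python 3.8+ for importlib.metadata.

```python
import inspect
from importlib import metadata


def accepts_keyword(cls, name):
    """Return True if the constructor of `cls` has a parameter called `name`."""
    return name in inspect.signature(cls).parameters


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("joblib :", installed_version("joblib"))
    print("hdbscan:", installed_version("hdbscan"))
    try:
        from joblib import Memory
    except ImportError:
        print("joblib is not importable in this environment")
    else:
        # False here means Memory(cachedir=...) will raise the TypeError above.
        print("Memory accepts cachedir:", accepts_keyword(Memory, "cachedir"))
```

If the last line reports False while hdbscan is at 0.8.28, the import error above is expected.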
One possible solution is to pin joblib to the last version released before cachedir was removed: ver. 1.1.0 from Oct 7, 2021. However, note the update below.
UPDATE 29 Sept 2022:
There is an open issue on the hdbscan repo.
Note that joblib versions before 1.2.0 are affected by the vulnerability CVE-2022-21797.
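If you do pin joblib, it may be worth flagging environments that are still below 1.2.0. The following is a rough sketch using only the standard library; release_tuple and predates are hypothetical helper names, and the parser deliberately ignores pre-release suffixes such as "rc1".

```python
import re


def release_tuple(version):
    """Turn the leading numeric part of a version into a 3-tuple, e.g. '1.2' -> (1, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions so that '1.2' compares equal to '1.2.0'.
    return parts + (0,) * max(0, 3 - len(parts))


def predates(version, minimum="1.2.0"):
    """Return True if `version` is older than `minimum` (numeric parts only)."""
    return release_tuple(version) < release_tuple(minimum)


if __name__ == "__main__":
    # 1.1.0 is the pinned version suggested above; 1.2.0 is the current release.
    for candidate in ("1.1.0", "1.2.0"):
        print(candidate, "is below 1.2.0:", predates(candidate))
```

For anything beyond a quick check, a full version parser such as the one in the packaging library handles pre-releases and other edge cases that this sketch skips.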
Answered By - buran