Issue
I am not sure how to describe all the steps I am taking, but my question is simple: I use the same code and the same data from a text file, gather some statistics about that data, and then use umap for 2D reduction.
Is it normal to have different graphs when I plot the result?
I use scikit-learn, umap-learn, ggplot2.
The problem continues when I use hdbscan: because the plot is different every time I run the code, the cluster sizes and the clusters themselves differ as well. Is this expected or not?
Solution
Yes, this is expected. Dimensionality reduction algorithms like t-SNE and UMAP are stochastic, so each run produces a different embedding, and anything computed from that embedding (such as the hdbscan clusters) changes with it. If you want to get the same graph every time, you need to fix the random seed. In R, you can do that by calling set.seed(123) before running UMAP (or by passing a seed argument if the function provides one). In Python, np.random.seed(123) should work with scikit-learn; umap-learn's UMAP also accepts a random_state parameter.
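A minimal sketch of the seeding advice in Python, assuming NumPy and scikit-learn are installed. It uses scikit-learn's TSNE on synthetic data rather than the asker's file; the helper name embed_2d and the data are hypothetical and only for illustration. umap-learn's UMAP takes a random_state argument in the same way.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(data, seed):
    """Reduce data to 2D with t-SNE, fixing the seed used for its random steps."""
    # init="random" makes the starting layout depend on the seed.
    reducer = TSNE(n_components=2, perplexity=10, init="random", random_state=seed)
    return reducer.fit_transform(data)


# Synthetic stand-in (assumption) for the statistics gathered from the text file.
data = np.random.RandomState(0).normal(size=(60, 8))

first = embed_2d(data, seed=123)
second = embed_2d(data, seed=123)   # same seed as the first run
other = embed_2d(data, seed=7)      # different seed

print("same seed, same embedding:", np.allclose(first, second))
print("different seed, same embedding:", np.allclose(first, other))
```

Because the embedding is reproducible once the seed is fixed, any clustering run on top of it (for example hdbscan) receives identical input on every run.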
Answered By - fra