Issue
- I want to cluster a big dataset (more than 1M records).
- I want to use the DBSCAN or HDBSCAN algorithms for this clustering task. When I try to use either of them, I get a memory error.
- Is there a way to fit a big dataset in parts (e.g., loop over it and refit every 1000 records)?
- If not, is there a better way to cluster a big dataset without upgrading the machine's memory?
Solution
If the number of features in your dataset is not too large (below 20-25), you can consider using BIRCH. It is an incremental method that works well for large datasets: it processes the data in small batches, building a compact tree as it goes and putting each instance into a subcluster, so the whole dataset never has to be held in memory at once.
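As a rough illustration, scikit-learn's Birch estimator supports partial_fit, which also addresses the question about fitting in parts. A minimal sketch, assuming the data can be read in chunks (the random arrays, the 10 features, and the chunk size of 1000 are just placeholders for your own data loading):

```python
import numpy as np
from sklearn.cluster import Birch

# Feed the data to BIRCH in chunks via partial_fit,
# so only one chunk has to sit in memory at a time.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=None)

rng = np.random.default_rng(0)
chunk_size = 1000

for _ in range(10):                          # e.g. stream 10 chunks
    chunk = rng.random((chunk_size, 10))     # stand-in for reading your next batch
    birch.partial_fit(chunk)                 # grows the CF tree incrementally

# Labels can also be obtained chunk by chunk after fitting.
labels = birch.predict(rng.random((chunk_size, 10)))
print(labels[:10])
```

Setting n_clusters to an integer (or to another clustering estimator) makes BIRCH run a final global clustering step over the subclusters it has built, which is often what you want for a fixed number of clusters.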
Answered By - Benjamin