Wednesday, November 9, 2022

[FIXED] How to do time series clustering with python

November 09, 2022. Tags: cluster-analysis, k-means, python, scikit-learn

Issue

I want to group 10 stores into 6 clusters, but I have data for these stores across multiple years.

I tried KMeans from sklearn.cluster, but I am under the impression that it only works for a single period. I came across k-means with Dynamic Time Warping (https://tslearn.readthedocs.io/en/stable/user_guide/clustering.html) and tested it, but I am having a hard time understanding how I should restructure the data and which steps are required before running the code.
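Dynamic Time Warping, the metric used by the tslearn k-means variant linked above, aligns two series by letting one time step match several steps of the other, then sums the point-wise costs along the cheapest alignment. Below is a minimal textbook dynamic-programming sketch in NumPy. The helper name `dtw_distance` is hypothetical and this is not tslearn's own code; it uses squared differences along the path and takes the square root at the end, which is one common formulation.

```python
import numpy as np


def dtw_distance(s, t):
    """Textbook DTW between two series (hypothetical helper, not tslearn's code).

    Each series may be 1-D (length n) or 2-D (n time steps x d features).
    """
    a = np.asarray(s, dtype=float).reshape(len(s), -1)
    b = np.asarray(t, dtype=float).reshape(len(t), -1)
    n, m = len(a), len(b)

    # acc[i, j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((a[i - 1] - b[j - 1]) ** 2)
            # Extend the cheapest of: a step in a only, a step in b only, or both
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(np.sqrt(acc[n, m]))


# The repeated 1 is absorbed by warping, so these two series are at distance 0
print(dtw_distance([1, 2, 3], [1, 1, 2, 3]))
```

Unlike a plain Euclidean distance, this accepts series of different lengths and tolerates shifts in time, which is why it is offered as an alternative metric for time series clustering.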

So my questions are:

Using KMeans from sklearn.cluster, is there a way to apply clustering to time series data, and if so, how?
Using TimeSeriesKMeans from tslearn.clustering, what is the correct data structure to prepare before applying this algorithm?

This is the dataframe: I have stores 1 to 10 for the years 2021 and 2022. The goal is to group these 10 stores into 6 clusters based on all periods. In the real data, I have more than 150 stores over 20 years.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from tslearn.clustering import TimeSeriesKMeans

df_full = pd.DataFrame({
    'year': [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
             2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
    'store': ['store1', 'store2', 'store3', 'store4', 'store5', 'store6', 'store7', 'store8', 'store9', 'store10',
              'store1', 'store2', 'store3', 'store4', 'store5', 'store6', 'store7', 'store8', 'store9', 'store10'],
    'points': [18, 33, 19, 14, 14, 11, 20, 28, 30, 31,
               35, 33, 29, 25, 25, 27, 29, 30, 19, 23],
    'assists': [3, 3, 4, 5, 4, 7, 8, 7, 6, 9,
                12, 14, 5, 9, 4, 3, 4, 12, 15, 11],
    'rebounds': [15, 14, 14, 10, 8, 14, 13, 9, 5, 4,
                 11, 6, 5, 5, 3, 8, 12, 7, 6, 5]})

Below, I used sklearn's KMeans to group the 10 stores into 6 clusters for 2021 only, but I need the clustering to cover both the 2021 and 2022 data.

# For a single year
df = df_full[df_full['year']==2021].copy()

# Set year and store as the index before clustering
df.set_index(['year','store'], inplace=True)

scaled_df = StandardScaler().fit_transform(df)

kmeans_kwargs = {
    "init": "random",
    "n_init": 1,
    "random_state": 1}

# Create list to hold SSE values for each k
sse = []
for k in range(2, 8):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(scaled_df)
    sse.append(kmeans.inertia_)

# Instantiate the k-means class, using the optimal number of clusters
kmeans = KMeans(init="random", n_clusters=6, n_init=1, random_state=1)

# Fit the k-means algorithm to the data
kmeans.fit(scaled_df)

# View cluster assignments for each observation
kmeans.labels_

df['cluster'] = kmeans.labels_
print(df)
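In the code above, the `sse` list is computed but never inspected, and `matplotlib.pyplot` is imported but unused. A minimal sketch of the usual elbow plot for the same 2021 data follows. It rebuilds the question's `df_full` so the block runs on its own, and it uses `n_init=10` instead of 1 so that each SSE value is the best of several random starts. The `k` at which the curve stops dropping sharply is a reasonable candidate, although with only 10 stores the bend may be hard to see.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same data as the question's df_full, written more compactly
df_full = pd.DataFrame({
    "year": [2021] * 10 + [2022] * 10,
    "store": [f"store{i}" for i in range(1, 11)] * 2,
    "points": [18, 33, 19, 14, 14, 11, 20, 28, 30, 31,
               35, 33, 29, 25, 25, 27, 29, 30, 19, 23],
    "assists": [3, 3, 4, 5, 4, 7, 8, 7, 6, 9,
                12, 14, 5, 9, 4, 3, 4, 12, 15, 11],
    "rebounds": [15, 14, 14, 10, 8, 14, 13, 9, 5, 4,
                 11, 6, 5, 5, 3, 8, 12, 7, 6, 5],
})

# 2021 only, as in the question
df = df_full[df_full["year"] == 2021].set_index(["year", "store"])
scaled_df = StandardScaler().fit_transform(df)

ks = range(2, 8)
sse = []
for k in ks:
    kmeans = KMeans(n_clusters=k, init="random", n_init=10, random_state=1)
    kmeans.fit(scaled_df)
    sse.append(kmeans.inertia_)  # within-cluster sum of squared distances

# Elbow plot: SSE against the number of clusters
plt.plot(list(ks), sse, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("SSE (inertia)")
plt.show()
```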

Then I tried k-means with Dynamic Time Warping from tslearn. The result may not make sense, because the same store can be assigned to a different cluster in each year. How should I restructure the data before applying this algorithm, and what pre-processing steps are needed?

df_dtw = df_full.set_index(['year','store'])
model = TimeSeriesKMeans(n_clusters=6, metric="dtw",
                     max_iter=10, random_state=1)
model.fit(df_dtw)
df_dtw['cluster'] = model.labels_
print(df_dtw)
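tslearn works with 3-D arrays of shape `(n_ts, sz, d)`: one time series per store, `sz` time steps (here, years) and `d` features. A 2-D input such as the `(year, store)`-indexed frame is read as 20 separate univariate series (the three metrics becoming their time steps), which is why one store can land in several clusters. A minimal NumPy sketch of the reshaping follows. It assumes a balanced panel (every store has exactly one row per year), rebuilds the question's `df_full` so the block runs on its own, and standardises each feature over all stores and years, as `StandardScaler` does in the question.

```python
import numpy as np
import pandas as pd

# Same data as the question's df_full, written more compactly
df_full = pd.DataFrame({
    "year": [2021] * 10 + [2022] * 10,
    "store": [f"store{i}" for i in range(1, 11)] * 2,
    "points": [18, 33, 19, 14, 14, 11, 20, 28, 30, 31,
               35, 33, 29, 25, 25, 27, 29, 30, 19, 23],
    "assists": [3, 3, 4, 5, 4, 7, 8, 7, 6, 9,
                12, 14, 5, 9, 4, 3, 4, 12, 15, 11],
    "rebounds": [15, 14, 14, 10, 8, 14, 13, 9, 5, 4,
                 11, 6, 5, 5, 3, 8, 12, 7, 6, 5],
})
features = ["points", "assists", "rebounds"]

# Sort so that each store's years are contiguous and in chronological order
df_sorted = df_full.sort_values(["store", "year"])
stores = df_sorted["store"].unique()
n_years = df_sorted["year"].nunique()

# Assumes a balanced panel: every store has exactly one row per year
X_raw = df_sorted[features].to_numpy(dtype=float).reshape(len(stores), n_years, len(features))

# Standardise each feature over all stores and years (population std, like StandardScaler)
flat = X_raw.reshape(-1, len(features))
X = ((flat - flat.mean(axis=0)) / flat.std(axis=0)).reshape(X_raw.shape)

print(X.shape)  # one series per store: (n_stores, n_years, n_features)
```

`X` can then be passed to the model from the question, e.g. `model = TimeSeriesKMeans(n_clusters=6, metric="dtw", max_iter=10, random_state=1)` followed by `model.fit(X)`. `model.labels_` then holds one cluster label per store, in the order given by `stores`. Unbalanced panels would need padding or tslearn's variable-length handling, which this sketch does not cover.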

Solution

You can pivot your original dataframe so that store is the index and points, assists and rebounds become columns broken down by year, then run sklearn's KMeans on the result. This way each row is still one record (store), and the columns hold the per-year values of points, assists and rebounds.
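A minimal sketch of that pivot-then-cluster approach, assuming the question's `df_full` (rebuilt here so the block runs on its own). `DataFrame.pivot` with a list of `values` yields a two-level `(metric, year)` column index; flattening it into names such as `points_2021` is only an illustrative choice. `n_init=10` is set explicitly rather than the single random start used in the question.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same data as the question's df_full, written more compactly
df_full = pd.DataFrame({
    "year": [2021] * 10 + [2022] * 10,
    "store": [f"store{i}" for i in range(1, 11)] * 2,
    "points": [18, 33, 19, 14, 14, 11, 20, 28, 30, 31,
               35, 33, 29, 25, 25, 27, 29, 30, 19, 23],
    "assists": [3, 3, 4, 5, 4, 7, 8, 7, 6, 9,
                12, 14, 5, 9, 4, 3, 4, 12, 15, 11],
    "rebounds": [15, 14, 14, 10, 8, 14, 13, 9, 5, 4,
                 11, 6, 5, 5, 3, 8, 12, 7, 6, 5],
})

# One row per store; columns become (metric, year) pairs, e.g. ("points", 2021)
wide = df_full.pivot(index="store", columns="year", values=["points", "assists", "rebounds"])

# Flatten the two-level column index into names like "points_2021"
wide.columns = [f"{metric}_{year}" for metric, year in wide.columns]

# Scale every (metric, year) column, then cluster whole stores
scaled = StandardScaler().fit_transform(wide)
kmeans = KMeans(n_clusters=6, n_init=10, random_state=1)
wide["cluster"] = kmeans.fit_predict(scaled)
print(wide)
```

With 20 years of data for 150+ stores, the same pivot gives one row per store and 3 × 20 = 60 feature columns. Each year is treated as an independent feature, so unlike DTW this does not tolerate shifts in time. Stores missing a year would produce NaN cells that must be filled or dropped before clustering, since KMeans does not accept missing values.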

Answered By - user20013032

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
