Sunday, April 10, 2022

[FIXED] How to plot the principal vectors of each variable after performing PCA?


Issue

My question mainly comes from this post: https://stats.stackexchange.com/questions/53/pca-on-correlation-or-covariance

In the article, the author plotted the vector direction and length of each variable. Based on my understanding, after performing PCA all we get are the eigenvectors and eigenvalues. For a dataset of dimension M x N, each eigenvector should be a 1 x N vector. So my guess is that the length of the plotted vector is the eigenvalue, but how do I find the direction of the vector for each variable mathematically? And what is the physical meaning of the length of the vector?

Also, if possible, can I do similar work with scikit-learn's PCA in Python?

Thanks!

Solution

This plot is called a biplot, and it is very useful for understanding PCA results. Each arrow is drawn from the origin to the values that the feature/variable has on the plotted principal components, a.k.a. the PCA loadings, so these values determine both the direction and the length of the arrow.
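To connect this with the eigenvectors and eigenvalues mentioned in the question, here is a minimal sketch (not part of the original answer) that compares a plain NumPy eigendecomposition of the covariance matrix with what scikit-learn's PCA exposes. It assumes the standardized Iris data used in this answer. Note that some texts reserve the word "loadings" for eigenvectors scaled by the square root of their eigenvalues; this answer uses it for the raw values in pca.components_, and the last lines show the scaled variant only as an optional convention.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the Iris data, as in the answer's example.
X = StandardScaler().fit_transform(datasets.load_iris().data)

# Eigendecomposition of the covariance matrix (N x N for N variables).
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; PCA lists the largest first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

pca = PCA().fit(X)

# Each eigenvalue is a scalar: the variance along one principal component.
print(np.allclose(eigvals, pca.explained_variance_))

# Each eigenvector is a unit-length vector with one entry per variable.
# Signs are arbitrary, so compare absolute values.
print(np.allclose(np.abs(eigvecs.T), np.abs(pca.components_)))

# Optional convention (an assumption, not used in this answer):
# scale each eigenvector by the square root of its eigenvalue.
scaled_loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(scaled_loadings.shape)
```

So the eigenvalues are scalars (one per PC), while the per-variable arrows come from the entries of the eigenvectors.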

Example:

The loadings are accessible through print(pca.components_). Using the Iris dataset, the loadings are:

  [[ 0.52106591, -0.26934744,  0.5804131 ,  0.56485654],
   [ 0.37741762,  0.92329566,  0.02449161,  0.06694199],
   [-0.71956635,  0.24438178,  0.14212637,  0.63427274],
   [-0.26128628,  0.12350962,  0.80144925, -0.52359713]]

Here, each row is one PC and each column corresponds to one variable/feature. So feature/variable 1 has a value of 0.52106591 on PC1 and 0.37741762 on PC2. These are the values used to plot the vectors in the biplot: the arrow for Var1 ends at exactly these coordinates.
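As a small illustration of the paragraph above (a sketch, assuming the same standardized Iris data), the arrow tip of every variable in the PC1-PC2 plane can be read straight from the first two rows of pca.components_, and its length and direction follow from those two numbers. The names arrow_tips, lengths and angles_deg are made up for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(datasets.load_iris().data)
pca = PCA().fit(X)

# Rows of components_ are PCs and columns are variables, so the first two
# rows give every variable's (PC1, PC2) coordinates.
arrow_tips = pca.components_[:2, :].T   # shape: (n_variables, 2)

# Length and angle of each arrow in the PC1-PC2 plane.
lengths = np.linalg.norm(arrow_tips, axis=1)
angles_deg = np.degrees(np.arctan2(arrow_tips[:, 1], arrow_tips[:, 0]))

for i, (tip, length, angle) in enumerate(zip(arrow_tips, lengths, angles_deg), start=1):
    print(f"Var{i}: tip=({tip[0]:.3f}, {tip[1]:.3f}) length={length:.3f} angle={angle:.1f} deg")
```

Because each full eigenvector has unit length, an arrow in this two-dimensional view can never be longer than 1. A longer arrow means that more of that variable is captured by the two plotted PCs. The overall sign of a component is arbitrary, so a whole axis may appear flipped between library versions.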

Finally, to create this plot in Python with sklearn, you can use the following:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data
y = iris.target

# In general it is a good idea to scale the data before PCA
scaler = StandardScaler()
X = scaler.fit_transform(X)

# PCA is unsupervised, so the targets are not needed for fitting
pca = PCA()
x_new = pca.fit_transform(X)

def myplot(score, coeff, labels=None):
    # score: samples projected onto the first two PCs
    # coeff: one row per variable, columns hold its values on PC1, PC2, ...
    xs = score[:, 0]
    ys = score[:, 1]
    n = coeff.shape[0]

    plt.scatter(xs, ys, c=y)  # scores are plotted as-is, without rescaling
    for i in range(n):
        # Arrow from the origin to the variable's (PC1, PC2) values
        plt.arrow(0, 0, coeff[i, 0], coeff[i, 1], color='r', alpha=0.5)
        label = "Var" + str(i + 1) if labels is None else labels[i]
        plt.text(coeff[i, 0] * 1.15, coeff[i, 1] * 1.15, label, color='g', ha='center', va='center')

    plt.xlabel("PC{}".format(1))
    plt.ylabel("PC{}".format(2))
    plt.grid()

# Call the function with the scores on the first two PCs.
# components_ is transposed so that each row corresponds to one variable.
myplot(x_new[:, 0:2], pca.components_.T)
plt.show()
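A variant of the plot above, given only as a sketch: it labels the arrows with the Iris feature names instead of Var1..Var4 and adds the explained variance ratio to the axis labels. The off-screen "Agg" backend and the output file name biplot.png are assumptions made so that the example runs without a display; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2)
scores = pca.fit_transform(X)
coeff = pca.components_.T      # one row per variable: (PC1, PC2)

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], c=iris.target)
for (dx, dy), name in zip(coeff, iris.feature_names):
    ax.arrow(0, 0, dx, dy, color="r", alpha=0.5)
    ax.text(dx * 1.15, dy * 1.15, name, color="g", ha="center", va="center")

# Put the share of variance explained by each PC on the axes.
ratio = pca.explained_variance_ratio_
ax.set_xlabel(f"PC1 ({ratio[0]:.1%} of variance)")
ax.set_ylabel(f"PC2 ({ratio[1]:.1%} of variance)")
ax.grid()
fig.savefig("biplot.png")      # hypothetical output file name
```

With the original myplot function, the same labels can be passed through its labels argument, e.g. labels=iris.feature_names.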

See also this post: https://stackoverflow.com/a/50845697/5025009

and

https://towardsdatascience.com/pca-clearly-explained-how-when-why-to-use-it-and-feature-importance-a-guide-in-python-7c274582c37e?source=friends_link&sk=65bf5440e444c24aff192fedf9f8b64f

Answered By - seralouk

This answer, collected from Stack Overflow and tested by the PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
