Issue
Background:
I'm doing research using EigenFaces with Python. I need to extract any principal component of multiple images, and use those selected principal components to do feature reduction and face identification with a training dataset of images.
Problem:
In sklearn, the PCA class only lets me specify n_components, which keeps the first n principal components. But I need to be able to select any principal component individually, because I need to try random combinations of multiple principal components for the feature reduction and Eigenfaces computation. That's part of the research requirement.
I noticed some bespoke implementations here and here, but I'd prefer a more standard library to avoid errors in the results. I also noticed another PCA library, but it does not seem to offer the low-level functions I need to get these details.
Is there any reliable way to get the individual principal components using Python?
Solution
After you fit a PCA model, you can access the fitted components in its components_ attribute, which I think is what you're after. E.g.:
import numpy as np
from sklearn.decomposition import PCA
# Test data: 100 samples x 10 features
data = np.random.uniform(size=(100, 10))
# Fit the PCA. The default n_components=None keeps all
# min(n_samples, n_features) components
pca = PCA().fit(data)
print(pca.components_.shape)  # (n_components_, n_features), here (10, 10)
# Access individual components (one component per row)
pca0 = pca.components_[0, :]
pca1 = pca.components_[1, :]
# etc...
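Since the question is about using an arbitrary (e.g. random) combination of components, here is a minimal sketch of projecting data onto any subset of rows of components_. It assumes the default whiten=False; the random index choice via rng.choice is only for illustration. The final check compares the manual projection against the matching columns of pca.transform().

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.uniform(size=(100, 10))
pca = PCA().fit(data)  # default n_components=None keeps all 10 components

# Illustrative random combination of 3 component indices
selected = np.sort(rng.choice(pca.n_components_, size=3, replace=False))
W = pca.components_[selected]  # the chosen components, one per row

# Centre with the fitted mean, then project onto the chosen components only
reduced = (data - pca.mean_) @ W.T  # shape (100, 3)

# Map back to the original feature space using only those components
reconstructed = reduced @ W + pca.mean_  # shape (100, 10)

# With whiten=False the projection equals the matching columns of transform()
print(np.allclose(reduced, pca.transform(data)[:, selected]))  # True
```

Because the projection only needs pca.mean_ and the selected rows of components_, you can fit once with all components and then try as many index combinations as you like without refitting.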
That's how to access the components after fitting. Before fitting, you can specify how many components the model should keep, as described below. After fitting, you can only access as many components as you told the model to keep.
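As a quick illustration of that last point, here is a sketch on the same kind of random test data: fitting with an integer n_components keeps only that many leading components, and they line up with the first rows of a full fit. The comparison ignores signs, because the sign of each principal axis is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.uniform(size=(100, 10))

full = PCA().fit(data)                  # keeps all 10 components
first3 = PCA(n_components=3).fit(data)  # keeps only the leading 3

print(full.components_.shape)    # (10, 10)
print(first3.components_.shape)  # (3, 10)

# The kept components match the first 3 rows of the full fit
# (compared up to sign, since each axis's sign is arbitrary)
print(np.allclose(np.abs(first3.components_), np.abs(full.components_[:3])))
```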
PCA() keeps all of the components: with the default n_components=None, it keeps min(n_samples, n_features) of them.
If you want to keep the leading components that together explain 85% of the variance, use PCA(n_components=0.85). If you know exactly how many individual components you want, you can specify an integer instead: PCA(n_components=10).
So if n_components is a fraction (a float strictly between 0 and 1), PCA keeps the smallest number of leading components whose cumulative explained variance exceeds that proportion. If you specify an integer, it keeps that many components, always starting from the first. The type matters: n_components=1 keeps only the first component, whereas a float such as 0.5 is read as a proportion of the variance (50%). To keep everything, rely on the default n_components=None rather than passing 1.0, since the scikit-learn documentation only describes fractional values below 1.
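To tie this back to the Eigenfaces identification in the question, here is a minimal sketch under loudly stated assumptions: the "face images" are synthetic (random prototypes plus noise, not real data), the component indices are a random combination, and identification uses a plain 1-nearest-neighbour rule with Euclidean distance in the reduced space. That classifier is a common choice for Eigenfaces but is my assumption, not something prescribed above; names such as prototypes and project are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic stand-in for flattened 8x8 face images (assumption, not real data):
# 5 identities, each with a random prototype, plus noisy training/test copies
n_people, n_pixels = 5, 64
prototypes = rng.normal(size=(n_people, n_pixels))
train_labels = np.repeat(np.arange(n_people), 8)  # 8 training images per person
test_labels = np.arange(n_people)                 # 1 test image per person
X_train = prototypes[train_labels] + 0.1 * rng.normal(size=(train_labels.size, n_pixels))
X_test = prototypes[test_labels] + 0.1 * rng.normal(size=(test_labels.size, n_pixels))

# Fit on the training images only, keeping every component
pca = PCA().fit(X_train)

# Hypothetical research step: a random combination of 4 component indices
selected = rng.choice(pca.n_components_, size=4, replace=False)
eigenfaces = pca.components_[selected]  # shape (4, n_pixels)

def project(X):
    """Project images onto the selected eigenfaces only (whiten=False)."""
    return (X - pca.mean_) @ eigenfaces.T

train_feats = project(X_train)  # shape (40, 4)
test_feats = project(X_test)    # shape (5, 4)

# Identify each test image by its nearest training image in the reduced space
dists = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=2)
predicted = train_labels[dists.argmin(axis=1)]
print("selected components:", selected)
print("identification accuracy:", np.mean(predicted == test_labels))
```

Accuracy depends on which indices are drawn: in this synthetic setup the identity information likely sits in the first few components, so a subset that misses them may identify poorly. Comparing many such subsets is the kind of experiment the question describes.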
Answered By - some3128