Issue
I am trying to run a PCA on ocean temperature data using sklearn. First I use StandardScaler to standardize the data, then I run the PCA and create the reconstructions. The code works fine up to that point. However, I cannot figure out how to apply the inverse of the StandardScaler to the PCA reconstructions so that they are back in the original space and I can compare them to the original unstandardized data. I've copied a short excerpt of the code I'm using below, followed by the error I receive. None of the potential fixes I've found online have worked.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
data = array[:,[1,4]] # data has dimensions [88 (depths) x 26 (instances)]
# pre processing the data
scal = StandardScaler()
data_t = scal.fit_transform(data)
# pca analysis
pca = PCA(n_components=2)
principalComponents_2 = pca.fit_transform(np.transpose(data_t)) # fit the PCA and project the data onto the first two components
PCAFit_2 = scal.inverse_transform(pca.inverse_transform(principalComponents_2)) # reconstruct the data and then apply the StandardScaler inverse transformation
Error:
ValueError: operands could not be broadcast together with shapes (26,88) (26,) (26,88)
Solution
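If I understand correctly, you want to undo the PCA projection first and then undo the scaling. The error comes from a shape mismatch: the scaler was fitted on an (88, 26) array, so it stores 26 per-column means and scales, but the PCA reconstruction it receives has shape (26, 88). The array passed to scaler.inverse_transform must have the same column layout the scaler was fitted on. An example on the iris data, where the layout is kept consistent throughout: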
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
X, _ = load_iris(return_X_y=True)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(2)
X_pca = pca.fit_transform(X_scaled)
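# Map back from PCA space to the scaled feature space. X_scaled already has
# zero column means, so pca.mean_ is ~0 and this matches pca.inverse_transform(X_pca).
X_orig = np.dot(X_pca, pca.components_)
# Undo the standardization to return to the original units.
X_orig_backscaled = scaler.inverse_transform(X_orig)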
print(" Original:", X[0])
print(" Scaled:", X_scaled[0])
print(" PCA space:", X_pca[0])
print(" Original from PCA:", X_orig[0])
print("Original from PCA backscaled:", X_orig_backscaled[0])
Output:
Original: [5.1 3.5 1.4 0.2]
Scaled: [-0.90068117 1.01900435 -1.34022653 -1.3154443 ]
PCA space: [-2.26470281 0.4800266 ]
Original from PCA: [-0.99888895 1.05319838 -1.30270654 -1.24709825]
Original from PCA backscaled: [5.01894899 3.51485426 1.46601281 0.25192199]
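For the layout in the question, one possible fix is to transpose the PCA reconstruction back to (88, 26) before handing it to the scaler. The sketch below is only an illustration under assumptions: the random `data` array is a hypothetical stand-in for the real temperature profiles, and the names `scores`, `recon_scaled`, and `manual` are made up for this example. It also checks that `pca.inverse_transform` agrees with the manual `np.dot` reconstruction once `pca.mean_` is added back.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 88 depths x 26 instances.
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=3.0, size=(88, 26))

# Same preprocessing as in the question: the scaler is fitted on (88, 26),
# so it stores 26 means and 26 scales (one per column).
scal = StandardScaler()
data_t = scal.fit_transform(data)

# PCA on the transposed data: 26 samples, 88 features.
pca = PCA(n_components=2)
scores = pca.fit_transform(data_t.T)            # shape (26, 2)
recon_scaled = pca.inverse_transform(scores)    # shape (26, 88)

# Transpose back to (88, 26) so the columns line up with what the scaler saw.
PCAFit_2 = scal.inverse_transform(recon_scaled.T)

# With the default whiten=False, inverse_transform is the dot product with
# the components plus the per-feature mean that PCA removed during fitting.
manual = np.dot(scores, pca.components_) + pca.mean_

print(PCAFit_2.shape)                     # same shape as data
print(np.allclose(manual, recon_scaled))  # the two reconstructions agree
```

An alternative, if standardizing each depth across instances is what you actually want, is to transpose the data once before fitting the scaler and keep the (26, 88) orientation for both the scaler and the PCA. That changes what gets standardized, so which option is right depends on the analysis.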
Answered By - Sergey Bushmanov