Issue
I'm following the answer to this question and this scikit-learn tutorial to remove artifacts from an EEG signal. They seem simple enough, and I'm surely missing something obvious here.
The components extracted don't have the same length as my signal. I have 88 channels of several hours of recordings, so the shape of my signal matrix is (88, 8088516). Yet the output of ICA is (88, 88). In addition to being so short, each component seems to capture very large, noisy-looking deflections (so out of 88 components only a couple actually look like signal, the rest look like noise). I also would have expected only a few components to look noisy. I suspect I'm doing something wrong here?
The matrix of (channels x samples) has shape (88, 8088516).
Sample code (just using a random matrix for minimum working purposes):
import numpy as np
from sklearn.decomposition import FastICA
import matplotlib.pyplot as plt
samples_matrix = np.random.random((88, 8088516))
# Compute ICA
ica = FastICA(n_components=samples_matrix.shape[0]) # Extracting as many components as there are channels, i.e. 88
components = ica.fit_transform(samples_matrix) # Reconstruct signals
A_ = ica.mixing_ # Get estimated mixing matrix
The shape of components is (88, 88). One plotted looks like this:
plt.plot(components[1])
I would have expected that the components be time series of the same length as my original, as shown in the answer to this question. I'm really not sure how to move forward with component removal and signal reconstruction at this point.
Solution
You need to run the fit_transform
on the transpose of your samples_matrix
instead of the samples_matrix
itself (so provide a 8088516 x 88 matrix instead of an 88x8088516 to the method).
ica.fit_transform(samples_matrix.transpose())
or more simply
ica.fit_transform(samples_matrix.T)
which will give you an 8088516 x 88 set of signals (88 components, each of a signal as long as the original) for plotting. As I mentioned below in my comment, due to the large matrix inversions, etc., my setup recommended no more than 64 components.
To back this suggestion up, I'm looking at your tutorial, they set up their toy problem as follows:
n_samples = 2000
time = np.linspace(0, 8, n_samples)
s1 = np.sin(2 * time) # Signal 1 : sinusoidal signal
s2 = np.sign(np.sin(3 * time)) # Signal 2 : square signal
s3 = signal.sawtooth(2 * np.pi * time) # Signal 3: saw tooth signal
S = np.c_[s1, s2, s3]
S += 0.2 * np.random.normal(size=S.shape) # Add noise
S /= S.std(axis=0) # Standardize data
# Mix data
A = np.array([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]]) # Mixing matrix
X = np.dot(S, A.T) # Generate observations
which gives an X.shape
of (2000,3)
to separate the 3 components, indicating that this is the preferred format of the matrix for the FastICA
method.
Answered By - jonsca
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.