Issue
Nonnegative matrix factorization is lauded for generating sparse basis sets. However, when I run sklearn.decomposition.NMF the factors are not sparse. Older versions of NMF had a 'degree of sparseness' parameter beta. Newer versions do not, but I want my basis matrix W to actually be sparse. What can I do? (Code to reproduce the problem is below.)
I have toyed around with increasing various regularization parameters (e.g., alpha), but am not getting anything very sparse (like in the paper by Lee and Seung (1999)) when I apply it to the Olivetti faces dataset. They still basically end up looking like eigenfaces.
My NMF output (not very sparse):
Lee and Seung NMF paper output basis columns (looks sparse to me):
Code to reproduce my problem:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF
faces, _ = fetch_olivetti_faces(return_X_y=True)
# run nmf on the faces data set
num_nmf_components = 50
estimator = NMF(num_nmf_components,
                init='nndsvd',
                tol=5e-3,
                max_iter=1000,
                alpha_W=0.01,
                l1_ratio=0)
H = estimator.fit_transform(faces)
W = estimator.components_
# plot the basis faces
n_row, n_col = 6, 4 # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape
plt.figure(figsize=(10,12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id + 1)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.axis('off')
plt.tight_layout()
Is there some combination of parameters with sklearn.decomposition.NMF() that lets you dial in sparseness? I have played with different combinations of alpha_W and l1_ratio, and even tweaked the number of components. I still end up with eigenface-looking things.
Solution
There are a couple of things going on here that we need to disentangle. First, what happened to sparseness? Second, how do you generate sparse faces using the sklearn function?
Where did the sparseness go?
The sklearn.decomposition.NMF function went through a major change from version 0.16 to 0.19. There are multiple ways to implement nonnegative matrix factorization.
Before 0.16, NMF used projected gradient descent as described in Hoyer 2004, and included a sparseness parameter (which, as the OP noted, let you adjust the sparseness of the resulting W basis).
Because of various limitations outlined in this extremely thorough issue at sklearn's github repo, it was decided to move on to two additional methods:
- Release 0.16: coordinate descent (PR here)
- Release 0.19: multiplicative update (PR here)
This was a pretty major undertaking, and the upshot is we now have a great deal more freedom in terms of error functions, initialization, and regularization. You can read about that at the issue. The objective function is now:
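In plain text (paraphrasing the loss given in the sklearn docs; the exact scaling conventions may differ a little between versions):

L(W, H) = 0.5 * ||X - W H||_Fro^2
          + alpha_W * l1_ratio * n_features * ||vec(W)||_1
          + alpha_H * l1_ratio * n_samples  * ||vec(H)||_1
          + 0.5 * alpha_W * (1 - l1_ratio) * n_features * ||W||_Fro^2
          + 0.5 * alpha_H * (1 - l1_ratio) * n_samples  * ||H||_Fro^2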
You can read more details/explanation at the docs, but to note a few things relevant to the question:
- The solver param, which takes in mu for multiplicative update or cd for coordinate descent. The older projected gradient descent method (with the sparseness parameter) is deprecated.
- As you can see in the objective function, there are weights for regularizing W and for H (alpha_W and alpha_H, respectively). In theory, if you want to rein in W, you should increase alpha_W.
- You can regularize using the L1 or L2 norm, and the ratio between the two is set by l1_ratio. The larger you make l1_ratio, the more you weight the L1 norm over the L2 norm. Note: the L1 norm tends to generate sparser parameter sets, while the L2 norm tends to generate small parameter values, so in theory if you want sparseness, set your l1_ratio high.
How to generate sparse faces?
The examination of the objective function suggests what to do: crank up alpha_W and l1_ratio. But also note that the Lee and Seung paper used multiplicative update (mu), so if you wanted to reproduce their results, I would recommend setting solver to mu, setting alpha_W high and l1_ratio high, and seeing what happens.
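For instance, something along these lines (the values are in the ballpark of what ends up working in the full code at the end; treat them as starting points, not magic numbers):

estimator = NMF(n_components=50,
                solver='mu',       # multiplicative update, as in Lee and Seung
                init='nndsvdar',   # avoids exact zeros, which mu updates cannot escape
                alpha_W=0.9,       # strong regularization on the basis
                l1_ratio=0.9,      # weight the sparsity-inducing L1 norm heavily
                max_iter=500)
coeffs = estimator.fit_transform(faces)  # per-sample coefficients
basis = estimator.components_            # basis images (what we want to be sparse)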
In the OP's question, they implicitly used the cd solver (which is the default), and set alpha_W=0.01 and l1_ratio=0, which I wouldn't necessarily expect to create a sparse basis set.
But things are actually not that simple. I tried some initial runs of coordinate descent with high l1_ratio and alpha_W and found very low sparseness. So to quantify some of this, I did a grid search, and used a sparseness measure.
Quantifying sparseness is itself a cottage industry (e.g., see this post, and the paper cited there). I used Hoyer's measure of sparsity, adapted from the one used in the nimfa package:
def sparseness_hoyer(x):
    """
    The sparseness of array x is a real number in [0, 1], where a sparser array
    has a value closer to 1. Sparseness is 1 iff the vector contains a single
    nonzero component, and is equal to 0 iff all components of the vector are
    the same.

    Modified from Hoyer 2004: [sqrt(n)-L1/L2]/[sqrt(n)-1]
    Adapted from the nimfa package: https://nimfa.biolab.si/
    """
    from math import sqrt  # faster than numpy sqrt
    eps = np.finfo(x.dtype).eps if 'int' not in str(x.dtype) else 1e-9
    n = x.size
    # measure is meant for nmf: things get weird for negative values
    if np.min(x) < 0:
        x = x - np.min(x)  # shift a copy so the minimum is zero (don't mutate the caller's array)
    # patch for array of zeros
    if np.allclose(x, np.zeros(x.shape), atol=1e-6):
        return 0.0
    L1 = abs(x).sum()
    L2 = sqrt(np.multiply(x, x).sum())
    sparseness_num = sqrt(n) - (L1 + eps) / (L2 + eps)
    sparseness_den = sqrt(n) - 1
    return sparseness_num / sparseness_den
What this measure actually quantifies is sort of complicated, but roughly: a sparse image is one with only a few pixels active, while a non-sparse image has lots of pixels active. If we run PCA on the faces example from the OP, we can see the sparseness values are low, around 0.04, for the eigenfaces:
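For reference, that PCA comparison can be generated with something like the following (this assumes faces and sparseness_hoyer from the code above; the number of components is a guess):

from sklearn.decomposition import PCA

pca = PCA(n_components=50)
pca.fit(faces)
# Hoyer sparseness of each eigenface (each row of components_)
eigenface_sparseness = [sparseness_hoyer(comp) for comp in pca.components_]
print(np.mean(eigenface_sparseness))  # low values (~0.04 in the runs described above)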
Sparsifying using coordinate descent?
If we run NMF using the params used in the OP (using coordinate descent, with low alpha_W and l1_ratio, except with 200 components), the sparseness values are again low:
If you look at the histogram of sparseness values this is verified:
Different, but not super impressive, compared with PCA.
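For reference, those histograms are just the Hoyer sparseness of each basis image, computed along these lines (assuming W holds estimator.components_ from the run above; the bin count is arbitrary):

cd_sparseness = [sparseness_hoyer(face) for face in W]
plt.figure()
plt.hist(cd_sparseness, bins=20)
plt.xlabel('Hoyer sparseness')
plt.ylabel('Number of basis images')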
I next did a grid search through alpha_W and l1_ratio space, varying them between 0 and 1 (at 0.1 step increments). I found that sparsity was not maximized when they were 1. Surprisingly, and contrary to theoretical expectations, sparsity was only high when l1_ratio was 0, and it dropped off precipitously above 0. Within this slice of parameters, sparsity was maximized when alpha_W was 0.9:
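The grid search itself was nothing fancy, just a double loop along these lines (the parameter grid matches what I described above; the iteration settings are illustrative, not the exact ones I used):

mean_sparseness = {}
for alpha_w in np.arange(0, 1.01, 0.1):
    for l1_rat in np.arange(0, 1.01, 0.1):
        est = NMF(n_components=200,
                  init='nndsvd',
                  solver='cd',
                  max_iter=200,
                  alpha_W=alpha_w,
                  l1_ratio=l1_rat)
        est.fit(faces)
        # mean Hoyer sparseness over the basis images for this parameter pair
        mean_sparseness[(alpha_w, l1_rat)] = np.mean(
            [sparseness_hoyer(b) for b in est.components_])
best_alpha_w, best_l1_ratio = max(mean_sparseness, key=mean_sparseness.get)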
Visually, the resulting basis faces are a huge improvement. There is still a lot of variation in the distribution of sparseness values, but they are much higher:
However, maybe in order to replicate the Lee and Seung results, and better control sparseness, we should be using multiplicative update (which is what they used). Let's try that next.
Sparsifying using multiplicative update
For the next attempt, I used multiplicative update, and this behaved much more as expected, with sparse, parts-based representations emerging:
You can see the drastic difference, and this is reflected in the histogram of sparseness values:
Note the code to generate this is below.
One final interesting thing to note: the sparseness values with this method seem to increase with the component number. I plotted sparseness as a function of component number, and this is (roughly) borne out, and was borne out consistently over all my runs of the algorithm:
I have not seen this discussed elsewhere, so thought I'd mention it.
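That plot is just the per-component sparseness in order, something like the following (assuming W from the mu-solver code below):

component_sparseness = [sparseness_hoyer(face) for face in W]
plt.figure()
plt.plot(component_sparseness, 'o')
plt.xlabel('Component number')
plt.ylabel('Hoyer sparseness')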
Code to generate the sparse representation of faces using the mu NMF algorithm:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF

faces, _ = fetch_olivetti_faces(return_X_y=True)

num_nmf_components = 200
alph_W = 0.9    # cd: 0.9, mu: 0.9
L1_ratio = 0.9  # cd: 0,   mu: 0.9

try:
    del estimator
except NameError:
    print("first run")

estimator = NMF(num_nmf_components,
                init='nndsvdar',  # nndsvd
                solver='mu',
                max_iter=50,
                alpha_W=alph_W,
                alpha_H=0,  # no regularization on H
                l1_ratio=L1_ratio,
                shuffle=True)
H = estimator.fit_transform(faces)
W = estimator.components_

# plot the basis faces
n_row, n_col = 5, 7  # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape

plt.figure(figsize=(10, 12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id + 1)
    face_sparseness = sparseness_hoyer(face)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.title(f"{face_sparseness: 0.2f}")
    plt.axis('off')
plt.suptitle('NMF', fontsize=16, y=1)
plt.tight_layout()
Answered By - eric