Issue
Nonnegative matrix factorization is lauded for generating sparse basis sets. However, when I run sklearn.decomposition.NMF the factors are not sparse. Older versions of NMF had a 'degree of sparseness' parameter beta. Newer versions do not, but I want my basis matrix W to actually be sparse. What can I do? (Code to reproduce the problem is below.)
I have toyed around with increasing various regularization parameters (e.g., alpha), but am not getting anything very sparse (like in the paper by Lee and Seung (1999)) when I apply it to the Olivetti faces dataset. They still basically end up looking like eigenfaces.
My NMF output (not very sparse):
Lee and Seung NMF paper output basis columns (looks sparse to me):
Code to reproduce my problem:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF
faces, _ = fetch_olivetti_faces(return_X_y=True)
# run nmf on the faces data set
num_nmf_components = 50
estimator = NMF(num_nmf_components,
                init='nndsvd',
                tol=5e-3,
                max_iter=1000,
                alpha_W=0.01,
                l1_ratio=0)
H = estimator.fit_transform(faces)
W = estimator.components_
# plot the basis faces
n_row, n_col = 6, 4 # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape
plt.figure(figsize=(10,12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id + 1)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.axis('off')
plt.tight_layout()
Is there some combination of parameters with sklearn.decomposition.NMF() that lets you dial in sparseness? I have played with different combinations of alpha_W and l1_ratio, and even tweaked the number of components. I still end up with eigenface-looking things.
Solution
There are a couple of things going on here that we need to disentangle. First, what happened to sparseness? Second, how do you generate sparse faces using the sklearn function?
Where did the sparseness go?
The sklearn.decomposition.NMF function went through a major change from version 0.16 to 0.19. There are multiple ways to implement nonnegative matrix factorization.
Before 0.16, NMF used projected gradient descent as described in Hoyer 2004, and included a sparseness parameter (which, as the OP noted, let you adjust the sparseness of the resulting W basis).
Because of various limitations outlined in this extremely thorough issue at sklearn's github repo, it was decided to move on to two additional methods:
- Release 0.16: coordinate descent (PR here)
- Release 0.19: multiplicative update (PR here)
This was a pretty major undertaking, and the upshot is we now have a great deal more freedom in terms of error functions, initialization, and regularization. You can read about that at the issue. The objective function is now:
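In plain text (paraphrasing the loss given in the sklearn docs; the exact scaling conventions may differ a little between versions):

L(W, H) = 0.5 * ||X - W H||_Fro^2
          + alpha_W * l1_ratio * n_features * ||vec(W)||_1
          + alpha_H * l1_ratio * n_samples  * ||vec(H)||_1
          + 0.5 * alpha_W * (1 - l1_ratio) * n_features * ||W||_Fro^2
          + 0.5 * alpha_H * (1 - l1_ratio) * n_samples  * ||H||_Fro^2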
You can read more details/explanation at the docs, but to note a few things relevant to the question:
- The solver param, which takes in mu for multiplicative update or cd for coordinate descent. The older projected gradient descent method (with the sparseness parameter) is deprecated.
- As you can see in the objective function, there are weights for regularizing W and for H (alpha_W and alpha_H, respectively). In theory, if you want to rein in W, you should increase alpha_W.
- You can regularize using the L1 or L2 norm, and the ratio between the two is set by l1_ratio. The larger you make l1_ratio, the more you weight the L1 norm over the L2 norm. Note: the L1 norm tends to generate sparser parameter sets, while the L2 norm tends to generate small parameter values, so in theory if you want sparseness, set your l1_ratio high.
How to generate sparse faces?
The examination of the objective function suggests what to do: crank up alpha_W and l1_ratio. But also note that the Lee and Seung paper used multiplicative update (mu), so if you wanted to reproduce their results, I would recommend setting solver to mu, setting alpha_W high and l1_ratio high, and seeing what happens.
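For instance, something along these lines (the values are in the ballpark of what ends up working in the full code at the end; treat them as starting points, not magic numbers):

estimator = NMF(n_components=50,
                solver='mu',       # multiplicative update, as in Lee and Seung
                init='nndsvdar',   # avoids exact zeros, which mu updates cannot escape
                alpha_W=0.9,       # strong regularization on the basis
                l1_ratio=0.9,      # weight the sparsity-inducing L1 norm heavily
                max_iter=500)
coeffs = estimator.fit_transform(faces)  # per-sample coefficients
basis = estimator.components_            # basis images (what we want to be sparse)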
In the OP's question, they implicitly used the cd solver (which is the default), and set alpha_W=0.01 and l1_ratio=0, which I wouldn't necessarily expect to create a sparse basis set.
But things are actually not that simple. I tried some initial runs of coordinate descent with high l1_ratio and alpha_W and found very low sparseness. So to quantify some of this, I did a grid search, and used a sparseness measure.
Quantifying sparseness is itself a cottage industry (e.g., see this post, and the paper cited there). I used Hoyer's measure of sparsity, adapted from the one used in the nimfa package:
def sparseness_hoyer(x):
    """
    The sparseness of array x is a real number in [0, 1], where a sparser array
    has a value closer to 1. Sparseness is 1 iff the vector contains a single
    nonzero component, and is equal to 0 iff all components of the vector are
    the same.

    Modified from Hoyer 2004: [sqrt(n)-L1/L2]/[sqrt(n)-1]
    Adapted from the nimfa package: https://nimfa.biolab.si/
    """
    from math import sqrt  # faster than numpy sqrt
    eps = np.finfo(x.dtype).eps if 'int' not in str(x.dtype) else 1e-9
    n = x.size
    # measure is meant for nmf: things get weird for negative values
    if np.min(x) < 0:
        x = x - np.min(x)  # shift a copy so the minimum is zero (don't mutate the caller's array)
    # patch for array of zeros
    if np.allclose(x, np.zeros(x.shape), atol=1e-6):
        return 0.0
    L1 = abs(x).sum()
    L2 = sqrt(np.multiply(x, x).sum())
    sparseness_num = sqrt(n) - (L1 + eps) / (L2 + eps)
    sparseness_den = sqrt(n) - 1
    return sparseness_num / sparseness_den
What this measure actually quantifies is sort of complicated, but roughly: a sparse image is one with only a few pixels active, while a non-sparse image has lots of pixels active. If we run PCA on the faces example from the OP, we can see the sparseness values are low, around 0.04, for the eigenfaces:
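For reference, that PCA comparison can be generated with something like the following (this assumes faces and sparseness_hoyer from the code above; the number of components is a guess):

from sklearn.decomposition import PCA

pca = PCA(n_components=50)
pca.fit(faces)
# Hoyer sparseness of each eigenface (each row of components_)
eigenface_sparseness = [sparseness_hoyer(comp) for comp in pca.components_]
print(np.mean(eigenface_sparseness))  # low values (~0.04 in the runs described above)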
Sparsifying using coordinate descent?
If we run NMF using the params used in the OP (using coordinate descent, with low alpha_W and l1_ratio, except with 200 components), the sparseness values are again low:
If you look at the histogram of sparseness values this is verified:
Different, but not super impressive, compared with PCA.
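For reference, those histograms are just the Hoyer sparseness of each basis image, computed along these lines (assuming W holds estimator.components_ from the run above; the bin count is arbitrary):

cd_sparseness = [sparseness_hoyer(face) for face in W]
plt.figure()
plt.hist(cd_sparseness, bins=20)
plt.xlabel('Hoyer sparseness')
plt.ylabel('Number of basis images')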
I next did a grid search through alpha_W and l1_ratio space, varying them between 0 and 1 (at 0.1 step increments). I found that sparsity was not maximized when they were 1. Surprisingly, and contrary to theoretical expectations, sparsity was only high when l1_ratio was 0, and it dropped off precipitously above 0. Within this slice of parameters, sparsity was maximized when alpha_W was 0.9:
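The grid search itself was nothing fancy, just a double loop along these lines (the parameter grid matches what I described above; the iteration settings are illustrative, not the exact ones I used):

mean_sparseness = {}
for alpha_w in np.arange(0, 1.01, 0.1):
    for l1_rat in np.arange(0, 1.01, 0.1):
        est = NMF(n_components=200,
                  init='nndsvd',
                  solver='cd',
                  max_iter=200,
                  alpha_W=alpha_w,
                  l1_ratio=l1_rat)
        est.fit(faces)
        # mean Hoyer sparseness over the basis images for this parameter pair
        mean_sparseness[(alpha_w, l1_rat)] = np.mean(
            [sparseness_hoyer(b) for b in est.components_])
best_alpha_w, best_l1_ratio = max(mean_sparseness, key=mean_sparseness.get)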
Visually, the resulting basis faces are a huge improvement. There is still a lot of variation in the distribution of sparseness values, but they are much higher:
However, maybe in order to replicate the Lee and Seung results, and better control sparseness, we should be using multiplicative update (which is what they used). Let's try that next.
Sparsifying using multiplicative update
For the next attempt, I used multiplicative update, and this behaved much more as expected, with sparse, parts-based representations emerging:
You can see the drastic difference, and this is reflected in the histogram of sparseness values:
Note the code to generate this is below.
One final interesting thing to note: the sparseness values with this method seem to increase with the component number. I plotted sparseness as a function of component number, and this is (roughly) borne out, and was borne out consistently over all my runs of the algorithm:
I have not seen this discussed elsewhere, so thought I'd mention it.
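That plot is just the per-component sparseness in order, something like the following (assuming W from the mu-solver code below):

component_sparseness = [sparseness_hoyer(face) for face in W]
plt.figure()
plt.plot(component_sparseness, 'o')
plt.xlabel('Component number')
plt.ylabel('Hoyer sparseness')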
Code to generate the sparse representation of faces using the mu NMF algorithm:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF

faces, _ = fetch_olivetti_faces(return_X_y=True)

num_nmf_components = 200
alph_W = 0.9    # cd: 0.9, mu: 0.9
L1_ratio = 0.9  # cd: 0,   mu: 0.9

try:
    del estimator
except NameError:
    print("first run")

estimator = NMF(num_nmf_components,
                init='nndsvdar',  # nndsvd
                solver='mu',
                max_iter=50,
                alpha_W=alph_W,
                alpha_H=0,  # no regularization on H
                l1_ratio=L1_ratio,
                shuffle=True)
H = estimator.fit_transform(faces)
W = estimator.components_

# plot the basis faces
n_row, n_col = 5, 7  # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape

plt.figure(figsize=(10, 12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id + 1)
    face_sparseness = sparseness_hoyer(face)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.title(f"{face_sparseness: 0.2f}")
    plt.axis('off')
plt.suptitle('NMF', fontsize=16, y=1)
plt.tight_layout()
Answered By - eric