Issue
I want to apply a partial Tucker decomposition algorithm to reduce the MNIST image tensor dataset of shape (60000, 28, 28), in order to conserve its features when applying another machine learning algorithm afterwards, like SVM. I have this code that reduces the second and third dimensions of the tensor:
i = 16
j = 10
core, factors = partial_tucker(train_data_mnist, modes=[1, 2], tol=10e-5, rank=[i, j])
train_data_partial_tucker = tl.tenalg.multi_mode_dot(train_data_mnist, factors,
                                                     modes=[1, 2], transpose=True)
test_data_partial_tucker = tl.tenalg.multi_mode_dot(test_data_mnist, factors,
                                                    modes=[1, 2], transpose=True)
How do I find the best rank [i, j] when using partial_tucker in tensorly, i.e. the rank that gives the best dimensionality reduction for the images while conserving as much information as possible?
Solution
Just like principal component analysis, the partial Tucker decomposition gives better results as we increase the rank, in the sense that the optimal mean squared residual of the reconstruction is smaller. In general, features (the core tensor) that enable accurate reconstruction of the original data can be used to make similar predictions: given any model, we can prepend a transformation that reconstructs the original data from the core features.
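To make that concrete, here is a minimal sketch (the (100, 28, 28) random batch is a hypothetical stand-in for train_data_mnist): projecting onto the factors yields the core features, and multiplying the core back by the factors reconstructs an approximation of the original images.

import numpy as np
import tensorly as tl
from tensorly.decomposition import partial_tucker

# Hypothetical stand-in for a batch of MNIST images
batch = np.random.rand(100, 28, 28)

# Decompose along the two image modes only (mode 0 indexes the samples)
core, factors = partial_tucker(batch, modes=[1, 2], tol=10e-5, rank=[16, 10])
print(core.shape)   # (100, 16, 10): the compressed features

# The transformation we can prepend to any model trained on the original data
recon = tl.tenalg.multi_mode_dot(core, factors, modes=[1, 2])
print(recon.shape)  # (100, 28, 28)
print(np.sqrt(np.mean((recon - batch)**2)))  # RMS reconstruction error

The script below applies this idea to MNIST and maps the reconstruction error over all ranks: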
import mxnet as mx
import numpy as np
import tensorly as tl
import tensorly.decomposition
import matplotlib.pyplot as plt

# Load data
mnist = mx.test_utils.get_mnist()
train_data = mnist['train_data'][:, 0]

err = np.zeros([28, 28])   # here I will save the errors for each rank
batch = train_data[::100]  # process only 1% of the data to go faster
for i in range(1, 28):
    for j in range(1, 28):
        if err[i, j] == 0:
            # Decompose the data
            core, factors = tl.decomposition.partial_tucker(
                batch, modes=[1, 2], tol=10e-5, rank=[i, j])
            # Reconstruct the data from the features
            c = tl.tenalg.multi_mode_dot(core, factors, modes=[1, 2])
            # Calculate the RMS error and save it
            err[i, j] = np.sqrt(np.mean((c - batch)**2))

# Plot the statistics
plt.figure(figsize=(9, 6))
CS = plt.contour(np.log2(err), levels=np.arange(-6, 0))
plt.clabel(CS, CS.levels, inline=True, fmt='$2^{%d}$', fontsize=16)
plt.xlabel('rank 2')
plt.ylabel('rank 1')
plt.grid()
plt.title('Reconstruction RMS error')
Usually you get better results with a balanced rank, i.e. with i and j not very different from each other.
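A quick way to check this, using the err table computed above (a sketch; the exact numbers depend on the data):

# Compare ranks with the same number of core features i * j = 64:
# the balanced split usually attains the lowest reconstruction error
for i, j in [(8, 8), (16, 4), (4, 16)]:
    print(f'rank=({i},{j})  features={i * j}  RMS error={err[i, j]:.4f}')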
Since accepting a larger error buys better compression, we can sort the (i, j) pairs by error and plot only those where the error is minimal for a given feature dimension i * j, like this:
X = np.zeros([28, 28])
X[...] = np.nan
p = 28 * 28
# Visit the (i, j) pairs in order of increasing error and keep only those
# that use fewer features than every pair with a smaller error
for e, i, j in sorted([(err[i, j], i, j)
                       for i in range(1, 28) for j in range(1, 28)]):
    if p < i * j:
        # we can achieve this error with some better compression
        pass
    else:
        p = i * j
        X[i, j] = e
plt.imshow(X)
Anywhere in the white region you are wasting resources; the best choices lie on the plotted points, where the error is minimal for the number of features used.
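To connect back to the original question, here is a sketch of feeding the compressed features to an SVM (it assumes scikit-learn and the train_data_mnist/test_data_mnist arrays from the question; train_labels, test_labels and the rank (16, 10) are hypothetical placeholders):

from sklearn.svm import SVC

# Decompose on the training set and reuse the same factors for the test set
core, factors = tl.decomposition.partial_tucker(
    train_data_mnist, modes=[1, 2], tol=10e-5, rank=[16, 10])
train_features = tl.tenalg.multi_mode_dot(train_data_mnist, factors,
                                          modes=[1, 2], transpose=True)
test_features = tl.tenalg.multi_mode_dot(test_data_mnist, factors,
                                         modes=[1, 2], transpose=True)

# Flatten the (n, 16, 10) features to one vector per image for the SVM
clf = SVC()
clf.fit(train_features.reshape(len(train_features), -1), train_labels)
print(clf.score(test_features.reshape(len(test_features), -1), test_labels))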
Answered By - Bob