Wednesday, December 13, 2023

[FIXED] numpy.where narrowing array dimensions


Issue

I want to train a model on a few classes from the CIFAR-100 dataset. I used np.where to filter the dataset, but the filtered image array loses one of its dimensions.

import numpy as np
from tensorflow.keras.datasets import cifar100

(x_train, y_train), (x_test, y_test) = cifar100.load_data()
index = np.where((y_train == 1) | (y_train == 2))
print('Images Shape: {}'.format(x_train.shape))
X_train = x_train[index]
Y_train = y_train[index]
print('Images Shape: {}'.format(X_train.shape))

Prints:

Images Shape: (50000, 32, 32, 3)

Images Shape: (1000, 32, 3)

What I tried so far:

After filtering I tried to convert results to the shape of an image like this:

index = np.asarray(index).reshape(x_train.shape[0])

But then I get this error:

ValueError: cannot reshape array of size 2000 into shape (50000,)

I want to train a model using only 10 classes from the CIFAR-100 dataset.

Here is my model:

import numpy as np
from tensorflow.keras.datasets import cifar100
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPool2D
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Conv2D(16, (3, 3),
                 # strides=(1, 1),
                 activation='relu',
                 padding='same',  # 'valid',
                 input_shape=(32, 32, 3)))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(32, (3, 3),
                 # strides=(1, 1),
                 activation='relu'))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(32, (3, 3),
                 # strides=(1, 1),
                 activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train,
          Y_train,
          epochs=5,
          batch_size=64)

Solution

I'm not sure that training on a subset of the y_train cases is the right way to go. Usually training is done on a random subset of all cases. Keras and similar libraries have split functions that split a dataset, both X and y, into train and test sets.
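For illustration, a random split can be sketched by hand with a NumPy permutation. This is not the Keras or scikit-learn API, only a stand-in showing X and y being split with the same shuffled row indices. The array sizes, the seed and the 80/20 ratio are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Stand-in data with CIFAR-like layout: images (n, 32, 32, 3), labels (n, 1).
X = np.zeros((100, 32, 32, 3), dtype=np.uint8)
y = np.arange(100)[:, None]

# Shuffle the row indices once, then cut them into two groups.
perm = rng.permutation(len(X))
n_train = len(X) * 4 // 5          # 80% of the rows for training
train_rows, test_rows = perm[:n_train], perm[n_train:]

# Index X and y with the same rows so images and labels stay aligned.
X_tr, y_tr = X[train_rows], y[train_rows]
X_te, y_te = X[test_rows], y[test_rows]

print(X_tr.shape, y_tr.shape)
print(X_te.shape, y_te.shape)
```

Because both arrays are indexed with 1-D row indices, every other dimension is kept.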

But, as a follow-on to my comment, here is an explanation of the behaviour that you see.

With an (n,1) y_train:

In [203]: y = np.arange(10)[:,None]
In [204]: idx = np.nonzero((y<4)|(y>7))
In [205]: idx
Out[205]: (array([0, 1, 2, 3, 8, 9]), array([0, 0, 0, 0, 0, 0]))

nonzero/where returns a tuple of index arrays, one array per dimension. Since the 2nd dimension is 1, idx[1] is all 0, and doesn't provide any useful information.

When used to index a 4d array like x_train, idx selects 6 values on the first dimension and just one, index 0, on the 2nd, so the 2nd dimension is dropped:

In [206]: x = np.ones((10,2,3,4),int)
In [207]: x[idx].shape
Out[207]: (6, 3, 4)

Indexing with just the first array retains the 2nd dimension:

In [208]: x[idx[0]].shape
Out[208]: (6, 2, 3, 4)
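Applied to the question's data, the same idea can be sketched as follows. This is a minimal sketch that uses stand-in arrays with the CIFAR-100 layout instead of the real dataset (an assumption, so that it runs without a download). The names wanted, mask, rows, X_sel and Y_sel are illustrative, and np.isin is just one way to extend the test to more classes.

```python
import numpy as np

# Stand-in arrays laid out like cifar100.load_data():
# images are (n, 32, 32, 3), labels are (n, 1). Each label 0..99 appears twice.
x_train = np.zeros((200, 32, 32, 3), dtype=np.uint8)
y_train = (np.arange(200) % 100)[:, None]

wanted = [1, 2]  # classes to keep

# Flatten the labels to 1-D first, so the mask has one entry per image.
mask = np.isin(y_train.ravel(), wanted)   # shape (200,)
rows = np.nonzero(mask)[0]                # same as np.where(mask)[0]

X_sel = x_train[rows]                     # x_train[mask] gives the same result
Y_sel = y_train[rows]

print(X_sel.shape)   # all image dimensions are kept
print(Y_sel.shape)

# For comparison, indexing with the full tuple from the (n, 1) labels
# also indexes the 2nd dimension with 0 and drops it.
bad = x_train[np.where((y_train == 1) | (y_train == 2))]
print(bad.shape)
```

With 10 classes, only the wanted list changes. The selected labels still hold the original CIFAR-100 class ids.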

I have no idea what you were trying to do with:

index = np.asarray(index).reshape(x_train.shape[0])

In my example x.shape[0] is 10, but idx[0] has shape (6,). It doesn't make sense to reshape idx[0] to that length, much less both arrays stacked together.

In [209]: np.array(idx)
Out[209]: 
array([[0, 1, 2, 3, 8, 9],
       [0, 0, 0, 0, 0, 0]])
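To make the reshape failure concrete, here is a small sketch using the same toy y as above, redefined so the block runs on its own. np.asarray(idx) stacks the two index arrays into a 2-row array, so its size is twice the number of matches and is unrelated to x.shape[0].

```python
import numpy as np

y = np.arange(10)[:, None]              # (10, 1) labels, as in the example
idx = np.nonzero((y < 4) | (y > 7))     # tuple of two length-6 index arrays

stacked = np.asarray(idx)               # 2 rows: row indices, column indices
print(stacked.shape, stacked.size)

try:
    stacked.reshape(10)                 # 12 elements cannot fill shape (10,)
except ValueError as err:
    message = str(err)
    print(message)

# The row indices alone are already 1-D, so no reshape is needed.
rows = idx[0]
print(rows.shape)
```

The question's error has the same cause: 1000 matches give 2 x 1000 = 2000 stacked values, which cannot fill a shape of (50000,).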

Answered By - hpaulj

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
