Issue
I am currently learning how to perform data augmentation with the Keras ImageDataGenerator, following "Deep Learning with Python" by François Chollet.
I have 1000 dog and 1000 cat images in the training dataset, and 500 dog and 500 cat images in the validation dataset.
The book sets the batch size to 32 for both the training and validation generators that perform the data augmentation, and passes both steps_per_epoch and epochs when fitting the model.
However, when I train the model, I get the TensorFlow warning "Your input ran out of data..." and the training process stops.
I searched online, and many solutions say the step counts should be
steps_per_epoch = len(train_dataset) // batch_size
validation_steps = len(validation_dataset) // batch_size
I understand the logic above, and with those settings the warning goes away.
But I am wondering: originally I have 2000 training samples. That is too few, which is exactly why I need data augmentation to increase the number of training images. If steps_per_epoch = len(train_dataset) // batch_size is applied, and len(train_dataset) is only 2000, am I not still training on the same 2000 samples instead of feeding additional augmented images to the model?
from keras.preprocessing.image import ImageDataGenerator

# Augmentation is applied to the training data only
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Validation data is only rescaled, never augmented
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)
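Concretely, the fix suggested online would look something like the sketch below. (This is my adaptation, not the book's code; train_generator.samples is the DirectoryIterator attribute holding the number of images found on disk.)

# Derive the step counts from the generators themselves, so the model
# never requests more batches than one pass over the data can supply.
batch_size = 32
steps_per_epoch = train_generator.samples // batch_size        # 2000 // 32 = 62
validation_steps = validation_generator.samples // batch_size  # 1000 // 32 = 31

history = model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=validation_steps)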
Solution
The key point is that ImageDataGenerator does not increase the size of the training set. All augmentations are done in memory: an original image is augmented randomly, and its augmented version is returned in place of the original. If you want to have a look at the augmented images, set these parameters on flow_from_directory:
save_to_dir=path,
save_prefix="",
save_format="png",
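For example, a minimal sketch of a generator that also writes its augmented batches to disk for inspection (the directory name augmented_previews is just a placeholder):

import os

preview_dir = 'augmented_previews'  # placeholder path for the saved images
os.makedirs(preview_dir, exist_ok=True)

preview_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    save_to_dir=preview_dir,  # every yielded image is also saved here
    save_prefix='aug',
    save_format='png')

# Drawing a few batches triggers the saving:
for _ in range(3):
    next(preview_generator)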
Now, you have 2000 images, and with a batch size of 32 you get 2000 // 32 = 62 full steps per epoch, but you are requesting 100 steps, which is what causes the error.
If you have a dataset that does not itself generate batches and you want to use all the data points, then you should set:
steps_per_epoch = len(train_dataset) // batch_size
But when you use flow_from_directory, it generates batches indefinitely, so there is no need to set steps_per_epoch unless you want to use fewer batches per epoch than the generator provides.
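In recent Keras versions, omitting steps_per_epoch makes the fit call fall back to len(generator), i.e. the number of batches per epoch, ceil(samples / batch_size). A minimal sketch, assuming the generators defined in the question:

# len() on a DirectoryIterator is the number of batches per epoch:
# ceil(2000 / 32) = 63 for training, ceil(1000 / 32) = 32 for validation.
print(len(train_generator), len(validation_generator))

# So the fit call can simply omit steps_per_epoch and validation_steps:
history = model.fit_generator(
    train_generator,
    epochs=100,
    validation_data=validation_generator)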
Answered By - Frightera