Sunday, September 4, 2022

[FIXED] Accuracy of same validation dataset differs between last epoch and after fit

September 04, 2022 | Labels: deep-learning, keras, python, tensorflow

Issue

The following code produces a log that ends with:

Epoch 19/20
1/1 [==============================] - 0s 473ms/step - loss: 1.4018 - accuracy: 0.8750 - val_loss: 1.8656 - val_accuracy: 0.8900
Epoch 20/20
1/1 [==============================] - 0s 444ms/step - loss: 0.5904 - accuracy: 0.8750 - val_loss: 2.1255 - val_accuracy: 0.8700
get_dataset: validation
Found 1000 files belonging to 2 classes.
Using 100 files for validation.
4/4 [==============================] - 1s 81ms/step
eval acc: 0.81

My question is:

Why is the val_accuracy after the last epoch (0.87) different from the eval acc (0.81) after the fit?

In my code, I intend to use the same dataset both for the per-epoch validation during fit and for the additional evaluation afterwards.

[Update 1, 2022-07-19:

Obviously, the two accuracy calculations don't really use the same data. How can I debug which data is actually used?
[Update 3, 2022-07-20: I have traced the data into TensorFlow. The last thing I can see is that x.filenames is the same in Model.evaluate (called during fit) and in Model.predict. I could not debug much further, because shortly afterwards, in quick_execute, __inference_test_function_248219 and __inference_predict_function_231438, respectively, are executed outside Python, and their arguments are tensors with dtype=resource whose contents I cannot inspect.]
I have deliberately removed my class balancing code to keep my example small. I know that this makes the accuracies less useful, but I don't care about that for now.
Note that get_dataset('validation') is only called once at the beginning of the fit, not at each epoch.
I have now also set max_queue_size=0, use_multiprocessing=False, workers=0 (as seen here, found via this related SO question about TensorFlow 1), but this did not make the accuracies equal.

]

Code:

import tensorflow as tf
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.preprocessing import image_dataset_from_directory
    
inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
base_output = base_model(inputs)
base_model.trainable = False
out = Flatten(name='flat')(base_output)
out = Dense(1, activation='sigmoid')(out)
model = Model(inputs=inputs, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

def get_dataset(subset):
    print('get_dataset:', subset)
    return image_dataset_from_directory(
        'data-nodup-1000',
        labels="inferred",
        label_mode='binary',
        color_mode="rgb",
        image_size=(224, 224),
        shuffle=True,
        seed=1,
        validation_split=0.1,
        subset=subset,
        crop_to_aspect_ratio=False,
    )

model.fit(
    get_dataset('training'),
    steps_per_epoch=1,
    epochs=20,
    validation_data=get_dataset('validation'),
    max_queue_size=0,
    use_multiprocessing=False,
    workers=0,
)

val_dataset = get_dataset('validation')
true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))
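One suspect can be ruled out: because both get_dataset('validation') calls pass the same seed and validation_split, they should select the same 100 files. As far as I understand the TF 2.9 implementation, the file list is shuffled with the seed before the validation fraction is cut off. The stdlib sketch below mimics that idea with a hypothetical `split_files` helper and hypothetical file names; it is a simplification, not the Keras code. If the set of files matches, the remaining difference is the order in which the files are iterated.

```python
import random


def split_files(file_names, validation_split, subset, seed):
    """Hypothetical stand-in for a seeded train/validation split.

    The file list is shuffled with a fixed seed, then the last
    `validation_split` fraction is used as the validation subset.
    """
    files = list(file_names)
    random.Random(seed).shuffle(files)  # same seed -> same permutation on every call
    num_val = int(validation_split * len(files))
    cut = len(files) - num_val
    if subset == 'validation':
        return files[cut:]
    return files[:cut]


# Hypothetical file names: 113 files of class 0, 887 files of class 1
files = [f'class0/{i}.jpg' for i in range(113)] + [f'class1/{i}.jpg' for i in range(887)]

val_a = split_files(files, 0.1, 'validation', seed=1)  # the call made for fit
val_b = split_files(files, 0.1, 'validation', seed=1)  # the call made after fit
train = split_files(files, 0.1, 'training', seed=1)

print(len(val_a), val_a == val_b, set(val_a).isdisjoint(train))  # 100 True True
```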

[Update 2, 2022-07-19: I can also reproduce the behavior with the deprecated ImageDataGenerator, using

from tensorflow.keras.applications.resnet50 import preprocess_input
from keras_preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    validation_split=0.1,
)

def get_dataset(subset):
    print('get_dataset:', subset)
    return datagen.flow_from_directory(
        'data-nodup-1000',
        class_mode='binary',
        target_size=(224, 224),
        shuffle=True,
        seed=1,
        subset=subset,
    )

and

true_class = val_dataset.labels

]

[Update 4, 2022-07-21: Disabling shuffling of the validation data by setting shuffle=(subset == 'training') makes the two validation accuracies equal. However, this is not a usable workaround, because the validation set then consists only of class 1, since flow_from_directory does not stratify the split.]
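The arithmetic behind a single-class validation set, using the folder contents stated in this question (113 files of class 0, 887 of class 1): if the files are listed class by class and the last 10% are taken as validation data without shuffling, every validation file comes from class 1. The sketch below assumes that "last fraction of an unshuffled, class-sorted list" rule; it is a simplification for illustration, not the library's code.

```python
# Labels in directory order: all class-0 files first, then all class-1 files
labels = [0] * 113 + [1] * 887

validation_split = 0.1
num_val = int(validation_split * len(labels))  # 100 files

# Assumed rule: without shuffling, validation is the last `num_val` entries
val_labels = labels[len(labels) - num_val:]

print(num_val, sorted(set(val_labels)))  # 100 [1]
```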

My environment:

I am using up-to-date versions of all libraries, e.g. tensorflow 2.9.1 and scikit-learn 1.1.1 (installed via pip-compile -U).
The folder data-nodup-1000 contains one subfolder with 113 files of class 0, and one subfolder with 887 files of class 1.

Solution

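I have now found out that in TensorFlow 2.9.1, model.predict consumes a second iteration over the dataset, and because the dataset is reshuffled on every iteration, this second pass yields the samples in a different order than the first pass (the one used to build true_class)!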

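Therefore, the entries of true_class and pred no longer refer to the same images. The val_accuracy reported during fit is not affected, because Keras computes it batch by batch, comparing each batch's predictions with that batch's own labels.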
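The mismatch can be reproduced without TensorFlow. The stdlib sketch below uses a hypothetical `ReshufflingDataset` that, like a tf.data pipeline whose shuffle uses the default reshuffle_each_iteration=True, yields its elements in a new order on every pass. The "model" is a hypothetical fixed classifier, so any difference between the two accuracies comes only from pairing labels and predictions taken from different passes.

```python
import random


class ReshufflingDataset:
    """Hypothetical dataset that yields its (x, y) pairs in a new order on every pass."""

    def __init__(self, pairs, seed):
        self.pairs = list(pairs)
        # Seeded once; each new pass still draws a fresh permutation from it
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.pairs[:]
        self.rng.shuffle(order)
        return iter(order)


def predict(x):
    """Hypothetical fixed classifier: predicts class 1 for x >= 50."""
    return 1 if x >= 50 else 0


# 100 samples whose label is 1 for x >= 30, so the classifier is wrong for 30 <= x < 50
data = [(x, 1 if x >= 30 else 0) for x in range(100)]
ds = ReshufflingDataset(data, seed=1)

# Pattern from the question: labels from one pass, predictions from another pass
true_class = [y for _, y in ds]
pred_class = [predict(x) for x, _ in ds]
buggy_acc = sum(t == p for t, p in zip(true_class, pred_class)) / len(data)

# Labels and predictions taken from the same elements of the same pass
pairs = [(y, predict(x)) for x, y in ds]
acc = sum(y == p for y, p in pairs) / len(pairs)

print('same pass:', acc)         # 0.8
print('two passes:', buggy_acc)  # depends on the two permutations, usually well below 0.8
```

Both passes contain exactly the same samples; only their order differs, which is enough to break an element-wise comparison.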

Switching to TensorFlow 2.10.0-rc3 and its tf.keras.utils.split_dataset makes the accuracies equal.

Here's the updated code:

import tensorflow as tf
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.preprocessing import image_dataset_from_directory
    
inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
base_output = base_model(inputs)
base_model.trainable = False
out = Flatten(name='flat')(base_output)
out = Dense(1, activation='sigmoid')(out)
model = Model(inputs=inputs, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

dataset = image_dataset_from_directory(
    'data-synthetic',
    labels="inferred",
    label_mode='binary',
    color_mode="rgb",
    image_size=(224, 224),
    shuffle=True,
    seed=1,
    crop_to_aspect_ratio=False,
)
train_dataset, val_dataset = tf.keras.utils.split_dataset(dataset, right_size=0.1)

model.fit(
    train_dataset,
    steps_per_epoch=1,
    epochs=20,
    validation_data=val_dataset,
    max_queue_size=0,
    use_multiprocessing=False,
    workers=0,
)

true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))

which correctly yields:

Epoch 19/20
1/1 [==============================] - 0s 438ms/step - loss: 0.4426 - accuracy: 0.9062 - val_loss: 0.4658 - val_accuracy: 0.8800
Epoch 20/20
1/1 [==============================] - 0s 444ms/step - loss: 2.1619 - accuracy: 0.8438 - val_loss: 0.5886 - val_accuracy: 0.8900
4/4 [==============================] - 1s 87ms/step
eval acc: 0.89
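Independently of the TensorFlow version, the misalignment goes away as long as labels and predictions are collected in the same pass over the data. Calling model.evaluate(val_dataset) does this internally. A minimal framework-agnostic sketch follows; `single_pass_accuracy` and `_flat` are hypothetical helpers, tested here only with plain Python lists.

```python
def _flat(values):
    """Flatten one batch into a list of floats.

    Calls .numpy() and .ravel() only if the object has them (e.g. a tf.Tensor
    or a NumPy array); plain Python lists are used as they are.
    """
    if hasattr(values, 'numpy'):
        values = values.numpy()
    if hasattr(values, 'ravel'):
        values = values.ravel()
    return [float(v) for v in values]


def single_pass_accuracy(batches, predict_fn, threshold=0.5):
    """Binary accuracy where each prediction is compared with the label from the same batch."""
    correct = total = 0
    for x, y in batches:  # a single pass over the data
        probs = _flat(predict_fn(x))
        labels = _flat(y)
        for p, t in zip(probs, labels):
            correct += int((p >= threshold) == (t >= 0.5))
            total += 1
    return correct / total


# Toy check with plain lists; the hypothetical "model" returns its input as the probability
batches = [([0.1, 0.9], [0, 1]), ([0.7, 0.2], [0, 0])]
acc = single_pass_accuracy(batches, predict_fn=lambda x: x)
print(acc)  # 0.75
```

With TensorFlow (not run here), the call would be along the lines of single_pass_accuracy(val_dataset, lambda x: model(x, training=False)), even with a dataset that reshuffles on every iteration.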

Answered By - Robert Pollak

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
