Issue
I am running the Keras example on knowledge distillation, and my question is: is the resulting compressed model that I can use for predictions the distiller or the student model? And in that case, how do I add back the softmax classification layer and run predictions with the resulting model?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_train = np.reshape(x_train, (-1, 28, 28, 1))
x_test = x_test.astype("float32") / 255.0
x_test = np.reshape(x_test, (-1, 28, 28, 1))
teacher = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(256, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(512, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="teacher",
)
# Create the student
student = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student",
)
teacher.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
teacher.fit(x_train, y_train, epochs=5)
teacher.evaluate(x_test, y_test)
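# Note: Distiller below is the Distiller(keras.Model) subclass defined in the
# Keras knowledge-distillation example; its definition is omitted from this snippet.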
distiller = Distiller(student=student, teacher=teacher)
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.1,
    temperature=10,
)
# Distill teacher to student
distiller.fit(x_train, y_train, epochs=3)
# Evaluate student on test dataset
distiller.evaluate(x_test, y_test)
Despite being able to run the example, this is still not clear to me. I would like to test the model on unseen data, so I was wondering: how do I build a model from knowledge distillation, perform predictions, and check its classification report?
Solution
The "compressed" model is the student model. The Distiller
is just the wrapper for training the student to try and mimic the teacher, as opposed to training the student to try and estimate the ground-truth labels.
The page you linked has a section comparing the distillation results to the equivalent light-weight student architecture with "from scratch" training against actual labels, so predictions are rather straight-forward from the tutorial.
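For instance (a minimal sketch, assuming the training code from the question has already been run, so the student has been fitted through the Distiller):

# The student model handed to the Distiller is trained in place during
# distiller.fit, so it can be used for inference on its own
# (in the tutorial's Distiller class it is also reachable as distiller.student).
compressed_model = student
raw_scores = compressed_model.predict(x_test)  # shape (10000, 10)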
Note that the teacher and the student both end in a plain Dense layer, so training assumes the model outputs should be treated as logits when computing the loss. Therefore both the teacher and the student outputs just need a simple tf.nn.softmax to obtain standard categorical probabilities.
Don't forget to recalibrate the softmax temperature if necessary.
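As a concrete illustration (a sketch only, assuming the snippet from the question plus scikit-learn; probability_model is just a name introduced here, not part of the tutorial), the softmax can be appended as a layer and the resulting predictions fed into a classification report:

import numpy as np
from sklearn.metrics import classification_report

# Append a Softmax layer so the model outputs class probabilities instead of
# logits; a plain softmax (temperature 1) is the usual choice at inference time.
probability_model = keras.Sequential([student, layers.Softmax()])

# Predict on unseen data and print per-class precision/recall/F1.
y_pred = np.argmax(probability_model.predict(x_test), axis=-1)
print(classification_report(y_test, y_pred))

Alternatively, tf.nn.softmax(student.predict(x_test)) gives the same probabilities without building a new model object.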
Answered By - ShlomiF