Issue
As the title suggests, I'm trying to train a model based on the SimCLR framework (see this paper: https://arxiv.org/pdf/2002.05709.pdf - the NT_Xent loss is stated in equation (1) and Algorithm 1).
I have managed to create a numpy version of the loss function, but this is not suitable for training the model, as numpy arrays cannot store the information required for backpropagation. I am having difficulty converting my numpy code over to Tensorflow. Here is my numpy version:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Define the contrastive loss function, NT_Xent
def NT_Xent(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.
    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = np.concatenate((zi, zj), 0)
    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        sim_ij = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[j].reshape(1, -1)))
        sim_ji = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[i].reshape(1, -1)))
        numerator_ij = np.exp(sim_ij / tau)
        numerator_ji = np.exp(sim_ji / tau)
        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[np.arange(z.shape[0]) != i]))
        sim_jk = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[np.arange(z.shape[0]) != j]))
        denominator_ik = np.sum(np.exp(sim_ik / tau))
        denominator_jk = np.sum(np.exp(sim_jk / tau))
        # Calculate individual and combined losses
        loss_ij = - np.log(numerator_ij / denominator_ik)
        loss_ji = - np.log(numerator_ji / denominator_jk)
        loss += loss_ij + loss_ji
    # Divide by the total number of samples
    loss /= z.shape[0]
    return loss
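(For context, a minimal way to exercise it on dummy 2-D embeddings, purely illustrative - the batch size and embedding size here are arbitrary:)

import numpy as np

# e.g. two augmented "views" of a batch of 4 samples, each an 8-dim embedding
zi = np.random.rand(4, 8)
zj = np.random.rand(4, 8)
print(NT_Xent(zi, zj, tau=0.5))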
I am fairly confident that this function produces the correct results, albeit slowly. I have seen vectorised implementations of it online, such as this one for PyTorch: https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py (my code produces the same result for identical inputs), but I do not see how their version is mathematically equivalent to the formula in the paper, which is why I am trying to build my own.
As a first attempt I have converted the numpy functions to their TF equivalents (tf.concat, tf.reshape, tf.math.exp, tf.range, etc.), but I believe the main problem is that sklearn's cosine_similarity function returns a numpy array, and I do not know how to build that function myself in Tensorflow. Any ideas?
Solution
I managed to figure it out myself! I did not realise there was a Tensorflow implementation of the cosine similarity function, "tf.keras.losses.CosineSimilarity".
Here is my code:
import tensorflow as tf

# Define the contrastive loss function, NT_Xent (Tensorflow version)
def NT_Xent_tf(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    (This is the Tensorflow implementation of the standard numpy version found
    in the NT_Xent function).
    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.
    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)
    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        # Instantiate the cosine similarity loss function
        cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        # CosineSimilarity is a loss (it returns the negative of the cosine
        # similarity), so its output is negated to recover the similarity itself
        sim = tf.squeeze(- cosine_sim(tf.reshape(z[i], (1, -1)), tf.reshape(z[j], (1, -1))))
        numerator = tf.math.exp(sim / tau)
        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = - cosine_sim(tf.reshape(z[i], (1, -1)), z[tf.range(z.shape[0]) != i])
        sim_jk = - cosine_sim(tf.reshape(z[j], (1, -1)), z[tf.range(z.shape[0]) != j])
        denominator_ik = tf.reduce_sum(tf.math.exp(sim_ik / tau))
        denominator_jk = tf.reduce_sum(tf.math.exp(sim_jk / tau))
        # Calculate individual and combined losses
        loss_ij = - tf.math.log(numerator / denominator_ik)
        loss_ji = - tf.math.log(numerator / denominator_jk)
        loss += loss_ij + loss_ji
    # Divide by the total number of samples
    loss /= z.shape[0]
    return loss
As you can see, I have essentially just swapped out the numpy functions for their TF equivalents. One main point of note is that I had to use "reduction=tf.keras.losses.Reduction.NONE" when instantiating "cosine_sim"; this keeps the shapes of "sim_ik" and "sim_jk" consistent, because otherwise the resulting loss did not match up with my original numpy implementation.
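Note also that CosineSimilarity is a loss, so it returns the negative of the cosine similarity; that is why the calls to "cosine_sim" above are negated. If you would rather avoid the Keras loss object altogether, the same quantity can be computed directly by L2-normalising the rows and taking dot products. A rough sketch of that alternative (I have not benchmarked it, so treat it as a starting point rather than a drop-in replacement):

import tensorflow as tf

def pairwise_cosine_similarity(a, b):
    """Cosine similarity between every row of a (shape (M, D)) and every row
    of b (shape (K, D)), returned as an (M, K) matrix."""
    a = tf.math.l2_normalize(tf.cast(a, tf.float32), axis=1)
    b = tf.math.l2_normalize(tf.cast(b, tf.float32), axis=1)
    return tf.matmul(a, b, transpose_b=True)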
I also noticed that individually calculating the numerator for i,j and j,i was redundant as the answers were the same, so I have removed one instance of that calculation.
Of course if anybody has a quicker implementation I am more than happy to hear about it!
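One direction for a faster version would be to compute the full similarity matrix once and pick out the positive pairs with a softmax cross-entropy, instead of looping over the batch. Here is a rough, unverified sketch of that idea (it assumes zi and zj can be flattened to 2-D, and I would check it against NT_Xent_tf on identical inputs before relying on it):

import tensorflow as tf

def NT_Xent_tf_vectorised(zi, zj, tau=1.0):
    """Vectorised NT_Xent sketch: one (2N, 2N) cosine similarity matrix, with
    the positive pairs selected via a sparse softmax cross-entropy."""
    n = tf.shape(zi)[0]
    z = tf.concat((zi, zj), 0)
    z = tf.reshape(tf.cast(z, tf.float32), [2 * n, -1])
    z = tf.math.l2_normalize(z, axis=1)
    # Cosine similarity between every pair of samples, scaled by the temperature
    sim = tf.matmul(z, z, transpose_b=True) / tau
    # Mask out self-similarity so each sample is excluded from its own denominator
    sim = sim - 1e9 * tf.eye(2 * n)
    # The positive for sample k is k + N (first half) or k - N (second half)
    positives = tf.concat((tf.range(n, 2 * n), tf.range(0, n)), 0)
    losses = tf.keras.losses.sparse_categorical_crossentropy(positives, sim, from_logits=True)
    # Average over all 2N samples, as in the loop version
    return tf.reduce_mean(losses)

If it is correct, it should match NT_Xent_tf (up to floating point error) on the same inputs, which is a quick sanity check to run first.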
Answered By - AlexP