Issue
I'm trying to build a Siamese model with Keras that learns to recognize differences between Mel-spectrograms.
The dataset I'm using is the ESC-50 dataset.
I split it into training files (40 classes with 40 files each) and test files (5 classes with 40 files each) and generate positive and negative pairs from them.
I compute Mel-spectrograms with 64 Mel bands, so each spectrogram has the shape (64, 626).
For example, the arrays feat_train_1 and feat_train_2 have the shape (3200, 64, 626).
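For context, the pair generation happens in utiltiy_functions.generate_pairs, which is not shown below. A minimal sketch of how such positive/negative index pairs can be built (an illustration only, not the exact implementation; generate_pairs_sketch and its seed argument are made up here):

import numpy as np

def generate_pairs_sketch(labels, indices, seed=0):
    # For every file index, create one positive pair (same class, different file)
    # and one negative pair (different class), so N files yield 2*N pairs.
    rng = np.random.default_rng(seed)
    pair_idx, pair_labels = [], []
    for idx in indices:
        same = [j for j in indices if labels[j] == labels[idx] and j != idx]
        diff = [j for j in indices if labels[j] != labels[idx]]
        pair_idx.append([idx, rng.choice(same)])   # positive pair -> label 1
        pair_labels.append(1)
        pair_idx.append([idx, rng.choice(diff)])   # negative pair -> label 0
        pair_labels.append(0)
    return np.array(pair_idx), np.array(pair_labels)

With two pairs per file, the 1600 training files give 3200 pairs and the 200 test files give 400 pairs, which matches the shapes above and in the script.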
Mel-spectrograms: In this picture, the first two spectrograms are feat_train_1[i] and feat_train_2[i] with pair_labels_train[i] = 1 (positive pair).
The third and fourth spectrograms are feat_train_1[i+1] and feat_train_2[i+1] with pair_labels_train[i+1] = 0 (negative pair).
I then expand the feature arrays with a channel dimension and broadcast them to 3 channels.
I use the VGG16 network to extract embeddings from the features, and the Euclidean distance between the two embeddings is calculated.
The problem is that the accuracy (as well as val_accuracy) is stuck at 50% while the loss slowly decreases. You can see the whole script here:
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
import utiltiy_functions

BATCH_SIZE = 20
EPOCHS = 10
#Load the audio files and split them
audio_data, labels = utiltiy_functions.read_audio_files('esc-50-master/audio_conv', 'esc-50-master/meta')
idx_training, idx_test, idx_eval = utiltiy_functions.split_data(audio_data, labels)
pair_idx_train , pair_labels_train = utiltiy_functions.generate_pairs(labels, idx_training)
pair_idx_test , pair_labels_test = utiltiy_functions.generate_pairs(labels, idx_test)
pair_idx_eval , pair_labels_eval = utiltiy_functions.generate_pairs(labels, idx_eval)
audio_data_train_1 = audio_data[pair_idx_train[:,0]]
audio_data_train_2 = audio_data[pair_idx_train[:,1]]
audio_data_test_1 = audio_data[pair_idx_test[:,0]]
audio_data_test_2 = audio_data[pair_idx_test[:,1]]
audio_data_eval_1 = audio_data[pair_idx_eval[:,0]]
audio_data_eval_2 = audio_data[pair_idx_eval[:,1]]
#Calculate Features and reshape
def get_librosa_melspecs(audio_array, name):
    # Compute a log Mel-spectrogram (64 bands) for every audio clip and cache the result
    melspecs = np.zeros((audio_array.shape[0], 64, 626))
    for i, audio in enumerate(audio_array):
        mel = librosa.feature.melspectrogram(y=audio, n_mels=64, n_fft=1024, hop_length=128, sr=16000)
        mel[mel != 0] = np.log(mel[mel != 0])
        #melnormalized = librosa.util.normalize(mellog)
        melspecs[i] = mel
    np.save(name, melspecs)
    return melspecs
feat_test_1 = get_librosa_melspecs(audio_data_test_1, "features_vgg_test1.npy")
feat_test_2 = get_librosa_melspecs(audio_data_test_2, "features_vgg_test2.npy")
feat_train_1 = get_librosa_melspecs(audio_data_train_1, "features_vgg_train1.npy")
feat_train_2 = get_librosa_melspecs(audio_data_train_2, "features_vgg_train2.npy")
feat_test_1 = np.expand_dims(feat_test_1, 3)
feat_test_2 = np.expand_dims(feat_test_2, 3)
feat_train_1 = np.expand_dims(feat_train_1, 3)
feat_train_2 = np.expand_dims(feat_train_2, 3)
feat_test_1 = tf.reshape(tf.broadcast_to(feat_test_1, (400,64,626,3)), (400,64,626,3))
feat_test_2 = tf.reshape(tf.broadcast_to(feat_test_2, (400,64,626,3)), (400,64,626,3))
feat_train_1 = tf.reshape(tf.broadcast_to(feat_train_1, (3200,64,626,3)), (3200,64,626,3))
feat_train_2 = tf.reshape(tf.broadcast_to(feat_train_2, (3200,64,626,3)), (3200,64,626,3))
#Build siamese net
#inputs
feat_1 = Input(shape=(64,626,3))
feat_2 = Input(shape=(64,626,3))
#vgg16
model_vgg = VGG16(weights="imagenet", include_top=False, input_shape=(64,626,3))
for layer in model_vgg.layers:
    layer.trainable = True
pre_emb1 = model_vgg(feat_1)
pre_emb2 = model_vgg(feat_2)
#flatten and dense layers
flatten = Flatten()
dense_1 = Dense(4096, activation="relu")
dense_2 = Dense(4096, activation="relu")
dense_3 = Dense(512, activation="relu")
flatten1 = flatten(pre_emb1)
flatten2 = flatten(pre_emb2)
dense1_1 = dense_1(flatten1)
dense2_1 = dense_1(flatten2)
dense1_2 = dense_2(dense1_1)
dense2_2 = dense_2(dense2_1)
dense1_3 = dense_3(dense1_2)
dense2_3 = dense_3(dense2_2)
#Distance
distance = Lambda(utiltiy_functions.eucl_distance)([dense1_3, dense2_3])
#Output Layer
outputs = Dense(1, activation="sigmoid")(distance)
#model definition
model = Model(inputs=[feat_1, feat_2], outputs=outputs)
print(model.summary())
#compile
opt = Adam(learning_rate=0.001)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
early_stopping = EarlyStopping(monitor='val_loss', patience=3, mode='auto', restore_best_weights=True)
#Train the model
print("Training the Siamese model.\n")
model.fit(
    [feat_train_1[:], feat_train_2[:]], pair_labels_train[:],
    validation_data=([feat_test_1[:], feat_test_2[:]], pair_labels_test[:]),
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    shuffle=True,
    callbacks=[early_stopping]
)
model.save_weights("siamese_weights.h5")
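The helper utiltiy_functions.eucl_distance is not shown above. A typical Euclidean-distance function used with a Keras Lambda layer in a Siamese network looks roughly like this (a sketch of the usual pattern, not necessarily the exact helper used in the script):

from tensorflow.keras import backend as K

def eucl_distance(vectors):
    # vectors is the list [embedding_1, embedding_2] passed to the Lambda layer
    emb_1, emb_2 = vectors
    sum_squared = K.sum(K.square(emb_1 - emb_2), axis=1, keepdims=True)
    # K.epsilon() avoids sqrt(0), whose gradient is undefined
    return K.sqrt(K.maximum(sum_squared, K.epsilon()))

This returns a tensor of shape (None, 1), which matches the lambda layer's output shape in the model summary below.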
Output:
2022-07-01 12:33:55.261913: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-01 12:33:55.262999: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Reading WAV files...
Splitting WAV files...
Building training pairs...
Building test pairs...
Building evaluation pairs...
Loading precomputed features from files...
2022-07-01 12:34:56.888225: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-01 12:34:56.895846: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2022-07-01 12:34:56.896817: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2022-07-01 12:34:56.897344: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2022-07-01 12:34:56.898131: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2022-07-01 12:34:56.898859: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2022-07-01 12:34:56.900598: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2022-07-01 12:34:56.903538: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2022-07-01 12:34:56.904142: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-07-01 12:34:56.940548: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-01 12:34:57.018051: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 384614400 exceeds 10% of free system memory.
2022-07-01 12:34:57.230756: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 384614400 exceeds 10% of free system memory.
2022-07-01 12:34:57.592132: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 3076915200 exceeds 10% of free system memory.
2022-07-01 12:35:06.909754: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 3076915200 exceeds 10% of free system memory.
Building the Siamese network...
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 64, 626, 3)] 0
block1_conv1 (Conv2D) (None, 64, 626, 64) 1792
block1_conv2 (Conv2D) (None, 64, 626, 64) 36928
block1_pool (MaxPooling2D) (None, 32, 313, 64) 0
block2_conv1 (Conv2D) (None, 32, 313, 128) 73856
block2_conv2 (Conv2D) (None, 32, 313, 128) 147584
block2_pool (MaxPooling2D) (None, 16, 156, 128) 0
block3_conv1 (Conv2D) (None, 16, 156, 256) 295168
block3_conv2 (Conv2D) (None, 16, 156, 256) 590080
block3_conv3 (Conv2D) (None, 16, 156, 256) 590080
block3_pool (MaxPooling2D) (None, 8, 78, 256) 0
block4_conv1 (Conv2D) (None, 8, 78, 512) 1180160
block4_conv2 (Conv2D) (None, 8, 78, 512) 2359808
block4_conv3 (Conv2D) (None, 8, 78, 512) 2359808
block4_pool (MaxPooling2D) (None, 4, 39, 512) 0
block5_conv1 (Conv2D) (None, 4, 39, 512) 2359808
block5_conv2 (Conv2D) (None, 4, 39, 512) 2359808
block5_conv3 (Conv2D) (None, 4, 39, 512) 2359808
block5_pool (MaxPooling2D) (None, 2, 19, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
None
2022-07-01 12:35:22.586993: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 318767104 exceeds 10% of free system memory.
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 64, 626, 3) 0 []
]
input_2 (InputLayer) [(None, 64, 626, 3) 0 []
]
vgg16 (Functional) (None, 2, 19, 512) 14714688 ['input_1[0][0]',
'input_2[0][0]']
flatten (Flatten) (None, 19456) 0 ['vgg16[0][0]',
'vgg16[1][0]']
dense (Dense) (None, 4096) 79695872 ['flatten[0][0]',
'flatten[1][0]']
dense_1 (Dense) (None, 4096) 16781312 ['dense[0][0]',
'dense[1][0]']
dense_2 (Dense) (None, 512) 2097664 ['dense_1[0][0]',
'dense_1[1][0]']
lambda (Lambda) (None, 1) 0 ['dense_2[0][0]',
'dense_2[1][0]']
dense_3 (Dense) (None, 1) 2 ['lambda[0][0]']
==================================================================================================
Total params: 113,289,538
Trainable params: 113,289,538
Non-trainable params: 0
__________________________________________________________________________________________________
None
Training the Siamese network...
Training the Siamese model.
Epoch 1/10
1/160 [..............................] - ETA: 1:43:33 - loss: 1.6736 - accuracy: 0.5000
156/160 [============================>.] - ETA: 2:02 - loss: 0.7549 - accuracy: 0.4622
157/160 [============================>.] - ETA: 1:31 - loss: 0.7550 - accuracy: 0.4608
158/160 [============================>.] - ETA: 1:01 - loss: 0.7546 - accuracy: 0.4604
159/160 [============================>.] - ETA: 30s - loss: 0.7542 - accuracy: 0.4613
160/160 [==============================] - ETA: 0s - loss: 0.7538 - accuracy: 0.4619
160/160 [==============================] - 5059s 32s/step - loss: 0.7538 - accuracy: 0.4619 - val_loss: 0.7172 - val_accuracy: 0.4725
Epoch 2/10
1/160 [..............................] - ETA: 1:20:48 - loss: 0.7224 - accuracy: 0.4500
2/160 [..............................] - ETA: 1:19:53 - loss: 0.7171 - accuracy: 0.4500
3/160 [..............................] - ETA: 1:19:26 - loss: 0.7145 - accuracy: 0.4500
4/160 [..............................] - ETA: 1:19:04 - loss: 0.7090 - accuracy: 0.4875
5/160 [..............................] - ETA: 1:18:32 - loss: 0.7086 - accuracy: 0.4600
6/160 [>.............................] - ETA: 1:18:34 - loss: 0.7055 - accuracy: 0.4750
155/160 [============================>.] - ETA: 2:33 - loss: 0.7006 - accuracy: 0.4677
156/160 [============================>.] - ETA: 2:02 - loss: 0.7005 - accuracy: 0.4683
157/160 [============================>.] - ETA: 1:32 - loss: 0.7005 - accuracy: 0.4688
158/160 [============================>.] - ETA: 1:01 - loss: 0.7004 - accuracy: 0.4690
159/160 [============================>.] - ETA: 30s - loss: 0.7004 - accuracy: 0.4682
160/160 [==============================] - ETA: 0s - loss: 0.7003 - accuracy: 0.4694
160/160 [==============================] - 5075s 32s/step - loss: 0.7003 - accuracy: 0.4694 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 3/10
1/160 [..............................] - ETA: 1:21:04 - loss: 0.6919 - accuracy: 0.5500
2/160 [..............................] - ETA: 1:20:37 - loss: 0.6933 - accuracy: 0.5000
3/160 [..............................] - ETA: 1:19:54 - loss: 0.6932 - accuracy: 0.5000
4/160 [..............................] - ETA: 1:19:19 - loss: 0.6940 - accuracy: 0.4750
5/160 [..............................] - ETA: 1:18:47 - loss: 0.6935 - accuracy: 0.4900
6/160 [>.............................] - ETA: 1:18:13 - loss: 0.6932 - accuracy: 0.5000
62/160 [==========>...................] - ETA: 49:52 - loss: 0.6936 - accuracy: 0.4839
63/160 [==========>...................] - ETA: 49:21 - loss: 0.6937 - accuracy: 0.4825
64/160 [===========>..................] - ETA: 48:50 - loss: 0.6936 - accuracy: 0.4844
65/160 [===========>..................] - ETA: 48:20 - loss: 0.6936 - accuracy: 0.4854
66/160 [===========>..................] - ETA: 47:49 - loss: 0.6936 - accuracy: 0.4848
67/160 [===========>..................] - ETA: 47:18 - loss: 0.6936 - accuracy: 0.4836
68/160 [===========>..................] - ETA: 46:48 - loss: 0.6936 - accuracy: 0.4838
69/160 [===========>..................] - ETA: 46:17 - loss: 0.6936 - accuracy: 0.4855
The accuracy won't change, while the loss slowly decreases. I've tried training it for several hours.
I've already tried different losses, such as contrastive loss, and different networks, such as MobileNet or VGGish.
It's always stuck at 50%.
I hope you can help me. Since this is my first post here, feel free to ask more questions.
Solution
I was able to fix this by changing the activation function of the last layer from sigmoid to relu:
#Output Layer
outputs = Dense(1, activation="relu")(distance)
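As a follow-up sanity check, the trained model can also be evaluated on the held-out evaluation pairs created at the beginning of the script. This is only a sketch under the assumption that the evaluation features are built the same way as the train/test features; the feat_eval_* arrays and the .npy file names are hypothetical:

# Hypothetical evaluation-set check, built the same way as the train/test features
feat_eval_1 = get_librosa_melspecs(audio_data_eval_1, "features_vgg_eval1.npy")
feat_eval_2 = get_librosa_melspecs(audio_data_eval_2, "features_vgg_eval2.npy")
feat_eval_1 = np.expand_dims(feat_eval_1, 3)
feat_eval_2 = np.expand_dims(feat_eval_2, 3)
n_eval = feat_eval_1.shape[0]
feat_eval_1 = tf.broadcast_to(feat_eval_1, (n_eval, 64, 626, 3))
feat_eval_2 = tf.broadcast_to(feat_eval_2, (n_eval, 64, 626, 3))

loss, acc = model.evaluate([feat_eval_1, feat_eval_2], pair_labels_eval, batch_size=BATCH_SIZE)
print(f"Evaluation accuracy: {acc:.3f}")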
Answered By - logame