Issue
I have implemented the following metrics to look at the Precision and Recall of the classes I deem relevant.
metrics=[
    tf.keras.metrics.Recall(class_id=1, name='Bkwd_R'),
    tf.keras.metrics.Recall(class_id=2, name='Fwd_R'),
    tf.keras.metrics.Precision(class_id=1, name='Bkwd_P'),
    tf.keras.metrics.Precision(class_id=2, name='Fwd_P'),
]
How can I implement the same in TensorFlow 2.5 for the F1 score (i.e. specifically for class 1 and class 2, and not class 0) without a custom function?
Update
Using this metric setup:
tfa.metrics.F1Score(num_classes = 3, average = None, name = f1_name)
I get the following during training:
13367/13367 [==============================] 465s 34ms/step - loss: 0.1683 - f1_score: 0.5842 - val_loss: 0.0943 - val_f1_score: 0.3314
and when I do model.evaluate:
224/224 [==============================] - 11s 34ms/step - loss: 0.0665 - f1_score: 0.3325
and the resulting score is:
Score: [0.06653735041618347, array([0.99740255, 0. , 0. ], dtype=float32)]
The problem is that this is training based on the average, but I would like to train on the F1 score of each of the last two classes in the array (which are 0 in this case), or on a sensible average of the two.
Edit
I will accept a non-TensorFlow-specific function that gives the desired result (with the full function and its call during fit), but I was really hoping for something that uses the existing TensorFlow code, if it exists.
Solution
As is mentioned in David Harris' comment, a neural network model is trained on loss functions, not on metric scores. Losses help drive the model towards a solution to provide accurate labels via backpropagation. Metrics help to provide a comparable evaluation of that model's performance that are a lot more human-legible.
So, that being said, I feel like what you're saying in your question is: "there are three classes, and I want the model to care more about the last two of the three".
If that's the case, one approach you can take is to weight your samples by label. Let's say that you have labels in an array y_train:
import numpy as np
# Which classes do you want to focus on?
classes_i_care_about = [1, 2]
# Initialize all weights to 1.0
sample_weight = np.ones(shape=(len(y_train),))
# Give the classes you care about 50% more weight
# (assumes y_train holds integer class ids with shape (n,))
sample_weight[np.isin(y_train, classes_i_care_about)] = 1.5
...
model.fit(
    x=X_train,
    y=y_train,
    sample_weight=sample_weight,
    epochs=5
)
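The snippet above assumes integer labels, while the fuller example further down uses one-hot labels. A minimal sketch of a helper that covers both cases follows; the name make_sample_weight and its parameters focus_classes and focus_weight are hypothetical and not part of Keras, and it assumes labels are either 1-D integer ids or 2-D one-hot rows.

```python
import numpy as np


def make_sample_weight(y, focus_classes, focus_weight=1.5):
    """Return one weight per sample, boosting samples whose label is in focus_classes.

    Hypothetical helper: `y` is assumed to be either integer labels of shape (n,)
    or one-hot labels of shape (n, num_classes).
    """
    y = np.asarray(y)
    # Collapse one-hot rows to integer class ids; leave 1-D integer labels untouched.
    labels = y.argmax(axis=1) if y.ndim == 2 else y
    # Start every sample at weight 1.0, then raise the weight of the focus classes.
    weights = np.ones(len(labels), dtype=np.float32)
    weights[np.isin(labels, focus_classes)] = focus_weight
    return weights


# Integer labels
print(make_sample_weight([0, 1, 2, 0], focus_classes=[1, 2]))
# One-hot labels for the same four samples
print(make_sample_weight(np.eye(3)[[0, 1, 2, 0]], focus_classes=[1, 2]))
```

The returned array can be passed as sample_weight to model.fit in the same way as in the snippet above.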
This is the best advice I can offer without knowing more. If you're looking for other ways to get your model to do better on certain classes, more details would be useful, such as:
- What are the proportions of labels in your dataset?
- What is the last layer of your model architecture? Dense(3, activation="softmax")?
- What loss are you using?
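For the first question, one quick way to check label proportions is to count the integer labels. This is only a sketch: the y_train array below is made-up placeholder data, and it assumes integer class ids 0 to 2 (one-hot labels would need an argmax first).

```python
import numpy as np

# Placeholder labels for illustration only; substitute your own integer labels.
y_train = np.array([0, 0, 0, 0, 0, 0, 1, 2, 0, 0])

# Count how many samples fall into each of the three classes.
counts = np.bincount(y_train, minlength=3)
proportions = counts / counts.sum()

for class_id, (count, share) in enumerate(zip(counts, proportions)):
    print(f"class {class_id}: {count} samples ({share:.1%})")
```

If class 0 dominates, as the per-class scores in the question suggest it might, that would be consistent with the model predicting almost only class 0.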
Here's a more complete, reproducible example that shows what I'm talking about with the sample weights:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import tensorflow_addons as tfa
iris_data = load_iris() # load the iris dataset
x = iris_data.data
y_ = iris_data.target.reshape(-1, 1) # Convert data to a single column
# One Hot encode the class labels
encoder = OneHotEncoder(sparse=False)
y = encoder.fit_transform(y_)
# Split the data for training and testing
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.20)
# Build the model
def get_model():
    model = Sequential()
    model.add(Dense(10, input_shape=(4,), activation='relu', name='fc1'))
    model.add(Dense(10, activation='relu', name='fc2'))
    model.add(Dense(3, activation='softmax', name='output'))
    # Adam optimizer with learning rate of 0.001
    optimizer = Adam(learning_rate=0.001)
    model.compile(
        optimizer,
        loss='categorical_crossentropy',
        metrics=[
            'accuracy',
            # average=None reports one F1 score per class
            tfa.metrics.F1Score(
                num_classes=3,
                average=None,
            )
        ]
    )
    return model
model = get_model()
model.fit(
    train_x,
    train_y,
    verbose=2,
    batch_size=5,
    epochs=25,
)
results = model.evaluate(test_x, test_y)
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))
print('Final test F1 scores: {}'.format(results[2]))
Final test set loss: 0.585964
Final test set accuracy: 0.633333
Final test F1 scores: [1. 0.15384616 0.6206897 ]
Now, we add weight to classes 1 and 2:
sample_weight = np.ones(shape=(len(train_y),))
sample_weight[
    (train_y[:, 1] == 1) | (train_y[:, 2] == 1)
] = 1.5
model = get_model()
model.fit(
    train_x,
    train_y,
    sample_weight=sample_weight,
    verbose=2,
    batch_size=5,
    epochs=25,
)
results = model.evaluate(test_x, test_y)
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))
print('Final test F1 scores: {}'.format(results[2]))
Final test set loss: 0.437623
Final test set accuracy: 0.900000
Final test F1 scores: [1. 0.8571429 0.8571429]
Here, the model has put more emphasis on learning classes 1 and 2, and their respective F1 scores in this run have improved.
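If what you want to report is a single F1 number for classes 1 and 2 only, one option is to average just those two entries of the per-class array. The sketch below uses plain NumPy and the standard definition F1 = 2*TP / (2*TP + FP + FN); f1_per_class is a hypothetical helper, the labels are made up, and it is meant as an illustration rather than a drop-in replacement for tfa.metrics.F1Score.

```python
import numpy as np


def f1_per_class(y_true, y_pred, num_classes):
    """Per-class F1 from integer labels, using F1 = 2*TP / (2*TP + FP + FN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # predicted c, truly c
        fp = np.sum((y_pred == c) & (y_true != c))  # predicted c, truly another class
        fn = np.sum((y_pred != c) & (y_true == c))  # truly c, predicted another class
        denom = 2 * tp + fp + fn
        # A class that never appears in labels or predictions scores 0 here.
        scores[c] = 2 * tp / denom if denom > 0 else 0.0
    return scores


# Made-up labels and predictions for illustration
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

scores = f1_per_class(y_true, y_pred, num_classes=3)
# Average over the classes of interest only, ignoring class 0
focus_f1 = scores[[1, 2]].mean()
print(scores, focus_f1)
```

The same reduction can be applied to the array returned by model.evaluate above, for example np.asarray(results[2])[[1, 2]].mean(), assuming the F1 metric was created with average=None.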
Answered By - Bryan Dannowitz