Issue
I implemented a custom BERT binary classification model class by adding a classifier layer on top of the BERT model (attached below). However, the accuracy/metrics differ significantly from what I get when training with the official BertForSequenceClassification model, which makes me wonder whether I am missing something in my class.
One doubt I have: when loading the official BertForSequenceClassification with from_pretrained, are the classifier weights also initialized from the pretrained model, or are they randomly initialized? In my custom class they are randomly initialized.
import torch.nn as nn
from transformers import AutoConfig, AutoModel

class MyCustomBertClassification(nn.Module):
    def __init__(self, num_labels, hidden_dropout_prob, encoder='bert-base-uncased'):
        super(MyCustomBertClassification, self).__init__()
        self.config = AutoConfig.from_pretrained(encoder)
        self.encoder = AutoModel.from_config(self.config)
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

    def forward(self, input_sent):
        outputs = self.encoder(input_ids=input_sent['input_ids'],
                               attention_mask=input_sent['attention_mask'],
                               token_type_ids=input_sent['token_type_ids'],
                               return_dict=True)
        pooled_output = self.dropout(outputs[1])
        # for both tasks
        logits = self.classifier(pooled_output)
        return logits
Solution
Each model tells you via a warning message which layers are randomly initialized when you use the method from_pretrained:
from transformers import BertForSequenceClassification
b = BertForSequenceClassification.from_pretrained('bert-base-uncased')
Output:
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
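As a quick sanity check (a minimal sketch, not part of the original answer), you can load the model twice and compare weights: the encoder weights come from the checkpoint and therefore match across loads, while the classifier head is initialized randomly each time and therefore differs:
import torch
from transformers import BertForSequenceClassification

m1 = BertForSequenceClassification.from_pretrained('bert-base-uncased')
m2 = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Pretrained encoder weights are identical across loads.
print(torch.equal(m1.bert.embeddings.word_embeddings.weight,
                  m2.bert.embeddings.word_embeddings.weight))  # True
# The classifier head is newly (randomly) initialized on every load.
print(torch.equal(m1.classifier.weight, m2.classifier.weight))  # False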
The difference between your implementation and BertForSequenceClassification is that you do not use any pretrained weights at all: the method from_config does not load the pretrained weights from a state_dict, it only builds a randomly initialized model from the configuration:
import torch
from transformers import AutoModelForSequenceClassification, AutoConfig
b2 = AutoModelForSequenceClassification.from_config(AutoConfig.from_pretrained('bert-base-uncased'))
b3 = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
print("Does from_config provides pretrained weights: {}".format(torch.equal(b.bert.embeddings.word_embeddings.weight, b2.base_model.embeddings.word_embeddings.weight)))
print("Does from_pretrained provides pretrained weights: {}".format(torch.equal(b.bert.embeddings.word_embeddings.weight, b3.base_model.embeddings.word_embeddings.weight)))
Output:
Does from_config provide pretrained weights: False
Does from_pretrained provide pretrained weights: True
Therefore you probably want to change your class to:
class MyCustomBertClassification(nn.Module):
    def __init__(self, encoder='bert-base-uncased',
                 num_labels=2,
                 hidden_dropout_prob=0.1):
        super(MyCustomBertClassification, self).__init__()
        self.config = AutoConfig.from_pretrained(encoder)
        # from_pretrained (instead of from_config) loads the pretrained encoder weights
        self.encoder = AutoModel.from_pretrained(encoder)
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

    def forward(self, input_sent):
        outputs = self.encoder(input_ids=input_sent['input_ids'],
                               attention_mask=input_sent['attention_mask'],
                               token_type_ids=input_sent['token_type_ids'],
                               return_dict=True)
        pooled_output = self.dropout(outputs[1])
        # for both tasks
        logits = self.classifier(pooled_output)
        return logits
myB = MyCustomBertClassification()
print(torch.equal(b.bert.embeddings.word_embeddings.weight, myB.encoder.embeddings.word_embeddings.weight))
Output:
True
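As an aside (not part of the original answer), if all you need is a linear head on top of the pooled output, the official class already implements the same architecture; a sketch, assuming the standard config overrides passed through from_pretrained:
from transformers import AutoModelForSequenceClassification

# Pretrained encoder plus a randomly initialized classification head;
# num_labels and hidden_dropout_prob are forwarded to the model config.
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2,
    hidden_dropout_prob=0.1,
)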
Answered By - cronoik