Issue
I implemented a custom BERT binary classification model class by adding a classifier layer on top of the BERT model (attached below). However, the accuracy/metrics differ significantly from what I get when training with the official BertForSequenceClassification model, which makes me wonder whether I am missing something in my class.
One doubt I have: when loading the official BertForSequenceClassification with from_pretrained, are the classifier weights also initialized from the pretrained model, or are they randomly initialized? In my custom class they are randomly initialized.
import torch.nn as nn
from transformers import AutoConfig, AutoModel

class MyCustomBertClassification(nn.Module):
    def __init__(self, num_labels, hidden_dropout_prob, encoder='bert-base-uncased'):
        super(MyCustomBertClassification, self).__init__()
        self.config = AutoConfig.from_pretrained(encoder)
        self.encoder = AutoModel.from_config(self.config)
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

    def forward(self, input_sent):
        outputs = self.encoder(input_ids=input_sent['input_ids'],
                               attention_mask=input_sent['attention_mask'],
                               token_type_ids=input_sent['token_type_ids'],
                               return_dict=True)
        pooled_output = self.dropout(outputs[1])
        # for both tasks
        logits = self.classifier(pooled_output)
        return logits
Solution
Each model tells you via a warning message which layers are randomly initialized when you use the method from_pretrained:
from transformers import BertForSequenceClassification
b = BertForSequenceClassification.from_pretrained('bert-base-uncased')
Output:
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
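As a quick sanity check (a minimal sketch, not part of the original answer), you can load the model twice and compare weights: the encoder weights come from the checkpoint and therefore match across loads, while the classifier head is initialized randomly each time and therefore differs:
import torch
from transformers import BertForSequenceClassification

m1 = BertForSequenceClassification.from_pretrained('bert-base-uncased')
m2 = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Pretrained encoder weights are identical across loads.
print(torch.equal(m1.bert.embeddings.word_embeddings.weight,
                  m2.bert.embeddings.word_embeddings.weight))  # True
# The classifier head is newly (randomly) initialized on every load.
print(torch.equal(m1.classifier.weight, m2.classifier.weight))  # False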
The difference between your implementation and BertForSequenceClassification is that you do not use any pretrained weights at all: the method from_config does not load the pretrained weights from a state_dict, it only builds a randomly initialized model from the configuration:
import torch
from transformers import AutoModelForSequenceClassification, AutoConfig
b2 = AutoModelForSequenceClassification.from_config(AutoConfig.from_pretrained('bert-base-uncased'))
b3 = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
print("Does from_config provides pretrained weights: {}".format(torch.equal(b.bert.embeddings.word_embeddings.weight, b2.base_model.embeddings.word_embeddings.weight)))
print("Does from_pretrained provides pretrained weights: {}".format(torch.equal(b.bert.embeddings.word_embeddings.weight, b3.base_model.embeddings.word_embeddings.weight)))
Output:
Does from_config provide pretrained weights: False
Does from_pretrained provide pretrained weights: True
Therefore you probably want to change your class to:
class MyCustomBertClassification(nn.Module):
    def __init__(self, encoder='bert-base-uncased',
                 num_labels=2,
                 hidden_dropout_prob=0.1):
        super(MyCustomBertClassification, self).__init__()
        self.config = AutoConfig.from_pretrained(encoder)
        # from_pretrained (instead of from_config) loads the pretrained encoder weights
        self.encoder = AutoModel.from_pretrained(encoder)
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

    def forward(self, input_sent):
        outputs = self.encoder(input_ids=input_sent['input_ids'],
                               attention_mask=input_sent['attention_mask'],
                               token_type_ids=input_sent['token_type_ids'],
                               return_dict=True)
        pooled_output = self.dropout(outputs[1])
        # for both tasks
        logits = self.classifier(pooled_output)
        return logits
myB = MyCustomBertClassification()
print(torch.equal(b.bert.embeddings.word_embeddings.weight, myB.encoder.embeddings.word_embeddings.weight))
Output:
True
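As an aside (not part of the original answer), if all you need is a linear head on top of the pooled output, the official class already implements the same architecture; a sketch, assuming the standard config overrides passed through from_pretrained:
from transformers import AutoModelForSequenceClassification

# Pretrained encoder plus a randomly initialized classification head;
# num_labels and hidden_dropout_prob are forwarded to the model config.
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2,
    hidden_dropout_prob=0.1,
)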
Answered By - cronoik