Issue
I use the following classification model from Huggingface:
model = AutoModelForSequenceClassification.from_pretrained("dbmdz/bert-base-german-cased", num_labels=2).to(device)
As I understand, this adds a dense layer at the end of the pre-trained model which has 2 output nodes. But are all the pre-trained layers before that frozen? Or are they also updated when fine-tuning? I can't find information about that in the docs...
So do I still have to do something like this?:
for param in model.bert.parameters():
param.requires_grad = False
Solution
They are not frozen. All parameters are trainable by default. You can also check that with:
for name, param in model.named_parameters():
print(name, param.requires_grad)
Output:
bert.embeddings.word_embeddings.weight True
bert.embeddings.position_embeddings.weight True
bert.embeddings.token_type_embeddings.weight True
bert.embeddings.LayerNorm.weight True
bert.embeddings.LayerNorm.bias True
bert.encoder.layer.0.attention.self.query.weight True
bert.encoder.layer.0.attention.self.query.bias True
bert.encoder.layer.0.attention.self.key.weight True
bert.encoder.layer.0.attention.self.key.bias True
bert.encoder.layer.0.attention.self.value.weight True
bert.encoder.layer.0.attention.self.value.bias True
bert.encoder.layer.0.attention.output.dense.weight True
bert.encoder.layer.0.attention.output.dense.bias True
bert.encoder.layer.0.attention.output.LayerNorm.weight True
bert.encoder.layer.0.attention.output.LayerNorm.bias True
bert.encoder.layer.0.intermediate.dense.weight True
bert.encoder.layer.0.intermediate.dense.bias True
bert.encoder.layer.0.output.dense.weight True
bert.encoder.layer.0.output.dense.bias True
bert.encoder.layer.0.output.LayerNorm.weight True
bert.encoder.layer.0.output.LayerNorm.bias True
bert.encoder.layer.1.attention.self.query.weight True
bert.encoder.layer.1.attention.self.query.bias True
bert.encoder.layer.1.attention.self.key.weight True
bert.encoder.layer.1.attention.self.key.bias True
bert.encoder.layer.1.attention.self.value.weight True
bert.encoder.layer.1.attention.self.value.bias True
bert.encoder.layer.1.attention.output.dense.weight True
bert.encoder.layer.1.attention.output.dense.bias True
bert.encoder.layer.1.attention.output.LayerNorm.weight True
bert.encoder.layer.1.attention.output.LayerNorm.bias True
bert.encoder.layer.1.intermediate.dense.weight True
bert.encoder.layer.1.intermediate.dense.bias True
bert.encoder.layer.1.output.dense.weight True
bert.encoder.layer.1.output.dense.bias True
bert.encoder.layer.1.output.LayerNorm.weight True
bert.encoder.layer.1.output.LayerNorm.bias True
bert.encoder.layer.2.attention.self.query.weight True
bert.encoder.layer.2.attention.self.query.bias True
bert.encoder.layer.2.attention.self.key.weight True
bert.encoder.layer.2.attention.self.key.bias True
bert.encoder.layer.2.attention.self.value.weight True
bert.encoder.layer.2.attention.self.value.bias True
bert.encoder.layer.2.attention.output.dense.weight True
bert.encoder.layer.2.attention.output.dense.bias True
bert.encoder.layer.2.attention.output.LayerNorm.weight True
bert.encoder.layer.2.attention.output.LayerNorm.bias True
bert.encoder.layer.2.intermediate.dense.weight True
bert.encoder.layer.2.intermediate.dense.bias True
bert.encoder.layer.2.output.dense.weight True
bert.encoder.layer.2.output.dense.bias True
bert.encoder.layer.2.output.LayerNorm.weight True
bert.encoder.layer.2.output.LayerNorm.bias True
bert.encoder.layer.3.attention.self.query.weight True
bert.encoder.layer.3.attention.self.query.bias True
bert.encoder.layer.3.attention.self.key.weight True
bert.encoder.layer.3.attention.self.key.bias True
bert.encoder.layer.3.attention.self.value.weight True
bert.encoder.layer.3.attention.self.value.bias True
bert.encoder.layer.3.attention.output.dense.weight True
bert.encoder.layer.3.attention.output.dense.bias True
bert.encoder.layer.3.attention.output.LayerNorm.weight True
bert.encoder.layer.3.attention.output.LayerNorm.bias True
bert.encoder.layer.3.intermediate.dense.weight True
bert.encoder.layer.3.intermediate.dense.bias True
bert.encoder.layer.3.output.dense.weight True
bert.encoder.layer.3.output.dense.bias True
bert.encoder.layer.3.output.LayerNorm.weight True
bert.encoder.layer.3.output.LayerNorm.bias True
bert.encoder.layer.4.attention.self.query.weight True
bert.encoder.layer.4.attention.self.query.bias True
bert.encoder.layer.4.attention.self.key.weight True
bert.encoder.layer.4.attention.self.key.bias True
bert.encoder.layer.4.attention.self.value.weight True
bert.encoder.layer.4.attention.self.value.bias True
bert.encoder.layer.4.attention.output.dense.weight True
bert.encoder.layer.4.attention.output.dense.bias True
bert.encoder.layer.4.attention.output.LayerNorm.weight True
bert.encoder.layer.4.attention.output.LayerNorm.bias True
bert.encoder.layer.4.intermediate.dense.weight True
bert.encoder.layer.4.intermediate.dense.bias True
bert.encoder.layer.4.output.dense.weight True
bert.encoder.layer.4.output.dense.bias True
bert.encoder.layer.4.output.LayerNorm.weight True
bert.encoder.layer.4.output.LayerNorm.bias True
bert.encoder.layer.5.attention.self.query.weight True
bert.encoder.layer.5.attention.self.query.bias True
bert.encoder.layer.5.attention.self.key.weight True
bert.encoder.layer.5.attention.self.key.bias True
bert.encoder.layer.5.attention.self.value.weight True
bert.encoder.layer.5.attention.self.value.bias True
bert.encoder.layer.5.attention.output.dense.weight True
bert.encoder.layer.5.attention.output.dense.bias True
bert.encoder.layer.5.attention.output.LayerNorm.weight True
bert.encoder.layer.5.attention.output.LayerNorm.bias True
bert.encoder.layer.5.intermediate.dense.weight True
bert.encoder.layer.5.intermediate.dense.bias True
bert.encoder.layer.5.output.dense.weight True
bert.encoder.layer.5.output.dense.bias True
bert.encoder.layer.5.output.LayerNorm.weight True
bert.encoder.layer.5.output.LayerNorm.bias True
bert.encoder.layer.6.attention.self.query.weight True
bert.encoder.layer.6.attention.self.query.bias True
bert.encoder.layer.6.attention.self.key.weight True
bert.encoder.layer.6.attention.self.key.bias True
bert.encoder.layer.6.attention.self.value.weight True
bert.encoder.layer.6.attention.self.value.bias True
bert.encoder.layer.6.attention.output.dense.weight True
bert.encoder.layer.6.attention.output.dense.bias True
bert.encoder.layer.6.attention.output.LayerNorm.weight True
bert.encoder.layer.6.attention.output.LayerNorm.bias True
bert.encoder.layer.6.intermediate.dense.weight True
bert.encoder.layer.6.intermediate.dense.bias True
bert.encoder.layer.6.output.dense.weight True
bert.encoder.layer.6.output.dense.bias True
bert.encoder.layer.6.output.LayerNorm.weight True
bert.encoder.layer.6.output.LayerNorm.bias True
bert.encoder.layer.7.attention.self.query.weight True
bert.encoder.layer.7.attention.self.query.bias True
bert.encoder.layer.7.attention.self.key.weight True
bert.encoder.layer.7.attention.self.key.bias True
bert.encoder.layer.7.attention.self.value.weight True
bert.encoder.layer.7.attention.self.value.bias True
bert.encoder.layer.7.attention.output.dense.weight True
bert.encoder.layer.7.attention.output.dense.bias True
bert.encoder.layer.7.attention.output.LayerNorm.weight True
bert.encoder.layer.7.attention.output.LayerNorm.bias True
bert.encoder.layer.7.intermediate.dense.weight True
bert.encoder.layer.7.intermediate.dense.bias True
bert.encoder.layer.7.output.dense.weight True
bert.encoder.layer.7.output.dense.bias True
bert.encoder.layer.7.output.LayerNorm.weight True
bert.encoder.layer.7.output.LayerNorm.bias True
bert.encoder.layer.8.attention.self.query.weight True
bert.encoder.layer.8.attention.self.query.bias True
bert.encoder.layer.8.attention.self.key.weight True
bert.encoder.layer.8.attention.self.key.bias True
bert.encoder.layer.8.attention.self.value.weight True
bert.encoder.layer.8.attention.self.value.bias True
bert.encoder.layer.8.attention.output.dense.weight True
bert.encoder.layer.8.attention.output.dense.bias True
bert.encoder.layer.8.attention.output.LayerNorm.weight True
bert.encoder.layer.8.attention.output.LayerNorm.bias True
bert.encoder.layer.8.intermediate.dense.weight True
bert.encoder.layer.8.intermediate.dense.bias True
bert.encoder.layer.8.output.dense.weight True
bert.encoder.layer.8.output.dense.bias True
bert.encoder.layer.8.output.LayerNorm.weight True
bert.encoder.layer.8.output.LayerNorm.bias True
bert.encoder.layer.9.attention.self.query.weight True
bert.encoder.layer.9.attention.self.query.bias True
bert.encoder.layer.9.attention.self.key.weight True
bert.encoder.layer.9.attention.self.key.bias True
bert.encoder.layer.9.attention.self.value.weight True
bert.encoder.layer.9.attention.self.value.bias True
bert.encoder.layer.9.attention.output.dense.weight True
bert.encoder.layer.9.attention.output.dense.bias True
bert.encoder.layer.9.attention.output.LayerNorm.weight True
bert.encoder.layer.9.attention.output.LayerNorm.bias True
bert.encoder.layer.9.intermediate.dense.weight True
bert.encoder.layer.9.intermediate.dense.bias True
bert.encoder.layer.9.output.dense.weight True
bert.encoder.layer.9.output.dense.bias True
bert.encoder.layer.9.output.LayerNorm.weight True
bert.encoder.layer.9.output.LayerNorm.bias True
bert.encoder.layer.10.attention.self.query.weight True
bert.encoder.layer.10.attention.self.query.bias True
bert.encoder.layer.10.attention.self.key.weight True
bert.encoder.layer.10.attention.self.key.bias True
bert.encoder.layer.10.attention.self.value.weight True
bert.encoder.layer.10.attention.self.value.bias True
bert.encoder.layer.10.attention.output.dense.weight True
bert.encoder.layer.10.attention.output.dense.bias True
bert.encoder.layer.10.attention.output.LayerNorm.weight True
bert.encoder.layer.10.attention.output.LayerNorm.bias True
bert.encoder.layer.10.intermediate.dense.weight True
bert.encoder.layer.10.intermediate.dense.bias True
bert.encoder.layer.10.output.dense.weight True
bert.encoder.layer.10.output.dense.bias True
bert.encoder.layer.10.output.LayerNorm.weight True
bert.encoder.layer.10.output.LayerNorm.bias True
bert.encoder.layer.11.attention.self.query.weight True
bert.encoder.layer.11.attention.self.query.bias True
bert.encoder.layer.11.attention.self.key.weight True
bert.encoder.layer.11.attention.self.key.bias True
bert.encoder.layer.11.attention.self.value.weight True
bert.encoder.layer.11.attention.self.value.bias True
bert.encoder.layer.11.attention.output.dense.weight True
bert.encoder.layer.11.attention.output.dense.bias True
bert.encoder.layer.11.attention.output.LayerNorm.weight True
bert.encoder.layer.11.attention.output.LayerNorm.bias True
bert.encoder.layer.11.intermediate.dense.weight True
bert.encoder.layer.11.intermediate.dense.bias True
bert.encoder.layer.11.output.dense.weight True
bert.encoder.layer.11.output.dense.bias True
bert.encoder.layer.11.output.LayerNorm.weight True
bert.encoder.layer.11.output.LayerNorm.bias True
bert.pooler.dense.weight True
bert.pooler.dense.bias True
classifier.weight True
classifier.bias True
Answered By - cronoik
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.