Issue
Currently, I'm building a new transformer-based model with huggingface-transformers, where the attention layer is different from the original one. I used run_glue.py to check the performance of my model on the GLUE benchmark. However, I found that the Trainer class of huggingface-transformers saves all the checkpoints I set, where I can set the maximum number of checkpoints to save. However, I want to save only the weights (or other state, such as the optimizer) with the best performance on the validation dataset, and the current Trainer class doesn't seem to provide such a feature. (If we set the maximum number of checkpoints, it removes the older checkpoints, not the ones with worse performance.) Someone already asked the same question on GitHub, but I can't figure out how to modify the script to do what I want. Currently, I'm thinking about making a custom Trainer class that inherits from the original one and changes the train() method, and it would be great if there's an easy and simple way to do this. Thanks in advance.
Solution
You may try the following TrainingArguments parameters for the Hugging Face Trainer:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='/content/drive/results',  # output directory
    do_predict=True,
    num_train_epochs=3,                   # total number of training epochs
    per_device_train_batch_size=4,        # batch size per device during training
    per_device_eval_batch_size=2,         # batch size for evaluation
    warmup_steps=1000,                    # number of warmup steps for the learning rate scheduler
    save_steps=1000,                      # save a checkpoint every 1000 steps
    save_total_limit=10,                  # keep at most 10 checkpoints on disk
    load_best_model_at_end=True,          # reload the best checkpoint (by eval loss) when training ends
    weight_decay=0.01,                    # strength of weight decay
    logging_dir='./logs',                 # directory for storing logs
    logging_steps=1000,                   # eval_steps defaults to logging_steps; 0 would disable evaluation
    evaluate_during_training=True,        # replaced by evaluation_strategy="steps" in newer transformers versions
)
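For reference, here is a minimal sketch of how these arguments plug into the Trainer. The names model, train_dataset, eval_dataset, and compute_metrics are placeholders for your own objects; they are not defined in the original answer:

from transformers import Trainer

trainer = Trainer(
    model=model,                      # your custom transformer model (placeholder)
    args=training_args,               # the TrainingArguments defined above
    train_dataset=train_dataset,      # training split (placeholder)
    eval_dataset=eval_dataset,        # validation split used to pick the best checkpoint (placeholder)
    compute_metrics=compute_metrics,  # optional: maps an EvalPrediction to a dict of metrics (placeholder)
)
trainer.train()  # with load_best_model_at_end=True, the best checkpoint is reloaded here

By default the best checkpoint is chosen by evaluation loss; set metric_for_best_model (and greater_is_better) in TrainingArguments to select by another metric.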
There may be better ways to avoid keeping too many checkpoints while selecting the best model. As of now the Trainer cannot save only the best model: save_total_limit removes the oldest checkpoints, not the worst ones, but load_best_model_at_end=True tracks the best evaluation result and reloads that checkpoint once training finishes. If you want to skip saving checkpoints that did not improve, you can check whether each evaluation beats the previous best yourself, for example with a custom callback as sketched below.
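As one possible approach (not part of the original answer), newer versions of transformers (>= 3.1) expose a TrainerCallback API that lets you override the save decision after each evaluation. The sketch below is a hypothetical SaveOnImprovementCallback; the class name and the monitored metric are my own choices:

from transformers import TrainerCallback

class SaveOnImprovementCallback(TrainerCallback):
    """Hypothetical callback: allow a checkpoint save only when the
    monitored evaluation metric improves on the best value seen so far."""

    def __init__(self, metric_name='eval_loss', greater_is_better=False):
        self.metric_name = metric_name
        self.greater_is_better = greater_is_better
        self.best = None

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics is None or self.metric_name not in metrics:
            return control
        value = metrics[self.metric_name]
        improved = self.best is None or (
            value > self.best if self.greater_is_better else value < self.best
        )
        if improved:
            self.best = value
            control.should_save = True   # checkpoint this step
        else:
            control.should_save = False  # skip the checkpoint scheduled by save_steps
        return control

# Usage (other Trainer arguments omitted):
# trainer = Trainer(..., callbacks=[SaveOnImprovementCallback('eval_accuracy', greater_is_better=True)])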
Answered By - Shaina Raza