Issue
I was following the Hugging Face tutorial on training a multiple-choice QA model and trained my model with:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_qa["train"],
    eval_dataset=tokenized_qa["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),  # custom collator defined in the tutorial
    compute_metrics=compute_metrics,
)
trainer.train()
Afterwards I can load the model with:
# load trained model for testing
model = AutoModelForMultipleChoice.from_pretrained('results/checkpoint-1000')
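Since a tokenizer was passed to the Trainer, the checkpoint directory should also contain the tokenizer files, so everything needed for inference can be reloaded from one place; a minimal sketch, assuming the same checkpoint path:
from transformers import AutoModelForMultipleChoice, AutoTokenizer

model = AutoModelForMultipleChoice.from_pretrained('results/checkpoint-1000')
tokenizer = AutoTokenizer.from_pretrained('results/checkpoint-1000')  # saved alongside the model weights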
But how can I test it on the testing dataset?
The dataset looks like:
DatasetDict({
    train: Dataset({
        features: ['id', 'sent1', 'sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label', 'input_ids', 'attention_mask'],
        num_rows: 10178
    })
    test: Dataset({
        features: ['id', 'sent1', 'sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label', 'input_ids', 'attention_mask'],
        num_rows: 1273
    })
    validation: Dataset({
        features: ['id', 'sent1', 'sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label', 'input_ids', 'attention_mask'],
        num_rows: 1272
    })
})
I have quite a bit of code, so if more information is needed, please let me know.
Solution
Okay, I figured it out and am adding an answer for completeness. It turns out the training arguments are not needed when building a Trainer just for prediction:
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)
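When args is omitted, Trainer builds a default TrainingArguments internally (with output_dir="tmp_trainer"). If you want control over, say, the evaluation batch size during prediction, you can still pass a minimal one; a sketch, assuming the same setup as in the question:
eval_args = TrainingArguments(
    output_dir="./results",           # only used for any files the Trainer writes
    per_device_eval_batch_size=8,     # batch size used by trainer.predict()
)
trainer = Trainer(
    model=model,
    args=eval_args,
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)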
Put the model in evaluation mode (trainer.predict() does this internally, so the explicit call is just a safeguard):
model.eval()  # testing mode: dropout modules are deactivated
And then call:
trainer.predict(tokenized_qa["test"])
PredictionOutput(predictions=array([[-1.284791 , -1.2848296, -1.2848794, -1.2848705],
[-1.2848867, -1.2849237, -1.2848233, -1.2848446],
[-1.284851 , -1.2847253, -1.2849066, -1.2848204],
...,
[-1.284877 , -1.2848783, -1.284853 , -1.284804 ],
[-1.2848401, -1.2848557, -1.2847972, -1.2848665],
[-1.2848748, -1.2848799, -1.2848252, -1.2848618]], dtype=float32), label_ids=array([1, 3, 1, ..., 1, 2, 2]), metrics={'test_loss': 1.386292576789856, 'test_accuracy': 0.25727773406766324, 'test_runtime': 16.0096, 'test_samples_per_second': 79.39, 'test_steps_per_second': 9.932})
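To turn the raw logits into per-question answers, take the argmax over the four ending scores; a minimal sketch using the fields of the PredictionOutput above:
import numpy as np

output = trainer.predict(tokenized_qa["test"])
preds = np.argmax(output.predictions, axis=1)   # index of the highest-scoring ending per question
accuracy = (preds == output.label_ids).mean()   # matches the test_accuracy metric above
print(preds[:10], accuracy)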
Answered By - Penguin