Issue
I am training a Faster R-CNN neural network on the COCO dataset with PyTorch.
I have followed this tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
The training and evaluation output for epochs 6 and 7 is as follows:
Epoch: [6] [ 0/119] eta: 0:01:16 lr: 0.000050 loss: 0.3780 (0.3780) loss_classifier: 0.1290 (0.1290) loss_box_reg: 0.1848 (0.1848) loss_objectness: 0.0239 (0.0239) loss_rpn_box_reg: 0.0403 (0.0403) time: 0.6451 data: 0.1165 max mem: 3105
Epoch: [6] [ 10/119] eta: 0:01:13 lr: 0.000050 loss: 0.4129 (0.4104) loss_classifier: 0.1277 (0.1263) loss_box_reg: 0.2164 (0.2059) loss_objectness: 0.0244 (0.0309) loss_rpn_box_reg: 0.0487 (0.0473) time: 0.6770 data: 0.1253 max mem: 3105
Epoch: [6] [ 20/119] eta: 0:01:07 lr: 0.000050 loss: 0.4165 (0.4302) loss_classifier: 0.1277 (0.1290) loss_box_reg: 0.2180 (0.2136) loss_objectness: 0.0353 (0.0385) loss_rpn_box_reg: 0.0499 (0.0491) time: 0.6843 data: 0.1265 max mem: 3105
Epoch: [6] [ 30/119] eta: 0:01:00 lr: 0.000050 loss: 0.4205 (0.4228) loss_classifier: 0.1271 (0.1277) loss_box_reg: 0.2125 (0.2093) loss_objectness: 0.0334 (0.0374) loss_rpn_box_reg: 0.0499 (0.0484) time: 0.6819 data: 0.1274 max mem: 3105
Epoch: [6] [ 40/119] eta: 0:00:53 lr: 0.000050 loss: 0.4127 (0.4205) loss_classifier: 0.1209 (0.1265) loss_box_reg: 0.2102 (0.2085) loss_objectness: 0.0315 (0.0376) loss_rpn_box_reg: 0.0475 (0.0479) time: 0.6748 data: 0.1282 max mem: 3105
Epoch: [6] [ 50/119] eta: 0:00:46 lr: 0.000050 loss: 0.3973 (0.4123) loss_classifier: 0.1202 (0.1248) loss_box_reg: 0.1947 (0.2039) loss_objectness: 0.0315 (0.0366) loss_rpn_box_reg: 0.0459 (0.0470) time: 0.6730 data: 0.1297 max mem: 3105
Epoch: [6] [ 60/119] eta: 0:00:39 lr: 0.000050 loss: 0.3900 (0.4109) loss_classifier: 0.1206 (0.1248) loss_box_reg: 0.1876 (0.2030) loss_objectness: 0.0345 (0.0365) loss_rpn_box_reg: 0.0431 (0.0467) time: 0.6692 data: 0.1276 max mem: 3105
Epoch: [6] [ 70/119] eta: 0:00:33 lr: 0.000050 loss: 0.3984 (0.4085) loss_classifier: 0.1172 (0.1242) loss_box_reg: 0.2069 (0.2024) loss_objectness: 0.0328 (0.0354) loss_rpn_box_reg: 0.0458 (0.0464) time: 0.6707 data: 0.1252 max mem: 3105
Epoch: [6] [ 80/119] eta: 0:00:26 lr: 0.000050 loss: 0.4153 (0.4113) loss_classifier: 0.1178 (0.1246) loss_box_reg: 0.2123 (0.2036) loss_objectness: 0.0328 (0.0364) loss_rpn_box_reg: 0.0480 (0.0468) time: 0.6744 data: 0.1264 max mem: 3105
Epoch: [6] [ 90/119] eta: 0:00:19 lr: 0.000050 loss: 0.4294 (0.4107) loss_classifier: 0.1178 (0.1238) loss_box_reg: 0.2098 (0.2021) loss_objectness: 0.0418 (0.0381) loss_rpn_box_reg: 0.0495 (0.0466) time: 0.6856 data: 0.1302 max mem: 3105
Epoch: [6] [100/119] eta: 0:00:12 lr: 0.000050 loss: 0.4295 (0.4135) loss_classifier: 0.1171 (0.1235) loss_box_reg: 0.2124 (0.2034) loss_objectness: 0.0460 (0.0397) loss_rpn_box_reg: 0.0498 (0.0469) time: 0.6955 data: 0.1345 max mem: 3105
Epoch: [6] [110/119] eta: 0:00:06 lr: 0.000050 loss: 0.4126 (0.4117) loss_classifier: 0.1229 (0.1233) loss_box_reg: 0.2119 (0.2024) loss_objectness: 0.0430 (0.0394) loss_rpn_box_reg: 0.0481 (0.0466) time: 0.6822 data: 0.1306 max mem: 3105
Epoch: [6] [118/119] eta: 0:00:00 lr: 0.000050 loss: 0.4006 (0.4113) loss_classifier: 0.1171 (0.1227) loss_box_reg: 0.2028 (0.2028) loss_objectness: 0.0366 (0.0391) loss_rpn_box_reg: 0.0481 (0.0466) time: 0.6583 data: 0.1230 max mem: 3105
Epoch: [6] Total time: 0:01:20 (0.6760 s / it)
creating index...
index created!
Test: [ 0/59] eta: 0:00:15 model_time: 0.1188 (0.1188) evaluator_time: 0.0697 (0.0697) time: 0.2561 data: 0.0634 max mem: 3105
Test: [58/59] eta: 0:00:00 model_time: 0.1086 (0.1092) evaluator_time: 0.0439 (0.0607) time: 0.2361 data: 0.0629 max mem: 3105
Test: Total time: 0:00:14 (0.2378 s / it)
Averaged stats: model_time: 0.1086 (0.1092) evaluator_time: 0.0439 (0.0607)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.643
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.079
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.096
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Epoch: [7] [ 0/119] eta: 0:01:16 lr: 0.000050 loss: 0.3851 (0.3851) loss_classifier: 0.1334 (0.1334) loss_box_reg: 0.1845 (0.1845) loss_objectness: 0.0287 (0.0287) loss_rpn_box_reg: 0.0385 (0.0385) time: 0.6433 data: 0.1150 max mem: 3105
Epoch: [7] [ 10/119] eta: 0:01:12 lr: 0.000050 loss: 0.3997 (0.4045) loss_classifier: 0.1250 (0.1259) loss_box_reg: 0.1973 (0.2023) loss_objectness: 0.0292 (0.0303) loss_rpn_box_reg: 0.0479 (0.0459) time: 0.6692 data: 0.1252 max mem: 3105
Epoch: [7] [ 20/119] eta: 0:01:07 lr: 0.000050 loss: 0.4224 (0.4219) loss_classifier: 0.1250 (0.1262) loss_box_reg: 0.2143 (0.2101) loss_objectness: 0.0333 (0.0373) loss_rpn_box_reg: 0.0493 (0.0484) time: 0.6809 data: 0.1286 max mem: 3105
Epoch: [7] [ 30/119] eta: 0:01:00 lr: 0.000050 loss: 0.4120 (0.4140) loss_classifier: 0.1191 (0.1221) loss_box_reg: 0.2113 (0.2070) loss_objectness: 0.0357 (0.0374) loss_rpn_box_reg: 0.0506 (0.0475) time: 0.6834 data: 0.1316 max mem: 3105
Epoch: [7] [ 40/119] eta: 0:00:53 lr: 0.000050 loss: 0.4013 (0.4117) loss_classifier: 0.1118 (0.1210) loss_box_reg: 0.2079 (0.2063) loss_objectness: 0.0357 (0.0371) loss_rpn_box_reg: 0.0471 (0.0473) time: 0.6780 data: 0.1304 max mem: 3105
Epoch: [7] [ 50/119] eta: 0:00:46 lr: 0.000050 loss: 0.3911 (0.4035) loss_classifier: 0.1172 (0.1198) loss_box_reg: 0.1912 (0.2017) loss_objectness: 0.0341 (0.0356) loss_rpn_box_reg: 0.0449 (0.0464) time: 0.6768 data: 0.1314 max mem: 3105
Epoch: [7] [ 60/119] eta: 0:00:39 lr: 0.000050 loss: 0.3911 (0.4048) loss_classifier: 0.1186 (0.1213) loss_box_reg: 0.1859 (0.2013) loss_objectness: 0.0334 (0.0360) loss_rpn_box_reg: 0.0412 (0.0462) time: 0.6729 data: 0.1306 max mem: 3105
Epoch: [7] [ 70/119] eta: 0:00:33 lr: 0.000050 loss: 0.4046 (0.4030) loss_classifier: 0.1177 (0.1209) loss_box_reg: 0.2105 (0.2008) loss_objectness: 0.0359 (0.0354) loss_rpn_box_reg: 0.0462 (0.0459) time: 0.6718 data: 0.1282 max mem: 3105
Epoch: [7] [ 80/119] eta: 0:00:26 lr: 0.000050 loss: 0.4125 (0.4067) loss_classifier: 0.1187 (0.1221) loss_box_reg: 0.2105 (0.2022) loss_objectness: 0.0362 (0.0362) loss_rpn_box_reg: 0.0469 (0.0462) time: 0.6725 data: 0.1285 max mem: 3105
Epoch: [7] [ 90/119] eta: 0:00:19 lr: 0.000050 loss: 0.4289 (0.4068) loss_classifier: 0.1288 (0.1223) loss_box_reg: 0.2097 (0.2009) loss_objectness: 0.0434 (0.0375) loss_rpn_box_reg: 0.0479 (0.0461) time: 0.6874 data: 0.1327 max mem: 3105
Epoch: [7] [100/119] eta: 0:00:12 lr: 0.000050 loss: 0.4222 (0.4086) loss_classifier: 0.1223 (0.1221) loss_box_reg: 0.2101 (0.2021) loss_objectness: 0.0405 (0.0381) loss_rpn_box_reg: 0.0483 (0.0463) time: 0.6941 data: 0.1348 max mem: 3105
Epoch: [7] [110/119] eta: 0:00:06 lr: 0.000050 loss: 0.4082 (0.4072) loss_classifier: 0.1196 (0.1220) loss_box_reg: 0.2081 (0.2013) loss_objectness: 0.0350 (0.0379) loss_rpn_box_reg: 0.0475 (0.0461) time: 0.6792 data: 0.1301 max mem: 3105
Epoch: [7] [118/119] eta: 0:00:00 lr: 0.000050 loss: 0.4070 (0.4076) loss_classifier: 0.1196 (0.1223) loss_box_reg: 0.2063 (0.2016) loss_objectness: 0.0313 (0.0375) loss_rpn_box_reg: 0.0475 (0.0462) time: 0.6599 data: 0.1255 max mem: 3105
Epoch: [7] Total time: 0:01:20 (0.6763 s / it)
creating index...
index created!
Test: [ 0/59] eta: 0:00:14 model_time: 0.1194 (0.1194) evaluator_time: 0.0633 (0.0633) time: 0.2511 data: 0.0642 max mem: 3105
Test: [58/59] eta: 0:00:00 model_time: 0.1098 (0.1102) evaluator_time: 0.0481 (0.0590) time: 0.2353 data: 0.0625 max mem: 3105
Test: Total time: 0:00:13 (0.2371 s / it)
Averaged stats: model_time: 0.1098 (0.1102) evaluator_time: 0.0481 (0.0590)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.649
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.079
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.095
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
I have two questions:
- Overfitting: I don't know whether my model is overfitting or underfitting. How can I tell from the metrics?
- Saving the best model: How can I save the best model trained across the different epochs? Which epoch is the best according to these results?
Thank you!
Solution
You need to keep track of the loss on the test dataset (or some other metric, such as recall). Look at this part of the code:
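```python
# helpers from the tutorial (engine.py, copied from torchvision's references/detection)
from engine import train_one_epoch, evaluate

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
```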
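train_one_epoch and evaluate are defined in engine.py, the helper module the tutorial copies from torchvision's references/detection folder. The evaluate function returns an object of type CocoEvaluator, but you can modify the code so that it returns the test loss or another number you can track across epochs (you need to either extract the metrics from the CocoEvaluator object or write your own metric evaluation).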
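One way to extract the metrics from the CocoEvaluator is sketched below. It assumes the tutorial's coco_eval.py helper, where evaluate() calls summarize() and each entry of coco_evaluator.coco_eval is a pycocotools COCOeval object whose stats array holds the 12 printed numbers in the same order as the log. The names COCO_STAT_NAMES, bbox_metrics and best_epoch are hypothetical helpers, and breaking ties on AP@0.50 is an arbitrary choice. A value of -1.000 in the summary means the test set has no ground-truth objects in that area range, so those entries carry no information here.

```python
# Order of COCOeval.stats for iouType="bbox", matching the printed summary lines
COCO_STAT_NAMES = [
    "AP", "AP50", "AP75", "AP_small", "AP_medium", "AP_large",
    "AR1", "AR10", "AR100", "AR_small", "AR_medium", "AR_large",
]


def bbox_metrics(coco_evaluator):
    """Return the 12 bbox summary numbers of a CocoEvaluator as a dict."""
    stats = coco_evaluator.coco_eval["bbox"].stats
    return {name: float(value) for name, value in zip(COCO_STAT_NAMES, stats)}


def best_epoch(history):
    """Pick the epoch with the highest AP@[.5:.95], breaking ties with AP@.50.

    history maps epoch number -> dict returned by bbox_metrics().
    """
    return max(history, key=lambda epoch: (history[epoch]["AP"], history[epoch]["AP50"]))


# Hypothetical use inside the tutorial loop:
# history = {}
# for epoch in range(num_epochs):
#     train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
#     lr_scheduler.step()
#     coco_evaluator = evaluate(model, data_loader_test, device=device)
#     history[epoch] = bbox_metrics(coco_evaluator)
# print("best epoch:", best_epoch(history))
```

With the rounded numbers printed in the question, epochs 6 and 7 tie on AP@[.5:.95] (0.210), and epoch 7 is only marginally ahead on AP@0.50 (0.649 vs 0.643), a gap that is likely within noise.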
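So, the answers are:
- Keep track of the test loss; it tells you whether the model is overfitting. If the training loss keeps falling while the test loss starts to rise, the model is overfitting.
- Save the model state after every epoch until the test loss begins to increase, and keep the checkpoint with the lowest test loss. The PyTorch tutorial on saving and loading models explains how.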
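Both points can be sketched as follows, assuming a torchvision detection model such as the tutorial's fasterrcnn_resnet50_fpn. In eval mode these models return predictions rather than losses, so the hypothetical validation_loss helper temporarily switches the model to train() under torch.no_grad() to get the loss dict. This assumes the backbone uses frozen batch norm, as torchvision's pretrained detection backbones do; with ordinary BatchNorm layers, train mode would also update their running statistics. BestCheckpoint (also hypothetical) saves the state_dict with torch.save whenever the loss improves and signals early stopping after `patience` epochs without improvement.

```python
import math

import torch


@torch.no_grad()
def validation_loss(model, data_loader, device):
    """Average total detection loss over a data loader, without updating weights.

    torchvision detection models only return a loss dict in training mode,
    so the model is switched to train() temporarily and restored afterwards.
    """
    was_training = model.training
    model.train()
    total, batches = 0.0, 0
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        # move tensor fields to the device; leave plain values (e.g. int ids) alone
        targets = [
            {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in t.items()}
            for t in targets
        ]
        loss_dict = model(images, targets)
        total += float(sum(loss_dict.values()))
        batches += 1
    model.train(was_training)
    return total / batches if batches else math.nan


class BestCheckpoint:
    """Save the state_dict whenever the monitored loss improves; signal early stopping."""

    def __init__(self, path, patience=3):
        self.path = path
        self.patience = patience
        self.best_loss = math.inf
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, model, loss, epoch):
        """Record this epoch's loss; return True if training should stop."""
        if loss < self.best_loss:
            self.best_loss, self.best_epoch, self.bad_epochs = loss, epoch, 0
            torch.save(model.state_dict(), self.path)
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical use inside the tutorial loop:
# ckpt = BestCheckpoint("best_model.pth", patience=3)
# for epoch in range(num_epochs):
#     train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
#     lr_scheduler.step()
#     if ckpt.step(model, validation_loss(model, data_loader_test, device), epoch):
#         break
# model.load_state_dict(torch.load("best_model.pth"))
```

Stopping on patience rather than on the first increase makes the choice less sensitive to a single noisy epoch; the saved file always holds the weights from `ckpt.best_epoch`.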
Answered By - roman