Issue
I fully understand the use of a training set separate from the test set.
I also understand why you would shuffle the training set so that gradients are computed over randomized mini-batches.
However, as mentioned in the PyTorch tutorial, I do not understand why you would shuffle the test set, as in:
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
In what case would that be useful?
Solution
I shuffled my test set because of batch-wise statistics: I was computing roc_auc_score for each batch. It was a binary classification task, and the positive and negative examples were loaded from different locations on disk, so the generated file list looked like [0,0,0,1,1,1] when viewed by class. Without shuffling, a batch can then contain only one class, and the batch-wise score computation fails. I therefore shuffled the test set, but got a warning that this is strongly discouraged. Indeed, I observed randomness in the summary statistics whenever I re-ran the model on the test set. Thanks to the first poster, I think I now know why.
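A minimal sketch of the failure mode and a common workaround (the tiny dataset, batch size, and the x.sum stand-in for a real model below are all hypothetical, just to make the example self-contained): with a class-sorted test set and shuffle=False, a batch can contain a single class, and sklearn's roc_auc_score raises an error because ROC AUC is undefined there. Instead of shuffling, you can accumulate predictions across batches and compute the score once over the whole test set, which is deterministic regardless of batch order.

import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import roc_auc_score

# Hypothetical test set sorted by class, as when negatives and
# positives are loaded from different locations: labels = [0,0,0,1,1,1].
features = torch.randn(6, 4)
labels = torch.tensor([0, 0, 0, 1, 1, 1])
test_data = TensorDataset(features, labels)

# With shuffle=False and batch_size=3, each batch holds only one class,
# so a per-batch roc_auc_score would raise
# "Only one class present in y_true".
test_dataloader = DataLoader(test_data, batch_size=3, shuffle=False)

# Workaround: collect scores over all batches, score once at the end.
all_labels, all_scores = [], []
with torch.no_grad():
    for x, y in test_dataloader:
        scores = x.sum(dim=1)  # stand-in for model(x); use your model here
        all_labels.append(y)
        all_scores.append(scores)

auc = roc_auc_score(torch.cat(all_labels).numpy(),
                    torch.cat(all_scores).numpy())
print(f"ROC AUC over the full test set: {auc:.3f}")

Computed this way, the metric no longer depends on how examples fall into batches, which also removes the run-to-run randomness described above.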
Answered By - Sebastian Salzmann