Issue
I used SubsetRandomSampler
to split the training data into a training set (80%) and a validation set (20%). But both loaders report the same number of images after the split (4996):
>>> print('len(train_data): ', len(train_loader.dataset))
>>> print('len(valid_data): ', len(validation_loader.dataset))
len(train_data): 4996
len(valid_data): 4996
Full code:
import numpy as np
import torch
from torchvision import datasets, transforms
from torch.utils.data.sampler import SubsetRandomSampler

train_transforms = transforms.Compose([transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

dataset = datasets.ImageFolder('/data/images/train', transform=train_transforms)

validation_split = .2
shuffle_dataset = True
random_seed = 42
batch_size = 20

dataset_size = len(dataset)  # 4996
indices = list(range(dataset_size))
split = int(np.floor(validation_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[split:], indices[:split]

train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)

train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
validation_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=valid_sampler)
Solution
train_loader.dataset
and validation_loader.dataset
are attributes that return the underlying dataset the loaders sample from (i.e. the original dataset of size 4996), not the subsets selected by the samplers.
If you iterate through the loaders themselves (or check the length of each sampler), you will see that they only yield as many samples (accounting for batching) as you included in the indices passed to each sampler.
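For example, here is a minimal sanity check, assuming the variable names from the code above; with dataset_size = 4996 and validation_split = 0.2, the expected counts are 3997 and 999:
# The samplers hold the actual split sizes
print('train samples:', len(train_sampler))   # 3997 (80% of 4996)
print('valid samples:', len(valid_sampler))   # 999  (20% of 4996)

# Or count the samples actually yielded by each loader
n_train = sum(images.size(0) for images, labels in train_loader)
n_valid = sum(images.size(0) for images, labels in validation_loader)
print(n_train, n_valid)  # 3997 999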
Answered By - iacob