Issue
I am trying to visualize several PyTorch datasets. For the IMDb dataset I am getting only negative samples, even though the original dataset is balanced between positive and negative reviews.
This is the code I am using; it is based on the T5 tutorial.
from torch.utils.data import DataLoader
from functools import partial
from torchtext.datasets import IMDB
imdb_datapipe = IMDB(split='test')
labels = {"1": "negative", "2": "positive"}
def process_labels(labels, x):
    return x[1], labels[str(x[0])]
imdb_datapipe = imdb_datapipe.map(partial(process_labels, labels))
imdb_datapipe = imdb_datapipe.batch(2)
imdb_datapipe = imdb_datapipe.shuffle()
imdb_datapipe = imdb_datapipe.rows2columnar(["text", "label"])
imdb_dataloader = DataLoader(imdb_datapipe, batch_size=None)
it = iter(imdb_dataloader)
for _ in range(10):
    sample = next(it)
    for text, label in zip(sample['text'], sample['label']):
        print(f"{label}: {text[:100]}")
What am I missing?
Solution
I ran your code in a clean (Colab) environment and everything works: I get both positive and negative examples (see the output screenshot).
It could be an environment issue. Try reinstalling torchtext and running your code again; torchtext==0.15.2 with torch==2.0.1 works for me.
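After reinstalling, a quick way to confirm you are seeing both classes is to count the labels coming out of the pipeline with collections.Counter. This is a minimal sketch: the samples list below is a hypothetical stand-in for the (text, label) pairs the IMDB datapipe yields after process_labels; in practice you would iterate over the datapipe itself.

```python
from collections import Counter

# Hypothetical stand-in for the mapped IMDB datapipe output:
# a list of (text, label) pairs like process_labels produces.
samples = [
    ("great movie", "positive"),
    ("terrible plot", "negative"),
    ("loved it", "positive"),
    ("boring and slow", "negative"),
]

# Tally how many samples carry each label.
counts = Counter(label for _, label in samples)
print(counts)  # Counter({'positive': 2, 'negative': 2})
```

If the counts are heavily skewed toward one class over a large number of samples, the problem is in the pipeline or environment rather than in the dataset itself.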
Answered By - Vadym Hadetskyi