Issue
I have a folder named processed_data. In this folder I have multiple .pt files named data_0.pt, data_1.pt, data_2.pt, data_3.pt, data_4.pt, data_5.pt, ..., data_998.pt, data_999.pt, data_1000.pt, data_1001.pt.
Each of these .pt files represents a graph that was created using pytorch-geometric.
My question is: how do I load all these files to create my training dataset so that I can use them in a DataLoader?
Solution
A torch DataLoader needs a Dataset object. When defining your Dataset class, you need to implement __init__, __len__, and __getitem__.
__init__ is easy, but also dependent on your exact use case/context. Assuming the simplest possible situation, I'd define __init__ to take in the data folder and a file which contains the names of the training set files (one per line). Then, I'd store each file name in a list as a member of the class. So we'd have:
    def __init__(self, data_folder, data_list_filename):
        self.data_folder = data_folder
        with open(data_list_filename, 'r') as f:
            self.data_file_list = f.read().splitlines()
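The list file that __init__ reads has to come from somewhere. Here is a minimal sketch of one way to generate it, assuming the files follow the data_<number>.pt naming from the question. The helper name write_file_list and the output name train_files.txt are made up for illustration. It sorts by the numeric index, since a plain alphabetical sort would put data_10.pt before data_2.pt.

```python
import os
import re


def write_file_list(data_folder, list_path, pattern=r"data_(\d+)\.pt"):
    """Write the matching file names in data_folder to list_path, one per line.

    Hypothetical helper: the default pattern assumes names like data_12.pt.
    """
    regex = re.compile(pattern)
    matches = []
    for name in os.listdir(data_folder):
        m = regex.fullmatch(name)
        if m:
            # Keep the numeric index next to the name so we can sort on it
            matches.append((int(m.group(1)), name))

    # Numeric sort: data_2.pt comes before data_10.pt
    names = [name for _, name in sorted(matches)]

    with open(list_path, "w") as f:
        f.writelines(name + "\n" for name in names)
    return names
```

Something like write_file_list("processed_data", "train_files.txt") would then produce the file to pass as data_list_filename. To hold out a validation set, you could write only part of the returned list to each file.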
Ok, now we have two things stored in your Dataset: 1) the data folder and 2) a list of data filenames. That makes __len__ especially easy:
    def __len__(self):
        return len(self.data_file_list)
And lastly, we just need to deal with __getitem__:
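    def __getitem__(self, idx):
        # Join the folder and the stored file name into a full path (needs `import os`)
        filename = os.path.join(self.data_folder, self.data_file_list[idx])
        # extract_data_from_file is a placeholder; I don't know how you need to load your data
        data, label = extract_data_from_file(filename)
        return data, label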
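For the .pt files in the question, the placeholder could be a thin wrapper around torch.load. This is a minimal sketch, assuming each file holds a single object written with torch.save and that the label travels inside that object (as the y attribute usually does on a PyG Data object), so it returns just the object rather than a (data, label) pair. The name load_graph_file is hypothetical. The weights_only argument exists in recent PyTorch releases; newer versions default it to True, which rejects arbitrary pickled objects.

```python
import torch


def load_graph_file(path):
    """Load whatever object was stored in a .pt file with torch.save.

    Hypothetical helper. weights_only=False allows arbitrary pickled
    objects, so only use it on files you created or otherwise trust.
    """
    return torch.load(path, weights_only=False)
```

With this, __getitem__ can simply return load_graph_file(filename). If the loaded items are PyG Data objects, the DataLoader in torch_geometric.loader is the one that knows how to batch them. The plain torch DataLoader's default collate function does not handle Data objects.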
Then put all of this together under a class:
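    import os
    from torch.utils.data import Dataset

    class MyDataset(Dataset):
        def __init__(self, data_folder, data_list_filename):
            self.data_folder = data_folder
            # Read one file name per line; the with-block closes the file afterwards
            with open(data_list_filename, 'r') as f:
                self.data_file_list = f.read().splitlines()

        def __len__(self):
            return len(self.data_file_list)

        def __getitem__(self, idx):
            filename = os.path.join(self.data_folder, self.data_file_list[idx])
            # extract_data_from_file is a placeholder for however you load one sample
            data, label = extract_data_from_file(filename)
            return data, label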
Obviously, your exact use will look different. But this should get you started.
Answered By - jhschwartz