Issue
I am trying to load a dataset of 7000 .mat files in Python as a single tensor with one entry per file, each of shape 100 x 100 x 100 x 3. The whole dataset is less than 80 MB on disk. I am using Spyder. The code is as follows:
import os
import numpy as np
import scipy.io

dataDir = "/Users/..."
x_train = np.empty([7165, 100*100*100*3])
x_train = x_train.reshape([7165, 100, 100, 100, 3])
i = 0
for file in sorted_alphanumeric(os.listdir(dataDir)):   # sorted_alphanumeric is a user-defined helper
    data = scipy.io.loadmat(dataDir + file)              # loadmat returns a dict of variables
    x_train[i] = np.array(data['tensor'])
    i = i + 1
However, after about 2300 files have been read, the kernel dies and the program stops running. Why does the kernel die? How can I store the dataset? It seems to me that the dataset is not that huge, and the memory indicator in the Spyder console always stays around 76%.
Solution
Do not load the whole dataset into memory at once, or you will run out of RAM. Although the .mat files take less than 80 MB on disk, the preallocated float64 array of shape (7165, 100, 100, 100, 3) needs roughly 7165 x 3,000,000 x 8 bytes, about 170 GB, which is far more than any ordinary machine (or even an online tool such as Google Colab) can hold. You have to divide the dataset into multiple segments.
The way to deal with big datasets is batch training, i.e. training the model while loading only one batch of the dataset at a time, as in the sketch below.
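For illustration, here is a minimal sketch of such a batch loader, not the answerer's exact code. It assumes each .mat file stores its array under the key 'tensor' (as in the question), uses plain sorted() in place of the asker's sorted_alphanumeric helper, and casts to float32 to halve the memory of each batch; the model and y_batch in the usage comment are placeholders.

import os
import numpy as np
import scipy.io

def batch_generator(data_dir, batch_size=32):
    """Yield arrays of shape (batch_size, 100, 100, 100, 3), one batch at a time."""
    files = sorted(os.listdir(data_dir))
    for start in range(0, len(files), batch_size):
        chunk = files[start:start + batch_size]
        batch = np.stack([scipy.io.loadmat(os.path.join(data_dir, f))['tensor']
                          for f in chunk])
        yield batch.astype(np.float32)   # float32 halves the memory of float64

# Example usage: feed batches to a training loop instead of one giant array.
# for x_batch in batch_generator("/Users/...", batch_size=16):
#     model.train_on_batch(x_batch, y_batch)   # hypothetical Keras-style call

With a batch size of 32, each batch is only about 384 MB in float32, so memory stays bounded no matter how many files the dataset contains.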
Answered By - BNQ