Issue
I have an MNIST dataset stored as a .mat file and want to split it into train and test data with sklearn. sklearn reads the .mat file as below:
{'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sat Oct 8 18:13:47 2016',
'__version__': '1.0',
'__globals__': [],
'train_fea1': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
'train_gnd1': array([[ 1],
[ 1],
[ 1],
...,
[10],
[10],
[10]], dtype=uint8),
'test_fea1': array([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 64, 0, 0],
[ 0, 0, 0, ..., 25, 0, 0]], dtype=uint8),
'test_gnd1': array([[ 1],
[ 1],
[ 1],
...,
[10],
[10],
[10]], dtype=uint8)}
How can I do that?
Solution
I am guessing you meant that you loaded the .mat data file into Python using scipy, not sklearn. Essentially, a .mat data file can be loaded like so:
import scipy.io
scipy.io.loadmat('your_dot_mat_file')
scipy reads this as a Python dictionary. In your case, the data is already split into a training set, train_fea1, with labels train_gnd1, and a test set, test_fea1, with labels test_gnd1.
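As a rough sketch of what that dictionary looks like in practice, the snippet below writes a tiny stand-in .mat file and loads it back. The key names mirror the question, but the file name toy_mnist.mat, the array shapes, and the values are made up for illustration; with the real file you would skip the savemat step.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Build a tiny stand-in file so the sketch runs without the real dataset.
# Key names mirror the question; shapes and values are invented.
tmp_dir = tempfile.mkdtemp()
mat_path = os.path.join(tmp_dir, "toy_mnist.mat")
sio.savemat(mat_path, {
    "train_fea1": np.zeros((6, 4), dtype=np.uint8),
    "train_gnd1": np.array([[1], [1], [2], [2], [3], [3]], dtype=np.uint8),
    "test_fea1": np.zeros((3, 4), dtype=np.uint8),
    "test_gnd1": np.array([[1], [2], [3]], dtype=np.uint8),
})

data = sio.loadmat(mat_path)

# Keys wrapped in double underscores (__header__, __version__, __globals__)
# are file metadata, not data arrays, so filter them out.
arrays = {k: v for k, v in data.items() if not k.startswith("__")}
shapes = {name: arr.shape for name, arr in arrays.items()}
for name, shape in sorted(shapes.items()):
    print(name, shape)
```

Printing the shapes first is a cheap way to confirm which keys hold features and which hold labels before indexing into the dictionary.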
To access these arrays, index the dictionary by key:
import scipy.io as sio
data = sio.loadmat('filename.mat')
train = data['train_fea1']
trainlabel = data['train_gnd1']
test = data['test_fea1']
testlabel = data['test_gnd1']
If, however, you want to re-split your data using sklearn's train_test_split, you can first combine the features and labels from both sets, then split randomly like so (after loading the data as above):
import numpy as np
from sklearn.model_selection import train_test_split
X = np.vstack((train,test))
y = np.vstack((trainlabel, testlabel))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # fixed seed for a reproducible split
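One detail worth noting: the labels come out of the .mat file as column vectors of shape (n, 1), and many sklearn estimators expect a 1-D label array. A possible variation, sketched here on synthetic stand-in arrays (the sizes and values are invented, not taken from the real dataset), flattens the labels with ravel() and passes stratify so each class keeps the same proportion in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the arrays loaded from the .mat file:
# 10 classes labelled 1..10, as in the question, 10 samples each.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(70, 4), dtype=np.uint8)
test = rng.integers(0, 256, size=(30, 4), dtype=np.uint8)
all_labels = np.repeat(np.arange(1, 11, dtype=np.uint8), 10).reshape(-1, 1)
trainlabel, testlabel = all_labels[:70], all_labels[70:]

# Combine the original splits, then flatten labels from (n, 1) to (n,).
X = np.vstack((train, test))
y = np.vstack((trainlabel, testlabel)).ravel()

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Whether stratification is wanted depends on the task; leaving out stratify gives the plain random split shown above.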
Answered By - arilwan