Issue
I'm building a CNN and trying to split my data into training and testing datasets. After splitting, I want to use sklearn.preprocessing.StandardScaler to scale my testing data with the parameters of the training data.
So before scaling, I need to split the data. I'm going to use sklearn.model_selection.train_test_split, but to use that method I have to convert my data into a pandas.DataFrame. Since my data are for a CNN, their shapes don't meet the requirements of a DataFrame:
print(x.shape, delta.shape, z.shape, y.shape, non_spatial_data.shape, p.shape, g.shape)
# (15000, 175) (15000, 175) (15000, 175) (15000, 1225) (15000, 264) (15000, 175) (15000, 175)
The above are the shapes of my data after being flattened; 15000 is the sample size. You can see that the arrays have different numbers of columns, which makes me unable to combine them into a single DataFrame. So how can I do the splitting using only numpy? Or is there another method for the whole splitting and scaling process?
PS: The data I am using for the CNN are not really images; they are data with spatial properties.
Solution
Here's a working example. The key is to draw one shuffled index array (without replacement) and reuse it for every input array, so all of them are split the same way:
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
c = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

n = 0.2  # fraction of samples to put in the first (test) split

spl = None
for arr in [a, b, c]:
    if spl is None:
        # Shuffle the indices once, without replacement, so every
        # array is split the same way and the samples stay aligned.
        rand_ind = np.random.permutation(len(arr))
        spl, remaining = np.split(rand_ind, [int(n * len(rand_ind))])
    print([arr[i] for i in spl], [arr[i] for i in remaining])
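To cover the scaling half of the question as well, here's a minimal sketch of the full split-and-scale workflow, using two stand-in arrays shaped like those in the question: one shared permutation splits every array identically, and the StandardScaler is fitted on the training portion only, then reused on the test portion.

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-ins for two of the flattened arrays from the question.
x = rng.normal(size=(15000, 175))
y = rng.normal(size=(15000, 1225))

# One shared permutation keeps all arrays aligned sample-by-sample.
test_frac = 0.2
perm = rng.permutation(len(x))
n_test = int(test_frac * len(x))
test_ind, train_ind = perm[:n_test], perm[n_test:]

x_train, x_test = x[train_ind], x[test_ind]
y_train, y_test = y[train_ind], y[test_ind]

# Fit the scaler on the training data only, then reuse its
# per-feature mean and standard deviation on the test data.
scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

Note also that sklearn.model_selection.train_test_split accepts any number of equal-length numpy arrays directly, with no DataFrame conversion needed: x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2) splits both arrays with the same shuffle.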
Answered By - svfat