Issue
I can't understand why train_test_split is throwing a TypeError. According to the docs, it requires array-like inputs, and I believed y was a NumPy array.
import tensorflow as tf
from sklearn.model_selection import train_test_split
# Create X (features) and y (one-hot encoded labels)
X = cvd_patient_data.drop("CVDriskindicator", axis=1)
y = tf.one_hot(cvd_patient_data["CVDriskindicator"], depth=5)
# Create train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, y_train.shape
Here's the output when I display y to check its type:
<tf.Tensor: shape=(302, 5), dtype=float32, numpy=
array([[0., 0., 1., 0., 0.],
[0., 1., 0., 0., 0.],
[1., 0., 0., 0., 0.],
...,
[0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0.],
[1., 0., 0., 0., 0.]], dtype=float32)>
The error raised by train_test_split:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-25-d0bc4bd8803a> in <module>
1 # Create train nd test data
----> 2 X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)
3 X_train.shape, y_train.shape
5 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py in _check_index(idx)
905 # TODO(slebedev): IndexError seems more appropriate here, but it
906 # will break `_slice_helper` contract.
--> 907 raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
908
909
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got array([132, 202, 196, 75, 176, 59, 93, 6, 177, 30, 22, 258, 56,
242, 114, 286, 281, 197, 158, 164, 244, 84, 66, 113, 167, 250,
19, 143, 79, 144, 124, 72, 15, 10, 163, 155, 97, 68, 229,
37, 16, 126, 290, 272, 67, 108, 69, 31, 178, 154, 230, 294,
18, 185, 96, 183, 148, 86, 253, 288, 206, 287, 170, 234, 211,
55, 186, 297, 210, 129, 38, 239, 173, 140, 112, 172, 117, 279,
273, 165, 180, 182, 2, 115, 147, 181, 120, 215, 262, 127, 74,
29, 83, 248, 107, 157, 208, 133, 194, 221, 65, 203, 85, 218,
159, 12, 35, 28, 142, 195, 131, 226, 51, 95, 213, 225, 41,
89, 222, 136, 26, 295, 141, 238, 0, 285, 274, 100, 261, 103,
171, 98, 36, 61, 150, 264, 233, 247, 11, 298, 200, 269, 27,
224, 4, 122, 32, 209, 162, 237, 259, 138, 62, 135, 128, 292,
8, 70, 266, 64, 44, 240, 156, 40, 123, 277, 216, 153, 23,
263, 110, 81, 207, 212, 39, 245, 293, 260, 199, 14, 47, 94,
265, 227, 275, 201, 161, 43, 217, 145, 190, 220, 256, 3, 105,
53, 1, 49, 80, 205, 34, 91, 52, 241, 13, 88, 166, 296,
134, 289, 243, 54, 50, 174, 189, 300, 187, 169, 58, 48, 235,
252, 21, 160, 276, 191, 257, 149, 130, 151, 99, 87, 214, 121,
301, 20, 188, 71, 106, 270, 102])
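The integer array in the error is the list of shuffled training-row indices: 241 of the 302 rows, since test_size=0.2 puts ceil(0.2 * 302) = 61 rows in the test set. train_test_split picks rows by indexing each input with such an array, which works on a NumPy array but not on a tf.Tensor. Below is a simplified NumPy-only sketch of that shuffle-and-index step; the helper shuffle_split_indices is hypothetical, and its permutation will not match scikit-learn's exactly.

```python
import numpy as np

def shuffle_split_indices(n_samples, test_size=0.2, seed=42):
    """Hypothetical, simplified stand-in for the index step of a shuffled split.

    Not scikit-learn's actual code: it only mirrors the idea of permuting
    row indices and cutting them into test and train parts.
    """
    n_test = int(np.ceil(test_size * n_samples))  # 0.2 * 302 -> 61 test rows
    perm = np.random.default_rng(seed).permutation(n_samples)
    return perm[n_test:], perm[:n_test]  # (train_idx, test_idx)

# Stand-in for the one-hot labels: 302 rows, 5 classes, as a NumPy array.
y_np = np.eye(5, dtype=np.float32)[np.arange(302) % 5]

train_idx, test_idx = shuffle_split_indices(len(y_np))
# Indexing with an integer array works on an ndarray;
# the same indexing on a tf.Tensor raises the TypeError shown above.
y_train, y_test = y_np[train_idx], y_np[test_idx]
print(y_train.shape, y_test.shape)  # (241, 5) (61, 5)
```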
Solution
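y is not a NumPy array but a tf.Tensor, and a tf.Tensor cannot be indexed with an array of integer indices, which is how train_test_split selects rows. Convert it to a NumPy array with .numpy() (available in eager mode, the TensorFlow 2.x default):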
y = tf.one_hot(cvd_patient_data["CVDriskindicator"],depth=5).numpy()
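Alternatively, if TensorFlow isn't needed at this step, the labels can be one-hot encoded directly in NumPy so that y is an ndarray from the start. A minimal sketch, assuming the CVDriskindicator values are integers from 0 to 4 (the helper name one_hot_np and the sample labels are hypothetical):

```python
import numpy as np

def one_hot_np(labels, depth):
    """One-hot encode integer labels as a NumPy float32 array.

    Assumes every label is an integer in [0, depth); a label >= depth
    raises IndexError and a negative label wraps around to a later row.
    """
    labels = np.asarray(labels, dtype=np.int64)
    return np.eye(depth, dtype=np.float32)[labels]

# Hypothetical labels standing in for cvd_patient_data["CVDriskindicator"].
y = one_hot_np([2, 1, 0, 3, 1, 0], depth=5)
print(type(y).__name__, y.shape)  # ndarray (6, 5)
print(y[0])  # [0. 0. 1. 0. 0.]
```

With a real DataFrame this would be one_hot_np(cvd_patient_data["CVDriskindicator"], depth=5), and the result can be passed straight to train_test_split.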
Answered By - LucG