Issue
Official Guide
- I am trying to use the official, up-to-date scikit-learn example code for StratifiedKFold:
>>> import numpy as np
>>> from sklearn.model_selection import StratifiedKFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> skf = StratifiedKFold(n_splits=2)
>>> skf.get_n_splits(X, y)
2
>>> print(skf)
StratifiedKFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in skf.split(X, y):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 3] TEST: [0 2]
TRAIN: [0 2] TEST: [1 3]
MY CODE
- I keep all my data in two pandas data frames, X and y, holding integer and float values
skf = StratifiedKFold(n_splits=4) # shuffle=True, random_state=1
for train_index, test_index in skf.split(X, y):
    X_train = X[train_index]
    X_test = X[test_index]
    y_train = y[train_index]
    y_test = y[test_index]
    print("TRAIN:", train_index, "TEST:", test_index)
ERROR
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-12-2776afce57e9> in <module>
2
3 for train_index, test_index in skf.split(X, y):
----> 4 X_train = X[train_index]
5 X_test = X[test_index]
6 y_train = y[train_index]
~/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
2906 if is_iterator(key):
2907 key = list(key)
-> 2908 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
2909
2910 # take() does not accept boolean indexers
~/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1252 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1253
-> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1255 return keyarr, indexer
1256
~/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1296 if missing == len(indexer):
1297 axis_name = self.obj._get_axis_name(axis)
-> 1298 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1299
1300 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Int64Index([ 785015, 785016, 785017, 785018, 785019, 785020, 785021,\n 785022, 785023, 785024,\n ...\n 3140252, 3140253, 3140254, 3140255, 3140256, 3140257, 3140258,\n 3140259, 3140260, 3140261],\n dtype='int64', length=2355196)] are in the [columns]"
Solutions that I have tried
- The error occurs in a different place - Key Error: None of [Int64Index...] dtype='int64] are in the columns
- No answer and no error message - KeyError: "None of [Int64Index([2, 3], dtype='int64')] are in the [columns]"
- Different code and different data storage at the end - Separate pandas dataframe using sklearn's KFold
Solution
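The post below answers the question in a slightly different way, but one of its comments answers my question.
Receiving KeyError: "None of [Int64Index([ ... dtype='int64', length=1323)] are in the [columns]" @bubble
The data has to be NumPy arrays, not data frames, when you load it. skf.split returns positional row indices, but X[train_index] on a data frame looks those numbers up as column labels, which is why the error says they are not in the [columns].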
X = mydataframe.drop(['acol','bcol'], axis=1).values
y = mydataframe['targetvalue'].values
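To make the fix concrete, here is a minimal sketch on made-up data. The column names feature_a and feature_b and the toy values are assumptions for illustration only, not the original data set. It shows the .values conversion from above and, as an alternative I am adding here, positional indexing with .iloc, which keeps the data frames intact.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data standing in for the real data frame.
mydataframe = pd.DataFrame({
    "feature_a": [1, 3, 1, 3, 5, 7, 5, 7],
    "feature_b": [2.0, 4.0, 2.0, 4.0, 6.0, 8.0, 6.0, 8.0],
    "targetvalue": [0, 0, 0, 0, 1, 1, 1, 1],
})
X_df = mydataframe.drop(["targetvalue"], axis=1)
y_ser = mydataframe["targetvalue"]

skf = StratifiedKFold(n_splits=4)

# The failing pattern: integers passed to DataFrame[...] are treated
# as column labels, so pandas raises KeyError.
try:
    X_df[np.array([0, 1])]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Option 1: convert to NumPy arrays, as in the answer above.
X = X_df.values
y = y_ser.values
numpy_folds = []
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    numpy_folds.append((X_train.shape, X_test.shape))

# Option 2: keep the pandas objects and select rows by position.
iloc_folds = []
for train_index, test_index in skf.split(X_df, y_ser):
    X_train, X_test = X_df.iloc[train_index], X_df.iloc[test_index]
    y_train, y_test = y_ser.iloc[train_index], y_ser.iloc[test_index]
    iloc_folds.append((X_train.shape, X_test.shape))
```

Both options give the same splits. The .iloc version is probably the better choice if you still need the column names after splitting.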
Answered By - sogu