Issue
I'm trying to pass a NumPy array to TSNE in order to compress it to 2 columns and then plot it with seaborn. result is a DataFrame that I've read from a CSV.
arr=result.to_numpy()
n_components = 2
tsne = TSNE(n_components).fit_transform(arr)
arr.shape
A row of arr looks like this:
['00012_0' array([0.21321961620469082, 0.9433962264150944, 20.0, 0.0, 0.0, 0.0, 0.1984126984126984, 0.014925373134328358, 0.0], dtype=object) 'Resnet' 'Lime' 'Real']
I get the following errors:
TypeError Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
Input In [11], in <cell line: 30>()
28 #comprimo con TSNE a due colonne
29 n_components = 2
---> 30 tsne = TSNE(n_components).fit_transform(arr)
31 arr.shape
File ~\anaconda3\lib\site-packages\sklearn\manifold\_t_sne.py:1108, in TSNE.fit_transform(self, X, y)
1088 def fit_transform(self, X, y=None):
1089 """Fit X into an embedded space and return that transformed output.
1090
1091 Parameters
(...)
1106 Embedding of the training data in low-dimensional space.
1107 """
-> 1108 embedding = self._fit(X)
1109 self.embedding_ = embedding
1110 return self.embedding_
File ~\anaconda3\lib\site-packages\sklearn\manifold\_t_sne.py:830, in TSNE._fit(self, X, skip_num_points)
819 warnings.warn(
820 "'square_distances' has been introduced in 0.24 to help phase "
821 "out legacy squaring behavior. The 'legacy' setting will be "
(...)
827 FutureWarning,
828 )
829 if self.method == "barnes_hut":
--> 830 X = self._validate_data(
831 X,
832 accept_sparse=["csr"],
833 ensure_min_samples=2,
834 dtype=[np.float32, np.float64],
835 )
836 else:
837 X = self._validate_data(
838 X, accept_sparse=["csr", "csc", "coo"], dtype=[np.float32, np.float64]
839 )
File ~\anaconda3\lib\site-packages\sklearn\base.py:566, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
564 raise ValueError("Validation should be done on X, y or both.")
565 elif not no_val_X and no_val_y:
--> 566 X = check_array(X, **check_params)
567 out = X
568 elif no_val_X and not no_val_y:
File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
744 array = array.astype(dtype, casting="unsafe", copy=False)
745 else:
--> 746 array = np.asarray(array, order=order, dtype=dtype)
747 except ComplexWarning as complex_warning:
748 raise ValueError(
749 "Complex data not supported\n{}\n".format(array)
750 ) from complex_warning
ValueError: setting an array element with a sequence.
I understand that I'm probably passing a sequence of values into a single slot, but I don't know how to change it to make it work.
Solution
You are right. TSNE will break if a single element of the input is itself an array. You should convert all of the values to numbers before passing them to TSNE.
Basically, if one row has the values
['00012_0', array([0.21321961620469082, 0.9433962264150944, 20.0, 0.0, 0.0, 0.0, 0.1984126984126984, 0.014925373134328358, 0.0], dtype=object), 'Resnet', 'Lime', 'Real']
You should process it into something like
[0, 0.21321961620469082, 0.9433962264150944, 20.0, 0.0, 0.0, 0.0, 0.1984126984126984, 0.014925373134328358, 0.0, 0, 0, 0]
where the categorical variables have been encoded as numbers (for example, one-hot encoded). Also use some judgment: variables that are only identifiers, or that are constant across the whole dataset, can be left out.
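The conversion described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the column names (id, features, model, explainer, label), the helper names to_numeric_matrix and embed_2d, and the example values are all made up for illustration, since the question does not show the real column names. It drops the id column, expands the array-valued column into one float column per element, and one-hot encodes the string columns with pandas.get_dummies.

```python
import numpy as np
import pandas as pd


def to_numeric_matrix(df, vector_col, categorical_cols, drop_cols=()):
    """Return a 2-D float64 array built from a mixed-type DataFrame.

    All argument names are hypothetical; adapt them to the real columns.
    """
    df = df.drop(columns=list(drop_cols))
    # Stack the per-row arrays into an (n_rows, n_values) float matrix.
    vectors = np.vstack(df[vector_col].tolist()).astype(np.float64)
    vector_df = pd.DataFrame(
        vectors,
        index=df.index,
        columns=[f"{vector_col}_{i}" for i in range(vectors.shape[1])],
    )
    # One-hot encode the string-valued columns as 0.0 / 1.0 indicators.
    dummies = pd.get_dummies(df[list(categorical_cols)], dtype=np.float64)
    return pd.concat([vector_df, dummies], axis=1).to_numpy(dtype=np.float64)


def embed_2d(X, perplexity=30.0):
    """Run t-SNE on an already numeric matrix (needs scikit-learn)."""
    # Imported here so the preprocessing step works without scikit-learn.
    from sklearn.manifold import TSNE

    return TSNE(n_components=2, perplexity=perplexity).fit_transform(X)


# Made-up rows shaped like the one in the question (shorter arrays).
result = pd.DataFrame(
    {
        "id": ["00012_0", "00013_0", "00014_0"],
        "features": [
            np.array([0.2, 0.9, 20.0], dtype=object),
            np.array([0.1, 0.5, 12.0], dtype=object),
            np.array([0.4, 0.7, 3.0], dtype=object),
        ],
        "model": ["Resnet", "Resnet", "Vgg"],
        "explainer": ["Lime", "Shap", "Lime"],
        "label": ["Real", "Fake", "Real"],
    }
)

X = to_numeric_matrix(
    result,
    vector_col="features",
    categorical_cols=["model", "explainer", "label"],
    drop_cols=["id"],
)
```

With the real data, X can then be passed to embed_2d(X) in place of arr. Note that t-SNE expects perplexity to be smaller than the number of rows, so a tiny example like the one above would need a lower value than the default.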
Answered By - paloman