Wednesday, December 15, 2021

[FIXED] One Hot Encoder - Classification by categories

December 15, 2021 (tags: numpy, one-hot-encoding, python)

Issue

For model training, I have a vector of repeating numeric values. I want to divide it into 10 categories by value proximity (a simple kind of clustering), so that the output is a sparse N x 10 one-hot matrix (N being the number of values in the vector) in which each row has a 1 at the index of the value's category.

Here is my code:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

a = np.array([1, 8, 7, 6, 5, 8,
              95, 44, 78, 12, 25,
              12, 12, 65, 47, 85,
              32, 99, 88])
a = a.reshape(-1, 1)

max_a = np.max(a)  # 99
min_a = np.min(a)  # 1

div = (max_a - min_a) / 10  # 9.8, the width of each of the 10 bins

# Replace each value with its bin number (1..10)
for i in range(a.shape[0]):
    x = 1
    while a[i, 0] > (min_a + x * div):
        x = x + 1
    a[i, 0] = x
# a = [1,1,1,1,1,1,10,5,8,2,3,2,2,7,5,9,4,10,9]

# sparse_output was called sparse before scikit-learn 1.2
onehot_a = OneHotEncoder(sparse_output=False)
a = onehot_a.fit_transform(a)

print(a.shape)  # (19, 9)

But I want the shape of the output to be (19, 10). Where am I wrong?

Solution

Using np.digitize and this answer:

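import numpy as np

a = np.array([1, 8, 7, 6, 5, 8,
              95, 44, 78, 12, 25,  # corrected typo: 778 -> 78
              12, 12, 65, 47, 85,
              32, 99, 88])

def onehot(a, bins=10):
    # `bins` evenly spaced edges from min to max; np.digitize returns 1..bins,
    # so subtract 1 to get row indices 0..bins-1 into the identity matrix
    return np.eye(bins)[np.digitize(a, np.linspace(a.min(), a.max(), bins)) - 1]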

onehot(a)
Out[]: 
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])
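In onehot above, np.linspace(a.min(), a.max(), bins) produces `bins` edges, that is bins - 1 equal-width intervals, and np.digitize sends values equal to the maximum (here only 99) into a tenth bin of their own. If ten equal-width bins like the question's loop are wanted instead, one possible variant is sketched below (onehot_equal_width is a hypothetical name): it takes bins + 1 edges and digitizes against the interior edges only, with right=True so that a value lying exactly on an edge goes to the lower bin, as the loop's > comparison does.

```python
import numpy as np

def onehot_equal_width(a, bins=10):
    # Hypothetical helper: bins + 1 evenly spaced edges give `bins` equal-width intervals
    edges = np.linspace(a.min(), a.max(), bins + 1)
    # Digitizing against the interior edges only yields indices 0..bins-1;
    # right=True puts a value lying exactly on an edge into the lower bin
    idx = np.digitize(a, edges[1:-1], right=True)
    return np.eye(bins)[idx]

a = np.array([1, 8, 7, 6, 5, 8, 95, 44, 78, 12, 25,
              12, 12, 65, 47, 85, 32, 99, 88])
out = onehot_equal_width(a)
print(out.shape)  # (19, 10)
print((out.argmax(axis=1) + 1).tolist())
# [1, 1, 1, 1, 1, 1, 10, 5, 8, 2, 3, 2, 2, 7, 5, 9, 4, 10, 9]
```

The resulting labels match the question's loop, and because the rows come from np.eye(bins), the output always has 10 columns even when a bin is empty.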

Answered By - Daniel F

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
