Sunday, November 20, 2022

[FIXED] Target encoding multiple columns in pandas python


Issue

I have the following table.

id  col1  col2  col3  col4  target
1   A     B     A     101   1
2   B     B     A     191   1
3   A     B     A      81   0
4   C     B     C      67   1
5   B     C     C       3   0

I want to target encode every column except col4.

Expected Output:

e1    e2     e3     target
0.5   0.75   0.667    1
0.5   0.75   0.667    1
0.5   0.75   0.667    0
1.0   0.75   0.5      1
0.5   0.00   0.5      0

EDIT: For each of the columns col1, col2, and col3, I want to get the target encoding, i.e. the mean of target for each category in that column.

For example, in col3, A appears 3 times, and in 2 of those 3 rows the target is 1, so the encoding for A is 2/3 = 0.667. Similarly, C appears twice in col3 with targets 1 and 0, so its encoding is 0.5.
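The arithmetic above can be checked with a short sketch. It rebuilds the question's table by hand and computes the mean of target per category of col3; the variable names `df` and `col3_means` are only illustrative.

```python
import pandas as pd

# The table from the question, rebuilt by hand.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "col1": ["A", "B", "A", "C", "B"],
    "col2": ["B", "B", "B", "B", "C"],
    "col3": ["A", "A", "A", "C", "C"],
    "col4": [101, 191, 81, 67, 3],
    "target": [1, 1, 0, 1, 0],
})

# Target encoding of one column: the mean of target within each category.
col3_means = df.groupby("col3")["target"].mean()
print(col3_means)
```

Here A should come out as 2/3 and C as 0.5, matching the e3 column of the expected output.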

I've tried something like this for a single column:

# per-category mean of target, stored as e1 to avoid clashing with df's own target column
encodings = df.groupby('col1')['target'].mean().reset_index(name='e1')
df = df.merge(encodings, how='left', on='col1')
df.drop('col1', axis=1, inplace=True)

Solution

Update after clarification:

You can use the same approach as in your original attempt (a groupby mean of target), but apply it to each column and write the result back with map instead of merge:

df.update(df[['col1', 'col2', 'col3']]
          .apply(lambda s: s.map(df['target'].groupby(s).mean()))
          )

output:

   id col1  col2      col3  col4  target
0   1  0.5  0.75  0.666667   101       1
1   2  0.5  0.75  0.666667   191       1
2   3  0.5  0.75  0.666667    81       0
3   4  1.0  0.75       0.5    67       1
4   5  0.5   0.0       0.5     3       0
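If the goal is the exact layout from the expected output (new columns e1, e2, e3 next to target, with the original columns left untouched), one possible variant uses `groupby(...).transform('mean')` rather than `map`. This is a sketch and not part of the original answer; the names `cols` and `encoded` are assumptions, and the e1 to e3 labels simply mirror the question.

```python
import pandas as pd

# The table from the question, rebuilt by hand.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "col1": ["A", "B", "A", "C", "B"],
    "col2": ["B", "B", "B", "B", "C"],
    "col3": ["A", "A", "A", "C", "C"],
    "col4": [101, 191, 81, 67, 3],
    "target": [1, 1, 0, 1, 0],
})

cols = ["col1", "col2", "col3"]

# For each column, broadcast the per-category mean of target back to every row.
encoded = pd.DataFrame({
    f"e{i}": df.groupby(col)["target"].transform("mean")
    for i, col in enumerate(cols, start=1)
})
encoded["target"] = df["target"]

print(encoded.round(3))
```

Because the encodings are built as new columns, they come out as float64, and `df` itself is not modified.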

older answer prior to OP clarification

IIUC, you want to map each value to its normalized value_counts, i.e. its relative frequency within the column:

df[['col1', 'col2', 'col3']].apply(lambda s: s.map(s.value_counts(normalize=True)))

output:

   col1  col2  col3
0   0.4   0.8   0.6
1   0.4   0.8   0.6
2   0.4   0.8   0.6
3   0.2   0.8   0.4
4   0.4   0.2   0.4
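To make clear what `value_counts(normalize=True)` contributes here, the following sketch encodes col1 on its own and compares the normalized counts with a manual count divided by the column length. The names `s`, `freq`, `manual`, and `encoded` are illustrative only.

```python
import pandas as pd

# col1 from the question.
s = pd.Series(["A", "B", "A", "C", "B"], name="col1")

# Relative frequency of each category in the column.
freq = s.value_counts(normalize=True)

# The same quantity computed by hand: raw counts divided by the number of rows.
manual = s.value_counts() / len(s)

# Replace each value by the relative frequency of its category.
encoded = s.map(freq)
print(encoded)
```

Note that this is a frequency encoding, which ignores target entirely; that is why it differs from the target encoding the question asks for.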

updating the data in place:

df.update(df[['col1', 'col2', 'col3']]
          .apply(lambda s: s.map(s.value_counts(normalize=True)))
          )

updated DataFrame:

   id col1 col2 col3  col4  target
0   1  0.4  0.8  0.6   101       1
1   2  0.4  0.8  0.6   191       1
2   3  0.4  0.8  0.6    81       0
3   4  0.2  0.8  0.4    67       1
4   5  0.4  0.2  0.4     3       0

Answered By - mozway

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
