Issue
I have the following table.
id col1 col2 col3 col4 target
1 A B A 101 1
2 B B A 191 1
3 A B A 81 0
4 C B C 67 1
5 B C C 3 0
I want to target-encode every column except col4.
Expected Output:
e1 e2 e3 target
0.5 0.75 0.667 1
0.5 0.75 0.667 1
0.5 0.75 0.667 0
1.0 0.75 0.5 1
0.5 0.00 0.5 0
EDIT:
For each of col1, col2 and col3, I want to get the target encodings.
For example, in col3, A appears 3 times and has a target of 1 in 2 of those rows, so the encoding for A is 2/3, about 0.667. Similarly, C is encoded as 0.5 in col3 (1 of its 2 rows has target 1).
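The arithmetic in this example can be checked with a groupby aggregation. This is a minimal sketch that assumes the table is loaded into a DataFrame named df; the constructor below is a reconstruction of the question's table, not code from the post.

```python
import pandas as pd

# Reconstruction of the question's table (assumed: categories as strings, numbers as ints)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'col1': ['A', 'B', 'A', 'C', 'B'],
    'col2': ['B', 'B', 'B', 'B', 'C'],
    'col3': ['A', 'A', 'A', 'C', 'C'],
    'col4': [101, 191, 81, 67, 3],
    'target': [1, 1, 0, 1, 0],
})

# For each category of col3: rows with target 1 (sum), total rows (count),
# and their ratio (mean), which is the target encoding
stats = df.groupby('col3')['target'].agg(['sum', 'count', 'mean'])
print(stats)
```

Row A shows 2 of 3 (mean 0.667) and row C shows 1 of 2 (mean 0.5), matching the numbers described above.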
I've tried something like this for one column:
import pandas as pd
# mean target per category; renamed so the merged column doesn't clash with 'target'
encodings = df.groupby('col1')['target'].mean().rename('e1').reset_index()
df = df.merge(encodings, how='left', on='col1')
df = df.drop(columns='col1')
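The one-column attempt can be extended to all three columns with a loop. This is a minimal sketch, not the answer's method: the names e1 to e3 are taken from the expected output, and renaming the mean column matters because merging a second column also called target would produce target_x and target_y. The DataFrame constructor is a reconstruction of the question's table.

```python
import pandas as pd

# Reconstruction of the question's table
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'col1': ['A', 'B', 'A', 'C', 'B'],
    'col2': ['B', 'B', 'B', 'B', 'C'],
    'col3': ['A', 'A', 'A', 'C', 'C'],
    'col4': [101, 191, 81, 67, 3],
    'target': [1, 1, 0, 1, 0],
})

for i, col in enumerate(['col1', 'col2', 'col3'], start=1):
    # per-category mean of target, named e1/e2/e3 to avoid a clash with 'target'
    enc = df.groupby(col)['target'].mean().rename(f'e{i}').reset_index()
    # a left merge keeps the original row order
    df = df.merge(enc, how='left', on=col)
    # drop the raw categorical column once it has been encoded
    df = df.drop(columns=col)

print(df[['e1', 'e2', 'e3', 'target']])
```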
Solution
update after clarification:
You can use the same groupby-mean approach as in your attempt, but apply it to every column with map instead of merge:
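import pandas as pd
cols = ['col1', 'col2', 'col3']
# for each column s, replace every category by the mean target within that category
df.update(df[cols].apply(lambda s: s.map(df['target'].groupby(s).mean())))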
output:
id col1 col2 col3 col4 target
0 1 0.5 0.75 0.666667 101 1
1 2 0.5 0.75 0.666667 191 1
2 3 0.5 0.75 0.666667 81 0
3 4 1.0 0.75 0.5 67 1
4 5 0.5 0.0 0.5 3 0
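A variant that is not part of the original answer: groupby(...).transform('mean') returns each group's mean already aligned with the original rows, so no separate map step is needed. This sketch builds a new frame with the question's e1 to e3 names instead of overwriting the columns with df.update; the data is a reconstruction of the question's table.

```python
import pandas as pd

# Reconstruction of the question's table
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'col1': ['A', 'B', 'A', 'C', 'B'],
    'col2': ['B', 'B', 'B', 'B', 'C'],
    'col3': ['A', 'A', 'A', 'C', 'C'],
    'col4': [101, 191, 81, 67, 3],
    'target': [1, 1, 0, 1, 0],
})

cols = ['col1', 'col2', 'col3']
# transform('mean') broadcasts each group's mean back onto every row of that group
encoded = pd.DataFrame({
    f'e{i}': df.groupby(col)['target'].transform('mean')
    for i, col in enumerate(cols, start=1)
})
encoded['target'] = df['target']
print(encoded.round(3))
```

This leaves col1 to col3 untouched and yields float64 columns directly, which can be convenient if the raw categories are still needed afterwards.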
older answer, prior to the OP's clarification:
IIUC, you want to map the normalized value_counts:
df[['col1', 'col2', 'col3']].apply(lambda s: s.map(s.value_counts(normalize=True)))
output:
col1 col2 col3
0 0.4 0.8 0.6
1 0.4 0.8 0.6
2 0.4 0.8 0.6
3 0.2 0.8 0.4
4 0.4 0.2 0.4
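This older approach is frequency encoding rather than target encoding: value_counts(normalize=True) gives each category's share of the rows and never looks at target, which is why these numbers differ from the expected output. A minimal sketch on col1 alone, with the values copied from the question's table:

```python
import pandas as pd

# col1 from the question's table
s = pd.Series(['A', 'B', 'A', 'C', 'B'], name='col1')

# share of rows per category: A and B appear 2/5 of the time, C 1/5
freqs = s.value_counts(normalize=True)
print(freqs.to_dict())

# map replaces each value by its category's share
print(s.map(freqs).tolist())
```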
updating the data in place:
df.update(
    df[['col1', 'col2', 'col3']]
    .apply(lambda s: s.map(s.value_counts(normalize=True)))
)
updated DataFrame:
id col1 col2 col3 col4 target
0 1 0.4 0.8 0.6 101 1
1 2 0.4 0.8 0.6 191 1
2 3 0.4 0.8 0.6 81 0
3 4 0.2 0.8 0.4 67 1
4 5 0.4 0.2 0.4 3 0
Answered By - mozway