Issue
I'm using the category_encoders package in Python to apply its Weight of Evidence encoder.
After I define an encoder object and fit it to data, the columns I wanted to encode are correctly replaced by their Weight of Evidence (WoE) values, according to which category they belong to.
So my question is, how can I obtain the mapping defined by the encoder? For example, let's say I have a variable with categories "A", "B" and "C". The respective WoE values could be 0.2, -0.4 and 0.02. But how can I know that 0.2 corresponds to the category "A"?
I tried accessing the "mapping" attribute, by using:
import category_encoders
encoder = category_encoders.WOEEncoder().fit(X=data[cols], y=data[label_col])
print(encoder.mapping)
It gives me the mapping, but I'm not sure in what order the WoE values are presented. It looks like it's in decreasing order, but that still doesn't tell me which category name each value belongs to.
Solution
From the source, you can see that an OrdinalEncoder (the category_encoders version, not sklearn's) is used to convert from categories to integers before doing the WoE encoding. That object is available through the attribute ordinal_encoder. Both encoders have an attribute mapping (or category_mapping): on the WoE encoder it maps each integer code to its WoE value, and on the ordinal encoder it holds, per column, the mapping from categories to integer codes.
The format of those mapping attributes isn't particularly pleasant, but here's a stab at "composing" the two for a given feature:
from category_encoders import WOEEncoder
from sklearn.datasets import fetch_openml

titanic = fetch_openml('titanic', version=1, as_frame=True)
df = titanic['frame']
woe = WOEEncoder().fit(df[['sex', 'embarked']], df['survived'])

column = "embarked"
# integer code -> WoE value for this column
woe_map = woe.mapping[column]
# category label -> integer code for this column
ord_map = [
    d for d in woe.ordinal_encoder.mapping if d['col'] == column
][0]['mapping']
# compose the two: category label -> WoE value
ord_map.map(woe_map)
# outputs:
# S     -0.215117
# C      0.701157
# NaN    1.578280
# Q     -0.095696
# dtype: float64
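To get the same lookup for every encoded column at once, the composition above can be wrapped in a small helper. This is only a sketch: woe_by_category is a hypothetical name, and it assumes the fitted encoder exposes mapping and ordinal_encoder.mapping in the shapes used above. So that it runs without fitting anything, the demo uses a hand-built stand-in object (not a real WOEEncoder) with a made-up column "grade" and the A/B/C values from the question.

```python
from types import SimpleNamespace

import pandas as pd


def woe_by_category(encoder):
    """Return {column: Series of WoE values indexed by category label}.

    Assumes `encoder.mapping[col]` maps integer codes to WoE values and
    `encoder.ordinal_encoder.mapping` is a list of dicts with 'col' and
    'mapping' keys, as in the snippet above.
    """
    result = {}
    for entry in encoder.ordinal_encoder.mapping:
        column = entry["col"]
        ord_map = entry["mapping"]          # category label -> integer code
        woe_map = encoder.mapping[column]   # integer code -> WoE value
        result[column] = ord_map.map(woe_map).rename("woe")
    return result


# Stand-in for a fitted encoder (an assumption for illustration only).
fake_encoder = SimpleNamespace(
    mapping={"grade": pd.Series([0.2, -0.4, 0.02], index=[1, 2, 3])},
    ordinal_encoder=SimpleNamespace(
        mapping=[
            {"col": "grade",
             "mapping": pd.Series([1, 2, 3], index=["A", "B", "C"])}
        ]
    ),
)

lookup = woe_by_category(fake_encoder)
print(lookup["grade"])
```

With a real fitted encoder, woe_by_category(woe)["embarked"] should give the same Series as the single-column snippet, apart from the added name.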
Answered By - Ben Reiniger