Issue
The following code:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
df = pd.DataFrame(dict(
    x=[1, 2, np.nan],
    y=[2, np.nan, 0],
))
SimpleImputer().fit_transform(df)
Returns
array([[1. , 2. ],
       [2. , 1. ],
       [1.5, 0. ]])
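The column labels (and the index) are lost because `fit_transform` returns a plain NumPy array. For a single call, one minimal workaround is to rebuild the frame from that array using the original labels. This sketch assumes no column is entirely missing: with the default strategies, `SimpleImputer` drops all-NaN columns, and the labels would then no longer line up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame(dict(x=[1, 2, np.nan], y=[2, np.nan, 0]))

# fit_transform returns a bare ndarray, so reattach the labels by hand.
# Assumes no all-NaN column was dropped, otherwise the shapes would not match.
imputed = pd.DataFrame(
    SimpleImputer().fit_transform(df),
    columns=df.columns,
    index=df.index,
)
print(imputed)
```

This has to be repeated after every call, though, and it does not help inside a pipeline.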
Is there a way to use an imputer that returns a pandas DataFrame instead of a NumPy array? Is there a scikit-learn implementation for that?
Solution
If you want to keep the column names (e.g. to select columns by name in a ColumnTransformer in a later step), you can create a wrapper around SimpleImputer:
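import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [3, np.nan, 4], "C": [np.nan, 5, 6]})
class PandasSimpleImputer(SimpleImputer):
    """A wrapper around `SimpleImputer` to return data frames with columns.
    """
    def fit(self, X, y=None):
        # Remember the column labels seen during fit
        self.columns = X.columns
        return super().fit(X, y)
    def transform(self, X):
        # Rebuild a DataFrame from the imputed array, keeping the labels and the index
        return pd.DataFrame(super().transform(X), columns=self.columns, index=X.index)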
PandasSimpleImputer().fit_transform(df)
>>>
     A    B    C
0  1.0  3.0  5.5
1  2.0  3.5  5.0
2  1.5  4.0  6.0
Answered By - nocibambi