Issue
I'm unable to use polars DataFrames with scikit-learn for ML training.
Currently I do all the DataFrame preprocessing in polars, and during model training I convert the result to a pandas DataFrame so that it works.
Is there a way to use a polars DataFrame directly for ML training, without converting it to pandas?
Solution
You must call to_numpy when passing a DataFrame to sklearn. Although sklearn can sometimes work on a polars Series, it is still good type hygiene to convert to the type the host library expects.
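import numpy as np
import polars as pl
from sklearn.linear_model import LinearRegression

# 100 rows of random data; polars names the columns column_0 ... column_4
data = pl.DataFrame(np.random.randn(100, 5))

# Features: every column except column_0; target: column_0
x = data.select(pl.all().exclude("column_0"))
y = data.select(pl.col("column_0").alias("y"))

# Simple 80/20 split by row position
x_train, y_train = x[:80], y[:80]
x_test, y_test = x[80:], y[80:]

# Convert to NumPy at the sklearn boundary
m = LinearRegression()
m.fit(X=x_train.to_numpy(), y=y_train.to_numpy())
m.predict(x_test.to_numpy())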
Answered By - ritchie46