Issue
I want to do the same thing in Python. Is there a way to group a DataFrame in pandas, fit a regression model on each group, and add a column of values from those models, all in one go? What is the smartest way?
# This is R. Very smart and useful.
library(tidyverse)
iris_lm <- iris %>% group_by(Species) %>%
  do(lm.res = lm(Petal.Length ~ Sepal.Length, data = .)) %>%
  mutate(coe = lm.res$coefficients[2])
iris_lm
# A tibble: 3 x 3
# Rowwise:
  Species    lm.res   coe
  <fct>      <list> <dbl>
1 setosa     <lm>   0.132
2 versicolor <lm>   0.686
3 virginica  <lm>   0.750
Solution
You can do this similarly with statsmodels, using the most common pattern in pandas: groupby().apply()
Setting up the dataset from sklearn
import pandas as pd
from sklearn.datasets import load_iris
import statsmodels.formula.api as sm
iris = load_iris()
df = pd.DataFrame(iris.data, columns=['sepal_length','sepal_width','petal_length','petal_width'])
Using the sm.ols API
(df.assign(species=iris.target_names[iris.target])
   .groupby('species')
   # fit one OLS model per species and keep the slope on sepal_length
   .apply(lambda x: sm.ols('petal_length ~ sepal_length', x).fit().params['sepal_length'])
   .reset_index(name='coe'))
Output
      species       coe
0      setosa  0.131632
1  versicolor  0.686470
2   virginica  0.750081
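The result above has one row per species. If the goal is a coefficient column on the original row-level DataFrame, one possible approach is to compute the per-group slopes first and then map them back by species. The sketch below is only an illustration under assumptions: it uses a small made-up `toy` DataFrame instead of iris, and it swaps in `numpy.polyfit` for `sm.ols` so that it depends only on pandas and NumPy. The names `toy`, `slope` and `coefs` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up toy data (not the iris dataset): two groups, each exactly linear.
toy = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "sepal_length": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "petal_length": [2.0, 4.0, 6.0, 1.5, 2.0, 2.5],
})


def slope(group: pd.DataFrame) -> float:
    """Least-squares slope of petal_length on sepal_length for one group."""
    # A degree-1 polyfit returns [slope, intercept]; keep only the slope.
    return float(np.polyfit(group["sepal_length"], group["petal_length"], 1)[0])


# One slope per species, collected by iterating over the groups.
coefs = {name: slope(group) for name, group in toy.groupby("species")}

# Broadcast each group's slope back onto every row of that group.
toy["coe"] = toy["species"].map(coefs)
print(toy)
```

With statsmodels, the body of `slope` could instead return `sm.ols('petal_length ~ sepal_length', group).fit().params['sepal_length']`; the mapping step would stay the same.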
Answered By - Michael Szczesny