Issue
I have 50 variables in my dataframe: 46 are dependent variables and 4 are independent variables (precipitation, temperature, dew, snow). I want to calculate the mutual information of each dependent variable against the independent variables.
So in the end I want a dataframe like this:
Right now I am calculating it with the following code, but it takes a long time because I have to change my y by hand each time:
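import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

X = df[['Temperature', 'Precipitation', 'Dew', 'Snow']]  # features
y = df['N0037']  # target as a 1-D Series; changed by hand for each dependent variable
mi = mutual_info_regression(X, y)
mi /= np.max(mi)  # scale so the largest score is 1
mi = pd.Series(mi, index=X.columns)
mi = mi.sort_values(ascending=False)  # sort_values returns a new Series, so reassign it
mi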
Solution
Using a list comprehension:
import pandas as pd
from sklearn.feature_selection import mutual_info_regression as mi_reg
indep_vars = ['Temperature', 'Precipitation', 'Dew', 'Snow']  # independent variables
dep_vars = df.columns.difference(indep_vars).tolist()  # all other columns (returned sorted) are dependent variables
# One row of MI scores per dependent variable, then scale each row by its maximum
df_mi = pd.DataFrame([mi_reg(df[indep_vars], df[dep_var]) for dep_var in dep_vars],
                     index=dep_vars, columns=indep_vars).apply(lambda x: x / x.max(), axis=1)
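For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same idea on synthetic data. The helper name `mi_table`, the synthetic columns `N0037` and `N0038`, and the choice of `random_state=0` are assumptions made for illustration and are not part of the original question or answer. The sketch also keeps the original column order and guards against a row whose scores are all zero, which the one-liner does not do.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def mi_table(df, indep_vars, random_state=0):
    """Hypothetical helper: MI of every dependent column against the
    independent columns, with each row scaled by its own maximum."""
    # Every column that is not an independent variable is treated as dependent.
    dep_vars = [c for c in df.columns if c not in indep_vars]
    X = df[indep_vars]
    # One array of MI scores (one score per independent variable) per dependent column.
    rows = {
        dep: mutual_info_regression(X, df[dep], random_state=random_state)
        for dep in dep_vars
    }
    table = pd.DataFrame.from_dict(rows, orient="index", columns=indep_vars)
    # Avoid dividing by zero when a dependent column shares no information
    # with any independent variable; such a row becomes NaN.
    row_max = table.max(axis=1).replace(0, np.nan)
    return table.div(row_max, axis=0)


# Synthetic stand-in for the real dataframe (assumed shapes and names).
rng = np.random.default_rng(0)
n = 300
indep = ["Temperature", "Precipitation", "Dew", "Snow"]
demo = pd.DataFrame(rng.normal(size=(n, 4)), columns=indep)
demo["N0037"] = 2 * demo["Temperature"] + rng.normal(scale=0.1, size=n)
demo["N0038"] = demo["Snow"] ** 2 + rng.normal(scale=0.1, size=n)

result = mi_table(demo, indep)
print(result)
```

Passing a fixed `random_state` matters here because `mutual_info_regression` adds a small amount of random noise to continuous features, so scores can vary slightly between runs otherwise.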
Answered By - Always Right Never Left