Issue
I am currently working on a logistic regression model to predict the outcome of certain trades. The model classifies each trade in the test set as good or bad (1/0). I want to see which trades fall into each group and sum the profit/loss of the trades classified as "good" to find out whether the model is actually profitable. Is there a way to view the row-specific information of the entries that the model classifies as True/False?
This is what my code looks like for my data scaling and splitting into train/test set:
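from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# select the features and the target
x = df[x_train_features]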
y = df["y"]
y = y.astype("int")
# scale data
scaler = MinMaxScaler()
scaledx = scaler.fit_transform(x)
# split training data into test and training sets
X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25)
# instantiate the model (using the default parameters)
logreg = LogisticRegression()
# fit the model with data
logreg.fit(X_train, y_train)
y_pred_test = (logreg.predict_proba(X_test)[:, 1] >= 0.5).astype(bool)
I tried to use df.loc[y_pred_test == True], but I get the error:
Boolean index has wrong length: 720 instead of 2880
most likely because the test set is smaller than the whole sample set.
Solution
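The error occurs because y_pred_test has one entry per test row (720), while df still contains all 2880 rows, so the boolean mask cannot be applied to df directly. You only predicted on the test set, not on the whole dataset. Because y is a pandas Series, y_test keeps the original row index of df, so give the predictions that same index before combining them:
y_pred_test = pd.Series(y_pred_test, index=y_test.index, name="y_pred")
results = pd.concat([y_test, y_pred_test], axis=1)
This puts your prediction values side by side with the ground truth, aligned on the original row labels. You can then select the rows predicted as good, either from the combined frame or from the original df:
results[results["y_pred"]]
df.loc[results.index[results["y_pred"]]]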
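To get from the predictions back to the full trade rows and a profit figure, one option is to look up the test rows in df through the index of y_test. The following is a minimal sketch on synthetic data, not the asker's actual dataset: the feature names f1 and f2, the profit/loss column pnl, and the fixed random_state are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the trades table; "pnl" is a hypothetical profit/loss column.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
    "pnl": rng.normal(size=n),
})
df["y"] = (df["f1"] + 0.5 * df["f2"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

x_train_features = ["f1", "f2"]
x = df[x_train_features]
y = df["y"]

# Scaling returns a NumPy array, so the row labels are lost on the features...
scaledx = MinMaxScaler().fit_transform(x)

# ...but y is still a Series, so y_train and y_test keep the original df index.
X_train, X_test, y_train, y_test = train_test_split(
    scaledx, y, test_size=0.25, random_state=0
)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Boolean array with one entry per test row, in the same order as y_test.
y_pred_test = logreg.predict_proba(X_test)[:, 1] >= 0.5

# Look up the full original rows of the test set, then attach the predictions.
test_rows = df.loc[y_test.index].copy()
test_rows["y_pred"] = y_pred_test

predicted_good = test_rows[test_rows["y_pred"]]
predicted_bad = test_rows[~test_rows["y_pred"]]

# Total profit/loss of the trades the model classified as good.
total_pnl = predicted_good["pnl"].sum()
print(len(predicted_good), len(predicted_bad), total_pnl)
```

Since test_rows is a slice of the original df, every column of the trade (not only the scaled features) is available for the rows in predicted_good and predicted_bad.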
Answered By - Harshad Patil