Issue
I am having some trouble plotting the results of a One-Class SVM that I have programmed. I have tried several examples found on the web, but with no good results at all. I have the following small dataset, where id is the identifier of a sample and f1 to f9 are its features:
id,f1,f2,f3,f4,f5,f6,f7,f8,f9
d1,0,0,0,0,0,0,0,0.045454545,0
d2,0.047619048,0,0,0.047619048,0,0.047619048,0,0.047619048,0.047619048
d3,0,0,0,0.045454545,0,0,0,0,0
d4,0,0.045454545,0,0.045454545,0,0,0,0.045454545,0.045454545
d5,0,0,0,0,0,0,0,0,0
d6,0,0.045454545,0,0,0,0,0,0.045454545,0
d7,0,0,0,0,0,0,0.045454545,0,0
d8,0,0,0,0.045454545,0,0,0,0,0
d9,0,0,0,0.045454545,0,0,0,0,0
d10,0,0,0,0.045454545,0,0,0,0,0
d11,0,0,0,0.045454545,0,0,0,0,0
d12,0.045454545,0,0,0.045454545,0.045454545,0.045454545,0,0.045454545,0
d13,0,0,0,0.045454545,0,0,0,0.045454545,0.045454545
d14,0,0,0,0.045454545,0.045454545,0,0,0,0
d15,0,0,0,0,0,0,0,0.047619048,0.047619048
d16,0,0,0,0,0,0,0,0.045454545,0
d17,0,0,0.045454545,0,0,0,0,0,0.045454545
d18,0,0,0,0,0,0,0,0,0
d19,0.045454545,0,0.090909091,0,0,0,0.090909091,0,0
d20,0,0,0,0.090909091,0,0,0.045454545,0.045454545,0.045454545
d21,0,0,0.045454545,0.045454545,0,0.045454545,0.045454545,0,0
d22,0,0.090909091,0,0,0,0.045454545,0,0,0.045454545
d23,0,0.047619048,0,0.047619048,0,0,0,0.047619048,0.095238095
d24,0,0,0,0,0,0.045454545,0.045454545,0.045454545,0
d25,0,0,0,0,0,0,0,0.043478261,0
d26,0,0,0,0,0.043478261,0,0.043478261,0.043478261,0
d27,0.043478261,0,0,0.043478261,0,0,0.043478261,0.043478261,0
My code is the following:
import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn import preprocessing
df=pd.read_csv('data.csv') # load the dataset shown above
listDrop=['id']
df1=df.drop(listDrop,axis="columns")
colNames=list(df1.columns.values)
min_max_scaler=preprocessing.MinMaxScaler()
x_scaled=min_max_scaler.fit_transform(df1)
df1[colNames]=x_scaled
svm = OneClassSVM(kernel='rbf', nu=0.2, gamma=1e-04)
svm.fit(df1)
pred=svm.predict(df1)
listA=[i+1 for i,x in enumerate(pred) if x == -1]
listB=[i+1 for i,x in enumerate(pred) if x == 1]
xx, yy = np.meshgrid(np.linspace(-5, 5, 1), np.linspace(-5, 5, 7500))
Xpred=np.array([xx.ravel(),yy.ravel()]+ [np.repeat(0, xx.ravel().size) for _ in range(7)]).T
Z = svm.decision_function(Xpred).reshape(xx.shape)
assert len(Z) == (len(xx) * len(yy))
Z = np.array(Z)
Z = Z.reshape((len(xx), len(yy)))
a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.Blues_r)
b1 = plt.scatter(pred[:, 0], pred[:, 1], c='red')
b3 = plt.scatter(listB[:,0], listB[:, 1], c="green")
plt.legend([a.collections[0],b1,b3],
["learned frontier", "test","outliers"],
loc="lower right",
prop=matplotlib.font_manager.FontProperties(size=11))
I would like to get a plot like the one in the scikit-learn One-Class SVM example, with the learned frontier drawn as a contour and the regular and abnormal observations marked.
I found this code on the web, and I was playing with the following line:
Xpred=np.array([xx.ravel(),yy.ravel()]+ [np.repeat(0, xx.ravel().size) for _ in range(7)]).T
This is because the code was throwing an error about the dimensions: since it is a 2D plot and I have 9 features, I read that I should fill the remaining seven features with placeholder values, as sketched below.
I also added the assert statement, but it fails with an error:
assert len(Z) == (len(xx) * len(yy))
AssertionError
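From what I read, a dimensionally consistent version of that padding trick would look something like the sketch below, where the 7 features that are not plotted are fixed at 0 and the 100x100 grid resolution is an arbitrary choice, but I am not sure it is the right approach:
# build a square grid so the decision values can be reshaped back onto it
xx, yy = np.meshgrid(np.linspace(-5, 5, 100), np.linspace(-5, 5, 100))
# pad each grid point with zeros for the 7 features that are not plotted
Xpred = np.c_[xx.ravel(), yy.ravel(), np.zeros((xx.ravel().size, 7))]
Z = svm.decision_function(Xpred).reshape(xx.shape)  # shape (100, 100)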
How can I plot the results of this One-Class SVM? predict only returns an array of 1s and -1s, like the following:
[ 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 1 -1 -1 -1 -1 -1 -1 -1 -1
1 -1 -1]
Solution
A standard approach is to use t-SNE to reduce the dimensionality of the data for visualization purposes. Once the data has been reduced to two dimensions, you can easily replicate the visualization from the scikit-learn tutorial; see the code below for an example.
import pandas as pd
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import MinMaxScaler
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# load the data
df = pd.read_csv('data.csv')
x = df.drop(labels='id', axis=1).values
# rescale the data
x_scaled = MinMaxScaler().fit_transform(x)
# reduce the data to 2 dimensions using t-SNE; the perplexity must be smaller
# than the number of samples (27 here), so the default of 30 would raise an
# error in recent scikit-learn versions
x_reduced = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(x_scaled)
# fit the model to the reduced data
svm = OneClassSVM(kernel='rbf', nu=0.2, gamma=1e-04)
svm.fit(x_reduced)
# extract the model predictions
x_predicted = svm.predict(x_reduced)
# define the meshgrid
x_min, x_max = x_reduced[:, 0].min() - 5, x_reduced[:, 0].max() + 5
y_min, y_max = x_reduced[:, 1].min() - 5, x_reduced[:, 1].max() + 5
x_ = np.linspace(x_min, x_max, 500)
y_ = np.linspace(y_min, y_max, 500)
xx, yy = np.meshgrid(x_, y_)
# evaluate the decision function on the meshgrid
z = svm.decision_function(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
# plot the decision function and the reduced data
plt.contourf(xx, yy, z, cmap=plt.cm.PuBu)
a = plt.contour(xx, yy, z, levels=[0], linewidths=2, colors='darkred')
b = plt.scatter(x_reduced[x_predicted == 1, 0], x_reduced[x_predicted == 1, 1], c='white', edgecolors='k')
c = plt.scatter(x_reduced[x_predicted == -1, 0], x_reduced[x_predicted == -1, 1], c='gold', edgecolors='k')
plt.legend([a.collections[0], b, c], ['learned frontier', 'regular observations', 'abnormal observations'], bbox_to_anchor=(1.05, 1))
plt.axis('tight')
plt.show()
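Note that fitting the SVM on the t-SNE output means the outlier decision is made in the embedded 2D space rather than on the original nine features. If you would rather fit the model on the full scaled data, as in the question, a minimal sketch of that variant reuses x_reduced from above purely to position the points; the decision frontier then lives in nine dimensions and cannot be drawn as a 2D contour:
# variant: fit on all 9 scaled features, use the embedding only for display
svm_full = OneClassSVM(kernel='rbf', nu=0.2, gamma=1e-04)
labels = svm_full.fit(x_scaled).predict(x_scaled)
plt.scatter(x_reduced[labels == 1, 0], x_reduced[labels == 1, 1],
            c='white', edgecolors='k', label='regular observations')
plt.scatter(x_reduced[labels == -1, 0], x_reduced[labels == -1, 1],
            c='gold', edgecolors='k', label='abnormal observations')
plt.legend(bbox_to_anchor=(1.05, 1))
plt.show()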
Answered By - Flavia Giammarino