Issue
In theory, an MLP with a single hidden layer of just 3 neurons is enough to predict XOR correctly. It may sometimes fail to converge, but 4 neurons are a safe bet.
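As a sanity check on the representability claim, here is a minimal sketch of the 4-neuron case. It assumes ReLU activations, no biases, and hand-picked weights; none of these come from the original post, and the names `W_hidden`, `w_out` and `xor_score` are illustrative. It relies on the identity |x1 - x2| - |x1 + x2| > 0 exactly when x1 and x2 have opposite signs.

```python
import numpy as np

# Hand-picked weights (an illustrative assumption, not taken from the post).
# The four hidden ReLU units compute
# relu(x1 + x2), relu(-x1 - x2), relu(x1 - x2), relu(x2 - x1).
W_hidden = np.array([[1.0, -1.0, 1.0, -1.0],
                     [1.0, -1.0, -1.0, 1.0]])
w_out = np.array([-1.0, -1.0, 1.0, 1.0])


def xor_score(x):
    """Return |x1 - x2| - |x1 + x2|, positive when x1 and x2 differ in sign."""
    hidden = np.maximum(x @ W_hidden, 0.0)  # ReLU hidden layer, no biases
    return hidden @ w_out


# Same kind of data as in the question: points in [-1, 1]^2,
# labelled True when the coordinate signs differ.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (1000, 2))
pos = x > 0
y_true = pos[:, 0] ^ pos[:, 1]

y_pred = xor_score(x) > 0
accuracy = np.mean(y_pred == y_true)
print(f"Accuracy: {accuracy}")
```

This only shows that a suitable set of weights exists for 4 ReLU units. It says nothing about whether a given optimizer will find them, which is what the rest of the question is about.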
Here's an example
I've tried to reproduce this using sklearn.neural_network.MLPClassifier:
import numpy as np
from sklearn import neural_network
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Training set: points in [-1, 1]^2, labelled +1 when the coordinate signs differ (XOR), else -1
x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

model = neural_network.MLPClassifier(
    hidden_layer_sizes=(3,), n_iter_no_change=100,
    learning_rate_init=0.01, max_iter=1000
).fit(x_train, y_train)

# Test set drawn the same way
x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

prediction = model.predict(x_test)
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')
I only get around 0.75 accuracy, while the TensorFlow Playground model is perfect. Any idea what makes the difference?
I also tried using TensorFlow:
import tensorflow as tf  # numpy and the sklearn metrics are imported as in the previous snippet

model = tf.keras.Sequential(layers=[
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.binary_crossentropy)

# Same XOR data as before, this time with boolean labels
x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = (tmp[:, 0] ^ tmp[:, 1])
model.fit(x=x_train, y=y_train)

x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = (tmp[:, 0] ^ tmp[:, 1])

prediction = model.predict(x_test) > 0.5
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')
With this model I get results similar to the scikit-learn model, so it's not just a scikit-learn issue. Am I missing some important hyperparameter?
Edit
OK, I changed the loss from cross-entropy to mean squared error, and the TensorFlow example now reaches 0.92 accuracy. I guess that's the problem with the MLPClassifier?
Solution
Increasing the learning rate and/or the maximum number of iterations seems to make the sklearn version work. Different solvers probably need different values for these, and it's not clear to me what the TF Playground is using.
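A minimal sketch of how one might try this, assuming the same data generation as in the question. The specific learning rates, the `max_iter` value, the sample sizes, the fixed seeds and the `make_xor` helper are all illustrative choices rather than values from the answer, and the resulting accuracies will depend on the seed and the scikit-learn version.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier


def make_xor(n, rng):
    """Sample n points in [-1, 1]^2, labelled +1 where the coordinate signs differ."""
    x = rng.uniform(-1, 1, (n, 2))
    pos = x > 0
    y = 2 * (pos[:, 0] ^ pos[:, 1]) - 1
    return x, y


rng = np.random.default_rng(0)
x_train, y_train = make_xor(2000, rng)
x_test, y_test = make_xor(500, rng)

results = {}
# Illustrative learning rates; 0.01 is the value used in the question.
for lr in (0.01, 0.05, 0.1):
    model = MLPClassifier(
        hidden_layer_sizes=(3,),
        learning_rate_init=lr,
        max_iter=2000,          # more iterations than the question's 1000
        n_iter_no_change=100,
        random_state=0,         # fixed seed so runs are repeatable
    )
    model.fit(x_train, y_train)
    results[lr] = accuracy_score(y_test, model.predict(x_test))
    print(f"learning_rate_init={lr}: accuracy={results[lr]:.3f}")
```

Comparing a handful of settings side by side like this makes it easier to see whether a poor score comes from the optimizer settings rather than from the network size. With only 3 hidden neurons some runs may still get stuck, so trying a few values of `random_state` is worthwhile.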
Answered By - Ben Reiniger