Issue
I am new to deep learning and was training a model on the US Adult Income dataset, but the results below look suspiciously perfect. Where did I actually go wrong? And a second question: I want to test my model on a different data set. How do I actually do that?
This is my code:
import pandas as pd
input_data = pd.read_csv('adult.data.csv')
def label_fix(label):
    if label == '<=50K':
        return 0
    else:
        return 1
input_data['Income'] = input_data['Income'].apply(label_fix)
from sklearn.model_selection import train_test_split
x_data = input_data.drop('Income',axis = 1)
y_labels = input_data['Income']
X_train,X_test,y_train,y_test = train_test_split(x_data,y_labels,test_size= 0.3,random_state=101)
import tensorflow as tf
Age = tf.feature_column.numeric_column('Age')
Job_class = tf.feature_column.categorical_column_with_hash_bucket('Job-Class',hash_bucket_size=1000)
fnlwgt = tf.feature_column.numeric_column('fnlwgt')
Education = tf.feature_column.categorical_column_with_hash_bucket('Education',hash_bucket_size=1000)
Education_num = tf.feature_column.numeric_column('Education-num')
Status = tf.feature_column.categorical_column_with_hash_bucket('Status',hash_bucket_size=1000)
Designation = tf.feature_column.categorical_column_with_hash_bucket('Designation',hash_bucket_size=1000)
Marital = tf.feature_column.categorical_column_with_hash_bucket('Marital',hash_bucket_size=1000)
Colour = tf.feature_column.categorical_column_with_vocabulary_list('Colour',['White', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other', 'Black'])
Gender = tf.feature_column.categorical_column_with_vocabulary_list('Gender',['Male','Female'])
Capital_gain = tf.feature_column.numeric_column('capital-gain')
Capital_loss = tf.feature_column.numeric_column('capital-loss')
Hours = tf.feature_column.numeric_column('hours-per-week')
Native_country = tf.feature_column.categorical_column_with_hash_bucket('Native-Country',hash_bucket_size=1000)
Income = tf.feature_column.numeric_column('Income')
feats_cols = [Age,Job_class,fnlwgt,Education,Education_num,Status,Designation,Marital,Colour,Gender,Capital_gain,Capital_loss,Hours,Native_country]
model = tf.estimator.LinearClassifier(feature_columns=feats_cols)
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,batch_size=100,num_epochs=None,shuffle=True)
model.train(input_fn=input_func,steps = 5000)
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into C:\Users\micha\AppData\Local\Temp\tmpj2usekuf\model.ckpt.
INFO:tensorflow:loss = 69.31474, step = 1
INFO:tensorflow:loss = 0.0, step = 101 (0.676 sec)
INFO:tensorflow:loss = 0.0, step = 201 (0.528 sec)
...
INFO:tensorflow:loss = 0.0, step = 4901 (0.561 sec)
INFO:tensorflow:Saving checkpoints for 5000 into C:\Users\micha\AppData\Local\Temp\tmpj2usekuf\model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.
pred_fn = tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=len(X_test),shuffle=False)
predictions = list(model.predict(input_fn=pred_fn))
final_preds = []
for pred in predictions:
    final_preds.append(pred['class_ids'][0])
from sklearn.metrics import classification_report
print(classification_report(y_test,final_preds))
             precision    recall  f1-score   support

          1       1.00      1.00      1.00      9769

avg / total       1.00      1.00      1.00      9769
Solution
There is a bug in your label_fix method. In the data file the <=50K value is always prefixed with a space (' <=50K'), so the comparison never matches and label_fix always returns 1. With every row labelled 1, the classifier trivially reaches perfect precision and recall (and the 0.0 loss you see in the training log). If you fix the method to handle the leading space, you will get more reasonable precision and recall:
def label_fix(label):
    # strip the leading space (and the trailing '.' used in the test file)
    if label.strip().strip('.') == '<=50K':
        return 0
    else:
        return 1
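As a quick sanity check (a minimal sketch, assuming the same column names and a freshly loaded copy of the training file), you can inspect the raw values to see the leading space and confirm that the fixed label_fix now produces both classes:

raw = pd.read_csv('adult.data.csv')
print(raw['Income'].unique())                         # e.g. [' <=50K', ' >50K']
print(raw['Income'].apply(label_fix).value_counts())  # both 0 and 1 should appear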
After fitting your model, you can then use it to predict the income for the adult.test
data file as follows:
test_data = pd.read_csv('adult.test.csv')
test_data['Income'] = test_data['Income'].apply(label_fix)
y_test_data = test_data['Income']
pred_fn = tf.estimator.inputs.pandas_input_fn(x=test_data,batch_size=len(test_data),shuffle=False)
predictions = list(model.predict(input_fn=pred_fn))
final_preds_test_data = []
for pred in predictions:
    final_preds_test_data.append(pred['class_ids'][0])
print(classification_report(y_test_data,final_preds_test_data))
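Alternatively (a minimal sketch that assumes the same test_data DataFrame and column names as above), you can let the estimator compute the metrics itself with model.evaluate by passing the labels to the input function:

eval_fn = tf.estimator.inputs.pandas_input_fn(
    x=test_data.drop('Income',axis=1),
    y=test_data['Income'],
    batch_size=len(test_data),
    num_epochs=1,
    shuffle=False)
print(model.evaluate(input_fn=eval_fn))  # reports accuracy, auc, average_loss, ...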
Please note that I had to add strip('.') to the label_fix method because the test file uses a slightly different format for the income column: each label carries a trailing period (' <=50K.' instead of ' <=50K').
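For example (a quick hypothetical spot check, assuming the trailing-period format described above), the fixed label_fix maps both variants to the same classes:

# spot check of the fixed label_fix on both file formats
assert label_fix(' <=50K') == 0    # training-file format
assert label_fix(' <=50K.') == 0   # test-file format with trailing period
assert label_fix(' >50K.') == 1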
Answered By - Alex