Issue
I have been a TensorFlow user and recently started using PyTorch. As a trial, I implemented a simple classification task with both libraries.
However, PyTorch is much slower than TensorFlow: PyTorch takes 42 min while TensorFlow takes 11 min. I followed the official PyTorch tutorial and made only small changes to it.
Could anyone share some advice on this problem?
Here is a summary of what I tried.
environment: Colab Pro+
dataset: Cifar10
classifier: VGG16
optimizer: Adam
loss: crossentropy
batch size: 32
PyTorch
Code:
import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy
trans = transforms.Compose([transforms.Resize((224, 224)),
                            transforms.ToTensor(),])
data = {phase: torchvision.datasets.CIFAR10('./', train = (phase=='train'), transform=trans, download=True) for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True) for phase in ['train', 'test']}
def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for inputs, labels in tqdm(iter(dataloaders[phase])):
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history only if in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(dataloaders[phase])
            epoch_acc = running_corrects.double() / len(dataloaders[phase])
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'test' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
                    criterion=nn.CrossEntropyLoss(),
                    optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
                    dataloaders=dataloaders,
                    device=device,
                    )
Result:
Epoch 0/4
----------
0%| | 0/1563 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00, 3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00, 3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00, 8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888
TensorFlow
Code:
import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    image = tf.expand_dims(image, 0)
    label = tf.one_hot(label, 10)
    label = tf.expand_dims(label, 0)
    return (image, label)
ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)
model = applications.vgg16.VGG16(input_shape = (224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss = 'categorical_crossentropy', metrics= ['accuracy'])
batch_size = 32
since = time.time()
history = model.fit(ds_train_,
                    batch_size=batch_size,
                    steps_per_epoch=len(ds_train)//batch_size,
                    epochs=5,
                    validation_steps=len(ds_test),
                    validation_data=ds_test_,
                    shuffle=True,)
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format( time_elapsed // 60, time_elapsed % 60 ))
Result:
Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s
Solution
It is because in your TensorFlow code, the data pipeline is feeding a batch of 1 image into the model per step instead of a batch of 32 images. Passing batch_size to model.fit does not actually control the batch size when the data comes in the form of a tf.data dataset. The reason the log showed a seemingly correct number of steps per epoch is that you passed steps_per_epoch to model.fit.
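You can see this directly from the pipeline itself (a quick sketch, assuming the ds_train_ defined in the question above): tf.expand_dims adds a leading dimension of 1 to every element, so model.fit treats each element as a batch containing a single image.
print(ds_train_.element_spec)
# Expected to show something like:
# (TensorSpec(shape=(1, 224, 224, 3), dtype=tf.float32, name=None),
#  TensorSpec(shape=(1, 10), dtype=tf.float32, name=None))
# The leading 1 is the "batch" dimension that model.fit sees.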
To correctly set the batch size:
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    label = tf.one_hot(label, 10)
    return (image, label)
train_size = len(ds_train)
test_size = len(ds_test)
ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)
The model.fit call becomes:
history = model.fit(ds_train_,
                    epochs=1,
                    validation_data=ds_test_)
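As a quick sanity check (a sketch, assuming the corrected ds_train_ pipeline above), you can pull one batch and confirm that the leading dimension is now 32:
images, labels = next(iter(ds_train_))
print(images.shape, labels.shape)  # roughly (32, 224, 224, 3) and (32, 10)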
After fixing the problem, TensorFlow reached speed comparable to PyTorch. On my machine, PyTorch took ~27 minutes per epoch while TensorFlow took ~24 minutes per epoch.
According to benchmarks from NVIDIA, PyTorch and TensorFlow show similar speed in most popular deep learning applications with real-world datasets and problem sizes. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)
Answered By - Laplace Ricky