Issue
As far as I know, when training and validating a model on the GPU, GPU memory is mainly used for loading data and for the forward and backward passes. As I understand it, GPU memory usage should therefore be the same 1) before training, 2) after training, 3) before validation, and 4) after validation. But in my case, the GPU memory used in the validation phase is still occupied during the training phase, and vice versa. It is not increasing per epoch, so I'm sure it is not a common mistake like forgetting loss.item().
Here is a summary of my questions:
- Shouldn't the GPU memory used in one phase be cleaned up before the other phase starts (except for the model weights)?
- If it should, are there any noob mistakes I'm making here?
Thank you for your help.
Here is the code for the training loop:
eval_result = evaluate(model, val_loader, True, True)
print(eval_result)
print('start training')
for epoch in range(num_epoch):
    model.train()
    time_ = datetime.datetime.now()
    for iter_, data in enumerate(tr_loader):
        x, y = data
        x = x.to(device).view(x.shape[0], 1, *(x.shape[1:]))
        y = y.to(device).long()
        pred = model.forward(x)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # print
        print_iter = 16
        if (iter_ + 1) % print_iter == 0:
            elapsed = datetime.datetime.now() - time_
            expected = elapsed * (num_batches / print_iter)
            _epoch = epoch + ((iter_ + 1) / num_batches)
            print('\rTRAIN [{:.3f}/{}] loss({}) '
                  'elapsed {} expected per epoch {}'.format(
                      _epoch, num_epoch, loss.item(), elapsed, expected),
                  end="\t\t\t")
            time_ = datetime.datetime.now()
    print()
    eval_result = evaluate(model, val_loader, True, True)
    print(eval_result)
    scheduler.step(eval_result[0])
    if (epoch + 1) % 1 == 0:
        save_model(model, optimizer, scheduler)
I've read that making the validation phase its own function helps, since Python uses function scoping, so evaluate() is:
def evaluate(model, val_loader, get_acc=True, get_IOU=True):
    """
    pred: Tensor of shape B C D H W
    label: Tensor of shape B D H W
    """
    val_loss = 0
    val_acc = 0
    val_IOU = 0
    with torch.no_grad():
        model.eval()
        for data in tqdm(val_loader):
            x, y = data
            x = x.to(device).view(x.shape[0], 1, *(x.shape[1:]))
            y = y.to(device).long()
            pred = model.forward(x)
            loss = loss_fn(pred, y)
            val_loss += loss.item()
            pred = torch.argmax(pred, dim=1)
            if get_acc:
                total = np.prod(y.shape)
                total = total if total != 0 else 1
                val_acc += torch.sum((pred == y)).cpu().item() / total
            if get_IOU:
                iou = 0
                for class_num in range(1, 8):
                    iou += torch.sum((pred == class_num) & (y == class_num)).cpu().item() \
                         / torch.sum((pred == class_num) | (y == class_num)).cpu().item()
                val_IOU += iou / 7
    val_loss /= len(val_loader)
    val_acc /= len(val_loader)
    val_IOU /= len(val_loader)
    return (val_loss, val_acc, val_IOU)
And here is the GPU usage in Colab: 1 is the point where evaluate() is first called, and 2 is where training starts.
Solution
Allocating GPU memory is slow, so PyTorch retains the GPU memory it allocates even after no tensors referencing that memory remain. You can call torch.cuda.empty_cache() to release this cached memory that is no longer in use by any tensor.
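To make the distinction concrete, here is a minimal, self-contained sketch (not part of the original question or answer; the tensor size is arbitrary) showing the difference between memory held by live tensors and memory PyTorch keeps cached, and how torch.cuda.empty_cache() only releases the cached portion:

import torch

device = torch.device('cuda')

# Allocate a tensor (~256 MB of float32), then drop the reference to it.
x = torch.randn(1024, 1024, 64, device=device)
print(torch.cuda.memory_allocated(device))  # bytes held by live tensors
print(torch.cuda.memory_reserved(device))   # bytes PyTorch has cached from the driver

del x
print(torch.cuda.memory_allocated(device))  # drops back towards zero
print(torch.cuda.memory_reserved(device))   # stays high: the cache is kept for reuse

torch.cuda.empty_cache()
print(torch.cuda.memory_reserved(device))   # cached blocks are returned to the driver

Tools such as nvidia-smi (and the Colab resource graph) report the reserved figure, not the allocated one, which is why memory used in one phase still appears occupied during the other even though no tensors from that phase remain.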
Answered By - jodag