Issue
U-Net code:
class UNet(nn.Module):
    def __init__(self):
        super(UNet, self).__init__()
        # First part of the encoder on cuda:0
        self.c1 = convBlock(1, 64).to('cuda:0')
        self.d1 = downSample(64).to('cuda:0')
        self.c2 = convBlock(64, 128).to('cuda:0')
        self.d2 = downSample(128).to('cuda:0')
        self.c3 = convBlock(128, 256).to('cuda:0')
        # Remaining encoder, bottleneck and most of the decoder on cuda:1
        self.d3 = downSample(256).to('cuda:1')
        self.c4 = convBlock(256, 512).to('cuda:1')
        self.d4 = downSample(512).to('cuda:1')
        self.c5 = convBlock(512, 1024).to('cuda:1')
        self.u1 = upSample(1024).to('cuda:1')
        self.c6 = convBlock(1024, 512).to('cuda:1')
        self.u2 = upSample(512).to('cuda:1')
        self.c7 = convBlock(512, 256).to('cuda:1')
        self.u3 = upSample(256).to('cuda:1')
        self.c8 = convBlock(256, 128).to('cuda:1')
        # Last decoder stage and output head back on cuda:0
        self.u4 = upSample(128).to('cuda:0')
        self.c9 = convBlock(128, 64).to('cuda:0')
        self.out = nn.Conv3d(64, 1, 3, 1, 1).to('cuda:0')
        self.th = nn.Sigmoid().to('cuda:0')

    def forward(self, x):
        L1 = self.c1(x.to('cuda:0'))
        L2 = self.c2(self.d1(L1.to('cuda:0')).to('cuda:0'))
        L3 = self.c3(self.d2(L2.to('cuda:0')).to('cuda:0'))
        L4 = self.c4(self.d3(L3.to('cuda:1')).to('cuda:1'))
        L5 = self.c5(self.d4(L4.to('cuda:1')).to('cuda:1'))
        R4 = self.c6(self.u1(L5.to('cuda:1'), L4.to('cuda:1')).to('cuda:1'))
        R3 = self.c7(self.u2(R4.to('cuda:1'), L3.to('cuda:1')).to('cuda:1'))
        R2 = self.c8(self.u3(R3.to('cuda:1'), L2.to('cuda:1')).to('cuda:1'))
        R1 = self.c9(self.u4(R2.to('cuda:0'), L1.to('cuda:0')).to('cuda:0'))
        return self.th(self.out(R1.to('cuda:0')).to('cuda:0'))
convBlock, downSample, and upSample are layers defined elsewhere in my own code.
I want to train a 3D U-Net, but a single GPU does not have enough memory, so I want to use multiple GPUs to train this model.
I assign different U-Net layers to different GPUs, as shown above.
Is this the correct way to use multiple GPUs to train a model? And what is the best way to run several GPU-training Python scripts with PyTorch on a Linux server?
Solution
Your code should work, but I'd suggest using some sort of flag to control whether the submodules (and the intermediate tensors) are moved to different GPUs. Something like this is what I've been using:
class MyModel(nn.Module):
    def __init__(self, split_bool: bool = False):
        super().__init__()
        self.submodule1 = ...
        self.submodule2 = ...
        self.split_bool = split_bool
        if split_bool:
            # Place the two halves of the model on separate GPUs
            self.submodule1.cuda(0)
            self.submodule2.cuda(1)

    def forward(self, x):
        x = self.submodule1(x)
        if self.split_bool:
            x = x.cuda(1)  # Transfer the intermediate tensor to the second GPU
        return self.submodule2(x)
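To give a concrete picture, here is a minimal sketch of a training step for a split model; loader, volume, and mask are assumed placeholder names for your own data pipeline, and the only device-specific detail is that the input goes to the entry GPU while the target must sit on the device where the output lands (cuda:1 for MyModel above, cuda:0 for the U-Net in the question):

import torch
import torch.nn as nn

# Sketch only: 'loader' is assumed to be a DataLoader yielding (volume, mask) pairs.
model = UNet()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for volume, mask in loader:
    volume = volume.to('cuda:0')   # the network expects its input on the first GPU
    mask = mask.to('cuda:0')       # the U-Net's output (and thus the loss) is on cuda:0
    pred = model(volume)
    loss = criterion(pred, mask)
    optimizer.zero_grad()
    loss.backward()                # autograd routes gradients back across both GPUs
    optimizer.step()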
As for running multiple trainings, it really depends on your server. Are you using TensorBoard/tensorboardX to plot results? You can launch multiple training scripts with different parameters from tmux, or even write your own bash script, as in the sketch below.
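If you prefer to stay in Python rather than writing a bash script, a small launcher along these lines can start several runs in parallel, each pinned to its own GPUs through CUDA_VISIBLE_DEVICES; train.py and its --lr flag are hypothetical names standing in for your own script and arguments:

import os
import subprocess

# Hypothetical launcher: one training process per config, each restricted
# to its own pair of GPUs via CUDA_VISIBLE_DEVICES.
runs = [
    {"lr": "1e-3", "gpus": "0,1"},
    {"lr": "1e-4", "gpus": "2,3"},
]

procs = []
for run in runs:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = run["gpus"]
    procs.append(subprocess.Popen(["python", "train.py", "--lr", run["lr"]], env=env))

for p in procs:
    p.wait()

With CUDA_VISIBLE_DEVICES set this way, each process still addresses its GPUs as cuda:0 and cuda:1, so the model code itself does not need to change.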
Answered By - Deusy94