Issue
U-Net code:
class UNet(nn.Module):
    def __init__(self):
        super(UNet, self).__init__()
        # First part of the encoder on cuda:0
        self.c1 = convBlock(1, 64).to('cuda:0')
        self.d1 = downSample(64).to('cuda:0')
        self.c2 = convBlock(64, 128).to('cuda:0')
        self.d2 = downSample(128).to('cuda:0')
        self.c3 = convBlock(128, 256).to('cuda:0')
        # Remaining encoder, bottleneck and most of the decoder on cuda:1
        self.d3 = downSample(256).to('cuda:1')
        self.c4 = convBlock(256, 512).to('cuda:1')
        self.d4 = downSample(512).to('cuda:1')
        self.c5 = convBlock(512, 1024).to('cuda:1')
        self.u1 = upSample(1024).to('cuda:1')
        self.c6 = convBlock(1024, 512).to('cuda:1')
        self.u2 = upSample(512).to('cuda:1')
        self.c7 = convBlock(512, 256).to('cuda:1')
        self.u3 = upSample(256).to('cuda:1')
        self.c8 = convBlock(256, 128).to('cuda:1')
        # Last decoder stage and output head back on cuda:0
        self.u4 = upSample(128).to('cuda:0')
        self.c9 = convBlock(128, 64).to('cuda:0')
        self.out = nn.Conv3d(64, 1, 3, 1, 1).to('cuda:0')
        self.th = nn.Sigmoid().to('cuda:0')

    def forward(self, x):
        L1 = self.c1(x.to('cuda:0'))
        L2 = self.c2(self.d1(L1.to('cuda:0')).to('cuda:0'))
        L3 = self.c3(self.d2(L2.to('cuda:0')).to('cuda:0'))
        L4 = self.c4(self.d3(L3.to('cuda:1')).to('cuda:1'))
        L5 = self.c5(self.d4(L4.to('cuda:1')).to('cuda:1'))
        R4 = self.c6(self.u1(L5.to('cuda:1'), L4.to('cuda:1')).to('cuda:1'))
        R3 = self.c7(self.u2(R4.to('cuda:1'), L3.to('cuda:1')).to('cuda:1'))
        R2 = self.c8(self.u3(R3.to('cuda:1'), L2.to('cuda:1')).to('cuda:1'))
        R1 = self.c9(self.u4(R2.to('cuda:0'), L1.to('cuda:0')).to('cuda:0'))
        return self.th(self.out(R1.to('cuda:0')).to('cuda:0'))
convBlock, downSample, and upSample are layers defined elsewhere in my own code.
I want to train a 3D U-Net, but a single GPU does not have enough memory, so I want to use multiple GPUs to train this model.
I assign different U-Net layers to different GPUs, as shown above.
Is this the correct way to use multiple GPUs to train a model? And what is the best way to run several GPU-training Python scripts with PyTorch on a Linux server?
Solution
Your code should work, but I'd suggest using some sort of flag to control whether the submodules (and the intermediate tensors) are moved to different GPUs. Something like this is what I've been using:
class MyModel(nn.Module):
    def __init__(self, split_bool: bool = False):
        super().__init__()
        self.submodule1 = ...
        self.submodule2 = ...
        self.split_bool = split_bool
        if split_bool:
            # Place the two halves of the model on separate GPUs
            self.submodule1.cuda(0)
            self.submodule2.cuda(1)

    def forward(self, x):
        x = self.submodule1(x)
        if self.split_bool:
            x = x.cuda(1)  # Transfer the intermediate tensor to the second GPU
        return self.submodule2(x)
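To give a concrete picture, here is a minimal sketch of a training step for a split model; loader, volume, and mask are assumed placeholder names for your own data pipeline, and the only device-specific detail is that the input goes to the entry GPU while the target must sit on the device where the output lands (cuda:1 for MyModel above, cuda:0 for the U-Net in the question):

import torch
import torch.nn as nn

# Sketch only: 'loader' is assumed to be a DataLoader yielding (volume, mask) pairs.
model = UNet()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for volume, mask in loader:
    volume = volume.to('cuda:0')   # the network expects its input on the first GPU
    mask = mask.to('cuda:0')       # the U-Net's output (and thus the loss) is on cuda:0
    pred = model(volume)
    loss = criterion(pred, mask)
    optimizer.zero_grad()
    loss.backward()                # autograd routes gradients back across both GPUs
    optimizer.step()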
As for running multiple trainings, it really depends on your server. Are you using TensorBoard/tensorboardX to plot results? You can launch multiple training scripts with different parameters from tmux, or even write your own bash script, as in the sketch below.
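If you prefer to stay in Python rather than writing a bash script, a small launcher along these lines can start several runs in parallel, each pinned to its own GPUs through CUDA_VISIBLE_DEVICES; train.py and its --lr flag are hypothetical names standing in for your own script and arguments:

import os
import subprocess

# Hypothetical launcher: one training process per config, each restricted
# to its own pair of GPUs via CUDA_VISIBLE_DEVICES.
runs = [
    {"lr": "1e-3", "gpus": "0,1"},
    {"lr": "1e-4", "gpus": "2,3"},
]

procs = []
for run in runs:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = run["gpus"]
    procs.append(subprocess.Popen(["python", "train.py", "--lr", run["lr"]], env=env))

for p in procs:
    p.wait()

With CUDA_VISIBLE_DEVICES set this way, each process still addresses its GPUs as cuda:0 and cuda:1, so the model code itself does not need to change.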
Answered By - Deusy94