Issue
I am trying to train on two datasets at the same time to get good results on both of them. Here is the relevant part of my code:
```python
data_loader_iterator = iter(data_loader_second)
for batch_idx, (image1, label1) in enumerate(data_loader):
    image1 = image1.to(args.local_rank)
    label1 = label1.to(args.local_rank)
    label1 = label1.squeeze()
    try:
        image2, label2 = next(data_loader_iterator)
    except StopIteration:
        # restart the second loader once it is exhausted
        data_loader_iterator = iter(data_loader_second)
        image2, label2 = next(data_loader_iterator)
    image2 = image2.to(args.local_rank)
    label2 = label2.to(args.local_rank)
    label2 = label2.squeeze()

    # shared backbone, one head per dataset
    embedding1 = backbone.forward(image1)
    embedding2 = backbone.forward(image2)
    output1 = head1.forward(embedding1, label1)
    output2 = head2.forward(embedding2, label2)

    loss1 = criterion(output1, label1)
    loss2 = criterion(output2, label2)
    loss = loss1 + loss2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

...

head1 = HeadFactory(args.head_type, args.head_conf_file, 751577).get_head()
head2 = HeadFactory(args.head_type, args.head_conf_file, 253788).get_head()

...

optimizer = torch.optim.AdamW(
    params=[
        {"params": backbone.parameters()},
        {"params": head1.parameters()},
        {"params": head2.parameters()},
    ],
    lr=args.lr,
    weight_decay=0.05,
)

...

criterion = torch.nn.CrossEntropyLoss().to(args.local_rank)
```
This doesn't work correctly; I get `WARNING: torch.distributed.elastic.multiprocessing.api: Sending process 2919 closing signal SIGTERM`, and I wonder how I should declare the optimizer in this setting.
Solution
It's unclear from your error what exactly went wrong and where.
Let's try a few things:
- How do you merge two datasets with different labels into the same `DataLoader`? How can you ensure that the first label belongs to `head1` while the second belongs to `head2`?
  I would recommend two `Dataset`s with two `DataLoader`s, iterated together using `zip`:

  ```python
  for batch_idx, ((image1, label1), (image2, label2)) in enumerate(zip(data_loader1, data_loader2)):
      # ...
  ```

  Note that `zip` stops as soon as the shorter loader is exhausted, so the remaining batches of the longer dataset are skipped in that epoch.
- I wonder whether the way you pass all the parameters to the optimizer is causing trouble. How about wrapping everything into a single `nn.ModuleList`?

  ```python
  container = nn.ModuleList([backbone, head1, head2])
  optimizer = torch.optim.AdamW(container.parameters(), lr=...)
  ```
  If you need to wrap the model for parallelism, you can wrap the container; see the sketch below.
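A minimal sketch of that wrapping, assuming `backbone`, `head1`, `head2`, and `args` come from your snippet and that `torch.distributed.init_process_group` has already been called: since a bare `nn.ModuleList` has no `forward`, one option is to bundle everything into a small module (the `TwoHeadModel` name below is just illustrative) and wrap that in `DistributedDataParallel`:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class TwoHeadModel(nn.Module):
    """Bundles the shared backbone and both heads so they can be wrapped once."""

    def __init__(self, backbone, head1, head2):
        super().__init__()
        self.backbone = backbone
        self.head1 = head1
        self.head2 = head2

    def forward(self, image1, label1, image2, label2):
        # One shared backbone, one head per dataset.
        output1 = self.head1(self.backbone(image1), label1)
        output2 = self.head2(self.backbone(image2), label2)
        return output1, output2

model = TwoHeadModel(backbone, head1, head2).to(args.local_rank)
model = DDP(model, device_ids=[args.local_rank])

# One optimizer over all parameters: backbone, head1 and head2.
optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=0.05)
```

Calling the wrapped model once per step, `output1, output2 = model(image1, label1, image2, label2)`, lets DDP handle gradient synchronization for the backbone and both heads in a single backward pass.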
Additionally, are you training in a multi-GPU setting? In that case, you need to add a few pieces to the code:
- Wrap both `backbone` and the `head`s with a distributed module (as in the sketch above).
- Your data loaders need to be aware of parallelism. This is usually done via `DistributedSampler`; see the sketch after this list.
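A minimal sketch of the loader side, assuming hypothetical `dataset1` and `dataset2` objects plus `args.batch_size` and `args.epochs` fields that are not in your snippet:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Each dataset gets its own sampler so every rank sees a distinct shard.
sampler1 = DistributedSampler(dataset1, shuffle=True)
sampler2 = DistributedSampler(dataset2, shuffle=True)

data_loader1 = DataLoader(dataset1, batch_size=args.batch_size, sampler=sampler1,
                          num_workers=4, pin_memory=True, drop_last=True)
data_loader2 = DataLoader(dataset2, batch_size=args.batch_size, sampler=sampler2,
                          num_workers=4, pin_memory=True, drop_last=True)

for epoch in range(args.epochs):
    # Reshuffle each epoch, consistently across ranks.
    sampler1.set_epoch(epoch)
    sampler2.set_epoch(epoch)
    for batch_idx, ((image1, label1), (image2, label2)) in enumerate(zip(data_loader1, data_loader2)):
        ...  # training step as above
```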
Answered By - Shai