Tuesday, July 5, 2022

[FIXED] Understanding model training and evaluation in Pytorch

Labels: deep-learning, python, pytorch

Issue

I am following some PyTorch deep-learning code in which model evaluation takes place inside the training epoch loop.

Q) Should torch.no_grad() and model.eval() be outside the training epoch loop?

Q) How can I determine which parameters (weights) are being optimised by the optimiser during back-propagation?

...

for l in range(1):
    model = GTN(num_edge=A.shape[-1],
                num_channels=num_channels,
                w_in=node_features.shape[1],
                w_out=node_dim,
                num_class=num_classes,
                num_layers=num_layers,
                norm=norm)

    if adaptive_lr == 'false':
        optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=0.001)
    else:
        # Explicit parameter groups; model.layers gets its own learning rate.
        optimizer = torch.optim.Adam([{'params': model.weight},
                                      {'params': model.linear1.parameters()},
                                      {'params': model.linear2.parameters()},
                                      {'params': model.layers.parameters(), 'lr': 0.5}],
                                     lr=0.005, weight_decay=0.001)
    
    loss = nn.CrossEntropyLoss()
    
    # Train & Valid & Test
    best_val_loss = 10000
    best_train_loss = 10000
    best_train_f1 = 0
    best_val_f1 = 0
    
    for i in range(epochs):
        print('Epoch:  ',i+1)
        model.zero_grad()
        model.train()
        loss,y_train,Ws = model(A, node_features, train_node, train_target)
        train_f1 = torch.mean(f1_score(torch.argmax(y_train.detach(),dim=1), train_target, num_classes=num_classes)).cpu().numpy()
        print('Train - Loss: {}, Macro_F1: {}'.format(loss.detach().cpu().numpy(), train_f1))
        
        loss.backward()
        optimizer.step()
        model.eval()
        # Valid

        with torch.no_grad():
            val_loss, y_valid,_ = model.forward(A, node_features, valid_node, valid_target)
            val_f1 = torch.mean(f1_score(torch.argmax(y_valid,dim=1), valid_target, num_classes=num_classes)).cpu().numpy()

        if val_f1 > best_val_f1:
            best_val_loss = val_loss.detach().cpu().numpy()
            best_train_loss = loss.detach().cpu().numpy()
            best_train_f1 = train_f1
            best_val_f1 = val_f1

    print('---------------Best Results--------------------')
    print('Train - Loss: {}, Macro_F1: {}'.format(best_train_loss, best_train_f1))
    print('Valid - Loss: {}, Macro_F1: {}'.format(best_val_loss, best_val_f1))
    final_f1 += best_test_f1

Solution

Within each epoch you train first and then run validation. For validation you switch the model to evaluation mode with model.eval() and run the forward pass under torch.no_grad(), which is correct. At the start of the next epoch you switch the model back to training mode with model.train(). So no, model.eval() and torch.no_grad() should not be moved out of the epoch loop: validating after every epoch is what lets you track the best validation F1, and your code uses the model modes correctly.
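To make the mode switching concrete, here is a minimal sketch. It uses a hypothetical toy model (toy_model, not the GTN from the question) with a Dropout layer, since dropout is one of the layers whose behaviour actually differs between train and eval mode.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, not the GTN from the question.
# The Dropout layer makes the train/eval difference observable.
toy_model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.randn(3, 4)

toy_model.train()                      # dropout active
train_out = toy_model(x)
train_mode_flag = toy_model.training   # True in training mode
train_tracks_grad = train_out.requires_grad  # graph is recorded for backward()

toy_model.eval()                       # dropout disabled
with torch.no_grad():                  # autograd does not record a graph here
    eval_out_1 = toy_model(x)
    eval_out_2 = toy_model(x)
eval_mode_flag = toy_model.training    # False in evaluation mode
eval_tracks_grad = eval_out_1.requires_grad
# With dropout off, two forward passes on the same input agree.
eval_is_deterministic = torch.equal(eval_out_1, eval_out_2)

toy_model.train()                      # switch back before the next training step
```

Note that model.eval() and torch.no_grad() do different jobs: the first changes layer behaviour (dropout, batch norm), the second only turns off gradient tracking, so validation code normally uses both.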
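As for which parameters are optimised: the optimizer only updates the parameters that were passed to it. In your code, if adaptive_lr is 'false', that is every parameter returned by model.parameters(). Otherwise it is the four parameter groups passed to Adam:
- model.weight
- model.linear1.parameters()
- model.linear2.parameters()
- model.layers.parameters() (with lr=0.5 instead of the default 0.005)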
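If you want to check this at run time, one option is to walk optimizer.param_groups and map each tensor back to its name from model.named_parameters(). The sketch below assumes a hypothetical stand-in class, TinyNet, that only mimics the attribute names from the question; the extra `unused` layer is added purely to show what a parameter left out of the optimizer looks like.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for GTN that reuses its attribute names."""
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(4, 4))
        self.linear1 = nn.Linear(4, 4)
        self.linear2 = nn.Linear(4, 2)
        self.layers = nn.ModuleList([nn.Linear(4, 4)])
        self.unused = nn.Linear(4, 4)  # illustration only: never given to the optimizer

model = TinyNet()
optimizer = torch.optim.Adam(
    [{'params': model.weight},
     {'params': model.linear1.parameters()},
     {'params': model.linear2.parameters()},
     {'params': model.layers.parameters(), 'lr': 0.5}],
    lr=0.005, weight_decay=0.001)

# Map tensor identity -> parameter name so optimizer entries can be named.
name_of = {id(p): name for name, p in model.named_parameters()}

# Collect every parameter the optimizer will update, with its learning rate.
optimized = {}
for group in optimizer.param_groups:
    for p in group['params']:
        optimized[name_of[id(p)]] = group['lr']

# Anything in the model but not in the optimizer is never updated by step().
not_optimized = sorted(set(name_of.values()) - set(optimized))

for name, lr in optimized.items():
    print(f'{name}: lr={lr}')
print('not optimized:', not_optimized)
```

Parameters that appear in not_optimized still receive gradients from loss.backward(), but optimizer.step() never changes them.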

Answered By - mujjiga

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
