Sunday, August 14, 2022

[FIXED] I use pytorch to train a model to classify iris, but my acc was about 0.4

Labels: iris-dataset, model, neural-network, python, pytorch

Issue

I have tried many improvements, such as increasing the number of epochs, using different loss functions and optimizers, deepening the network and shuffling the dataset, but the accuracy stays at about 0.4. This problem has been bothering me for a long time, so thanks for your help. Below is my code.

load and process dataset

def Iris_Reader(dataset):
    np.random.shuffle(dataset['data'])
    np.random.shuffle(dataset['target'])

    data = torch.FloatTensor(dataset['data']) 
    label = torch.LongTensor(dataset['target'])
    #convert to one-hot
    label = nn.functional.one_hot(label, num_classes=3).float()

    #divide train/test dataset
    train_data = data[:120]
    train_label = label[:120]
    test_data = data[120:]
    test_label = label[120:]

    return train_data, train_label, test_data, test_label

Define the classifier

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        
        #4*3*3 network
        self.model = nn.Sequential(
            nn.Linear(4,3),
            nn.Sigmoid(),

            nn.Linear(3,3),
            nn.Sigmoid()
        )
        
        #SGD
        self.optimiser = torch.optim.SGD(self.parameters(), lr = 0.1)
        
        #MSE LOSS_FUNCTION
        self.loss_fn = nn.MSELoss()

        self.counter = 0
        self.progress = []

    def forward(self, input):
        return self.model(input)
    
    def train(self, input, target):
        output = self.forward(input)

        loss = self.loss_fn(output, target)

        self.counter += 1
        self.progress.append(loss.item())

        self.optimiser.zero_grad()
        loss.backward()
        self.optimiser.step()
    # plot loss
    def plot_loss(self):
        plt.figure(dpi=100)
        plt.ylim([0,1.0])
        plt.yticks([0, 0.25, 0.5, 1.0])
        plt.scatter(x = [i for i in range(len(self.progress))], y = self.progress, marker = '.', alpha = 0.2)
        plt.grid('on')
        plt.show()

TRAIN

from sklearn import datasets

C = Classifier()
epochs = 5

for epoch in range(epochs):
    dataset = datasets.load_iris()
    train_data, train_label, _, _ = Iris_Reader(dataset)
    # loop = tqdm(zip(train_data, train_label), len( train_label))
    for i, j in zip(train_data, train_label):
        C.train(i, j)

Solution

There are several problems here:

First, you shuffle the data and the labels independently, so each sample ends up paired with an essentially random label, rendering the dataset useless.
Also, you reload and re-split the dataset inside the loop on every epoch, which wastes CPU time.
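To illustrate the first point: one way to shuffle while keeping samples and labels paired is to draw a single permutation and index both arrays with it. This is a minimal sketch; `shuffle_together` is a hypothetical helper name, and the toy arrays only stand in for the iris data.

```python
import numpy as np

def shuffle_together(data, target, seed=None):
    """Shuffle two arrays with the same permutation so rows stay paired."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(data))  # one permutation, used for both arrays
    return data[perm], target[perm]

# Toy data: the label of each row equals the first feature of that row.
data = np.arange(10, dtype=np.float32).reshape(5, 2)  # rows [0, 1], [2, 3], ...
target = data[:, 0].astype(np.int64)                  # 0, 2, 4, 6, 8

shuffled_data, shuffled_target = shuffle_together(data, target, seed=0)

# Check that each label still matches the first feature of its row.
print(np.array_equal(shuffled_data[:, 0].astype(np.int64), shuffled_target))
```

Calling `np.random.shuffle` separately on the two arrays, as in the question, draws two unrelated permutations, so this check would generally fail.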

Overall, the dataset creation can be shortened to something like this:

import torch
from sklearn.model_selection import train_test_split

def Iris_Reader(dataset):
    train_data, test_data, train_label, test_label = train_test_split(dataset.data, dataset.target, test_size=0.2)
    return torch.FloatTensor(train_data), torch.LongTensor(train_label), torch.FloatTensor(test_data), torch.LongTensor(test_label)

and should be called once, outside the epoch loop.

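Next, MSELoss() is suited for regression. For classification, CrossEntropyLoss() is the default choice. It works directly on the raw outputs of the last Linear layer and on integer class labels, so the final Sigmoid and the one-hot encoding can be dropped.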
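As a rough illustration of what `nn.CrossEntropyLoss` computes: it applies log-softmax to the raw scores (logits) internally and averages the negative log-probability of the true class. The tensors below are made-up values, not iris data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.CrossEntropyLoss()

# Made-up logits for a batch of 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
# Integer class indices, one per sample (no one-hot encoding).
labels = torch.tensor([0, 2])

loss = loss_fn(logits, labels)

# Equivalent manual computation: mean negative log-softmax of the true class.
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
```

The two printed numbers should agree up to floating-point error.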

Using sigmoid as the activation in an intermediate layer is not the best choice, especially with a small number of epochs. ReLU should converge much faster.
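Putting the points so far together, a training run might look roughly like the sketch below. The hidden size of 16, the Adam optimizer, the learning rate, the epoch count and full-batch updates are illustrative assumptions rather than part of the original code, which used SGD on one sample at a time.

```python
import torch
import torch.nn as nn
from sklearn import datasets
from sklearn.model_selection import train_test_split

torch.manual_seed(0)

# Load and split once, outside the training loop.
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)
x_train = torch.tensor(x_train, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

# ReLU in the hidden layer; the output layer returns raw logits.
# Hidden size, optimizer and learning rate are assumed values.
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 3),
)
loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(200):
    optimiser.zero_grad()
    loss = loss_fn(model(x_train), y_train)  # one full-batch step per epoch
    loss.backward()
    optimiser.step()
    losses.append(loss.item())

# Evaluate on the held-out split without tracking gradients.
with torch.no_grad():
    predictions = model(x_test).argmax(dim=1)
    accuracy = (predictions == y_test).float().mean().item()

print(f"final loss {losses[-1]:.3f}, test accuracy {accuracy:.2f}")
```

Because each epoch is a single full-batch step here, `losses` already holds one value per epoch.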

Last but not least, your loss chart would look much cleaner if the values were averaged per epoch.
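One possible way to do the averaging, assuming the per-step losses are stored in a flat list like `self.progress` in the question (120 entries per epoch there). `average_per_epoch` is a hypothetical helper name and the loss values are made up.

```python
def average_per_epoch(step_losses, steps_per_epoch):
    """Collapse a flat list of per-step losses into one mean per epoch."""
    means = []
    for start in range(0, len(step_losses), steps_per_epoch):
        chunk = step_losses[start:start + steps_per_epoch]
        means.append(sum(chunk) / len(chunk))
    return means

# Made-up losses for 2 epochs of 3 steps each.
progress = [0.9, 0.7, 0.8, 0.5, 0.3, 0.4]
epoch_means = average_per_epoch(progress, steps_per_epoch=3)
print(epoch_means)
```

Plotting `epoch_means` instead of the raw list gives one point per epoch, which makes the overall trend easier to read.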

Answered By - dx2-66

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
