Issue
I'm training a classifier model, but for a few days I haven't been able to get past a problem. I get the error ValueError: Target size (torch.Size([4, 1])) must be the same as input size (torch.Size([4, 2])), yet it looks correct to me: I used unsqueeze(1) precisely to make them the same size. What else can I try? Thank you!
class SequenceClassifier(nn.Module):
    def __init__(self, n_classes):
        super(SequenceClassifier, self).__init__()
        self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME, return_dict=False)
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        _, pooled_output = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        output = self.drop(pooled_output)
        return self.out(output)
model = SequenceClassifier(len(class_names))
model = model.to(device)

EPOCHS = 10
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
total_steps = len(train_data_loader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=total_steps
)

weights = [0.5, 1]
pos_weight = torch.FloatTensor(weights).to(device)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
def train_epoch(
    model,
    data_loader,
    loss_fn,
    optimizer,
    device,
    scheduler,
    n_examples
):
    model = model.train()
    losses = []
    correct_predictions = 0
    for d in data_loader:
        input_ids = d["input_ids"].to(device)
        attention_mask = d["attention_mask"].to(device)
        targets = d["targets"].to(device)
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        _, preds = torch.max(outputs, dim=1)
        targets = targets.unsqueeze(1)
        loss = loss_fn(outputs, targets)
        correct_predictions += torch.sum(preds == targets)
        losses.append(loss.item())
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
    return correct_predictions.double() / n_examples, np.mean(losses)
%%time
history = defaultdict(list)
best_accuracy = 0
for epoch in range(EPOCHS):
    print(f'Epoch {epoch + 1}/{EPOCHS}')
    print('-' * 10)
    train_acc, train_loss = train_epoch(
        model,
        train_data_loader,
        loss_fn,
        optimizer,
        device,
        scheduler,
        len(df_train)
    )
    print(f'Train loss {train_loss} accuracy {train_acc}')
    val_acc, val_loss = eval_model(
        model,
        val_data_loader,
        loss_fn,
        device,
        len(df_val)
    )
    print(f'Val loss {val_loss} accuracy {val_acc}')
    print()
    history['train_acc'].append(train_acc)
    history['train_loss'].append(train_loss)
    history['val_acc'].append(val_acc)
    history['val_loss'].append(val_loss)
    if val_acc > best_accuracy:
        torch.save(model.state_dict(), 'best_model_state.bin')
        best_accuracy = val_acc
ValueError: Target size (torch.Size([4, 1])) must be the same as input size (torch.Size([4, 2]))
EDIT: I have a binary classification problem; I have 2 classes, encoded 0 ("bad") and 1 ("good").
Solution
In case anyone stumbles on this like I did, I'll write out an answer, since there aren't many Google hits for this target size/input size error and the previous answer has some factual inaccuracies.

Contrary to what the previous answer suggests, the real problem isn't with the loss function but with the output of the model. nn.BCEWithLogitsLoss is completely fine for multi-label and multi-class applications. Chiara updated her post saying that she in fact has a binary classification problem, but even that should not be a problem for this loss function. So why the error?
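Before getting to that, a minimal sketch (with made-up tensors, not taken from the question) to confirm that the loss function itself is fine: nn.BCEWithLogitsLoss handles both binary and multi-label targets as long as the logits and the targets share a shape.

import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss()

# Binary case: one logit per example, float targets, both of shape [batch, 1]
logits = torch.randn(4, 1)
targets = torch.tensor([[0.], [1.], [1.], [0.]])
print(loss_fn(logits, targets))   # works

# Multi-label case: one logit per label, both of shape [batch, n_labels]
logits = torch.randn(4, 3)
targets = torch.randint(0, 2, (4, 3)).float()
print(loss_fn(logits, targets))   # also works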
The original code has:
outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask
)
_, preds = torch.max(outputs, dim=1)
This means "Run the model, then create preds
with the row indeces of the highest output of the model". Obviously, there is only a "index of highest" if there are multiple predicted values. Multiple output values usually means multiple input classes, so I can see why Shai though this was multi-class. But why would we get multiple outputs from a binary classifier?
As it turns out, BERT (or Huggingface anyway) expects n_classes to be set to 2 for binary problems -- setting it to 1 puts the model in regression mode. This means that under the hood, binary problems are treated as two-class problems, and the model outputs predictions of size [batch size, 2] -- one column predicting the chance of the label being 1 and one the chance of it being 0. The loss function throws an error because it is supplied with only a single column of labels: targets = d["targets"].to(device) gives labels of shape [batch size], or [batch size, 1] after the unsqueeze. Either way, the dimensions don't match up.
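To make the mismatch concrete, here is a small sketch with made-up tensors, using batch size 4 as in the error above:

import torch
import torch.nn as nn

outputs = torch.randn(4, 2)                # model logits: [batch size, n_classes] = [4, 2]
targets = torch.tensor([0., 1., 1., 0.])   # labels from the loader: [batch size] = [4]
targets = targets.unsqueeze(1)             # now [4, 1] -- still not [4, 2]

print(outputs.shape, targets.shape)        # torch.Size([4, 2]) torch.Size([4, 1])
loss_fn = nn.BCEWithLogitsLoss()
loss_fn(outputs, targets)                  # on PyTorch 1.10 this raises the
                                           # "Target size ... must be the same as input size ..." ValueError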
Some loss functions can deal with this fine, but others require the exact same dimensions. To make things more frustrating, in version 1.10 nn.BCEWithLogitsLoss requires matching dimensions, while later versions do not. One solution may therefore be to update your PyTorch (version 1.11 would work, for example).
For me, this was not an option, so I ended up going with a different loss function. nn.CrossEntropyLoss, as suggested by Shai, indeed does the trick: it takes integer class labels of shape [batch size] together with logits of shape [batch size, n_classes], so the target and input shapes no longer need to match. In other words, the previous answer had a working solution, but for the wrong reasons.
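A rough sketch of what that looks like (made-up tensors, batch size 4 as above; note that nn.CrossEntropyLoss takes per-class weights through its weight argument rather than pos_weight):

import torch
import torch.nn as nn

# Per-class weights go into `weight` here, instead of BCEWithLogitsLoss's `pos_weight`
weights = torch.FloatTensor([0.5, 1.0])
loss_fn = nn.CrossEntropyLoss(weight=weights)

outputs = torch.randn(4, 2)            # model logits, shape [batch size, 2]
targets = torch.tensor([0, 1, 1, 0])   # integer class labels, shape [batch size] -- no unsqueeze(1)

loss = loss_fn(outputs, targets)       # shapes [4, 2] vs [4] are exactly what it expects
_, preds = torch.max(outputs, dim=1)
print(loss.item(), torch.sum(preds == targets).item())

In the training loop above, the change amounts to dropping the targets = targets.unsqueeze(1) line and making sure the targets are integer (long) class labels.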
Answered By - Anne