Issue
I'm doing sentiment analysis on Tweets with three classes: positive, negative, and neutral. I followed this tutorial (https://curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/) and was able to train a BERT model using Hugging Face. Now I'd like to make predictions on a dataframe of unlabeled Twitter text, and I'm having difficulty.
The tutorial includes an example of predicting on raw text, but it handles only one sentence, and I would like to run it over a column of Tweets: https://curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/#predicting-on-raw-text
review_text = "I love completing my todos! Best app ever!!!"
encoded_review = tokenizer.encode_plus(
    review_text,
    max_length=MAX_LEN,
    add_special_tokens=True,
    return_token_type_ids=False,
    pad_to_max_length=True,
    return_attention_mask=True,
    return_tensors='pt',
)
input_ids = encoded_review['input_ids'].to(device)
attention_mask = encoded_review['attention_mask'].to(device)
output = model(input_ids, attention_mask)
_, prediction = torch.max(output, dim=1)
print(f'Review text: {review_text}')
print(f'Sentiment : {class_names[prediction]}')
Review text: I love completing my todos! Best app ever!!!
Sentiment : positive
Update: Bill's answer works. Here's the solution I ended up with.
def predictionPipeline(text):
    encoded_review = tokenizer.encode_plus(
        text,
        max_length=MAX_LEN,
        add_special_tokens=True,
        return_token_type_ids=False,
        pad_to_max_length=True,
        return_attention_mask=True,
        return_tensors='pt',
    )
    input_ids = encoded_review['input_ids'].to(device)
    attention_mask = encoded_review['attention_mask'].to(device)
    output = model(input_ids, attention_mask)
    _, prediction = torch.max(output, dim=1)
    return class_names[prediction]

df2['prediction'] = df2['cleaned_tweet'].apply(predictionPipeline)
Solution
You can reuse the same code: wrap it in a function that takes one text and returns the predicted class name, then apply that function to the dataframe column.
import pandas as pd
import torch

model = ...
tokenizer = ...

def predict(review_text):
    # Tokenize one text into fixed-length tensors
    encoded_review = tokenizer.encode_plus(
        review_text,
        max_length=MAX_LEN,
        add_special_tokens=True,
        return_token_type_ids=False,
        pad_to_max_length=True,
        return_attention_mask=True,
        return_tensors='pt',
    )
    input_ids = encoded_review['input_ids'].to(device)
    attention_mask = encoded_review['attention_mask'].to(device)
    # Forward pass; the class with the highest score is the prediction
    output = model(input_ids, attention_mask)
    _, prediction = torch.max(output, dim=1)
    print(f'Review text: {review_text}')
    print(f'Sentiment : {class_names[prediction]}')
    return class_names[prediction]

df = pd.DataFrame({
    'texts': ["text1", "text2", "...."]
})
# Run predict on each row and store the result in a new column of the same dataframe
df["sentiments"] = df.apply(lambda l: predict(l.texts), axis=1)
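Calling the model once per row can be slow on a large dataframe. As a rough sketch (not part of the original answer), the texts could instead be sent through in batches. The helpers `chunks`, `predict_in_batches`, and `keyword_predict_batch` are hypothetical names introduced here for illustration. `bert_predict_batch` assumes the `tokenizer`, `model`, `device`, `MAX_LEN`, and `class_names` objects from the question, a recent transformers release where the tokenizer can be called directly on a list of strings, and a model that returns one row of class scores per input, as in the tutorial.

```python
def chunks(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(texts, predict_batch, batch_size=16):
    """Apply `predict_batch` (list of texts -> list of labels) chunk by chunk."""
    labels = []
    for chunk in chunks(list(texts), batch_size):
        labels.extend(predict_batch(chunk))
    return labels


def bert_predict_batch(chunk):
    """Assumed BERT version: relies on the question's tokenizer, model, device,
    MAX_LEN and class_names being defined. It is not called in this sketch."""
    import torch
    encoded = tokenizer(
        chunk,
        padding=True,        # pad to the longest text in this batch
        truncation=True,     # cut texts longer than MAX_LEN
        max_length=MAX_LEN,
        return_tensors="pt",
    )
    with torch.no_grad():    # inference only, so gradients are not needed
        output = model(
            encoded["input_ids"].to(device),
            encoded["attention_mask"].to(device),
        )
    return [class_names[i] for i in output.argmax(dim=1).tolist()]


def keyword_predict_batch(chunk):
    """Hypothetical stand-in for the model so the sketch runs without BERT."""
    return ["positive" if "love" in text.lower() else "neutral" for text in chunk]


tweets = ["I love this app", "It is an app", "LOVE it", "meh", "ok"]
labels = predict_in_batches(tweets, keyword_predict_batch, batch_size=2)
print(labels)
```

With the real model, the call would look like `df["sentiments"] = predict_in_batches(df["texts"], bert_predict_batch)`. It is usually worth calling `model.eval()` once beforehand so that dropout is disabled during prediction.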
Answered By - Bill