Issue
I'm doing a sentiment analysis in Python (I'm still a rookie with this language). I have some Twitter data in a CSV file that I need to pre-process before the real analysis. First, I need to tokenize the text from a specific column, in my case the second one (column B). I found some suggestions on how to do the tokenization, but not on how to pick a specific column. Does anyone have experience with this?
I tried this code, which seems to work for all columns, but how can I restrict it to the second column?
import csv
import nltk
from nltk import word_tokenize

with open('TwitterData.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
Any suggestions for modules and code that work well for pre-processing ahead of sentiment analysis?
Thanks a lot!
Solution
I can highly recommend the scikit-learn documentation and modules, especially the section "Working with Text Data": https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
That tutorial also has a section on sentiment analysis: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html#exercise-2-sentiment-analysis-on-movie-reviews
If you need more specific help with your code, it is always best to provide a "minimal reproducible example": https://stackoverflow.com/help/minimal-reproducible-example. That way, others can help you better with the specific issue you are facing.
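For the column question itself, here is a minimal sketch of what such a self-contained example could look like. It is not taken from the scikit-learn tutorial; it uses only the standard library's csv module and reads the second field of each row by its index. The sample data, its header names, and the helper names simple_tokenize and tokenize_column are all made up for illustration, and the regex tokenizer is only a stand-in for nltk's word_tokenize so the sketch runs without extra downloads.

```python
import csv
import io
import re

# Hypothetical sample data standing in for TwitterData.csv; the real
# file's header names and column order are assumptions here.
SAMPLE_CSV = """id,text,user
1,"I love this phone, it's great!",alice
2,Worst service ever... never again,bob
"""


def simple_tokenize(text):
    """Stand-in tokenizer: lowercase words, keeping inner apostrophes."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def tokenize_column(csvfile, column_index=1, tokenizer=simple_tokenize):
    """Tokenize one column (0-based index) of every data row."""
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row (assumes the file has one)
    # Rows too short to contain the column are skipped.
    return [
        tokenizer(row[column_index])
        for row in reader
        if len(row) > column_index
    ]


# column_index=1 is the second column ("column B" in a spreadsheet).
tokens = tokenize_column(io.StringIO(SAMPLE_CSV))
for row_tokens in tokens:
    print(row_tokens)
```

With the real file, the same function should work on a handle from open('TwitterData.csv', newline=''), and word_tokenize can be passed as the tokenizer argument once NLTK's tokenizer data is installed. If you prefer to keep csv.DictReader as in your code, you can select the field by its header name instead of its position, for example row['text'] if the column happens to be called text.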
I hope that helps :)
Answered By - Kim Tang