Issue
I'm trying to create a rudimentary sentiment analyzer. I have lists of words in categories, and two CSV files from Reddit threads which I'm taking comments from. I've managed to tag my data sets with the appropriate tags, and I now have a list of lists of tuples, with one inner list per comment. I have a piece of code which I hoped would produce a numeric value for each comment based on the tags present, but I'm hitting a brick wall mentally.
I've tried the below code which results in a 0 at best, and a ValueError at worst. I know it's gotta be chock full of bad ideas, but I'm at a loss. At this point I just want something to FUNCTION T_T
tLOTR = [[('terrible', 'negative'),
('so', 'intensifier'),
('awesome', 'positive'),
('so', 'intensifier'),
('but', 'shifter'),
('agree', 'positive'),
('like', 'positive'),
('really', 'intensifier'),
('but', 'shifter'),
('but', 'shifter'),
('so', 'intensifier'),
('not', 'shifter'),
('like', 'positive'),
('really', 'intensifier'),
('like', 'positive'),
('so', 'intensifier')],
[('not', 'shifter'),
('amazing', 'positive'),
('but', 'shifter'),
('bad', 'negative'),
('but', 'shifter'),
('like', 'positive'),
('awful', 'negative'),
('but', 'shifter'),
('like', 'positive'),
('but', 'shifter'),
('so', 'intensifier'),
('completely', 'intensifier'),
('wrong', 'negative')]]
# these are just a few of my tagged sets
def sentalize(text):
    value = 0
    for x in text:
        for (word, tag) in x:
            if tag == "positive":
                value += 1
            elif tag == "negative":
                value -= 1
            elif tag == "shifter":
                value *= -1
            elif tag == "intensifier":
                value *= 1.25
    return value
So I'm getting either 0 or a ValueError when I run it on a single comment (tLOTR[0], for instance). Ideally, I'd like a list with one value per comment (comment 1 = -0.348, or something of the sort).
Solution
Assuming you want sentalize() to process individual elements of tLOTR, your problem is the loop:
def sentalize(text):
    value = 0
    for word, tag in text:
        if tag == "positive":
            value += 1
        elif tag == "negative":
            value -= 1
        elif tag == "shifter":
            value *= -1
        elif tag == "intensifier":
            value *= 1.25
    return value

print(sentalize(tLOTR[0]))
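Note how word, tag can be captured in one line by iterating over text directly, instead of first extracting a tuple x and then trying to loop over the components of that tuple, as in your example. Each component of such a tuple is a plain string, so unpacking it into word, tag raises a ValueError unless the string happens to have exactly two characters.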
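To see where the ValueError comes from, here is a minimal sketch that repeats the inner loop of the original code on a single tagged tuple. The names pair and errors are only for illustration and are not part of the question's code.

```python
# One (word, tag) tuple, as found inside a single comment.
pair = ('terrible', 'negative')

errors = []
for item in pair:
    try:
        # item is a string here, not a (word, tag) tuple,
        # so Python tries to unpack its characters into two names.
        word, tag = item
    except ValueError as exc:
        errors.append(str(exc))

# Both strings have more than two characters, so both unpackings fail.
print(errors)
```

A two-character word such as 'so' would unpack without an error (into 's' and 'o'), which is why the failure can look inconsistent.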
With that change you can do values = list(map(sentalize, tLOTR)) and get the result [-2.833251953125, 0.5625].
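If you would rather have each value labelled by comment, as the question hints at, one possible sketch uses enumerate with a dict comprehension. The sample data and the "comment N" labels below are made up for illustration; sentalize is the fixed version from above.

```python
def sentalize(text):
    value = 0
    for word, tag in text:
        if tag == "positive":
            value += 1
        elif tag == "negative":
            value -= 1
        elif tag == "shifter":
            value *= -1
        elif tag == "intensifier":
            value *= 1.25
    return value

# Hypothetical, shortened stand-in for tLOTR: one inner list per comment.
sample = [
    [('not', 'shifter'), ('amazing', 'positive'), ('so', 'intensifier')],
    [('awful', 'negative'), ('really', 'intensifier')],
]

# Number the comments from 1 and score each one.
scores = {f"comment {i}": sentalize(c) for i, c in enumerate(sample, start=1)}
print(scores)  # {'comment 1': 1.25, 'comment 2': -1.25}
```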
Some additional comments:
- storing each word with its type as a string (i.e. "positive", "negative", etc.) is not very efficient; instead, consider representing that with a simpler value
- since you've already parsed the comments and have apparently matched each word with the type of modifier / tag, that would possibly be the right time to update value, instead of having this tLOTR list of intermediate values.
- combining operators like -= and += with positive and negative constant values like 1 and -1 is very confusing. I'd recommend only using += and *= and using negative or positive values where appropriate.
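As a rough sketch of the last point, the if/elif chain could be replaced by a lookup table so that only += and *= are used. The names TAG_EFFECTS and sentalize_table are hypothetical, and treating unknown tags as having no effect is an assumption, not something the question specifies.

```python
# Hypothetical lookup table: tag -> (amount to add, factor to multiply by).
TAG_EFFECTS = {
    "positive": (1, 1),
    "negative": (-1, 1),
    "shifter": (0, -1),
    "intensifier": (0, 1.25),
}

def sentalize_table(comment):
    value = 0
    for word, tag in comment:
        # Assumption: tags missing from the table leave the value unchanged.
        add, factor = TAG_EFFECTS.get(tag, (0, 1))
        value += add
        value *= factor
    return value

comment = [('not', 'shifter'), ('amazing', 'positive'),
           ('but', 'shifter'), ('bad', 'negative')]
print(sentalize_table(comment))  # -2
```

Each tag either adds something or scales the running value, never both, so this should give the same results as the if/elif version while keeping the scoring rules in one place.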
Answered By - Grismar