Issue
I have a dataframe like this and I'm trying to count the words said by a specific author.
Author  Text                   Date
Jake    hey hey my names Jake  1.04.1997
Mac     hey my names Mac       1.02.2019
Sarah   heymy names Sarah      5.07.2001
I've been trying to set it up so that if I search for the word "hey", it produces:
Author  Count
Jake    2
Mac     1
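For anyone who wants to reproduce this, the sample frame can be built roughly as follows. Storing Date as plain strings is an assumption, since the question does not show the column dtypes.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: Date is kept as plain strings, not parsed to datetime.
df = pd.DataFrame({
    "Author": ["Jake", "Mac", "Sarah"],
    "Text": ["hey hey my names Jake", "hey my names Mac", "heymy names Sarah"],
    "Date": ["1.04.1997", "1.02.2019", "5.07.2001"],
})
print(df)
```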
Solution
Use Series.str.count with an aggregate sum:
df1 = df['Text'].str.count('hey').groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1
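To see how the one-liner works, it can help to split it into its two steps. This is a minimal sketch that rebuilds the sample frame from the question; the names per_row and per_author are illustrative and not part of the answer.

```python
import pandas as pd

# Sample frame from the question (only the columns used here).
df = pd.DataFrame({
    "Author": ["Jake", "Mac", "Sarah"],
    "Text": ["hey hey my names Jake", "hey my names Mac", "heymy names Sarah"],
})

# Step 1: count non-overlapping regex matches of 'hey' in each row.
# This is a substring match, so 'heymy' also counts once.
per_row = df["Text"].str.count("hey")
print(per_row.tolist())  # [2, 1, 1]

# Step 2: group the per-row counts by author and add them up.
# The sum matters once an author has more than one row.
per_author = per_row.groupby(df["Author"]).sum()
print(per_author)
```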
If you need to filter out rows with 0 values, add boolean indexing:
s = df['Text'].str.count('hey')
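df1 = s[s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1
With this sample data every row contains the substring hey at least once, so no rows are dropped.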
EDIT: To match hey only as a separate word, wrap it in word boundaries \b, like:
df1 = df['Text'].str.count(r'\bhey\b').groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      0
s = df['Text'].str.count(r'\bhey\b')
df1 = s[s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
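If the search word comes from user input, the pattern can be built at runtime. The sketch below wraps the approach above in a hypothetical helper, count_word_by_author (the name and its parameters are not from the answer), and uses re.escape so that regex metacharacters in the word are treated literally. Note that \b only behaves as expected when the word starts and ends with a word character.

```python
import re

import pandas as pd


def count_word_by_author(df, word, author_col="Author", text_col="Text"):
    """Hypothetical helper: whole-word occurrences of `word` per author."""
    # Escape the word so characters such as '.' or '?' match literally.
    pattern = rf"\b{re.escape(word)}\b"
    counts = df[text_col].str.count(pattern)
    # Drop rows without a match, then sum what is left per author.
    return (
        counts[counts.gt(0)]
        .groupby(df[author_col])
        .sum()
        .reset_index(name="Count")
    )


# Sample frame from the question (only the columns used here).
df = pd.DataFrame({
    "Author": ["Jake", "Mac", "Sarah"],
    "Text": ["hey hey my names Jake", "hey my names Mac", "heymy names Sarah"],
})

result = count_word_by_author(df, "hey")
print(result)
```

Sarah does not appear in the result because "heymy" contains no standalone hey.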
Answered By - jezrael