Issue
The goal is to plot something like this:
I have the following dummy DataFrame. Note that data holds the number of words, which goes on the x-axis.
import pandas as pd
data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=['number_of_words'])
Now I need to calculate the y-axis, namely how often each number of words occurs (e.g. how often the number of words is 1, how often it is 9, and so on). I did it this way:
data = my_df['number_of_words'].value_counts()
Then I created a new df with that:
df_occurrences = pd.DataFrame(data=data)
df_occurrences.rename(columns={"number_of_words": "occurrences"}, inplace=True)
Now I wanted to merge them, but their length is different because my_df includes duplicates.
Thus, I removed the duplicates.
my_df.drop_duplicates(subset ="number_of_words", keep=False, inplace=True)
my_df and df_occurrences now have different lengths, and I cannot merge and plot them anymore...
Any idea what went wrong?
Solution
As user BigBen wrote in a comment on the original question, my_df.value_counts().sort_index().plot() is all I needed to do. The other approaches suggested by Quang Hoang and keithpjolley in the same comment section also work.
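For reference, here is a minimal sketch of that one-liner applied to the dummy data from the question. It selects the number_of_words column before counting so that the x-axis stays numeric. That is a small variation of my own, not part of BigBen's comment. The Agg backend, the axis labels, and the names occurrences and unique_only are illustrative assumptions. The last lines show why the lengths differed in the question: keep=False drops every row whose value occurs more than once, so only the values that appear exactly once remain.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=["number_of_words"])

# Count how often each number of words occurs, then order by the number of words.
occurrences = my_df["number_of_words"].value_counts().sort_index()

# x-axis: number of words, y-axis: occurrences. No merge is needed,
# because the counts are already indexed by the number of words.
ax = occurrences.plot()
ax.set_xlabel("number of words")  # illustrative label
ax.set_ylabel("occurrences")      # illustrative label

# Why the lengths differed: keep=False removes ALL rows with a repeated value,
# so only values that occur exactly once (12, 14, 32, 200) survive.
unique_only = my_df.drop_duplicates(subset="number_of_words", keep=False)
print(len(occurrences))  # 13 distinct values
print(len(unique_only))  # 4 rows left
```

In an interactive session, calling matplotlib.pyplot.show() after the plot displays the figure. With the default keep="first", drop_duplicates would have kept one row per distinct value instead.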
Answered By - Exa