Sunday, November 14, 2021

[FIXED] Plot word count on x axis and its occurrence on y axis from pandas df


Issue

The goal is to plot something like this:

I have the following dummy df. Note that data = number of words = x axis

import pandas as pd

# each entry is the number of words of one item
data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=['number_of_words'])

Now I need to calculate the y axis, namely how often each number of words occurs (e.g. how often number of words = 1, how often = 9, and so on). I did it this way:

data = my_df['number_of_words'].value_counts()

Then I created a new df with that:

df_occurrences = pd.DataFrame(data=data)
df_occurrences.rename(columns={"number_of_words": "occurrences"}, inplace=True)

Now I wanted to merge them, but their lengths differ because my_df includes duplicates.

Thus, I removed the duplicates.

my_df.drop_duplicates(subset ="number_of_words", keep=False, inplace=True)

my_df and df_occurrences now have a different length and I cannot merge and plot them anymore...

Any idea what went wrong?

Solution

As user BigBen pointed out in a comment on the original question, my_df.value_counts().sort_index().plot() is all I needed to do. The other approaches suggested by Quang Hoang and keithpjolley in the same comment section also work.
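For illustration, here is a minimal sketch of that approach together with a check of why the lengths diverged in the question. It assumes the dummy data from the question; the names occurrences, unique_only, one_per_value and plot_occurrences are made up for this example and do not come from the original comments. The sketch counts on the single column (a Series) rather than on the whole DataFrame, which is a small variation on the one-liner above so that the index holds plain integers.

```python
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,
        5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=["number_of_words"])

# Index = distinct word counts (x axis), values = how often each occurs (y axis).
# sort_index() orders the x axis by word count instead of by frequency.
occurrences = my_df["number_of_words"].value_counts().sort_index()

# keep=False drops every row whose value appears more than once,
# so only values that occur exactly once survive.
unique_only = my_df.drop_duplicates(subset="number_of_words", keep=False)

# The default (keep="first") leaves one row per distinct value instead.
one_per_value = my_df.drop_duplicates(subset="number_of_words")


def plot_occurrences(counts):
    """Hypothetical helper: line plot of occurrences against word count."""
    import matplotlib.pyplot as plt  # imported here so the counting works without it

    ax = counts.plot()
    ax.set_xlabel("number of words")
    ax.set_ylabel("occurrences")
    plt.show()
    return ax


# plot_occurrences(occurrences)  # uncomment to draw the figure
```

Because value_counts() already returns both axes in a single Series, no merge is needed. Comparing len(unique_only), len(one_per_value) and len(occurrences) shows that keep=False is what made the two frames in the question differ in length.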

Answered By - Exa

This answer was collected from Stack Overflow and tested by the PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
