Issue
I'm currently working on a quick project to look at film runtimes over the years. The data comes from a Netflix dataset, which I have filtered to get the information I'm interested in. I've also taken the average film length in minutes by year with groupby() and mean(), but when I attempt to plot the result I get an error.
import pandas as pd
import matplotlib.pyplot as plt
# read the csv files - turn into dataframes
netflix = pd.read_csv('titles.csv')
print(netflix)
# we just want to consider movies with over 60 minutes of runtime
movie_filter = netflix[(netflix["type"] == "MOVIE") &
                       (netflix["runtime"] > 60)]
# now let's factor in averages
averages_over_time = movie_filter.groupby("release_year")["runtime"].mean()
average_film_runtime = pd.DataFrame(averages_over_time)
plt.plot(average_film_runtime["release_year"], average_film_runtime["runtime"])
plt.show()
The following is the error that I get.
Traceback (most recent call last):
File "c:\Users\matth\Dropbox\Python Code\Netflix Analysis\netflix.py", line 16, in <module>
plt.plot(average_film_runtime["release_year"], average_film_runtime["runtime"])
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\Users\matth\AppData\Roaming\Python\Python312\site-packages\pandas\core\frame.py", line 3893, in __getitem__
indexer = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\matth\AppData\Roaming\Python\Python312\site-packages\pandas\core\indexes\base.py", line 3797, in get_loc
raise KeyError(key) from err
KeyError: 'release_year'
PS C:\Users\matth\Dropbox\Python Code\Netflix Analysis>
I'm still new at working with Pandas and I've been stuck on this problem for almost an hour now so I apologize if the answer is obvious.
Thank you for your help.
Solution
I think your issue comes from the fact that when you do groupby and then run pd.DataFrame() over the grouped result, the new object uses your original column as the index, not as a column.
That is to say, average_film_runtime does not have two columns called 'release_year' and 'runtime', but it has an index that is 'release_year' and one column (a Series) called 'runtime'.
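To see this for yourself, you can inspect the columns and the index of the grouped result. The snippet below is only a sketch: the small DataFrame is a made-up stand-in for titles.csv (the values are invented), but the column names match the ones in the question.

```python
import pandas as pd

# Hypothetical stand-in for titles.csv; the values are invented for illustration.
netflix = pd.DataFrame({
    "type": ["MOVIE", "MOVIE", "MOVIE", "SHOW"],
    "release_year": [2000, 2000, 2010, 2010],
    "runtime": [90, 110, 120, 45],
})

# Same filtering and grouping steps as in the question.
movie_filter = netflix[(netflix["type"] == "MOVIE") &
                       (netflix["runtime"] > 60)]
averages_over_time = movie_filter.groupby("release_year")["runtime"].mean()
average_film_runtime = pd.DataFrame(averages_over_time)

# Only 'runtime' is a column; 'release_year' is the name of the index.
print(list(average_film_runtime.columns))
print(average_film_runtime.index.name)
```

Because 'release_year' only exists as the index name here, looking it up with average_film_runtime["release_year"] raises the KeyError from the traceback.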
You should be able to fix this by doing average_film_runtime = average_film_runtime.reset_index() and then running it through plt.plot()
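The reset_index() fix could look roughly like this. Again, the small DataFrame is a hypothetical stand-in for titles.csv with invented values, so treat it as a sketch rather than your exact data.

```python
import pandas as pd

# Hypothetical stand-in for titles.csv; the values are invented for illustration.
netflix = pd.DataFrame({
    "type": ["MOVIE", "MOVIE", "MOVIE", "SHOW"],
    "release_year": [2000, 2000, 2010, 2010],
    "runtime": [90, 110, 120, 45],
})

movie_filter = netflix[(netflix["type"] == "MOVIE") &
                       (netflix["runtime"] > 60)]
averages_over_time = movie_filter.groupby("release_year")["runtime"].mean()
average_film_runtime = pd.DataFrame(averages_over_time)

# Move 'release_year' out of the index and back into a regular column.
average_film_runtime = average_film_runtime.reset_index()

# Both names are now columns, so selecting them by name no longer fails.
years = average_film_runtime["release_year"]
runtimes = average_film_runtime["runtime"]
print(average_film_runtime)
```

After this step, the plt.plot(average_film_runtime["release_year"], average_film_runtime["runtime"]) call from the question should work unchanged.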
Answered By - scotscotmcc