Issue
I am currently learning pandas and I am using an imdb movies database, which one of the columns is the duration of the movies. However, one of the values is "None", so I can´t calculate the mean because there is this string in the middle. I thought of changing the "None" to = 0, however that would skew the results. Like can be seen with the code below.
dur_temp = duration.replace("None", 0)
dur_temp = dur_temp.astype(float)
descricao_duration = dur_temp.mean()
Any ideas on what I should do in order to not skew the data? I also graphed it and it becomes more clear how it skews it.
Solution
You can replace "None" with numpy.nan
, instead that using 0.
Something like this should do the trick:
import numpy as np
dur_temp = duration.replace("None", np.nan)
descricao_duration = dur_temp.mean()
Answered By - Alessandro
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.