Issue
I have a list of numbers. The list has 74004228 elements, with a minimum value of 1 and a maximum value of 65852. I'm trying to get a sense of what their distribution looks like, so I'm using plt.hist() to plot a histogram, but the result doesn't show anything useful.
I'm getting the following histogram, which looks messy.
matplotlib code:
unique_values = sorted(set(dataset_length))
bin_edges = unique_values + [unique_values[-1] + 1]
plt.hist(dataset_length, bins=bin_edges, log=True) # Align bins to the left
plt.title('Histogram of Data')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()
I've also tried:
sns.displot(dataset_length)
sns.displot gives me an empty plot, as shown below:
Is there any solution for this?
Solution
I took the log of the data. It looks better, but then I cannot read off the exact count for each individual value.
So instead I used a dict to count the values, as follows:
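A related way to get a log-scale view is to use logarithmically spaced bins instead of one bin per unique value. The following is only a minimal sketch under assumptions, not the code behind the plot mentioned here: the helper names log_binned_counts and plot_log_binned, the bin count, and the random sample data are all made up for illustration, and it assumes every value is at least 1.

```python
import numpy as np

def log_binned_counts(values, n_bins=50):
    """Count values in logarithmically spaced bins (assumes all values >= 1)."""
    values = np.asarray(values)
    # Bin edges evenly spaced on a log scale, from the minimum to just past the maximum.
    edges = np.geomspace(values.min(), values.max() + 1, n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges

def plot_log_binned(values, n_bins=50):
    """Draw the log-binned counts as bars on log-log axes."""
    import matplotlib.pyplot as plt  # imported here so the counting helper works without it
    counts, edges = log_binned_counts(values, n_bins)
    plt.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
    plt.xscale("log")
    plt.yscale("log")
    plt.xlabel("Values")
    plt.ylabel("Frequency")
    plt.show()

# Made-up sample standing in for dataset_length (same value range as in the question).
rng = np.random.default_rng(0)
sample = rng.integers(1, 65853, size=10_000)
counts, edges = log_binned_counts(sample, n_bins=20)
```

Calling plot_log_binned(dataset_length) would draw the chart. As noted, this kind of view hides the exact count of each individual value.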
count_dataset_dict = {}
for item in dataset_length:
    if item in count_dataset_dict:
        count_dataset_dict[item] += 1
    else:
        count_dataset_dict[item] = 1
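The same counts can probably be obtained more compactly with collections.Counter, or with np.bincount when the values are small non-negative integers, which may be faster on a list of this size. This is a sketch, not the code used above; sample_lengths is made-up data standing in for dataset_length.

```python
from collections import Counter

import numpy as np

# Made-up sample standing in for dataset_length.
sample_lengths = [3, 1, 3, 7, 1, 3]

# Counter builds the same value -> count mapping as the manual loop.
counts_by_value = dict(Counter(sample_lengths))

# For non-negative integers, bincount returns an array where
# position v holds the number of times v occurs.
counts_array = np.bincount(np.asarray(sample_lengths))
```

With bincount, values that never occur simply get a count of 0, so the array has one slot for every integer from 0 up to the maximum value.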
I then converted that to a pandas DataFrame and kept only the values whose count is above 1000:
count_df_more_1K = count_df[count_df['count']>1000]
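The conversion from the dict to count_df is not shown. One possible way to do it, assuming the values become the index and the counts go into a column named 'count' (which is what the filter above expects), is sketched below. The dict contents here are made up for illustration.

```python
import pandas as pd

# Made-up counts standing in for count_dataset_dict.
example_counts = {1: 2500, 2: 900, 3: 1200}

# Dict keys become the index, dict values become the 'count' column.
count_df = pd.Series(example_counts, name="count").to_frame()

# Keep only the values that occur more than 1000 times.
count_df_more_1K = count_df[count_df["count"] > 1000]
```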
Then I plotted the data:
count_df_more_1K.sort_index().plot(kind='bar', figsize=(18, 8))
plt.title("Distribution of Index")
plt.xlabel("Index")
plt.ylabel("Count")
The result looks something like the following:
Answered By - Hadiana Sliwa