Issue
I have a DataFrame whose columns are all numeric, and the range of the data differs considerably between columns. The code below provides a representative example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
    'A': np.random.randn(10000) * 20,
    'B': np.random.randn(10000) * 1000,
    'C': np.random.randn(10000) * 0.01,
    'D': np.random.randn(10000) * 300000,
    'E': np.random.randn(10000) * 500
})
axs = df.plot(kind='hist', subplots=True, bins=10, layout=(2, 3), figsize=(12, 8),
              title=list(df.columns), sharex=False, sharey=True)
for i, ax in enumerate(axs.reshape(-1)):
    if i >= len(df.columns):
        break
    ax.set_xlim(df[df.columns[i]].min(), df[df.columns[i]].max())
plt.suptitle('Histograms for all features')
plt.tight_layout()
plt.show()
When df.plot was called, the xlim of every subplot was automatically set to the range of the column with the largest values, which is why I added the for loop to reset the limits per column.
However, as the resulting plot shows, the bins are not scaled correctly: they keep the same width in every subplot regardless of that column's range.
I would like every subplot to display 10 bins, each with a width appropriate to that column's histogram.
Is there a way to do that, either in the call to df.plot or by accessing the Axes objects with some method?
Solution
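You can use the pandas DataFrame.hist method instead. df.plot(kind='hist') computes a single set of bin edges from all columns combined, whereas df.hist computes the bin edges separately for each column.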
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
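df = pd.DataFrame({
    'A': np.random.randn(10000) * 20,
    'B': np.random.randn(10000) * 1000,
    'C': np.random.randn(10000) * 0.01,
    'D': np.random.randn(10000) * 300000,
    'E': np.random.randn(10000) * 500
})
# Print summary statistics to confirm how much the column ranges differ
print(df.describe())
# df.hist creates its own figure, so no separate plt.figure() call is needed.
# density=True normalizes each histogram; drop it to plot raw counts as in the question.
df.hist(bins=10, layout=(2, 3), density=True, figsize=(12, 8), sharex=False, sharey=False)
plt.show()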
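If you would rather work with the Axes objects directly, as the question suggests, a minimal sketch along the following lines may also do the job. It plots each column in its own ax.hist call, so matplotlib derives the 10 bin edges from that column alone. The 2x3 grid mirrors the question; the seeded generator is only there for reproducibility, and bin_edges is a hypothetical helper dictionary added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # assumption: seeded only so the sketch is reproducible
df = pd.DataFrame({
    'A': rng.standard_normal(10000) * 20,
    'B': rng.standard_normal(10000) * 1000,
    'C': rng.standard_normal(10000) * 0.01,
    'D': rng.standard_normal(10000) * 300000,
    'E': rng.standard_normal(10000) * 500,
})

n_bins = 10
fig, axs = plt.subplots(2, 3, figsize=(12, 8), sharey=True)
bin_edges = {}  # hypothetical helper: keeps each column's edges for inspection

for ax, col in zip(axs.ravel(), df.columns):
    # A single column per call means the edges span only that column's min..max
    counts, edges, _ = ax.hist(df[col], bins=n_bins)
    bin_edges[col] = edges
    ax.set_title(col)

# Hide the subplots of the grid that have no column to show
for ax in axs.ravel()[len(df.columns):]:
    ax.set_visible(False)

fig.suptitle('Histograms for all features')
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Because sharey=True is kept, the count axes stay comparable across subplots, while each x-axis follows its own column.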
Answered By - AzulCou