Issue
I have a dataframe with the x values (column x) and the y values (column 1). Below I am getting the mean and stdev.
Next I am plotting them together on one chart, but it looks very wrong. It is not just that the fitted curve is shifted; I am not sure what is wrong with it.
import matplotlib.pyplot as plt
from scipy import stats
from scipy import optimize
import numpy as np
data_sample = {'x': [0,1,2,3,4,5,6,7,8,9,10], '1': [0,1,2,3,4,5,4,3,2,1,0]}
def test_func(x, a, b):
    return stats.norm.pdf(x, a, b)
params, cov_params = optimize.curve_fit(test_func, data_sample['x'], data_sample['1'])
print(params)
plt.scatter(data_sample['x'], data_sample['1'], label='Data')
plt.plot(data_sample['x'] , test_func(data_sample['x'], params[0], params[1]), label='Fitted function')
plt.legend(loc='best')
plt.show()
Solution
The data needs to be normalized so that the area under the curve is 1. When all x-values are 1 apart, the area can be approximated by the sum of the y-values. If the spacing between the x-values is larger or smaller than 1, that sum must also be multiplied by the spacing. Another way to calculate the area is np.trapz() (called np.trapezoid() in NumPy 2.0 and later, where np.trapz is deprecated).
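To make the area calculation concrete, here is a small sketch on the question's sample data. It writes the trapezoidal rule out by hand (rather than calling np.trapz or np.trapezoid) only so that it runs the same on any NumPy version; the variable names are illustrative.

```python
import numpy as np

# Sample data from the question
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float)

# Rectangle rule: sum of y times the (constant) spacing between x-values
dx = (x.max() - x.min()) / (len(x) - 1)
area_rect = y.sum() * dx

# Trapezoidal rule written out explicitly (same idea as np.trapz / np.trapezoid)
area_trap = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

print(area_rect, area_trap)  # 25.0 25.0 (equal here because y is 0 at both ends)

# Doubling the spacing doubles the area, which is why dx must be included
x_wide = 2 * x
area_wide = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x_wide))
print(area_wide)  # 50.0
```

The two rules only agree here because the first and last y-values are 0; in general the trapezoidal rule subtracts half of the two end values from the rectangle sum.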
Divide the y-values by the normalization factor when doing the fit, and multiply the fitted curve by the same factor when drawing it together with the original data.
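A minimal sketch of those two steps without any plotting, assuming the y-values can be treated as weights at each x. As an extra not used in this answer, it seeds curve_fit with the y-weighted mean and standard deviation of x (5 and 2 for this data) via the p0 argument; gaussian_pdf is just a renamed test_func.

```python
import numpy as np
from scipy import optimize, stats

def gaussian_pdf(x, mu, sigma):
    # Same model as test_func: a normal pdf, whose area is always 1
    return stats.norm.pdf(x, mu, sigma)

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float)

# Step 1: normalize so the data has area 1 (area = 25 for this data)
area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
y_norm = y / area

# Initial guess from the y-weighted mean and standard deviation of x
mean0 = np.sum(x * y_norm) / np.sum(y_norm)
std0 = np.sqrt(np.sum(y_norm * (x - mean0) ** 2) / np.sum(y_norm))

(mu, sigma), _ = optimize.curve_fit(gaussian_pdf, x, y_norm, p0=(mean0, std0))

# Step 2: undo the normalization to compare with the original y-values
y_fit = gaussian_pdf(x, mu, sigma) * area
print(mu, sigma)
print(np.round(y_fit, 2))
```

Because the data is symmetric around x=5, the fitted mean stays at 5; the fitted sigma ends up a little above the weighted standard deviation of 2.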
When you try to fit the Gaussian pdf function to non-normalized points, the "best" fit is a very narrow, very high peak. Since a pdf always has area 1, the only way it can reach up toward the y=5 value in the center is by becoming extremely narrow.
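A quick worked check of how narrow that peak has to be: a normal pdf reaches its maximum 1/(σ·√(2π)) at x = μ, so for the maximum to equal 5 we need σ = 1/(5·√(2π)) ≈ 0.08. The sketch below computes this and confirms with scipy that one grid step away the curve is essentially zero.

```python
import math
from scipy import stats

target_peak = 5.0  # y-value at the center of the sample data

# norm.pdf(mu, mu, sigma) == 1 / (sigma * sqrt(2 * pi)); solve for sigma
sigma_needed = 1.0 / (target_peak * math.sqrt(2 * math.pi))
print(round(sigma_needed, 4))  # 0.0798

# The peak of such a pdf really is 5 ...
peak = stats.norm.pdf(5, loc=5, scale=sigma_needed)
# ... while one x-step away (x=4 or x=6) it is practically 0
neighbor = stats.norm.pdf(4, loc=5, scale=sigma_needed)
print(peak, neighbor)
```

With the sample x-values spaced 1 apart, such a spike falls between the plotted points, which is why the original plot looks so wrong.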
The example code below converts the lists to numpy arrays, so operations such as dividing by the normalization factor work element-wise. To draw a smooth curve, it also evaluates the fit on a finer grid of x-values.
import matplotlib.pyplot as plt
from scipy import stats
from scipy import optimize
import numpy as np

def test_func(x, a, b):
    # a = mean (loc), b = standard deviation (scale)
    return stats.norm.pdf(x, a, b)

data_sample = {'x': np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
               '1': np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0])}

# Alternative for equally spaced x-values:
# x_dist = (data_sample['x'].max() - data_sample['x'].min()) / (len(data_sample['x']) - 1)
# normalization_factor = sum(data_sample['1']) * x_dist
normalization_factor = np.trapz(data_sample['1'], data_sample['x'])  # area under the curve

# Fit the pdf to the normalized data (area 1)
params, pcov = optimize.curve_fit(test_func, data_sample['x'], data_sample['1'] / normalization_factor)

plt.scatter(data_sample['x'], data_sample['1'], clip_on=False, label='Data')
# Evaluate on a fine grid for a smooth curve, and undo the normalization
x_detailed = np.linspace(data_sample['x'].min() - 3, data_sample['x'].max() + 3, 200)
plt.plot(x_detailed, test_func(x_detailed, params[0], params[1]) * normalization_factor,
         color='crimson', label='Fitted function')
plt.legend(loc='best')
plt.margins(x=0)
plt.ylim(bottom=0)
plt.tight_layout()
plt.show()
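A different approach, not the one used above: give the model its own amplitude parameter so the raw y-values can be fitted directly, letting the amplitude absorb the normalization factor. This is a sketch under that assumption; scaled_gaussian is a hypothetical name, and the starting guesses come from the y-weighted mean and standard deviation of x.

```python
import numpy as np
from scipy import optimize, stats

def scaled_gaussian(x, amplitude, mu, sigma):
    # Hypothetical model: normal pdf times a free amplitude (area = amplitude)
    return amplitude * stats.norm.pdf(x, mu, sigma)

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float)

# Starting guesses: area ~ sum of y (spacing is 1), weighted mean and std of x
mean0 = np.average(x, weights=y)
std0 = np.sqrt(np.average((x - mean0) ** 2, weights=y))
p0 = (y.sum(), mean0, std0)

(amplitude, mu, sigma), _ = optimize.curve_fit(scaled_gaussian, x, y, p0=p0)
print(amplitude, mu, sigma)
```

The fitted amplitude lands close to the area of 25 but not exactly on it, because least squares does not force the curve's area to match the data's; the explicit normalization in the answer keeps that area fixed instead.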
PS: With the original code (without the normalization) but a finer grid of x-values, the narrow peak of the fitted curve becomes obvious:
x_detailed = np.linspace(min(data_sample['x']) - 1, max(data_sample['x']) + 1, 500)
plt.plot(x_detailed, test_func(x_detailed, params[0], params[1]), color='m', label='Fitted function')
Answered By - JohanC