Issue
I am trying to fit a curve to a histogram, but the resulting curve is flat even though the histogram was not. How can I fit the curve correctly?
My current code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
import scipy.optimize as optimization

x = np.random.normal(1e-10, 1e-7, size=10000)

def func(x, a, b, c):
    return a * (np.exp(-b*(x-c)**2))

bins = 25
logbins = np.logspace(np.log10(1.0E-10), np.log10(1E-07), bins)
bin_heights, bin_borders, _ = plt.hist(x, bins=logbins, edgecolor='black', color='b')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

x0 = np.array([0.0, 0.0, 0.0])
popt, cov = optimization.curve_fit(func, bin_centers, bin_heights, x0)
a, b, c = popt
popt, pcov = curve_fit(func, bin_centers, bin_heights, p0=[a, b, c])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 1000)
plt.plot(x_interval_for_fit, func(x_interval_for_fit, *popt), label='Fit', color='r')
plt.xscale('log')
plt.show()
The result:
Solution
You are getting a bad result because the function you are fitting doesn't match the shape of the histogram at all. Using a simple second-order polynomial instead, the results are a lot better (though you might say still not ideal):
def func(x, a, b, c):
    return a * x**2 + b * x + c  # Simple 2nd-order polynomial
Using it with your code (you can remove the two optimisation steps and call curve_fit only once), I got the following result:
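A minimal sketch of that single fitting pass, reusing x, bin_centers, bin_heights and bin_borders from your code (with func now being the polynomial above), could look like this:
# One curve_fit call is enough for the polynomial; p0 defaults to all ones
popt, pcov = curve_fit(func, bin_centers, bin_heights)
x_fit = np.linspace(bin_borders[0], bin_borders[-1], 1000)
plt.plot(x_fit, func(x_fit, *popt), color='r', label='Polynomial fit')
plt.xscale('log')
plt.legend()
plt.show()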
One thing that may have been unintentional in your code: although you created a normal distribution, you binned it in a surprising way. You only keep one side of the distribution (the bins start at 1e-10, which is where your distribution is centred), and because the bin size grows logarithmically to the right, the larger bins collect more points. You are also ignoring more than half of your points, namely those smaller than 1e-10 (check hist's documentation). You can verify this with np.sum(bin_heights): the total count is less than half of len(x).
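A quick sketch to check this, reusing x and the log-binned bin_heights from your code:
# Total number of samples that actually landed in the log-spaced bins
print(np.sum(bin_heights), "of", len(x), "samples fall inside the bins")
# Samples below the first bin edge (1e-10) are silently dropped by hist
print(np.sum(x < 1e-10), "samples are smaller than the first bin edge")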
If all of the above was unintentional and you really wanted to recover the gaussian that generated the random numbers (but with counts instead of density), you should fit it directly to your unmodified histogram. One problem, though: because the standard deviation is very small, the fit is badly conditioned (the Jacobian has widely varying magnitudes depending on the parameter direction). You can get a feeling for how hard it is for the optimiser to see the influence of each parameter by computing the error manually and checking how it changes (or doesn't) as you vary the parameters. Since you are doing a least-squares optimisation, your cost function looks like this:
f = lambda a, b, c: sum((bin_heights - func(bin_centers, a, b, c))**2)
If you play with this function a bit around the zero initial guess, to see what the optimiser sees, you'll notice why convergence is difficult:
f(0, 0, 0) # 8_407_566
f(0, 0, 1) # 8_407_566, i.e. no change
f(0, 1, 0) # 8_407_566, i.e. no change
f(1, 0, 0) # 8_387_591, clearly an improvement
This is not a mathematically rigorous argument (that comes from the ill-conditioning of the Jacobian when a and b are close to 0), but it gives a good intuition of what is going wrong.
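To make that intuition slightly more concrete, here is a rough finite-difference sketch of the model's sensitivity to each parameter at the zero initial guess (it assumes the original gaussian func and the bin_centers of the unmodified histogram; the step size eps is an arbitrary choice):
eps = 1e-6
p0 = np.zeros(3)
for i, name in enumerate(['a', 'b', 'c']):
    p = p0.copy()
    p[i] += eps
    # Largest change in the model caused by perturbing this parameter
    sens = np.abs(func(bin_centers, *p) - func(bin_centers, *p0)).max() / eps
    print(name, sens)
# The sensitivities for b and c come out as exactly zero when a = 0,
# so the optimiser gets no signal from those parameters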
Thus, you need a good first guess. Of course, since you know you are trying to fit a gaussian, you could derive the initial parameters from the data itself: the maximum bin height for a, something like the inverse of the square of the standard deviation for b, and the mean for c. The code below uses a set of "less tuned" initial parameters that also work:
def func(x, a, b, c):  # Original function
    return a * np.exp(-b * (x - c)**2)

bins = 25  # Let hist do its magic
bin_heights, bin_borders, _ = plt.hist(x, bins=bins, edgecolor='black', color='b')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

x0 = np.array([600.0, 1.0e7, 0.0])
popt, cov = curve_fit(func, bin_centers, bin_heights, x0)

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 1000)
plt.plot(x_interval_for_fit, func(x_interval_for_fit, *x0), label='Initial guess', color='y')
plt.plot(x_interval_for_fit, func(x_interval_for_fit, *popt), label='Fit', color='r')
plt.legend()
plt.show()
And the result, with the curve created by the initial guess (in yellow) and the fitted curve (in red):
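If you prefer the data-driven initial guess mentioned above, a sketch could look like the following (the exact choices, including the 1/(2*sigma**2) form for b, are my own illustration of that idea, not something tuned for this particular data set):
a0 = bin_heights.max()           # peak height of the histogram
b0 = 1.0 / (2.0 * x.std()**2)    # width of a*exp(-b*(x-c)**2) for a gaussian of std sigma
c0 = x.mean()                    # centre of the distribution
popt, pcov = curve_fit(func, bin_centers, bin_heights, p0=[a0, b0, c0])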
Answered By - nonDucor