Issue
Could someone help me fit the data in collapse_fractions with a lognormal function whose median and standard deviation are estimated via the maximum likelihood method?
I tried scipy.stats.lognorm.fit(data), but I did not obtain the same results I got in Excel. The Excel file can be downloaded here: https://stacks.stanford.edu/file/druid:sw589ts9300/p_collapse_from_msa.xlsx
Also, any reference is very welcome.
import numpy as np
intensity_measure_vector = np.array([[0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1]])
no_analyses = 40
no_collapses = np.array([[0, 0, 0, 4, 6, 13, 12, 16]])
collapse_fractions = np.array(no_collapses/no_analyses)
print(collapse_fractions)
# array([[0. , 0. , 0. , 0.1 , 0.15 , 0.325, 0.3 , 0.4 ]])
collapse_fractions.shape
# (1, 8)
import matplotlib.pyplot as plt
plt.scatter(intensity_measure_vector, collapse_fractions)
Solution
I couldn't figure out how to get lognorm.fit to work for this, so I just implemented the functions from your Excel file and used scipy.optimize as the optimizer. The added benefit is that it is easier to understand what is actually going on compared to lognorm.fit, especially with the Excel file on the side.
Here is my implementation:
from functools import partial

import numpy as np
from scipy import optimize, stats

# Observed data: intensity measures and collapse counts out of 40 analyses each
im = np.array([0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1])
im_log = np.log(im)
number_of_analyses = np.array([40, 40, 40, 40, 40, 40, 40, 40])
number_of_collapses = np.array([0, 0, 0, 4, 6, 13, 12, 16])

# Header for the per-iteration debug output printed inside the objective
FORMAT_STRING = "{:<20}{:<20}{:<20}"
print(FORMAT_STRING.format("sigma", "beta", "log_likelihood_sum"))


def neg_log_likelihood_sum(params, im_l, no_a, no_c):
    # sigma = median of the lognormal (named as in the Excel file), beta = log standard deviation
    sigma, beta = params
    # Lognormal fragility curve: P(collapse | IM), evaluated at the log intensity measures
    theoretical_fragility_function = stats.norm(np.log(sigma), beta).cdf(im_l)
    # Binomial likelihood of observing no_c collapses out of no_a analyses at each level
    likelihood = stats.binom.pmf(no_c, no_a, theoretical_fragility_function)
    log_likelihood = np.log(likelihood)
    log_likelihood_sum = np.sum(log_likelihood)
    print(FORMAT_STRING.format(sigma, beta, log_likelihood_sum))
    # Return the negative sum because scipy.optimize minimizes
    return -log_likelihood_sum


# Fix the data arguments so the optimizer only varies (sigma, beta)
neg_log_likelihood_sum_partial = partial(
    neg_log_likelihood_sum, im_l=im_log, no_a=number_of_analyses, no_c=number_of_collapses
)

res = optimize.minimize(neg_log_likelihood_sum_partial, (1, 1), method="Nelder-Mead")
print(res)
And the final result is:
 final_simplex: (array([[1.07613697, 0.42927824],
       [1.07621925, 0.42935678],
       [1.07622438, 0.42924577]]), array([10.7977048 , 10.79770573, 10.79770723]))
           fun: 10.797704803509903
       message: 'Optimization terminated successfully.'
          nfev: 68
           nit: 36
        status: 0
       success: True
             x: array([1.07613697, 0.42927824])
The interesting part for you is x (which also appears as the first row of final_simplex): the same final result as in the Excel calculation, sigma=1.07613697 and beta=0.42927824.
If you have any questions about what I did here, don't hesitate to ask, as you said you are new to Python. A few things in advance:
- I minimized the negative log-likelihood sum because there is no maximizer in scipy.
- partial from functools returns a function that has the specified arguments already bound (in this case im_l, no_a and no_c, since they don't change); the partial function can then be called with just the missing argument. A small illustration follows this list.
- The neg_log_likelihood_sum function is basically what is happening in the Excel file, so it should be easy to understand when viewing it side-by-side.
- scipy.optimize.minimize minimizes the function given as the first argument by changing the parameters (start values given as the second argument). The Nelder-Mead method was chosen because it gave good results; I didn't dive deep into the abyss of different optimization methods, gradients etc. So there may well be a better setup, but this one works fine and seems faster than the optimization with lognorm.fit.
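For illustration (not part of the original answer), here is a minimal sketch of how functools.partial binds arguments, using a made-up helper function:
from functools import partial

# Hypothetical helper, only to show how partial works
def scaled_sum(values, scale, offset):
    return scale * sum(values) + offset

# Bind scale and offset once; only values remains to be passed
scaled_sum_partial = partial(scaled_sum, scale=2.0, offset=1.0)
print(scaled_sum_partial([1, 2, 3]))  # 13.0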
The plot like in the Excel file is created as follows, using the result res from the optimization:
import matplotlib.pyplot as plt

# Fitted fragility curve from the optimization result
# (np.log(0) at x=0 raises a harmless divide-by-zero warning; the CDF there is 0)
x = np.linspace(0, 2.5, 100)
y = stats.norm(np.log(res["x"][0]), res["x"][1]).cdf(np.log(x))

plt.plot(x, y)
plt.scatter(im, number_of_collapses / number_of_analyses)
plt.show()
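As a side note on the original lognorm question: the same fitted curve can be written with scipy.stats.lognorm directly. The following is a small sketch (not part of the original answer), assuming scipy's parametrization where s is the log standard deviation and scale is the median:
# Equivalent fragility curve via scipy.stats.lognorm:
# lognorm(s=beta, scale=median).cdf(x) == norm(log(median), beta).cdf(log(x))
median, beta = res["x"]
x = np.linspace(0.01, 2.5, 100)  # start above 0 to avoid log(0)
y_norm = stats.norm(np.log(median), beta).cdf(np.log(x))
y_lognorm = stats.lognorm(s=beta, scale=median).cdf(x)
print(np.allclose(y_norm, y_lognorm))  # True
This also hints at why lognorm.fit did not reproduce the Excel result: scipy's fit expects a raw sample of lognormally distributed values, whereas the fragility fit here is a maximum-likelihood fit to binomial collapse counts per intensity level.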
Answered By - MangoNrFive