Sunday, October 10, 2021

[FIXED] Kernel Density Estimation using scipy's gaussian_kde and sklearn's KernelDensity leads to different results

October 10, 2021 kernel-density, probability-density, python, scikit-learn, scipy No comments

Issue

I created some data from two superposed normal distributions and then applied sklearn.neighbors.KernelDensity and scipy.stats.gaussian_kde to estimate the density function. However, using the same bandwith (1.0) and the same kernel, both methods produce a different outcome. Can someone explain me the reason for this? Thanks for help.

Below you can find the code to reproduce the issue:

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
import seaborn as sns
from sklearn.neighbors import KernelDensity

n = 10000
dist_frac = 0.1
x1 = np.random.normal(-5,2,int(n*dist_frac))
x2 = np.random.normal(5,3,int(n*(1-dist_frac)))
x = np.concatenate((x1,x2))
np.random.shuffle(x)
eval_points = np.linspace(np.min(x), np.max(x))

kde_sk = KernelDensity(bandwidth=1.0, kernel='gaussian')
kde_sk.fit(x.reshape([-1,1]))
y_sk = np.exp(kde_sk.score_samples(eval_points.reshape(-1,1)))

kde_sp = gaussian_kde(x, bw_method=1.0)
y_sp = kde_sp.pdf(eval_points)

sns.kdeplot(x)
plt.plot(eval_points, y_sk)
plt.plot(eval_points, y_sp)
plt.legend(['seaborn','scikit','scipy'])

If I change the scipy bandwith to 0.25, the result of both methods look approximately the same.

Solution

What is meant by bandwidth in scipy.stats.gaussian_kde and sklearn.neighbors.KernelDensity is not the same. Scipy.stats.gaussian_kde uses a bandwidth factor https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html. For a 1-D kernel density estimation the following formula is applied:

the bandwidth of sklearn.neighbors.KernelDensity = bandwidth factor of the scipy.stats.gaussian_kde * standard deviation of the sample

For your estimation this probably means that your standard deviation equals 4.

I would like to refer to Getting bandwidth used by SciPy's gaussian_kde function for more information.

Answered By - Brecht

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, October 10, 2021

[FIXED] Kernel Density Estimation using scipy's gaussian_kde and sklearn's KernelDensity leads to different results

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels