This estimation method approximates the densities of the unknown bivariate distributions \(P\) and \(Q\) by kernel density estimates, using function 'bkde2D' from package 'KernSmooth'. If 'KernSmooth' is not installed, a message is issued and the (much) slower function 'kld_est_kde' is used instead.
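
As a rough sketch of the idea (not the package's implementation), one can evaluate both kernel density estimates on a shared grid with 'bkde2D' and integrate \(\hat p \log(\hat p/\hat q)\) numerically; the bandwidths, grid size and simple rectangle-rule integration below are illustrative choices:

library(KernSmooth)

set.seed(1)
X <- cbind(rnorm(500), rnorm(500))          # sample from P
Y <- cbind(rnorm(500), rnorm(500, sd = 2))  # sample from Q

# shared evaluation grid covering both samples
rng <- list(range(c(X[, 1], Y[, 1])), range(c(X[, 2], Y[, 2])))
dP <- bkde2D(X, bandwidth = c(0.3, 0.3), gridsize = c(64, 64), range.x = rng)
dQ <- bkde2D(Y, bandwidth = c(0.5, 0.5), gridsize = c(64, 64), range.x = rng)

# rectangle-rule integration of p*log(p/q), dropping cells where a density
# is zero (kld_est_kde2 instead uses the trapezoidal rule and the eps
# regularization described under Arguments)
dx <- diff(dP$x1[1:2])
dy <- diff(dP$x2[1:2])
ok <- dP$fhat > 0 & dQ$fhat > 0
sum(dP$fhat[ok] * log(dP$fhat[ok] / dQ$fhat[ok])) * dx * dy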

Usage

kld_est_kde2(
  X,
  Y,
  MC = FALSE,
  hX = NULL,
  hY = NULL,
  rule = c("Silverman", "Scott"),
  eps = 1e-05
)

Arguments

X, Y

n-by-2 and m-by-2 matrices, representing n samples from the bivariate true distribution \(P\) and m samples from the approximate distribution \(Q\), respectively.

MC

A boolean: if TRUE, use a Monte Carlo approximation instead of numerical integration via the trapezoidal rule (default: FALSE). This option is currently not implemented; a value of TRUE results in an error.

hX, hY

Bandwidths for the kernel density estimates of \(P\) and \(Q\), respectively. The default NULL means they are determined by argument rule.

rule

A heuristic to derive the bandwidths hX and hY when they are not supplied. The default "Silverman" is Silverman's rule of thumb, $$h_i = \sigma_i\left(\frac{4}{(2+d)n}\right)^{1/(d+4)},$$ where \(\sigma_i\) is the sample standard deviation along dimension \(i\), \(d = 2\) is the dimension and \(n\) is the sample size. The alternative "Scott" is Scott's rule, $$h_i = \sigma_i\, n^{-1/(d+4)}.$$
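
For illustration, Silverman's rule above can be computed per coordinate in a few lines (assuming \(\sigma_i\) is taken as the sample standard deviation of dimension \(i\)):

silverman_bw <- function(X) {
  n <- nrow(X)
  d <- ncol(X)
  apply(X, 2, sd) * (4 / ((2 + d) * n))^(1 / (d + 4))
}
silverman_bw(matrix(rnorm(200), ncol = 2))  # one bandwidth per dimension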

eps

A nonnegative scalar; if eps > 0, \(Q\) is estimated as a mixture between the kernel density estimate and a uniform distribution on the computational grid. The weight of the uniform component is eps times the maximum density estimate of \(Q\). This increases the robustness of the estimator at the expense of an additional bias. Defaults to eps = 1e-5.
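
A toy illustration of the effect of this regularization; the exact mixture weights and grid normalization used internally may differ:

qhat <- c(0, 0.1, 0.4, 0)           # toy kernel density values of Q on a grid
eps  <- 1e-5
w    <- eps * max(qhat)             # weight of the uniform component
unif <- 1 / length(qhat)            # toy uniform density over the grid cells
q_reg <- (1 - w) * qhat + w * unif  # strictly positive, so log(p/q) stays finite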

Value

A scalar, the estimated Kullback-Leibler divergence \(\hat D_{KL}(P||Q)\).

Examples

# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(1000)
X2 <- rnorm(1000)
Y1 <- rnorm(1000)
Y2 <- Y1 + rnorm(1000)
X <- cbind(X1,X2)
Y <- cbind(Y1,Y2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2),nrow=2))
#> [1] 0.5
kld_est_kde2(X,Y)
#> [1] 0.3639046
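
# The bandwidth heuristic can also be switched to Scott's rule via the
# rule argument; the resulting estimate differs slightly from the
# Silverman default:
kld_est_kde2(X, Y, rule = "Scott")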