2-D kernel density-based estimation of Kullback-Leibler divergence
Source: R/kld-estimation-kernel-density.R (kld_est_kde2.Rd)
This estimation method approximates the densities of the unknown bivariate distributions \(P\) and \(Q\) by kernel density estimates, using function 'bkde2D' from package 'KernSmooth'. If 'KernSmooth' is not installed, a message is issued and the (much) slower function 'kld_est_kde' is used instead.
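To make the approach concrete, the following is a minimal sketch of the underlying idea, not the package's actual implementation: both densities are evaluated on a common grid with KernSmooth::bkde2D and the KL integral is approximated by a plain Riemann sum (kld_est_kde2 itself uses the trapezoidal rule and the eps regularization described under Arguments). The helper name kld_kde2_sketch, the grid size and the range padding are illustrative choices only.

# Minimal sketch (illustrative only): estimate both densities on a common
# grid and approximate the KL integral by a plain Riemann sum.
kld_kde2_sketch <- function(X, Y, hX, hY, gridsize = c(101, 101)) {
  # common evaluation range covering both samples, padded by 10%
  pad <- function(r) r + c(-1, 1) * 0.1 * diff(r)
  rng <- list(pad(range(c(X[, 1], Y[, 1]))), pad(range(c(X[, 2], Y[, 2]))))
  dP  <- KernSmooth::bkde2D(X, bandwidth = hX, gridsize = gridsize, range.x = rng)
  dQ  <- KernSmooth::bkde2D(Y, bandwidth = hY, gridsize = gridsize, range.x = rng)
  dx  <- diff(dP$x1[1:2])
  dy  <- diff(dP$x2[1:2])
  p   <- dP$fhat
  q   <- dQ$fhat
  keep <- p > 0 & q > 0           # skip grid cells with zero density estimates
  sum(p[keep] * log(p[keep] / q[keep])) * dx * dy
}
# e.g. with normal-reference bandwidths (cf. the "rule" argument below):
# kld_kde2_sketch(X, Y, hX = apply(X, 2, sd) * nrow(X)^(-1/6),
#                       hY = apply(Y, 2, sd) * nrow(Y)^(-1/6))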
Usage
kld_est_kde2(
  X,
  Y,
  MC = FALSE,
  hX = NULL,
  hY = NULL,
  rule = c("Silverman", "Scott"),
  eps = 1e-05
)
Arguments
- X, Y: n-by-2 and m-by-2 matrices, representing n samples from the bivariate true distribution \(P\) and m samples from the approximate distribution \(Q\), respectively.
- MC: A boolean: use a Monte Carlo approximation instead of numerical integration via the trapezoidal rule (default: FALSE)? Currently, this option is not implemented, i.e. a value of TRUE results in an error.
- hX, hY: Bandwidths for the kernel density estimates of \(P\) and \(Q\), respectively. The default NULL means they are determined by argument rule.
- rule: A heuristic to derive the parameters hX and hY; the default is "Silverman", which means that $$h_i = \sigma_i\left(\frac{4}{(2+d)n}\right)^{1/(d+4)}$$ (a small numeric illustration follows this argument list).
- eps: A nonnegative scalar; if eps > 0, \(Q\) is estimated as a mixture between the kernel density estimate and a uniform distribution on the computational grid. The weight of the uniform component is eps times the maximum density estimate of \(Q\). This increases the robustness of the estimator at the expense of an additional bias. Defaults to eps = 1e-5 (see the sketch after this list).
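As a numeric illustration of the default "Silverman" rule quoted above: each coordinate's bandwidth is its sample standard deviation scaled by (4/((2+d)n))^(1/(d+4)); for bivariate data (d = 2) this simplifies to sigma_i * n^(-1/6). The helper below, and the sketched eps regularization that follows it, are illustrative readings of this documentation, not the package's exact code.

# Default "Silverman" bandwidths, h_i = sigma_i * (4 / ((2 + d) * n))^(1 / (d + 4)):
silverman_bw <- function(X) {
  n <- nrow(X)
  d <- ncol(X)
  apply(X, 2, sd) * (4 / ((2 + d) * n))^(1 / (d + 4))
}
# For d = 2 the factor reduces to n^(-1/6), i.e. about 0.32 * sigma_i for n = 1000.

# One plausible reading of the eps regularization of Q (the exact weighting
# inside kld_est_kde2 may differ): add a uniform component with density
# eps * max(qhat) on the computational grid, then renormalize.
regularize_q <- function(qhat, dx, dy, eps = 1e-5) {
  qreg <- qhat + eps * max(qhat)
  qreg / (sum(qreg) * dx * dy)
}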
Examples
# KL-D between two samples from 2-D Gaussians:
set.seed(0)
# samples from the true distribution P = N(0, I):
X1 <- rnorm(1000)
X2 <- rnorm(1000)
# samples from the approximate distribution Q = N(0, Sigma), Sigma = matrix(c(1,1,1,2), nrow = 2):
Y1 <- rnorm(1000)
Y2 <- Y1 + rnorm(1000)
X <- cbind(X1, X2)
Y <- cbind(Y1, Y2)
# analytical KL divergence between the two Gaussians:
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2),nrow=2))
#> [1] 0.5
# kernel density-based estimate from the samples:
kld_est_kde2(X, Y)
#> [1] 0.3639046