This function estimates the Kullback-Leibler divergence \(D_{KL}(P||Q)\) between two continuous distributions \(P\) and \(Q\) using nearest-neighbour (NN) density estimation in a Monte Carlo approximation of \(D_{KL}(P||Q)\).

Usage

kld_est_nn(X, Y = NULL, q = NULL, k = 1L, eps = 0, log.q = FALSE)

Arguments

X, Y

n-by-d and m-by-d matrices, representing n samples from the true distribution \(P\) and m samples from the approximate distribution \(Q\), both in d dimensions. Vector input is treated as a column matrix. Y can be omitted if q is specified (see below).

q

The density function of the approximate distribution \(Q\). Either Y or q must be specified.

k

The number of nearest neighbours to consider for NN density estimation. Larger values for k generally increase bias, but decrease variance of the estimator. Defaults to k = 1.

eps

Error bound in the nearest neighbour search. A value of eps = 0 (the default) implies an exact nearest neighbour search; for eps > 0, approximate nearest neighbours are sought, which may be somewhat faster for high-dimensional problems.

log.q

If TRUE, function q is the log-density rather than the density of the approximate distribution \(Q\) (default: log.q = FALSE).

Value

A scalar, the estimated Kullback-Leibler divergence \(\hat D_{KL}(P||Q)\).

Details

Input for estimation is a sample X from \(P\) and either the density function q of \(Q\) (one-sample problem) or a sample Y from \(Q\) (two-sample problem). In the two-sample problem, the estimator in Eq. (5) of Wang et al. (2009) is used. In the one-sample problem, the asymptotic bias (the expectation of a Gamma distribution) is subtracted; see Pérez-Cruz (2008), Eq. (18).
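
For reference, the two-sample estimator in Eq. (5) of Wang et al. (2009) can be written as \(\hat D_{KL}(P||Q) = \frac{d}{n} \sum_{i=1}^{n} \log\frac{\nu_k(i)}{\rho_k(i)} + \log\frac{m}{n-1}\), where \(\rho_k(i)\) is the distance from \(X_i\) to its k-th nearest neighbour among the remaining X samples and \(\nu_k(i)\) is the distance from \(X_i\) to its k-th nearest neighbour among the Y samples.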

References:

Wang, Kulkarni and Verdú, "Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances", IEEE Transactions on Information Theory, Vol. 55, No. 5 (2009).

Pérez-Cruz, "Kullback-Leibler Divergence Estimation of Continuous Distributions", IEEE International Symposium on Information Theory (2008).
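
To illustrate the two-sample estimator above, here is a minimal base-R sketch. It is an illustration only, not the implementation used by kld_est_nn (which relies on a fast, optionally approximate nearest-neighbour search); the helper name kld_nn_sketch is made up for this example.

kld_nn_sketch <- function(X, Y, k = 1L) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  n <- nrow(X); m <- nrow(Y); d <- ncol(X)
  rho <- numeric(n)  # k-th NN distance of X[i, ] within X (excluding itself)
  nu  <- numeric(n)  # k-th NN distance of X[i, ] within Y
  for (i in seq_len(n)) {
    dX <- sqrt(colSums((t(X[-i, , drop = FALSE]) - X[i, ])^2))
    dY <- sqrt(colSums((t(Y) - X[i, ])^2))
    rho[i] <- sort(dX)[k]
    nu[i]  <- sort(dY)[k]
  }
  d / n * sum(log(nu / rho)) + log(m / (n - 1))
}

On the same input, kld_nn_sketch(X, Y, k) should roughly agree with kld_est_nn(X, Y, k = k), up to the handling of ties and zero distances.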

Examples

# KL-D between one or two samples from 1-D Gaussians:
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
#> [1] 0.4431472
kld_est_nn(X, Y)
#> [1] 0.2169136
kld_est_nn(X, q = q)
#> [1] 0.6374628
kld_est_nn(X, Y, k = 5)
#> [1] 0.491102
kld_est_nn(X, q = q, k = 5)
#> [1] 0.6904697
kld_est_brnn(X, Y)
#> [1] 0.1865685


# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1,X2)
Y <- cbind(Y1,Y2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2),nrow=2))
#> [1] 0.5
kld_est_nn(X, Y)
#> [1] 0.1841371
kld_est_nn(X, Y, k = 5)
#> [1] 0.1788047
kld_est_brnn(X, Y)
#> [1] 0.1854864
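

# Using a log-density via log.q (one-sample problem, 1-D Gaussian):
# illustrative sketch only, output not shown; passing the log-density avoids
# underflow of q(x) in the tails and should closely match the density-based
# call kld_est_nn(X1d, q = q) on the same sample.
X1d   <- rnorm(100)
log_q <- function(x) dnorm(x, mean = 1, sd = 2, log = TRUE)
kld_est_nn(X1d, q = log_q, log.q = TRUE)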