
This is the bias-reduced generalized k-nearest-neighbour (k-NN) based KL divergence estimator from Wang et al. (2009), specified in Eq. (29) of that paper.
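
For orientation, and up to the details of the paper's statement, that estimator is of the form

\[\hat D_{KL}(P||Q) = \frac{d}{n}\sum_{i=1}^n \log\frac{\nu^{k_i}_i}{\rho^{l_i}_i} + \frac{1}{n}\sum_{i=1}^n\left[\psi(l_i)-\psi(k_i)\right] + \log\frac{m}{n-1},\]

where \(\psi\) is the digamma function and \(\rho^{l_i}_i,\nu^{k_i}_i\) are the adaptively chosen nearest-neighbour distances described under Details; consult the paper for the exact statement.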

Usage

kld_est_brnn(X, Y, max.k = 100, warn.max.k = TRUE, eps = 0)

Arguments

X, Y

n-by-d and m-by-d matrices, representing n samples from the true distribution \(P\) and m samples from the approximate distribution \(Q\), both in d dimensions. Vector input is treated as a column matrix. Since this estimator requires both samples, Y cannot be omitted here (see Details).

max.k

Maximum number of nearest neighbours to compute (default: 100). A larger max.k may yield a more accurate KL-D estimate (see warn.max.k), but always increases the computational cost.

warn.max.k

If TRUE (the default), a warning is issued if more than max.k neighbours lie within the neighbourhood \(\delta\) for some data point(s). In this case, only the first max.k neighbours are counted, and max.k may need to be increased.

eps

Error bound in the nearest neighbour search. A value of eps = 0 (the default) implies an exact nearest neighbour search; for eps > 0, approximate nearest neighbours are sought, which may be somewhat faster for high-dimensional problems. An illustration of these tuning arguments follows the argument list.
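
As a rough illustration of how the tuning arguments above might be used (the settings are hypothetical and data-dependent; X and Y are two samples as in the Examples below):

# allow more neighbours within the neighbourhood and silence the max.k warning
kld_est_brnn(X, Y, max.k = 500, warn.max.k = FALSE)
# approximate nearest-neighbour search, potentially faster in high dimensions
kld_est_brnn(X, Y, eps = 0.1)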

Value

A scalar, the estimated Kullback-Leibler divergence \(\hat D_{KL}(P||Q)\).

Details

Finite sample bias reduction is achieved by an adaptive choice of the number of nearest neighbours. Fixing the number of nearest neighbours upfront, as done in kld_est_nn(), may result in very different distances \(\rho^l_i,\nu^k_i\) of a datapoint \(x_i\) to its \(l\)-th nearest neighbour in \(X\) and \(k\)-th nearest neighbour in \(Y\), respectively, which may lead to unequal biases in NN density estimation, especially in a high-dimensional setting. To overcome this issue, the numbers of neighbours \(l,k\) are chosen here so as to render \(\rho^l_i,\nu^k_i\) comparable: for each \(x_i\), the largest numbers of neighbours \(l_i,k_i\) are taken whose distances \(\rho^{l_i}_i,\nu^{k_i}_i\) do not exceed \(\delta_i:=\max(\rho^1_i,\nu^1_i)\).
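
As an illustration of this adaptive choice, a minimal brute-force sketch of such a digamma-corrected estimator could look as follows. This is not the package implementation: it ignores max.k and eps, assumes no duplicate points, and uses exact distances throughout; kld_brnn_sketch is a hypothetical helper name.

kld_brnn_sketch <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  n <- nrow(X); m <- nrow(Y); d <- ncol(X)
  stats <- sapply(seq_len(n), function(i) {
    # sorted distances from x_i to the other points in X and to all points in Y
    rho <- sort(sqrt(colSums((t(X[-i, , drop = FALSE]) - X[i, ])^2)))
    nu  <- sort(sqrt(colSums((t(Y) - X[i, ])^2)))
    # neighbourhood radius delta_i: the larger of the two 1-NN distances
    delta <- max(rho[1], nu[1])
    # adaptive neighbour counts: as many neighbours as fall within delta_i
    l_i <- sum(rho <= delta)
    k_i <- sum(nu <= delta)
    c(logratio = log(nu[k_i] / rho[l_i]), l = l_i, k = k_i)
  })
  # digamma-corrected estimator in the spirit of Wang et al. (2009), Eq. (29)
  d * mean(stats["logratio", ]) +
    mean(digamma(stats["l", ]) - digamma(stats["k", ])) +
    log(m / (n - 1))
}

On small examples such as those below, this sketch should give values comparable to kld_est_brnn(X, Y), albeit much more slowly.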

Since the bias reduction explicitly uses both samples X and Y, one-sample estimation is not possible using this method.

Reference: Wang, Kulkarni and Verdú, "Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances", IEEE Transactions on Information Theory, Vol. 55, No. 5 (2009). DOI: https://doi.org/10.1109/TIT.2009.2016060

Examples

# KL-D between one or two samples from 1-D Gaussians:
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
#> [1] 0.4431472
kld_est_nn(X, Y)
#> [1] 0.2169136
kld_est_nn(X, q = q)
#> [1] 0.6374628
kld_est_nn(X, Y, k = 5)
#> [1] 0.491102
kld_est_nn(X, q = q, k = 5)
#> [1] 0.6904697
kld_est_brnn(X, Y)
#> [1] 0.1865685


# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1,X2)
Y <- cbind(Y1,Y2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2),nrow=2))
#> [1] 0.5
kld_est_nn(X, Y)
#> [1] 0.1841371
kld_est_nn(X, Y, k = 5)
#> [1] 0.1788047
kld_est_brnn(X, Y)
#> [1] 0.1854864