Plug-in KL divergence estimator for samples from discrete distributions

Usage

kld_est_discrete(X, Y = NULL, q = NULL)

Arguments

X, Y

n-by-d and m-by-d matrices or data frames, representing n samples from the true discrete distribution \(P\) and m samples from the approximate discrete distribution \(Q\), both in d dimensions. Vector input is treated as a column matrix. Argument Y can be omitted if argument q is given (see below).

q

The probability mass function of the approximate distribution \(Q\), used in place of a sample Y in the one-sample case. Currently, the one-sample problem is implemented only for d = 1.

Value

A scalar, the estimated Kullback-Leibler divergence \(\hat D_{KL}(P||Q)\).

Examples

# 1D example, two samples
X <- c(rep('M',5),rep('F',5))
Y <- c(rep('M',6),rep('F',4))
kld_est_discrete(X, Y)
#> [1] 0.020411
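
The two-sample estimate above can be reproduced by hand in base R: the plug-in estimator simply evaluates the KL divergence formula on the empirical probability mass functions of the two samples. This sketch assumes nothing beyond base R:

```r
# Plug-in KL estimate computed by hand (base R only)
X <- c(rep('M',5), rep('F',5))
Y <- c(rep('M',6), rep('F',4))

# Empirical pmfs over the union of observed categories
cats <- union(X, Y)
p <- table(factor(X, levels = cats)) / length(X)  # (0.5, 0.5)
q <- table(factor(Y, levels = cats)) / length(Y)  # (0.6, 0.4)

# KL divergence of the empirical distributions
sum(p * log(p / q))
#> [1] 0.02041101
```

Here \(\hat D_{KL} = 0.5\log(0.5/0.6) + 0.5\log(0.5/0.4) = 0.5\log(25/24) \approx 0.0204\), matching the `kld_est_discrete(X, Y)` output above.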

# 1D example, one sample
X <- c(rep(0,4),rep(1,6))
q <- function(x) dbinom(x, size = 1, prob = 0.5)
kld_est_discrete(X, q = q)
#> [1] 0.02013551
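
The one-sample estimate works the same way, except that the exact pmf `q` replaces the second empirical distribution. A base-R sketch of the computation behind the call above:

```r
# One-sample plug-in KL estimate computed by hand (base R only)
X <- c(rep(0,4), rep(1,6))

# Empirical pmf of the sample vs. exact pmf of Q
p  <- table(factor(X, levels = 0:1)) / length(X)  # (0.4, 0.6)
qx <- dbinom(0:1, size = 1, prob = 0.5)           # (0.5, 0.5)

sum(p * log(p / qx))
#> [1] 0.02013551
```

That is, \(\hat D_{KL} = 0.4\log(0.4/0.5) + 0.6\log(0.6/0.5) \approx 0.0201\), in agreement with the `kld_est_discrete(X, q = q)` result.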