Empirical convergence rate of a KL divergence estimator
Source: R/convergence-rate.R
Subsampling-based confidence intervals computed by kld_ci_subsampling()
require the convergence rate of the KL divergence estimator as an input. The
default rate of 0.5 assumes that the variance term dominates the bias term.
For high-dimensional problems, depending on the data, the convergence rate
might be lower. This function estimates the convergence rate empirically.
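The idea of estimating a rate empirically can be sketched in base R. This is an illustrative sketch under stated assumptions, not the package's implementation: it uses the sample mean as a stand-in estimator (whose rate is known to be 0.5), measures the spread of the estimates across subsamples by their standard deviation, and regresses log spread on log subsample size. The helper name empirical_rate_sketch and the geometric grid of subsample sizes are assumptions made for the illustration.

```r
# Illustrative sketch only (not the kldest implementation): estimate the
# exponent beta in the rate n^(-beta) for a stand-in estimator.
empirical_rate_sketch <- function(x, estimator = mean, n.sizes = 4,
                                  spacing.factor = 1.5,
                                  typical.subsample = function(n) sqrt(n),
                                  B = 500L) {
  n <- length(x)
  # Assumed size grid: geometric sequence whose geometric mean is
  # (up to rounding) typical.subsample(n).
  exponents <- seq_len(n.sizes) - (n.sizes + 1) / 2
  sizes <- round(typical.subsample(n) * spacing.factor^exponents)
  # Spread of the estimator over B subsamples (drawn without replacement)
  # for each subsample size.
  spread <- vapply(sizes, function(b) {
    stats::sd(replicate(B, estimator(x[sample.int(n, b)])))
  }, numeric(1))
  # If the spread is proportional to b^(-beta), the log-log slope is -beta.
  fit <- stats::lm(log(spread) ~ log(sizes))
  -unname(stats::coef(fit)[2])
}

set.seed(1)
beta_hat <- empirical_rate_sketch(rnorm(10000))
beta_hat  # expected to be near 0.5 for the sample mean
```

For a KL divergence estimator in high dimensions, the same kind of fit may yield a value below 0.5, which is the situation this function is meant to detect.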
Usage
convergence_rate(
  estimator,
  X,
  Y = NULL,
  q = NULL,
  n.sizes = 4,
  spacing.factor = 1.5,
  typical.subsample = function(n) sqrt(n),
  B = 500L,
  plot = FALSE
)
Arguments
- estimator
A KL divergence estimator.
- X, Y
n-by-d and m-by-d data frames or matrices (multivariate samples), or numeric/character vectors (univariate samples, i.e. d = 1), representing n samples from the true distribution \(P\) and m samples from the approximate distribution \(Q\) in d dimensions. Y can be left blank if q is specified (see below).
- q
The density function of the approximate distribution \(Q\). Either Y or q must be specified. If the distributions are all continuous or all discrete, q can be directly specified as the probability density/mass function. However, for mixed continuous/discrete distributions, q must be given in decomposed form, \(q(y_c,y_d)=q_{c|d}(y_c|y_d)q_d(y_d)\), specified as a named list with field cond for the conditional density \(q_{c|d}(y_c|y_d)\) (a function that expects two arguments y_c and y_d) and disc for the discrete marginal density \(q_d(y_d)\) (a function that expects one argument y_d). If such a decomposition is not available, it may be preferable to instead simulate a large sample from \(Q\) and use the two-sample syntax.
- n.sizes
Number of different subsample sizes to use (default: 4).
- spacing.factor
Multiplicative factor controlling the spacing of sample sizes (default: 1.5).
- typical.subsample
A function that produces a typical subsample size, used as the geometric mean of subsample sizes (default: sqrt(n)).
- B
Number of subsamples to draw per subsample size.
- plot
A boolean (default: FALSE) controlling whether to produce a diagnostic plot visualizing the fit.
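As an illustration of the decomposed form of q, consider a hypothetical mixed distribution in which y_d is Bernoulli(0.3) and y_c given y_d is normal with mean y_d and unit variance. The distribution and the variable name q_mixed are assumptions chosen for this example; only the field names cond and disc come from the argument description. How the arguments are shaped for multivariate data is not specified here, so the sketch is limited to one continuous and one discrete variable.

```r
# Hypothetical decomposed density q(y_c, y_d) = q_{c|d}(y_c | y_d) * q_d(y_d)
q_mixed <- list(
  # conditional density q_{c|d}(y_c | y_d): normal with mean y_d, sd 1
  cond = function(y_c, y_d) dnorm(y_c, mean = y_d, sd = 1),
  # discrete marginal q_d(y_d): Bernoulli with success probability 0.3
  disc = function(y_d) dbinom(y_d, size = 1, prob = 0.3)
)

# Joint density evaluated at the point (y_c, y_d) = (0.5, 1)
q_joint <- q_mixed$cond(0.5, 1) * q_mixed$disc(1)

# Sanity check: integrating over y_c and summing over y_d should give 1
total_mass <- sum(vapply(0:1, function(d) {
  integrate(function(y) q_mixed$cond(y, d) * q_mixed$disc(d), -Inf, Inf)$value
}, numeric(1)))
```

A list of this shape would be passed as the q argument in place of a second sample Y.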
Value
A scalar, the parameter \(\beta\) in the empirical convergence rate \(n^{-\beta}\) of the estimator to the true KL divergence. It can be used in the convergence.rate argument of kld_ci_subsampling() as convergence.rate = function(n) n^beta.
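A minimal sketch of turning the returned scalar into a rate function follows. The value of beta is hypothetical and stands in for a result of convergence_rate(); the variable name rate is also an assumption. Note that the function passed on uses the positive exponent, even though the estimator's error shrinks like \(n^{-\beta}\).

```r
# Hypothetical value, standing in for the result of convergence_rate()
beta <- 0.5
rate <- function(n) n^beta

# With beta = 0.5 the rate function coincides with the square root
same_as_sqrt <- isTRUE(all.equal(rate(c(100, 400)), sqrt(c(100, 400))))

# Sketch of the call described above (requires the package and data X, Y):
# kld_ci_subsampling(X, Y, convergence.rate = rate)
```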