Since the Kullback-Leibler divergence is invariant under invertible transformations of the underlying variables, its sample-based approximations can be computed on any convenient scale. This helper function transforms each variable so that all marginal distributions of the joint dataset \((X,Y)\) are uniform. This renders the scales of the different variables comparable, which typically improves the performance of nearest-neighbour-based methods.
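The idea can be sketched as a pooled rank transform: pool the two samples column-wise, replace each value by its rank within the pooled sample, and divide by the pooled sample size. This is a hypothetical illustration of the concept, not necessarily the package's actual implementation (the function name `to_uniform_scale_sketch` is made up to avoid clashing with the real helper):

```r
# Sketch only: uniformize each marginal of the pooled sample (X, Y)
# via ranks, so every transformed column lies in (0, 1].
to_uniform_scale_sketch <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  U <- rbind(X, Y)                       # pool both samples column-wise
  U[] <- apply(U, 2, function(v) rank(v) / length(v))
  list(X = U[seq_len(n), , drop = FALSE],
       Y = U[-seq_len(n), , drop = FALSE])
}
```

Because ranks are a monotone (hence invertible) transformation of each coordinate, the divergence between the transformed samples is, in the large-sample limit, the same as between the originals.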
Examples
# 2D example
n <- 10L
X <- cbind(rnorm(n, mean = 0, sd = 3),
           rnorm(n, mean = 1, sd = 2))
Y <- cbind(rnorm(n, mean = 1, sd = 2),
           rnorm(n, mean = 0, sd = 2))
to_uniform_scale(X, Y)
#> $X
#>       [,1] [,2]
#>  [1,] 0.25 0.70
#>  [2,] 0.10 0.60
#>  [3,] 0.70 0.85
#>  [4,] 0.60 0.20
#>  [5,] 0.90 0.50
#>  [6,] 0.05 0.75
#>  [7,] 0.30 0.95
#>  [8,] 1.00 0.80
#>  [9,] 0.40 0.65
#> [10,] 0.85 0.90
#>
#> $Y
#>       [,1] [,2]
#>  [1,] 0.55 0.30
#>  [2,] 0.20 0.35
#>  [3,] 0.50 0.10
#>  [4,] 0.45 0.40
#>  [5,] 0.65 0.55
#>  [6,] 0.35 0.15
#>  [7,] 0.80 1.00
#>  [8,] 0.95 0.25
#>  [9,] 0.15 0.05
#> [10,] 0.75 0.45
#>