Git Product home page Git Product logo

densratio's Introduction

An R Package for Density Ratio Estimation

Koji MAKIYAMA (@hoxo-m)

Travis-CI Build Status CRAN Version CRAN Downloads Coverage Status Say Thanks!

1. Overview

Density ratio estimation is described as follows: for given two data samples x1 and x2 from unknown distributions p(x) and q(x) respectively, estimate w(x) = p(x) / q(x), where x1 and x2 are d-dimensional real numbers.

The estimated density ratio function w(x) can be used in many applications such as anomaly detection [Hido et al. 2011], change-point detection [Liu et al. 2013], and covariate shift adaptation [Sugiyama et al. 2007]. Other useful applications about density ratio estimation were summarized by [Sugiyama et al. 2012].

The package densratio provides a function densratio() that returns an object with a method to estimate density ratio as compute_density_ratio().

For example,

set.seed(3)
x1 <- rnorm(200, mean = 1, sd = 1/8)
x2 <- rnorm(200, mean = 1, sd = 1/2)

library(densratio)
densratio_obj <- densratio(x1, x2)

The densratio object has a function compute_density_ratio() that can compute the estimated density ratio w_hat(x) for any d-dimensional input x (now d=1).

new_x <- seq(0, 2, by = 0.03)
w_hat <- densratio_obj$compute_density_ratio(new_x)

plot(new_x, w_hat, pch=19)

In this case, the true density ratio w(x) = p(x)/q(y) = Norm(1, 1/8) / Norm(1, 1/2) is known. So we can compare w(x) with the estimated density ratio w-hat(x).

true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)

plot(true_density_ratio, xlim=c(0, 2), lwd=2, col="red", xlab = "x", ylab = "Density Ratio")
plot(densratio_obj$compute_density_ratio, xlim=c(0, 2), lwd=2, col="green", add=TRUE)
legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)

2. Installation

You can install the densratio package from CRAN.

install.packages("densratio")

You can also install the package from GitHub.

install.packages("remotes") # if you have not installed "remotes" package
remotes::install_github("hoxo-m/densratio")

The source code for densratio package is available on GitHub at

3. Details

3.1 Basics

The package provides densratio(). The function returns an object that has a function to compute estimated density ratio.

For data samples x1 and x2,

set.seed(3)
x1 <- rnorm(200, mean = 1, sd = 1/8)
x2 <- rnorm(200, mean = 1, sd = 1/2)

library(densratio)
densratio_obj <- densratio(x1, x2)

In this case, densratio_obj$compute_density_ratio() can compute estimated density ratio.

new_x <- seq(0, 2, by = 0.03)
w_hat <- densratio_obj$compute_density_ratio(new_x)

plot(new_x, w_hat, pch=19)

3.2 Methods

densratio() has method argument that you can pass "uLSIF", "RuSLIF", or "KLIEP".

  • uLSIF (unconstrained Least-Squares Importance Fitting) is the default method. This algorithm estimates density ratio by minimizing the squared loss. You can find more information in [Kanamori et al. 2009] and [Hido et al. 2011].
  • RuLSIF (Relative unconstrained Least-Squares Importance Fitting). This algorithm estimates relative density ratio by minimizing the squared loss. You can find more information in [Yamada et al. 2011] and [Liu et al. 2013].
  • KLIEP (Kullback-Leibler Importance Estimation Procedure). This algorithm estimates density ratio by minimizing Kullback-Leibler divergence. You can find more information in [Sugiyama et al. 2007].

The methods assume that density ratio are represented by linear model:

  • w(x) = theta_1 * K(x, c_1) + theta_2 * K(x, c_2) + ... + theta_b * K(x, c_b)

where

  • K(x, c) = exp(-||x - c||^2 / 2 * sigma^2)

is the Gaussian (RBF) kernel.

densratio() performs the following:

  • Decides kernel parameter sigma by cross-validation,
  • Optimizes the kernel weights theta (in other words, find the optimal coefficients of the linear model), and
  • The parameters sigma and theta are saved into densratio object, and are used when to compute density ratio in the call compute_density_ratio().

3.3 Result and Arguments

densratio() outputs the result like as follows:

#> 
#> Call:
#> densratio(x = x1, y = x2, method = "uLSIF")
#> 
#> Kernel Information:
#>   Kernel type:  Gaussian 
#>   Number of kernels:  100 
#>   Bandwidth (sigma):  0.1 
#>   Centers:  num [1:100, 1] 0.907 1.093 1.18 1.136 1.046 ...
#> 
#> Kernel Weights:
#>   num [1:100] 0.067455 0.040045 0.000459 0.016849 0.067084 ...
#> 
#> Regularization Parameter (lambda):  1 
#> 
#> Function to Estimate Density Ratio:
#>   compute_density_ratio()
  • Kernel type is fixed as Gaussian.
  • Number of kernels is the number of kernels in the linear model. You can change by setting kernel_num argument. In default, kernel_num = 100.
  • Bandwidth (sigma) is the Gaussian kernel bandwidth. In default, sigma = "auto", the algorithm automatically select an optimal value by cross validation. If you set sigma a number, that will be used. If you set sigma a numeric vector, the algorithm select an optimal value in them by cross validation.
  • Centers are centers of Gaussian kernels in the linear model. These are selected at random from the data sample x1 underlying a numerator distribution p(x). You can find the whole values in result$kernel_info$centers.
  • Kernel Weights are theta parameters in the linear kernel model. You can find these values in result$kernel_weights.
  • Function to Estimate Density Ratio is named compute_density_ratio().

4. Multi Dimensional Data Samples

So far, the input data samples x1 and x2 were one dimensional. densratio() allows to input multidimensional data samples as matrix, as long as their dimensions are the same.

For example,

library(densratio)
library(mvtnorm)

set.seed(3)
x1 <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/8, 2))
x2 <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/2, 2))

densratio_obj_2d <- densratio(x1, x2)
densratio_obj_2d
#> 
#> Call:
#> densratio(x = x1, y = x2, method = "uLSIF")
#> 
#> Kernel Information:
#>   Kernel type:  Gaussian 
#>   Number of kernels:  100 
#>   Bandwidth(sigma):  0.316 
#>   Centers:  num [1:100, 1:2] 1.257 0.758 1.122 1.3 1.386 ...
#> 
#> Kernel Weights:
#>   num [1:100] 0.0756 0.0986 0.059 0.0797 0.0421 ...
#> 
#> Regularization Parameter (lambda):  0.3162278 
#> 
#> Function to Estimate Density Ratio:
#>   compute_density_ratio()

In this case, as well, we can compare the true density ratio with the estimated density ratio.

true_density_ratio <- function(x) {
  dmvnorm(x, mean = c(1, 1), sigma = diag(1/8, 2)) /
    dmvnorm(x, mean = c(1, 1), sigma = diag(1/2, 2))
}

N <- 20
range <- seq(0, 2, length.out = N)
input <- expand.grid(range, range)
w_true <- matrix(true_density_ratio(input), nrow = N)
w_hat <- matrix(densratio_obj_2d$compute_density_ratio(input), nrow = N)

par(mfrow = c(1, 2))
contour(range, range, w_true, main = "True Density Ratio")
contour(range, range, w_hat, main = "Estimated Density Ratio")

5. Related work

References

  • Hido, S., Y. Tsuboi, H. Kashima, M. Sugiyama, and T. Kanamori. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems, 2011.
  • Kanamori, T., S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 2009.
  • Liu, S., M. Yamada, N. Collier, M. Sugiyama. Change-point detection in time-series data by relative density-ratio estimation. Neural Net, 2013
  • Sugiyama, M., S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. NIPS 2007.
  • Sugiyama, M., T. Suzuki, and T. Kanamori. Density ratio estimation in machine learning. Cambridge University Press, 2012.
  • Yamada, M., T. Suzuki, T. Kanamori, H. Hachiya, and M. Sugiyama. Relative density-ratio estimation for robust distribution comparison. NIPS 2011.

densratio's People

Contributors

hoxo-m avatar

Watchers

James Cloos avatar CRAN robot avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.