truncexpfam's Introduction

What is this?

This is an R package to handle truncated members from the exponential family.

Installation

Stable version

TruncExpFam is available on CRAN and can be installed by running the following in an interactive R session:

install.packages("TruncExpFam")

Development version

The development version of the package contains features and bug fixes that are yet to be published. It is, however, much less stable than the CRAN version. You can install the development version of TruncExpFam by running the following command in R (requires the remotes package to be installed beforehand):

remotes::install_github("ocbe-uio/TruncExpFam")

If you want to browse the vignette, add build_vignettes = TRUE to your install_github() command.

Further details on installing TruncExpFam can be found on the Wiki.

Usage

Once installed, TruncExpFam can be loaded with library(TruncExpFam). A list of available functions can be printed with ls("package:TruncExpFam").

For more information about the package (e.g. supported distributions), run ?TruncExpFam after loading the package in your R session.

Are you familiar with the stats package and its r* and d* functions such as rnorm() and dpois()? If so, you will feel right at home with TruncExpFam, which uses the rtrunc() function to generate random numbers and the dtrunc() function to compute probability densities.

For a more detailed explanation on how to use this package’s features, check out its vignette:

browseVignettes("TruncExpFam")

Contributing

TruncExpFam is open-source software licensed under the GPL. All contributions are welcome! Please use the issues page to report any bugs you find or to see which other issues have been submitted.

To contribute code, we recommend reading this Wiki page on the subject.

Citing

If you present work that uses this package, please remember to cite it. To cite TruncExpFam in publications, use the output of citation("TruncExpFam") in your R session.

Badges

Stable version

CRAN version CRAN downloads License: GPL-3 DOI

Development version

Project Status: Active - The project has reached a stable, usable state and is being actively developed. Last commit Code size R build status codecov CodeFactor

truncexpfam's People

Contributors

rho62, wleoncio

Forkers

rho62

truncexpfam's Issues

Eliminate ml.estimation.trunc.dist dependency on the "family" argument

Motivation

The ml.estimation.trunc.dist() function requires a family argument to manually trigger the appropriate estimation method. As of commit 8e17b33, this is not required anymore, because the y argument already contains this information in its class.

Steps

  • Remove family argument
  • If using S3: rename the ml.estimation.trunc.dist function to something without dots, since S3 uses the dot to separate generic and class names (the name can be kept for S4)
  • Split generic and methods for each signature
  • Adjust tests and examples
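To illustrate the S3 option, here is a minimal sketch assuming a dotless generic named mlEstimationTruncDist() (a hypothetical name) and a y whose class, such as trunc_poisson, identifies the family. The method bodies are placeholders that only report which method was dispatched; they do not estimate anything.

```r
# Hypothetical generic: dispatches on class(y), so no `family` argument is needed
mlEstimationTruncDist <- function(y, ...) {
  UseMethod("mlEstimationTruncDist")
}

# Placeholder methods: a real version would run the ML iterations here
mlEstimationTruncDist.trunc_poisson <- function(y, ...) {
  "poisson method"
}

mlEstimationTruncDist.trunc_normal <- function(y, ...) {
  "normal method"
}

# Fallback for objects whose class has no method
mlEstimationTruncDist.default <- function(y, ...) {
  stop("No ML estimation method for class '", class(y)[1], "'")
}

y <- structure(c(1, 2, 2, 3), class = "trunc_poisson")
mlEstimationTruncDist(y)
```

Because the method is chosen from class(y), the family argument becomes redundant.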

Classes.R file

Go through the file:

  • Update parameter lists
  • Add new classes for the new distributions

Export natural2parameters and parameters2natural

Each distribution family contains its own set of these two functions, so this may be another case for creating generic natural2parameters() and parameters2natural() functions and documenting only those. The affected families are:

  • Beta
  • Binomial
  • Chisq
  • Continuous Bernoulli
  • Exponential
  • Gamma
  • Inverse Gamma
  • Inverse Gaussian
  • Log-normal
  • Negative Binomial
  • Normal
  • Poisson
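A sketch of what the generic versions could look like, assuming the input carries a trunc_* class to dispatch on. Only Poisson and Normal are shown, using the textbook natural parametrizations (eta = log(lambda) for Poisson; eta = (mean / sd^2, -1 / (2 sd^2)) for Normal). The package's own conventions may differ, so treat the formulas inside the methods as assumptions.

```r
# Hypothetical S3 generics: the method is picked from the class of the input
natural2parameters <- function(eta, ...) UseMethod("natural2parameters")
parameters2natural <- function(parms, ...) UseMethod("parameters2natural")

# Poisson, textbook parametrization (assumed): eta = log(lambda)
natural2parameters.trunc_poisson <- function(eta, ...) {
  c(lambda = exp(unname(unclass(eta))))
}
parameters2natural.trunc_poisson <- function(parms, ...) {
  p <- unclass(parms)
  structure(log(unname(p["lambda"])), class = "trunc_poisson")
}

# Normal, textbook parametrization (assumed): eta = (mean / sd^2, -1 / (2 * sd^2))
natural2parameters.trunc_normal <- function(eta, ...) {
  e <- unname(unclass(eta))
  c(mean = -e[1] / (2 * e[2]), sd = sqrt(-1 / (2 * e[2])))
}
parameters2natural.trunc_normal <- function(parms, ...) {
  p <- unclass(parms)
  s2 <- unname(p["sd"])^2
  structure(c(unname(p["mean"]) / s2, -1 / (2 * s2)), class = "trunc_normal")
}
```

Only the two generics would then need user-facing documentation, with each family contributing methods.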

Develop rtrunc.binomial

According to the following lines, rtrunc.binomial is not yet working:

# set.seed(117)
# # NOT WORKING YET
# sample.binom <- rtrunc.binomial(1000, 0.6, 4, , 10)
# hist(sample.binom)
# ml.estimation.trunc.dist(sample.binom, y.min = 4, max.it = 500, delta = 0.33, family = "Binomial", nsize = 10)

From René's e-mail:

For the binomial case, there is a small caveat: it has an extra parameter n (the number of trials). This parameter «n» is not to be estimated; it is a fixed value that has to be passed to the various functions. As it does not conform with the syntax for the other distributions, I have tried to use the dot-dot-dot facility.
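A minimal sketch of the dot-dot-dot approach, assuming a hypothetical dtrunc_binom_sketch() whose eta is the logit natural parameter log(p / (1 - p)). The fixed size is never handled by the function itself; it is simply forwarded through ... to dbinom() and pbinom().

```r
# Hypothetical truncated binomial density on [a, b]; `size` travels in `...`
dtrunc_binom_sketch <- function(y, eta, a = 0, b = Inf, ...) {
  prob <- plogis(eta)  # inverse logit: exp(eta) / (1 + exp(eta))
  dens <- ifelse(y < a | y > b, 0, dbinom(y, prob = prob, ...))
  # a - 1 keeps y = a itself inside the support of this discrete distribution
  F.a <- pbinom(a - 1, prob = prob, ...)
  F.b <- pbinom(b, prob = prob, ...)
  dens / (F.b - F.a)
}

d <- dtrunc_binom_sketch(0:10, eta = qlogis(0.6), a = 4, b = 10, size = 10)
sum(d)
```

The drawback is that a missing size only surfaces as an error from dbinom(); a named size argument would give a clearer message.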

Join truncation limits into one slot

Truncation limits are currently defined as separate slots in the Trunc class:

From TruncExpFam/R/classes.R, lines 11 to 19 at commit 32e40d3:

setClass(
  Class = "Trunc",
  slots = list(
    n = "integer",
    a = "numeric",
    b = "numeric", # TODO: join with a as a length 1+ vector (trunc. points)
    sample = "numeric"
  )
)

They should be joined into a single slot holding a vector of one or more truncation points, since the number of limits may be 1, 2, or 3.
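One possible shape for the joined slot, sketched with a hypothetical class TruncSketch and slot name truncation. The validity rule (at least one point, non-decreasing order) is an assumption; how a single point should be interpreted (lower or upper limit) is left open here, as it is in the issue.

```r
library(methods)

# Hypothetical class: `truncation` holds every truncation point in one slot
setClass(
  Class = "TruncSketch",
  slots = list(n = "integer", truncation = "numeric", sample = "numeric"),
  validity = function(object) {
    lim <- object@truncation
    if (length(lim) < 1) return("`truncation` needs at least one point")
    if (is.unsorted(lim)) return("`truncation` must be non-decreasing")
    TRUE
  }
)

x <- new("TruncSketch", n = 3L, truncation = c(0, Inf), sample = c(0.2, 1.5, 3.1))
x@truncation
```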

Improve readability of rtrunc signatures

The current help file of rtrunc reads:

Usage:

 rtrunc(n, size, prob, alpha, beta, mulog, sigmalog, mu, sigma, lambda, a, b)
 
 ## S4 method for signature 'numeric,missing,missing,missing,missing'
 rtrunc(n, size, prob, a, b)
 
 ## S4 method for signature 'missing,numeric,missing,missing,missing'
 rtrunc(n, alpha, beta, a = 0, b = Inf)
 
 ## S4 method for signature 'missing,missing,numeric,missing,missing'
 rtrunc(n, mulog, sigmalog, a, b)
 
 ## S4 method for signature 'missing,missing,missing,numeric,missing'
 rtrunc(n, mu, sigma, a, b)
 
 ## S4 method for signature 'missing,missing,missing,missing,numeric'
 rtrunc(n, lambda, a, b)

This is a bit confusing, as the signature names cannot be understood without looking at the source code. They should be replaced with something like the name of the distribution.

Moreover, the top usage line (with all the parameters) makes no sense on its own, so it should probably be removed so as not to confuse the user.

Sampling n units

The sampling functions rXXX most often do not return the full sample of n units, because draws that fall outside the truncation limits are discarded.
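If callers need exactly n values, one option is to keep drawing and filtering until enough values land inside the limits. The sketch assumes a hypothetical helper rtrunc_exact_n() that wraps any stats::r* sampler; max_iter is an assumed safeguard for limits that cover very little probability mass.

```r
# Hypothetical helper: repeat draw-and-filter until n values fall in [a, b]
rtrunc_exact_n <- function(n, rdist, a = -Inf, b = Inf, ..., max_iter = 1000) {
  out <- numeric(0)
  iter <- 0
  while (length(out) < n) {
    iter <- iter + 1
    if (iter > max_iter) stop("Too few draws fall inside [a, b]")
    y <- rdist(n, ...)                   # untruncated draws
    out <- c(out, y[y >= a & y <= b])    # keep only those inside the limits
  }
  out[seq_len(n)]                        # drop any surplus
}

set.seed(1)
x <- rtrunc_exact_n(100, rnorm, a = 0, b = 1, mean = 0, sd = 4)
length(x)
```

Rejection like this gets slow when [a, b] has low probability; inverse-CDF sampling avoids that cost where the quantile function is available.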

Export density* functions as S3 methods of dtrunc

This would require changing the class of the inputs of those functions (y and eta) so they can be properly matched by the generic density() function. Otherwise, the functions won't work as methods for density().

Create generic rtrunc function

It will call rtrunc.norm(), rtrunc.gamma(), etc. as methods. Maybe the generic and its methods should be grouped into one function instead of spread across several files.
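One way to get S3 dispatch when the first argument is a number is to turn the family name into a class and pass it as the object to UseMethod(). The sketch uses a hypothetical generic rtruncSketch() with two methods; the names, defaults, and returned classes are assumptions, not the package's design.

```r
# Hypothetical generic: the family string becomes the class used for dispatch
rtruncSketch <- function(n, family = "normal", ...) {
  fam <- structure(list(), class = family)
  UseMethod("rtruncSketch", fam)
}

# Methods receive the original arguments (n, family, ...)
rtruncSketch.normal <- function(n, family, mean = 0, sd = 1, a = -Inf, b = Inf, ...) {
  y <- rnorm(n, mean, sd)
  structure(y[y >= a & y <= b], class = "trunc_normal")
}

rtruncSketch.poisson <- function(n, family, lambda, a = 0, b = Inf, ...) {
  y <- rpois(n, lambda)
  structure(y[y >= a & y <= b], class = "trunc_poisson")
}

rtruncSketch.default <- function(n, family, ...) {
  stop("Unsupported family: ", family)
}

set.seed(3)
rtruncSketch(10, "poisson", lambda = 3, a = 2)
```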

rcontbernoulli

The sampling functions implemented in 'rtrunc.R' rely on random sampling from the untruncated distribution followed by truncation. For the continuous Bernoulli, no sampling function was available in base R, so one has been implemented at the top of the rtrunc.R file. It needs proper adaptation.
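Since the continuous Bernoulli CDF has a closed form, inverse-transform sampling is one option. This swaps in a different technique from the sample-then-truncate approach described above, and as a side effect it returns exactly n draws. With density proportional to lambda^x (1 - lambda)^(1 - x) on [0, 1], the CDF is F(x) = (lambda^x (1 - lambda)^(1 - x) - (1 - lambda)) / (2 lambda - 1) for lambda != 1/2, and F(x) = x at lambda = 1/2. The function names below are hypothetical.

```r
# Hypothetical helpers for the continuous Bernoulli(lambda), 0 < lambda < 1
pcontbern <- function(q, lambda) {
  q <- pmin(pmax(q, 0), 1)                  # CDF is 0 below 0 and 1 above 1
  if (abs(lambda - 0.5) < 1e-10) return(q)  # uniform special case
  (lambda^q * (1 - lambda)^(1 - q) - (1 - lambda)) / (2 * lambda - 1)
}

qcontbern <- function(p, lambda) {
  if (abs(lambda - 0.5) < 1e-10) return(p)
  # Closed-form solution of F(x) = p
  log((p * (2 * lambda - 1) + 1 - lambda) / (1 - lambda)) / log(lambda / (1 - lambda))
}

# Truncated sampler: draw u uniformly between F(a) and F(b), then invert F
rcontbern <- function(n, lambda, a = 0, b = 1) {
  u <- runif(n, min = pcontbern(a, lambda), max = pcontbern(b, lambda))
  qcontbern(u, lambda)
}

set.seed(4)
rcontbern(5, lambda = 0.3, a = 0.2, b = 0.7)
```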

Reorganize family name validation in rtrunc

Problem

Using the rtrunc() aliases skips rtrunc() and, therefore, domain validation.

Possible solutions

  • add validateDomain methods to each rtrunc method
  • transform family name validation into validateFamily() (like is done for domain)
  • Replace the valid_distros vector with the valid_fam_parm list on validateFamilyParms()
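A sketch of the third option, assuming valid_fam_parm is a named list mapping each family to its parameter names (both the structure and the subset of families shown are assumptions). The parameter check also covers cases such as passing df with family = "gaussian".

```r
# Hypothetical structure: family name -> accepted parameter names
valid_fam_parm <- list(
  gaussian = c("mean", "sd"),
  poisson  = "lambda",
  chisq    = "df",
  binomial = c("size", "prob")
)

validateFamilyParms <- function(family, parms) {
  if (!family %in% names(valid_fam_parm)) {
    stop("Unknown family '", family, "'. Valid families: ",
         paste(names(valid_fam_parm), collapse = ", "))
  }
  expected <- valid_fam_parm[[family]]
  unexpected <- setdiff(parms, expected)
  if (length(unexpected) > 0) {
    stop("Parameter(s) ", paste(unexpected, collapse = ", "),
         " not valid for family '", family, "'; expected: ",
         paste(expected, collapse = ", "))
  }
  invisible(TRUE)
}

validateFamilyParms("gaussian", c("mean", "sd"))
```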

Dependencies

This is dependent on the closure of the following issues/PRs:

Function variables missing

Some functions reference variables that don't exist in their scope. Namely (as of commit 6531470):

  • dtrunc.trunc_chisq: no visible binding for global variable ‘parm’

    From TruncExpFam/R/chisq.R, lines 24 to 38 at commit 83c7189:

    dtrunc.trunc_chisq <- function(y, eta, a = 0, b) {
      df <- natural2parameters.trunc_chisq(eta)
      dens <- ifelse((y <= a) | (y > b), 0, dchisq(y, df=df))
      if (!missing(a)) {
        F.a <- pchisq(a, parm) # FIXME: parm is not defined
      } else {
        F.a <- 0
      }
      if (!missing(b)) {
        F.b <- pchisq(b, parm)
      } else {
        F.b <- 1
      }
      return(dens / (F.b - F.a))
    }

  • dtrunc.trunc_exp: no visible binding for global variable ‘parm’

    dtrunc.trunc_exp <- function(y, eta, a = 0, b) {
      # TODO: develop rtrunc.exp?
      rate <- natural2parameters.trunc_exp(eta)
      dens <- ifelse((y <= a) | (y > b), 0, dexp(y, rate=rate))
      if (!missing(a)) {
        F.a <- pexp(a, rate)
      } else {
        F.a <- 0
      }
      if (!missing(b)) {
        F.b <- pexp(b, parm) # FIXME: parm is not defined
      } else {
        F.b <- 1
      }
      return(dens / (F.b - F.a))
    }

  • dtrunc.trunc_nbinom: no visible global function definition for ‘my.pbinom’

    dtrunc.trunc_nbinom <- function(y, eta, a = 0, b, ...) {
      # TODO: develop rtrunc.nbinom
      my.dnbinom <- function(nsize) {
        dnbinom(y, size = nsize, prob = proba)
      }
      my.pnbinom <- function(z, nsize) {
        pnbinom(z, size = nsize, prob = proba)
      }
      proba <- exp(eta)
      dens <- ifelse((y < a) | (y > b), 0, my.dnbinom(...))
      if (!missing(a)) {
        F.a <- my.pbinom(a - 1, ...)
      } else {
        F.a <- 0
      }
      if (!missing(b)) {
        F.b <- my.pbinom(b, ...)
      } else {
        F.b <- 1
      }
      return(dens / (F.b - F.a))
    }

  • get.grad.E.T.inv.trunc_nbinom: no visible binding for global variable ‘r’

    get.grad.E.T.inv.trunc_nbinom <- function(eta) {
      # eta: Natural parameter
      # return the inverse of E.T differentiated with respect to eta
      p=exp(eta)
      return(A = (1-p)^2/(r*p)) # FIXME: r not defined
    }

  • get.y.seq.trunc_invgauss: no visible binding for global variable ‘sd’

    get.y.seq.trunc_invgauss <- function(y, y.min, y.max, n = 100) {
      mean <- mean(y, na.rm = T)
      shape <- var(y, na.rm = T)^0.5
      lo <- max(max(0,y.min), mean - 3.5 * sd) # FIXME: sd not defined
      hi <- min(y.max, mean + 3.5 * sd)
      return(seq(lo, hi, length = n))
    }

These variables must be either internally calculated or passed as arguments.
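A sketch of the fixes for the three density functions: each reuses the parameter it already computes in place of the undefined parm, and the negative binomial version calls pnbinom() instead of my.pbinom(). To keep the example standalone, these hypothetical *_sketch versions take the parameters directly instead of eta. Defaulting a = 0 and b = Inf also makes the missing() branches unnecessary, since each CDF is 0 at the lower end of the support and 1 at Inf.

```r
# Hypothetical standalone versions; parameters are passed instead of eta
dtrunc_chisq_sketch <- function(y, df, a = 0, b = Inf) {
  dens <- ifelse(y <= a | y > b, 0, dchisq(y, df = df))
  dens / (pchisq(b, df) - pchisq(a, df))  # `df` replaces the undefined `parm`
}

dtrunc_exp_sketch <- function(y, rate, a = 0, b = Inf) {
  dens <- ifelse(y <= a | y > b, 0, dexp(y, rate = rate))
  dens / (pexp(b, rate) - pexp(a, rate))  # `rate` replaces the undefined `parm`
}

dtrunc_nbinom_sketch <- function(y, size, prob, a = 0, b = Inf) {
  dens <- ifelse(y < a | y > b, 0, dnbinom(y, size = size, prob = prob))
  # pnbinom() instead of the undefined my.pbinom(); a - 1 keeps y = a in the support
  F.a <- pnbinom(a - 1, size = size, prob = prob)
  F.b <- pnbinom(b, size = size, prob = prob)
  dens / (F.b - F.a)
}
```

The remaining two cases likely need the same treatment: r should become an argument of get.grad.E.T.inv.trunc_nbinom(), and sd in get.y.seq.trunc_invgauss() presumably refers to the shape value computed just above it.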

Add family=gaussian as an argument to rtrunc

Part of the user experience should be the explicit specification of a family when calling rtrunc(). For compatibility with glm(), the argument should default to gaussian, though.

Extra, related task:

  • Check consistency between the parameters passed and the family (e.g. rtrunc(n=10, df=4, family=gaussian) should yield an error and explain why)

Add probability and quantile functions

After we've implemented the r* and d* functions, work on p* and q*. Both would probably be numerically determined (though analytical solutions are of course preferred).

Maybe one function can do the job for all distributions, since the procedure is similar (see the sketch below)?

prtrunclnorm <- function(q, meanlog = 0, sdlog = 1, lower.tail = TRUE, log.p = FALSE, ...) {
  y <- rtrunc.lognormal(n = 1e3, ...)
  # bootstrap y
  # get value of y_0 of y corresponding to q
  # return probability from -Inf to y_0
}

and q* is basically the other way around.

  • Implement ptrunc*
  • Implement qtrunc*
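For continuous distributions, the analytical route may be simpler than bootstrapping: if F is the untruncated CDF, the truncated CDF on (a, b] is (F(q) - F(a)) / (F(b) - F(a)), and the quantile function inverts that. The sketch assumes hypothetical helpers ptrunc_sketch() and qtrunc_sketch() that take the stats p*/q* functions as arguments, so one pair serves every family with a closed-form CDF.

```r
# Hypothetical helpers: `pdist`/`qdist` are untruncated CDF/quantile functions
# (e.g. plnorm, qlnorm) and `...` carries their parameters
ptrunc_sketch <- function(q, pdist, a = -Inf, b = Inf, ..., lower.tail = TRUE) {
  Fa <- pdist(a, ...)
  Fb <- pdist(b, ...)
  qq <- pmin(pmax(q, a), b)  # values outside (a, b] map to 0 or 1
  p <- (pdist(qq, ...) - Fa) / (Fb - Fa)
  if (lower.tail) p else 1 - p
}

# Inverse of the above: rescale p onto [F(a), F(b)] and invert F
qtrunc_sketch <- function(p, qdist, pdist, a = -Inf, b = Inf, ...) {
  Fa <- pdist(a, ...)
  Fb <- pdist(b, ...)
  qdist(Fa + p * (Fb - Fa), ...)
}

p <- ptrunc_sketch(2, plnorm, a = 1, b = 5, meanlog = 0, sdlog = 1)
qtrunc_sketch(p, qlnorm, plnorm, a = 1, b = 5, meanlog = 0, sdlog = 1)
```

Discrete families would need F(a - 1) in place of F(a) so that a itself stays in the support.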

Documentation for rtrunc should group arguments by distro

The contents of ?rtrunc (source code here) lists all possible arguments together. This makes it hard for a user to know how to use the function, since the proper "pair" of arguments must be used (e.g. mu with sigma and not with lambda).

Ideally, the arguments should be grouped inside the documentation, but I don't know if Roxygen allows for this. In any case, there should be details about this in the @details section.

rtrunc signatures are too long

The rtrunc function uses signatures with so many arguments that the generated documentation files have names too long for an R package. This is the error from devtools::check():

E checking for portable file names
Found the following non-portable file paths:
TruncExpFam/man/rtrunc-numeric-numeric-ANY-missing-missing-missing-missing-missing-missing-missing-missing-numeric-method.Rd
TruncExpFam/man/rtrunc-numeric-numeric-ANY-missing-missing-missing-missing-missing-missing-numeric-numeric-missing-method.Rd
TruncExpFam/man/rtrunc-numeric-numeric-ANY-missing-missing-missing-missing-numeric-numeric-missing-missing-missing-method.Rd
TruncExpFam/man/rtrunc-numeric-numeric-ANY-missing-missing-numeric-numeric-missing-missing-missing-missing-missing-method.Rd
TruncExpFam/man/rtrunc-numeric-numeric-numeric-numeric-numeric-missing-missing-missing-missing-missing-missing-missing-method.Rd

Tarballs are only required to store paths of up to 100 bytes and cannot
store those of more than 256 bytes, with restrictions including to 100
bytes for the final component.
See section ‘Package structure’ in the ‘Writing R Extensions’ manual.
OK

The methods of rtrunc must be adapted to recognize simpler signatures, not only so that the package passes the check cleanly, but also to make debugging easier (it's not easy to tell which distribution "rtrunc-numeric-numeric-numeric-numeric-numeric-missing-missing-missing-missing-missing-missing-missing" refers to).

Naming of functions

Use "." rather than "_" in function names: xxx.trunc_normal -> xxx.trunc.normal.
Also avoid using "trunc" twice, as in "dtrunc.trunc_normal"; it should be "dtrunc.normal".
The same applies to all distributions.
I'm reluctant to do this, as I'm uncertain about the repercussions for other parts of the code.

Include more distributions

Add methods for these distributions:

  • Negative binomial
  • Exponential
  • Laplace
  • Inverse gaussian
  • Generalized inv gaussian
  • Beta

This depends on several generic functions being in place. In other words, the following issues must be closed first to avoid rework:

Make aliases for rtrunc methods

The idea is for a user to be able to call either one of those:

rtrunc(100, family="poisson", lambda=3)
rtruncpois(100, lambda=3)

So all rtrunc methods would have the same name as their untruncated counterparts in stats, but with the word "trunc" between "r" and the distribution name.

This can either be achieved by creating wrapper functions or aliases (preferred).
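The wrapper route can be generated rather than written by hand. The sketch uses a toy stand-in rtrunc_toy() (hypothetical, only there so the example runs) and a hypothetical factory make_rtrunc_alias() that fixes family and forwards everything else.

```r
# Toy stand-in for rtrunc(), only so the example runs (hypothetical)
rtrunc_toy <- function(n, family = "gaussian", a = -Inf, b = Inf, ...) {
  sampler <- switch(family,
    gaussian = rnorm,
    poisson  = rpois,
    stop("Unsupported family: ", family)
  )
  y <- sampler(n, ...)
  y[y >= a & y <= b]
}

# Factory: returns a function with `family` fixed, forwarding all other arguments
make_rtrunc_alias <- function(family) {
  force(family)
  function(n, ...) rtrunc_toy(n, family = family, ...)
}

rtruncpois <- make_rtrunc_alias("poisson")

set.seed(5); x1 <- rtrunc_toy(100, family = "poisson", lambda = 3)
set.seed(5); x2 <- rtruncpois(100, lambda = 3)
identical(x1, x2)
```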

Truncation limits are not working on rtrunc

Summary

I guess I messed something up when writing the rtrunc() generic and now it's ignoring truncation limits. Should be easy to fix, though.

MRE

set.seed(10); rtrunc(n=10, mean=0, sd=4, a=0)

Observed result

[1]  0.07498468 -0.73701017 -5.48532220 -2.39667086  1.17818051  1.55917720 -4.83230470 -1.45470407 -6.50669073 -1.02591358
attr(,"class")
[1] "trunc_normal"

Expected result

No negative values.

Regroup unit tests

Unit tests are grouped by distribution, and it's a bit messy and hard to navigate (and it's not even 100 lines long!). I bet it would look much better with the following structure:

context("Sampling with rtrunc")
context("Matching output of stats::r*")
context("ML estimation")

Missing rinvnbinom function

Hi @rho62,

The line y <- rinvnbinom(n, size, prob, mu) in the function below references an rinvnbinom() function, which is not defined in the package, and I couldn't find it elsewhere in the R-verse.

rtrunc.nbinom <- function(n, size, prob, mu, a,b=Inf) {
  y <- rinvnbinom(n, size, prob, mu)
  if (!missing(a)) {
    y <- y[y >= a]
  }
  if (!missing(b)) {
    y <- y[y <= b]
  }
  class(y) <- "trunc_nbinom"
  return(y)
}

Should we code this function or import it from another package?
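stats::rnbinom() already takes size together with either prob or mu, so it may be what was meant. Here is a sketch under that assumption (the name rtrunc_nbinom_sketch() and the defaults a = 0 and b = Inf are hypothetical). Since rnbinom() rejects calls that specify both prob and mu, only one of them is forwarded.

```r
# Hypothetical version built on stats::rnbinom()
rtrunc_nbinom_sketch <- function(n, size, prob, mu, a = 0, b = Inf) {
  y <- if (missing(mu)) {
    rnbinom(n, size = size, prob = prob)
  } else {
    rnbinom(n, size = size, mu = mu)  # mu takes precedence if both are given
  }
  y <- y[y >= a & y <= b]
  class(y) <- "trunc_nbinom"
  y
}

set.seed(6)
rtrunc_nbinom_sketch(10, size = 5, prob = 0.5, a = 2, b = 8)
```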

Add short vignette

Add vignette showing package usage. Here's a quick draft:

x <- rtrunc(...)
dtrunc(x)
ml.estimation
  • Use an ml.estimation... example as a base
  • Add some text

Inconsistent class names

On some parts of the code, the rtrunc classes begin with trunc_ whereas in others they don't. This should be standardized ASAP to avoid confusion and code fragility.

ML estimation not working for Binomial

MRE

sample.binom <- rtrunc(n=1000, prob=0.6, size=20, a=4, b=10, family="binomial") 
ml_binom <- ml.estimation.trunc.dist(
	sample.binom, y.min = 4, max.it = 500, delta = 0.33, family = "Binomial", nsize = 10
)

Observed output

Error in y/... : invalid unary operator

Expected output

An estimation of p (plus, intermediate output for each iteration)
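The error message points at the expression y/..., which suggests ... is being used directly as an operand rather than having nsize extracted from it. A sketch of the extraction, using a hypothetical naive_binom_p() whose estimate mean(y) / nsize ignores truncation and only serves to show the argument passing:

```r
# Hypothetical helper: read a named value out of `...` before using it
naive_binom_p <- function(y, ...) {
  dots <- list(...)
  nsize <- dots$nsize
  if (is.null(nsize)) stop("`nsize` must be supplied for the binomial family")
  # Untruncated moment estimate, NOT the truncated ML estimate
  mean(y) / nsize
}

naive_binom_p(c(2, 4), nsize = 10)
```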

Remove redundant code

In rtrunc.normal:
.....
if (!missing(b)) {
y <- y[y <= b]
} else {
b <- Inf
}
class(y) <- "trunc_normal"
return(y)

I guess "else { b <- Inf }" can be deleted?

The same issue applies to other distributions (gamma, among others).
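With defaults a = -Inf and b = Inf, the limit checks can be folded into one filter that is a no-op when no limits are given, so neither missing() nor the else branch is needed. A sketch with a hypothetical rtrunc_normal_sketch() (signature assumed):

```r
# Hypothetical simplified sampler: no branches needed with infinite defaults
rtrunc_normal_sketch <- function(n, mean = 0, sd = 1, a = -Inf, b = Inf) {
  y <- rnorm(n, mean, sd)
  y <- y[y >= a & y <= b]
  class(y) <- "trunc_normal"
  y
}

set.seed(2)
x <- rtrunc_normal_sketch(10)
set.seed(2)
identical(unclass(x), rnorm(10))
```

Because nothing is discarded when no limits are set, the draws stay identical to rnorm() under the same seed.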

Add unit tests comparing rtrunc with their stats:: counterparts

The following behavior should remain applicable as we further develop the package:

r$> set.seed(2); rbinom(n=100, size=10, prob=.4)
  [1] 3 5 4 3 6 6 2 5 4 4 4 3 5 3 4 6 7 3 4 2 5 4 6 2 3 4 2 3 7 2 1 2 5 6 4 4 6 3 5 2 7 3 2 2
 [45] 6 5 7 3 4 5 1 1 5 6 3 5 5 8 4 5 5 6 4 3 6 4 4 4 3 2 3 3 1 3 3 5 3 6 4 4 3 5 1 4 3 6 7 3
 [89] 5 3 7 4 3 4 4 3 4 2 2 4

r$> set.seed(2); rtrunc(n=100, size=10, prob=.4, a=0, b=Inf)@sample
  [1] 3 5 4 3 6 6 2 5 4 4 4 3 5 3 4 6 7 3 4 2 5 4 6 2 3 4 2 3 7 2 1 2 5 6 4 4 6 3 5 2 7 3 2 2
 [45] 6 5 7 3 4 5 1 1 5 6 3 5 5 8 4 5 5 6 4 3 6 4 4 4 3 2 3 3 1 3 3 5 3 6 4 4 3 5 1 4 3 6 7 3
 [89] 5 3 7 4 3 4 4 3 4 2 2 4

r$> identical({set.seed(2); rbinom(n=100, size=10, prob=.4)}, {set.seed(2); rtrunc(n=100, size=10, prob=.4, a=0, b=Inf)@sample})
[1] TRUE

For the generated values, at least. The vector generated by rtrunc will probably not literally be the same as it will have a different class.
