
gburtini / probability-distributions-for-php


PHP implementation of statistical probability distributions: normal distribution, beta distribution, gamma distribution and more.

License: MIT License

PHP 100.00%
statistics statistical-distributions probability-distributions normal-distribution binomial-distribution bernoulli-distribution beta-distribution gamma-distribution t-distribution dirichlet-distribution

probability-distributions-for-php's Introduction

Hi there 👋

There's an assortment of personal hobby projects here, but due to intellectual property rights and non-disclosure agreements, most of my work takes place in private repositories. Little here should be assumed to be production-ready, but I'm happy to chat if you need a hand applying anything you find here.

⚡ Selected Projects

A gently curated list of projects that might have some more general public interest value.

📫 Contact

Email me: [email protected].

If you would like to send me encrypted messages, you can find my GPG key on the MIT keyserver. The correct key has fingerprint 4D09 3BE8 C030 9D5D.

probability-distributions-for-php's People

Contributors

gaswelder, gburtini, giftonian, gustawdaniel, mossadal, nefski, peter279k, splitbrain


probability-distributions-for-php's Issues

PHP 7: all instances of mt_rand() should be changed to random_int()

According to the PHP 7 documentation, PHP 7 introduces a new function, random_int(), that is more appropriate for cryptographic use and produces more 'truly' random numbers.

The new documentation says:

Generates cryptographic random integers that are suitable for use where unbiased results are critical, such as when shuffling a deck of cards for a poker game.

Accordingly, the documentation for mt_rand() was updated with the 7.0 release to say:

Caution: This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using random_int(), random_bytes(), or openssl_random_pseudo_bytes() instead.

PHP.net docs - http://php.net/manual/en/function.random-int.php
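A minimal sketch of the swap, assuming the library turns mt_rand() output into uniform draws on [0, 1) somewhere internally (the issue does not show that code). The helper names uniformFloat() and uniformFloatMt() are hypothetical, and the 2^53 constant assumes a 64-bit PHP build.

```php
<?php
// Hypothetical helper (not part of the library): a uniform float in [0, 1)
// built from random_int(), which the PHP 7 manual documents as suitable
// for cryptographic use.
function uniformFloat(): float
{
    // 2^53 equally spaced values: each result is exactly representable as a
    // double, so the division can never round up to 1.0.
    $steps = 1 << 53; // assumes a 64-bit PHP build
    return random_int(0, $steps - 1) / (float) $steps;
}

// mt_rand()-based equivalent for comparison: fast and seedable with
// mt_srand(), but documented as not cryptographically secure.
function uniformFloatMt(): float
{
    return mt_rand() / (mt_getrandmax() + 1.0);
}
```

One trade-off worth weighing before replacing every call: random_int() cannot be seeded, so any simulation test that relies on mt_srand() for reproducible draws would lose that property.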

Add versioning to the package

Hi, would it be possible to add versioning to the Composer package so that we can lock our code to a specific version and be sure that the code we depend on does not change between separate composer install runs?

Interface of the normal distribution: a note about parameter conventions

In this package, the Normal distribution takes two parameters: mean and variance. Many mathematical programs instead use the mean and the standard deviation. Examples:

  • Matlab

https://www.mathworks.com/help/stats/prob.normaldistribution.html

Parameters: mu, sigma

  • Mathematica

https://reference.wolfram.com/language/ref/NormalDistribution.html

Parameters: [μ, σ]

  • R

https://www.tutorialspoint.com/r/r_normal_distribution.htm

Parameters: [x, mean, sd]


What should we do? Changing this would be a breaking change, so I propose not changing anything in this package now, but instead:

  • change the Normal interface from (mean, variance) to (mean, sigma) in the next major (breaking) release and note it in the migration documentation;
  • describe the current convention more visibly now. In my opinion (mean, sigma) is the default convention for a Normal distribution, so the difference should be highlighted in bold.
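A minimal sketch of the conversion a caller needs today, assuming (as the issue states) that this package's Normal takes (mean, variance). The helper names are hypothetical, and the commented-out constructor call assumes the class lives in the same gburtini\Distributions namespace as Beta. The two density functions show that both parameterisations describe the same distribution once sigma is squared.

```php
<?php
// Hypothetical helper: convert the (mean, sigma) convention used by Matlab,
// Mathematica and R into the variance this package expects.
function sigmaToVariance(float $sigma): float
{
    if ($sigma <= 0.0) {
        throw new InvalidArgumentException('sigma must be positive');
    }
    return $sigma * $sigma;
}

// Normal density parameterised by the variance.
function normalPdfFromVariance(float $x, float $mean, float $variance): float
{
    return exp(-(($x - $mean) ** 2) / (2.0 * $variance)) / sqrt(2.0 * M_PI * $variance);
}

// The same density parameterised by the standard deviation.
function normalPdfFromSigma(float $x, float $mean, float $sigma): float
{
    $z = ($x - $mean) / $sigma;
    return exp(-0.5 * $z * $z) / ($sigma * sqrt(2.0 * M_PI));
}

$sigma = 2.0;
$variance = sigmaToVariance($sigma);
// With the library installed one would pass $variance, not $sigma, e.g.
// new gburtini\Distributions\Normal(0.0, $variance);
```

Passing sigma where a variance is expected silently produces a distribution with the wrong spread, which is why the convention is worth highlighting.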

Parameters in Gamma distribution

The code for the Gamma distribution is very incomplete: the class basically only contains code for random number generation from a Gamma distribution.

I implemented the pdf, cdf and icdf, along with unit tests, and noticed that the parameters are named $shape and $rate, which would seem to correspond to alpha and beta in Wikipedia's description of the Gamma distribution.

Running the unit tests on Gamma::draw gives a sample of points that seem to be Gamma(alpha, 1/beta) distributed, suggesting that $rate is really $scale.

What is your preference? Should we rename the parameters to $shape and $scale, or keep $shape and $rate and correct Gamma::draw?
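A minimal sketch of the diagnostic described above, with hypothetical function names. Under shape/rate the mean is alpha/beta; under shape/scale it is k*theta. Comparing a sample mean against both tells you how the second parameter is being interpreted. The reference sampler here handles only integer shapes (a sum of exponentials) and is a stand-in, not the library's Gamma::draw.

```php
<?php
// Gamma mean under the two common parameterisations.
function gammaMeanShapeRate(float $shape, float $rate): float
{
    return $shape / $rate;   // alpha / beta
}

function gammaMeanShapeScale(float $shape, float $scale): float
{
    return $shape * $scale;  // k * theta
}

// Reference sampler for integer shape k: the sum of k independent
// exponentials with mean $scale is Gamma(k, scale = $scale).
function gammaDrawIntegerShape(int $k, float $scale): float
{
    $sum = 0.0;
    for ($i = 0; $i < $k; $i++) {
        $u = mt_rand() / (mt_getrandmax() + 1.0); // uniform in [0, 1)
        $sum -= $scale * log(1.0 - $u);           // 1 - $u is in (0, 1]
    }
    return $sum;
}

// Hypothetical diagnostic: given draws from a sampler called with
// ($shape, $param), report whether the sample mean matches $param read
// as a rate or as a scale.
function classifyGammaParam(array $draws, float $shape, float $param): string
{
    $mean = array_sum($draws) / count($draws);
    $errRate = abs($mean - gammaMeanShapeRate($shape, $param));
    $errScale = abs($mean - gammaMeanShapeScale($shape, $param));
    return $errRate < $errScale ? 'rate' : 'scale';
}
```

To check the library, one would collect draws from Gamma::draw with, say, $shape = 3 and $rate = 2 and pass them to classifyGammaParam(); a result of 'scale' reproduces the symptom. The check is uninformative when the parameter equals 1, since both means then coincide.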

Abstract simulation testing strategy.

When testing whether an implementation fits the expected distribution, we should be able to abstract simulation code like the following to check tail probabilities at a cutoff and sample means. We could also check that sample variances and similar statistics fall within a bound.

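// $d is the distribution under test; the messages assume P(7.0), with mean 7.0.
$scale = 50000;  // number of draws
$cutoff = 10.0;
$counter = 0;    // number of draws strictly above $cutoff
$draws = new SplFixedArray($scale);
for ($i = 0; $i < $scale; $i++) {
    $x = $d->rand();
    $draws[$i] = $x;
    if ($x > $cutoff) {
        $counter++;
    }
}

// The sample mean should be close to the theoretical mean. Assuming P(7.0) is a
// Poisson with variance 7, the standard error is sqrt(7 / 50000), about 0.012,
// so a delta of 0.05 is roughly 4 standard errors.
$number = array_sum($draws->toArray()) / $scale;
$this->assertEqualsWithDelta(7.0, $number, 0.05, "Attempting to draw from P(7.0) {$scale} times gives us a value too far from the expected mean. This could be just random chance.");

// The empirical tail frequency should match 1 - cdf($cutoff).
$p = $counter / $scale;
$this->assertEqualsWithDelta(1 - $d->cdf($cutoff), $p, 0.01, "Attempting to draw from P(7.0) {$scale} times gives the wrong number of values greater than {$cutoff}. This could be just random chance.");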

A single reusable implementation of this kind of test would be very valuable for this project, although speed (driven by $scale) is a concern.
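A minimal sketch of such a reusable helper, with hypothetical names (simulate, meanTolerance, proportionTolerance). It keeps running statistics with Welford's algorithm instead of storing every draw, and derives tolerances from standard errors rather than a fixed 0.01, so that $scale can be lowered for speed without making the tests flaky.

```php
<?php
// Hypothetical reusable helper: draw $n samples from $draw and return
// summary statistics that a simulation test can compare with theory.
// Running statistics (Welford's algorithm) avoid storing every draw.
function simulate(callable $draw, int $n, float $cutoff): array
{
    $mean = 0.0;
    $m2 = 0.0;    // sum of squared deviations from the running mean
    $above = 0;   // draws strictly greater than $cutoff
    for ($i = 1; $i <= $n; $i++) {
        $x = $draw();
        $delta = $x - $mean;
        $mean += $delta / $i;
        $m2 += $delta * ($x - $mean);
        if ($x > $cutoff) {
            $above++;
        }
    }
    return [
        'mean' => $mean,
        'variance' => $m2 / ($n - 1),     // unbiased sample variance
        'tail' => (float) $above / $n,    // empirical P(X > cutoff)
    ];
}

// Tolerance of $k standard errors for a sample mean over $n draws,
// given the distribution's theoretical variance.
function meanTolerance(float $variance, int $n, float $k = 4.0): float
{
    return $k * sqrt($variance / $n);
}

// Tolerance of $k standard errors for an empirical proportion estimating $p.
function proportionTolerance(float $p, int $n, float $k = 4.0): float
{
    return $k * sqrt($p * (1.0 - $p) / $n);
}
```

Inside a PHPUnit test this might be used as `$stats = simulate(fn() => $d->rand(), $scale, $cutoff);` followed by `$this->assertEqualsWithDelta(1 - $d->cdf($cutoff), $stats['tail'], proportionTolerance(1 - $d->cdf($cutoff), $scale));`.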

Add parameter estimators for statistical distributions

Add parameter estimators for statistical distributions, and also average log-likelihood functions: given a set of observations, what is their average log-likelihood under the distribution's parameters?

Then, for distributions without closed-form MLE estimators (such as the beta), one can apply Newton's method to the average log-likelihood, perhaps using the method of moments to obtain the starting point.
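A minimal sketch of that idea for the beta distribution; the function names are hypothetical. For Beta(α, β) the gradient of the average log-likelihood is ψ(α+β) − ψ(α) + mean(ln x) and ψ(α+β) − ψ(β) + mean(ln(1−x)), and the Hessian uses the trigamma function ψ₁. Digamma and trigamma are computed by upward recurrence plus their asymptotic series. Evaluating the log-likelihood value itself would also need ln B(α, β), i.e. a log-gamma function, which PHP core lacks; the Newton iteration only needs the derivatives.

```php
<?php
// Digamma psi(x), x > 0: recurrence psi(x) = psi(x + 1) - 1/x, then asymptotic series.
function digamma(float $x): float
{
    $r = 0.0;
    while ($x < 6.0) {
        $r -= 1.0 / $x;
        $x += 1.0;
    }
    $f = 1.0 / ($x * $x);
    return $r + log($x) - 0.5 / $x - $f * (1.0 / 12 - $f * (1.0 / 120 - $f / 252));
}

// Trigamma psi1(x), x > 0: recurrence psi1(x) = psi1(x + 1) + 1/x^2, then asymptotic series.
function trigamma(float $x): float
{
    $r = 0.0;
    while ($x < 6.0) {
        $r += 1.0 / ($x * $x);
        $x += 1.0;
    }
    $f = 1.0 / ($x * $x);
    return $r + 1.0 / $x + $f / 2.0
        + $f / $x * (1.0 / 6 - $f * (1.0 / 30 - $f * (1.0 / 42 - $f / 30)));
}

// MLE of Beta(alpha, beta) by Newton's method on the average log-likelihood,
// started from the method-of-moments estimate. $xs must lie strictly in (0, 1).
function fitBeta(array $xs, int $maxIter = 50, float $tol = 1e-10): array
{
    $n = count($xs);
    $mean = array_sum($xs) / $n;
    $var = 0.0;
    $meanLogX = 0.0;
    $meanLog1mX = 0.0;
    foreach ($xs as $x) {
        $var += ($x - $mean) ** 2 / $n;
        $meanLogX += log($x) / $n;
        $meanLog1mX += log(1.0 - $x) / $n;
    }
    if ($var <= 0.0 || $var >= $mean * (1.0 - $mean)) {
        throw new InvalidArgumentException('method of moments needs 0 < var < mean(1 - mean)');
    }
    // Method-of-moments starting point.
    $common = $mean * (1.0 - $mean) / $var - 1.0;
    $a = $mean * $common;
    $b = (1.0 - $mean) * $common;

    for ($i = 0; $i < $maxIter; $i++) {
        // Gradient of the average log-likelihood.
        $g1 = digamma($a + $b) - digamma($a) + $meanLogX;
        $g2 = digamma($a + $b) - digamma($b) + $meanLog1mX;
        // Hessian (negative definite for the beta family).
        $h12 = trigamma($a + $b);
        $h11 = $h12 - trigamma($a);
        $h22 = $h12 - trigamma($b);
        $det = $h11 * $h22 - $h12 * $h12;
        // Newton step (a, b) -= H^{-1} g, with the 2x2 inverse written out.
        $da = ($h22 * $g1 - $h12 * $g2) / $det;
        $db = ($h11 * $g2 - $h12 * $g1) / $det;
        $a -= $da;
        $b -= $db;
        if ($a <= 0.0 || $b <= 0.0) {
            throw new RuntimeException('Newton step left the parameter space');
        }
        if (abs($da) + abs($db) < $tol) {
            break;
        }
    }
    return [$a, $b];
}
```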

binomial icdf returns value > 1.0

The binomial cdf produces a value greater than 1.0 when evaluated at N (the number of trials). If this value is passed to the icdf function, an error occurs.

Example:

 $bin1 = new Binomial(10,0.2);
 $cdf1=$bin1->cdf(10);
 $xxDo=$bin1->icdf($cdf1);
 print " For bin(10,0.2) : cdf(10) = $cdf1. Inverse of this= $xxDo";

yields an error message:

 ( ! ) InvalidArgumentException: Parameter ($p = 1.0000000000000011 must  be between 0 and 1. in
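The issue does not show the library's Binomial::cdf source, so the mechanism here is an assumption: if the cdf sums pmf terms in floating point, the total can overshoot 1.0 by a few ulps, as in the 1.0000000000000011 above. A minimal standalone sketch (hypothetical function names) returns exactly 1.0 over the whole support and clamps partial sums.

```php
<?php
// Binomial(n, p) probability mass at k, with the coefficient C(n, k)
// built up multiplicatively (adequate for moderate n such as 10).
function binomialPmf(int $k, int $n, float $p): float
{
    if ($k < 0 || $k > $n) {
        return 0.0;
    }
    $coef = 1.0;
    for ($i = 1; $i <= $k; $i++) {
        $coef = $coef * ($n - $k + $i) / $i;
    }
    return $coef * $p ** $k * (1.0 - $p) ** ($n - $k);
}

// Hypothetical cdf that cannot exceed 1, so its output is always a valid
// argument for an icdf that rejects values outside [0, 1].
function binomialCdf(int $k, int $n, float $p): float
{
    if ($k < 0) {
        return 0.0;
    }
    if ($k >= $n) {
        return 1.0; // the whole support: exactly 1 by definition
    }
    $sum = 0.0;
    for ($i = 0; $i <= $k; $i++) {
        $sum += binomialPmf($i, $n, $p);
    }
    return min(1.0, $sum); // clamp residual rounding error
}
```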

Code organization for new distributions?

I intend to contribute a couple of additional distributions; Weibull, chi² and lognormal are highest on my list.

Do you still think the new code should support PHP versions before 5.3, via the ugly folder containing the actual implementation, or should I put the new code directly in the namespaced folder?
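A minimal sketch of what a directly namespaced Weibull file might look like, e.g. a hypothetical src/gburtini/Distributions/Weibull.php mirroring the path in the error message below. It is standalone: it does not extend the library's Distribution base class, whose constructor and method signatures are not shown here. The formulas are the standard Weibull(shape k, scale λ) pdf, cdf and inverse cdf, with inverse-transform sampling for rand().

```php
<?php
namespace gburtini\Distributions;

// Hypothetical standalone Weibull class with shape k and scale lambda.
class Weibull
{
    private float $shape;
    private float $scale;

    public function __construct(float $shape, float $scale)
    {
        if ($shape <= 0.0 || $scale <= 0.0) {
            throw new \InvalidArgumentException('shape and scale must be positive');
        }
        $this->shape = $shape;
        $this->scale = $scale;
    }

    // f(x) = (k / lambda) (x / lambda)^(k - 1) exp(-(x / lambda)^k) for x >= 0.
    public function pdf(float $x): float
    {
        if ($x < 0.0) {
            return 0.0;
        }
        $z = $x / $this->scale;
        return ($this->shape / $this->scale) * $z ** ($this->shape - 1.0) * exp(-($z ** $this->shape));
    }

    // F(x) = 1 - exp(-(x / lambda)^k) for x >= 0.
    public function cdf(float $x): float
    {
        return $x <= 0.0 ? 0.0 : 1.0 - exp(-(($x / $this->scale) ** $this->shape));
    }

    // F^{-1}(p) = lambda (-ln(1 - p))^(1 / k); log1p keeps small p accurate.
    public function icdf(float $p): float
    {
        if ($p < 0.0 || $p >= 1.0) {
            throw new \InvalidArgumentException('p must be in [0, 1)');
        }
        return $this->scale * (-log1p(-$p)) ** (1.0 / $this->shape);
    }

    // Inverse-transform sampling from a uniform draw on [0, 1).
    public function rand(): float
    {
        return $this->icdf(mt_rand() / (mt_getrandmax() + 1.0));
    }
}
```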

BadMethodCallException: PDF not implemented. Please create a pull request if you implement it yourself. in /Library/WebServer/Documents/playground-raj/vendor/gburtini/distributions/src/gburtini/Distributions/Distribution.php on line 10


My code:

<?php
    use gburtini\Distributions\Beta;

    require __DIR__ . '/vendor/autoload.php';

    $beta = new Beta(81, 219);
    $draw = $beta->rand();
    echo $draw;

    // This call throws the BadMethodCallException above.
    $beta->pdf(0.9);
    // $beta->cdf($x) = [0,1] non-decreasing
    // $beta::quantile($y in [0,1]) = [0,1] (aliased Beta::icdf)
    // $beta->rand() = [0,1]
?>
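Until a pdf is implemented, a minimal standalone sketch of the Beta density could look like this; lnGamma() and betaPdf() are hypothetical names. PHP core has no log-gamma function, so the sketch uses the standard Lanczos approximation (g = 7, nine coefficients) and evaluates the density in log space, which keeps large parameters such as Beta(81, 219) from overflowing.

```php
<?php
// Natural log of the Gamma function via the Lanczos approximation
// (g = 7, 9 coefficients), using the reflection formula for x < 0.5.
function lnGamma(float $x): float
{
    static $c = [
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
    ];
    if ($x < 0.5) {
        // Reflection: Gamma(x) Gamma(1 - x) = pi / sin(pi x).
        return log(M_PI / abs(sin(M_PI * $x))) - lnGamma(1.0 - $x);
    }
    $x -= 1.0;
    $a = $c[0];
    for ($i = 1; $i < 9; $i++) {
        $a += $c[$i] / ($x + $i);
    }
    $t = $x + 7.5; // x + g + 0.5
    return 0.5 * log(2.0 * M_PI) + ($x + 0.5) * log($t) - $t + log($a);
}

// Beta(alpha, beta) density, evaluated in log space.
function betaPdf(float $x, float $alpha, float $beta): float
{
    if ($x <= 0.0 || $x >= 1.0) {
        // Endpoints are treated as 0 for simplicity; the true density there
        // can be positive or infinite when alpha <= 1 or beta <= 1.
        return 0.0;
    }
    $logB = lnGamma($alpha) + lnGamma($beta) - lnGamma($alpha + $beta);
    return exp(($alpha - 1.0) * log($x) + ($beta - 1.0) * log(1.0 - $x) - $logB);
}
```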

Tag a composer release.

The current state of dev-master includes considerable changes that I have not had time to try out. I'm reluctant to mark it as a new release until someone has confirmed that the documentation, test coverage and actual behavior are all aligned.

Resolve licensing choices.

This library is missing a clear license. I want to clear any license choice with @mossadal and @graemedouglas, since they made significant contributions to the library while it had no clear license. GPL or BSD/MIT should be considered.

The Cephes library, from which some of the code is derived, is itself released under a very ambiguous license. This in some sense compromises our ability to relicense this library. https://lists.debian.org/debian-legal/2004/12/msg00295.html

Student's T distribution buggy

I worked through the next distribution on my list and found several issues with the implementation of StudentT.

The most serious is that T::icdf returns values that are way off. For example:

$T = new T(6);
echo $T->icdf(0.2);

gives -4.8989794855663567 instead of the expected -0.90570325863942802.
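One way to get an independent reference for T::icdf, sketched under the assumption that even degrees of freedom are enough for the tests: for even ν the t cdf has a closed-form finite series, F(t) = ½[1 + sin θ (1 + c/2 + (1·3)/(2·4) c² + ...)], with θ = arctan(t/√ν), c = cos²θ, and ν/2 terms in total. Bisection on that monotone cdf gives the inverse without needing the incomplete beta function. Both function names are hypothetical.

```php
<?php
// Student's t cdf for an EVEN number of degrees of freedom, using the
// closed-form finite series with th = atan(t / sqrt(nu)), c = cos^2(th).
function studentTCdfEvenDf(float $t, int $nu): float
{
    if ($nu < 2 || $nu % 2 !== 0) {
        throw new InvalidArgumentException('this sketch only handles even degrees of freedom');
    }
    $c = $nu / ($nu + $t * $t);       // cos^2(th)
    $sin = $t / sqrt($nu + $t * $t);  // sin(th), carries the sign of t
    $term = 1.0;
    $sum = 1.0;
    for ($j = 1; $j < $nu / 2; $j++) {
        $term *= $c * (2 * $j - 1) / (2 * $j);
        $sum += $term;
    }
    return 0.5 * (1.0 + $sin * $sum);
}

// Inverse cdf by bisection on the monotone cdf above.
function studentTIcdfEvenDf(float $p, int $nu, float $tol = 1e-12): float
{
    if ($p <= 0.0 || $p >= 1.0) {
        throw new InvalidArgumentException('p must be strictly between 0 and 1');
    }
    $lo = -1.0;
    $hi = 1.0;
    while (studentTCdfEvenDf($lo, $nu) > $p) {
        $lo *= 2.0; // widen the bracket to the left
    }
    while (studentTCdfEvenDf($hi, $nu) < $p) {
        $hi *= 2.0; // widen the bracket to the right
    }
    while ($hi - $lo > $tol) {
        $mid = 0.5 * ($lo + $hi);
        if (studentTCdfEvenDf($mid, $nu) < $p) {
            $lo = $mid;
        } else {
            $hi = $mid;
        }
    }
    return 0.5 * ($lo + $hi);
}
```

For ν = 6, studentTIcdfEvenDf(0.2, 6) should agree with the expected -0.9057 quoted above, which makes it usable as a test oracle for the even-ν cases.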

Also, T::pdf doesn't reliably give an accuracy better than about 1e-5, at least not for moderately large inputs. For example:

$T = new T(6);
echo $T->pdf(-5);

returns 0.0012208409808473011 instead of 0.00122617088037926.

I'll try to look into this today, but the likely culprit is the implementation of the beta, incomplete beta and inverse incomplete beta functions, and they don't look very appealing to dig through ;)

If I don't find a solution, I'll at least give you a PR with some tests.
