
stats's Introduction

Stats - Golang Statistics Package

A well tested and comprehensive Golang statistics library / package / module with no dependencies.

If you have any suggestions, problems, or bug reports, please create an issue and I'll do my best to accommodate you. In addition, simply starring the repo would show your support for the project and would be very much appreciated!

Installation

go get github.com/montanaflynn/stats

Example Usage

All the functions can be seen in examples/main.go, but here's a little taste:

// start with some source data to use
data := []float64{1.0, 2.1, 3.2, 4.823, 4.1, 5.8}

// you could also use different types like this
// data := stats.LoadRawData([]int{1, 2, 3, 4, 5})
// data := stats.LoadRawData([]interface{}{1.1, "2", 3})
// etc...

median, _ := stats.Median(data)
fmt.Println(median) // 3.65

roundedMedian, _ := stats.Round(median, 0)
fmt.Println(roundedMedian) // 4

Documentation

The entire API documentation is available on GoDoc.org or pkg.go.dev.

You can also view docs offline with the following commands:

# Command line
godoc .              # show all exported APIs
godoc . Median       # show a single function
godoc -ex . Round    # show function with example
godoc . Float64Data  # show the type and methods

# Local website
godoc -http=:4444    # start the godoc server on port 4444
open http://localhost:4444/pkg/github.com/montanaflynn/stats/

The exported API is as follows:

var (
    ErrEmptyInput = statsError{"Input must not be empty."}
    ErrNaN        = statsError{"Not a number."}
    ErrNegative   = statsError{"Must not contain negative values."}
    ErrZero       = statsError{"Must not contain zero values."}
    ErrBounds     = statsError{"Input is outside of range."}
    ErrSize       = statsError{"Must be the same length."}
    ErrInfValue   = statsError{"Value is infinite."}
    ErrYCoord     = statsError{"Y Value must be greater than zero."}
)

func Round(input float64, places int) (rounded float64, err error) {}

type Float64Data []float64

func LoadRawData(raw interface{}) (f Float64Data) {}

func AutoCorrelation(data Float64Data, lags int) (float64, error) {}
func ChebyshevDistance(dataPointX, dataPointY Float64Data) (distance float64, err error) {}
func Correlation(data1, data2 Float64Data) (float64, error) {}
func Covariance(data1, data2 Float64Data) (float64, error) {}
func CovariancePopulation(data1, data2 Float64Data) (float64, error) {}
func CumulativeSum(input Float64Data) ([]float64, error) {}
func Describe(input Float64Data, allowNaN bool, percentiles *[]float64) (*Description, error) {}
func DescribePercentileFunc(input Float64Data, allowNaN bool, percentiles *[]float64, percentileFunc func(Float64Data, float64) (float64, error)) (*Description, error) {}
func Entropy(input Float64Data) (float64, error) {}
func EuclideanDistance(dataPointX, dataPointY Float64Data) (distance float64, err error) {}
func GeometricMean(input Float64Data) (float64, error) {}
func HarmonicMean(input Float64Data) (float64, error) {}
func InterQuartileRange(input Float64Data) (float64, error) {}
func ManhattanDistance(dataPointX, dataPointY Float64Data) (distance float64, err error) {}
func Max(input Float64Data) (max float64, err error) {}
func Mean(input Float64Data) (float64, error) {}
func Median(input Float64Data) (median float64, err error) {}
func MedianAbsoluteDeviation(input Float64Data) (mad float64, err error) {}
func MedianAbsoluteDeviationPopulation(input Float64Data) (mad float64, err error) {}
func Midhinge(input Float64Data) (float64, error) {}
func Min(input Float64Data) (min float64, err error) {}
func MinkowskiDistance(dataPointX, dataPointY Float64Data, lambda float64) (distance float64, err error) {}
func Mode(input Float64Data) (mode []float64, err error) {}
func NormBoxMullerRvs(loc float64, scale float64, size int) []float64 {}
func NormCdf(x float64, loc float64, scale float64) float64 {}
func NormEntropy(loc float64, scale float64) float64 {}
func NormFit(data []float64) [2]float64 {}
func NormInterval(alpha float64, loc float64, scale float64) [2]float64 {}
func NormIsf(p float64, loc float64, scale float64) (x float64) {}
func NormLogCdf(x float64, loc float64, scale float64) float64 {}
func NormLogPdf(x float64, loc float64, scale float64) float64 {}
func NormLogSf(x float64, loc float64, scale float64) float64 {}
func NormMean(loc float64, scale float64) float64 {}
func NormMedian(loc float64, scale float64) float64 {}
func NormMoment(n int, loc float64, scale float64) float64 {}
func NormPdf(x float64, loc float64, scale float64) float64 {}
func NormPpf(p float64, loc float64, scale float64) (x float64) {}
func NormPpfRvs(loc float64, scale float64, size int) []float64 {}
func NormSf(x float64, loc float64, scale float64) float64 {}
func NormStats(loc float64, scale float64, moments string) []float64 {}
func NormStd(loc float64, scale float64) float64 {}
func NormVar(loc float64, scale float64) float64 {}
func Pearson(data1, data2 Float64Data) (float64, error) {}
func Percentile(input Float64Data, percent float64) (percentile float64, err error) {}
func PercentileNearestRank(input Float64Data, percent float64) (percentile float64, err error) {}
func PopulationVariance(input Float64Data) (pvar float64, err error) {}
func Sample(input Float64Data, takenum int, replacement bool) ([]float64, error) {}
func SampleVariance(input Float64Data) (svar float64, err error) {}
func Sigmoid(input Float64Data) ([]float64, error) {}
func SoftMax(input Float64Data) ([]float64, error) {}
func StableSample(input Float64Data, takenum int) ([]float64, error) {}
func StandardDeviation(input Float64Data) (sdev float64, err error) {}
func StandardDeviationPopulation(input Float64Data) (sdev float64, err error) {}
func StandardDeviationSample(input Float64Data) (sdev float64, err error) {}
func StdDevP(input Float64Data) (sdev float64, err error) {}
func StdDevS(input Float64Data) (sdev float64, err error) {}
func Sum(input Float64Data) (sum float64, err error) {}
func Trimean(input Float64Data) (float64, error) {}
func VarP(input Float64Data) (sdev float64, err error) {}
func VarS(input Float64Data) (sdev float64, err error) {}
func Variance(input Float64Data) (sdev float64, err error) {}
func ProbGeom(a int, b int, p float64) (prob float64, err error) {}
func ExpGeom(p float64) (exp float64, err error) {}
func VarGeom(p float64) (exp float64, err error) {}

type Coordinate struct {
    X, Y float64
}

type Series []Coordinate

func ExponentialRegression(s Series) (regressions Series, err error) {}
func LinearRegression(s Series) (regressions Series, err error) {}
func LogarithmicRegression(s Series) (regressions Series, err error) {}

type Outliers struct {
    Mild    Float64Data
    Extreme Float64Data
}

type Quartiles struct {
    Q1 float64
    Q2 float64
    Q3 float64
}

func Quartile(input Float64Data) (Quartiles, error) {}
func QuartileOutliers(input Float64Data) (Outliers, error) {}
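
To tie a few of these together, here's a minimal sketch based only on the signatures listed above (it is not taken from examples/main.go, and the exact values returned depend on the package's quartile and regression implementations):

data := stats.LoadRawData([]int{2, 4, 4, 4, 5, 5, 7, 9})

quartiles, _ := stats.Quartile(data)
fmt.Println(quartiles.Q1, quartiles.Q2, quartiles.Q3)

outliers, _ := stats.QuartileOutliers(data)
fmt.Println(outliers.Mild, outliers.Extreme)

series := stats.Series{{1, 2.3}, {2, 3.1}, {3, 4.8}, {4, 5.9}}
regressions, _ := stats.LinearRegression(series)
fmt.Println(regressions)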

Contributing

Pull requests are always welcome, no matter how big or small. I've included a Makefile that has a lot of helper targets for common actions such as linting, testing, code coverage reporting, and more.

  1. Fork the repo and clone your fork
  2. Create new branch (git checkout -b some-thing)
  3. Make the desired changes
  4. Ensure tests pass (go test -cover or make test)
  5. Run lint and fix problems (go vet . or make lint)
  6. Commit changes (git commit -am 'Did something')
  7. Push branch (git push origin some-thing)
  8. Submit pull request

To make things as seamless as possible please also consider the following steps:

  • Update examples/main.go with a simple example of the new feature
  • Update README.md documentation section with any new exported API
  • Keep 100% code coverage (you can check with make coverage)
  • Squash commits into single units of work with git rebase -i new-feature

Releasing

This is not required of contributors and is mostly here as a reminder to myself as the maintainer of this repo. To release a new version we should update the CHANGELOG.md and DOCUMENTATION.md.

First install the tools used to generate the markdown files and release:

go install github.com/davecheney/godoc2md@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
brew tap git-chglog/git-chglog
brew install gnu-sed hub git-chglog

Then you can run these make directives:

# Generate DOCUMENTATION.md
make docs

Then we can create a CHANGELOG.md, a new git tag, and a GitHub release:

make release TAG=v0.x.x

To authenticate hub for the release you will need to create a personal access token and use it as the password when it's requested.

MIT License

Copyright (c) 2014-2023 Montana Flynn (https://montanaflynn.com)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

stats's People

Contributors

a-robinson, alixaxel, andrey-yantsen, cjongseok, edupsousa, ginodeis, gu1nness, instance01, janberktold, kazhuravlev, kjellkvinge, kreativka, kunde21, mishrashivendra, montanaflynn, nurjeff, orthographic-pedant, pararang, rvchess, samueldixxon, saromanov, shiyan2016, toashd, tzzed, vadv, xiaobogaga


stats's Issues

Trouble with trying to add Changelog and Documentation

Hello, I wanted to try and practice some of the Go programming language while contributing to an open source package. I was able to write some functions and test those functions utilizing the Makefile provided. However, I had trouble getting the packages for the changelog / documentation .md file updates:

go get github.com/davecheney/godoc2md
go get github.com/golangci/golangci-lint/cmd/golangci-lint

I ran the top command and had some errors, which I believe then caused errors when running the bottom command. I checked the top repository and it looks like it is no longer being developed.

I've never contributed in this way and don't have experience in making the edits to the markdown files, but am trying to learn more of all things git and software.

NaN when running exponential regression

[{0 2.5} {1 5} {4 25} {6 5} {8 5} {11 15} {12 2.5} {14 25} {15 0} {16 1.6666666666666667} {17 40} {20 5} {21 15} {22 20} {23 16.666666666666668} {24 13.333333333333334} {25 50} {26 18} {27 75} {28 21} {29 0} {30 5} {31 37.5} {32 5} {34 40} {36 5} {37 39}]

When running exponential regression with the series above. The result is:

[{0 NaN} {1 NaN} {4 NaN} {6 NaN} {8 NaN} {11 NaN} {12 NaN} {14 NaN} {15 NaN} {16 NaN} {17 NaN} {20 NaN} {21 NaN} {22 NaN} {23 NaN} {24 NaN} {25 NaN} {26 NaN} {27 NaN} {28 NaN} {29 NaN} {30 NaN} {31 NaN} {32 NaN} {34 NaN} {36 NaN} {37 NaN}]

Is this desired? If so, could you maybe elaborate on why this happens?
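
An exponential fit works on the logarithm of the Y values, so the zero Y coordinates in the series above ({15 0} and {29 0}) are a likely culprit, since log(0) is undefined. Here is a hedged sketch of a pre-filter; it is not part of the package, and it assumes the series above is held in a stats.Series value named series:

filtered := stats.Series{}
for _, c := range series {
	// drop points whose Y value can't be log-transformed
	if c.Y > 0 {
		filtered = append(filtered, c)
	}
}
if regressions, err := stats.ExponentialRegression(filtered); err == nil {
	fmt.Println(regressions)
}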

support string in LoadRawData()

Thank you for this great package.
I added support for a string and io.Reader in LoadRawData() so it supports whitespace-separated strings.

i.e.

stats.LoadRawData("1.1 2 3.0 4 5")

// or
stats.LoadRawData(os.Stdin)

Is this something you would consider implementing in your package? If so, I can create a pull request.
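
For reference, a rough sketch of how the string case might be parsed; this is a hypothetical helper, not part of the package, and it silently skips tokens that don't parse as floats:

// loadFromString is a hypothetical illustration of the proposal
func loadFromString(s string) stats.Float64Data {
	var f stats.Float64Data
	for _, field := range strings.Fields(s) {
		if v, err := strconv.ParseFloat(field, 64); err == nil {
			f = append(f, v)
		}
	}
	return f
}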


How to calculate the weighted percentile?

See: https://en.wikipedia.org/wiki/Percentile#The_weighted_percentile_method

My current code is as follows, but I don't know how to support the weighted percentile.

func TestPercentile(t *testing.T) {
	values := []float64{4, 5, 3, 1, 2}
	percentiles := []float64{}
	for i := 1; i <= 100; i++ {
		percentile, err := stats.PercentileNearestRank(values, float64(i))
		if err != nil {
			panic(err)
		}
		percentiles = append(percentiles, percentile)
		fmt.Printf("%d%%: %f, ", i, percentile)
	}
	println()
	println()

	for f := 0.0; f <= 5; f += 0.1 {
		index := sort.SearchFloat64s(percentiles, f + 0.00000001)
		fmt.Printf("%f: %d%%, ", f, index)
	}
	println()
	println()
}
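
For what it's worth, here is a rough, untested sketch of the weighted percentile method described in the linked article (value/weight pairs, cumulative weights, then linear interpolation); none of this is part of the package:

type weighted struct {
	value, weight float64
}

func weightedPercentile(data []weighted, percent float64) float64 {
	sort.Slice(data, func(i, j int) bool { return data[i].value < data[j].value })

	var total float64
	for _, d := range data {
		total += d.weight
	}

	// percentile rank of each point: p_n = 100/S_N * (S_n - w_n/2)
	ranks := make([]float64, len(data))
	var cum float64
	for i, d := range data {
		cum += d.weight
		ranks[i] = 100 / total * (cum - d.weight/2)
	}

	// clamp outside the covered range, otherwise interpolate linearly
	if percent <= ranks[0] {
		return data[0].value
	}
	if percent >= ranks[len(ranks)-1] {
		return data[len(data)-1].value
	}
	for i := 1; i < len(ranks); i++ {
		if percent <= ranks[i] {
			t := (percent - ranks[i-1]) / (ranks[i] - ranks[i-1])
			return data[i-1].value + t*(data[i].value-data[i-1].value)
		}
	}
	return data[len(data)-1].value
}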

Using an interface to support []float64 and []int

I have a feeling it might be possible to use an interface to support both []float64 and []int data. However I've not designed Public interfaces or worked around the lack of generics myself so I'll either have to do some research and hacking or have the excellent community of Gophers help in this area or tell me my attempts will be futile. Either way, any feedback is appreciated!

A few tests fail in different architectures due to precision errors

When running the test suite on s390x and ppc64le architectures, I get the following output:

go test -compiler gc -ldflags '' github.com/montanaflynn/stats
--- FAIL: TestCorrelation (0.00s)
	correlation_test.go:33: Correlation 0.9912407071619304 != 0.9912407071619302
	correlation_test.go:47: Correlation 0.9912407071619304 != 0.9912407071619302
--- FAIL: TestOtherDataMethods (0.00s)
	data_test.go:22: github.com/montanaflynn/stats.(Float64Data).Correlation-fm() => 0.2087547359760545 != 0.20875473597605448
	data_test.go:22: github.com/montanaflynn/stats.(Float64Data).Pearson-fm() => 0.2087547359760545 != 0.20875473597605448
	data_test.go:22: github.com/montanaflynn/stats.(Float64Data).Covariance-fm() => 7.381421553571428 != 7.3814215535714265
	data_test.go:22: github.com/montanaflynn/stats.(Float64Data).CovariancePopulation-fm() => 6.458743859374999 != 6.458743859374998
--- FAIL: TestLinearRegression (0.00s)
	regression_test.go:19: [{1 2.380000000000002} {2 3.080000000000001} {3 3.7800000000000002} {4 4.4799999999999995} {5 5.179999999999999}] != 2.3800000000000026
	regression_test.go:23: [{1 2.380000000000002} {2 3.080000000000001} {3 3.7800000000000002} {4 4.4799999999999995} {5 5.179999999999999}] != 3.0800000000000014
	regression_test.go:31: [{1 2.380000000000002} {2 3.080000000000001} {3 3.7800000000000002} {4 4.4799999999999995} {5 5.179999999999999}] != 4.479999999999999
	regression_test.go:35: [{1 2.380000000000002} {2 3.080000000000001} {3 3.7800000000000002} {4 4.4799999999999995} {5 5.179999999999999}] != 5.179999999999998
--- FAIL: TestLogarithmicRegression (0.00s)
	regression_test.go:94: [{1 2.152082236381168} {2 3.330555922249221} {3 4.019918836568675} {4 4.509029608117274} {5 4.8884133966836645}] != 2.1520822363811702
	regression_test.go:98: [{1 2.152082236381168} {2 3.330555922249221} {3 4.019918836568675} {4 4.509029608117274} {5 4.8884133966836645}] != 3.3305559222492214
	regression_test.go:102: [{1 2.152082236381168} {2 3.330555922249221} {3 4.019918836568675} {4 4.509029608117274} {5 4.8884133966836645}] != 4.019918836568674
	regression_test.go:106: [{1 2.152082236381168} {2 3.330555922249221} {3 4.019918836568675} {4 4.509029608117274} {5 4.8884133966836645}] != 4.509029608117273
	regression_test.go:110: [{1 2.152082236381168} {2 3.330555922249221} {3 4.019918836568675} {4 4.509029608117274} {5 4.8884133966836645}] != 4.888413396683663
FAIL
FAIL	github.com/montanaflynn/stats	0.003s

I also opened a similar bug report for x/image with a bug that seems to be related to this one: golang/go#21460

In addition, the tests also fail for i686 architectures, with a different output:

go test -compiler gc -ldflags '' github.com/montanaflynn/stats
--- FAIL: TestLogarithmicRegression (0.00s)
	regression_test.go:94: [{1 2.1520822363811654} {2 3.3305559222492205} {3 4.019918836568676} {4 4.509029608117276} {5 4.888413396683665}] != 2.1520822363811702
	regression_test.go:98: [{1 2.1520822363811654} {2 3.3305559222492205} {3 4.019918836568676} {4 4.509029608117276} {5 4.888413396683665}] != 3.3305559222492214
	regression_test.go:102: [{1 2.1520822363811654} {2 3.3305559222492205} {3 4.019918836568676} {4 4.509029608117276} {5 4.888413396683665}] != 4.019918836568674
	regression_test.go:106: [{1 2.1520822363811654} {2 3.3305559222492205} {3 4.019918836568676} {4 4.509029608117276} {5 4.888413396683665}] != 4.509029608117273
	regression_test.go:110: [{1 2.1520822363811654} {2 3.3305559222492205} {3 4.019918836568676} {4 4.509029608117276} {5 4.888413396683665}] != 4.888413396683663
FAIL
FAIL	github.com/montanaflynn/stats	0.005s

Note that this does not seem to be related to the issue mentioned above for x/image.
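
One common mitigation for this class of failure (offered as a suggestion, not a description of how the repo's tests currently work) is to compare floating-point results within a small tolerance rather than with exact equality:

func almostEqual(a, b, epsilon float64) bool {
	return math.Abs(a-b) <= epsilon
}

// e.g. in a test:
// if !almostEqual(got, want, 1e-12) {
// 	t.Errorf("got %v, want %v", got, want)
// }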

Go module version 0.5.0

The source code from go get github.com/montanaflynn/stats on my project with Go modules is different from the source you have in your GitHub repo. It shows me stats@v0.5.0, but the changelog for v0.5.0 isn't in the code I have received. Here is the go.sum:
github.com/montanaflynn/stats v0.5.0 h1:2EkzeTSqBB4V4bJwWrt5gIIrZmpJBcoIRGS2kWLgzmk= github.com/montanaflynn/stats v0.5.0/go.mod h1:wL8QJuTMNUDYhXwkmfOly8iTdp5TEcJFWZD2D7SIkUc=

Mode is not calculated correctly

Hi,
First of all, congratulations and thanks for your work.

Using the function to calculate the mode, I've found that when the mode is a low value compared with the rest and the data array is relatively long, an incorrect result occurs:

Example:

var data = []float64{1, 2, 3, 4, 4, 4, 4, 4, 5, 3, 6, 7, 5, 0, 8, 8, 7, 6, 9, 9}
mode, _ := stats.Mode(data)
fmt.Printf("%v\n", mode)

// Result: [4 8]

As we can see, the result is incorrect: the function should return [4].

After analyzing the results and studying the function code, I've located the problem, and this is the fix:

File: https://github.com/montanaflynn/stats/blob/master/mode.go

package stats

// Mode gets the mode [most frequent value(s)] of a slice of float64s
func Mode(input Float64Data) (mode []float64, err error) {
    // Return the input if there's only one number
    l := input.Len()
    if l == 1 {
        return input, nil
    } else if l == 0 {
        return nil, EmptyInput
    }

    c := sortedCopyDif(input)
    // Traverse sorted array,
    // tracking the longest repeating sequence
    mode = make([]float64, 5)
    cnt, maxCnt := 1, 1
    for i := 1; i < l; i++ {
        switch {
        case c[i] == c[i-1]:
            cnt++
        case cnt == maxCnt && maxCnt != 1:
            mode = append(mode, c[i-1])
            cnt = 1
        case cnt > maxCnt:
            mode = append(mode[:0], c[i-1])
            maxCnt, cnt = cnt, 1
        // :: the fix - reset the counter ::
        default:
            cnt = 1
        // :: end fix ::
        }
    }
    switch {
    case cnt == maxCnt:
        mode = append(mode, c[l-1])
    case cnt > maxCnt:
        mode = append(mode[:0], c[l-1])
        maxCnt = cnt    
    }


    // Since length must be greater than 1,
    // check for slices of distinct values
    if maxCnt == 1 {
        return Float64Data{}, nil
    }

    return mode, nil
}

I don't know if the solution convinces you, but it works. If it looks good to you, I can make the changes and submit a pull request to correct it soon.

Adding single-pass descriptive stats for large data sets

Hi, I wondered if I could contribute by adding single-pass descriptive stats for people working with large datasets. This would simply return mean, sdev, var, min, max, and correlation: all the things you have, but for situations where Float64Data would be too big.
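
As an illustration of the idea, a minimal single-pass accumulator using Welford's online algorithm might look like the sketch below; the type and method names are hypothetical and not part of this package:

type running struct {
	n        int
	mean, m2 float64
	min, max float64
}

func (r *running) Add(x float64) {
	r.n++
	if r.n == 1 {
		r.min, r.max = x, x
	} else {
		r.min = math.Min(r.min, x)
		r.max = math.Max(r.max, x)
	}
	delta := x - r.mean
	r.mean += delta / float64(r.n)
	r.m2 += delta * (x - r.mean)
}

// SampleVariance returns the running sample variance (0 with fewer than 2 values).
func (r *running) SampleVariance() float64 {
	if r.n < 2 {
		return 0
	}
	return r.m2 / float64(r.n-1)
}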

[enhancement] Error message on percentile calc is slightly misleading

Using the library, I am getting error messages that Input is outside of range, which turned out to be because the data array had a single value (which I fully realise is not suitable for a percentile calc!). Would it be possible to change the error message to be more informative, or add a documentation item?

To reproduce:

package main

import (
	"fmt"
	"github.com/montanaflynn/stats"
)

func main()  {
	points1 := []float64{1.0,2.0,3.0}
	points2 := []float64{123.0}
	sa1 := stats.LoadRawData(points1)
	sa2 := stats.LoadRawData(points2)
	if vPerc1, err := stats.Percentile(sa1,90.0); err == nil {
		fmt.Printf("Calc'd 90th percentile for multivalue, it's %f\n",vPerc1)
	} else {
		fmt.Printf("Error calculating 90th percentile for multivalue, err: %s\n",err.Error())
	}
	if vPerc2, err := stats.Percentile(sa2,90.0); err == nil {
		fmt.Printf("Calc'd 90th percentile for single value, it's %f\n",vPerc2)
	} else {
		fmt.Printf("Error calcultating 90th percentile for single value, err: %s\n",err.Error())
	}
}

which gives output of:

Calc'd 90th percentile for multivalue, it's 2.500000
Error calculating 90th percentile for single value, err: Input is outside of range.

undefined: stats.MedianAbsoluteDeviationPopulation

Hi,

I am not able to use the MedianAbsoluteDeviationPopulation function,

If I use "go doc", I do not see all functions:
$ go doc stats

package stats // import "github.com/zizmos/ego/vendor/github.com/montanaflynn/stats"

func Correlation(data1, data2 Float64Data) (float64, error)
func Covariance(data1, data2 Float64Data) (float64, error)
func GeometricMean(input Float64Data) (float64, error)
func HarmonicMean(input Float64Data) (float64, error)
func InterQuartileRange(input Float64Data) (float64, error)
func Max(input Float64Data) (max float64, err error)
func Mean(input Float64Data) (float64, error)
func Median(input Float64Data) (median float64, err error)
func Midhinge(input Float64Data) (float64, error)
func Min(input Float64Data) (min float64, err error)
func Mode(input Float64Data) (mode []float64, err error)
func Percentile(input Float64Data, percent float64) (percentile float64, err error)
func PercentileNearestRank(input Float64Data, percent float64) (percentile float64, err error)
func PopulationVariance(input Float64Data) (pvar float64, err error)
func Round(input float64, places int) (rounded float64, err error)
func Sample(input Float64Data, takenum int, replacement bool) ([]float64, error)
func SampleVariance(input Float64Data) (svar float64, err error)
func StandardDeviation(input Float64Data) (sdev float64, err error)
func StandardDeviationPopulation(input Float64Data) (sdev float64, err error)
func StandardDeviationSample(input Float64Data) (sdev float64, err error)
func StdDevP(input Float64Data) (sdev float64, err error)
func StdDevS(input Float64Data) (sdev float64, err error)
func Sum(input Float64Data) (sum float64, err error)
func Trimean(input Float64Data) (float64, error)
func VarP(input Float64Data) (sdev float64, err error)
func VarS(input Float64Data) (sdev float64, err error)
func Variance(input Float64Data) (sdev float64, err error)
type Coordinate struct{ ... }
func ExpReg(s []Coordinate) (regressions []Coordinate, err error)
func LinReg(s []Coordinate) (regressions []Coordinate, err error)
func LogReg(s []Coordinate) (regressions []Coordinate, err error)
type Float64Data []float64
type Outliers struct{ ... }
func QuartileOutliers(input Float64Data) (Outliers, error)
type Quartiles struct{ ... }
func Quartile(input Float64Data) (Quartiles, error)
type Series []Coordinate
func ExponentialRegression(s Series) (regressions Series, err error)
func LinearRegression(s Series) (regressions Series, err error)
func LogarithmicRegression(s Series) (regressions Series, err error)

However, MedianAbsoluteDeviationPopulation function is a public function in the implementation.
$go version
go version go1.10.2 darwin/amd64

$dep status
....
....
github.com/montanaflynn/stats ^0.2.0 0.2.0 eeaced0 0.2.0 1
...
...

Add mature test suites

Even though I've spent a lot of time writing tests for stats, I think it could benefit from incorporating more mature test suites from other statistics tools as well. For instance, here's a NIST test suite used by GNU GSL which could be ported to Go and added as a test for stats.

I'm sure there are other test suites as well, let me know if you have suggestions or want to help with this!

Release strategy

Since the public API isn't finalized, I've been suggesting simply cloning and vendoring into your projects, but I'd like others to be able to take advantage of tools like godep, glide, and gopkg.in to install stats into their projects.

How can we best release changes to stats? I'd like to automate it if possible; as of now I'm building the CHANGELOG.md and git tagging manually, which is slow and error-prone.

Does anyone have experience with releasing packages into the Golang ecosystem?

Public API Discussion

Would love to have some discussion on the Public API. Specifically I want to know if having types with methods in addition to the functions makes sense. I think it does but maybe having two ways to do something is confusing to some. Here's what I mean by two ways of doing things:

var data = []float64{1, 2, 3, 4, 4, 5}
median, _ := stats.Median(data)
fmt.Println(median) // 3.5

var d stats.Float64Data = data
median, _ = d.Median()
fmt.Println(median) // 3.5

Please share your thoughts with me!

[Suggestion] Calculate Quartile from the instance of Float64Data

Nice package, I am using it right now, and I found an inconsistency while calculating the quartiles. Is there any reason why we must pass the data/input to calculate Quartile? Why not use the instance? If there is no specific reason, I suggest adding a Quartiles method on Float64Data that takes no input and uses the current instance, like Mean(), Max(), etc.

Suggestion:

func (f Float64Data) Quartiles() (Quartiles, error) {
	return Quartile(f)
}

If this is possible, I will make the MR.

Edge cases with Percentiles

I believe there are some errors with the Percentiles() edge cases.

Passing 0 as the percent will cause an error, as will a small set of data and a small percentage (such that c[i-1] is out of bounds because i = 0).

I'm not sure of the best approach to fix this, as picking the index is quite critical for the correct result, but I think this might work:

index := (percent / 100) * float64(len(c) - 1)

And then use c[i] and c[i+1] later on. Using c[i+1] would be dangerous with input.Len() == 1 and maybe in the case of 99.9 percent and few values?
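
A sketch of the interpolation suggested above (assuming c is a sorted, non-empty copy of the input; this is offered as an illustration, not the package's current behaviour):

func percentileInterpolated(c []float64, percent float64) float64 {
	if len(c) == 1 {
		return c[0]
	}
	pos := (percent / 100) * float64(len(c)-1)
	i := int(math.Floor(pos))
	if i >= len(c)-1 {
		return c[len(c)-1]
	}
	frac := pos - float64(i)
	return c[i] + frac*(c[i+1]-c[i])
}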

Exported error values

Instead of the error values being generated within the functions, globally-defined errors would allow error handling in the calling function to be done without parsing the error string.

This error is generated in many places, but there's no way to compare two error values without comparing the string itself. Additionally, there's no guarantee that two "Empty input errors" have the exact same error string.

    if input.Len() == 0 {
        return 0, errors.New("Input must not be empty")
    }

If all of those errors are collected into a set of exported error values:

type StatErr struct {
      err string
}

func (s StatErr) Error() string {
        return s.err
}

var (
        EmptyArrayError = StatErr{"Empty input can't be processed"}
        ...
)

That way an error value can be identified by these exported values:

In the code:

    if input.Len() == 0 {
        return 0, EmptyArrayError
    }

In the call:

    v, err := stat.Mean(input)
    if err == stat.EmptyArrayError {
        //Handle the specific error
    }

This would clean up the error returns in the library and make error handling easier for anyone using it.

T-Test

Hi, I was thinking of implementing a t-test; should I add it as a pull request?
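
For discussion, here is a rough sketch of what a two-sample (Welch's) t statistic could look like when built on the existing Mean and SampleVariance functions; it is not part of the package and omits the degrees-of-freedom / p-value side:

func welchT(a, b stats.Float64Data) (float64, error) {
	m1, err := stats.Mean(a)
	if err != nil {
		return 0, err
	}
	m2, err := stats.Mean(b)
	if err != nil {
		return 0, err
	}
	v1, err := stats.SampleVariance(a)
	if err != nil {
		return 0, err
	}
	v2, err := stats.SampleVariance(b)
	if err != nil {
		return 0, err
	}
	// standard error of the difference in means
	se := math.Sqrt(v1/float64(len(a)) + v2/float64(len(b)))
	return (m1 - m2) / se, nil
}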

Percentile Calculation Bug?

Given a slice a := []float64{0, 300, 600}
stats.Percentile(a, 50) should return 300. However, it returns 150.

Please Provide Annotated Release Tags

This package is used by Gitea, which is a package I'm currently working to get into Debian. This means I get to do a review of all build dependencies. While working through your project, I saw that tags are used to mark releases, but they are not being annotated.

Unannotated release tags end up causing some headaches for packaging systems that monitor upstream activity--mostly for new releases--because the information is missing from 'git describe'. To annotate a tag, it just needs the -a flag passed. (git tag -a).

If you're willing to, it's possible to update the current tags (or just latest) with annotation. I've included some links [1] [2] that explain the process.

If you choose not to update tags, it would still be hugely appreciated if you could use annotated tags in the future.

[1] http://sartak.org/2011/01/replace-a-lightweight-git-tag-with-an-annotated-tag.html
[2] http://stackoverflow.com/questions/5002555/can-a-lightweight-tag-be-converted-to-an-annotated-tag

p-value in Fisher's Exact Test is different in output and print

Hi, when I run Fisher's exact test, the printed p-value is different from the result's p-value.
For example, the print shows p-value < 2.2e-16, while jk$p.value is 1.899826e-32.
Is there anything wrong in the function? Thank you.

> fm_dn
       Down    No
Yes  281   326
No  4074 13090

jk = janitor::fisher.test(fm_dn)

> jk

	Fisher's Exact Test for Count Data

data:  fm_dn
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 2.343308 3.271687
sample estimates:
odds ratio 
  2.769178 

> jk$p.value
[1] 1.899826e-32

panic: sync: negative WaitGroup counter in dbscan.go

We use your library for our open-source photo app, in particular the DBSCAN implementation. Thanks for providing it!

While it works great for me, a developer reported issues with panics in dbscan.go, line 251. Seems w.Done() may be called too often, probably depending on input data. Couldn't reproduce it with my local samples.

Our related code and GitHub issue:

Trace:

panic: sync: negative WaitGroup counter

goroutine 3326 [running]:
sync.(*WaitGroup).Add(0xc004761070, 0xffffffffffffffff)
         /usr/local/go/src/sync/waitgroup.go:74 +0x147
 sync.(*WaitGroup).Done(...)
         /usr/local/go/src/sync/waitgroup.go:99
 github.com/mpraski/clusters.(*dbscanClusterer).nearestWorker(0xc000cce3c0)
        /go/pkg/mod/github.com/mpraski/[email protected]/dbscan.go:251 +0x230
 created by github.com/mpraski/clusters.(*dbscanClusterer).startNearestWorkers
         /go/pkg/mod/github.com/mpraski/[email protected]/dbscan.go:228

Describe function

I think it could be great to have a Describe() function like pandas.describe().
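
The documentation section above now lists a Describe function; here is a minimal usage sketch assuming that signature (the fields of *Description aren't shown there, so the result is just printed generically):

percentiles := []float64{25, 50, 75}
d, err := stats.Describe(stats.Float64Data{1, 2, 3, 4, 5}, false, &percentiles)
if err == nil {
	fmt.Printf("%+v\n", d)
}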

Standard Deviation

Hi
I found something in a paper which shows we can calculate the SD of a dataset from the SDs of its subsets :)
e.g.: NEW_FUNCTION(SD({1,3}), SD(5), medians) = SD({1,3,5})
The use case is where we have already calculated the SD of 1000 rows of data and want the SD of 1000 + 1 rows without reprocessing the former data.
Is this implemented in this library? I didn't see such a thing.
If it seems appropriate, let me know and I'll submit a pull request.
Thanks
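
For reference, the usual way to merge two subsets' summary statistics uses each subset's count, mean, and sum of squared deviations (the "parallel variance" formula); the sketch below is an illustration and not currently part of this package:

type summary struct {
	n    float64
	mean float64
	m2   float64 // sum of squared deviations from the mean
}

func merge(a, b summary) summary {
	n := a.n + b.n
	delta := b.mean - a.mean
	return summary{
		n:    n,
		mean: a.mean + delta*b.n/n,
		m2:   a.m2 + b.m2 + delta*delta*a.n*b.n/n,
	}
}

// population standard deviation of the merged set: math.Sqrt(m.m2 / m.n)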

AutoCorrelation bug.

Hello Flynn. I tried to use your autocorrelation function, but found a strange thing: it returns very similar and incorrect values for some lags.
Example:
We have the following sequence:
[22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51]
In this case we must obtain the following sequence of values of the autocorrelation function:
[1, 0.83174224, 0.65632458, 0.49105012, 0.27863962, 0.03102625, -0.16527446, -0.30369928, -0.40095465, -0.45823389, -0.45047733]
For lags 0,1,2,3... respectively.
But your function returns this:
[0, 0.8317422434367543, 0.8871917263325378, 0.8908883585255901, 0.8911348006717935,...]

for i := 0; i < lags; i++ {
	v := (data[0] - mean) * (data[0] - mean)
	for i := 1; i < len(data); i++ {
		delta0 := data[i-1] - mean
		delta1 := data[i] - mean
		q += (delta0*delta1 - q) / float64(i+1)
		v += (delta1*delta1 - v) / float64(i+1)
	}
	result = q / v
}

And in your function there is this strange loop over the lag range, in which the inner loop shadows the outer loop variable and the result value is simply overwritten on every iteration.
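
For comparison, here is a sketch of the textbook estimator r_k = sum((x_i - m)(x_{i+k} - m)) / sum((x_i - m)^2), which should reproduce the expected sequence above (r_0 = 1, r_1 approximately 0.8317, ...); it is offered only as an illustration, not as the package's implementation:

func autocorrelations(data []float64, lags int) ([]float64, error) {
	mean, err := stats.Mean(data)
	if err != nil {
		return nil, err
	}
	// denominator: total squared deviation from the mean
	var denom float64
	for _, x := range data {
		denom += (x - mean) * (x - mean)
	}
	out := make([]float64, lags+1)
	for k := 0; k <= lags; k++ {
		var num float64
		for i := 0; i+k < len(data); i++ {
			num += (data[i] - mean) * (data[i+k] - mean)
		}
		out[k] = num / denom
	}
	return out, nil
}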
