multikde.jl's Introduction

MultiKDE

A kernel density estimation library. What makes it different from other Julia KDE libraries:

  1. Multidimensional: uses a product kernel to estimate multi-dimensional kernel densities.
  2. Lazy evaluation: does not pre-initialize a KDE; points are evaluated only when necessary.
  3. Categorical distributions: supports categorical KDE through two kernel functions, Wang-Ryzin and Aitchison-Aitken. The former is for ordered categorical distributions (age, amount...), the latter for unordered ones (sex, the face of a coin...). With unordered categorical distributions, non-numeric objects are also supported.
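
The three points above can be illustrated with a small self-contained sketch. It does not use MultiKDE's API: the function names (`aitchison_aitken`, `wang_ryzin`, `gaussian_kernel`, `product_kde`), the default bandwidths and the toy data are all hypothetical, and the kernel formulas are the textbook forms, assumed here rather than taken from the library's source. Nothing is precomputed; the density is evaluated only at the requested point.

```julia
# Hypothetical helper names; these are NOT MultiKDE.jl functions.

# Aitchison-Aitken kernel for unordered categories.
# c is the number of categories, λ ∈ [0, (c-1)/c] is the bandwidth.
aitchison_aitken(x, xi, λ, c) = x == xi ? 1 - λ : λ / (c - 1)

# Wang-Ryzin kernel for ordered, integer-coded categories, λ ∈ [0, 1].
wang_ryzin(x, xi, λ) = x == xi ? 1 - λ : (1 - λ) / 2 * λ^abs(x - xi)

# Gaussian kernel for a continuous dimension with bandwidth h.
gaussian_kernel(x, xi, h) = exp(-((x - xi) / h)^2 / 2) / (h * sqrt(2π))

# Product-kernel estimate at one point whose dimensions are
# (continuous, ordered categorical, unordered categorical).
# The per-observation weight is the product of the per-dimension kernels.
function product_kde(x, observations; h=0.5, λo=0.3, λu=0.2, c=2)
    total = 0.0
    for obs in observations
        total += gaussian_kernel(x[1], obs[1], h) *
                 wang_ryzin(x[2], obs[2], λo) *
                 aitchison_aitken(x[3], obs[3], λu, c)
    end
    return total / length(observations)
end

# Toy data: (measurement, age, coin face). Non-numeric values work for the
# unordered dimension because the kernel only tests equality.
obs = [(0.1, 30, "heads"), (-0.4, 31, "tails"), (1.2, 30, "heads")]
density = product_kde((0.0, 30, "heads"), obs)
```

The unordered kernel only compares values for equality, which is why strings or other non-numeric objects can be used there, while the ordered kernel needs a numeric distance between categories.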

Use

Example [notebook]

One-dimensional KDE

using MultiKDE
using Distributions, Random, Plots

# Simulation
bws = [0.05 0.1 0.5]
d = Normal(0, 1)
observations = rand(d, 50)
granularity_1d = 100
x = Vector(LinRange(minimum(observations), maximum(observations), granularity_1d))
ys = []
for bw in bws
    kde = KDEUniv(ContinuousDim(), bw, observations, MultiKDE.gaussian)
    y = [MultiKDE.pdf(kde, _x, keep_all=false) for _x in x]
    push!(ys, y)
end

# Plot
highest = maximum([maximum(y) for y in ys])
plot(x, ys, label=bws, fmt=:svg)
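# Draw the observations as a row of points just above the highest curve (one y value per observation)
plot!(observations, [highest+0.05 for _ in 1:length(observations)], seriestype=:scatter, label="observations", size=(900, 450), legend=:outertopright)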

1d KDE visualization

Multi-dimensional KDE

using MultiKDE
using Distributions, Random, Plots

# Simulation
dims = [ContinuousDim(), ContinuousDim()]
bws = [[0.3, 0.3], [0.5, 0.5], [1, 1]]
mn = MvNormal([0, 0], [1, 1])
observations = rand(mn, 50)
observations = [observations[:, i] for i in 1:size(observations, 2)]
observations_x1 = [_obs[1] for _obs in observations]
observations_x2 = [_obs[2] for _obs in observations]
granularity_2d = 100
x1_range = LinRange(minimum(observations_x1), maximum(observations_x1), granularity_2d)
x2_range = LinRange(minimum(observations_x2), maximum(observations_x2), granularity_2d)
x_grid = [[_x1, _x2] for _x1 in x1_range for _x2 in x2_range]
y_grid = []
for bw in bws
    kde = KDEMulti(dims, bw, observations)
    y = [MultiKDE.pdf(kde, _x) for _x in x_grid]
    push!(y_grid, y)
end

# Plot
highest = maximum([maximum(y) for y in y_grid])
plot([_x[1] for _x in x_grid], [_x[2] for _x in x_grid], y_grid, label=[bw[1] for bw in bws][:, :]', size=(900, 450), legend=:outertopright)
plot!(observations_x1, observations_x2, [highest for _ in 1:length(observations)], seriestype=:scatter, label="observations")

2d KDE visualization

Post

MultiKDE.jl: A Lazy Evaluation Multivariate Kernel Density Estimator

License

Licensed under the MIT License.

Contact

[email protected]

multikde.jl's People

Contributors

davide-f, github-actions[bot], pizhn

Forkers

davide-f

multikde.jl's Issues

Allow sampling from the Multivariate KDE

Is there a way to generate samples from the KDE approximation of a dataset?
It would be especially nice for implementing "kombine"-style Monte Carlo samplers.

Thank you in advance!
Francesco
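
For reference, when every dimension is continuous and uses a Gaussian kernel, sampling from a product-kernel KDE has a standard form: pick one observation uniformly at random, then add independent Gaussian noise whose standard deviation in each dimension is that dimension's bandwidth. The sketch below is self-contained and does not call MultiKDE; `sample_gaussian_kde` is a hypothetical helper, and treating the bandwidth as the Gaussian standard deviation is an assumption about the kernel's parametrization.

```julia
using Random

# Hypothetical helper, NOT part of MultiKDE.jl.
# Draws n samples from a Gaussian product-kernel KDE:
# choose a random observation, then perturb each coordinate with N(0, bw^2) noise.
function sample_gaussian_kde(rng::AbstractRNG,
                             observations::Vector{Vector{Float64}},
                             bws::Vector{Float64}, n::Integer)
    samples = Vector{Vector{Float64}}(undef, n)
    for k in 1:n
        center = observations[rand(rng, 1:length(observations))]
        samples[k] = center .+ bws .* randn(rng, length(bws))
    end
    return samples
end

rng = MersenneTwister(42)
observations = [randn(rng, 2) for _ in 1:50]   # toy 2D data
samples = sample_gaussian_kde(rng, observations, [0.3, 0.3], 1000)
```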

Multithreading issue

Hi,

I am trying to apply multithreading to your code, but it looks like the KDE code is not compatible with it.
For some reason, the threaded version even takes more time than the serial version when the number of observations is relatively large.
The grid has about 10000 points.

Here are the numbers of observations (3D coordinates) in each of the 12 sets:

for i in 1:12
    println(length(cell_coor[i]))
end

1349
1582
652
1915
2585
4256
2416
3632
2722
5782
5276
5345

Single-threaded version:

function single_KDE_(cs::Vector{Vector{Float64}}, cs_grid::Vector{Vector{Float64}},
                     dims::Vector{ContinuousDim}, bws::Vector{Float64})
    kde = KDEMulti(dims, bws, cs)
    projection = zeros(Float64, length(cs_grid))
    for i in 1:length(cs_grid)
        projection[i] = MultiKDE.pdf(kde, cs_grid[i])
    end
    return projection
end

for i in 1:12
    @time single_KDE_(cell_coor[i], wbg, dims, bws)
end

7.664427 seconds (91.29 M allocations: 3.383 GiB, 5.24% gc time, 16.62% compilation time)
7.908790 seconds (103.59 M allocations: 3.761 GiB, 4.39% gc time, 8.71% compilation time)
3.376405 seconds (43.29 M allocations: 1.582 GiB, 4.39% gc time, 11.66% compilation time)
9.388097 seconds (125.17 M allocations: 4.546 GiB, 3.92% gc time, 9.63% compilation time)
13.182955 seconds (168.77 M allocations: 6.113 GiB, 4.04% gc time, 11.63% compilation time)
19.032727 seconds (277.10 M allocations: 10.025 GiB, 4.48% gc time, 0.15% compilation time)
12.208769 seconds (157.81 M allocations: 5.708 GiB, 4.04% gc time, 11.19% compilation time)
18.432195 seconds (236.65 M allocations: 8.561 GiB, 3.51% gc time, 13.57% compilation time)
13.631460 seconds (177.65 M allocations: 6.433 GiB, 3.68% gc time, 11.73% compilation time)
25.260157 seconds (376.03 M allocations: 13.608 GiB, 3.95% gc time, 0.09% compilation time)
22.995075 seconds (343.23 M allocations: 12.423 GiB, 3.88% gc time, 0.10% compilation time)
23.216559 seconds (347.70 M allocations: 12.588 GiB, 3.81% gc time, 0.17% compilation time)

Multi-threaded version:

using Base.Threads  # provides @threads

function single_KDE_th(cs::Vector{Vector{Float64}}, cs_grid::Vector{Vector{Float64}},
                       dims::Vector{ContinuousDim}, bws::Vector{Float64})
    kde = KDEMulti(dims, bws, cs)
    projection = zeros(Float64, length(cs_grid))
    # Each iteration writes to its own index, so no locking is needed
    @threads for i in 1:length(cs_grid)
        projection[i] = MultiKDE.pdf(kde, cs_grid[i])
    end
    return projection
end

for i in 1:12
    @time single_KDE_th(cell_coor[i], wbg, dims, bws)
end

5.042971 seconds (88.41 M allocations: 3.211 GiB, 87.86% gc time)
2.400278 seconds (103.51 M allocations: 3.757 GiB, 66.58% gc time)
0.299331 seconds (43.21 M allocations: 1.578 GiB)
3.085649 seconds (125.10 M allocations: 4.542 GiB, 72.25% gc time)
16.860719 seconds (168.69 M allocations: 6.109 GiB, 88.83% gc time)
54.170797 seconds (277.01 M allocations: 10.020 GiB, 93.90% gc time)
54.794265 seconds (157.73 M allocations: 5.704 GiB, 97.01% gc time)
36.269803 seconds (236.56 M allocations: 8.556 GiB, 93.03% gc time)
51.954300 seconds (177.57 M allocations: 6.429 GiB, 96.32% gc time)
29.745716 seconds (375.94 M allocations: 13.603 GiB, 85.52% gc time)
33.432218 seconds (343.14 M allocations: 12.418 GiB, 87.29% gc time)
34.831675 seconds (347.61 M allocations: 12.584 GiB, 87.29% gc time)


I am not familiar with multithreading and am just getting started, so I don't know what is going on here.

I use an AMD 5950X (16 cores, 32 threads). I tested simple multithreading with a Mandelbrot set computation, and it works well.
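
The threaded timings above report 85-97% GC time alongside hundreds of millions of allocations per call, which suggests (but does not prove) that the slowdown comes from allocation pressure rather than from the threading itself. As a point of comparison, here is a minimal sketch of a Gaussian product-kernel evaluation written with scalar loops so that the inner work allocates no temporary arrays. It is a hypothetical stand-in, not MultiKDE's `pdf`; the names `gaussian_product_pdf` and `threaded_projection`, the bandwidths and the random data are assumptions for illustration.

```julia
using Base.Threads: @threads

# Hypothetical stand-in for MultiKDE.pdf (NOT the library's implementation):
# Gaussian product-kernel density at point x, using only scalar arithmetic.
function gaussian_product_pdf(observations::Vector{Vector{Float64}},
                              bws::Vector{Float64}, x::Vector{Float64})
    total = 0.0
    for obs in observations
        p = 1.0
        for d in eachindex(bws)
            z = (x[d] - obs[d]) / bws[d]
            p *= exp(-z^2 / 2) / (bws[d] * sqrt(2π))
        end
        total += p
    end
    return total / length(observations)
end

# Evaluate the density on every grid point; each iteration writes to its own index.
function threaded_projection(observations, bws, grid)
    projection = zeros(Float64, length(grid))
    @threads for i in eachindex(grid)
        projection[i] = gaussian_product_pdf(observations, bws, grid[i])
    end
    return projection
end

observations = [randn(3) for _ in 1:1000]   # toy 3D observations
grid = [randn(3) for _ in 1:10_000]         # toy evaluation grid
bws = [0.5, 0.5, 0.5]
projection = threaded_projection(observations, bws, grid)
```

Julia must be started with more than one thread (for example `julia -t 16`) for `@threads` to run in parallel; comparing `@time` output for this sketch against the runs above would show whether allocations are the bottleneck.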

Still maintained?

Is this package still maintained? I would like to use it but am concerned about the failing CI.

