ArviZ.jl's Issues

Support conversion of Gen traces

Gen is a PPL for compositional inference. Each draw from a Gen model is stored in a Gen.Trace. Values from the trace are extracted into a Gen.ChoiceMap. We could add a converter for either of these types (not sure which is best).

I'm just not certain if ArviZ will be that useful for the Gen approach. Gen models seem to be highly customized, and I imagine that in general the plots will be too. Plus, the choices can have a hierarchical structure, which wouldn't map directly to an InferenceData.

I'll leave this issue open for input.
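For concreteness, one converter could flatten each trace's top-level choices into NamedTuples and reuse our NamedTuple conversion. A minimal sketch (from_gen is hypothetical; it assumes one vector of Gen.Trace per chain and symbol addresses at the top level):

using Gen

function from_gen(chains::Vector{<:Vector}; kwargs...)
    tuples = map(chains) do traces
        map(traces) do trace
            choices = Gen.get_choices(trace)
            # keep only top-level (shallow) choices; hierarchical addresses
            # would need to be flattened or mapped to dims/coords somehow
            (; (Symbol(addr) => val for (addr, val) in Gen.get_values_shallow(choices))...)
        end
    end
    return convert_to_inference_data(tuples; kwargs...)
end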

DimensionMismatch("tried to assign 4 elements to 1 destinations")

The quickstart says

posterior_predictive = predict(param_mod_predict, turing_chns)

but that produces

DimensionMismatch("tried to assign 4 elements to 1 destinations")

Stacktrace:
 [1] throw_setindex_mismatch(::Array{Float64,1}, ::Tuple{Int64}) at ./indices.jl:191
 [2] setindex_shape_check at ./indices.jl:242 [inlined]
 [3] setindex! at ./array.jl:872 [inlined]
 [4] setval!(::DynamicPPL.VarInfo{NamedTuple{(:μ, :τ, :θ, :y),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:μ,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:μ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:τ,Tuple{}},Int64},Array{Truncated{Cauchy{Float64},Continuous,Float64},1},Array{DynamicPPL.VarName{:τ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::Array{Float64,1}, ::DynamicPPL.VarName{:μ,Tuple{}}) at julia/packages/DynamicPPL/MRwtL/src/varinfo.jl:282
 [5] (::Turing.Inference.var"#100#101"{DynamicPPL.SampleFromPrior,DynamicPPL.Model{var"###evaluator#271",(:J, :y, :σ, :TV),Tuple{Int64,Array{Missing,1},Array{Float64,1},Type{Array{Float64,1}}},,DynamicPPL.ModelGen{var"###generator#272",(:J, :y, :σ, :TV),(:TV,),Tuple{Type{Array{Float64,1}}}}},Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{,Tuple{}}},DynamicPPL.VarInfo{NamedTuple{(:μ, :τ, :θ, :y),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:μ,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:μ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:τ,Tuple{}},Int64},Array{Truncated{Cauchy{Float64},Continuous,Float64},1},Array{DynamicPPL.VarName{:τ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}})(::Int64) at julia/packages/Turing/GMBTf/src/inference/Inference.jl:756
 [6] iterate at ./generator.jl:47 [inlined]
 [7] _collect(::UnitRange{Int64}, ::Base.Generator{UnitRange{Int64},Turing.Inference.var"#100#101"{DynamicPPL.SampleFromPrior,DynamicPPL.Model{var"###evaluator#271",(:J, :y, :σ, :TV),Tuple{Int64,Array{Missing,1},Array{Float64,1},Type{Array{Float64,1}}},,DynamicPPL.ModelGen{var"###generator#272",(:J, :y, :σ, :TV),(:TV,),Tuple{Type{Array{Float64,1}}}}},Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{,Tuple{}}},DynamicPPL.VarInfo{NamedTuple{(:μ, :τ, :θ, :y),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:μ,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:μ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:τ,Tuple{}},Int64},Array{Truncated{Cauchy{Float64},Continuous,Float64},1},Array{DynamicPPL.VarName{:τ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./array.jl:699
 [8] collect_similar(::UnitRange{Int64}, ::Base.Generator{UnitRange{Int64},Turing.Inference.var"#100#101"{DynamicPPL.SampleFromPrior,DynamicPPL.Model{var"###evaluator#271",(:J, :y, :σ, :TV),Tuple{Int64,Array{Missing,1},Array{Float64,1},Type{Array{Float64,1}}},,DynamicPPL.ModelGen{var"###generator#272",(:J, :y, :σ, :TV),(:TV,),Tuple{Type{Array{Float64,1}}}}},Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{,Tuple{}}},DynamicPPL.VarInfo{NamedTuple{(:μ, :τ, :θ, :y),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:μ,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:μ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:τ,Tuple{}},Int64},Array{Truncated{Cauchy{Float64},Continuous,Float64},1},Array{DynamicPPL.VarName{:τ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:θ,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:y,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}) at ./array.jl:628
 [9] map(::Function, ::UnitRange{Int64}) at ./abstractarray.jl:2162
 [10] transitions_from_chain(::DynamicPPL.Model{var"###evaluator#271",(:J, :y, :σ, :TV),Tuple{Int64,Array{Missing,1},Array{Float64,1},Type{Array{Float64,1}}},,DynamicPPL.ModelGen{var"###generator#272",(:J, :y, :σ, :TV),(:TV,),Tuple{Type{Array{Float64,1}}}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{,Tuple{}}}; sampler::DynamicPPL.SampleFromPrior) at julia/packages/Turing/GMBTf/src/inference/Inference.jl:748
 [11] predict(::DynamicPPL.Model{var"###evaluator#271",(:J, :y, :σ, :TV),Tuple{Int64,Array{Missing,1},Array{Float64,1},Type{Array{Float64,1}}},,DynamicPPL.ModelGen{var"###generator#272",(:J, :y, :σ, :TV),(:TV,),Tuple{Type{Array{Float64,1}}}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{,Tuple{}}}; include_all::Bool) at julia/packages/Turing/GMBTf/src/inference/Inference.jl:674
 [12] predict(::DynamicPPL.Model{var"###evaluator#271",(:J, :y, :σ, :TV),Tuple{Int64,Array{Missing,1},Array{Float64,1},Type{Array{Float64,1}}},,DynamicPPL.ModelGen{var"###generator#272",(:J, :y, :σ, :TV),(:TV,),Tuple{Type{Array{Float64,1}}}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{,Tuple{}}}) at julia/packages/Turing/GMBTf/src/inference/Inference.jl:671
 [13] top-level scope at In[18]:5
 [14] include_string(::Function, ::Module, ::String, ::String) at /nix/store/l899vbb7z6w01xl2mkn7xzc9s8bx2msp-julia-1.5.2/lib/julia/sys.so:?
 [15] execute_code(::String, ::String) at julia/packages/IJulia/a1SNk/src/execute_request.jl:27
 [16] execute_request(::ZMQ.Socket, ::IJulia.Msg) at julia/packages/IJulia/a1SNk/src/execute_request.jl:86
 [17] #invokelatest#1 at ./essentials.jl:710 [inlined]
 [18] invokelatest at ./essentials.jl:709 [inlined]
 [19] eventloop(::ZMQ.Socket) at julia/packages/IJulia/a1SNk/src/eventloop.jl:8
 [20] (::IJulia.var"#15#18")() at ./task.jl:356

Better reproducibility wrapping Python ArviZ

One of the ways the Julia ecosystem achieves reproducibility even with binary dependencies is by defining glue packages that wrap the binaries and are tagged for specific versions. e.g. https://github.com/JuliaBinaryWrappers/wget_jll.jl. This way Julia packages can specify exact supported versions for these dependencies.

I'm considering doing something like that here until #130 is completely resolved. Specifically, we would create an ArviZ_py.jl (or PyArviZ) package that only depends on PyCall. v0.12.0 of the package would install v0.12.0 of the ArviZ Python package upon loading, and upon start-up it would check that the exact Python ArviZ version is installed and otherwise prompt the user to update (what we currently do in this package). When a new Python ArviZ release is made, a new release is tagged there, and then we only update our compat here when we fully support that version (and drop compat for versions we don't support). As an added bonus, we could perhaps ensure that on the main branch, the main branch of Python ArviZ is always installed, so testing Julia ArviZ against future releases of Python ArviZ would be as easy as

] add ArviZ, ArviZ_py#main

Since Python ArviZ releases are relatively rare, this would not be much of an additional maintenance burden, and it would let us be at least more reproducible than we currently are.
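A rough sketch of what the load-time version check in such a wrapper package might look like (all names and the exact mechanism are hypothetical):

module ArviZ_py

using PyCall

const _arviz_version = "0.12.0"  # pinned; bumped in lockstep with this package's releases
const arviz = PyNULL()

function __init__()
    copy!(arviz, pyimport("arviz"))
    if arviz.__version__ != _arviz_version
        @warn "arviz $(_arviz_version) is required but $(arviz.__version__) is installed; " *
              "please update your Python environment"
    end
end

end # module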

Hook into Plots.jl and/or Makie.jl

Currently ArviZ.jl only supports ArviZ's matplotlib backend (using PyPlot.jl) and partially supports its Bokeh backend. ArviZ.jl should hook into Plots.jl for two main reasons:

  • Better interop with Julia packages: the plot objects would be native Julia objects and could easily be modified or included in existing figures using the Plots.jl API.
  • Instant access to a multitude of backends, including the interactive PlotlyJS.jl backend, publication-quality plots with PGFPlotsX.jl, plots in the REPL with UnicodePlots.jl, and, of course, PyPlot.jl.

Makie.jl is a newer package that offers highly performant, interactive plots. Both packages use a recipe system for designing new plots. Plots.jl recipes can be consumed by Makie.jl, but the interactivity/reactivity is lost.

There are a few ways we could go about doing this.

  1. Add a new "backend" to ArviZ (the Python package) that, instead of plotting, just returns all data necessary to create a plot. We would then define recipes that always call the Python plotting function with this backend and then implement the same plot in the Plots API.
  2. Reimplement the plots from scratch, calling the necessary internal functions in the ArviZ Python package or Julia equivalents to perform the necessary analyses.

Both of these options have a significant drawback: requiring more maintenance time for this package.

The first option requires a lot of work upfront on the Python side, but very little on the Julia side. However, it will ultimately require more work to maintain on the Julia side, as changes to Python ArviZ can change the data returned and potentially break the plot for Julia. And unlike with Julia package dependencies, we cannot use version bounds to enforce that only mutually compatible versions of the Julia and Python packages are installed. This has made keeping the packages in sync challenging.

The second option requires more work upfront but will ultimately be easier to maintain. However, as new plotting options are added to ArviZ, work would be needed to support them over here, and gradually the two packages could become out-of-sync in the features they support.

For now I think the way forward is to experiment with the second option for very simple plots and see if the same component recipes can be trivially combined to construct the more elaborate plots. I'm interested in any alternative suggestions though.
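To give a feel for option 2, here is a minimal sketch of what a component recipe could look like with RecipesBase (the ArviZTrace type and its layout are illustrative, not an existing API):

using RecipesBase

struct ArviZTrace
    draws::Matrix{Float64}  # ndraws × nchains
    name::String
end

@recipe function f(t::ArviZTrace)
    legend --> false
    xguide --> "draw"
    yguide --> t.name
    t.draws  # each column (one chain) becomes its own series
end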

Additions and changes to API functions

ArviZ has added the following API functions:

  • plot_bpv
  • plot_dist_comparison

It's also renamed the following API functions:

  • plot_hpd -> plot_hdi
  • hpd -> hdi

We just need to add wrappers for the former and rename wrappers for the latter. Since the latter is a breaking change, it will require a minor version bump.
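Alternatively, to soften the break, the renamed wrappers could ship with deprecations, e.g.:

@deprecate plot_hpd(args...; kwargs...) plot_hdi(args...; kwargs...)
@deprecate hpd(args...; kwargs...) hdi(args...; kwargs...)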

Add `from_turing` and `from_soss` to simplify getting auxiliary groups

As demonstrated in the Quickstart, generating prior, prior predictive, and posterior predictive samples for Turing and Soss, as well as log-likelihood values for Turing, is mostly boilerplate code that the user can copy from the Quickstart. We could eliminate this boilerplate by adding from_turing and from_soss converters that take posterior samples as the first argument and an optional model keyword, e.g.

julia> from_turing(chains; model=param_mod, rng=Random.default_rng());

julia> from_soss(chain_or_multichain; model=param_mod, rng=Random.default_rng());

arviz.from_pymc3 takes a similar model keyword, which (I think) it only uses to compute log-likelihoods.

cc @cscherrer, @torfjelde

Increase Julia lower bound to 1.6

Julia 1.6 is now the LTS. Already, we cannot test on versions earlier than 1.5, since the SampleChains/SampleChainsDynamicHMC dependencies are only available for 1.5. Since this package is user-facing, not dev-facing, we might as well drop compatibility for pre-1.6 versions. If critical bugs arise, we can backport if necessary.

Changes for new org

ArviZ.jl is now hosted under the arviz-devs organization. 🎉

GitHub already forwards the old URL, but a few housekeeping items need to be handled.

If I'm missing anything, feel free to comment.

Add required_groups argument to convert_to_inference_data

Most ArviZ methods call convert_to_inference_data on the inputs, which allows objects returned by PPLs to be passed directly to these methods. However, there's some flexibility in what groups we might generate from the provided objects. e.g. #133 provides a converter from a DynamicPPL.Model and MCMCChains.Chains (outputs of Turing modeling), from which most groups can be generated. But a user may not want to generate all groups.

If we add a required_groups argument to convert_to_inference_data, then the user could use it to specify which groups are generated from the PPL outputs. More usefully, our methods could use it to specify which groups they need, raising a useful error if those groups cannot be generated from the inputs. So e.g. loo would call convert_to_inference_data(input; required_groups=[:log_likelihood]).

Implementing this would require more complicated wrappers for the functions we currently forward to Python, but it wouldn't be a lot of work. Then we could merge #133.
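A sketch of how the check might look in our wrapper (details illustrative; the Python-side call stands in for our existing conversion machinery):

function convert_to_inference_data(obj; required_groups=Symbol[], kwargs...)
    idata = arviz.convert_to_inference_data(obj; kwargs...)  # existing conversion path
    missing_groups = setdiff(required_groups, propertynames(idata))
    isempty(missing_groups) ||
        throw(ArgumentError("cannot generate group(s) $missing_groups from the given input"))
    return idata
end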

Removing MonteCarloMeasurements support

We added conversions from MonteCarloMeasurements.Particles to InferenceData in the early days of the package to support Soss, which used Particles as its storage type. That is no longer the case, though, and I'm not aware of any PPLs using Particles. If this functionality is unused, I'd prefer to remove it so we don't need to maintain it.

@cscherrer @baggepinnen do either of you use this functionality or know of someone who does?

Don't run CmdStan in test

Running CmdStan in the test requires us to download and compile cmdstan, which roughly doubles the runtime of the test suite. Perhaps we can check in the cmdstan output files and read them in using CmdStan's functions to get the Chains.

Support ArviZ's Bokeh backend

ArviZ plots are gradually getting a Bokeh backend in addition to the existing matplotlib backend. It would be nice if we could support this here. Specifically, we would like Bokeh plots to appear in interactive contexts instead of only being saved to an HTML file.

Bokeh.jl is pre-1.0 and discontinued, so we would need to find our own, ideally minimal, solution.
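One minimal option is to give our BokehPlot wrapper a text/html show method that embeds the plot with bokeh.embed.file_html, e.g. this sketch (the PyObject(plt) conversion is an assumption about the wrapper's internals):

using PyCall

function Base.show(io::IO, ::MIME"text/html", plt::BokehPlot)
    embed = pyimport("bokeh.embed")
    resources = pyimport("bokeh.resources")
    # file_html produces a standalone HTML snippet for the figure
    print(io, embed.file_html(PyObject(plt), resources.CDN))
end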

from_namedtuple seems to ignore coords and dims

Whether using only the dims argument or both dims and coords, from_namedtuple seems to ignore these arguments for the observed_data and constant_data groups. All other groups are converted to xarray datasets correctly.

Example with dims only

print(
    from_namedtuple(
        observed_data=(y = rand(Float64, (5, 10)),), 
        dims=Dict("y" => ["subject", "experiment"])
    ).observed_data
)

Output

Dataset (xarray.Dataset)
Dimensions:  (y_dim_0: 5, y_dim_1: 10)
Coordinates:
  * y_dim_0  (y_dim_0) int64 0 1 2 3 4
  * y_dim_1  (y_dim_1) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    y        (y_dim_0, y_dim_1) float64 0.2093 0.6207 0.3961 ... 0.955 0.4163
Attributes:
    created_at:     2020-10-22T15:10:34.734993
    arviz_version:  0.10.0

Expected output

Dataset (xarray.Dataset)
Dimensions:  (subject: 5, experiment: 10)
Coordinates:
  * subject     (subject) int64 0 1 2 3 4
  * experiment  (experiment) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    y        (subject, experiment) float64 0.2093 0.6207 0.3961 ... 0.955 0.4163
Attributes:
    created_at:     2020-10-22T15:10:34.734993
    arviz_version:  0.10.0

Example with both arguments

In this case, the coords are added to the dataset but not used, so they are not completely ignored.

print(
    from_namedtuple(
        observed_data=(y = rand(Float64, (5, 10)),), 
        dims=Dict("y" => ["subject", "experiment"]),
        coords=Dict("subject" => ["a", "b", "c", "d", "e"])
    ).observed_data
)

Output

Dataset (xarray.Dataset)
Dimensions:  (subject: 5, y_dim_0: 5, y_dim_1: 10)
Coordinates:
  * y_dim_0  (y_dim_0) int64 0 1 2 3 4
  * y_dim_1  (y_dim_1) int64 0 1 2 3 4 5 6 7 8 9
  * subject  (subject) <U1 'a' 'b' 'c' 'd' 'e'
Data variables:
    y        (y_dim_0, y_dim_1) float64 0.3847 0.0252 0.71 ... 0.6309 0.3035
Attributes:
    created_at:     2020-10-22T15:16:28.209845
    arviz_version:  0.10.0

Expected output

Dataset (xarray.Dataset)
Dimensions:  (subject: 5, experiment: 10)
Coordinates:
  * experiment  (experiment) int64 0 1 2 3 4 5 6 7 8 9
  * subject  (subject) <U1 'a' 'b' 'c' 'd' 'e'
Data variables:
    y        (subject, experiment) float64 0.3847 0.0252 0.71 ... 0.6309 0.3035
Attributes:
    created_at:     2020-10-22T15:16:28.209845
    arviz_version:  0.10.0

Don't build Quickstart with docs

Currently several PRs are stalled because the documentation build hangs when building the Quickstart. It's hard to tell what the culprit is, since Documenter doesn't output very detailed debugging logs. It would be good to pre-build the Quickstart and periodically update it manually, similar to how ArviZ does it.

Unfortunately, Documenter has no way to integrate Jupyter notebooks or pre-built HTML files, so for the moment, this will probably require linking to a Jupyter notebook in the repo or in nbviewer.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Make ArviZ.jl more Julian

Currently ArviZ.jl is mostly a wrapper of the Python package arviz, with key additional functionality to support some of Julia's PPLs. But our goal is that ArviZ.jl is also an excellent package for the Julia community. This is a super-issue to track various initiatives to meet this goal:

  • Use Julia's plotting packages for plotting (#108)
  • Provide native Julia implementation of InferenceData (#128)
  • Separate stats and diagnostics in their own lightweight package (#129)

[Security] Workflow format_check.yml is using vulnerable action actions/checkout

The workflow format_check.yml references the action actions/checkout using the ref v1. This reference is missing commit a6747255bd19d7a757dbdda8c654a9f84db19839, which may contain a fix for a vulnerability.
The missing fix could be related to:
(1) a CVE fix,
(2) an upgrade of a vulnerable dependency, or
(3) a fix for a secret leak, among others.
Please consider updating the reference to the action.

Support new arviz v0.10.0 features

Relevant entries from the changelog:
New plots:

  • Added plot_separation (1359)

Methods for InferenceData:

  • Added to_dict method for InferenceData object (1223)
  • Extended methods from xr.Dataset to InferenceData (1254)
  • Add extend and add_groups to InferenceData (1300 and 1386)
  • Added __iter__ method (.items) for InferenceData (1356)

New examples:

  • Add more plot_parallel examples (1380)

These can already be accessed with the forwarded dot syntax (i.e. idata.to_dict() already works), but we need to figure out if any of these should also be overloads of Base functions.

Improve Stan conversion and docs

Currently we convert Stan's internal statistics names to the ones used by arviz.from_cmdstan. However, those names don't follow ArviZ's schema and will eventually be updated on the ArviZ side. We should update them for Stan.jl and CmdStan.jl.

We should also switch the tests and docs from CmdStan.jl to Stan.jl, since that's the future of Julia's Stan integration.

Remove Pandas.jl as dependency

Pandas.jl as a dependency increases our load time quite a bit. Now that #35 has been merged, we actually don't need most of the wrappers.

A preview of potential speed-ups:

julia> @time using ArviZ
 14.766538 seconds (22.23 M allocations: 1.219 GiB, 4.20% gc time)

Removing Pandas as a dependency but loading the pandas Python module ourselves nearly halves the load time:

julia> @time using ArviZ
  7.580144 seconds (9.58 M allocations: 502.910 MiB, 3.46% gc time)

Error creating InferenceData from Turing Chains with extra info

Currently it seems that if certain objects are stored in the Turing chain's info field, we can't map them into the InferenceData.

using ArviZ, Turing

julia> @model function foo()
    x ~ Normal()
end
foo (generic function with 1 method)

julia> chn = sample(foo(),NUTS(),200); # this is fine

julia> chn.info
NamedTuple()

julia> from_mcmcchains(chn)
InferenceData with groups:
	> posterior
	> sample_stats

julia> chn = sample(foo(), NUTS(), 200; save_state=true); # from_mcmcchains on this will error

julia> chn.info
(model = DynamicPPL.Model{var"#3#4", (), (), (), Tuple{}, Tuple{}}(:foo, var"#3#4"(), NamedTuple(), NamedTuple()), sampler = DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.DiagEuclideanMetric}}(NUTS{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.DiagEuclideanMetric}(-1, 0.65, 10, 1000.0, 0.0), DynamicPPL.Selector(0x00016a8da36513f2, :default, false)), samplerstate = Turing.Inference.HMCState{DynamicPPL.TypedVarInfo{NamedTuple{(:x,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, Tuple{}}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, AdvancedHMC.NUTS{AdvancedHMC.MultinomialTS, AdvancedHMC.GeneralisedNoUTurn, AdvancedHMC.Leapfrog{Float64}, Float64}, AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, Turing.Inference.var"#logπ#54"{DynamicPPL.TypedVarInfo{NamedTuple{(:x,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, Tuple{}}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.Model{var"#3#4", (), (), (), Tuple{}, Tuple{}}}, Turing.Inference.var"#∂logπ∂θ#53"{DynamicPPL.TypedVarInfo{NamedTuple{(:x,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, Tuple{}}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.Model{var"#3#4", (), (), (), Tuple{}, Tuple{}}}}, AdvancedHMC.PhasePoint{Vector{Float64}, AdvancedHMC.DualValue{Float64, Vector{Float64}}}, AdvancedHMC.Adaptation.StanHMCAdaptor{AdvancedHMC.Adaptation.WelfordVar{Float64, Vector{Float64}}, AdvancedHMC.Adaptation.NesterovDualAveraging{Float64}}}(DynamicPPL.TypedVarInfo{NamedTuple{(:x,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, Tuple{}}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}((x = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, Tuple{}}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(x => 1), [x], UnitRange{Int64}[1:1], [0.007224315165188178], Normal{Float64}[Normal{Float64}(μ=0.0, σ=1.0)], Set{DynamicPPL.Selector}[Set([DynamicPPL.Selector(0x00016a8da36513f2, :default, false)])], [0], Dict{String, BitVector}("del" => [0], "trans" => [1])),), Base.RefValue{Float64}(-0.9189646285694758), Base.RefValue{Int64}(0)), 299, NUTS{MultinomialTS,Generalised}(integrator=Leapfrog(ϵ=1.43), max_depth=10), Δ_max=1000.0), Hamiltonian(metric=DiagEuclideanMetric([1.0])), AdvancedHMC.PhasePoint{Vector{Float64}, AdvancedHMC.DualValue{Float64, Vector{Float64}}}([0.007224315165188178], [-0.5308342150394731], AdvancedHMC.DualValue{Float64, Vector{Float64}}(-0.9189646285694758, [0.007224315165188178]), AdvancedHMC.DualValue{Float64, Vector{Float64}}(-0.14089248192828682, [-0.5308342150394731])), StanHMCAdaptor(
    pc=WelfordVar,
    ssa=NesterovDualAveraging=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.425166901462951),
    init_buffer=75, term_buffer=50, window_size=25,
    state=window(76, 50), window_splits()
)))

julia> from_mcmcchains(chn)
ERROR: PyError ($(Expr(:escape, :(ccall(#= /Users/sethaxen/.julia/packages/PyCall/L0fLP/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("cannot pickle 'PyCall.jlwrap' object")
  File "/Users/sethaxen/.julia/conda/3/lib/python3.8/site-packages/arviz/data/inference_data.py", line 1837, in concat
    args_groups[group] = deepcopy(group_data) if copy else group_data
  File "/Users/sethaxen/.julia/conda/3/lib/python3.8/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/Users/sethaxen/.julia/conda/3/lib/python3.8/site-packages/xarray/core/dataset.py", line 1425, in __deepcopy__
    return self.copy(deep=True)
  File "/Users/sethaxen/.julia/conda/3/lib/python3.8/site-packages/xarray/core/dataset.py", line 1322, in copy
    attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
  File "/Users/sethaxen/.julia/conda/3/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/sethaxen/.julia/conda/3/lib/python3.8/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/sethaxen/.julia/conda/3/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)

We should probably filter the info on our end before InferenceData creation so that these errors can't happen.
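A minimal sketch of such a filter (the predicate and names are illustrative): keep only entries PyCall can safely hand to Python:

# keep only values that convert cleanly to Python (illustrative predicate)
_is_plain(x) = x isa Union{Number,AbstractString,Symbol,AbstractArray{<:Number}}

function _filter_info(info::NamedTuple)
    keep = filter(k -> _is_plain(getfield(info, k)), keys(info))
    return NamedTuple{keep}(info)
end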

Support conversion of NamedTuple-based samples to InferenceData

Several PPLs/samplers that don't use MCMCChains.Chains to store samples (e.g. DynamicHMC.jl and Soss.jl) instead return the samples as a vector of NamedTuples (or a vector of such vectors if multiple chains are obtained). We should provide functionality for converting these data types into InferenceData. As a side effect, this would add more support for Turing, since currently posterior predictive samples from Turing will usually be in NamedTuples instead of Chains.
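For example, the proposed support might accept data shaped like this (sketch; the convert_to_inference_data call shows the intended behavior, not current behavior):

nchains, ndraws = 4, 500
# one vector of NamedTuple draws per chain
samples = [[(μ = randn(), τ = rand()) for _ in 1:ndraws] for _ in 1:nchains]
idata = convert_to_inference_data(samples)  # proposed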

Return dataframes from DataFrames instead of Pandas

Pandas.DataFrame objects are functional but unintuitive and don't render usefully in Jupyter notebooks (e.g. not all columns are shown; this should be addressed by JuliaPy/Pandas.jl#63). More importantly, Pandas.jl doesn't appear to be actively maintained. We should instead convert to DataFrames.DataFrame, which will be more useful in Julia and render correctly in HTML.

The only two challenges here are 1) converting the index to a column and 2) converting columns with numpy datatypes that PyCall doesn't map to Julia types.
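A sketch of the conversion addressing both challenges (the function name and the exact PyCall conversions are illustrative):

using DataFrames, PyCall

function pandas_to_dataframe(pd_df::PyObject)
    df = DataFrame()
    df[!, :index] = collect(pd_df.index)  # 1) move the index into a column
    for col in pd_df.columns
        # 2) to_numpy + collect coerces numpy dtypes to Julia types where possible
        df[!, Symbol(col)] = collect(pd_df.__getitem__(col).to_numpy())
    end
    return df
end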

ArviZ fails to load when scipy.fft is imported

Our daily CI runs started spontaneously failing 14 days ago, i.e. going from https://github.com/arviz-devs/ArviZ.jl/runs/6517258108, which passes, to https://github.com/arviz-devs/ArviZ.jl/runs/6533529487, which fails. The failure occurs during precompilation of the package and emits the error message:

The Python package arviz could not be imported by pyimport. Usually this means
that you did not install arviz in the Python version being used by PyCall.

No commits were made to ArviZ between these runs. Both runs use the same operating system version, same Julia and Python versions, and same PyCall and Conda versions. There were no new releases to Python ArviZ in that time period.

Allow specifying variable eltype in converters

Currently, ArviZ uses the dtype of the variables to infer whether they are continuous or discrete for plotting. However, it's common for samples to be stored using a single eltype. For example, MCMCChains.Chains stores all data as floats. In cases like this, ArviZ will incorrectly interpret discrete data as continuous and do the wrong thing (see the example below). As a result, all of our converters, but especially from_mcmcchains, should take a keyword argument that can be used to specify the eltype of any variables, with the fallback being the Julia eltype, e.g. from_mcmcchains(chains; eltypes=Dict(:y => Int64)).

See arviz-devs/arviz#1632 for a similar proposal for ArviZ's Stan converters.

Example

using Turing, ArviZ
@model dice_throw(y) = begin
    p ~ Dirichlet(6, 1)
    for i in eachindex(y)
        y[i] ~ Categorical(p)
    end
end
data = [5, 5, 4, 5, 4, 4, 6, 6, 2, 6]
model = dice_throw(data)
chain = sample(model, NUTS(), MCMCThreads(), 2_000, 4)
model_predict = dice_throw(similar(data, Missing))
posterior_predictive = predict(model_predict, chain)

integer data encoded as Float64 interpreted as continuous:

idata = from_mcmcchains(
    chain;
    posterior_predictive=posterior_predictive,
    observed_data=Dict("y" => data),
    library="Turing",
)
plot_ppc(idata; data_pairs=Dict("y" => "y"), num_pp_samples=100)

[plot_ppc figure: discrete observations rendered as continuous densities]

integer data encoded as Int64 interpreted as discrete:

idata = from_mcmcchains(
    chain;
    posterior_predictive=Dict("y" => Array{Int}(permutedims(posterior_predictive.value, (3, 1, 2)))),
    observed_data=Dict("y" => data),
    library="Turing",
)
plot_ppc(idata; data_pairs=Dict("y" => "y"), num_pp_samples=100)

[plot_ppc figure: observations correctly rendered as discrete]

Make BokehPlot compatible with FileIO

Bokeh support for now is limited to visualization and interaction with the plot that ArviZ generates. Without implementing a wrapper for Bokeh, which is beyond the scope of this package, we won't be able to support more complex interaction with the plots.

However, we can enable saving of BokehPlot images to image files using FileIO.jl, which would look like

plt = plot_trace(idata; backend = :bokeh)
save("plot.png", plt)

PyPlot.jl already has its own support for saving PyPlot images that wraps pyplot.savefig.
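A sketch of the FileIO hook, delegating to bokeh.io.export_png (again assuming a PyObject conversion exists for BokehPlot):

using FileIO, PyCall

function FileIO.save(f::File{format"PNG"}, plt::BokehPlot)
    export_png = pyimport("bokeh.io").export_png
    export_png(PyObject(plt); filename=FileIO.filename(f))
end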

Stackoverflow errors when converting chain with large-dimensional parameter vector to idata object

I found that when using a relatively large parameter (specifically an 8x131 matrix, resulting in 1048 parameters for a single parameter group), I get a StackOverflowError when trying to convert the chain to an idata object.

I found a way to reproduce it:

stackoverflowarray = [MCMCChains.AxisArray(rand(500,4)) for i in 1:1048]
cat(stackoverflowarray...; dims=3)

which essentially comes from this line in the library:

oldarr = reshape_values(replacemissing(convert(Array, cat(var_views...; dims=3))))

where var_views is similar to my stackoverflowarray above

In reality, it seems to be caused by the AxisArray: if I leave the AxisArray wrapper out of my reproduction example, the error doesn't happen.
Similarly, replacing the line:

var_views = (@view(chns.value[:, n, :]) for n in loc_names)

with

var_views = (Array(@view(chns.value[:, n, :])) for n in loc_names)

seems to fix it as well.

Support recent Soss versions

Soss is currently at v0.18.3, but our docs only support v0.14. We should update to support v0.18 and ensure that the converter still works. This will involve adding support for Soss's SampleChains.jl. @cscherrer noted on Slack that some of the SampleChains functionality will be moved into TupleVectors.jl, so it's probably best to wait until that has happened before working on this.

Implement the Tables interface for Dataset

There are pros and cons to implementing the Tables interface for Dataset.

Pros:

  • Primarily: users can instantly use the Tables-compatible @df macro in StatsPlots or the data wrapper in AlgebraOfGraphics to construct plots.
  • Users can easily convert a Dataset to any Tables sink, such as a DataFrames.DataFrame, and if we implement the interface to make Dataset a Tables sink, then users can also construct a Dataset from anything else that implements the Tables interface.

Cons:

  • Unlike MCMCChains.Chains, which also implements the Tables interface, data stored in a Dataset is typically inherently multidimensional. To flatten it into a table structure, we would need to discard that multidimensional information. From this perspective, it would be better for plotting purposes to have something like Tables that is friendly to multidimensional data and can use the coords and dims we store.
  • This will cause Pluto.jl to override our custom HTML representation for Dataset, since it checks for the Tables interface early in the pipeline.
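For reference, the source half of the interface could be as small as this sketch, assuming a Dataset backed by a dict of same-shaped variable arrays (all internals hypothetical):

using Tables

Tables.istable(::Type{<:Dataset}) = true
Tables.columnaccess(::Type{<:Dataset}) = true

function Tables.columns(ds::Dataset)
    # flatten every variable over (chain, draw), discarding the
    # multidimensional structure (the con noted above); assumes all
    # variables share the same shape, i.e. scalar variables only
    return (; (name => vec(arr) for (name, arr) in pairs(ds.vars))...)
end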

Docs build times out after 6 hours

Between 6 and 23 days ago, the docs build began hanging until the workflow is terminated at 6 hours. It looks like this happens when it encounters the Turing.pointwise_loglikelihoods command in the Quickstart https://github.com/arviz-devs/ArviZ.jl/blame/v0.5.0/docs/src/quickstart.md#L177. When I run the Turing example locally and reach that line, my terminal fills up with many copies of the following warning:

┌ Warning: the following keys were not found in `vi`, and thus `kernel!` was not applied to these: ["acceptance_rate", "hamiltonian_energy", "hamiltonian_energy_error", "is_accept", "log_density", "lp", "max_hamiltonian_energy_error", "n_steps", "nom_step_size", "numerical_error", "step_size", "tree_depth"]

This message was added in TuringLang/DynamicPPL.jl#216, which was released in DynamicPPL v0.10.9 19 days ago, so it seems likely that these warnings are causing the hang (sometimes Documenter can't handle many warnings or print statements). Pinning DynamicPPL to v0.10.8 causes the docs build to succeed.

@torfjelde is there something that we need to change about the Quickstart example to avoid all of these warnings? Or is this a DynamicPPL bug?

Implement stats and diagnostics in new lightweight Julia package

ArviZ's goal is to "...provide backend-agnostic tools for diagnostics and visualizations of Bayesian inference...". I would add that we want everyone to critique their models with the best available algorithms. Not every PPL user is going to use ArviZ.jl, but we want them to all use our algorithms.

A very Julian way to handle this problem is to reimplement some subset of the stats and diagnostics functionality in the Python arviz package in a lightweight, standalone Julia package, which we can then depend on. Then packages like MCMCChains or Soss could depend on this package to provide any custom overloads for their sample storage types, so that their users are directly encouraged to use these algorithms.

While such a package is within the scope of other PPLs (see TuringLang/MCMCChains.jl#266), arviz has a robust developer community devoted explicitly to implementing/maintaining the best available algorithms for these tasks (as of this writing, arviz has had 80 contributors, the same number as Turing and Soss combined). So there is a good argument for the project being hosted by the arviz-devs org and used by Julia's PPLs.

Remove ArviZ.jl-specific logo

We adopted our own logo early on to distinguish ourselves from the Python package. However, since then ArviZ has become a multi-language project, and the logo for the Python package has been adopted as the logo for the org. As discussed on Slack, we should drop our Julia-specific logo.

Adapt ArviZ docstrings

Currently most of the API functions simply wrap their Python counterparts and forward the docstrings for use in the Julia REPL, e.g.

julia> using ArviZ
? plot_parallel

This works for the REPL but not for generating documentation.

Moreover, the Python docstrings are in ReST format, which Documenter can't format well, and the code examples, style, and types are Python-specific. In the future, some functions may have slight API changes.

It's probably best going forward to adapt ArviZ's docstrings for the API functions in this package. The downside to this is the need to synchronize docstrings across packages, but for the reasons listed above, I think this is the better option.
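Adapting a docstring would then just mean attaching Julia-native documentation to the existing binding, e.g. (wording illustrative):

"""
    plot_parallel(data; kwargs...)

Plot parallel coordinates, showing posterior points with and without
divergences. (Adapted from the Python docstring, with Julia examples.)
"""
plot_parallel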

Removing bokeh plotting backend

Currently we have no documented examples of the Bokeh backend, and it's unlikely to be used by anyone, even though it more or less works. As we move toward pure Julia plotting packages (#108, #130), it might be nice to drop official support for the Bokeh backend so we don't need to maintain it.

Prior predictive keyword arg not used in from_mcmcchains

When using from_mcmcchains(chain; prior_predictive=prior_pred_chain), the prior predictive chain is not stored in the resulting InferenceData.

Looking at mcmcchains.jl, it's pretty clear why: the keyword arg is listed in the documentation, but I don't see it handled anywhere in the code.

Convert to InferenceData within some functions

Many ArviZ functions (e.g. plotting and stats) take either an InferenceData or an object that can be converted to one with convert_to_inference_data. Since we overload convert_to_inference_data here, to get this same functionality for custom Julia types, we need to call this function on the inputs before calling the wrapped ArviZ function.
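Concretely, each forwarded function would follow a pattern like this sketch:

function plot_posterior(data, args...; kwargs...)
    idata = convert_to_inference_data(data)  # picks up our Julia-side overloads
    return arviz.plot_posterior(idata, args...; kwargs...)
end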

ValueError('cannot convert float NaN to integer')

I'm following the ArviZ.jl quickstart, but

plot_trace(idata)

produces this error:

PyError ($(Expr(:escape, :(ccall(#= julia/packages/PyCall/BcTLp/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'ValueError'>
ValueError('cannot convert float NaN to integer')
  File "result/lib/python3.8/site-packages/arviz/plots/traceplot.py", line 238, in plot_trace
    axes = plot(**trace_plot_args)
  File "result/lib/python3.8/site-packages/arviz/plots/backends/matplotlib/traceplot.py", line 242, in plot_trace
    ax = _plot_chains_mpl(
  File "result/lib/python3.8/site-packages/arviz/plots/backends/matplotlib/traceplot.py", line 467, in _plot_chains_mpl
    axes = plot_dist(
  File "result/lib/python3.8/site-packages/arviz/plots/distplot.py", line 213, in plot_dist
    ax = plot(**dist_plot_args)
  File "result/lib/python3.8/site-packages/arviz/plots/backends/matplotlib/distplot.py", line 86, in plot_dist
    ax = plot_kde(
  File "result/lib/python3.8/site-packages/arviz/plots/kdeplot.py", line 250, in plot_kde
    grid, density = kde(values, circular, bw=bw, adaptive=adaptive, cumulative=cumulative)
  File "result/lib/python3.8/site-packages/arviz/stats/density_utils.py", line 535, in kde
    return kde_fun(x, **kwargs)
  File "result/lib/python3.8/site-packages/arviz/stats/density_utils.py", line 642, in _kde_linear
    grid, pdf = _kde_convolution(x, bw, grid_edges, grid_counts, grid_len, bound_correction)
  File "result/lib/python3.8/site-packages/arviz/stats/density_utils.py", line 774, in _kde_convolution
    kernel_n = int(bw * 2 * np.pi)


Stacktrace:
 [1] pyerr_check at julia/packages/PyCall/BcTLp/src/exception.jl:62 [inlined]
 [2] pyerr_check at julia/packages/PyCall/BcTLp/src/exception.jl:66 [inlined]
 [3] _handle_error(::String) at julia/packages/PyCall/BcTLp/src/exception.jl:83
 [4] macro expansion at julia/packages/PyCall/BcTLp/src/exception.jl:97 [inlined]
 [5] #110 at julia/packages/PyCall/BcTLp/src/pyfncall.jl:43 [inlined]
 [6] disable_sigint at ./c.jl:446 [inlined]
 [7] __pycall! at julia/packages/PyCall/BcTLp/src/pyfncall.jl:42 [inlined]
 [8] _pycall!(::PyCall.PyObject, ::PyCall.PyObject, ::Tuple{InferenceData}, ::Int64, ::PyCall.PyObject) at julia/packages/PyCall/BcTLp/src/pyfncall.jl:29
 [9] _pycall!(::PyCall.PyObject, ::PyCall.PyObject, ::Tuple{InferenceData}, ::Base.Iterators.Pairs{Symbol,Symbol,Tuple{Symbol},NamedTuple{(:backend,),Tuple{Symbol}}}) at julia/packages/PyCall/BcTLp/src/pyfncall.jl:11
 [10] #_#117 at julia/packages/PyCall/BcTLp/src/pyfncall.jl:86 [inlined]
 [11] plot_trace(::InferenceData; backend::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}) at julia/packages/ArviZ/tH1YR/src/utils.jl:129
 [12] plot_trace(::InferenceData) at julia/packages/ArviZ/tH1YR/src/utils.jl:122
 [13] top-level scope at In[117]:1
 [14] include_string(::Function, ::Module, ::String, ::String) at /nix/store/l899vbb7z6w01xl2mkn7xzc9s8bx2msp-julia-1.5.2/lib/julia/sys.so:?
 [15] execute_code(::String, ::String) at julia/packages/IJulia/a1SNk/src/execute_request.jl:27
 [16] execute_request(::ZMQ.Socket, ::IJulia.Msg) at julia/packages/IJulia/a1SNk/src/execute_request.jl:86
 [17] #invokelatest#1 at ./essentials.jl:710 [inlined]
 [18] invokelatest at ./essentials.jl:709 [inlined]
 [19] eventloop(::ZMQ.Socket) at julia/packages/IJulia/a1SNk/src/eventloop.jl:8
 [20] (::IJulia.var"#15#18")() at ./task.jl:356

Julia 1.5.2
ArviZ.jl v0.4.8

Python 3.8
arviz-0.10.0

MCMCChains breaking change causes `MCMCChains.Chains` conversion to fail

MCMCChains v4.0.0 introduces a breaking change, in which Symbols are used as variable names instead of Strings. See:

TuringLang/MCMCChains.jl#197

TuringLang/MCMCChains.jl#185

So now the related Turing code in the Quickstart will fail. Another MWE:

    using MCMCChains
    using ArviZ

    # Define the experiment
    n_iter = 500
    n_name = 3
    n_chain = 2

    # experiment results
    val = randn(n_iter, n_name, n_chain) .+ [1, 2, 3]'
    val = hcat(val, rand(1:2, n_iter, 1, n_chain))

    # construct a Chains object
    chn = MCMCChains.Chains(val)  # depending on the version, names may be Strings or Symbols

    ArviZ.summary(chn)

A PR will follow.

Failure to build Soss/MonteCarloMeasurements in Quickstart

Currently, the docs take nearly 2 hours to build for some reason, with nearly all of that time spent on the Quickstart. The Quickstart currently errors while compiling Soss, causing everything afterward to fail. I suspect that failure is related to the massive slowdown. The error, however, seems to come from precompiling MonteCarloMeasurements. After calling using Soss, I get the following stacktrace (see for example this build):

ERROR: LoadError: LoadError: UndefVarError: PT not defined
Stacktrace:
 [1] top-level scope at /home/vsts/.julia/packages/MonteCarloMeasurements/bf9PV/src/particles.jl:122
 [2] include at ./boot.jl:328 [inlined]
 [3] include_relative(::Module, ::String) at ./loading.jl:1105
 [4] include at ./Base.jl:31 [inlined]
 [5] include(::String) at /home/vsts/.julia/packages/MonteCarloMeasurements/bf9PV/src/MonteCarloMeasurements.jl:1
 [6] top-level scope at /home/vsts/.julia/packages/MonteCarloMeasurements/bf9PV/src/MonteCarloMeasurements.jl:79
 [7] include at ./boot.jl:328 [inlined]
 [8] include_relative(::Module, ::String) at ./loading.jl:1105
 [9] include(::Module, ::String) at ./Base.jl:31
 [10] top-level scope at none:2
 [11] eval at ./boot.jl:330 [inlined]
 [12] eval(::Expr) at ./client.jl:425
 [13] top-level scope at ./none:3
in expression starting at /home/vsts/.julia/packages/MonteCarloMeasurements/bf9PV/src/particles.jl:122
in expression starting at /home/vsts/.julia/packages/MonteCarloMeasurements/bf9PV/src/MonteCarloMeasurements.jl:79
ERROR: LoadError: Failed to precompile MonteCarloMeasurements [0987c9cc-fe09-11e8-30f0-b96dd679fdca] to /home/vsts/.julia/compiled/v1.3/MonteCarloMeasurements/tHYBD_7eNvK.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1283
 [3] _require(::Base.PkgId) at ./loading.jl:1024
 [4] require(::Base.PkgId) at ./loading.jl:922
 [5] require(::Module, ::Symbol) at ./loading.jl:917
 [6] include at ./boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1105
 [8] include(::Module, ::String) at ./Base.jl:31
 [9] top-level scope at none:2
 [10] eval at ./boot.jl:330 [inlined]
 [11] eval(::Expr) at ./client.jl:425
 [12] top-level scope at ./none:3
in expression starting at /home/vsts/.julia/packages/Soss/2R1Bp/src/Soss.jl:12
┌ Warning: failed to run `@setup` block in src/quickstart.md

@cscherrer and @baggepinnen, have you seen errors like this before, or do you have any idea what could be causing them?

This particular build used Soss v0.10.0 and MonteCarloMeasurements v0.6.4.

Implement pure Julia InferenceData scheme

Currently our InferenceData just wraps arviz.InferenceData for dispatch. This is convenient because it's low maintenance and highly functional, as we get all class methods implemented on the Python side for free. However, the data itself is stuck in xarray Datasets, which is fine for Python users who may already be familiar with xarray, but for Julia users who do not use xarray, it is not as friendly as we would like for actually accessing or modifying the underlying data.

An alternative is to reimplement the InferenceData schema using only Julia packages. This comes with a maintenance cost but has the benefit of improving this package's usability for Julia users. By providing converters to/from the Python implementation of InferenceData, and using PyCall's feature of no-copy array passing, we can construct xarray views of our InferenceData to access all the existing xarray-based functionality.

To implement the schema, we need to choose (at least) one representation for arrays with named dimensions and named keys, and for collections of these arrays into datasets with named variables and attributes. The old solution to named dimensions and named keys was AxisArrays.jl, which MCMCChains.jl is built on. This is probably not the best way going forward, as a rich ecosystem of packages that provide either named dimensions or named keys has been developing. See JuliaCollections/AxisArraysFuture#1 for useful discussion.

In particular, it was noted on Slack that AxisSets.jl provides a KeyedDataset type that contains collections of KeyedArrays from AxisKeys.jl, which wrap NamedDimsArray objects from NamedDims.jl. Because AxisSets and NamedDims are in production at Invenia Labs, they are battle-tested and maintained by dedicated teams. It is also more likely that our users will be familiar with one or more of these packages going forward. Therefore, I propose that our Julian InferenceData be a collection of Dataset objects that wrap AxisSets.jl's KeyedDataset objects.
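To give a flavor of the data model, here is roughly what a posterior variable looks like with AxisKeys (constructor form per my reading of AxisKeys; the Dataset/InferenceData wrappers are omitted):

using AxisKeys

# 4 chains × 500 draws of an 8-dimensional parameter
θ = KeyedArray(randn(4, 500, 8); chain=1:4, draw=1:500, school=1:8)

θ(chain = 1)       # select by key
named_axiskeys(θ)  # named dims and keys travel with the array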

trouble installing with Julia 1.6.2

The command below from the repo README.md is not working with Julia 1.6.2:

PYTHON="" julia -e 'using Pkg; Pkg.add("PyCall"); Pkg.build("PyCall"); Pkg.add("ArviZ");'
julia> using ArviZ
[ Info: Precompiling ArviZ [131c737c-5715-5e2e-ad31-c244f01c1dc7]
INTEL MKL ERROR: dlopen(/Users/blaine/.julia/conda/3/lib/libmkl_intel_thread.dylib, 9): Library not loaded: @rpath/libiomp5.dylib
  Referenced from: /Users/blaine/.julia/conda/3/lib/libmkl_intel_thread.dylib
  Reason: image not found.
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib.
ERROR: Failed to precompile ArviZ [131c737c-5715-5e2e-ad31-c244f01c1dc7] to /Users/blaine/.julia/compiled/v1.6/ArviZ/jl_Ps8GJt.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::Base.TTY, internal_stdout::Base.TTY, ignore_loaded_modules::Bool)
   @ Base ./loading.jl:1385
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1329
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1043
 [5] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:936
 [6] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:923

Error passing results from loo to compare

Originally reported by user uferc on Julia Discourse: https://discourse.julialang.org/t/bayesian-model-comparison-selection/80420

julia> using ArviZ

julia> idata1 = load_arviz_data("centered_eight");

julia> idata2 = load_arviz_data("non_centered_eight");

julia> model_dict = Dict("model1" => idata1, "model2" => idata2)
Dict{String, InferenceData} with 2 entries:
  "model1" => InferenceData with groups:"model2" => InferenceData with groups:…

julia> compare(model_dict)  # passing InferenceData is fine
2×10 DataFrame
 Row │ name    rank   loo       p_loo     d_loo     weight   se       dse        warning  loo_scale 
     │ String  Int64  Float64   Float64   Float64   Float64  Float64  Float64    Bool     String    
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │ model2      0  -30.6876  0.842156  0.0           1.0  1.3653   0.0          false  log
   2 │ model1      1  -30.8105  0.954222  0.122985      0.0  1.42713  0.0859593    false  log

julia> loo1 = loo(idata1)
1×9 DataFrame
 Row │ loo       loo_se   p_loo     n_samples  n_data_points  warning  loo_i                              pareto_k ⋯
     │ Float64   Float64  Float64   Int64      Int64          Bool     PyObject                           PyObject ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ -30.8105  1.42713  0.954222       2000              8    false  PyObject <xarray.DataArray 'loo_…  PyObject ⋯
                                                                                                   2 columns omitted

julia> loo2 = loo(idata2)
1×9 DataFrame
 Row │ loo       loo_se   p_loo     n_samples  n_data_points  warning  loo_i                              pareto_k ⋯
     │ Float64   Float64  Float64   Int64      Int64          Bool     PyObject                           PyObject ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ -30.6876   1.3653  0.842156       2000              8    false  PyObject <xarray.DataArray 'loo_…  PyObject ⋯
                                                                                                   2 columns omitted

julia> loo_dict = Dict("model1" => loo1, "model2" => loo2)  # passing loo results is not fine
Dict{String, DataFrames.DataFrame} with 2 entries:
  "model1" => 1×9 DataFrame…
  "model2" => 1×9 DataFrame…

julia> compare(loo_dict)
ERROR: PyError ($(Expr(:escape, :(ccall(#= /home/sethaxen/.julia/packages/PyCall/7a7w0/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError('Encountered error in ic computation of compare.')
  File "/home/sethaxen/.julia/conda/3/lib/python3.9/site-packages/arviz/stats/stats.py", line 173, in compare
    raise e.__class__("Encountered error in ic computation of compare.") from e

The issue is that we convert the arviz.ELPDData on the Julia side to a DataFrame, but we don't currently convert it back when calling compare, so compare encounters a type it doesn't know. We can work around this by calling ArviZ.arviz.loo directly, which bypasses the conversion:

julia> loopy1 = ArviZ.arviz.loo(idata1);

julia> loopy2 = ArviZ.arviz.loo(idata2);

julia> loopy_dict = Dict("model1" => loopy1, "model2" => loopy2)
Dict{String, PyCall.PyObject} with 2 entries:
  "model1" => PyObject Computed from 2000 by 8 log-likelihood matrix…
  "model2" => PyObject Computed from 2000 by 8 log-likelihood matrix…

julia> compare(loopy_dict)
2×10 DataFrame
 Row │ name    rank   loo       p_loo     d_loo     weight   se       dse        warning  loo_scale 
     │ String  Int64  Float64   Float64   Float64   Float64  Float64  Float64    Bool     String    
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │ model2      0  -30.6876  0.842156  0.0           1.0  1.3653   0.0          false  log
   2 │ model1      1  -30.8105  0.954222  0.122985      0.0  1.42713  0.0859593    false  log

Add a gallery

We should add a gallery similar to ArviZ's: https://arviz-devs.github.io/arviz/examples/index.html

While we could (and do) just link to that gallery, ours would show the examples in pure Julia, so that one can simply copy and paste without any modification. It would also point to the Bokeh support, which is not currently highlighted.

This would require #59, but I know how to do that now.

Allow variables in log_likelihood group to have different dimensions

According to the InferenceData spec, the log_likelihood group is unique in that the dims or coords of a variable can differ from its dims or coords in observed_data, prior_predictive, and posterior_predictive. e.g. if the likelihood of a variable y is a multivariate normal, then in posterior_predictive the shape might be (nchains, ndraws, length_y), while in log_likelihood the shape would be (nchains, ndraws). Currently from_namedtuple and from_mcmcchains don't handle this correctly.

Make ArviZ.jl compatible with arviz v0.9.0

Relevant items from the changelog:

  • Added html_repr of InferenceData objects for jupyter notebooks. (#1217)
  • plot_hdi can now take already computed HDI values (#1241)
  • plot_bpv. A new plot for Bayesian p-values (#1222)

#75 needs to be handled first.

Toward a completely PPL-agnostic Bayesian workflow

This first post is a stream-of-consciousness dump of some ideas I've been tossing around in my head for a few months now. I'll edit it for clarity as needed.

The goal

To construct a small ecosystem of completely PPL-agnostic functions for diagnostics, statistics, and plotting in a typical Bayesian workflow.

The problem

  • Every function needs different things.
  • Each object returned by the PPL offers different things.
  • Our functions have no way of knowing how to interact with what the user provides.

Some solutions

  • Have a bunch of glue code overloading our functions for each PPL in some package.
    • Not scalable.
    • Where does the code go? If here, we're curating a list of supported PPLs. If in the PPL, they're endorsing us as the go-to diagnostic package.
    • Increases barrier for new PPLs to be incorporated into users' existing workflows and increases barrier for new users to try out PPLs.
    • Maintenance burden
  • Have a bunch of converters from PPL-returned objects to a unified object we control. (current approach)
    • Shoe-horns users into one way of doing things.
    • Inherently lossy; we discard some things the PPL provides because they don't fit our structure, and then they can't be used downstream.
  • Design one or more interfaces that are independent of ArviZ and independent of the PPLs and give a unified way to query what functionality some PPL-produced object has and to use that functionality in some function.

I am becoming more convinced that the latter is what we need.

An InferenceData interface?

There are already a few relevant interfaces in the ecosystem that we should mention.

  • DensityInterface.jl defines an interface for an object that is or has a (log-)density function, allowing one to call that density function. Implemented by AbstractPPL (Turing ecosystem) and Distributions; probably eventually by MeasureTheory (Soss ecosystem).
  • ChangesOfVariables.jl defines an interface for the log-det-Jacobian adjustment of a change of variables; it doesn't provide a transform function. Implemented by Bijectors (Turing ecosystem) and TransformVariables (Tamas Papp's ecosystem of inference ingredients, currently used by the Soss ecosystem).
  • AbstractDifferentiation.jl defines an interface for backend-agnostic automatic differentiation.
  • AbstractPPL.jl defines an interface for PPLs. This is probably the most relevant for us and may in the end provide a lot of what we need. Currently used only by DynamicPPL (Turing ecosystem).
  • ArrayInterface.jl defines interfaces for various array properties.
  • Tables.jl defines an interface for objects that can be represented as a table. In particular, once an object has implemented the interface, then some generic function can detect it as a table and iterate through the rows and columns. Implemented by a zillion packages, including MCMCChains (Turing ecosystem).
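As a concrete example of the first item, here is a minimal sketch of implementing and calling DensityInterface (MyDensity is a made-up type):

    using DensityInterface

    # A type that *is* a log-density; these two methods are the whole
    # interface.
    struct MyDensity end
    DensityInterface.DensityKind(::MyDensity) = IsDensity()
    DensityInterface.logdensityof(::MyDensity, x) = -x^2 / 2

    d = MyDensity()
    logdensityof(d, 1.0)  # returns -0.5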

What we need is something a little different from all of these, but likely with enough overlap that we can reuse some of them. An arbitrary diagnostic/statistic/plotting function may need access to one or more of the following (non-exhaustive list):

  • (unconstrained) log density function (e.g. certain variants of LOO)
  • already-generated prior/posterior draws or ability to generate new ones given some sampler object (e.g. ESS)
    • either as array-of-structs, struct-of-arrays, or flattened somehow into a pure numeric array
  • conditionable joint distribution from which samples can be drawn given some sampler object (e.g. SBC)
  • a sampleable predictive distribution given draws of parameters (e.g. SBC)
  • log-likelihood function or values (e.g. LOO)
  • sampling statistics (e.g. divergences plot)
  • raw data (PPC plots)
  • iterator over parameters with names (multidimensional or flattened)
  • iterator over draws
  • iterator over chains

The idea is that all or part of the above interface can be implemented for objects returned by a PPL. e.g. an MCMCChains.Chains object could implement the parts corresponding to raw samples, while a Soss.Model or a DynamicPPL.Model would implement the part corresponding to the log density function, and then those objects could be passed directly to certain functions already. But in many cases one would like features from the samples, the model object, and the data together, so I think what we're looking for is some super-object that ties these together to represent the current state of inference and implements the entire interface. Such an object could then be passed to any generic function, which could even update the object to store generated quantities like prior samples and avoid re-drawing them.
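To make this concrete, here is a hedged sketch of what such an interface might look like (all names are illustrative, not an existing API):

    # Hypothetical interface: a PPL-specific wrapper implements these
    # methods, and generic diagnostics are written against them alone.
    function parameter_names end  # parameter_names(obj) -> iterable of names
    function draws end            # draws(obj, name) -> Array with (nchains, ndraws, ...) shape

    # A generic function that needs only the interface:
    using Statistics
    posterior_means(obj) =
        Dict(name => mean(draws(obj, name)) for name in parameter_names(obj))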

See e.g. TuringLang/AbstractPPL.jl#27 for an example of how one might implement simulation-based calibration (SBC) with just generic interface functions.

Some considerations:

  • An interface is useless if not adopted by a community, so this should ideally be community-driven. In all likelihood, the way forward is to push toward one or more of the above interface packages adding the missing features, and only if necessary, to add an additional interface to the list with input from devs of multiple PPLs.
  • It's important to note that if such an interface exists, it automatically empowers devs of other plotting/diagnostic packages as well. Users don't need to use ArviZ; they should use whatever reliable tool best fits their workflow/aesthetic, and an interface can help those other devs too.
  • Obviously a lot of work is needed to identify the minimal set of functions required by the statistical workflow. To facilitate that, I'm working toward generically implementing/modifying three such diagnostics: LOO, SBC, and ESS/R-hat, along with the corresponding diagnostic plots. Once finished, we can analyze what functionality was needed and try to refine that into a minimal interface (see the array-level sketch after this list).
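As a taste of the array-level piece, ESS and R-hat already operate on plain arrays, so any PPL object that can expose its draws as a 3-d array gets them for free. A hedged sketch with MCMCDiagnosticTools, assuming its array methods for ess and rhat with the (draws, chains, parameters) layout:

    using MCMCDiagnosticTools

    # Stand-in for draws extracted from any PPL; MCMCDiagnosticTools'
    # array convention is (draws, chains, parameters).
    x = randn(1000, 4, 2)
    ess(x), rhat(x)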

Order of the multi-level parameter is not preserved when converted into InferenceData

Thank you for this remarkable package...
As a newbie, I'm trying to replicate the centered_eight example from arviz homepage.

I've successfully obtained the posterior sample from the model (let's call this chain chn_post).
And then I've created an InferenceData from this posterior with the from_mcmcchains() function (= idata).
This idata must contain the above-mentioned chn_post as idata.posterior.

Below are the results of summarystats for the two versions of the chain.

    summarystats(chn_post)

     parameters      mean       std   naive_se      mcse         ess      rhat    ⋯
   
           μ    4.3999    3.2838     0.0367    0.0976   1334.4223    1.0020    ⋯
           τ    4.3683    3.0589     0.0342    0.1179    713.4324    1.0097    ⋯
        θ[1]    6.6081    5.9446     0.0665    0.1327   2420.8891    1.0019    ⋯
        θ[2]    5.0276    4.9369     0.0552    0.0980   2592.4887    1.0014    ⋯
        θ[3]    3.7422    5.4718     0.0612    0.1177   2451.2714    1.0005    ⋯
        θ[4]    4.8221    5.0381     0.0563    0.1072   2825.2853    1.0010    ⋯
        θ[5]    3.4301    4.8281     0.0540    0.1178   2073.0898    1.0016    ⋯
        θ[6]    3.9433    5.0540     0.0565    0.1108   2378.3583    1.0015    ⋯
        θ[7]    6.7735    5.4439     0.0609    0.1182   2091.5826    1.0012    ⋯
        θ[8]    4.9105    5.6013     0.0626    0.1054   3117.7890    1.0010    ⋯
    summarystats(idata.posterior)

     variable    mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail

            μ   4.4    3.284  -1.781   10.603      0.09     0.064    1336.0    1866.0
            τ   4.368  3.059   0.936    9.741      0.114    0.081     366.0     175.0
         θ[1]   3.43   4.828  -6.15    12.045      0.106    0.075    1995.0    3179.0
         θ[2]   4.822  5.038  -5.274   13.899      0.095    0.067    2543.0    2929.0
         θ[3]   4.91   5.601  -5.469   15.622      0.1      0.074    2676.0    3293.0
         θ[4]   3.943  5.054  -6.119   13.205      0.104    0.073    2233.0    2681.0
         θ[5]   6.608  5.945  -3.666   18.663      0.121    0.085    2269.0    2632.0
         θ[6]   6.774  5.444  -3.065   17.534      0.119    0.084    1959.0    2302.0
         θ[7]   5.028  4.937  -4.818   13.872      0.097    0.07     2386.0    3902.0
         θ[8]   3.742  5.472  -6.317   14.578      0.11     0.078    2294.0    2851.0

As you noticed, the order of θ parameter is quite different between the two.

θ[1] of the chn_post corresponds to θ[5] of the idata.posterior.

Which one is correct?

Since the arviz package provides this dataset as zdata = az.load_arviz_data("centered_eight"), that gives us the correct reference ordering.

    summarystats(zdata.posterior)

     variable     mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail

     mu          4.093  3.372  -2.118   10.403      0.215    0.152     250.0     643.0
     theta[1]    6.026  5.782  -3.707   17.337      0.291    0.206     348.0     743.0
     theta[2]    4.724  4.736  -4.039   13.999      0.199    0.142     471.0    1018.0
     theta[3]    3.576  5.559  -6.779   13.838      0.248    0.175     463.0     674.0
     theta[4]    4.478  4.939  -5.528   13.392      0.199    0.141     503.0     666.0
     theta[5]    3.064  4.642  -5.972   11.547      0.234    0.166     380.0     833.0
     theta[6]    3.821  4.979  -5.507   13.232      0.211    0.15      516.0    1104.0
     theta[7]    6.25   5.436  -3.412   16.92       0.237    0.168     402.0    1026.0
     theta[8]    4.544  5.521  -5.665   15.266      0.231    0.163     449.0    1084.0
     tau         4.089  3.001   0.569    9.386      0.252    0.178      79.0      54.0

This suggests that from_mcmcchains may have disrupted the correct ordering of the theta parameters. A quick check from the chain side is sketched below.
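One way to check which side reorders is to read the means straight off the chain (namesingroup is from MCMCChains; this is just a sketch with chn_post as above):

    using MCMCChains, Statistics

    # Print the per-parameter means in the chain's declared order; if
    # from_mcmcchains reorders, these won't line up with
    # summarystats(idata.posterior).
    for n in namesingroup(chn_post, :θ)
        println(n, " => ", mean(chn_post[n]))
    end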

What do you think? I'm just confused....

Thanks

Add support for arviz v0.12.0

See the changelog. Changes we need to make are:

  • Forward and export extract_dataset
  • Forward and export plot_ecdf (plot_ecdf is not documented as part of the arviz API); see the forwarding sketch after this list
  • Make sure our model comparison API is still up-to-date
  • Mark the following test as no longer broken:
    # NOTE: currently the smoothed weights disagree, while the shapes agree
    # see https://github.com/arviz-devs/arviz/issues/1941
    @test_broken logw_smoothed ≈ logw_smoothed2
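For the first two items, here is a hedged standalone sketch of forwarding a new arviz function via PyCall (ArviZ.jl has its own forwarding machinery internally; this is illustrative only):

    using PyCall

    const _arviz = pyimport("arviz")
    # Minimal forwarding wrapper; positional args and kwargs pass
    # straight through to the Python function.
    plot_ecdf(args...; kwargs...) = _arviz.plot_ecdf(args...; kwargs...)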
