Comments (15)

neerajprad commented on August 18, 2024

I think using HalfCauchy(1.) instead of Uniform(0, 100) addresses some of these issues. Below are results from NumPyro using HalfCauchy with 2000 warmup steps and 2000 samples; the parameter values are at least in the vicinity of what we get from Stan. A sketch of the prior substitution follows the results.

a1
[3.0880034 3.0277355 2.8439357 3.0560315]

a2
[0.06595199 0.0670628  0.07130343 0.06875928]

b1
[2.2987626 1.3927749 1.1088973]

b2
[-0.03455285 -0.01366357 -0.01128363]

c
[[1.038227   1.120141   1.0181044 ]
 [1.1059053  0.97992927 1.0885361 ]
 [1.069844   1.0359606  1.0458062 ]
 [1.0400462  1.1041307  1.0638354 ]]

d
[[0.01215129 0.0131588  0.01172877]
 [0.01290246 0.01140532 0.01308942]
 [0.0125756  0.01269642 0.01297002]
 [0.01225842 0.01312897 0.01268698]]

mu_a1
0.29997757

mu_a2
0.068249255

mu_b1
0.15676731

mu_b2
-0.1279364

mu_c
0.1059214

mu_d
0.12550469

sigma_a1
0.38256466

sigma_a2
0.008176872

sigma_b1
1.0645754

sigma_b2
0.057657257

sigma_c
0.15156718

sigma_d
0.0021564718

sigma_y
0.8757536
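
For reference, a minimal sketch of what the prior substitution looks like in NumPyro. The model below is a simplified stand-in with a single group-level intercept, not the actual model from this issue, and all names in it are illustrative:

import numpyro
import numpyro.distributions as dist

def model(group_idx, y=None):
    # Simplified hierarchical intercept model (the real model also
    # has the extra terms a2, b1, b2, c, d).
    mu_a = numpyro.sample("mu_a", dist.Normal(0., 1.))
    # Weakly informative HalfCauchy(1.) on the group scale,
    # replacing the problematic Uniform(0., 100.) prior.
    sigma_a = numpyro.sample("sigma_a", dist.HalfCauchy(1.))
    with numpyro.plate("groups", 4):
        a = numpyro.sample("a", dist.Normal(mu_a, sigma_a))
    sigma_y = numpyro.sample("sigma_y", dist.HalfCauchy(1.))
    numpyro.sample("y", dist.Normal(a[group_idx], sigma_y), obs=y)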

fehiepsi commented on August 18, 2024

The data and model look correct to me, and Stan did use a Uniform prior for the bounded parameters, so I can't think of any reason why using Uniform didn't work. Will take a closer look now. :)

fehiepsi commented on August 18, 2024

@neerajprad It turns out that the problem lies in init_params. I used the same init params as Stan and got similar results. In Stan, params are initialized uniformly in the (-2, 2) interval on the unconstrained space, while here we use the initial_trace, which gives dependent latent variables such as a1 and b1 wild initial values. We might consider supporting the same behaviour as Stan? A sketch of that initialization follows.
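
A minimal sketch of that Stan-style initialization, assuming init_params is a dict of unconstrained arrays (the helper name is illustrative):

import jax

def stan_style_init(rng_key, init_params):
    # Draw each unconstrained latent uniformly from (-2, 2), as Stan
    # does by default, instead of sampling from the initial trace.
    keys = jax.random.split(rng_key, len(init_params))
    return {name: jax.random.uniform(k, value.shape, minval=-2., maxval=2.)
            for (name, value), k in zip(init_params.items(), keys)}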

fehiepsi commented on August 18, 2024

I think that we can close this now?

fehiepsi commented on August 18, 2024

It seems that our transform for HalfCauchy is correct while the transform for Uniform is wrong. I have fixed that bug in the last PR. We need to support general domains in PyTorch too.
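
For context, the standard bijection onto a Uniform(low, high) support is a scaled sigmoid; here is a sketch of the transform and its log-det Jacobian (illustrative, not the exact code from the PR):

import jax.numpy as jnp
from jax.nn import sigmoid

def to_interval(x, low, high):
    # Map an unconstrained x onto (low, high).
    return low + (high - low) * sigmoid(x)

def log_abs_det_jacobian(x, low, high):
    # d/dx [low + (high - low) * sigmoid(x)]
    #   = (high - low) * sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return jnp.log(high - low) + jnp.log(s) + jnp.log1p(-s)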

fehiepsi commented on August 18, 2024

Hmm, the fix does not seem to resolve the issue, because the lower bound of these uniform priors is 0...

neerajprad commented on August 18, 2024

the fix does not seem to resolve the issue, because the lower bound of these uniform priors is 0...

Yeah, I think that's a very important fix, but unrelated to this particular model.

fehiepsi commented on August 18, 2024

After investigating the model (see my gist), it seems to me that both Stan and NumPyro achieve similar performance in terms of log likelihood and mean squared error. Because the variables (a1, b1) and c are correlated (both are used for the intercept), and (a2, b2) and d are correlated (both are used for the slope), I think that it is fine to get different posteriors for those variables.

neerajprad commented on August 18, 2024

After playing around with this some more, here are some observations:

  • Working with Uniform(0, 100) does not lead to convergence in either NumPyro or Pyro. I now suspect that Stan uses some regularizing tricks for the default uniform prior. We could simply recommend that users use weakly informative priors when writing their models, as Stan does too.
  • The good news is that I am finding very good agreement between NumPyro and Stan when working with HalfCauchy(1). This is at least true for all parameter values where r_hat < 1.05 (see the diagnostic sketch after this list).
  • With Pyro, for the same number of warmup steps and samples, there is very little agreement. While the inference does not blow up the scale parameters, the parameter means differ from NumPyro and Stan. It is also much slower (more than 30 minutes), which makes it hard to debug what might be going wrong. I think this could be something subtle like numerical instability in one of Pyro's distributions / transforms.
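
A sketch of that r_hat filter, assuming samples grouped by chain; split_gelman_rubin lives in numpyro.diagnostics in current NumPyro, though the helper available at the time of this thread may have differed:

from numpyro.diagnostics import split_gelman_rubin

# samples_by_chain: dict of arrays shaped (num_chains, num_samples, ...),
# e.g. from mcmc.get_samples(group_by_chain=True).
for name, value in samples_by_chain.items():
    r_hat = split_gelman_rubin(value)
    if (r_hat < 1.05).all():
        print(name, "r_hat OK:", r_hat)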

neerajprad commented on August 18, 2024

I think that it is fine to get different posteriors for those variables.

I agree with this general point, and using a uniform prior here is probably a bad idea for this precise reason.

fehiepsi commented on August 18, 2024

We could simply recommend that users use weakly informative priors when writing their models, as Stan does too.

Yes, weakly informative priors rescue correlated variables (replicated from the SR book).

But I think the main problem lies in the modelling. More work needs to be done to make this model give reliable samples in both Stan and NumPyro...

In NumPyro, we can force the sigmas to be small (agreeing with Stan) by using the generic weakly informative prior HalfCauchy(1.), but I see only a small gain in log likelihood and slightly worse mean squared error. So I guess "agreeing with Stan" is not a good direction to follow. Rather, "how to reformulate the model to remove those correlated effects" is the better question IMO; one standard reformulation is sketched below.
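
For illustration, one standard reformulation that decorrelates hierarchical latents is the non-centered parameterization; whether it fully resolves the intercept/slope redundancy in this particular model is left open, and the names below are illustrative:

import numpyro
import numpyro.distributions as dist

def noncentered_group_effect(mu_a, sigma_a, n_groups):
    # Sample standardized offsets and rescale, so the sampler sees
    # unit-scale latents whose geometry does not depend on sigma_a.
    with numpyro.plate("groups", n_groups):
        a_raw = numpyro.sample("a_raw", dist.Normal(0., 1.))
    return numpyro.deterministic("a", mu_a + sigma_a * a_raw)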

Anyway, this is a good problem to look at!

fehiepsi commented on August 18, 2024

I have spent a large amount of time debugging this by disabling adaptation and fixing step_size=0.0004, and even doing inference in float64 by casting init_params to float64. (Not related: @neerajprad we might consider supporting a dtype arg to the initialize_model method so that init_params is cast to float64 automatically, which would also resolve #77; WDYT?) A sketch of that cast follows.
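
The cast itself (the dtype kwarg is only a proposal at this point):

import jax
import jax.numpy as jnp

# Enable 64-bit computation in JAX, then cast the initial params.
jax.config.update("jax_enable_x64", True)
init_params_64 = {name: value.astype(jnp.float64)
                  for name, value in init_params.items()}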

Here are some observations so far:

  • The potential energy (using NumPyro's potential_fn) with the final params from Stan NUTS is much lower than with the final params from NumPyro, so there does indeed seem to be a problem here (a comparison sketch follows).
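
The comparison itself is a two-line check; potential_fn comes from initialize_model, and the two parameter dicts stand in for the final unconstrained draws from each sampler (illustrative names):

# Lower potential energy means higher (unnormalized) posterior density.
pe_stan = potential_fn(stan_final_params)
pe_numpyro = potential_fn(numpyro_final_params)
print("Stan:", pe_stan, "NumPyro:", pe_numpyro)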

Things confirmed to agree with Stan:

  • potential energy evaluation (with and without the Jacobian adjustment)
  • gradient of the potential energy
  • hence, the model is equivalent to the Stan model
  • transforms work as expected, including log_abs_det_jacobian (see the check sketched after this list)
  • so the Uniform prior does not cause the problem
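
The log_abs_det_jacobian check can be done against autodiff; a sketch for a scalar transform, reusing the Uniform transform sketched above (the helper is illustrative, not library code):

import jax
import jax.numpy as jnp

def check_log_abs_det(transform, log_abs_det, x):
    # For a scalar transform f, log|f'(x)| must equal the reported
    # log_abs_det_jacobian at x.
    auto = jnp.log(jnp.abs(jax.grad(transform)(x)))
    assert jnp.allclose(auto, log_abs_det(x), atol=1e-5)

check_log_abs_det(lambda x: to_interval(x, 0., 100.),
                  lambda x: log_abs_det_jacobian(x, 0., 100.), 0.3)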

Things not checked but less likely to cause the bug:

  • the U-turn condition (in both Stan and NumPyro, most trajectories took 1023 steps)

Things that might cause the bug, ordered from least to most likely:

  • momentum generator
  • kinetic energy and its gradient
  • velocity_verlet algorithm (a generic sketch follows this list)
  • biased transition kernel
  • uniform transition kernel
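
For reference, a generic velocity Verlet (leapfrog) step looks like the following; this is a sketch, not NumPyro's implementation:

def velocity_verlet_step(z, r, step_size, grad_pe, inv_mass):
    # Half step on momentum, full step on position, half step on momentum.
    r = r - 0.5 * step_size * grad_pe(z)
    z = z + step_size * inv_mass * r
    r = r - 0.5 * step_size * grad_pe(z)
    return z, r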

fehiepsi commented on August 18, 2024

To replicate the Stan result, we can try replacing the init params with zeros:

import numpy as np  # the original snippet assumes np points at (jax.)numpy

# Zero out every latent variable's initial value, keeping shapes.
zero_params = {name: np.zeros(value.shape) for name, value in init_params.items()}
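
The zero params can then be passed in when running inference; a sketch using the current NumPyro API, which may differ from the version discussed in this thread:

from numpyro.infer import MCMC, NUTS

mcmc = MCMC(NUTS(model), num_warmup=2000, num_samples=2000)
mcmc.run(rng_key, data, init_params=zero_params)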

@neerajprad I think it would be good (and easier for R users) to follow Stan's approach of initializing params within the (-2, 2) interval, but I leave that decision to you. At the least, I think it will do no harm. :)

neerajprad commented on August 18, 2024

Sorry for the late response, and thanks so much for posting all the details about the model debugging.

In Stan, params are initialized uniformly in the (-2, 2) interval on the unconstrained space.

I think that is the reason why we end up seeing values closer to 0 with Stan than with NumPyro. Having uncorrelated initial samples also makes sense. I'm fine with making this change. Do you think we need to support alternate initialization strategies, like sampling from the prior (what we are doing now)?

fehiepsi commented on August 18, 2024

Do you think we need to support alternate initialization strategies, like sampling from the prior (what we are doing now)?

Looks good to me. I just wonder how we can add a default arg/kwarg to initialize_model? I also want to add a dtype kwarg to cast the init params to float64 (because the random samplers return float32 by default).
