ericmjl / bayesian-stats-modelling-tutorial
How to do Bayesian statistical modelling using numpy and PyMC3
License: MIT License
We need to modularize nb1 & 2.
Also need to restructure 3 onwards.
NB3 onwards:
the links in the README pull up HTML pages that seem to be missing images for the models.
Perhaps I could convert them to markdown/latex cells so they'll guarantee rendering?
Love this repo, by the way. Writing a software library (and demos for it) based on my thesis work in inverse problems, and this helps motivate what "good tutorials" can look like.
Both @ericmjl and I are firm believers that Probabilistic Programming has a bright and huge future.
I know other people believe the same. @springcoil has said to me previously that "PP is the new deep learning" and I understand that @twiecki feels similarly.
What I'd like to do here is amass evidence of the bright future of PP and why we think it will garner increasing adoption.
A few things I've thought of:
I appreciate this is very limited!
What other evidence/data is there for the future of PPL?
Note: @ericmjl and I are currently drafting a book proposal for O'Reilly, which motivated this question.
Tagging @fonnesbeck, @ericmjl, @betanalpha, @FrizzleFry, @springcoil, @twiecki, @justinbois, @AllenDowney as you all may have thoughts here. Do feel free to tag anybody else you think may have ideas.
thanks!
There's a chance that I may not make SciPy due to family health reasons and @ericmjl will teach the material I would have.
This issue contains notes for Eric to do so.
I have split NBs 1 & 2 into 2 NBs each:
I recall that we were going to spend 90 minutes on this material and that we would cover NBs 1a & 2b in detail, while covering 1b/2a more cursorily.
TBD w/ @ericmjl & we'll drop more notes in here
hey @ericmjl do you know any easy-ish way to create a colab for this repo?
Ideally, without needing to use google drive or anything like that?
e.g. for a single NB in a repo, i can just use the colab chrome extension -- like i did under the colab badges here
but i can't seem to figure out how to do it easily for a repo.
LMK if you know, bro!
This came out of some discussion with @ericmjl in the hallway track at SciPy 2018.
I had asked during the tutorial about choosing a highly informative prior that assumes a false belief. The widget below visualizes what happens in such a case. We model the highly informative prior with a Gaussian with a small standard deviation. It is modeled after the similar visualization in notebook 2.
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

def gaussian_pdf(x, mu, sd):
    """
    The (unnormalized) Gaussian probability density function. We could import this,
    but the visualization is smoother if we define the function here.
    """
    return np.exp(-np.power(x - mu, 2) / (2 * np.power(sd, 2)))

def plot_posteriors(p=0.6, N=0, mu=0.5, sd=1):
    np.random.seed(42)
    n_successes = np.random.binomial(N, p)
    x = np.linspace(0.01, 0.99, 100)
    prior1 = gaussian_pdf(x, mu, sd)
    likelihood = x**n_successes * (1 - x)**(N - n_successes)
    posterior1 = likelihood * prior1
    posterior1 /= np.max(posterior1)  # so that the peak is always at 1
    plt.plot(x, posterior1, label='Gaussian prior')
    jp = np.sqrt(x * (1 - x))**(-1)  # Jeffreys prior
    posterior2 = likelihood * jp  # posterior w/ Jeffreys prior
    posterior2 /= np.max(posterior2)  # peak always at 1 (not quite correct to do; see below)
    plt.plot(x, posterior2, label='Jeffreys prior')
    plt.legend()
    plt.show()
The visualization can be created with
interact(plot_posteriors, p=(0, 1, 0.01), N=(0, 100), mu=(0, 1, 0.1), sd=(0.01, 1, 0.01));
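An aside on the "(not quite correct to do; see below)" comment: dividing by the maximum makes the two curves visually comparable, but to treat a curve as a density you would normalize by its integral over the grid instead. A minimal sketch on the same grid (the data values here are illustrative, not from the widget):

```python
import numpy as np

x = np.linspace(0.01, 0.99, 100)
N, n_successes = 10, 7
likelihood = x**n_successes * (1 - x)**(N - n_successes)
jp = 1 / np.sqrt(x * (1 - x))  # Jeffreys prior (unnormalized)
posterior = likelihood * jp

# Normalize so the curve integrates to 1 over the grid,
# rather than scaling the peak to 1.
posterior /= np.trapz(posterior, x)
```

With this normalization, the areas under the two posteriors are directly comparable, at the cost of their peaks no longer lining up at 1.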
I also wrote up a bit of explanation:
Here we are parameterizing the Gaussian pdf by (no surprise) the mean (mu) and standard deviation (sd). We can think of the standard deviation here as a measure of certainty in our prior belief that the probability of getting heads is the mean of the Gaussian.
Interesting things to try:
- Set the mean (mu) to 0.2 and the standard deviation (sd) to 0.01. This specifies a prior encoding a strong belief that the probability of flipping heads is 0.2 (which is wrong, assuming that p is still set to 0.6). Now change the number of trials (N) to 100. What happens?
- Now set the standard deviation (sd) to 1. What happens? Why does the posterior corresponding to the Gaussian prior converge to the posterior from the Jeffreys prior?
As per title.
This was seen using conda 4.6.14.
$ conda env create -f environment.yml
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.
Collecting package metadata: ...working... failed
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/msys2/noarch/repodata.json.bz2>
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
If your current network has https://www.anaconda.com blocked, please file
a support request with your network engineering team.
SSLError(MaxRetryError('HTTPSConnectionPool(host=\'repo.anaconda.com\', port=443): Max retries exceeded with url: /pkgs/msys2/noarch/repodata.json.bz2 (Caused by SSLError("Can\'t connect to HTTPS URL because the SSL module is not available."))'))
@ericmjl Today we discussed steps for an HTML version of the tutorial:
Did I get that mostly correct?
Update from @ericmjl:
- Put code into src/bayes_tutorial/<something_appropriate>.py, then import it back into the notebook.
- Use the matplotlibrc file that @hugobowne provided to style all of the plots.
This can be handled post-merge of #5.
I would like a consistent set of model diagrams for each notebook. Time to bust out Illustrator!
I tried your conda instruction (conda env create -f environment.yml -v) and ran into this error:
pyc file failed to compile successfully (run_command failed)
python_exe_full_path: C:\...\envs\bayesian-modelling-tutorial\python.exe
py_full_path: C:\...\envs\bayesian-modelling-tutorial\Lib\site-packages\missingno\utils.py
pyc_full_path: C:\...\envs\bayesian-modelling-tutorial\Lib\site-packages\missingno\__pycache__\utils.cpython-37.pyc
compile rc: 1
compile stdout: "Did not find VS in registry or in VS140COMNTOOLS env var - exiting"
compile stderr: ERROR: The system was unable to find the specified registry key or value.
The above message was repeated several times for some other packages, followed by this at the end:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\...\lib\site-packages\conda\core\link.py", line 558, in _execute
cls._execute_post_link_actions(pkg_idx_tracked, axngroup)
File "C:\...\lib\site-packages\conda\core\link.py", line 664, in _execute_post_link_actions
reverse_excs,
conda.CondaMultiError: post-link script failed for package defaults::qt-5.9.7-vc14h73c81de_0
running your command again with `-v` will provide additional information
location of failed script: C:\...\envs\bayesian-modelling-tutorial\Scripts\.qt-post-link.bat
I am using conda 4.6.14 on Windows 10, with Git Bash shell.
p.s. I also tried a combo of creating a basic conda env and then pip-installing some stuff into it (several variants of that combo), but even when I managed to start the notebook, the import step crashed the notebook kernel with a DLL error.
It's come up in other tutorials, but maybe having a docker container could be good for getting users up-and-running.
I ran into a problem with a similar traceback to conda/conda#8171. Despite that issue being marked as resolved, and although I am using the latest release of conda (4.6.14), it still happened.
Tried some other things I found on the Internet, like uninstalling and re-installing pyzmq and jupyter with either conda or pip. Didn't help.
In my case, the activate command gave some warnings about not being able to find vc/vcvarsall.bat, which might or might not be related, as I could still use the environment.
At this point, I have wasted enough time on this, and thus my workaround is to re-create the tutorial environment without jupyter, jupyterlab, and nodejs, but add ipython to the YML, then run the examples inside a good ol' IPython session.
Title: Bayesian Data Science: Probabilistic Programming
Abstract: This tutorial will introduce you to the wonderful world of Bayesian data science through the lens of probabilistic programming. In the first hour of the tutorial, we will reintroduce the key concept of probability distributions via hacker statistics, hands-on simulation, and telling stories of the data-generating processes. We will also cover the basics of joint and conditional probability, Bayes' rule, and Bayesian inference. In the latter 2/3 of the tutorial, we will use a series of models to build your familiarity with PyMC3, showcasing how to perform the foundational inference tasks of group comparison and arbitrary curve regression. By the end of this tutorial, you will be equipped with a solid grounding in Bayesian inference, able to write arbitrary models, and have experienced a basic model-checking workflow.
Notes to registered attendees:
Dear all,
We (Hugo and myself) have reviewed the tutorial schedule, and after discussing with the SciPy tutorial committee and the instructors before and after us in the schedule, we have proposed to deliver a tutorial that focuses on probabilistic programming instead of simulation. The tutorial committee has given us the green light for this change, and we thought it would be only right to inform you of the change and guide you on how to best prepare for the updated tutorial.
The motivation is as follows: Our simulation-based tutorial covers the basic concepts of Bayesian inference through the lens of simulation, which overlaps with Allen Downey's content. In addition, the tutorial after us covers Bayesian model checking using ArviZ, for which it would be very helpful to have had a longer session on probabilistic programming first. Taken together, we envision this as a "Bayes Track"-ish series at SciPy, where our tutorials can exist independently (because we do have enough recap on our own), but complement each other very well nonetheless; the sections that overlap hopefully give sufficient repetition for learning without being overly repetitive.
As such, this tutorial requires basic knowledge in Bayesian inference and probability. You can gain this knowledge for SciPy 2019 in one of two ways. You may either
Attendance at Allen's tutorial is not a mandatory prerequisite, but is highly encouraged; that said, we will include enough recap to ground our material. Additionally, if you would like a stronger grounding in probabilistic programming prior to attending the ArviZ tutorial led by Ravin Kumar and Colin Carroll, our tutorial would be a helpful starting point, though they will also cover the basics of probabilistic programming in their tutorial. With all that said, we think attending all three would be extremely beneficial for your knowledge development in Bayesian inference, and there may even be a gift from the instructors if you've braved through all three this year!
If you have questions regarding the tutorial, please feel free to reach out to us. Our preferred method of communication is on the GitHub issue tracker, where questions we have addressed before can be publicly viewable by others and searchable.
Cheers,
Eric & Hugo
@hugobowne, if you look at this commit, you will find that in my branch, I have added one row of data to the finches 2012 dataset.
This simulates the discovery of a new species of finch, for which the only thing we know is that it is genetically related. There is a teaching moment baked into this: if we did not do hierarchical modelling, where we assumed that the new species of finch was genetically related to the known finches, then on the basis of a single observation, we would get posterior estimates for beak length that were unreasonable.
More details will be found in the notebooks that I'm finishing up right now. Working through this particular example just reinforced for me the elegance of using hierarchical modelling where relatedness is a reasonable assumption.
Anyways, that was just a longwinded way of informing you of the minor addition made to the dataset, and I just didn't want to give you unnecessary surprises there!
In the ECDF examples, maybe
plt.plot(x, y, marker='.', linestyle='none');
can be replaced with
plt.scatter(x, y)
?
Title.
I will handle this one.
I'm wrestling with one question right now: I really, really want to include one notebook which is very, very unstructured and open-ended. (This would probably be the last nb.) This simulates the scenario that most of us will be in when faced with a novel modelling question.
Yet, given that the class probably only will have had 2 hours with PyMC3, I'm also worried that this would be a bit "too much" for them.
Thus, I have the following hypothesis:
Do you think this is a reasonable hypothesis? Would it be worthwhile to try this out during the tutorial?
What is your "estimate"?
Leave links to snapshot versions of old tutorials in the README.
Following up on the email thread.
Collating a running list of stylistic issues that I see as they crop up.
For each notebook that I've created:
- load_xyz()
- utils.despine_traceplot(traces)
- utils.despine(ax)
Things to harmonize between Hugo's and my notebooks:
- 01-..., 02-..., 03-..., etc.
Will append to this comment as more things crop up.
Hi,
I am trying to set up the environment using a Binder session for our class, and I get the error below.
"Failed to connect to event stream."
Has anyone else had this issue? Please advise.
Thank you.
PS.
Out of curiosity, what tool did you use to make the red distribution vector art? :)
As seen in problem 1, third notebook.
Eric:
Very much enjoy your tutorials on Bayesian inference and PyMC3.
There was something I noticed in the repo files that confused me a little. The ecdf function you defined in the notebook (01a-instructor-probability-simulation.ipynb) differs from the ECDF function in the utils.py script. The one in the notebook is below, and gives me the results I expect:
def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""
    # Number of data points
    n = len(data)
    # x-data for the ECDF
    x = np.sort(data)
    # y-data for the ECDF
    y = np.arange(1, n + 1) / n
    return x, y
However, the ECDF function in the utils.py script (below) does not give me the results I expect:
def ECDF(data):
    x = np.sort(data)
    y = np.cumsum(x) / np.sum(x)
    return x, y
Is there an error in the utils.py script?
Thanks,
Tom
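To make the difference concrete, here is a side-by-side sketch (ECDF_utils is a renamed stand-in for the utils.py version so both can coexist; the data values are made up):

```python
import numpy as np

def ecdf(data):
    """Standard ECDF: y[i] is the fraction of points <= the i-th sorted value."""
    x = np.sort(data)
    y = np.arange(1, len(data) + 1) / len(data)
    return x, y

def ECDF_utils(data):
    """The utils.py version: a normalized cumulative sum of the sorted VALUES.
    This weights by magnitude rather than by count, so it is not an ECDF."""
    x = np.sort(data)
    y = np.cumsum(x) / np.sum(x)
    return x, y

data = np.array([1.0, 2.0, 3.0, 4.0])
counts_y = ecdf(data)[1]        # fractions of counts: 0.25, 0.5, 0.75, 1.0
values_y = ECDF_utils(data)[1]  # fractions of the running sum: 0.1, 0.3, 0.6, 1.0
```

The two curves disagree on almost any dataset, which would explain the unexpected results described above.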
This tutorial is split into 2 four-hour segments. The first segment deals with the basics of probability. The second deals with probabilistic programming and model formulation.
This is a very hands-on tutorial, including ample time for exploration and discovery.
At the end of Part 1 of this tutorial, participants will be able to:
At the end of Part 2 of this tutorial, participants will be able to:
how do you like to describe HPD @ericmjl @justinbois @betanalpha ?
For people using Jupyter lab, we'll need to instruct them on how to get ipywidgets working in lab notebooks. see here:
https://ipywidgets.readthedocs.io/en/latest/user_install.html#installing-the-jupyterlab-extension
I use them here: https://github.com/ericmjl/bayesian-stats-modelling-tutorial/blob/master/notebooks/02-Instructor-Parameter_estimation_hypothesis_testing.ipynb
it boils down to running the following in a terminal (you'll need node installed)
jupyter labextension install @jupyter-widgets/jupyterlab-manager
The command is taken from here.
2nd note
Also, I thought we had a section in the README on opening a NB or something like point 4 in this README:
https://github.com/datacamp/datacamp_facebook_live_nlp
to do:
Could there be a restructuring of NB2's pre-PPL content, such that the "distributions as 1st-class citizens" could be emphasized? I have a hunch that this might help with learning.
I suspect that maybe using scipy.stats could help? It helps reinforce the idea that:
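For instance, a sketch of the "distribution as an object" idea (assuming scipy is installed; the Beta(2, 2) parameters are arbitrary):

```python
from scipy import stats

# A distribution is a first-class object: freeze it once with its parameters...
coin_belief = stats.beta(a=2, b=2)

# ...then ask the same object for everything downstream.
mean = coin_belief.mean()      # 0.5
density = coin_belief.pdf(0.5) # density at a point: 1.5 for Beta(2, 2)
prob = coin_belief.cdf(0.5)    # P(X <= 0.5) = 0.5 by symmetry
samples = coin_belief.rvs(size=1000, random_state=42)  # draw samples
```

Passing one distribution object around, rather than loose parameters and formulas, is the same mental model that PyMC3 random variables build on.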
Following the local install instructions in the README, I end up with these errors:
(base) C:\folder_path\bayesian-stats-modelling-tutorial>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: | b'Exception while loading config file C:\\me\\.jupyter\\jupyter_notebook_config.py\n Traceback (most recent call last):\n File "C:\\me\\Anaconda3\\envs\\bayesian-modelling-tutorial\\lib\\site-packages\\traitlets\\config\\application.py", line 563, in _load_config_files\n config = loader.load_config()\n File "C:\\me\\Anaconda3\\envs\\bayesian-modelling-tutorial\\lib\\site-packages\\traitlets\\config\\loader.py", line 457, in load_config\n self._read_file_as_dict()\n File "C:\\me\\Anaconda3\\envs\\bayesian-modelling-tutorial\\lib\\site-packages\\traitlets\\config\\loader.py", line 489, in _read_file_as_dict\n py3compat.execfile(conf_filename, namespace)\n File "C:\\me\\Anaconda3\\envs\\bayesian-modelling-tutorial\\lib\\site-packages\\ipython_genutils\\py3compat.py", line 198, in execfile\n exec(compiler(f.read(), fname, \'exec\'), glob, loc)\n File "C:\\me\\.jupyter\\jupyter_notebook_config.py", line 1\n c.JupyterLabTemplates.template_dirs = [\'C:\\me\\Anaconda3\\envs\\dsml\\share\\jupyter\\notebook_templates\']\n ^\n SyntaxError: (unicode error) \'unicodeescape\' codec can\'t decode bytes in position 2-3: truncated \\UXXXXXXXX escape\nEnabling notebook extension jupyter-js-widgets/extension...\n - Validating: ok\n'
done
#
# To activate this environment, use
#
# $ conda activate bayesian-modelling-tutorial
#
# To deactivate an active environment, use
#
# $ conda deactivate
What did I miss?
In the second notebook, when participants are introduced to new terms like "prior", it is easy to get lost without plot labels.
This should be done pre-merge on PR #5.
Please view the following as constructive criticism!
I think there are three main concepts you can get to in 4 hours. What a probability distribution is, the connection between the posterior distribution (what you want) and the likelihood and prior, and that you can sample out of the posterior (as specified by the likelihood and prior) using MCMC. The logical connection between the prior and posterior as the data updates belief is an important point. I think the hacker stats style sampling and the mathematical expressions/plots might get in the way of a more streamlined message.
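The posterior-from-likelihood-and-prior connection described above can be compressed into a grid-approximation sketch (a minimal illustration with made-up data, not material from the tutorial):

```python
import numpy as np

# Grid of candidate values for the probability of heads.
p_grid = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(p_grid)  # flat prior: every p equally believable a priori

# Observe 7 heads in 10 flips.
heads, flips = 7, 10
likelihood = p_grid**heads * (1 - p_grid)**(flips - heads)

# Posterior is proportional to likelihood times prior; normalize on the grid.
posterior = likelihood * prior
posterior /= posterior.sum()

# Belief concentrates around the observed frequency (0.7 with a flat prior).
p_best = p_grid[np.argmax(posterior)]
```

Replacing `posterior.sum()` normalization with MCMC sampling is then the conceptual jump to PyMC3.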
American spelling is "modeling" with a single "l". 😬
I have set up according to the instructions given in the GitHub repo on three computers. On two of them, the tutorials work; on the third, there is a fatal error importing pymc3. All three computers are up-to-date, fully patched Windows 10 boxes (I can try to provide more info if needed!). The problem is in the 02-Instructor-Parameter_estimation_hypothesis_testing notebook and happens when running the import block at the top.
On two of the computers, running this first block gives the warning message:
WARNING (theano.configdefaults): g++ not available, if using conda:
`conda install m2w64-toolchain`
C:\......\anaconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\configdefaults.py:560:
UserWarning: DeprecationWarning: there is no c++ compiler.This is
deprecated and with Theano 0.11 a c++ compiler will be mandatory
warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be
unable to execute optimized C-implementations (for both CPU and GPU)
and will default to Python implementations. Performance will be
severely degraded. To remove this warning, set Theano flags cxx to an
empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation
for BLAS functions.
But things still work! All of the code blocks in the notebook execute, if slowly (for NUTS).
On the third computer, which was set up using the exact same steps, this is the error message that results (and the error is FATAL for using pymc3):
You can find the C code in this temporary file: C:\......\AppData\Local\Temp\theano_compilation_error__knt3a6r
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\lazylinker_c.py in <module>
80 version,
---> 81 actual_version, force_compile, _need_reload))
82 except ImportError:
ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\lazylinker_c.py in <module>
104 version,
--> 105 actual_version, force_compile, _need_reload))
106 except ImportError:
ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
<ipython-input-1-3441c4f46c01> in <module>
4 import seaborn as sns
5 import matplotlib.pyplot as plt
----> 6 import pymc3 as pm
7 from ipywidgets import interact
8 import arviz as az
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\pymc3\__init__.py in <module>
3
4 from .blocking import *
----> 5 from .distributions import *
6 from .glm import *
7 from . import gp
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\pymc3\distributions\__init__.py in <module>
----> 1 from . import timeseries
2 from . import transforms
3
4 from .continuous import Uniform
5 from .continuous import Flat
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\pymc3\distributions\timeseries.py in <module>
----> 1 import theano.tensor as tt
2 from theano import scan
3
4 from pymc3.util import get_variable_name
5 from .continuous import get_tau_sigma, Normal, Flat
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\__init__.py in <module>
108 object2, utils)
109
--> 110 from theano.compile import (
111 SymbolicInput, In,
112 SymbolicOutput, Out,
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\compile\__init__.py in <module>
10 from theano.compile.function_module import *
11
---> 12 from theano.compile.mode import *
13
14 from theano.compile.io import *
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\compile\mode.py in <module>
9 import theano
10 from theano import gof
---> 11 import theano.gof.vm
12 from theano import config
13 from six import string_types
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\vm.py in <module>
672 if not theano.config.cxx:
673 raise theano.gof.cmodule.MissingGXX('lazylinker will not be imported if theano.config.cxx is not set.')
--> 674 from . import lazylinker_c
675
676 class CVM(lazylinker_c.CLazyLinker, VM):
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\lazylinker_c.py in <module>
138 args = cmodule.GCC_compiler.compile_args()
139 cmodule.GCC_compiler.compile_str(dirname, code, location=loc,
--> 140 preargs=args)
141 # Save version into the __init__.py file.
142 init_py = os.path.join(loc, '__init__.py')
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\cmodule.py in compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, py_module, hide_symbols)
2409 # difficult to read.
2410 raise Exception('Compilation failed (return status=%s): %s' %
-> 2411 (status, compile_stderr.replace('\n', '. ')))
2412 elif config.cmodule.compilation_warning and compile_stderr:
2413 # Print errors just below the command line.
...Windows-10-10.0.18362-SP0-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-3.7.6-64/lazylinker_ext/mod.cpp:976: undefined reference to `__imp__Py_TrueStruct'
(several more undefined references to `__imp__Py_NoneStruct' follow)
collect2.exe: error: ld returned 1 exit status
At this point, code not using pymc3 works, but all pymc3 blocks crash.
Additionally, all three of the computers have trouble starting the environment for this tutorial (at SciPy) and print the following at the shell:
(base) D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>conda activate bayesian-modelling-tutorial
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET DISTUTILS_USE_SDK=1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET MSSdk=1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET "VS_VERSION=15.0"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET "VS_MAJOR=15"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET "VS_YEAR=2017"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "MSYS2_ARG_CONV_EXCL=/AI;/AL;/OUT;/out"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "MSYS2_ENV_CONV_EXCL=CL"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "PY_VCRUNTIME_REDIST=\bin\vcruntime140.dll"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "CXX=cl.exe"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "CC=cl.exe"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "VSINSTALLDIR="
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>for /F "usebackq tokens=*" %i in (`vswhere.exe -nologo -products * -version [15.0,16.0) -property installationPath`) do (set "VSINSTALLDIR=%i\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "" (for /F "usebackq tokens=*" %i in (`vswhere.exe -nologo -products * -requires Microsoft.VisualStudio.Component.VC.v141.x86.x64 -property installationPath`) do (set "VSINSTALLDIR=%i\" ) )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>IF NOT "" == "" (
set "INCLUDE=;"
set "LIB=;"
set "CMAKE_PREFIX_PATH=;"
)
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>call :GetWin10SdkDir
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>call :GetWin10SdkDirHelper HKLM\SOFTWARE\Wow6432Node 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE\Wow6432Node 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKLM\SOFTWARE 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 exit /B 1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>exit /B 0
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>for /F %i in ('dir /ON /B "\include\10.*"') DO (SET WindowsSDKVer=%~i )
The system cannot find the file specified.
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 (echo "Didn't find any windows 10 SDK. I'm not sure if things will work, but let's try..." ) else (echo Windows SDK version found as: "" )
Windows SDK version found as: ""
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>IF "win-64" == "win-64" (
set "CMAKE_GEN=Visual Studio 15 2017 Win64"
set "BITS=64"
) else (
set "CMAKE_GEN=Visual Studio 15 2017"
set "BITS=32"
)
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>pushd C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\
The system cannot find the path specified.
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>CALL "VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.16
The system cannot find the path specified.
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>popd
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>IF "" == "" SET "CMAKE_GENERATOR=Visual Studio 15 2017 Win64"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>call :GetWin10SdkDirHelper HKLM\SOFTWARE\Wow6432Node 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE\Wow6432Node 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKLM\SOFTWARE 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE 1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 exit /B 1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>exit /B 0
(bayesian-modelling-tutorial) D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>
But for emphasis: this shell dump occurs on all 3 computers, and two of those computers work, showing only the warning at the start! Based on this output, it seems there may be a mandatory dependency on MS Visual Studio being installed in some particular location (on one of these machines, VS is present but not found!). On the working machines, pymc3 appears to fall back to the pure-Python implementation; for some reason it will not do this on the third, failing computer.
I hope that these hidden dependencies can be fixed or spelled out, as we were unable to continue with the tutorial; this killed us completely. We can try to watch the replay later if we can get things working.
I am happy to provide more information if I can!
How would we extract the posterior distribution information from a theano.tensor.var.TensorVariable to numpy arrays?
The current method makes it hard to use this information as part of a bigger codebase where you might want the posterior distribution (or, say, just the mean and HPD bounds).
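One note that may help: in PyMC3, once sampling has run, indexing the trace by variable name (e.g. trace["p"]) returns a plain numpy array of draws, which can be fed into any downstream code. Here is a sketch of summarizing raw draws into a mean and an HPD-style interval (hpd_interval is a hypothetical helper, not a library function, and the draws are simulated stand-ins for a real trace):

```python
import numpy as np

def hpd_interval(draws, cred_mass=0.94):
    """Shortest interval containing cred_mass of the posterior draws
    (an HPD-style summary computed from a plain 1-D array)."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    n_in = int(np.floor(cred_mass * n))
    # Width of every candidate interval that contains n_in points.
    widths = sorted_draws[n_in:] - sorted_draws[: n - n_in]
    best = int(np.argmin(widths))
    return sorted_draws[best], sorted_draws[best + n_in]

# Stand-in for trace["p"]: just a numpy array of posterior draws.
rng = np.random.default_rng(42)
draws = rng.normal(0.6, 0.05, size=5000)

mean = draws.mean()
lo, hi = hpd_interval(draws)
```

Because everything after sampling is plain numpy, these summaries drop straight into larger pipelines.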
Moving this conversation over from this Twitter thread to an issue so others can track.
I'm working through all your nbs to prep for #pycon2019 and ran into an error in the 2nd notebook coming from the pymc3 import. Wanted to let y'all know, as I likely won't be the only one to hit this. I've posted a screenshot of the error below; it appears to be coming from pymc3, specifically the C compiler, from what I can tell from my Google searches.
P.S. as requested by @ericmjl -- I did check and this error does NOT appear when using the binder notebooks. So that's a plus!
@hugobowne I'm writing here some post-SciPy 2018 thoughts, gathered from my subjective view + participants' provided feedback in the feedback form.
This was not just the feedback provided in person, but also the written feedback; numerical ratings are available in this report on TypeForm.
The amount of material covered in the first two hours might be better expounded if we were to propose this as a two-part tutorial, each with a different audience focus, with you leading the theory and hacker-statistics section, and me leading the probabilistic programming session.
I came to this tentative conclusion only after reflecting on the tutorial's progress today. Here are my observations and thoughts so far:
For those in the class that needed the in-depth introduction to probability, they really enjoyed your portion of the class. I think if we expanded your section to a full standalone tutorial, it might help learners get into the "distributional thinking" that is needed to work with a probabilistic programming language.
Now that I've had some space away from the material, I think that, to introduce joint and conditional probability, starting with a two-layer, binary-tree-based probability example (e.g. the cookie jar problem) might be more useful for the probability-novice audience group, while Darwin's finches will be more useful for audience group 1 (probability intermediates/experienced). The cookie jar problem gives a very intuitive "path-tracing" route to joint and conditional probability, and might help bridge "algebra to pictures" for beginners (of whom we had quite a number in the SciPy crowd). On the other hand, it might not be complicated enough to be a meaningful refresher for audience group 1; conditional and joint distributions with the Darwin's finches dataset might be a better fit there.
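For concreteness, here is the path-tracing arithmetic for the cookie jar problem, using the bowl contents from Allen Downey's Think Bayes as the assumed setup (Bowl 1: 30 vanilla / 10 chocolate; Bowl 2: 20 / 20):

```python
# Cookie jar problem, two-layer binary tree (assumed Think Bayes setup):
# Bowl 1 holds 30 vanilla + 10 chocolate cookies; Bowl 2 holds 20 + 20.
priors = {"bowl1": 0.5, "bowl2": 0.5}          # first layer: choose a bowl
likelihood_vanilla = {"bowl1": 30 / 40,        # second layer: draw a cookie
                      "bowl2": 20 / 40}

# Joint probability of each (bowl, vanilla) path = product along the path.
joint = {b: priors[b] * likelihood_vanilla[b] for b in priors}

# Conditional P(bowl | vanilla): renormalize over the paths that are
# consistent with the observation "the cookie was vanilla".
total = sum(joint.values())
posterior = {b: joint[b] / total for b in joint}

print(posterior)  # bowl1 -> 0.6, bowl2 -> 0.4
```

The joint probabilities are literally products along each path of the two-layer tree, and conditioning is just renormalizing over the surviving paths, which is exactly the "algebra to pictures" bridge described above.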
(This is admittedly longer because I was more involved in it.)
I think the big lesson I learned from leading this section today was that a lecture + discussion format can still yield a meaningful tutorial experience for participants, but it isn't easily made inclusive of the whole class unless group discussion time is intentionally scheduled in. Yet group discussion takes time, and can eat into the opportunity for in-class practice with the probabilistic programming exercises.
I think for the next iteration, I would redo the learning objectives of this section. Thinking more specifically about assumed knowledge, I would most ideally like to build on top of what you taught in Part 1, listed as follows:
numpy
Then, I would focus on the core, minimal set of Bayesian workflow steps, as described in an updated set of learning objectives. By the end of the workshop, a participant should be able to:
The model building process is iterative, and should involve more of the kind of group discussion that occurred.
To be more inclusive, I would probably schedule more "talk with your neighbours" time, followed by "share your discussion with the big group". This would allow the shyer participants to engage in the class.
I have feedback that the model diagrams (red + white distributions) were helpful for thinking through the model-building process. I also had feedback that they should be introduced somewhat earlier in the class, though I'm not sure of the utility of this.
checkenv.py
The latest version of conda on macOS did not install the nodejs package
as an implicit dependency of the packages in environment.yml.
If you install it "manually" with conda install nodejs
after the env is created, and then run the nbextensions install, all the widgets work.
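For anyone hitting this, the manual steps described above look roughly like the following (the exact extension-install command depends on whether you use JupyterLab or the classic notebook; treat these as an assumed sketch):

```shell
# Assumed manual fix: conda did not pull in nodejs implicitly, so install
# it after activating the environment created from environment.yml.
conda install nodejs

# Then (re)run the widget extension install. For JupyterLab (pre-3.x):
jupyter labextension install @jupyter-widgets/jupyterlab-manager
```

With that, restarting Jupyter should get the ipywidgets-based cells rendering.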
👋 @ericmjl (cc @hugobowne as Eric is busy for good reason, congrats Eric!) would you be open to having the CI revamped so that it works again, redone in GitHub Actions? Or do you strongly prefer to have it done in Azure Pipelines?