PyMC + ArviZ

Lecture 23

Dr. Colin Rundel

pymc

PyMC is a probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods.

  • Modern - Includes state-of-the-art inference algorithms, including MCMC (NUTS) and variational inference (ADVI).

  • User friendly - Write your models using friendly Python syntax. Learn Bayesian modeling from the many example notebooks.

  • Fast - Uses PyTensor as its computational backend to compile to C and JAX, run your models on the GPU, and benefit from complex graph-optimizations.

  • Batteries included - Includes probability distributions, Gaussian processes, ABC, SMC and much more. It integrates nicely with ArviZ for visualizations and diagnostics, as well as Bambi for high-level mixed-effect models.

  • Community focused - Ask questions on discourse, join MeetUp events, follow us on Twitter, and start contributing.


import pymc as pm

ArviZ

ArviZ is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, sample diagnostics, model checking, and comparison.

  • Interoperability - Integrates with all major probabilistic programming libraries: PyMC, CmdStanPy, PyStan, Pyro, NumPyro, and emcee.

  • Large Suite of Visualizations - Provides over 25 plotting functions for all parts of Bayesian workflow: visualizing distributions, diagnostics, and model checking. See the gallery for examples.

  • State of the Art Diagnostics - Latest published diagnostics and statistics are implemented, tested and distributed with ArviZ.

  • Flexible Model Comparison - Includes functions for comparing models with information criteria, and cross validation (both approximate and brute force).

  • Built for Collaboration - Designed for flexible cross-language serialization using netCDF or Zarr formats (a short serialization sketch follows the import below). ArviZ also has a Julia version that uses the same data schema.

  • Labeled Data - Builds on top of xarray to work with labeled dimensions and coordinates.


import arviz as az
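
To illustrate the serialization bullet above: an InferenceData object can be written to and read back from netCDF. This is only a sketch - the filename is hypothetical, and trace stands in for any fitted InferenceData object (such as the traces produced later in this lecture).

# Hypothetical sketch: round-trip an InferenceData object through netCDF
trace.to_netcdf("trace.nc")          # write all groups (posterior, sample_stats, ...) to disk
trace2 = az.from_netcdf("trace.nc")  # reload as an arviz.InferenceData object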

Some history

Model basics

All models are derived from PyMC's Model() class. Unlike the libraries we have seen previously, PyMC makes heavy use of Python's context managers - the with statement is used to add components to a model.

with pm.Model() as norm:
  x = pm.Normal("x", mu=0, sigma=1)

Note that with blocks do not have their own scope - variables defined inside are added to the enclosing scope (be careful about overwriting other variables).

x

\(\text{x} \sim \operatorname{Normal}(0,~1)\)

type(x)
pytensor.tensor.variable.TensorVariable

Using a component without a context

x = pm.Normal("x", mu=0, sigma=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/.pyenv/versions/3.12.3/lib/python3.12/site-packages/pymc/distributions/distribution.py:481, in Distribution.__new__(cls, name, rng, dims, initval, observed, total_size, transform, default_transform, *args, **kwargs)
    479     from pymc.model import Model
--> 481     model = Model.get_context()
    482 except TypeError:

File ~/.pyenv/versions/3.12.3/lib/python3.12/site-packages/pymc/model/core.py:514, in Model.get_context(cls, error_if_none, allow_block_model_access)
    513 if model is None and error_if_none:
--> 514     raise TypeError("No model on context stack")
    515 return model

TypeError: No model on context stack

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 x = pm.Normal("x", mu=0, sigma=1)

File ~/.pyenv/versions/3.12.3/lib/python3.12/site-packages/pymc/distributions/distribution.py:483, in Distribution.__new__(cls, name, rng, dims, initval, observed, total_size, transform, default_transform, *args, **kwargs)
    481     model = Model.get_context()
    482 except TypeError:
--> 483     raise TypeError(
    484         "No model on context stack, which is needed to "
    485         "instantiate distributions. Add variable inside "
    486         "a 'with model:' block, or use the '.dist' syntax "
    487         "for a standalone distribution."
    488     )
    490 if not isinstance(name, string_types):
    491     raise TypeError(f"Name needs to be a string but got: {name}")

TypeError: No model on context stack, which is needed to instantiate distributions. Add variable inside a 'with model:' block, or use the '.dist' syntax for a standalone distribution.

Random Variables

pm.Normal() is an example of a PyMC distribution. Distributions are used to construct models and are implemented using the TensorVariable class, which is used for all of the built-in distributions (and can be used to create custom distributions). Generally you will not be interacting with these objects directly, but they do have some useful methods and attributes:

type(norm.x)
pytensor.tensor.variable.TensorVariable
norm.x.owner.op
NormalRV(name=normal,signature=(),()->(),dtype=float64,inplace=False)
pm.draw(norm.x)
array(0.19112)

Standalone RVs

If you really want to construct a TensorVariable outside of a model, it can be done via the dist() method of each distribution.

z = pm.Normal.dist(mu=1, sigma=2, shape=[2,3])
z
normal_rv{"(),()->()"}.out
pm.draw(z)
array([[ 2.36473,  2.77069, -3.95415],
       [-0.54568,  0.05946, -0.46267]])

Modifying models

Because of this construction, it is possible to add additional components to an existing (named) model via subsequent with statements (only the first requires pm.Model()).

with norm:
  y = pm.Normal("y", mu=x, sigma=1, shape=3)
norm.basic_RVs
[x ~ Normal(0, 1), y ~ Normal(x, 1)]

Variable hierarchy

Note that we defined \(y|x \sim \mathcal{N}(x, 1)\), so what is happening when we use pm.draw(norm.y)?

pm.draw(norm.y)
array([ 0.93819, -1.54829, -2.31653])
obs = pm.draw(norm.y, draws=1000) 
obs
array([[-0.24739,  1.96912,  0.92891],
       [-0.94138, -1.1763 ,  1.52653],
       [-0.60988, -0.95977, -1.36152],
       ...,
       [-0.57515,  1.24518,  0.95842],
       [ 3.64309,  2.79414,  4.08703],
       [ 0.93813,  1.46746,  1.99498]], shape=(1000, 3))
np.mean(obs)
np.float64(0.08511372203814124)
np.var(obs)
np.float64(2.0135529900011018)
np.std(obs)
np.float64(1.4189971775874333)

Each time we ask for a draw of y, PyMC first draws x for us, so these are marginal (not conditional) draws of y - which is why the sample variance above is close to \(\text{Var}(x) + 1 = 2\) rather than 1.
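
One way to see this is to draw x and y in a single call (pm.draw also accepts a list of variables). As a rough sketch, the three entries of the y draw should then be scattered around the accompanying x draw:

x_draw, y_draw = pm.draw([norm.x, norm.y], random_seed=1234)
# x_draw is a scalar; the three entries of y_draw are each drawn from Normal(x_draw, 1)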

Beta-Binomial model

We will now build a basic model where we know what the solution should look like and compare the results.

with pm.Model() as beta_binom:
  p = pm.Beta("p", alpha=10, beta=10)
  x = pm.Binomial("x", n=20, p=p, observed=5)
beta_binom.basic_RVs
[p ~ Beta(10, 10), x ~ Binomial(20, p)]
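
Because the Beta prior is conjugate to the Binomial likelihood, this posterior is available in closed form: with a \(\operatorname{Beta}(10, 10)\) prior and \(x = 5\) successes out of \(n = 20\) trials,

\[ p \mid x \sim \operatorname{Beta}(10 + 5,\ 10 + 15) = \operatorname{Beta}(15, 25), \]

which has posterior mean \(15/40 = 0.375\) - the reference value used in the posterior plot below.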

In order to sample from the posterior we add a call to sample() within the model context.

with beta_binom:
  trace = pm.sample(random_seed=1234, progressbar=False)

pm.sample() results

print(trace)
Inference data with groups:
    > posterior
    > sample_stats
    > observed_data
print(type(trace))
<class 'arviz.data.inference_data.InferenceData'>

Xarray - N-D labeled arrays and datasets in Python

Xarray (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures.

Xarray is inspired by and borrows heavily from pandas, the popular data analysis package focused on labelled tabular data. It is particularly tailored to working with netCDF files, which were the source of xarray’s data model, and integrates tightly with dask for parallel computing.

Digging into trace

print(trace.posterior)
<xarray.Dataset> Size: 40kB
Dimensions:  (chain: 4, draw: 1000)
Coordinates:
  * chain    (chain) int64 32B 0 1 2 3
  * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
Data variables:
    p        (chain, draw) float64 32kB 0.3347 0.3435 0.2629 ... 0.3276 0.3486
Attributes:
    created_at:                 2025-04-11T13:49:19.633849+00:00
    arviz_version:              0.21.0
    inference_library:          pymc
    inference_library_version:  5.22.0
    sampling_time:              0.2469632625579834
    tuning_steps:               1000
print(trace.posterior["p"].shape)
(4, 1000)
print(trace.sel(chain=0).posterior["p"].shape)
(1000,)
print(trace.sel(draw=slice(500, None, 10)).posterior["p"].shape)
(4, 50)
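
Since the posterior is stored as an xarray Dataset, the usual xarray operations apply as well. For example (a small sketch), the chain and draw dimensions can be collapsed into a single sample dimension:

print(trace.posterior["p"].stack(sample=("chain", "draw")).shape)
# expect (4000,) - 4 chains x 1000 draws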

As a DataFrame

Posterior values, or subsets, can be converted to DataFrames via the to_dataframe() method

trace.posterior.to_dataframe()
                   p
chain draw
0     0     0.334673
      1     0.343498
      2     0.262910
      3     0.346714
      4     0.288526
...              ...
3     995   0.421379
      996   0.432828
      997   0.327570
      998   0.327570
      999   0.348614

4000 rows × 1 columns

trace.posterior["p"][0,:].to_dataframe()
      chain         p
draw
0         0  0.334673
1         0  0.343498
2         0  0.262910
3         0  0.346714
4         0  0.288526
...     ...       ...
995       0  0.226934
996       0  0.483832
997       0  0.251825
998       0  0.486013
999       0  0.282455

1000 rows × 2 columns

Traceplots with ArviZ

az.plot_trace(trace)
plt.show()

Posterior plot with ArviZ

az.plot_posterior(trace, ref_val=[15/40])
plt.show()

PyMC vs Theoretical
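
Here we compare the PyMC posterior samples with the theoretical \(\operatorname{Beta}(15, 25)\) posterior derived earlier. A minimal sketch of how such a comparison could be drawn, assuming numpy, matplotlib, and scipy are available:

from scipy import stats

post_p = trace.posterior["p"].values.flatten()   # pool all chains and draws
grid = np.linspace(0, 1, 501)

plt.figure(layout="constrained")
plt.hist(post_p, bins=50, density=True, alpha=0.5, label="PyMC posterior draws")
plt.plot(grid, stats.beta(15, 25).pdf(grid), c="red", lw=2, label="Beta(15, 25) density")
plt.legend()
plt.show()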

Autocorrelation plots

az.plot_autocorr(trace, grid=(2,2), max_lag=20)
plt.show()

Forest plots

az.plot_forest(trace)
plt.show()

Other useful diagnostics

Standard MCMC diagnostic statistics are available via summary() from ArviZ

az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
p 0.376 0.077 0.236 0.52 0.002 0.001 1757.0 2546.0 1.0

Individual methods are also available for each statistic:

print(az.ess(trace, method="bulk"))
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    p        float64 8B 1.757e+03
print(az.ess(trace, method="tail"))
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    p        float64 8B 2.546e+03
print(az.rhat(trace))
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    p        float64 8B 1.0
print(az.mcse(trace))
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    p        float64 8B 0.001843

Demo 1 - Linear regression

We want to fit a linear regression model to the following synthetic data:

np.random.seed(1234)
n = 11; m = 6; b = 2
x = np.linspace(0, 1, n)
y = m*x + b + np.random.randn(n)

Model

with pm.Model() as lm:
  m = pm.Normal('m', mu=0, sigma=50)
  b = pm.Normal('b', mu=0, sigma=50)
  sigma = pm.HalfNormal('sigma', sigma=5)
  
  likelihood = pm.Normal('y', mu=m*x + b, sigma=sigma, observed=y)
  
  trace = pm.sample(progressbar=False, random_seed=1234)

Posterior summary

az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
b 2.166 0.778 0.685 3.596 0.022 0.019 1230.0 1576.0 1.0
m 5.627 1.321 3.179 8.103 0.037 0.032 1282.0 1596.0 1.0
sigma 1.374 0.406 0.721 2.041 0.010 0.016 1614.0 1260.0 1.0

Trace plots

ax = az.plot_trace(trace)
plt.show()

Regression line posterior draws

post_m = trace.posterior['m'].sel(chain=0, draw=slice(0,None,10))
post_b = trace.posterior['b'].sel(chain=0, draw=slice(0,None,10))

plt.figure(layout="constrained")
plt.scatter(x, y, s=30, label='data')
for m, b in zip(post_m.values, post_b.values):
    plt.plot(x, m*x + b, c='gray', alpha=0.1)
plt.plot(x, 6*x + 2, label='true regression line', lw=3., c='red')
plt.legend(loc='best')
plt.show()

Regression line posterior draws

Posterior predictive draws

Draws for observed variables can also be generated (posterior predictive draws) via the sample_posterior_predictive() method.

with lm:
  pp = pm.sample_posterior_predictive(trace, progressbar=False)
pp
arviz.InferenceData
    • <xarray.Dataset> Size: 360kB
      Dimensions:  (chain: 4, draw: 1000, y_dim_0: 11)
      Coordinates:
        * chain    (chain) int64 32B 0 1 2 3
        * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
        * y_dim_0  (y_dim_0) int64 88B 0 1 2 3 4 5 6 7 8 9 10
      Data variables:
          y        (chain, draw, y_dim_0) float64 352kB 3.174 1.878 ... 11.77 8.377
      Attributes:
          created_at:                 2025-04-11T13:49:25.729741+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0

    • <xarray.Dataset> Size: 176B
      Dimensions:  (y_dim_0: 11)
      Coordinates:
        * y_dim_0  (y_dim_0) int64 88B 0 1 2 3 4 5 6 7 8 9 10
      Data variables:
          y        (y_dim_0) float64 88B 2.471 1.409 4.633 3.487 ... 6.816 5.157 9.15
      Attributes:
          created_at:                 2025-04-11T13:49:25.731960+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0

Plotting the posterior predictive distribution

az.plot_ppc(pp, num_pp_samples=500)
plt.show()

PP draws

plt.figure(layout="constrained")
plt.scatter(x, y, s=30, label='data')
plt.plot(x, pp.posterior_predictive['y'].sel(chain=0).T, c="grey", alpha=0.01)
plt.plot(x, np.mean(pp.posterior_predictive['y'].sel(chain=0).T, axis=1), c='red', label="PP mean")
plt.legend()
plt.show()

PP HDI

plt.figure(layout="constrained")
plt.scatter(x, y, s=30, label='data')
plt.plot(x, np.mean(pp.posterior_predictive['y'].sel(chain=0).T, axis=1), c='red', label="PP mean")
az.plot_hdi(x, pp.posterior_predictive['y'])
plt.legend()
plt.show()

Model revision

with pm.Model() as lm2:
  m = pm.Normal('m', mu=0, sigma=50)
  b = pm.Normal('b', mu=0, sigma=50)
  sigma = pm.HalfNormal('sigma', sigma=5)
  
  y_hat = pm.Deterministic("y_hat", m*x + b)
  
  likelihood = pm.Normal('y', mu=y_hat, sigma=sigma, observed=y)
  
  trace = pm.sample(random_seed=1234, progressbar=False)
  pp = pm.sample_posterior_predictive(trace, var_names=["y_hat"], progressbar=False)

\(\hat{y}\) - PP

pm.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
b 2.166 0.778 0.685 3.596 0.022 0.019 1230.0 1576.0 1.0
m 5.627 1.321 3.179 8.103 0.037 0.032 1282.0 1596.0 1.0
sigma 1.374 0.406 0.721 2.041 0.010 0.016 1614.0 1260.0 1.0
y_hat[0] 2.166 0.778 0.685 3.596 0.022 0.019 1230.0 1576.0 1.0
y_hat[1] 2.729 0.672 1.476 4.006 0.019 0.016 1294.0 1537.0 1.0
y_hat[2] 3.291 0.576 2.240 4.433 0.016 0.013 1420.0 1648.0 1.0
y_hat[3] 3.854 0.497 2.936 4.811 0.012 0.011 1715.0 1959.0 1.0
y_hat[4] 4.417 0.444 3.556 5.218 0.009 0.009 2448.0 2475.0 1.0
y_hat[5] 4.980 0.427 4.176 5.762 0.007 0.009 3762.0 2262.0 1.0
y_hat[6] 5.542 0.450 4.717 6.393 0.007 0.010 4249.0 2635.0 1.0
y_hat[7] 6.105 0.507 5.187 7.077 0.009 0.011 3614.0 2566.0 1.0
y_hat[8] 6.668 0.589 5.499 7.713 0.011 0.012 2804.0 2318.0 1.0
y_hat[9] 7.230 0.687 5.911 8.473 0.014 0.014 2358.0 2265.0 1.0
y_hat[10] 7.793 0.794 6.226 9.198 0.018 0.016 2088.0 2214.0 1.0

\(\hat{y}\) - PP draws

plt.figure(layout="constrained")
plt.plot(x, pp.posterior_predictive['y_hat'].sel(chain=0).T, c="grey", alpha=0.01)
plt.scatter(x, y, s=30, label='data')
plt.show()

\(\hat{y}\) - PP HDI

plt.figure(layout="constrained")
plt.scatter(x, y, s=30, label='data')
plt.plot(x, np.mean(pp.posterior_predictive['y_hat'].sel(chain=0).T, axis=1), c='red', label="PP mean")
az.plot_hdi(x, pp.posterior_predictive['y_hat'])
plt.legend()
plt.show()

Demo 2 - Bayesian Lasso

n = 50
k = 100

np.random.seed(1234)
X = np.random.normal(size=(n, k))

beta = np.zeros(shape=k)
beta[[10,30,50,70]] =  10
beta[[20,40,60,80]] = -10

y = X @ beta + np.random.normal(size=n)

Naive model

with pm.Model() as bayes_naive:
  b = pm.Flat("beta", shape=k)
  s = pm.HalfNormal('sigma', sigma=2)
  
  pm.Normal("y", mu=X @ b, sigma=s, observed=y)
  
  trace = pm.sample(progressbar=False, random_seed=12345)

az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta[0] 242.625 995.218 -1000.355 2950.690 434.973 319.426 6.0 11.0 2.16
beta[1] -574.362 656.294 -1402.057 358.625 298.857 116.629 6.0 28.0 1.83
beta[2] -1014.882 528.442 -1886.908 167.967 168.015 136.215 9.0 17.0 1.39
beta[3] 78.346 788.822 -860.682 1472.876 370.525 171.538 5.0 11.0 2.48
beta[4] 551.583 777.056 -454.753 1975.072 332.438 175.438 6.0 13.0 1.86
... ... ... ... ... ... ... ... ... ...
beta[96] -702.573 880.101 -2461.944 820.584 409.925 269.151 5.0 11.0 2.54
beta[97] 117.851 813.313 -1336.065 1519.139 387.280 201.783 5.0 12.0 2.98
beta[98] -297.373 995.011 -2420.511 1270.081 430.002 236.156 5.0 26.0 2.27
beta[99] 493.648 1248.412 -956.249 2742.285 598.336 261.016 5.0 15.0 2.65
sigma 2.498 1.021 1.298 4.470 0.363 0.063 9.0 78.0 1.44

101 rows × 9 columns

Weakly informative model

with pm.Model() as bayes_weak:
  b = pm.Normal("beta", mu=0, sigma=10, shape=k)
  s = pm.HalfNormal('sigma', sigma=2)
  
  pm.Normal("y", mu=X @ b, sigma=s, observed=y)
  
  trace = pm.sample(progressbar=False, random_seed=12345)

az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta[0] 0.311 7.613 -13.351 15.058 0.105 0.137 5329.0 2526.0 1.00
beta[1] 1.233 6.587 -11.725 13.485 0.087 0.115 5716.0 2590.0 1.00
beta[2] -0.120 7.743 -14.607 13.999 0.104 0.175 5597.0 2117.0 1.00
beta[3] -1.485 6.990 -14.770 11.764 0.089 0.125 6156.0 2846.0 1.00
beta[4] 0.877 7.454 -13.828 14.448 0.103 0.125 5269.0 2631.0 1.00
... ... ... ... ... ... ... ... ... ...
beta[96] -0.377 6.414 -12.167 11.563 0.089 0.107 5098.0 2949.0 1.00
beta[97] -0.783 6.470 -13.404 10.882 0.089 0.109 5318.0 2669.0 1.00
beta[98] 1.443 6.720 -10.492 14.634 0.105 0.120 4091.0 2324.0 1.00
beta[99] -1.201 6.854 -13.770 12.981 0.093 0.121 5426.0 2993.0 1.00
sigma 2.259 1.056 0.635 4.159 0.117 0.055 75.0 143.0 1.05

101 rows × 9 columns

az.summary(trace).iloc[[10,20,30,40,50,60,70,80]]
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta[10] 3.951 7.004 -9.375 16.974 0.115 0.127 3729.0 2376.0 1.0
beta[20] -4.504 7.036 -18.159 8.207 0.097 0.115 5239.0 2944.0 1.0
beta[30] 5.473 6.723 -7.496 17.680 0.104 0.126 4182.0 2049.0 1.0
beta[40] -4.949 8.112 -19.896 11.253 0.097 0.162 7039.0 2573.0 1.0
beta[50] 5.392 6.492 -6.410 17.520 0.089 0.106 5294.0 3012.0 1.0
beta[60] -5.692 7.026 -18.329 8.414 0.097 0.128 5259.0 2191.0 1.0
beta[70] 4.644 7.427 -8.624 19.128 0.110 0.146 4585.0 1979.0 1.0
beta[80] -7.800 6.097 -19.094 3.737 0.082 0.095 5551.0 3325.0 1.0

ax = az.plot_forest(trace)
plt.tight_layout()
plt.show()

Plot helper

def plot_slope(trace, prior="beta", chain=0):
  post = (trace.posterior[prior]
          .to_dataframe()
          .reset_index()
          .query(f"chain == {chain}")
         )
  
  sns.catplot(x="beta_dim_0", y="beta", data=post, kind="boxen", linewidth=0, color='blue', aspect=2, showfliers=False)
  plt.tight_layout()
  plt.xticks(range(0,110,10))
  plt.show()

plot_slope(trace)

Laplace Prior
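
Replacing the Normal prior with a Laplace prior is what makes this a "Bayesian lasso": with independent \(\operatorname{Laplace}(0, b)\) priors on the coefficients, the log posterior is, up to an additive constant,

\[ \log p(\beta \mid y, \sigma) = -\frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 - \frac{1}{b} \lVert \beta \rVert_1 + \text{const}, \]

so the MAP estimate solves an \(\ell_1\)-penalized least squares problem, just like the lasso. Sampling the full posterior goes a step further - the small coefficients are shrunk toward zero while the large ones are left mostly alone, as the summaries below show.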

with pm.Model() as bayes_lasso:
  b = pm.Laplace("beta", 0, 1, shape=k)
  s = pm.HalfNormal('sigma', sigma=1)
  
  pm.Normal("y", mu=X @ b, sigma=s, observed=y)
  
  trace = pm.sample(progressbar=False, random_seed=1234)

az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta[0] 0.079 0.846 -1.540 1.765 0.015 0.020 3605.0 2149.0 1.00
beta[1] 0.212 0.738 -1.148 1.709 0.011 0.016 4542.0 2608.0 1.00
beta[2] -0.057 0.803 -1.692 1.382 0.013 0.017 3748.0 2732.0 1.00
beta[3] -0.280 0.777 -1.826 1.146 0.014 0.017 3336.0 1972.0 1.00
beta[4] 0.075 0.859 -1.596 1.708 0.017 0.026 2979.0 1849.0 1.00
... ... ... ... ... ... ... ... ... ...
beta[96] 0.061 0.752 -1.435 1.558 0.017 0.018 2233.0 1258.0 1.00
beta[97] -0.143 0.718 -1.536 1.257 0.013 0.016 2932.0 2380.0 1.00
beta[98] 0.300 0.743 -0.997 1.752 0.016 0.016 2268.0 2275.0 1.01
beta[99] -0.289 0.775 -1.839 1.144 0.015 0.017 2792.0 1483.0 1.00
sigma 0.902 0.497 0.276 1.848 0.058 0.032 70.0 141.0 1.03

101 rows × 9 columns

az.summary(trace).iloc[[10,20,30,40,50,60,70,80]]
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta[10] 8.349 1.248 5.734 10.493 0.027 0.023 2153.0 2169.0 1.00
beta[20] -8.315 1.294 -10.647 -5.822 0.027 0.021 2249.0 2290.0 1.00
beta[30] 8.627 1.004 6.788 10.530 0.026 0.017 1458.0 1200.0 1.00
beta[40] -8.772 1.641 -11.756 -5.558 0.043 0.035 1556.0 1303.0 1.00
beta[50] 9.003 1.010 7.089 10.897 0.022 0.018 2027.0 2524.0 1.00
beta[60] -9.230 1.174 -11.417 -6.972 0.034 0.027 1178.0 597.0 1.01
beta[70] 8.512 1.172 6.203 10.570 0.025 0.021 2260.0 2035.0 1.00
beta[80] -9.919 0.869 -11.531 -8.265 0.018 0.015 2224.0 2657.0 1.00

plot_slope(trace)

Demo 3 - Logistic Regression





Based on PyMC Out-Of-Sample Predictions example

Data

x1 x2 y
0 -3.207674 0.859021 0
1 0.128200 2.827588 0
2 1.481783 -0.116956 0
3 0.305238 -1.378604 0
4 1.727488 -0.926357 1
... ... ... ...
245 -2.182813 3.314672 0
246 -2.362568 2.078652 0
247 0.114571 2.249021 0
248 2.093975 -1.212528 1
249 1.241667 -2.363412 0

250 rows × 3 columns

Test-train split

import patsy
from sklearn.model_selection import train_test_split

y, X = patsy.dmatrices("y ~ x1 * x2", data=df)
X_lab = X.design_info.column_names
y_lab = y.design_info.column_names
y = np.asarray(y).flatten()
X = np.asarray(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

Model

with pm.Model(coords = {"coeffs": X_lab}) as model:
    # data containers
    X = pm.MutableData("X", X_train)
    y = pm.MutableData("y", y_train)
    # priors
    b = pm.Normal("b", mu=0, sigma=3, dims="coeffs")
    # linear model
    mu = X @ b
    # link function
    p = pm.Deterministic("p", pm.math.invlogit(mu))
    # likelihood
    obs = pm.Bernoulli("obs", p=p, observed=y)

Visualizing models

pm.model_to_graphviz(model)

Model graph (pm.model_to_graphviz output): X and y are MutableData nodes; b ~ Normal lives on the coeffs (4) plate; p ~ Deterministic and obs ~ Bernoulli sit inside the 175-observation plate, with X in a 175 x 4 plate; edges run X -> p, b -> p, p -> obs, and obs -> y.

Fitting

with model:
    post = pm.sample(progressbar=False, random_seed=1234)
az.summary(post)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
b[Intercept] -0.895 0.330 -1.513 -0.294 0.010 0.005 1203.0 2281.0 1.0
b[x1] 1.299 0.341 0.662 1.913 0.010 0.006 1172.0 2033.0 1.0
b[x2] -1.634 0.352 -2.339 -1.024 0.010 0.006 1297.0 1947.0 1.0
b[x1:x2] 2.714 0.496 1.843 3.660 0.014 0.009 1199.0 1766.0 1.0
p[0] 0.938 0.036 0.871 0.992 0.001 0.001 2382.0 2704.0 1.0
... ... ... ... ... ... ... ... ... ...
p[170] 0.000 0.000 0.000 0.000 0.000 0.000 1480.0 2089.0 1.0
p[171] 0.222 0.109 0.047 0.430 0.002 0.002 3924.0 3036.0 1.0
p[172] 1.000 0.000 1.000 1.000 0.000 0.000 1936.0 2541.0 1.0
p[173] 0.792 0.065 0.670 0.907 0.001 0.001 2632.0 2880.0 1.0
p[174] 0.623 0.075 0.480 0.760 0.001 0.001 3234.0 3044.0 1.0

179 rows × 9 columns

Trace plots

az.plot_trace(post, var_names="b", compact=False)
plt.show()

Posterior plots

az.plot_posterior(
  post, var_names=["b"], ref_val=[intercept, beta_x1, beta_x2, beta_interaction], 
  figsize=(15, 6)
)
plt.show()

Posterior samples

Out-of-sample predictions

post
arviz.InferenceData
    • <xarray.Dataset> Size: 6MB
      Dimensions:  (chain: 4, draw: 1000, coeffs: 4, p_dim_0: 175)
      Coordinates:
        * chain    (chain) int64 32B 0 1 2 3
        * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
        * coeffs   (coeffs) <U9 144B 'Intercept' 'x1' 'x2' 'x1:x2'
        * p_dim_0  (p_dim_0) int64 1kB 0 1 2 3 4 5 6 7 ... 168 169 170 171 172 173 174
      Data variables:
          b        (chain, draw, coeffs) float64 128kB -0.5695 1.461 ... -1.73 3.145
          p        (chain, draw, p_dim_0) float64 6MB 0.9187 0.3723 ... 0.8316 0.6381
      Attributes:
          created_at:                 2025-04-11T13:50:50.322398+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0
          sampling_time:              0.42817115783691406
          tuning_steps:               1000

    • <xarray.Dataset> Size: 496kB
      Dimensions:                (chain: 4, draw: 1000)
      Coordinates:
        * chain                  (chain) int64 32B 0 1 2 3
        * draw                   (draw) int64 8kB 0 1 2 3 4 5 ... 995 996 997 998 999
      Data variables: (12/17)
          acceptance_rate        (chain, draw) float64 32kB 0.4819 0.7555 ... 0.7986
          diverging              (chain, draw) bool 4kB False False ... False False
          energy                 (chain, draw) float64 32kB 60.19 57.69 ... 57.79
          energy_error           (chain, draw) float64 32kB 0.0 0.6809 ... 0.02024
          index_in_trajectory    (chain, draw) int64 32kB 0 -4 -2 -1 2 ... 1 3 -6 3 -3
          largest_eigval         (chain, draw) float64 32kB nan nan nan ... nan nan
          ...                     ...
          process_time_diff      (chain, draw) float64 32kB 8.7e-05 ... 0.000219
          reached_max_treedepth  (chain, draw) bool 4kB False False ... False False
          smallest_eigval        (chain, draw) float64 32kB nan nan nan ... nan nan
          step_size              (chain, draw) float64 32kB 0.4638 0.4638 ... 0.4466
          step_size_bar          (chain, draw) float64 32kB 0.4786 0.4786 ... 0.4723
          tree_depth             (chain, draw) int64 32kB 2 3 2 3 4 2 ... 3 3 3 3 4 4
      Attributes:
          created_at:                 2025-04-11T13:50:50.328894+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0
          sampling_time:              0.42817115783691406
          tuning_steps:               1000

    • <xarray.Dataset> Size: 3kB
      Dimensions:    (obs_dim_0: 175)
      Coordinates:
        * obs_dim_0  (obs_dim_0) int64 1kB 0 1 2 3 4 5 6 ... 169 170 171 172 173 174
      Data variables:
          obs        (obs_dim_0) int64 1kB 1 0 0 1 0 1 0 1 0 1 ... 1 0 0 1 0 0 0 1 1 1
      Attributes:
          created_at:                 2025-04-11T13:50:50.331861+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0

    • <xarray.Dataset> Size: 10kB
      Dimensions:  (X_dim_0: 175, X_dim_1: 4, y_dim_0: 175)
      Coordinates:
        * X_dim_0  (X_dim_0) int64 1kB 0 1 2 3 4 5 6 7 ... 168 169 170 171 172 173 174
        * X_dim_1  (X_dim_1) int64 32B 0 1 2 3
        * y_dim_0  (y_dim_0) int64 1kB 0 1 2 3 4 5 6 7 ... 168 169 170 171 172 173 174
      Data variables:
          X        (X_dim_0, X_dim_1) float64 6kB 1.0 -0.9747 -1.182 ... 0.1069 0.1065
          y        (y_dim_0) float64 1kB 1.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 1.0 1.0 1.0
      Attributes:
          created_at:                 2025-04-11T13:50:50.332428+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0

with model:
  pm.set_data({"X": X_test, "y": y_test})
  post = pm.sample_posterior_predictive(
    post, progressbar=False, var_names=["obs", "p"],
    extend_inferencedata = True
  )
post
arviz.InferenceData
    • <xarray.Dataset> Size: 6MB
      Dimensions:  (chain: 4, draw: 1000, coeffs: 4, p_dim_0: 175)
      Coordinates:
        * chain    (chain) int64 32B 0 1 2 3
        * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
        * coeffs   (coeffs) <U9 144B 'Intercept' 'x1' 'x2' 'x1:x2'
        * p_dim_0  (p_dim_0) int64 1kB 0 1 2 3 4 5 6 7 ... 168 169 170 171 172 173 174
      Data variables:
          b        (chain, draw, coeffs) float64 128kB -0.5695 1.461 ... -1.73 3.145
          p        (chain, draw, p_dim_0) float64 6MB 0.9187 0.3723 ... 0.8316 0.6381
      Attributes:
          created_at:                 2025-04-11T13:50:50.322398+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0
          sampling_time:              0.42817115783691406
          tuning_steps:               1000

    • <xarray.Dataset> Size: 5MB
      Dimensions:    (chain: 4, draw: 1000, obs_dim_0: 75, p_dim_0: 75)
      Coordinates:
        * chain      (chain) int64 32B 0 1 2 3
        * draw       (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
        * obs_dim_0  (obs_dim_0) int64 600B 0 1 2 3 4 5 6 7 ... 68 69 70 71 72 73 74
        * p_dim_0    (p_dim_0) int64 600B 0 1 2 3 4 5 6 7 ... 67 68 69 70 71 72 73 74
      Data variables:
          obs        (chain, draw, obs_dim_0) int64 2MB 1 1 1 0 0 1 1 ... 0 1 1 1 1 1
          p          (chain, draw, p_dim_0) float64 2MB 1.0 0.5534 ... 0.9914 0.9105
      Attributes:
          created_at:                 2025-04-11T13:50:51.299885+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0

    • <xarray.Dataset> Size: 496kB
      Dimensions:                (chain: 4, draw: 1000)
      Coordinates:
        * chain                  (chain) int64 32B 0 1 2 3
        * draw                   (draw) int64 8kB 0 1 2 3 4 5 ... 995 996 997 998 999
      Data variables: (12/17)
          acceptance_rate        (chain, draw) float64 32kB 0.4819 0.7555 ... 0.7986
          diverging              (chain, draw) bool 4kB False False ... False False
          energy                 (chain, draw) float64 32kB 60.19 57.69 ... 57.79
          energy_error           (chain, draw) float64 32kB 0.0 0.6809 ... 0.02024
          index_in_trajectory    (chain, draw) int64 32kB 0 -4 -2 -1 2 ... 1 3 -6 3 -3
          largest_eigval         (chain, draw) float64 32kB nan nan nan ... nan nan
          ...                     ...
          process_time_diff      (chain, draw) float64 32kB 8.7e-05 ... 0.000219
          reached_max_treedepth  (chain, draw) bool 4kB False False ... False False
          smallest_eigval        (chain, draw) float64 32kB nan nan nan ... nan nan
          step_size              (chain, draw) float64 32kB 0.4638 0.4638 ... 0.4466
          step_size_bar          (chain, draw) float64 32kB 0.4786 0.4786 ... 0.4723
          tree_depth             (chain, draw) int64 32kB 2 3 2 3 4 2 ... 3 3 3 3 4 4
      Attributes:
          created_at:                 2025-04-11T13:50:50.328894+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0
          sampling_time:              0.42817115783691406
          tuning_steps:               1000

    • <xarray.Dataset> Size: 3kB
      Dimensions:    (obs_dim_0: 175)
      Coordinates:
        * obs_dim_0  (obs_dim_0) int64 1kB 0 1 2 3 4 5 6 ... 169 170 171 172 173 174
      Data variables:
          obs        (obs_dim_0) int64 1kB 1 0 0 1 0 1 0 1 0 1 ... 1 0 0 1 0 0 0 1 1 1
      Attributes:
          created_at:                 2025-04-11T13:50:50.331861+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0

    • <xarray.Dataset> Size: 10kB
      Dimensions:  (X_dim_0: 175, X_dim_1: 4, y_dim_0: 175)
      Coordinates:
        * X_dim_0  (X_dim_0) int64 1kB 0 1 2 3 4 5 6 7 ... 168 169 170 171 172 173 174
        * X_dim_1  (X_dim_1) int64 32B 0 1 2 3
        * y_dim_0  (y_dim_0) int64 1kB 0 1 2 3 4 5 6 7 ... 168 169 170 171 172 173 174
      Data variables:
          X        (X_dim_0, X_dim_1) float64 6kB 1.0 -0.9747 -1.182 ... 0.1069 0.1065
          y        (y_dim_0) float64 1kB 1.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 1.0 1.0 1.0
      Attributes:
          created_at:                 2025-04-11T13:50:50.332428+00:00
          arviz_version:              0.21.0
          inference_library:          pymc
          inference_library_version:  5.22.0

Posterior predictive summary

az.summary(
  post.posterior_predictive  
)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
obs[0] 1.000 0.000 1.000 1.000 0.000 NaN 4000.0 4000.0 NaN
obs[1] 0.454 0.498 0.000 1.000 0.008 0.001 3987.0 3987.0 1.0
obs[2] 0.999 0.027 1.000 1.000 0.000 0.008 4021.0 4000.0 1.0
obs[3] 0.485 0.500 0.000 1.000 0.008 0.000 3785.0 3785.0 1.0
obs[4] 0.008 0.088 0.000 0.000 0.001 0.008 4078.0 4078.0 1.0
... ... ... ... ... ... ... ... ... ...
p[70] 0.577 0.109 0.374 0.782 0.002 0.001 3240.0 2810.0 1.0
p[71] 0.995 0.008 0.983 1.000 0.000 0.000 2655.0 2907.0 1.0
p[72] 0.038 0.027 0.003 0.087 0.001 0.001 953.0 1377.0 1.0
p[73] 0.965 0.035 0.903 1.000 0.001 0.001 2377.0 2163.0 1.0
p[74] 0.856 0.057 0.741 0.947 0.001 0.001 2679.0 2752.0 1.0

150 rows × 9 columns

Evaluation

post.posterior["p"].shape
(4, 1000, 175)
post.posterior_predictive["p"].shape
(4, 1000, 75)
p_train = post.posterior["p"].mean(dim=["chain", "draw"])
p_test  = post.posterior_predictive["p"].mean(dim=["chain", "draw"])
print(p_train)
<xarray.DataArray 'p' (p_dim_0: 175)> Size: 1kB
array([0.93819, 0.26132, 0.01864, 0.99999, 0.4655 , 0.94175, 0.18459, 0.99135,
       0.28936, 0.95823, 0.53909, 0.12185, 0.63173, 0.97815, 1.     , 0.99997,
       0.48929, 0.44769, 1.     , 0.25201, 0.41885, 0.00006, 0.63208, 1.     ,
       0.     , 0.00025, 0.99912, 0.21269, 0.85564, 0.87999, 0.00411, 0.90838,
       0.00001, 0.58379, 0.90272, 0.91664, 0.00165, 0.85988, 0.09589, 0.     ,
       0.00006, 0.86009, 1.     , 0.99999, 0.6737 , 0.47217, 0.99937, 0.39814,
       0.64579, 0.11999, 0.44915, 0.17473, 0.8131 , 0.     , 0.04294, 0.19902,
       0.64672, 0.06343, 0.14008, 1.     , 1.     , 0.     , 0.96862, 0.00049,
       0.00011, 0.48045, 0.04294, 0.00011, 1.     , 1.     , 0.00125, 0.5352 ,
       0.43427, 0.99984, 0.00117, 0.99389, 0.80949, 0.40137, 0.00007, 1.     ,
       0.21644, 0.302  , 0.64459, 0.84646, 0.00032, 0.00032, 0.21478, 0.99995,
       0.     , 0.00174, 0.29865, 0.48853, 0.84389, 0.00308, 0.99328, 0.68351,
       0.0127 , 0.99999, 1.     , 0.00036, 0.99944, 0.92848, 0.40274, 0.03276,
       0.99735, 0.00578, 0.0001 , 1.     , 0.05347, 0.00139, 0.82088, 0.94333,
       0.0212 , 0.00001, 0.00001, 0.82958, 0.99836, 0.29717, 1.     , 0.9247 ,
       0.00946, 0.     , 0.0855 , 0.94712, 0.41086, 0.99915, 1.     , 0.42349,
       0.99999, 0.12598, 0.00345, 0.61756, 0.84217, 0.99998, 0.08445, 0.77662,
       0.00063, 0.00373, 0.45406, 0.56222, 0.99155, 0.71999, 0.99996, 1.     ,
       0.55747, 0.18935, 0.90859, 0.40357, 0.86634, 0.91812, 0.36738, 0.96028,
       0.00303, 0.11039, 0.05068, 0.38197, 0.36004, 0.00267, 1.     , 0.99857,
       0.92466, 0.01816, 0.00052, 0.13198, 0.     , 0.96729, 0.00001, 0.     ,
       0.99188, 0.37199, 0.00001, 0.22199, 0.99999, 0.79222, 0.6226 ])
Coordinates:
  * p_dim_0  (p_dim_0) int64 1kB 0 1 2 3 4 5 6 7 ... 168 169 170 171 172 173 174
print(p_test)
<xarray.DataArray 'p' (p_dim_0: 75)> Size: 600B
array([1.     , 0.45196, 0.99975, 0.46804, 0.00615, 1.     , 0.98427, 0.0004 ,
       0.99558, 0.08307, 0.17285, 0.78742, 0.99932, 0.9984 , 0.11223, 0.00537,
       1.     , 0.99499, 1.     , 0.     , 1.     , 0.     , 0.00011, 0.00268,
       0.99795, 0.08286, 0.     , 0.36999, 0.88805, 0.76588, 0.91828, 0.48579,
       0.80057, 0.95772, 0.21037, 0.00643, 0.     , 0.98847, 0.032  , 0.70961,
       0.21745, 0.1475 , 0.00636, 0.00013, 0.054  , 0.06816, 0.49145, 0.     ,
       0.55927, 0.     , 0.19585, 0.99921, 0.00419, 0.00002, 0.08313, 1.     ,
       0.     , 0.73422, 1.     , 1.     , 0.99977, 0.     , 0.48587, 0.68699,
       0.92828, 0.43828, 0.99233, 0.01992, 0.99996, 0.     , 0.57684, 0.99494,
       0.03784, 0.96468, 0.8558 ])
Coordinates:
  * p_dim_0  (p_dim_0) int64 600B 0 1 2 3 4 5 6 7 8 ... 67 68 69 70 71 72 73 74

ROC & AUC

from sklearn.metrics import RocCurveDisplay, accuracy_score, auc, roc_curve
# Test data
fpr_test, tpr_test, thd_test = roc_curve(y_true=y_test, y_score=p_test)
auc_test = auc(fpr_test, tpr_test); auc_test
np.float64(0.9495021337126599)
# Training data
fpr_train, tpr_train, thd_train = roc_curve(y_true=y_train, y_score=p_train)
auc_train = auc(fpr_train, tpr_train); auc_train
np.float64(0.9553291536050157)
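
accuracy_score is imported above; as a sketch of how it might be used here, classification accuracy can be computed by thresholding the posterior mean probabilities at 0.5:

# Hypothetical sketch: accuracy at a 0.5 threshold
acc_test  = accuracy_score(y_test,  (p_test.values  >= 0.5).astype(int))
acc_train = accuracy_score(y_train, (p_train.values >= 0.5).astype(int))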

ROC Curves

fig, ax = plt.subplots()
roc = RocCurveDisplay(fpr=fpr_test, tpr=tpr_test).plot(ax=ax, label="test")
roc = RocCurveDisplay(fpr=fpr_train, tpr=tpr_train).plot(ax=ax, color="k", label="train")
plt.show()