JAGS, and a bayesian coin toss

In the previous post, I talked about Bayesian stats and MCMC methods in general. In this post, I’ll work through an example where we try to infer how fair a coin-toss is, based on the results of ten coin flips. Most people use JAGS via an R interface, but I’m going to use JAGS directly to avoid obfuscation.

(Note: a coin-toss is a physical event determined by physics, so the “randomness” arises only through uncertainty of how hard it’s tossed, how fast it spins, where it lands etc, and therefore is open to all sorts of evil)

Firstly, we have to tell JAGS about our problem – eg. how many coin tosses we’ll do, and that we believe each coin toss is effectively a draw from a Bernoulli distribution with unknown proportion theta, and what our prior beliefs about theta are.

To do this, we create “example.model” containing:

model {
  for (i in 1:N){
    x[i] ~ dbern(theta)
  theta ~ dunif(0,1)

This says that we’ll have N coin-flips, and each coin flip is assumed to be drawn from the same Bernoulli distribution with unknown proportion theta. We also express our prior belief that all values of theta from zero to one are equally likely.

We can now launch “jags” in interactive mode:

$ jags
Welcome to JAGS 4.2.0 on Sun Feb 26 14:31:57 2017
JAGS is free software and comes with ABSOLUTELY NO WARRANTY
Loading module: basemod: ok
Loading module: bugs: ok

.. and tell it to load our example.model file ..

. model in example.model

If the file doesn’t exist, or the model is syntactically invalid you’ll get an error – silence means everything has gone fine.

Next, we need the data about the coin flip, which corresponds to the x[1] .. x[N] in our model. We create a file called “” containing:

N < - 10
x <- c(0,1,0,1,1,1,0,1,0,0)

The format for this file matches what R’s dump() function spits out. Here we’re saying that we have flipped ten coins (N is 10) and the results were tails/heads/tails/heads/heads etc. I’ve chosen the data so we have the same number of heads and tail, suggesting a fair coin.

We tell JAGS to load this file as data:

. data in
Reading data file

Again, it’ll complain about syntax errors (in an old-school bison parser kinda way) or if you have duplicate bindings. But it won’t complain yet if you set N to 11 but only provided 10 data points.

Next, we tell JAGS to compile everything. This combines your model and your data into an internal graph structure, ready for evaluating. It’s also where JAGS will notice if you’ve got too few data points or any unbound names in your model.

. compile
Reading data file
. compile
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 10
   Unobserved stochastic nodes: 1
   Total graph size: 14

The graph consists of ten “observed” nodes (one per coin flip) and one unobserved stochastic node (the unknown value of theta). The other nodes presumably include the bernoulli distribution and the uniform prior distribution.

At this stage, we can tell JAGS where it should start its random walk by providing an initial value for theta. To do this, we create a file “example.inits” containing:

theta < - 0.5

.. and tell JAGS about it ..

. parameters in example.inits
Reading parameter file example.inits

Finally, we tell JAGS to initialize everything so we’re ready for our MCMC walk:

. initialize
Initializing model

Now we’re ready to start walking. We need to be a bit careful at first, because we have to choose a starting point for our random walk (we chose theta=0.5) and if that’s not a good choice (ie. it corresponds to a low posterior probability) then it will take a while for the random walk to dig itself out of the metaphorical hole we dropped it in. So, we do a few thousand steps of our random walk, give it a fancy name like “burn-in period” and cross our fingers that our burn-in period was long enough:

. update 4000
Updating 4000
-------------------------------------------------| 4000
************************************************** 100%

(JAGS give some enterprise-level progress bars when in interactive mode, but not in batch mode).

JAGS has happily done 4000 steps in our random walk, but it hasn’t been keeping track of anything. We want to know what values of theta is jumping between, since that sequence (aka “chain”) of values is what we want as output.

To tell JAGS to start tracking where it’s been, we create a sampler for our ‘theta’ variable, before proceeding for another 4000 steps, and then writing the results out to a file:

. monitor theta
. update 4000
-------------------------------------------------| 4000
************************************************** 100%
. coda *

The last command causes two files to be written out – CODAindex.txt and CODAchain1.txt. CODA is a hilariously simple file format, coming originally from the “Convergence Diagnostic and Output Analysis” package in R/S-plus. Each line contains a step number (eg. 4000) and the value of theta at that step (eg. 0.65).

Here’s an interesting thing – why would we need a “Convergence Diagnostic” tool? When we did our “burn-in” phase we crossed our fingers and hoped we’d ran it for long enough. Similarly, when we did the random walk we also used 4000 steps. Is 4000 enough? Too many? We can answer these questions by looking at the results of the random walk – both to get the answer to our original question, but also to gain confidence that our monte-carlo approximation has thrown enough darts to be accurate.

At this point, we’ll take our coda files and load them into R to visualize the results.

$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> require(coda)
Loading required package: coda

> c < - read.coda(index.file="CODAindex.txt",output.file="CODAchain1.txt")
Abstracting theta ... 5000 valid values

> summary(c)
Iterations = 4001:9000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean             SD       Naive SE Time-series SE 
      0.501658       0.139819       0.001977       0.001977 

2. Quantiles for each variable:

  2.5%    25%    50%    75%  97.5% 
0.2436 0.4000 0.5022 0.6017 0.7675 

This is telling us that, given ten coin flips and our prior uniform belief and our bernoulli assumption, the most probably value for theta (the proportion of coin-flip yielding heads) is close to 0.5. Half of the probability mass lies between theta=0.4 and theta=0.6, and 95% of the probability mass lies between theta=0.25 and theta=0.75.

So it’s highly unlikely that the coin flip is extremely biased – ie. theta<0.25 or theta>0.75. Pleasantly, “highly unlikely” means “probability is less than 5%”. That’s a real common-or-garden probability. Not any kind of frequencist null-hypothesis p-value. We can make lots of other statements too – the probability that the bias is greater than 0.75 is about 40%. If we had a second coin (or coin flipper) we could make statement like “the probability that coin2 has a higher bias than coin1 is xx%”.

Let’s briefly revisit the question of convergence. There’s a few ways to determine how well your random walk represents (or “has converged to”) the true posterior distribution. One way, by Rubin and Gelman, is to run several random walks and look at the variance between them. The coda package in R comes with a function gelman.diag() for this purpose. However, in our simple example we only did one chain so we can’t run it on our coda files. (Incidentally, Gelman writes a great blog about stats).

In the next post, I’m will look at the performance characteristics of JAGS – how it scales with the number of data points, and what tools you can use to track this.