No BUGS, instead JAGS

JAGS is a useful statistics tool, helping you decide how to generalise experimental results. For example, if you roll a die ten times and it comes up “six” half of the time, is that strong evidence that the die is loaded? Or if you toss a coin ten times and it comes up heads nine times, what is the probability that the coin is a “normal” coin?

JAGS is based on the Bayesian approach to statistics, which uses Bayes’ rule to go from your experimental results to a (probabilistic) statement of how loaded the die is. This is a different approach from the frequentist approach to statistics which most textbooks cover – with p-values and null hypothesis tests. The upside of the Bayesian approach is that it answers the kind of questions you want to ask (like, “what is the probability that drug A is better than drug B at treating asthma”) as opposed to the convoluted questions which frequentist statistics answer (“assuming that there’s no difference between drug A and drug B, what’s the probability that you’d get a measured difference at least as large as the one you saw?”). The downside of Bayesian stats is that you have to provide a “prior” probability distribution, which expresses your beliefs about how likely each outcome is prior to seeing any experimental results. That can seem a bit ugly, since it introduces a subjective element to the calculation. Some people find that unacceptable, and indeed that’s why statistics forked in the early 1900s from its Bayesian origins to spawn the frequentist school of probability with its p-values, driven by the popular works of Ronald Fisher. But on the other hand, no experiment is run in a vacuum. We do not start each experiment in complete ignorance, nor is our selection of which experiments to run, or which hypotheses to check, determined objectively. The prior allows us to express information from previous knowledge, and we can rerun our analysis with a range of priors to see how sensitive our results are to the choice of prior.

Although Bayes’ rule is quite simple, only the simpler textbook examples can be calculated exactly using algebra. This does include a few useful cases, like the coin-flipping example used earlier (so long as your prior comes from a particular family of probability distributions). But for more real-world examples, we end up using numerical techniques – in particular, “Markov Chain Monte Carlo” methods. “Monte Carlo” methods are anything where you do simple random calculations which, when repeated enough times, converge to the right answer. A nice example is throwing darts towards a circular dartboard mounted on a square piece of wood – if the darts land with uniform probability across the square, you can count what fraction land inside the circle and from that get an approximation of pi. As you throw more and more darts, the approximation gets closer and closer to the right answer. “Markov Chain” is the name given to any approach where the next step in your calculation only depends on the previous step, but not any further back in history. In Snakes and Ladders, your next position depends only on your current position and the roll of the die – it’s irrelevant where you’ve been before that.
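To make the dartboard example concrete, here is a minimal Python sketch. The function name `estimate_pi` and the fixed seed are my own choices for illustration; it assumes the board is a unit circle inscribed in a 2x2 square, so the fraction of darts landing inside the circle approaches pi/4.

```python
import random

def estimate_pi(n_darts, seed=0):
    """Estimate pi by throwing darts uniformly at a 2x2 square."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    inside = 0
    for _ in range(n_darts):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # dart landed inside the unit circle
            inside += 1
    # circle area / square area = pi / 4
    return 4.0 * inside / n_darts

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

The error shrinks roughly like one over the square root of the number of darts, so each extra digit of accuracy costs about a hundred times more throws.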

When using MCMC methods for Bayesian statistics, we provide our prior (a probability distribution) and some data and a choice of model with some unknown parameters, and the task is to produce probability distributions for these unknown parameters. So our model for a coin toss might be a Bernoulli distribution with unknown proportion theta, our prior might be a uniform distribution from 0 to 1 (saying that we think all values of theta are equally likely) and the data would be a series of 0’s or 1’s (corresponding to heads and tails). We run our MCMC algorithm of choice, and out will pop a probability distribution over possible values of theta (which we call the ‘posterior’ distribution). If our data were equally split between 0’s and 1’s, then the posterior distribution would say that theta=0.5 was pretty likely, theta=0.4 or theta=0.6 fairly likely and theta=0.1 or theta=0.9 much less likely.
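For a one-parameter model like this you don't need MCMC at all to see the shape of the posterior: you can evaluate prior times likelihood on a grid of theta values and normalise. The sketch below does that in plain Python. The name `grid_posterior` and the grid size are assumptions of mine, and it is a brute-force stand-in for illustration, not what JAGS does.

```python
def grid_posterior(data, n_grid=1001):
    """Posterior density of theta on a grid, for 0/1 data and a uniform prior."""
    ones = sum(data)
    zeros = len(data) - ones
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    # Uniform prior is constant, so posterior is proportional to the
    # Bernoulli likelihood: theta^ones * (1 - theta)^zeros
    unnormalised = [t ** ones * (1.0 - t) ** zeros for t in thetas]
    width = 1.0 / (n_grid - 1)
    total = sum(unnormalised) * width  # crude numerical integral
    density = [u / total for u in unnormalised]
    return thetas, density

if __name__ == "__main__":
    data = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]  # evenly split between 0's and 1's
    thetas, density = grid_posterior(data)
    for target in (0.1, 0.4, 0.5, 0.6, 0.9):
        i = round(target * (len(thetas) - 1))
        print(thetas[i], density[i])
```

With evenly split data the density peaks at theta=0.5, is a little lower at 0.4 and 0.6, and is tiny at 0.1 and 0.9. The grid trick stops being practical once the model has more than a handful of parameters, which is where MCMC earns its keep.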

There are several MCMC methods which can be used here. Metropolis-Hastings, whose original form was published in 1953 by a Los Alamos group that included Edward Teller, works by jumping randomly around the parameter space – always happy to jump towards higher probability regions, but only jumping to lower probability regions some of the time. This “skipping around” yields a sequence (or “chain”) of values for theta drawn from the posterior probability distribution. So we don’t ever directly get told what the posterior distribution is, exactly, but we can draw arbitrarily many values from it in order to answer our real-world question to a sufficient degree of accuracy.
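Here is a rough sketch of that jumping rule for the coin example, written as a random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings). The function names, the Gaussian proposal, its `step` width and the starting point of 0.5 are all assumptions I've picked for illustration.

```python
import math
import random

def log_posterior(theta, heads, tails):
    """Log of the unnormalised posterior under a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior probability outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis chain for theta."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal jump
        candidate = log_posterior(proposal, heads, tails)
        # Uphill jumps are always accepted; downhill jumps are accepted
        # with probability equal to the posterior ratio.
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        chain.append(theta)  # a rejected jump repeats the current value
    return chain

if __name__ == "__main__":
    chain = metropolis(heads=9, tails=1, n_samples=50_000)
    print("posterior mean of theta:", sum(chain) / len(chain))
```

Note that only the ratio of posterior values is ever needed, which is why the awkward normalising constant in Bayes' rule never has to be computed.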

JAGS uses a slightly smarter technique called Gibbs Sampling which can be faster because, unlike Metropolis-Hastings, it never rejects any of the jumps. Hence the name JAGS – Just Another Gibbs Sampler. You can only use this if it’s easy to sample from the conditional posterior distribution of each parameter given all the others, which is often the case. But it also frees you from the Metropolis-Hastings need to have (and tune) a “proposal” distribution to choose potential jumps.
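As an illustration of the idea (not of JAGS internals), here is a small Gibbs sampler in Python for a different toy model: normally distributed data with unknown mean `mu` and precision `tau`. I'm assuming a flat prior on `mu` and a prior proportional to 1/tau on `tau`, chosen because both conditional posteriors then have standard forms that can be sampled directly. The function name `gibbs_normal` is made up for this sketch.

```python
import math
import random

def gibbs_normal(data, n_samples, seed=0):
    """Gibbs sampler for the mean and precision of normal data.

    Assumed priors: flat on mu, proportional to 1/tau on tau.
    """
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    mu, tau = mean, 1.0  # arbitrary starting point
    chain = []
    for _ in range(n_samples):
        # mu | tau, data ~ Normal(mean, variance 1 / (n * tau))
        mu = rng.gauss(mean, 1.0 / math.sqrt(n * tau))
        # tau | mu, data ~ Gamma(shape n/2, rate ss/2);
        # gammavariate takes a scale, which is 1 / rate.
        ss = sum((y - mu) ** 2 for y in data)
        tau = rng.gammavariate(n / 2.0, 2.0 / ss)
        chain.append((mu, tau))  # every draw is kept, nothing is rejected
    return chain

if __name__ == "__main__":
    data = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4]
    chain = gibbs_normal(data, 5000)
    print("posterior mean of mu:", sum(m for m, _ in chain) / len(chain))
```

There is no proposal width to tune here: each parameter is simply redrawn from its exact conditional distribution, holding the other fixed.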

In the next post, I’ll cover pragmatics of running JAGS on a simple example, then look at the performance characteristics.

Radian in not-a-unit shocker

One of the nice things about scmutils is that it tracks units, so you can’t accidentally add 10 seconds to 5 metres.

(+ 
 (& 10 &second)
 (& 5 &meter))
=> Units do not match: + (*with-units* 10 (*unit* SI ... 1)) (*with-units* 5 (*unit* SI ... 1))

When dealing with angles, it initially seems to do the right thing too:

(+
 (& pi/2 &radian)
 (& 90 &degree))
=> 3.141... (i.e. it’s converting everything to radians)

But this is less cool:

(/ (& pi &radian) (& 1 &second))
=> (& 3.141592653589793 &hertz)

Err, pi radians per second should be 0.5Hz, since one full cycle is 2 pi radians. The trouble is, scmutils treats radians as a unit-less number.
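The arithmetic behind that 0.5Hz figure, as a tiny Python sketch (the function name is made up, and this is plain arithmetic rather than anything scmutils provides): one cycle is 2 pi radians, so an angular frequency in radians per second has to be divided by 2 pi to get cycles per second.

```python
import math

def rad_per_s_to_hz(omega):
    """Convert angular frequency (radians/second) to cycles/second."""
    return omega / (2.0 * math.pi)

if __name__ == "__main__":
    print(rad_per_s_to_hz(math.pi))  # half a cycle per second
```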

To check whether this was a reasonable thing to do, I checked my old favourite Frink. In Frink’s units.txt file, we have the following:


// Alan’s editorializing:
// Despite what other units programs might have you believe,
// radians ARE dimensionless units and making them their own
// unit leads to all sorts of arbitrary convolutions in
// calculations (at the possible expense of some inclarity if
// you don’t know what you’re doing.)
// If you really want radians to be a fundamental unit,
// replace the above with “angle =!= radian”
// (This will give you a bit of artificiality in calculations.)
//
// The radian was actually a fundamental base unit in the SI
// up until 1974, when they changed it, making it no longer
// be a special unit, but just a dimensionless number (which
// it is.) See the definition of the “Hz” below for a
// discussion of how this broke the SI’s definitions of
// basic circular / sinusoidal measures, though.

And down a bit, in the section about hertz, we have:

//
// Alan’s Editorializing: Here is YET ANOTHER place where the SI made a
// really stupid definition. Let’s follow their chain of definitions, shall
// we, and see how it leads to absolutely ridiculous results.

// The Hz is currently defined simply as inverse seconds. (1/s).
// See: http://physics.nist.gov/cuu/Units/units.html
//
// The base unit of frequency in the SI *used* to be “cycles per second”.
// This was fine and good. However, in 1960, the BIPM made the
// change to make the fundamental unit of frequency to
// be “Hz” which they defined as inverse seconds (without qualification.)
//
// Then, in 1974, they changed the radian from its own base unit in the SI
// to be a dimensionless number, which it indeed is (it’s a length divided by
// a length.) That change was correct and good in itself.
//
// However, the definition of the Hz was *not* corrected at the same
// time that the radian was changed. Thus, we have the conflicting SI
// definition of the radian as the dimensionless number 1 (without
// qualification) and Hz as 1/s. (Without qualification.)
//
// This means that, if you follow the rules of the SI,
// 1 Hz = 1/s = 1 radian/s which is simply inconsistent and violates basic
// ideas of sinusoidal motion, and is simply a stupid definition.
// The entire rest of the world, up until that point, knew that 1 Hz needs to
// be equal to *2 pi* radians/s or be changed to mean *cycles/second* for
// these to be reconcilable. If you use “Hz” to mean cycles/second, say,
// in sinusoidal motion, as the world has done for a century, know that the SI
// made all your calculations wrong. A couple of times, in different ways.
//
// This gives the wonderful situation that the SI’s Hz-vs-radian/s definitions
// have meant completely different things in the timeperiods:
//
// * pre-1960
// * 1960 to 1974
// * post-1974
//
//
// Thus, anyone trying to mix the SI definitions for Hz and angular
// frequencies (e.g. radians/s) will get utterly wrong answers that don’t
// match basic mathematical reality, nor match any way that Hz was ever used
// for describing, say, sinusoidal motion.
//
// Beware the SI’s broken definition
// of Hz. You should treat the radian as being correct, as a fundamental
// dimensionless property of the universe that falls out of pure math like
// the Taylor series for sin[x], and you should treat the Hz as being a
// fundamental property of incompetence by committee.
//
// One could consider the CGPM in 1960 to have made the original mistake,
// re-defining Hz in a way that did not reflect its meaning up to that point,
// or the CGPM in 1974 to have made the absolutely huge mistake that made
// the whole system inconsistent and wrong, and clearly broke the definition
// of Hz-vs-radian/s used everywhere in the world, turning it into a broken,
// self-contradictory mess that it is now.
//
// Either way, if I ever develop a time machine, I’m going to go back and
// knock both groups’ heads together. At a frequency of about 1 Hz. Or
// better yet, strap them to a wheel and tell them I’m going to spin one group
// at a frequency of 1 Hz, and the other at 1 radian/s and let them try to
// figure out which one of those stupid inconsistent definitions means what.
// Hint: It’ll depend on which time period I do it in, I guess, thanks to
// their useless inconsistent definition changes.
//
// It’s as if this bunch of geniuses took a well-understood term like “day”
// and redefined it to mean “60 minutes”. It simply breaks every historical
// use, and present use, and just causes confusion and a blatant source of
// error.
//
// In summary: Frink grudgingly follows the SI’s ridiculous, broken definition
// of “Hz”. You should not use “Hz”. The SI’s definition of Hz should be
// considered harmful and broken. Instead, if you’re talking about circular
// or sinusoidal motion, use terms like “cycles/sec” “revolutions/s”,
// “rpm”, “circle/min”, etc. and Frink will do the right thing because it
// doesn’t involve the stupid SI definition that doesn’t match what any
// human knows about sinusoidal motion. Use of “Hz” will cause communication
// problems, errors, and make one party or another look insane in the eyes
// of the other.

Quantum Scheme

I’m doing the Stanford “Quantum Physics for Engineers” online course just now. Separately, a few months ago I was reading the Sussman “Structure And Interpretation of Classical Mechanics” book which is notable for using scheme as a mathematical notation, thereby avoiding a lot of the ambiguities of ‘normal’ maths notation (a big win in Lagrangian mechanics, which makes heavy use of partial derivatives).

Anyhow, the Stanford Quantum course requires you to do various exercises, such as the following:

An electron has a 1nm wavelength. Is it reasonable to treat this electron as an approximately non-relativistic particle (i.e. traveling much slower than the speed of light)?

As usual, this requires plugging the supplied numbers and a bunch of physics constants into the right equation. At school, I would’ve done this by hand – hopefully remembering constants like ‘c’ (3e8 m/s) and h (6.62e-34).

But I can also do this using scheme, as per the SICM book. The ‘scmutils’ library comes with a bunch of built-in constants, with the correct units:

:c
=> (& 299792458. (* &meter (expt &second -1)))

:h
=> (& 6.62606896e-34 (* (expt &meter 2) &kilogram (expt &second -1)))

In scmutils, the ampersand function attaches units to a number.

So now I can use de Broglie’s wavelength relation to find velocity as a function of mass and wavelength:

(define (velocity mass wavelength) (/ :h (* mass wavelength)))

then plug in the appropriate values to find the velocity:

(velocity :m_e (& 1e-9 &meter))
=> (& 727389.4676462485 (* &meter (expt &second -1)))

The question actually asked “can you treat it as non-relativistic” so we want to know if it’s close to the speed of light or not:

(/ (velocity :m_e (& 1e-9 &meter)) :c)
=> 2.43e-3

So it’s much slower than the speed of light, and the answer is “yes, it’s reasonable to treat this as a non-relativistic particle”. But thanks to scheme/scmutils, I’m also pretty confident I haven’t made errors with units (because scheme tracked them for me) or constants (because I didn’t have to enter them).

Although not required for this exercise, the scmutils package also handles symbolic differentiation which is pretty nifty! For example:

(define (foo x) (log x))

(foo 'a)
 => (log a)

((D foo) 'x)
 => (/ 1 x)

The scmutils library is very elegant once you realise how it works. The definition of the scheme ‘foo’ function is just that – a scheme function. You can use it in one of two ways. You can pass a number to it – eg. (foo 5) – and it’ll evaluate it numerically – eg. 1.609. Or you can pass that same function a symbol, such as 'a, and it’ll give you back a symbolic expression – eg. (log a). It has a built-in simplifier too, as seen here:

(define (addaddadd x) (+ x x x))
=> #| addaddadd |#

(addaddadd 'a)
=> #| (* 3 a) |#

Reading++

A while ago, I wrote an emacs ‘reading mode’. It highlights a single sentence at a time, fading the rest of the text into a gentle grey, and a keypress moves onto the next sentence. It retains the familiarity and consistency of normal text layout, but provides additional cues about the extent of the current sentence.

Tonight, I played with the idea of including smarter parsing into this reading mode. The Stanford Parser parses English sentences. It tells you about the grammatical structure (noun phrases, verb phrases, etc) and dependencies between words. This is just about enough to do what I had in mind – a “superfluous word” highlighter. The whole world is absolutely packed full of so many documents with wholly unnecessary words. Ideally, I’d like to just delete the pointless words. But it’s rare for a word to be completely devoid of semantic meaning. So, my compromise is just to highlight those decorative words – adjectival and adverbial modifiers – which are commonly guilty.

Here are some examples, not completely perfect, but useful nonetheless:

I REALLY want some SUPER TASTY chocolate.
The system has been VERY CAREFULLY designed, and will cope admirably with all 
  CONCEIVABLE combinations of circumstances.
I wanted to leave my SMALL pond and see HOW I'd fare in a BIG one, with some 
  of the BEST developers in the world.
You define HOW you want your data to be structured ONCE, THEN you can 
  use SPECIAL GENERATED source code to EASILY write and read your STRUCTURED data.

Bias

“Wald applied his statistical skills in World War II to the problem of bomber losses to enemy fire. A study had been made of the damage to returning aircraft and it had been proposed that armor be added to those areas that showed the most damage. Wald’s unique insight was that the holes from flak and bullets on the bombers that did return represented the areas where they were able to take damage. The data showed that there were similar patches on each returning bomber where there was no damage from enemy fire, leading Wald to conclude that these patches were the weak spots that led to the loss of a plane if hit, and that must be reinforced.”

– http://en.wikipedia.org/wiki/Abraham_Wald