The Sun Also Rises

I was walking north in London. I was definitely walking north. My phone map showed I was going north. The road signs pointed straight ahead to go to Kings Cross, which was to the north. I knew I going north. But, in a post-apocalyptic way, I also like trying to navigate by using the sun. It was afternoon. The sun would be setting in the west. I was heading north. Shadows would stretch out to my right. I looked at the shadows on the ground. They were stretching out to the left. Left, not right. Wrong. I stopped dead, now completely unsure which direction I was going and what part of my reasoning chain was broken.

It took me a while, but I figured it out. There were tall buildings to my left blocking the direct sun. On my right was tall glass-fronted offices. The windows were bouncing sunlight from up high down onto the pavement. From the “wrong” direction!

Moral of the story: don’t trust the sun!

NTSB, Coop Bank

My two interests of air crash investigations and financial systems are coinciding today as I read through the Coop Bank annual results. Unlike RBS’s decline in 2008, this isn’t a dramatic story of poorly understood risk lurking behind complex financial instrument, it’s a bit more straightforward. But, since I spent some time picking through the numbers I thought I’d capture it for posterity.

A traditional high-street bank makes money from loans because customers have to pay interest on their mortgages and car loans, hence banks consider loans to be assets. The money which you or I have in our current or instant-access saving accounts, “demand deposits”, are liabilities of the bank. The bank pays out interest to savers. Unsurprisingly, the interest rate on loans is higher than what the bank pays to savers, and the difference (called “net interest income”) is income for the bank which ideally helps increase the banks equity (ie. money owned by the bank, which shareholders have claims on).

At first glance, Coop Bank are doing fine here. They have £15.3bn of loans to people (14.8bn) and businesses (0.4bn). They have £22.1bn of customer deposits [page 16], spread fairly evenly between current accounts, instant savings accounts, term savings accounts and ISAs, being a mixture of individuals (£19.4bn) and companies (£2.7bn). A quick check of their website shows they pay savers around 0.25%, and mortgage rates around something like 3%, which directly gets you to their “net interest income” of £394m from their high-street (aka “retail operations”). So that’s a big bunch of money coming in the door, good news!

(They used to be big into commercial property loans, but by 2014 their £1650m of loans included about £900m which were defaulting, and they sold off the rest and got out of that business)

But every business has day-to-day costs, like rent and staff salaries to pay. Staff costs were £187m which sounds like a lot of money, but a UK-wide bank has a lot of staff – 4266 [page 33] of which 3748 were fulltime and 1018 part-time. That’s an average of £43k each, but it’s not spread evenly – the four executive directors got £4172k between them [page 92], and the eleven non-exec directors got £1052k between them [page 95]. In addition, they paid £121m on contractors [page 178]. So, total staff costs were £300m. Hmm, now that £394m income isn’t looking so peachy. We’ve only got £94m left – let’s hope there’s nothing else we have to pay for.

Oops, forgot about the creaking IT infrastructure! The old IT setup was pretty bad it seems. The bank themselves warned investors in 2015 that “the Bank does not
currently have a proven end-to-end disaster recovery capability, especially in the case of a
mainframe failure or a significant data centre outage.” (page 75). The FCA (Financial Conduct Authority) who regulate banks and check that they are at least meeting some basic minimum standards told the Coop Bank in 2015 that they were in breach of those basic standards. So, they came up with a cunning plan to get off their clunky mainframes and onto a whizzy IBM “managed service platform” which, one would hope, is much shinier and has a working and tested backup solution. All of this “remediation” work wasn’t cheap though, clocking in at £141m for the year. The good news is that the FCA are all happy again and it should be a one-off cost, but we’re looking at loss overall for the year of £47m.

But we’re not done yet! We also have some “strategic” projects on the go, which managed to burn £134m [page 19]. A while back, Coop decided to “outsource” its retail mortgage business to Capita, and then spent a lot of time bickering with them, before finally making up this year. Nonetheless, a planned “transformation” of IT systems is getting canned, the demise of which is somehow costing the bank £82m! At the more sensible end, £10m went into “digital” projects, which I assume includes their shiny new mobile app [page 12]. But all in all, those “strategic” projects means we’re now up to a £181m loss.

Only one more big thing to go. Back in 2009, Coop Bank merged/acquired Britannia Building Society, gaining about £4bn of assets in the form of risky commercial property loans, and some liabilities. Those liabilities included IOUs known as Leek Notes which Britannia had issued to get money in the short-term. When Coop acquired Britannia, there was some accountancy sleight of hand done to make the liability look smaller [page 26 in Kelly Review] but nonetheless a £100 IOU still has to ultimately be paid back with £100, and so now Coop Bank is drudging through the reality of a paying back (aka “unwinding”, gotta love the euphemisms) a larger-then-expected liability. In 2016, that was to the tune of £180m.

So now we’re up to a £361m loss. Chuck in a few more projects like EU Payment Directives, some “organizational design changes” which cost £20m and you get to a final overall loss for the year of £477m.

Now, in the same way that I (as a person) can have money that I own in my pocket, Banks can have money that they (as a company) own – which is their equity. In good times, some of that equity gets paid out to shareholders as a dividend, and some is retained within the company to fund future growth. But in bad times, that equity is eroded by losses. Coop Bank started the year with about £1100m of (tier 1) equity, and the £400m loss has chopped that down to £700m. If you’re losing £400m in a year, £700m doesn’t look like a lot of runway and that’s why they’re trying to sell the business or bit of it, or raise equity by converting bonds to shares or issuing bonds.

Like any business, you’ve got to have more assets than liabilities otherwise your creditors can have you declared insolvent. And Coop Bank certainly has more assets than liabilities. But the loans which make up part of the banks assets are fairly illiquid, meaning they can’t be readily turned into cash. Furthermore, they’re somewhat risky since the borrower might run away and default on the loan. So, in order to be able to soak up defaulting loans and have enough money around to people to withdraw their deposits on demand, banks need to have a certain level of equity in proportion to their loans. You can either look at straight equity/assets, aka leverage ratio, which is 2.6% for Coop Bank (down from 3.8% last year). Or you can do some risk-weighting of assets, and get the Tier 1 Capital ratio of 11% (down from 15%). The Bank of England says that “the appropriate Tier 1 equity requirement …be 11% of risk-weighted assets” so Coop Bank is skirting the edges of that.

All in all, if interest rates stay unchanged and Coop Bank’s loans and deposits stay where they are, then you could imagine a small profit from net interest income minus staff/related costs. But the burden of bad acquisitions, failed integration projects and massive IT overhauls are overshadowing all of that and that’s what’s put Coop Bank where it is today.

The cost of generality?

The nice thing about BUGS/JAGS/Stan/etc is that they can operate on arbitrarily complex bayesian networks. You can take my running ‘coin toss’ example and add extra layers. Imagine that we believe that the mint who made the coin produces coins who bias ranges between theta=0.7 and theta=0.9 uniformly. Now we can take data about coin tosses, and use it to infer not only knowledge about the bias of one coin, but also about the coins made by the mint.

But this kind of generality comes at a cost. Let’s look at a simpler model: we have ten datapoints, drawn from a normal distribution with mean mu and standard deviation sigma, and we start with uniform priors over mu and sigma.

For particular values of mu and sigma, the posterior density is proportional to the likelihood, which is a product of gaussians. However, we can avoid doing a naive N exponentials with a bit of algebra, instead doing a single exponential involving a summation. So, as we add more data points, the runtime cost of evaluating the posterior (or at least something proportional to it) will rise, but only at the cost of a few subtractions/squares/divides rather than more exponentials.

In contrast, when I use JAGS to evaluate 20 datapoints, it does twice as many log() calls as it does for 10 datapoints, so seems not to be leveraging any algebraic simplifications.

Next step: write a proof of concept MCMC sampler which runs faster than JAGS for the non-hierarchical cases which are most useful to me.

JAGS: normal and not-normal performance

Previously, we’ve walked through a coin-tossing example in JAGS and looked at the runtime performance. In this episode, we’ll look at the cost of different distributions.

Previously, we’ve used a uniform prior distribution. Let’s baseline on 100,000 step chain, with 100 datapoints. For uniform prior, JAGS takes a mere 0.2secs. But change to a normal prior, such as dnorm(0.5,0.1), and JAGS takes 3.3sec – with __ieee754_log_avx called from DBern::logDensity taking up 80% of the CPU time, according to perf:


  70.90%  jags-terminal  libm-2.19.so         [.] __ieee754_log_avx
   9.27%  jags-terminal  libjags.so.4.0.2     [.] _ZNK4jags20ScalarStochasticNode10logDensityEjNS_7PDFTypeE
   5.09%  jags-terminal  bugs.so              [.] _ZNK4jags4bugs5DBern10logDensityEdNS_7PDFTypeERKSt6vectorIPKdSaIS5_EES5_S5_

If we go from bernoulli data with normal prior, to normal data with normal priors on mean/sd, it gets more expensive again – 4.8 seconds instead of 3.3 – as the conditional posterior gets more complex. But still it’s all about the logarithms.

Logarithms aren’t straightforward to calculate. Computers usually try to do a fast table lookup, falling back to a series-expansion approach if that doesn’t work. On linux, the implementation comes as part of the GNU libc, with the name suggesting that it uses AVX instructions if your CPU is modern enough (there’s not a “logarithm” machine code instruction, but you can use AVX/SSE/etc to help with your logarithm implementation).

Notably, JAGS is only using a single core throughout all of this. If we wanted to compute multiple chains (eg. to check convergence) then the simple approach of running each chain in a separate JAGS process works fine – which is what the jags.parfit R package does. But could you leverage the SIMD nature of SSE/AVX instruction to run several chains in parallel? To be honest, the last time I was at this low level, SSE was just replacing MMX! But since AVX seems to have registers which hold eight 32-bit floats perhaps there’s the potential to do 8 chains in blazing data-parallel fashion?

(Or alternatively, how many people in the world care both about bayesian statistics and assembly-level optimization?!)

Just for reference, on my laptop a simple “double x=0; for (int i=0; i<1e8; i++) x = log(x);” loop takes 6.5 seconds, with 75% being in __ieee754_log_avx – meaning each log() is taking 48ns.

To complete the cycle, let’s go back to JAGS with a simple uniform prior and bernoulli likelihood, only do ten updates, with one datapoints and see how many times ‘log’ is called. For this, we can use ‘ltrace’ to trace calls to shared objects like log():


$ ltrace -xlog   $JAGS/jags/libexec/jags-terminal  example.jags  2>&1 | grep -c log@libm.so

Rather surprisingly, the answer is not stable! I’ve seen anything from 20 to 32 calls to log() even though the model/data isn’t changing (but the random seed presumably is). Does that line up with the 3.4 seconds to do 10 million steps @ 10 data points, if log() takes 48ns? If we assume 2 calls to log(), then 10e6 * 10 * 2 * 48e-9 = 9.6secs. So, about 2.8x out but fairly close.

Next step is to read through the JAGS code to understand the Gibbs sampler in detail. I’ve already read through the two parsers and some of the Graph stuff, but want to complete my understanding of the performance characteristics.

Performance impact of JAGS

In the previous two posts (here and here) I walked through an example of using JAGS directly to analyse a coin-toss experiment.

I’m interested to learn how the runtime of JAGS is affected by model choice and dataset size, and where the time is spent during evaluation. JAGS is open-source, and written in C++, so it’s quite easy to poke around the innards.

First, let’s do some high level black-box tests. We’ll take the coin-flip example from the previous post, and see how the wall-clock time on my rather old laptop increases as a) we increase the chain length, and b) as we increase the dataset size. My expectation is that both will be linear, since JAGS only uses a single core.

For ten coin flips, and 10/20/30 million steps it takes 3.4/6.9/10.4 seconds without monitors, and 4.5/8.8/12.9 seconds with a monitor. Plugging that into R shows a nice linear relationship, and we can get R to build a linear model for us to stop us having to think too hard:


> t < - read.table(stdin(), header=T)
0: steps time
1: 10000000 3.4
2: 20000000 6.9
3: 30000000 10.4

> lm( time ~ steps, t)
Coefficients:
(Intercept)        steps  
   -1.0e-01      3.5e-07  

In other words, each step takes 0.3 microseconds.

Similarly, if we stick to 10 million steps and no monitors, but increase the dataset size across 10/20/50/100, it takes 3.4/4.2/5.8/8.6 seconds which R also shows is linear albeit with a 3 second intercept.


> t < - read.table(stdin(), header=T)
0: datapoints time
1: 10 3.5
2: 20 4.2
3: 50 5.8
4: 100 8.6
5: 
> lm(time ~ datapoints, t)
Coefficients:
(Intercept)   datapoints  
    3.00408      0.05602  

So this mean that it takes 3 seconds to do a 10 million step walk, and although adding more datapoints makes each step more expensive, it’s only a little bit more expensive – 10 datapoints being about 0.5 seconds more than 1 datapoint. However, if we desired to go to “big data” with, say 10 million data points, we’d be talking about half a million seconds – ie. 11 days. So let’s hope we don’t need 10 million steps on a 10 million point dataset!

Next thing on the list is to understand where all that time is going. For this we can use the lovely perf tools which were added to linux in 2.6:


$ perf record jags example.jags 
Welcome to JAGS 4.2.0 on Sun Feb 26 16:00:26 2017
...
Initializing model
Updating 10000000
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.578 MB perf.data (14737 samples) ]

$ perf report
Overhead  Command        Shared Object        Symbol                                                                                                                           
  33.32%  jags-terminal  libm-2.19.so         [.] __ieee754_log_avx
  19.87%  jags-terminal  basemod.so           [.] _ZN4jags4base15WichmannHillRNG7uniformEv
  11.90%  jags-terminal  libjrmath.so.0.0.0   [.] jags_rbeta
  10.08%  jags-terminal  libm-2.19.so         [.] __ieee754_exp_avx

So this shows that the vast majority of time is being spent calculating logs and exponentials. Wichmann Hill is a pseudo-random uniform number generator. But why would you need exp/log to generate bernoulli distribution using a uniform prior?

Let’s use a debugger to see why it’s calling the log function ..


$ jags -d gdb
(gdb) b __ieee754_log_avx
Breakpoint 1 (__ieee754_log_avx) pending.
(gdb) r example.jags
Starting program: /home/adb/tmp/jags/libexec/jags-terminal example.jags
Welcome to JAGS 4.2.0 on Sun Feb 26 16:14:24 2017
...
Initializing model
Updating 10000000

Breakpoint 1, __ieee754_log_avx (x=16.608779218128113) at ../sysdeps/ieee754/dbl-64/e_log.c:57
57	../sysdeps/ieee754/dbl-64/e_log.c: No such file or directory.
(gdb) bt
#0  __ieee754_log_avx (x=16.608779218128113) at ../sysdeps/ieee754/dbl-64/e_log.c:57
#1  0x00007ffff66abce9 in jags_rbeta (aa=1, bb=0.5, rng=0x63b700) at rbeta.c:102
#2  0x00007ffff690e42e in jags::bugs::ConjugateBeta::update (this=0x63c200, chain=0, rng=0x63b700) at ConjugateBeta.cc:157
#3  0x00007ffff7b8b464 in jags::ImmutableSampler::update (this=0x63c170, rngs=std::vector of length 1, capacity 1 = {...}) at ImmutableSampler.cc:28

Our uniform prior is equivalent to a beta(1,1) prior, and since beta and bernoulli distributions are conjugate, our posterior will be a beta distribution. For Gibbs sampling, each “jump” is a draw from a single parameter conditional distribution – and since we only have one parameter theta, each “jump” sees us draw from a beta distribution.

Of course, we could’ve used this fact to calculate the posterior distribution algebraically and avoid all of this monkeying about with MCMC. But the purpose was to explore the performance of the JAGS implementation rather than solve a coin-toss problem per-se.

In the next article, I’ll look at the performance cost of switching to other distributions, such as normal and lognormal.