{"id":1388,"date":"2017-02-26T18:01:40","date_gmt":"2017-02-26T17:01:40","guid":{"rendered":"http:\/\/www.nobugs.org\/blog\/?p=1388"},"modified":"2017-02-26T18:45:06","modified_gmt":"2017-02-26T17:45:06","slug":"jags-normal-and-not-normal-performance","status":"publish","type":"post","link":"https:\/\/www.nobugs.org\/blog\/archives\/2017\/02\/26\/jags-normal-and-not-normal-performance\/","title":{"rendered":"JAGS: normal and not-normal performance"},"content":{"rendered":"<p><a href=\"http:\/\/www.nobugs.org\/blog\/archives\/2017\/02\/26\/no-bugs-instead-jags\/\">Previously<\/a>, we&#8217;ve walked through a coin-tossing example in JAGS and looked at the runtime performance.  In this episode, we&#8217;ll look at the cost of different distributions.<\/p>\n<p>Previously, we&#8217;ve used a uniform prior distribution.  Let&#8217;s baseline on 100,000 step chain, with 100 datapoints.  For uniform prior, JAGS takes a mere 0.2secs.  But change to a normal prior, such as  dnorm(0.5,0.1), and JAGS takes 3.3sec &#8211; with __ieee754_log_avx called from DBern::logDensity taking up 80% of the CPU time, according to perf:<\/p>\n<pre><code>\r\n  70.90%  jags-terminal  libm-2.19.so         [.] __ieee754_log_avx\r\n   9.27%  jags-terminal  libjags.so.4.0.2     [.] _ZNK4jags20ScalarStochasticNode10logDensityEjNS_7PDFTypeE\r\n   5.09%  jags-terminal  bugs.so              [.] _ZNK4jags4bugs5DBern10logDensityEdNS_7PDFTypeERKSt6vectorIPKdSaIS5_EES5_S5_\r\n<\/code><\/pre>\n<p>If we go from bernoulli data with normal prior, to normal data with normal priors on mean\/sd, it gets more expensive again &#8211; 4.8 seconds instead of 3.3 &#8211; as the conditional posterior gets more complex.  But still it&#8217;s all about the logarithms.<\/p>\n<p>Logarithms aren&#8217;t straightforward to calculate.  Computers usually try to do a fast table lookup, falling back to a series-expansion approach if that doesn&#8217;t work.  On linux, the implementation comes as part of the GNU libc, with the name suggesting that it uses <a href=\"https:\/\/en.wikipedia.org\/wiki\/Advanced_Vector_Extensions\">AVX instructions<\/a> if your CPU is modern enough (there&#8217;s not a &#8220;logarithm&#8221; machine code instruction, but you can use AVX\/SSE\/etc to help with your logarithm implementation).  <\/p>\n<p>Notably, JAGS is only using a single core throughout all of this.  If we wanted to compute multiple chains (eg. to check convergence) then the simple approach of running each chain in a separate JAGS process works fine &#8211; which is what the <a href=\"https:\/\/artax.karlin.mff.cuni.cz\/r-help\/library\/dclone\/html\/jags.parfit.html\">jags.parfit<\/a> R package does.  But could you leverage the SIMD nature of SSE\/AVX instruction to run several chains in parallel?  To be honest, the last time I was at this low level, SSE was just replacing MMX!  But since AVX seems to have registers which hold eight 32-bit floats perhaps there&#8217;s the potential to do 8 chains in blazing data-parallel fashion?<\/p>\n<p>(Or alternatively, how many people in the world care both about bayesian statistics and assembly-level optimization?!)<\/p>\n<p>Just for reference, on my laptop a simple &#8220;double x=0; for (int i=0; i&lt;1e8; i++) x = log(x);&#8221; loop takes 6.5 seconds, with 75% being in __ieee754_log_avx &#8211; meaning each log() is taking 48ns.<\/p>\n<p>To complete the cycle, let&#8217;s go back to JAGS with a simple uniform prior and bernoulli likelihood, only do ten updates, with one datapoints and see how many times &#8216;log&#8217; is called.  For this, we can use &#8216;ltrace&#8217; to trace calls to shared objects like log():<\/p>\n<pre><code>\r\n$ ltrace -xlog   $JAGS\/jags\/libexec\/jags-terminal  example.jags  2>&1 | grep -c log@libm.so\r\n<\/code><\/pre>\n<p>Rather surprisingly, the answer is not stable!  I&#8217;ve seen anything from 20 to 32 calls to log() even though the model\/data isn&#8217;t changing (but the random seed presumably is).  Does that line up with the 3.4 seconds to do 10 million steps @ 10 data points, if log() takes 48ns?  If we assume 2 calls to log(), then 10e6 * 10 * 2 * 48e-9 = 9.6secs.  So, about 2.8x out but fairly close.<\/p>\n<p>Next step is to read through the JAGS code to understand the Gibbs sampler in detail.  I&#8217;ve already read through the two parsers and some of the Graph stuff, but want to complete my understanding of the performance characteristics. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Previously, we&#8217;ve walked through a coin-tossing example in JAGS and looked at the runtime performance. In this episode, we&#8217;ll look at the cost of different distributions. Previously, we&#8217;ve used a uniform prior distribution. Let&#8217;s baseline on 100,000 step chain, with 100 datapoints. For uniform prior, JAGS takes a mere 0.2secs. But change to a normal [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1388","post","type-post","status-publish","format-standard","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/1388","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/comments?post=1388"}],"version-history":[{"count":3,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/1388\/revisions"}],"predecessor-version":[{"id":1391,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/1388\/revisions\/1391"}],"wp:attachment":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/media?parent=1388"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/categories?post=1388"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/tags?post=1388"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}