Computational algebra

I’ve been studying electromagnetism recently, and consequently been doing more maths-by-hand. Every time I do this, I think about using computer algebra systems to check my working and help me with calculus. But the only computer algebra system I’ve used was back at uni (Maple, I think) – and whilst Mathematica looks whizzy it’s also quite pricey. So I thought I’d give a few ‘free’ ones a go.

I have a particular ‘test’ problem in mind, because I’ve just worked through it by hand whilst reading Feynman Vol 2. It’s calculating the Laplacian of a radially symmetric potential – which involves a nice mixture of partial and total derivatives, chain rules and product rules. Turns out, I do actually remember how to do all this stuff, but having done it by hand it makes a nice concrete example for doing it on computer.

Another motivation is that I hate the imprecise notation that physicists and some mathematicians use. What is the derivative of phi(r) with respect to x? My brain says that phi is a unary function with parameter r, so the only thing that can cause it to change is a change in its input r. In physics you have to ‘understand’ that this means phi is really a function over R^3 (space) that’s defined like phi(x,y,z) = g(sqrt(x^2+y^2+z^2)) with the helper g(r)=… telling you how things change with distance. And quite often, phi will also quietly become a function of time too. In SICM they fix this problem by using a Scheme-based computer algebra system. Here I’m trying to do the same thing but using a more mainstream maths package.

I just ran sagemath via docker, rather than worry about ‘installing it’:

docker run -it sagemath/sagemath:latest

And so we start to tell it about our world …

%display latex

# We'll use g(r) as our "how things change with distance" function, and we'll leave it abstract

g = function("g")

# Now we can define phi to be a concrete function over all space that delegates to g

phi(x,y,z) = g(sqrt(x^2+y^2+z^2))

# Take the second derivative with respect to x
phi.differentiate(x,2)
# result
# x^2*D[0, 0](g)(sqrt(x^2 + y^2 + z^2))/(x^2 + y^2 + z^2) - x^2*D[0](g)(sqrt(x^2 + y^2 + z^2))/(x^2 + y^2 + z^2)^(3/2) + D[0](g)(sqrt(x^2 + y^2 + z^2))/sqrt(x^2 + y^2 + z^2)

Assuming that D[0] means ‘derivative with respect to 0th arg’ then this is right but boy is it ugly!

Specifically, we’re not giving a short name ‘r’ to the expression sqrt(x^2+y^2+z^2). We can try to cajole sagemath along:

r = var('r')
phi.differentiate(x,2).subs({sqrt(x^2+y^2+z^2): r})

Sadly that only improves the g(r) expressions, but not the usages in the denominators.

Perhaps we should tell it about r^2 rather than r?

phi.differentiate(x,2).subs({(x^2+y^2+z^2): r^2})

This does affect the denominator, but weirdly leaves lots of sqrt(x^2) terms. Odd.

We can blunder on ahead and complete the Laplacian:

(phi.differentiate(x,2) + phi.differentiate(y,2) + phi.differentiate(z,2)).simplify_full()

.. which yields the right answer, but again in a not-very-simplified form. Still, this is a pretty impressive feat. It takes a lot of pencil + paper to do these three second derivatives, chain rules and everything else. Doing it in one line shows that sagemath is a pretty useful maths assistant!
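As a sanity check that doesn’t need sagemath at all, here is a rough numerical sketch in plain Python. Summing the x, y and z versions of the second derivative above collapses to g''(r) + 2 g'(r)/r, and the code compares that closed form against a finite-difference Laplacian. The choice g(r) = exp(-r), the sample point and the step size h are all my own assumptions for illustration.

```python
import math

def laplacian(phi, x, y, z, h=1e-3):
    """Central finite-difference estimate of the Laplacian of phi at (x, y, z)."""
    centre = phi(x, y, z)
    total = 0.0
    for dx, dy, dz in ((h, 0, 0), (0, h, 0), (0, 0, h)):
        total += phi(x + dx, y + dy, z + dz) - 2 * centre + phi(x - dx, y - dy, z - dz)
    return total / h**2

def radial_laplacian(dg, d2g, r):
    """Closed form for a radially symmetric potential: g''(r) + 2 g'(r) / r."""
    return d2g(r) + 2 * dg(r) / r

# Example choice (an assumption for illustration): g(r) = exp(-r)
g = lambda r: math.exp(-r)
dg = lambda r: -math.exp(-r)
d2g = lambda r: math.exp(-r)

# phi is the "function over all space" that delegates to g
phi = lambda x, y, z: g(math.sqrt(x * x + y * y + z * z))

point = (0.3, -0.7, 1.1)
r = math.sqrt(sum(c * c for c in point))
numeric = laplacian(phi, *point)
exact = radial_laplacian(dg, d2g, r)
print(numeric, exact)
```

The two numbers should agree to several decimal places; a disagreement would suggest a slip in either the hand calculation or the symbolic one.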

So far, we’ve just modelled phi as a three-arg function – but sagemath actually knows what scalar and vector fields are, and can already do operations on them!

from sage.manifolds.operators import *
E.<x,y,z> = EuclideanSpace()
g = function("g")
phi = E.scalar_field(g(sqrt(x^2+y^2+z^2)), name='phi')

… which gives the same answer as above, but now we’ve got “batteries included” and can calculate div/grad/curl and all that.

So this is now good enough to solve electrostatic problems, where the laplacian of the scalar potential is proportional to charge density. For dynamic cases however, we’ll need a function of x,y,z and also t. Not sure how that’ll work with the manifold support. Do I need a function from time to scalar fields? A field of time-to-scalar functions? A 4d spacetime structure?

This page is working with E/B fields using 4d manifolds. I’ve done courses on special relativity, so I can handle spacetime but the language of differential geometry (manifolds, 2-forms) is beyond me currently (although Sussman of SICP fame has a similarly computational approach to it). Perhaps it’s better to leave the sage.manifolds alone for now and return to explicit x/y/z/t approach (hey, it was good enough for Maxwell!).


Voltage: undefined

When I was little, I played around with batteries and wires and various electrical things that I’d taken to bits – enough to both burn my fingers (from shorting the battery) and give myself shocks (using a transformer to transiently step up my 9v battery to much higher voltages). Consequently, I was familiar with the word “voltage” and the vague hand-wavy description of it “being like water pressure” or “being like height”.

But then when I learned physics, I found that the primary focus was on the electric (really electrostatic) field – being a 3d vector field. Voltage aka potential difference made an appearance here, thanks to the nice property that for electrostatic fields you can think in terms of a scalar field (“the potential”), with the electric field being its gradient.

But it seemed like a big gulf between the “circuit theory viewpoint” and the “electrostatic viewpoint”. For example, if I take a lump of neutral material and grab some of its electrons and move them over a bit, I end up with a positively charged lump and a negatively charged lump, and the surrounding E-field is a particular curvy shape (aka a “dipole field”). If I then think of this as a battery, and try to connect some wires to it, then my “circuit theory” viewpoint says that the electric field inside the wire must point along the wire the whole way – even if I tangle the wire into crazy knots. But how do we get from the nice dipole field to the twisty turny field required to follow the wire? It seems like two different worlds. Fortunately, there is an explanation – just not one that my Uni physics book mentioned. The best description of this comes from the Matter and Interactions textbook (eg. here). When the wires are connected, there is an initial transient in which charges are displaced by the E-field, in such a way that a charge imbalance is created on the surfaces of the wires which causes the local E-field to reorient itself along the wire.

This is good news, because it means that you can indeed use all your electrostatic intuition of voltage and electric fields when dealing with circuits with batteries and resistors. The electric vector fields are all of the simple variety where you can succinctly summarise them using scalar fields. One key consequence of these ‘simple’ fields is that the work done by (or against) the electric field between two points is the same regardless of the route you take. This means we can talk about “the” voltage between point A and point B even when there’s multiple paths between them, because even though the E-field might be different along different paths, the net work done by (or against) the E-field along the entire path is always the same. Going one step further, if we agree to use a particular point (often labelled “ground”) as our reference point, we can even talk about “the voltage at point A” – but we understand that to mean “the potential difference between A and our chosen reference point”.
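To make the path-independence claim concrete, here is a small numerical sketch. It assumes the simplest electrostatic field I can think of (a single point charge at the origin, in units where q/(4*pi*eps0) = 1) and integrates E . dl from A to B along a straight line and along a deliberately wandering detour. The points, the detour and the step count are arbitrary choices of mine.

```python
import math

def norm(p):
    return math.sqrt(sum(c * c for c in p))

def e_field(p):
    """Field of a point charge at the origin, in units where q/(4*pi*eps0) = 1."""
    r3 = norm(p) ** 3
    return [c / r3 for c in p]

def work_along(points, steps=5000):
    """Midpoint-rule line integral of E . dl along straight segments joining `points`."""
    total = 0.0
    for a, b in zip(points, points[1:]):
        d = [(bi - ai) / steps for ai, bi in zip(a, b)]
        for i in range(steps):
            mid = [ai + (i + 0.5) * di for ai, di in zip(a, d)]
            total += sum(ei * di for ei, di in zip(e_field(mid), d))
    return total

A = (1.0, 0.0, 0.0)
B = (0.0, 2.0, 0.0)

direct = work_along([A, B])
detour = work_along([A, (3.0, 1.0, 2.0), (-1.0, 4.0, 1.0), B])

# With potential V = 1/r, the work per unit charge should be V(A) - V(B)
potential_drop = 1 / norm(A) - 1 / norm(B)
print(direct, detour, potential_drop)
```

Both routes should give (to numerical precision) the same number as the potential difference, which is exactly what lets us talk about “the” voltage between A and B.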

Voltmeters essentially rely on this “path independence” to do their job. In our nice electrostatic world, it doesn’t matter that we connect our voltmeter to points A and B using long winding leads, because even though the E-field might vary in space, we KNOW that the net work done by the E-field on a charge going through our voltmeter is the same as would be done on a charge going through the resistor (or whatever component we’re using the voltmeter to measure). (Note: most measuring devices are named alluringly to suggest they directly measure the property you care about, when in practice they measure something else and use it to infer the value of interest. For example, an analog voltmeter actually measures the current through a known resistance, and we trust/hope that this is the same as the “volts” across whatever we’ve connected the leads to.)

However, this lovely simple land of electrostatic fields with their path-independence isn’t the whole story. You can also create an electric field from a time-varying magnetic field (Faraday’s Law), and such fields are of a very different nature – they form loops, quite unlike electrostatic fields which never form loops. With this kind of field, the work done going from point A to point B now DOES depend on exactly which path you choose. These fields cannot be represented as the gradient of a scalar potential. Consequently, we lose the lovely simple world in which we could think of point A as just having a single number (the potential) and point B having some other number/potential, and the work done by moving between them is just potentialB-potentialA – all of this is gone!

We do still have some useful properties – we can relate the work done around any closed path to the changing magnetic field (its flux) over a surface ‘capping’ that loop. But the notion of voltage as some path-independent value between point A and point B is lost. If we connect a “voltmeter” to points A and B, we’ll measure something pertaining to the path taken by the voltmeter’s leads and the voltmeter itself. But our ability to say “and that will be the same as any other path (such as through the component under test)” is gone. If we physically move the voltmeter and its leads in space, we will get a different reading (depending on the changing flux within the ‘loop’ consisting of the voltmeter and its leads and the circuit under test) – even though it is still connected to the same pointA and pointB in the circuit.
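Here is the same kind of numerical experiment for an induced field. As a sketch, I assume a uniform magnetic field along +z changing at a constant rate K, and pick the cylindrically symmetric field E = (K/2)(y, -x), which satisfies curl E = -dB/dt. The value of K, the unit circle and the two semicircular routes are all illustrative assumptions.

```python
import math

K = 2.0  # assumed dB/dt, uniform and pointing along +z

def induced_e(x, y):
    """One field satisfying curl E = -dB/dt for uniform dB/dt = K along z."""
    return (0.5 * K * y, -0.5 * K * x)

def line_integral(path, steps=2000):
    """Midpoint-rule integral of E . dl along a curve path(s), s in [0, 1]."""
    total = 0.0
    for i in range(steps):
        x0, y0 = path(i / steps)
        x1, y1 = path((i + 1) / steps)
        ex, ey = induced_e(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += ex * (x1 - x0) + ey * (y1 - y0)
    return total

# Two routes from A = (1, 0) to B = (-1, 0) around the unit circle
upper = lambda s: (math.cos(math.pi * s), math.sin(math.pi * s))
lower = lambda s: (math.cos(math.pi * s), -math.sin(math.pi * s))

via_upper = line_integral(upper)
via_lower = line_integral(lower)

# Out along the top and back along the bottom is one anticlockwise loop
loop_emf = via_upper - via_lower
flux_rate = K * math.pi * 1.0**2  # dPhi/dt through the unit disc
print(via_upper, via_lower, loop_emf, -flux_rate)
```

The two routes between the same endpoints give results of opposite sign, and the work around the closed loop comes out as minus the rate of change of flux through it, so a “voltmeter” reading here really would depend on where its leads run.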

This paper is a great summary of what’s going on, and I was relieved to finally find it after many years of failing to reconcile my “physics” and “circuit” viewpoints! This other paper is also very good.

Incidentally, this makes it clear where Kirchhoff’s voltage law comes from. It’s just a restatement of the properties of an electrostatic field – if you go from pointA to pointA in an electrostatic field, there’s no work done on a charge. But Kirchhoff came up with his rule (in 1845) long before anyone started thinking in terms of electric fields. Faraday had already discovered induction by then (1831), but its description in terms of electric fields for which that property doesn’t hold came later. Essentially, Kirchhoff’s voltage law is a special case of Faraday’s law when there is no changing magnetic flux. If there is a changing magnetic field, the work done around a loop (the emf) is non-zero and Kirchhoff’s Law no longer holds.


COVID + Hydroxychloroquine

I’m not a medic, but I’ve worked with data, analysis and experiments for a while. This blog post is a “what’s going on” summary of hydroxychloroquine use in treating COVID-19.

Hydroxychloroquine (HCQ for short) is a medicine that helps treat a few different conditions, like malaria and arthritis. Maybe it’ll help COVID too? No one was sure, so many hospitals around the world started trying it and recording the outcome.

The question we want to answer is: does giving this medicine to COVID patients 1) help people, 2) harm people, or 3) do nothing at all? If it does have an effect, it might have a big effect or a small effect. The effect might be different in different patients, possibly due to age, genetics (including gender, race/ethnicity) and existing health conditions. We always start in a “we don’t know” state, and use data to end up in one of the three answers.

What’s happened recently is that we went from “don’t know” to “it harms people”, based on one research group’s analysis of some hospital data. But, it looks like their analysis might not be done to a high enough standard. So we’re shifting back towards “don’t know”. The fact that the evidence that “it harms” has gone away does NOT mean that “it helps”. Lack of evidence is not evidence of lack. It just takes us back to “we don’t know”.

So how do we try to answer the help/harms question? The ideal thing to do would be a “randomized controlled trial”, where we find a group of patients suffering from COVID and then randomly select half of them to receive HCQ. This approach gives you the best ability to measure the true effect that HCQ has. However, this is not what’s happened in this story. Randomized controlled trials are slow to set up – you usually need to find consenting volunteers. COVID is more like war-time – doctors are trying lots of things quickly. They are tracking the outcomes, but they’re basing their decision on whether to give HCQ on their experience and belief that it’s the best course of action, rather than the coin-flip of a randomized controlled trial. Everyone wants the certainty of a randomized controlled trial (and the authors of the controversial paper explicitly call for one). But all we have just now is “observational data” – a record of who got what and when, and what the outcome was.

So can we use the outcome data to answer our question? To get enough data to answer the question, we need access to data from more than one hospital. Hospitals are rightly careful about sharing patient data so this isn’t an easy task. Fortunately, some companies have put in the effort to get contracts signed with several hospitals around the world and so the human race can potentially benefit from insights that are made possible by having this data in one place. One such company is Surgisphere. Surgisphere (and their legal team) have got agreements with 671 hospitals around the world. This gives them access to data about individual patients – their age/gender/etc as well as medical conditions, treatments they’ve received and outcomes.

Surgisphere therefore have a very useful dataset. For now, let’s assume that they’ve managed to pull all this data together without making any systematic mistakes (for example, some countries measure a patient’s height in centimetres whereas others might use inches – would Surgisphere have noticed this?).

Within Surgisphere’s dataset, they had information about 96,032 patients who tested positive for COVID. Of those patients, it so happens that the various hospitals had chosen to give HCQ (or chloroquine) to 14,888 patients. The dataset doesn’t tell us specifically why those 14,888 got given HCQ – presumably the doctors thought it was their best option at the time based on the patient’s condition, age, weight etc.

Naively, you might expect that we could just compare the death rate in patients who got HCQ (those who were given the drug) with the death rate in patients who didn’t receive HCQ and see if it’s different.

Unfortunately, it’s not that simple. I’ll explain why shortly, but one key message here is “statistical data analysis isn’t simple, there’s a bunch of mistakes that are easy to make, even if you do this a lot”. Consequently, it’s important that people “show their working” by sharing their dataset and analysis so that others can check whether they’ve made any mistakes. If other people don’t have access to the same raw data, they can’t check for these easy-to-make mistakes – and lots of papers get published every year which end up being retracted because they made a data analysis mistake. Sharing raw data is hard in a medical setting – Surgisphere’s contracts with hospitals probably don’t allow them to share it. But without the raw data being shared and cross-checked by others, it’s reasonable to expect that any analysis has a good chance of having some flaws.

Why can’t we simply compare death rates? It’s because something like your age is a factor in both your chance of dying and whether you end up receiving HCQ from a doctor. Let’s assume for a moment that COVID is more deadly in elderly people (it is). Let’s also assume that doctors might decide that HCQ was the best treatment option for older people, but that younger people had some other better treatment option. In this scenario, even if HCQ has no effect, you’d expect the HCQ-treated patients to have a higher death rate than non-HCQ patients, simply due to their greater age. This kind of mixup is possible to try and fix though – if we know patient ages, we can make sure we’re comparing (say) the group of 80 year olds who got HCQ against the group of 80 year olds who didn’t get HCQ. We’ll look at some of the difficulties in this approach shortly.
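A toy calculation shows how strong this effect can be. The numbers below are entirely invented (they are not from the Surgisphere paper or any real dataset): HCQ is constructed to have no effect at all within each age group, but older patients are much more likely to receive it.

```python
# Hypothetical cohort, invented for illustration only:
# (age_group, got_hcq) -> (patients, deaths)
cohort = {
    ("young", True):  (200, 4),     # 2% die
    ("young", False): (800, 16),    # 2% die
    ("old", True):    (800, 160),   # 20% die
    ("old", False):   (200, 40),    # 20% die
}

def death_rate(groups):
    """Pooled death rate across the listed (age_group, got_hcq) cells."""
    patients = sum(cohort[g][0] for g in groups)
    deaths = sum(cohort[g][1] for g in groups)
    return deaths / patients

# Naive comparison, ignoring age
crude_hcq = death_rate([("young", True), ("old", True)])
crude_no_hcq = death_rate([("young", False), ("old", False)])

# Like-for-like comparison within each age group: (HCQ rate, non-HCQ rate)
by_age = {
    age: (death_rate([(age, True)]), death_rate([(age, False)]))
    for age in ("young", "old")
}

print("crude:", crude_hcq, crude_no_hcq)
print("by age:", by_age)
```

The crude comparison makes HCQ look about three times as deadly (16.4% versus 5.6%), yet within each age group the death rates are identical. The whole difference comes from who was given the drug.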

The same reasoning applies for other patient factors like gender/race/ethnicity, existing health conditions etc. It also applies to other things that might influence patient outcome, such as what dose of HCQ was given, or how badly ill a patient was when they received HCQ. In an ideal world, we’d have data on all of these factors and we’d be able to adjust our analysis to take it all into account. But the more factors we try to take into account, the larger the dataset we need to do our analysis – otherwise we end up with just 1 or 2 patients in each ‘group’.

The whole dataset itself can easily be skewed. The hospitals which gave Surgisphere their data might all be expensive private hospitals with fancy equipment and good connections to whizzy American medical corporations, whereas hospitals in poorer areas might be too busy treating basic needs to worry about signing data sharing contracts. Private hospitals are more likely to be treating affluent people who suffer less from poverty-related illness. We can try to correct for known factors (like existing medical conditions) in our data analysis, but if the selection of hospitals itself was skewed then we’re starting the game with a deck stacked against us.

One worry is always that you can only adjust for factors that are mentioned in your dataset. For example, let’s suppose asthma makes COVID more deadly (I’m making this up as an example) but that our dataset did not provide details of patient asthma. It might be the case that patients with asthma all ended up in the HCQ group (could happen if some alternative treatment was available but known to be not-safe if you have asthma). But if our dataset doesn’t tell us about asthma, we just see that, overall, more HCQ patients died. We wouldn’t be able to see that this difference in death was actually due to a common underlying factor. We might WRONGLY go on to believe that the increased death rate was CAUSED by HCQ, when actually all that happened was higher-risk patients had systematically ended up in the HCQ group.

Back to the story: our plan is to try to pair up each patient in the HCQ group with a “twin” in the non-HCQ group who has exactly the same age, weight, health conditions etc. Doing so allows us to tease apart the effect of age/weight/etc from the effect of getting given HCQ. But we almost certainly won’t find an “exact twin” for each HCQ patient – ie. someone who matches on ALL characteristics. Instead, we typically try to identify a subset of non-HCQ patients who are similar in age/weight/etc to the group of patients who were given HCQ. (This is called “propensity score matching analysis”).
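To give a feel for the “find a twin” step, here is a deliberately simplified sketch. It is plain nearest-neighbour matching on age and weight, not a real propensity score model (which would first estimate each patient’s probability of being treated), and the patients, the distance measure and the function names are all made up for illustration.

```python
# Hypothetical patients, invented for illustration: (age, weight_kg, died)
treated = [(82, 70, 1), (75, 90, 0), (64, 80, 0)]
controls = [(30, 70, 0), (80, 72, 1), (77, 88, 0), (65, 79, 0), (45, 95, 0), (83, 60, 1)]

def distance(a, b):
    """Crude similarity measure: treats one year of age like one kg of weight."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def match(treated, controls):
    """Greedily pair each treated patient with the closest unused control."""
    available = list(controls)
    pairs = []
    for patient in treated:
        twin = min(available, key=lambda c: distance(patient, c))
        available.remove(twin)
        pairs.append((patient, twin))
    return pairs

pairs = match(treated, controls)
treated_rate = sum(p[2] for p, _ in pairs) / len(pairs)
matched_rate = sum(t[2] for _, t in pairs) / len(pairs)
all_controls_rate = sum(c[2] for c in controls) / len(controls)

for patient, twin in pairs:
    print(patient, "<->", twin)
print(treated_rate, matched_rate, all_controls_rate)
```

Even in this tiny example the choices matter: a different distance measure, or a different order of picking twins, could produce a different comparison group and therefore a different answer.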

The important word here is “try”. There’s usually not a good way to check whether you’ve done a good job here. I might do a rubbish job – perhaps the subset of non-HCQ patients I pick contains way more smokers than are in the HCQ group. We hope that our dataset contains all the important characteristics that allow us to make a genuinely representative set, but if it doesn’t then any comparisons we make between the HCQ group and our non-HCQ “twins” will not be telling us solely about the effect HCQ has. This is the fundamental problem with observational studies, and the only real solution is to do a randomised trial. (BTW, all of economics is based on observational data and suffers this problem throughout).

That’s enough stats details. The main point is that this kind of analysis is hard, and there’s a number of choices that the researcher has to make along the way which might be good or bad choices. The only way to check those choices is to have other experts look at the data.

This brings us to the objections that were raised against this initial publication. There are four kinds of objections raised:

1. The “we know it’s easy to make mistakes, and sharing data is the best way to catch mistakes” stuff. (objection 2). There’s no implication of malicious intent here; Surgisphere need to honour their contracts. But the societal importance of understanding COVID is so high that we need to find ways to meet in the middle.
2. The “despite not releasing your raw data, there’s enough data in your paper that we can already spot data mistakes” (objection 5,6,7,8,9). Things like “the reported average HCQ dose is higher than the US max dose, and 66% of the data came from the US”. Or “your dataset says more people died in Australia from COVID than actually died”. It just doesn’t smell right. If you can spot two mistakes that easily, how many more are lingering in the data?
3. The “you skipped some of the basics” objections – no ethics review, no crediting of hospitals that gave data (objection 3+4)
4. The “you’ve not done the stats right” stuff – (objections 1 and 10)

None of this means that the researchers were definitely wrong; it just means they might be wrong. It also doesn’t mean the researchers were malicious; countless papers are published every year which contain errors that are then picked up by peers. To me that’s a science success – it helps us learn what is true and false in the world. But it does mean that a single scientific paper that hasn’t been reproduced by other groups is “early stages” as far as gaining certainty goes.

The best way to know for sure what HCQ does to COVID patients is to run a controlled trial, and this had already started. But if you believe there’s evidence that HCQ causes harm, then ethically you would stop any trial immediately – and this is what happened (WHO trial and UK trial were both paused). But now the “evidence” of harm is perhaps not so strong, and so perhaps it makes sense to restart the controlled trials and learn more directly what the effect of HCQ on COVID patients actually is.


Hydrogen Atom 2

Bohr’s 1913 paper which presented the idea of electrons “jumping” between fixed orbitals was a huge step forward, although its predictions only worked for single-electron hydrogen atoms and did not predict the correct wavelength of spectral lines for more complex atoms.

The world that Bohr grew up in was based on Newton’s mechanics (which explained how particles accelerate due to net forces) and the force of gravity and Maxwell’s electromagnetism along with statistical explanations of heat. But Bohr could see that those “rules” were wrong in some way – they predicted that the hydrogen electron (being an accelerating charge) would cause EM waves thereby losing energy and spiralling into the nucleus. Since this didn’t actually happen, it was clear to Bohr that new rules would be needed. But he didn’t rip up the whole rulebook – after all, the existing rules had done a good job of explaining all sorts of other phenomena. Instead he looked to add a minimal set of new rules or postulates and keep the rest of existing physics “in play”. He chose to retain the Rutherford picture of orbiting electrons, where electrons are like little planets with known mass, velocity and position at all times. To this, he added the new rule that electrons orbited in circles, and the angular momentum of the electron was only allowed to take on discrete values.

To stay in a circular orbit at some distance, there’s only one velocity that works (any other velocity gives an elliptical orbit). Since mass is fixed, and the orbital radius and velocity are interrelated, this means that discrete angular momentum only allows discrete orbits, each with a specific radius and velocity and therefore kinetic and potential energy. Specifically, in the first allowed orbit, the electron is moving at about 1/137th the speed of light, the orbital radius is about 0.05nm and the energy is -13.6eV (the zero point is taken to be an electron very far away).
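Those three numbers drop out of just two conditions: the Coulomb force supplies the centripetal force (m v^2 / r = k / r^2, with k = e^2 / (4 pi eps0)), and the angular momentum is quantised (m v r = n hbar). Here is a short sketch that solves them; the constants are the standard CODATA values, and the function name is my own.

```python
import math

# Physical constants in SI units (CODATA values)
E_CHARGE = 1.602176634e-19    # electron charge, C
M_E = 9.1093837015e-31        # electron mass, kg
HBAR = 1.054571817e-34        # reduced Planck constant, J s
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
C = 299792458.0               # speed of light, m/s

def bohr_orbit(n):
    """Radius (m), speed (m/s) and total energy (eV) of the n-th circular Bohr orbit."""
    k = E_CHARGE**2 / (4 * math.pi * EPS0)
    speed = k / (n * HBAR)               # from eliminating r between the two conditions
    radius = n * HBAR / (M_E * speed)    # from m v r = n hbar
    energy_joules = 0.5 * M_E * speed**2 - k / radius   # kinetic + potential
    return radius, speed, energy_joules / E_CHARGE

radius, speed, energy_ev = bohr_orbit(1)
print("radius in nm:", radius * 1e9)
print("c / v:", C / speed)
print("energy in eV:", energy_ev)
```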

How far does this model get us in terms of explaining our experimental data? It describes the hydrogen lines well – the visible Balmer lines are understood to be due to electrons “jumping” to the 2nd lowest orbit from the 3rd/4th/5th/etc orbits. But it doesn’t explain what happens in multi-electron atoms like helium. Nor does it explain why some lines are more intense than others. It doesn’t explain the Zeeman-effect splitting of lines. It is also not a general explanation of how particles move in the presence of forces: it only describes the special case of a negative charge moving in a central electric field caused by the positive charge of the nucleus. It doesn’t tell you how a free electron would move, nor an electron in a linear electric field. Finally, even the foundations are flawed – the choice to explain the discrete energy levels in terms of discrete angular momentum isn’t right – we now know that the ground state of hydrogen has zero orbital angular momentum, not the ħ that Bohr modelled.
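The Balmer claim is easy to check with a few lines of arithmetic. This sketch uses the Bohr energies E_n = -13.6 eV / n^2 and the photon relation wavelength = hc / (energy difference). It ignores the small corrections for the finite mass of the nucleus and for measuring in air, so the results land close to, rather than exactly on, the tabulated lines.

```python
RYDBERG_EV = 13.605693    # magnitude of the n=1 Bohr energy, in eV
HC_EV_NM = 1239.841984    # Planck constant times speed of light, in eV nm

def bohr_energy(n):
    """Energy of the n-th Bohr orbit in eV (zero = electron far away)."""
    return -RYDBERG_EV / n**2

def balmer_wavelength_nm(n_upper):
    """Wavelength of the photon emitted in a jump from n_upper down to n = 2."""
    photon_energy = bohr_energy(n_upper) - bohr_energy(2)
    return HC_EV_NM / photon_energy

wavelengths = {n: balmer_wavelength_nm(n) for n in (3, 4, 5)}
for n, nm in wavelengths.items():
    print(f"{n} -> 2: {nm:.1f} nm")
```

The three jumps come out at roughly 656, 486 and 434 nm, which are the red, blue-green and violet lines of the visible hydrogen spectrum.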

But still, it was a huge breakthrough – making it clear that the explanation of atom-level phenomena was going to require a fresh set of rules.

Bohr’s choice to focus on circular orbits was curious, since every physicist is familiar with the fact that particles in a central inverse-square force move in elliptical orbits in general. Consequently, Sommerfeld tried to extend Bohr’s reasoning to include elliptical orbits, guided by the requirement that the resulting orbits still needed to have the discrete Bohr energies necessary to cause the hydrogen spectral lines. Sommerfeld realised that the eccentricity (the shape of the ellipse) had to also be quantised to achieve this. But initially, this extra step didn’t seem to yield anything useful except more complexity – it just gave the same ‘jumps’ as Bohr although there were now many more ways to achieve them. You now need two ‘quantum numbers’ to describe the orbital – Bohr’s original ‘n’ and Sommerfeld’s new ‘l’ – but since the energy of the orbital is determined by ‘n’, what’s the point? Who cares if there’s a few different shapes of orbital if they all have the same energy, and it’s the energy we care about?

However, the nice thing about elliptical orbits is that they’re not symmetric – the electron moves more along the long axis of the ellipse than the short, which creates the possibility of explaining the Stark and Zeeman effects as being the interaction of this motion with the direction of electric and magnetic fields. This gives a hint that Sommerfeld might’ve been onto something, but in the early days it was definitely just a “guess with some hope”.

Bohr’s circular orbits imply that there is an ‘orbital plane’ and therefore a special distinguished axis. If you had 100 hydrogen atoms, you might expect them to be randomly aligned. But since a charge moving in a circle causes a magnetic field, you could also argue that they might tend to line up with each other. Or, if you applied a strong external magnetic field, you could cause the axes to all align in a single direction. Or if you fired the atoms through an inhomogeneous magnetic field, the amount they were deflected would tell you about the angle their axis made with the magnetic field direction.

However, Sommerfeld’s work added something surprising. Sommerfeld tried to generalize Bohr’s one-parameter circle orbits to two parameters (to allow for ellipses) and then three (to allow for ellipses oriented in 3d space) whilst retaining the spirit of Bohr’s quantization condition for angular momentum. What he found was, rather confusingly, that in 3d space the quantization condition only allowed for elliptical orbit planes in particular orientations. This seems very odd, since it presupposes that there is some ‘preferred’ direction in the universe against which these allowed orientations are measured. (Skipping ahead, we now understand this in terms of measurement along a chosen axis, with the particle state being in general a superposition of the possible basis states, but the idea of superpositions of quantum states was several years in the future). Weird as it may sound, it’s nonetheless a prediction that you can design an experiment to test. A charge orbiting in a plane acts like a little magnet. If you fire suitable atoms through an inhomogeneous field, they get deflected by an amount related to the alignment of the “little magnet” with the inhomogeneous field. If the electrons really could only live in discrete orbital planes, the atoms ought to get deflected in a few discrete directions. If the electrons could live in any orbital plane, you’d get a continuous spread of deflections.

If you think the idea that orbital planes can only exist in certain orientations relative to an arbitrary choice of axis sounds, well, wrong – then you’re not alone. Even Debye, who had also derived the same idea, said to one of the people proposing to actually measure it “You surely don’t believe that [space quantization] is something that really exists; it is only a computational recipe”. In other words, even to the people who came up with the idea it was little more than a utilitarian heuristic – a mathematical procedure that got the right answers by a wrong route. Even Stern, one of the experimenters, later said he performed the experiment in order to show that the whole idea was incorrect. And his supervisor, Born, told him there was “no sense” in doing the experiment. Furthermore, according to classical physics when you put ‘little magnets’ into an external magnetic field, they precess around the axis of the magnetic field rather than doing any kind of ‘aligning’.

At this point in history, a rather surprising thing happens. We now know that Bohr/Sommerfeld’s prediction of the magnetic moment and angular momentum was wrong – they predicted an angular momentum of ħ whereas we now know it is zero. But Stern and Gerlach, who performed the inhomogeneous magnetic field experiment, didn’t know that. Had that been the full story, they would’ve found no deflection. But in fact, they found that their beam of atoms did split nicely into two. What they didn’t know about – no one knew at that time – was that electrons have an intrinsic magnetic moment of their own that can take on two values. This electron “spin” was the mechanism that produced their observed result. But, being unaware of spin, they wrongly concluded that they had demonstrated the reality of Sommerfeld’s “space quantization” – in fact, they had demonstrated a different kind of quantization.

(Interestingly, although most descriptions focus on angular momentum as the important concept, Stern’s own Nobel lecture doesn’t mention angular momentum at all. It only talks about the magnetic moment. There’s an implicit assumption that magnetic moments are what you get when you have charge and angular momentum, but since it’s the magnetic moment that determines the deflection in the Stern-Gerlach experiment I, like Stern, prefer to talk about magnetic moments and leave it for someone else to worry about how that magnetic moment comes about).

So where does that leave Sommerfeld’s ellipses? They’re still supported both by their ability to (partially) explain the Stark and Zeeman effects and by the fact that Sommerfeld also calculated a relativistic correction for his elliptical orbits which made the prediction of spectral line wavelengths match experimental data slightly more accurately (in Bohr’s circular orbits, the electrons travel at c/137, or gamma≈1.00003, and the speed will be higher in ellipses that do “close passes” to the nucleus, so you start to get close to the point where special relativity starts making an impact).

Spin now enters the picture, as a highly “unclassical” concept. The story starts with simple pattern spotting. In 1871, Mendeleev organised the known elements into a table based on their chemical properties. He didn’t know it at the time but he’d stumbled upon the sequence of atoms with increasing number of electrons, and the groups he perceived gained their commonality through having the same number of electrons in their outermost shells. But several steps were required to make this connection. Firstly, the Bohr model gave the idea of discrete orbits each with different energy. Then Sommerfeld’s elliptical orbits gave several different alternative shapes for a particular energy of orbit (“degeneracy”). A paper by Stoner in 1924 made a connection between the number of spectral lines of an element (once degenerate states had been split out using the Zeeman effect) and the number of electrons in the subsequent noble gas. (Stoner’s career prior to this point had been rather desperate). This observation led Pauli to realise that a simple rule of “only one electron is allowed in each quantum state” was possible, but only if an extra two-valued quantum number was used. Initially Pauli didn’t offer up any explanation of what this two-valued thing was. Goudsmit and Uhlenbeck subsequently proposed that it could be caused by the electron spinning around its own axis, something which was later shown to be wrong (electrons seem to have no size, at least every attempt to measure their size finds it smaller than we can measure, and so to create enough angular momentum the tiny tiny spinning top would have to rotate very quickly, such that its surface would be going faster than the speed of light). But although the picture was wrong, the idea that electrons have their own intrinsic two-valued angular momentum and magnetic moment is correct – as, in fact, the Stern-Gerlach experiment showed.

Like Sommerfeld’s ellipses, the two possible electron spin states don’t have much effect on the energy – it’s still dominated by the original Bohr ‘n’. But spin does make small changes to the energy. A particle with spin is like a small magnet, and a small magnet orbiting a positive nucleus has an electromagnetic interaction – Larmor interaction and Thomas precession. This causes small changes to the orbit energy, resulting in splitting of spectral lines – a process now named “spin-orbit interaction”. Sommerfeld’s ellipses cause a

But how was Pauli to incorporate his new “two valued” quantity into the Bohr-Sommerfeld model? It seems that he didn’t. Pauli published his exclusion principle in January 1925. Heisenberg wrote his matrix mechanics paper in July 1925, and Schrodinger published his wave mechanics in 1926. These approaches were much more general than the Bohr-Sommerfeld approach – a genuine ‘mechanics’ explaining how particles evolve over time due to forces. In 1927, Pauli formulated the “Pauli Equation” which is an extension of the Schrodinger equation for spin-1/2 particles that takes into account the interaction between spin and external electromagnetic fields.

Although initially the Heisenberg and Schrodinger approaches looked very different, Dirac was able to show that both are just different realisations of a kind of vector space, and that quantum mechanics was a big game of linear algebra which didn’t care whether you thought those vectors were ‘really functions’ or not. Dirac was happy to go somewhat off-piste mathematically, using his “Dirac delta” functions which are zero everywhere except at a single point yet have an integral of one. His work was followed up by von Neumann, whose book took a more formal, rigorous mathematical approach, objecting to Dirac’s use of “mathematical fictions” and “improper functions with self-contradictory properties”. The approach is much the same, but the foundations are made solid.

In the Schrodinger picture, a particle is described by a complex-valued wave function in space. The Schrodinger equation shows how the wave evolves in time, as a function of the curvature of the wave and a term describing the spatial potential. In the case where a particle is constrained within a potential well, such as an electron experiencing the electrostatic attraction of a nucleus, the waves form ‘stationary’ patterns (the wave continues to change phase over time, but the amplitude is not time-dependent). In a hydrogen atom, the stationary states in three dimensions are combinations of radial, polar and azimuthal half-waves which result in amplitudes that vary spatially but not with time. The radial, polar and azimuthal contributions match up with the three quantum numbers from the Bohr-Sommerfeld model (n,l,m), reflecting the fact that the Schrodinger approach is much more general – the Bohr model “falls out” as being the special case of a single particle in a central electrostatic field.

As is often the case, although the Schrodinger equation is very general, only a few simple symmetric cases (such as the Hydrogen atom) result in a nice compact mathematical expression. For more complex cases, one can do numeric simulation (ie. rather than viewing the Schrodinger equation as stating a criterion for a solution in terms of its time derivative and spatial curvature, you can view it as an algorithm for evolving a function forward in time). Alternatively, one can apply perturbation methods, originally invented when studying planetary motion. Perturbation methods are similar to approximating a function using the first few terms of a power series; you take a state you can solve exactly (hydrogen atom) and assume that a small change (small electric field) can be modelled roughly using a simplified term for the difference. For example, this can be used to show the Stark effect (approximately) – where the lines of Hydrogen are split by an electric field.
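To make the “equation as algorithm” idea concrete, here’s a rough pure-Python sketch of stepping a one-dimensional wave function forward in time. Everything specific in it is my own assumption for illustration: units where hbar = m = 1, a free particle (no potential), a Gaussian starting packet, the grid sizes, and a simple scheme that updates the real and imaginary parts alternately. It’s a toy, not a serious solver.

```python
import math

# Assumptions (all mine, for illustration): units with hbar = m = 1, a free
# particle (zero potential), and a Gaussian packet with wavenumber 1.
N = 201        # number of grid points
DX = 0.1       # grid spacing
DT = 0.002     # time step, chosen small relative to DX**2 for stability
STEPS = 500    # total simulated time = STEPS * DT = 1.0

xs = [(j - N // 2) * DX for j in range(N)]

def hamiltonian(f):
    """Apply H = -(1/2) d^2/dx^2 using a three-point finite difference."""
    out = [0.0] * len(f)
    for j in range(1, len(f) - 1):
        out[j] = -0.5 * (f[j + 1] - 2.0 * f[j] + f[j - 1]) / DX**2
    return out

def total_probability(real, imag):
    return sum(r * r + i * i for r, i in zip(real, imag)) * DX

def mean_position(real, imag):
    weighted = sum(x * (r * r + i * i) for x, r, i in zip(xs, real, imag)) * DX
    return weighted / total_probability(real, imag)

# psi = exp(-x^2/2) * exp(i x), split into real and imaginary parts.
real = [math.exp(-x * x / 2.0) * math.cos(x) for x in xs]
imag = [math.exp(-x * x / 2.0) * math.sin(x) for x in xs]
scale = 1.0 / math.sqrt(total_probability(real, imag))
real = [r * scale for r in real]
imag = [i * scale for i in imag]

start_position = mean_position(real, imag)

# i dpsi/dt = H psi  becomes  dR/dt = H I  and  dI/dt = -H R.
for _ in range(STEPS):
    real = [r + DT * h for r, h in zip(real, hamiltonian(imag))]
    imag = [i - DT * h for i, h in zip(imag, hamiltonian(real))]

end_position = mean_position(real, imag)
print(f"mean position moved from {start_position:.2f} to {end_position:.2f}")
print(f"total probability is now {total_probability(real, imag):.3f}")
```

The packet drifts in the +x direction at roughly its group velocity while the total probability stays close to one. Adding a potential would just mean adding a V(x)*f term inside hamiltonian.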

But the new ‘quantum mechanics’ was quite different from the Bohr model. The Bohr model painted a picture of electrons being “in” some orbital then (for reasons unknown) deciding to jump to some other orbital. But in the Schrodinger/Dirac picture there were two very different processes going on. As time passed, the system would evolve according to the wave equation. But if a measurement of position, energy or momentum was made the wave function would “collapse” into a basis state (eigenvector) of the linear operator associated with that observable quantity. This collapse was evident because subsequent measurements would give the same answer, since the system had not had a chance to evolve away from the eigenstate. However, in general, the state would exist in some weighted linear combination (“superposition”) of any choice of basis states. If you made two different measurements (say position and momentum) whose linear operators did not have the same set of eigenvectors, then the result is dependent on the order in which you perform the measurements.
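The “order matters” point shows up in the linear algebra as operators that don’t commute. Position and momentum need infinite-dimensional operators, so as a stand-in this sketch uses the two-by-two Pauli matrices for spin along z and spin along x (my choice of example, not something from the history above). Multiplying operators isn’t literally the same as performing two measurements, but a non-zero commutator is the usual sign that two observables don’t share a set of eigenvectors.

```python
def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Spin operators along z and x (Pauli matrices, leaving out the factor hbar/2).
SIGMA_Z = [[1, 0], [0, -1]]
SIGMA_X = [[0, 1], [1, 0]]

z_then_x = matmul(SIGMA_X, SIGMA_Z)  # apply z first, then x
x_then_z = matmul(SIGMA_Z, SIGMA_X)  # apply x first, then z

# The commutator is the difference between the two orderings.
commutator = [[x_then_z[i][j] - z_then_x[i][j] for j in range(2)]
              for i in range(2)]
print(commutator)
```

If the two operators shared all their eigenvectors, the commutator would be the zero matrix. Here it isn’t.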

Schrodinger did not consider the effect of spin in his original equation (ie. the spin-orbit coupling, or the interaction of spin with an externally applied field). Thus, it required an extension by Pauli to reflect the fact that an electron’s state wasn’t just captured in the wave function. Including spin in the system state isn’t as simple as recording a “spin up” or “spin down” for a given electron. The particle can be in a linear combination of two spin basis states. And, much like how multi-particle systems are modelled with tensor products to yield joint-probabilities, there can be dependencies between the spin state and the rest of the state.


The hydrogen atom

A lot of the early development of quantum mechanics focused on the hydrogen atom. Fortunately, the hydrogen atoms that are around today are just as good as the ones from the 1910s and furthermore we’ve got the benefit of hindsight and improved instruments to help us. So let’s take a look at what raw experimental data we can get from hydrogen and use that to trace the development of ideas in quantum mechanics.

Back in the 1740s, lots of people were messing around with static electricity. For example, the first capacitor (the Leyden jar) was invented in 1745, allowing people to store larger amounts of electrical energy. Anyone playing around with electricity – even just rubbing your shoes across a carpet – is familiar with the fact that electricity can jump across small distances of air. In 1749, Abbe Nollet was experimenting with “electrical eggs”, which were glass globes with some of the air pumped out and two wires poking in. Pumping the air out allowed longer sparks, apparently giving enough light to read by at night. (Aside: one of these eggs featured in a painting from around 1820 by Paul Lelong). There’s a video of someone with a hydrogen-filled tube so we don’t all have to actually buy one.

By passing the light through a diffraction grating (first made in 1785, although natural diffraction gratings such as feathers were in use by then) the different wavelengths of light get separated out to different angles. When we do this with the reddish glow of the hydrogen tube, it separates out into three lines – a red line, a cyan line, and a blue-violet line. Although many people were using diffraction gratings to look at light (often sunlight) it was Ångström who took the important step of quantifying the different colours of light in terms of their wavelength (Kirchhoff and Bunsen used a scale specific to their particular instrument). This accurately quantified data, published in 1868 in Ångström’s book, was crucial. Although Ångström’s instrument allowed him to make accurate measurements of lines, he was still just using his eyes and therefore could only measure lines in the visible part of the spectrum (380 to 740nm). The three lines visible in the youtube video are at 656nm (red), 486nm (cyan), 434nm (blue-violet) and there’s a 4th line at 410nm that doesn’t really show up in the video.

These four numbers are our first clues, little bits of evidence about what’s going on inside hydrogen. But the next breakthrough came apparently from mere pattern matching. In 1885 Balmer (an elderly school teacher) spotted that those numbers have a pattern to them. If you take the series n^2/(n^2-2^2) for n=3,4,5… and multiply it by 364.5nm then the 4 hydrogen lines pop out (eg. for n=3 we have 364.5 * 9/(9-4) = 656nm and for n=6 we have 364.5 * 36/32 = 410nm). Alluringly, that pattern suggests that there might be more than just four lines. For n=7 it predicts 396.9nm which is just into the ultraviolet range. As n gets bigger, the lines bunch up as they approach the “magic constant” 364.5nm.
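Balmer’s pattern is easy to check by machine. Here’s a small Python sketch that evaluates it using the 364.5nm constant; the function and constant names are my own.

```python
BALMER_CONSTANT_NM = 364.5  # Balmer's "magic constant", in nanometres

def balmer_wavelength_nm(n):
    """Wavelength predicted by Balmer's formula for integer n >= 3."""
    if n < 3:
        raise ValueError("Balmer's series starts at n = 3")
    return BALMER_CONSTANT_NM * n**2 / (n**2 - 2**2)

# The four visible lines (n = 3..6) plus the first ultraviolet prediction (n = 7).
for n in range(3, 8):
    print(n, round(balmer_wavelength_nm(n), 1))

# For large n the prediction closes in on the constant itself.
print(round(balmer_wavelength_nm(1000), 1))
```

The first four values land on the measured 656, 486, 434 and 410nm lines to within a nanometre, and large n creeps down towards 364.5nm.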

We now know those visible lines are caused when the sole electron in a hydrogen atom transitions to the second-lowest energy state. Why second lowest and not lowest? Jumping all the way to the lowest gives off photons with more energy, so they are higher frequency aka shorter wavelengths and are all in the ultraviolet range that we can’t see with our eyes.

Balmer produced his formula in 1885, and it was a while until Lyman went looking for more lines in the ultraviolet range in 1906 – finding lines starting at 121nm then bunching down to 91.175nm – and we now know these are jumps down to the lowest energy level. Similarly, Paschen found another group of lines in the infrared range in 1908, then Brackett in 1922, Pfund in 1924, Humphreys in 1953 – as better instruments allowed them to detect those non-visible lines.

Back in 1888, three years after Balmer’s discovery, Rydberg was trying to explain the spectral lines from various different elements and came up with a more general formula, of which Balmer’s was just a special case. Rydberg’s formula predicted the existence (and the wavelength) of all these above groups of spectral lines. However, neither Rydberg nor Balmer suggested any physical basis for their formulae – they were just noting a pattern.
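For reference, the Rydberg formula for hydrogen in its modern form is 1/wavelength = R * (1/n1^2 - 1/n2^2), where n1 labels the series (1 for Lyman, 2 for Balmer, 3 for Paschen and so on) and n2 > n1. Here’s a minimal Python sketch of it. The value of R is an approximate modern figure for hydrogen that I’m assuming, and the results are vacuum wavelengths, so they differ slightly from measurements made in air.

```python
import math

# Assumption: approximate Rydberg constant for hydrogen, in inverse metres.
RYDBERG_HYDROGEN_PER_M = 1.09678e7

def rydberg_wavelength_nm(n_lower, n_upper):
    """Vacuum wavelength in nm for a jump from level n_upper down to n_lower."""
    if n_upper <= n_lower:
        raise ValueError("n_upper must be larger than n_lower")
    inverse_wavelength = RYDBERG_HYDROGEN_PER_M * (1 / n_lower**2 - 1 / n_upper**2)
    return 1e9 / inverse_wavelength

# First line and series limit for the Lyman, Balmer and Paschen series.
for name, n_lower in [("Lyman", 1), ("Balmer", 2), ("Paschen", 3)]:
    first_line = rydberg_wavelength_nm(n_lower, n_lower + 1)
    series_limit = rydberg_wavelength_nm(n_lower, math.inf)
    print(f"{name}: first line {first_line:.1f}nm, limit {series_limit:.1f}nm")
```

With n1 = 2 this reduces to Balmer’s pattern, and with n1 = 1 it gives the ultraviolet lines Lyman found, starting near 121nm and bunching down towards roughly 91nm.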

To recap: so far we have collected a dataset consisting of the wavelengths of various spectral lines that are present in the visible, ultraviolet and infrared portions of the spectrum.

In 1887, Michelson and Morley (using the same apparatus they used for their famous ether experiments) were able to establish that the red hydrogen line ‘must actually be a double line’. Nobody had spotted this before, because it needed the super-accurate interference approach used by Michelson and Morley as opposed to just looking at the results of a diffraction grating directly. So now we start to have an additional layer of detail – many of the lines we thought were “lines” turn out to be collections of very close together lines.

In order to learn about how something works, it’s a good idea to prod it and poke it to see if you get a reaction. This was what Zeeman did in 1896 – subjecting a light source (sodium in kitchen salt placed in a bunsen burner flame) to a strong magnetic field. He found that turning on the magnet makes the spectral lines two or three times wider. The next year, having improved his setup, he was able to observe splitting of the lines of cadmium. This indicates that whatever process is involved in generating the spectral lines is influenced by magnetic fields, in a way that separates some lines into two, some into three, and some don’t split at all.

Another kind of atomic prodding happened in 1913 when Stark did an experiment using strong electric fields rather than magnetic fields. This also caused shifting and splitting of spectral lines. We now know that the electric field alters the relative position of the nucleus and electrons, but bear in mind that Rutherford’s interpretation of the gold foil experiment, which first suggested that atoms consist of a dense nucleus and orbiting electrons, was only published in 1911 and so even the idea of a ‘nucleus’ was very fresh at that time.

Finally, it had been known since 1690 that light exhibited polarization. Faraday had shown that magnets can affect the polarization of light, and ultimately this had been explained by Maxwell in terms of the direction of the electric field. When Zeeman had split spectral lines using a magnetic field, he noticed the magnetic field affected polarization too.

So that concludes our collection of raw experimental data that was available to the founders of quantum mechanics. We have accurate measurements of the wavelength of spectral lines for various substances – hydrogen, sodium etc – and the knowledge that some lines are doublets or triplets and those can be shifted by both electric and magnetic fields. Some lines are more intense than others.

It’s interesting to note what isn’t on that list. The lines don’t move around with changes in temperature. They do change if the light source is moving away from you at constant velocity, but this was understood to be the Doppler effect due to the wave nature of light rather than any effect on the light-generating process itself. I don’t know if anyone tried continuously accelerating the light source, eg. in a circle, to see if that changed the lines, or to see if nearby massive objects had any impact.