September 2019 – Andrew Birkett's blog

Bohr’s 1913 paper, which presented the idea of electrons “jumping” between fixed orbitals, was a huge step forward, although its predictions only worked for single-electron hydrogen atoms and did not give the correct wavelengths of spectral lines for more complex atoms.

The world that Bohr grew up in was based on Newton’s mechanics (which explained how particles accelerate due to net forces), the force of gravity and Maxwell’s electromagnetism, along with statistical explanations of heat. But Bohr could see that those “rules” were wrong in some way – they predicted that the hydrogen electron (being an accelerating charge) would radiate EM waves, thereby losing energy and spiralling into the nucleus. Since this didn’t actually happen, it was clear to Bohr that new rules would be needed. But he didn’t rip up the whole rulebook – after all, the existing rules had done a good job of explaining all sorts of other phenomena. Instead he looked to add a minimal set of new rules or postulates and keep the rest of existing physics “in play”. He chose to retain the Rutherford picture of orbiting electrons, where electrons are like little planets with known mass, velocity and position at all times. To this, he added the new rule that electrons orbited in circles, and the angular momentum of the electron was only allowed to take on discrete values.

To stay in a circular orbit at some distance, there’s only one velocity that works (any other velocity gives an elliptical orbit). Since mass is fixed, and the orbital radius and velocity are interrelated, this means that discrete angular momentum only allows discrete orbits, each with a specific radius and velocity and therefore kinetic and potential energy. Specifically, in the first allowed orbit, the electron is moving at about 1/137th the speed of light, the orbital radius is 0.05nm and the energy is -13.6eV (the zero point is taken to be an electron very far away).
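As a rough check on those numbers, here is a minimal sketch that combines the circular-orbit condition (Coulomb attraction supplies the centripetal force) with the rule that angular momentum is a whole multiple of ħ. The function name `bohr_orbit` and the rounded constants are assumptions made for illustration, not anything taken from Bohr's paper.

```python
import math

# Rounded physical constants in SI units (illustrative values)
HBAR = 1.054571817e-34      # reduced Planck constant, J s
M_E = 9.1093837015e-31      # electron mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
C = 299792458.0             # speed of light, m/s


def bohr_orbit(n):
    """Return (radius_m, speed_m_per_s, energy_eV) for the n-th circular orbit."""
    k = E_CHARGE**2 / (4 * math.pi * EPS0)  # strength of the Coulomb attraction
    # Circular orbit: m v^2 / r = k / r^2, so m v^2 r = k
    # Quantisation rule: m v r = n * hbar
    # Dividing the first by the second gives the speed directly.
    speed = k / (n * HBAR)
    radius = n * HBAR / (M_E * speed)
    kinetic = 0.5 * M_E * speed**2
    potential = -k / radius  # zero of energy is an electron very far away
    energy_ev = (kinetic + potential) / E_CHARGE
    return radius, speed, energy_ev


radius, speed, energy = bohr_orbit(1)
print(f"c / v  = {C / speed:.1f}")
print(f"radius = {radius * 1e9:.4f} nm")
print(f"energy = {energy:.2f} eV")
```

For n=1 this gives a speed of roughly c/137, a radius of about 0.053nm and an energy of about -13.6eV, in line with the figures quoted above.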

How far does this model get us in terms of explaining our experimental data? It describes the hydrogen lines well – the visible Balmer lines are understood to be due to electrons “jumping” to the 2nd lowest orbit from the 3rd/4th/5th/etc orbits. But it doesn’t explain what happens in multi-electron atoms like helium. Nor does it explain why some lines are more intense than others. It doesn’t explain the Zeeman effect splitting of lines. And it is not a general explanation of how particles move in the presence of forces: it only describes the special case of a negative charge moving in a central electric field caused by the positive charge of the nucleus. It doesn’t tell you how a free electron would move, nor an electron in a linear electric field. Finally, even the foundations are flawed – the choice to explain the discrete energy levels in terms of discrete angular momentum isn’t right – we now know that the ground state of hydrogen has zero angular momentum, not the ħ amount that Bohr modelled.

But still, it was a huge breakthrough – making it clear that the explanation of atom-level phenomena was going to require a fresh set of rules.

Bohr’s choice to focus on circular orbits was curious, since every physicist is familiar with the fact that particles in a central inverse-square force move in elliptical orbits in general. Consequently, Sommerfeld tried to extend Bohr’s reasoning to include elliptical orbits, guided by the requirement that the resulting orbits still needed to have the discrete Bohr energies necessary to cause the hydrogen spectral lines. Sommerfeld realised that the eccentricity (the shape of the ellipse) had to also be quantised to achieve this. But initially, this extra step didn’t seem to yield anything useful except more complexity – it just gave the same ‘jumps’ as Bohr although there were now many more ways to achieve them. You now need two ‘quantum numbers’ to describe the orbital – Bohr’s original ‘n’ and Sommerfeld’s new ‘l’ – but since the energy of the orbital is determined by ‘n’, what’s the point? Who cares if there are a few different shapes of orbital if they all have the same energy, and it’s the energy we care about?

However, the nice thing about elliptical orbits is that they’re not symmetric – the electron moves more along the long axis of the ellipse than the short – and this creates the possibility of explaining the Stark and Zeeman effects as the interaction of this motion with the direction of electric and magnetic fields. This gives a hint that Sommerfeld might’ve been onto something, but in the early days it was definitely just a “guess with some hope”.

Bohr’s circular orbits imply that there is an ‘orbital plane’ and therefore a special distinguished axis. If you had 100 hydrogen atoms, you might expect them to be randomly aligned. But since a charge moving in a circle causes a magnetic field, you could also argue that they might tend to line up with each other. Or, if you applied a strong external magnetic field, you could cause the axes to all align in a single direction. Or if you fired the atoms through an inhomogeneous magnetic field, the amount they were deflected would tell you about the angle their axis made with the magnetic field direction.

However, Sommerfeld’s work added something surprising. Sommerfeld tried to generalize Bohr’s one-parameter circle orbits to two parameters (to allow for ellipses) and then three (to allow for ellipses oriented in 3d space) whilst retaining the spirit of Bohr’s quantization condition for angular momentum. What he found was, rather confusingly, that in 3d space the quantization condition only allowed for elliptical orbit planes in particular orientations. This seems very odd, since it presupposes that there is some ‘preferred’ direction in the universe against which these allowed orientations are measured. (Skipping ahead, we now understand this in terms of measurement along a chosen axis, with the particle state being in general a superposition of the possible basis states, but the idea of superpositions of quantum states was several years in the future). Weird as it may sound, it’s nonetheless a prediction that you can design an experiment to test. A charge orbiting in a plane acts like a little magnet. If you fire suitable atoms through an inhomogeneous field, they get deflected by an amount related to the alignment of the “little magnet” with the inhomogeneous field. If the electrons really could only live in discrete orbital planes, the atoms ought to get deflected in a few discrete directions. If the electrons could live in any orbital plane, you’d get a continuous spread of deflections.

If you think the idea that orbital planes can only exist in certain orientations relative to an arbitrary choice of axis sounds, well, wrong – then you’re not alone. Even Debye, who had also derived the same idea, said to one of the people proposing to actually measure it “You surely don’t believe that [space quantization] is something that really exists; it is only a computational recipe”. In other words, even to the people who came up with the idea it was little more than a utilitarian heuristic – a mathematical procedure that got the right answers by a wrong route. Even Stern, one of the experimenters, later said he performed the experiment in order to show that the whole idea was incorrect. And his supervisor, Born, told him there was “no sense” in doing the experiment. Furthermore, according to classical physics when you put ‘little magnets’ into an external magnetic field, they precess around the axis of the magnetic field rather than doing any kind of ‘aligning’.

At this point in history, a rather surprising thing happens. We now know that Bohr/Sommerfeld’s prediction of the magnetic moment and angular momentum was wrong – they predicted it was ħ whereas we now know it is zero. But Stern and Gerlach, who performed the inhomogeneous magnetic field experiment, didn’t know that. Had that been the full story, they would’ve found no deflection. But in fact, they found that their beam of atoms did split nicely into two. What they didn’t know about – no one knew at that time – was that electrons have an intrinsic magnetic moment of their own that can take on two values. This electron “spin” was the mechanism that produced their observed result. But, being unaware of spin, they wrongly concluded that they had demonstrated the reality of Sommerfeld’s “space quantization” – in fact, they had demonstrated a different kind of quantization.

(Interestingly, although most descriptions focus on angular momentum as the important concept, Stern’s own Nobel lecture doesn’t mention angular momentum at all. It only talks about the magnetic moment. There’s an implicit assumption that magnetic moments are what you get when you have charge and angular momentum, but since it’s the magnetic moment that determines the deflection in the Stern-Gerlach experiment I, like Stern, prefer to talk about magnetic moments and leave it for someone else to worry about how that magnetic moment comes about).

So where does that leave Sommerfeld’s ellipses? They’re still supported both by their ability to explain the Stark and Zeeman effects (partially) and also by the fact that Sommerfeld calculated a relativistic correction for his elliptical orbits which made the prediction of spectral line wavelengths match experimental data slightly more accurately (in Bohr’s circular orbits, the electrons travel at c/137 or gamma≈1.00003, and the speed will be higher in ellipses that do “close passes” to the nucleus, so you start to get close to the point where special relativity starts making an impact).
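To put a number on how small the relativistic effect is, this short sketch evaluates the Lorentz factor for a speed given as a fraction of c. The helper name `lorentz_gamma` is made up for this example, and the higher speeds are arbitrary values chosen only to show the trend for faster "close passes", not speeds calculated for any real Sommerfeld orbit.

```python
import math


def lorentz_gamma(beta):
    """Lorentz factor for a speed expressed as a fraction beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta**2)


# Speed in the first Bohr orbit, as quoted in the text
print(f"v = c/137: gamma = {lorentz_gamma(1 / 137):.6f}")

# Arbitrary higher speeds, purely to show how quickly gamma grows
for divisor in (50, 20, 10):
    print(f"v = c/{divisor}: gamma = {lorentz_gamma(1 / divisor):.6f}")
```

At c/137 the factor differs from 1 by only a few parts in 100,000, which is why the correction shows up as fine detail in the spectral lines and not as a gross shift.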

Spin now enters the picture, as a highly “unclassical” concept. The story starts with simple pattern spotting. In 1871, Mendeleev organised the known elements into a table based on their chemical properties. He didn’t know it at the time but he’d stumbled upon the sequence of atoms with increasing number of electrons, and the groups he perceived gained their commonality through having the same number of electrons in their outermost shells. But several steps were required to make this connection. Firstly, the Bohr model gave the idea of discrete orbits each with different energy. Then Sommerfeld’s elliptical orbits gave several different alternative shapes for a particular energy of orbit (“degeneracy”). A paper by Stoner in 1924 made a connection between the number of spectral lines of an element (once degenerate states had been split out using the Zeeman effect) and the number of electrons in the subsequent noble gas. (Stoner’s career prior to this point had been rather desperate). This observation led Pauli to realise that a simple rule of “only one electron is allowed in each quantum state” was possible, but only if an extra two-valued quantum number was used. Initially Pauli didn’t offer up any explanation of what this two-valued thing was. Goudsmit and Uhlenbeck subsequently proposed that it could be caused by the electron spinning around its own axis, something which was later shown to be wrong (electrons seem to have no size, at least every attempt to measure their size finds it smaller than we can measure, and so to create enough angular momentum the tiny tiny spinning top would have to rotate very quickly, such that its surface would be going faster than the speed of light). But although the picture was wrong, the idea that electrons have their own intrinsic two-valued angular momentum and magnetic moment is correct – as, in fact, the Stern-Gerlach experiment showed.

Like Sommerfeld’s ellipses, the two possible electron spin states don’t have much effect on the energy – it’s still dominated by the original Bohr ‘n’. But spin does make small changes to the energy. A particle with spin is like a small magnet, and a small magnet orbiting a positive nucleus has an electromagnetic interaction – Larmor interaction and Thomas precession. This causes small changes to the orbit energy, resulting in splitting of spectral lines – a process now named “spin-orbit interaction”. Sommerfeld’s ellipses cause a

But how was Pauli to incorporate his new “two valued” quantity into the Bohr-Sommerfeld model? It seems that he didn’t. Pauli published his exclusion principle in January 1925. Heisenberg wrote his matrix mechanics paper in July 1925, and Schrodinger published his wave mechanics in 1926. These approaches were much more general than the Bohr-Sommerfeld approach – a genuine ‘mechanics’ explaining how particles evolve over time due to forces. In 1927, Pauli formulated the “Pauli Equation” which is an extension of the Schrodinger equation for spin-1/2 particles that takes into account the interaction between spin and external electromagnetic fields.

Although initially the Heisenberg and Schrodinger approaches looked very different, Dirac was able to show that both are just different realisations of a kind of vector space, and that quantum mechanics was a big game of linear algebra which didn’t care if you thought those vectors were ‘really functions’ or not. Dirac was happy to go somewhat off-piste mathematically, using his “Dirac delta” functions which are zero everywhere except at a single point yet have an integral of one. His work was followed up by von Neumann whose book took a more formal rigorous mathematical approach, objecting to Dirac’s use of “mathematical fictions” and “improper functions with self-contradictory properties”. The approach is much the same, but the foundations are made solid.

In the Schrodinger picture, a particle is described by a complex-valued wave function in space. The Schrodinger equation shows how the wave evolves in time, as a function of the curvature of the wave and a term describing the spatial potential. In the case where a particle is constrained within a potential well, such as an electron experiencing the electrostatic attraction of a nucleus, the waves form ‘stationary’ patterns (the wave continues to change phase over time, but the amplitude is not time-dependent). In a hydrogen atom, the stationary states in three dimensions are combinations of radial, polar and azimuthal half-waves which result in amplitudes that vary spatially but not with time. The radial, polar and azimuthal contributions match up with the three quantum numbers from the Bohr-Sommerfeld model (n,l,m), reflecting the fact that the Schrodinger approach is much more general – the Bohr model “falls out” as being the special case of a single particle in a central electrostatic field.

As is often the case, although the Schrodinger equation is very general, only a few simple symmetric cases (such as the hydrogen atom) result in a nice compact mathematical expression. For more complex cases, one can do numeric simulation (ie. rather than viewing the Schrodinger equation as stating a criterion for a solution in terms of its time derivative and spatial curvature, you can view it as an algorithm for evolving a function forward in time). Alternatively, one can apply perturbation methods, originally invented when studying planetary motion. Perturbation methods are similar to approximating a function using the first few terms of a power series; you take a state you can solve exactly (hydrogen atom) and assume that a small change (small electric field) can be modelled roughly using a simplified term for the difference. For example, this can be used to show the Stark effect (approximately) – where the lines of hydrogen are split by an electric field.
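The power-series flavour of perturbation methods can be seen in a toy system that is much simpler than hydrogen. The sketch below is only an illustration: it assumes a made-up two-level system with unperturbed energies `e1` and `e2`, a small real symmetric perturbation scaled by `eps`, and function names invented for this example. It is not a calculation of the Stark effect, which needs extra care because hydrogen's levels are degenerate.

```python
import math


def exact_levels(e1, e2, v11, v12, v22, eps):
    """Exact eigenvalues of [[e1 + eps*v11, eps*v12], [eps*v12, e2 + eps*v22]]."""
    a = e1 + eps * v11
    d = e2 + eps * v22
    b = eps * v12
    mean = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), b)
    return mean - half_gap, mean + half_gap


def perturbed_lower_level(e1, e2, v11, v12, eps, order):
    """Perturbation series for the level that starts at e1 (requires e1 != e2)."""
    estimate = e1                                     # zeroth order: the solvable system
    if order >= 1:
        estimate += eps * v11                         # first-order energy shift
    if order >= 2:
        estimate += eps**2 * v12**2 / (e1 - e2)       # second-order shift
    return estimate


# Hypothetical numbers: two levels at -1 and +1, perturbation strength 0.1
e1, e2, v11, v12, v22, eps = -1.0, 1.0, 0.5, 1.0, -0.5, 0.1
exact_lower, _ = exact_levels(e1, e2, v11, v12, v22, eps)
for order in (0, 1, 2):
    approx = perturbed_lower_level(e1, e2, v11, v12, eps, order)
    print(order, round(approx, 6), "error", round(abs(approx - exact_lower), 6))
```

Each extra term brings the estimate closer to the exact answer, which is the sense in which a perturbation series behaves like the first few terms of a power series in the size of the disturbance.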

But the new ‘quantum mechanics’ was quite different from the Bohr model. The Bohr model painted a picture of electrons being “in” some orbital then (for reasons unknown) deciding to jump to some other orbital. But in the Schrodinger/Dirac picture there were two very different processes going on. As time passed, the system would evolve according to the wave equation. But if a measurement of position, energy or momentum was made, the wave function would “collapse” into a basis state (eigenvector) of the linear operator associated with that observable quantity. This collapse was evident because subsequent measurements would give the same answer, since the system had not had a chance to evolve away from the eigenstate. However, in general, the state would exist in some weighted linear combination (“superposition”) of any choice of basis states. If you made two different measurements (say position and momentum) whose linear operators did not have the same set of eigenvectors, then the result would depend on the order in which you performed the measurements.
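Position and momentum live in an infinite-dimensional space, so as a hedged illustration here is the same idea in the smallest possible setting: a two-state system measured along two different axes, labelled z and x. The basis vectors, the helper names and the restriction to real-valued amplitudes are all assumptions made to keep the sketch short.

```python
import math

S = 1 / math.sqrt(2)
Z_BASIS = {"+": (1.0, 0.0), "-": (0.0, 1.0)}  # eigenvectors of the z measurement
X_BASIS = {"+": (S, S), "-": (S, -S)}         # eigenvectors of the x measurement


def outcome_probabilities(state, basis):
    """Probability of each outcome: squared overlap with each eigenvector.

    Amplitudes are assumed real here, so no complex conjugate is needed.
    """
    return {k: (v[0] * state[0] + v[1] * state[1]) ** 2 for k, v in basis.items()}


def prob_z_plus_after(first_basis, state):
    """Chance that a z measurement gives '+', made after measuring in first_basis."""
    total = 0.0
    for outcome, p in outcome_probabilities(state, first_basis).items():
        collapsed = first_basis[outcome]  # the state collapses onto the eigenvector
        total += p * outcome_probabilities(collapsed, Z_BASIS)["+"]
    return total


start = Z_BASIS["+"]  # begin in a definite z state
print("z then z:", prob_z_plus_after(Z_BASIS, start))
print("x then z:", prob_z_plus_after(X_BASIS, start))
```

Repeating the z measurement gives the same answer every time, but slipping an x measurement in between leaves the z result at even odds, because the two measurements do not share eigenvectors.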

Schrodinger did not consider the effect of spin in his original equation (ie. the spin-orbit coupling, or the interaction of spin with an externally applied field). Thus, it required an extension by Pauli to reflect the fact that an electron’s state wasn’t just captured in the wave function. Including spin in the system state isn’t as simple as recording a “spin up” or “spin down” for a given electron. The particle can be in a linear combination of two spin basis states. And, much like how multi-particle systems are modelled with tensor products to yield joint probabilities, there can be dependencies between the spin state and the rest of the state.

A lot of the early development of quantum mechanics focused on the hydrogen atom. Fortunately, the hydrogen atoms that are around today are just as good as the ones from the 1910s, and furthermore we’ve got the benefit of hindsight and improved instruments to help us. So let’s take a look at what raw experimental data we can get from hydrogen and use that to trace the development of ideas in quantum mechanics.

Back in the 1740s, lots of people were messing around with static electricity. For example, the first capacitor (the Leyden jar) was invented in 1745, allowing people to store larger amounts of electrical energy. Anyone playing around with electricity – even just rubbing your shoes across a carpet – is familiar with the fact that electricity can jump across small distances of air. In 1749, Abbe Nollet was experimenting with an “electrical egg”, which was a glass globe with some of the air pumped out and two wires poking into it. Pumping the air out allowed longer sparks, apparently giving enough light to read by at night. (Aside: one of these eggs featured in a painting from around 1820 by Paul Lelong). Here’s a video of someone with a hydrogen-filled tube so we don’t all have to actually buy one.

By passing the light through a diffraction grating (first made in 1785, although natural diffraction gratings such as feathers were in use by then), the different wavelengths of light get separated out to different angles. When we do this with the reddish glow of the hydrogen tube, it separates out into three lines – a red line, a cyan line, and a violet line. Although many people were using diffraction gratings to look at light (often sunlight), it was Ångström who took the important step of quantifying the different colours of light in terms of their wavelength (Kirchhoff and Bunsen used a scale specific to their particular instrument). This accurately quantified data, published in 1868 in Ångström’s book, was crucial. Although Ångström’s instrument allowed him to make accurate measurements of lines, he was still just using his eyes and therefore could only measure lines in the visible part of the spectrum (380 to 740nm). The three lines visible in the YouTube video are at 656nm (red), 486nm (cyan), 434nm (blue) and there’s a 4th line at 410nm that doesn’t really show up in the video.

These four numbers are our first clues, little bits of evidence about what’s going on inside hydrogen. But the next breakthrough came apparently from mere pattern matching. In 1885 Balmer (an elderly school teacher) spotted that those numbers have a pattern to them. If you take the series n^2/(n^2-2^2) for n=3,4,5… and multiply it by 364.5nm then the 4 hydrogen lines pop out (eg. for n=3 we have 364.5 * 9/(9-4) = 656nm and for n=6 we have 364.5 * 36/32 = 410nm). Alluringly, that pattern suggests that there might be more than just four lines. For n=7 it predicts 396.9nm which is just into the ultraviolet range. As n gets bigger, the lines bunch up as they approach the “magic constant” 364.5nm.
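Balmer's pattern is easy to check for yourself. This is a small sketch of the formula as described above, using the 364.5nm constant; the function name `balmer_wavelength_nm` is just a label chosen for this example.

```python
BALMER_CONSTANT_NM = 364.5  # the constant that multiplies Balmer's series


def balmer_wavelength_nm(n):
    """Wavelength predicted by Balmer's formula for n = 3, 4, 5, ..."""
    if n < 3:
        raise ValueError("Balmer's series starts at n = 3")
    return BALMER_CONSTANT_NM * n**2 / (n**2 - 2**2)


# The four visible lines, plus the first prediction beyond them
for n in range(3, 8):
    print(n, round(balmer_wavelength_nm(n), 1))

# For large n the predictions crowd together just above the constant itself
print(1000, round(balmer_wavelength_nm(1000), 3))
```

The first four values land on the measured lines at roughly 656, 486, 434 and 410nm, and n=7 gives the 396.9nm prediction mentioned above.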

We now know those visible lines are caused when the sole electron in a hydrogen atom transitions to the second-lowest energy state. Why second lowest and not lowest? Jumping all the way to the lowest gives off photons with more energy, so they are higher frequency aka shorter wavelengths and are all in the ultraviolet range that we can’t see with our eyes.

Balmer produced his formula in 1885, and it was a while until Lyman went looking for more lines in the ultraviolet range in 1906 – finding lines starting at 121nm then bunching down to 91.175nm – and we now know these are jumps down to the lowest energy level. Similarly, Paschen found another group of lines in the infrared range in 1908, then Brackett in 1922, Pfund in 1924, Humphreys in 1953 – as better instruments allowed them to detect those non-visible lines.

Back in 1888, three years after Balmer’s discovery, Rydberg was trying to explain the spectral lines from various different elements and came up with a more general formula, of which Balmer’s was just a special case. Rydberg’s formula predicted the existence (and the wavelength) of all these above groups of spectral lines. However, neither Rydberg nor Balmer suggested any physical basis for their formulas – they were just noting a pattern.
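In its usual modern form, Rydberg's formula for hydrogen says that 1/wavelength = R * (1/n1^2 - 1/n2^2), where n1 labels the series (1 for Lyman, 2 for Balmer, 3 for Paschen and so on) and n2 is any larger whole number. The sketch below assumes a rounded value for the hydrogen Rydberg constant R and uses function names invented for this example. It produces vacuum wavelengths, so the results sit a fraction of a nanometre away from figures measured in air.

```python
RYDBERG_H = 1.0967758e7  # Rydberg constant for hydrogen, per metre (rounded, assumed)


def rydberg_wavelength_nm(n_lower, n_upper):
    """Vacuum wavelength of the line for a jump from n_upper down to n_lower."""
    if not 1 <= n_lower < n_upper:
        raise ValueError("need 1 <= n_lower < n_upper")
    inverse_wavelength = RYDBERG_H * (1 / n_lower**2 - 1 / n_upper**2)
    return 1e9 / inverse_wavelength


def series_limit_nm(n_lower):
    """Wavelength that a series bunches up towards as n_upper grows without bound."""
    return 1e9 * n_lower**2 / RYDBERG_H


for name, n_lower in [("Lyman", 1), ("Balmer", 2), ("Paschen", 3)]:
    first = rydberg_wavelength_nm(n_lower, n_lower + 1)
    limit = series_limit_nm(n_lower)
    print(f"{name}: first line {first:.1f} nm, series limit {limit:.1f} nm")
```

With n1=1 this gives a first line near 121.6nm and a limit near 91.2nm, matching the range Lyman found, while n1=3 lands well into the infrared.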

To recap: so far we have collected a dataset consisting of the wavelengths of various spectral lines that are present in the visible, ultraviolet and infrared portions of the spectrum.

In 1887, Michelson and Morley (using the same apparatus they used for their famous ether experiments) were able to establish that the red hydrogen line ‘must actually be a double line’. Nobody had spotted this before, because it needed the super-accurate interference approach used by Michelson and Morley as opposed to just looking at the results of a diffraction grating directly. So now we start to have an additional layer of detail – many of the lines we thought were “lines” turn out to be collections of very close together lines.

In order to learn about how something works, it’s a good idea to prod it and poke it to see if you get a reaction. This was what Zeeman did in 1896 – subjecting a light source (sodium in kitchen salt placed in a bunsen burner flame) to a strong magnetic field. He found that turning on the magnet makes the spectral lines two or three times wider. The next year, having improved his setup, he was able to observe splitting of the lines of cadmium. This indicates that whatever process is involved in generating the spectral lines is influenced by magnetic fields, in a way that separates some lines into two, some into three, and some don’t split at all.

Another kind of atomic prodding happened in 1913 when Stark did an experiment using strong electric fields rather than magnetic fields. This also caused shifting and splitting of spectral lines. We now know that the electric field alters the relative position of the nucleus and electrons, but bear in mind that Rutherford’s analysis of the gold foil experiment, which first suggested that atoms consist of a dense nucleus and orbiting electrons, was published in 1911 and so even the idea of a ‘nucleus’ was very fresh at that time.

Finally, it had been known since 1690 that light exhibited polarization. Faraday had shown that magnets can affect the polarization of light, and ultimately this had been explained by Maxwell in terms of the direction of the electric field. When Zeeman had split spectral lines using a magnetic field, he noticed the magnetic field affected polarization too.

So that concludes our collection of raw experimental data that was available to the founders of quantum mechanics. We have accurate measurements of the wavelength of spectral lines for various substances – hydrogen, sodium etc – and the knowledge that some lines are doublets or triplets and those can be shifted by both electric and magnetic fields. Some lines are more intense than others.

It’s interesting to note what isn’t on that list. The lines don’t move around with changes in temperature. They do change if the light source is moving away from you at constant velocity, but this was understood to be the Doppler effect due to the wave nature of light rather than any effect on the light-generating process itself. I don’t know if anyone tried continuously accelerating the light source, eg. in a circle, to see if that changed the lines, or to see if nearby massive objects had any impact.