Wednesday, July 2, 2014

Work! What do I actually do?

Some time before 28 March, 2011, my dear great aunt R asked me to do a blog post on what my actual research is. What is it that I actually do on a day-to-day basis?

The reasons for the absurd delay in writing this are many, but I think it's time to get on with it, since I finally published something. It can be found at http://arxiv.org/pdf/1406.7029v1.pdf .

In this post, I'll talk generally about what my day-to-day existence is like when I'm not doing exciting stuff I write blog posts about, and then I'll give an executive summary in plain English of what the paper is about.

Here's a photo of me in my office.

I spend a LOT of time here. On the window providing some shade is a gigantic chart of all the isotopic nuclides. It's like a periodic table, but with isotopes too. The story behind the mask is too detailed to relate here. 

I'm wearing a shirt I had printed with the Maxwell equation written in exterior differential form - it's a very compact way to write the equations which completely describe electromagnetism. To my right on the desk can be seen some toy magnets, a titanium spork, a spec sheet for an orbital telescope I worked on, some 3D printing stuff, a rocket design manual, and a giant spanner. To my left on the desk is a pile of papers on true polar wander (the process by which the earth tips over), multiple Rubiks cubes, a jar of vegemite, a toothbrush, a pile of books on Chinese, Russian, Indian history, and geophysics. Hanging from my laptop is one of my small drones charging on a USB cable. 

Concealed behind me is a gigantic pile of papers dealing with my actual research, which is on gravitational wave propagation algorithms. Plural, because all but one don't work. Proof by exhaustion states that there is a large space of possible algorithms, and the correct one is in there somewhere, and all you have to do is optimize on every possible dimension until you find it. Unfortunately the number of dimensions, and thus potential causes of errors, is practically infinite.

Don't think for a second, though, that simply getting a nerdy shirt and a cluttered desk will result in (possibly correct) algorithm generation. Many, many long hours I've frittered away my youth experimenting with various schemes that not even I think might work. When I started this post, more than three years ago, I'd just started out, and the problem had not even formed. A year later, I hard started on the problem plus a bunch of preliminary work. Two years after starting, I discovered that the general consensus was that the problem was impossible to solve, at about the same time that I had a scheme that seemed like it was thinking about working - enough to present a poster at APS. Since then, every last part of the algorithm has been chopped and changed, like a car that, over its lifetime, becomes an entirely different car through part substitution. Part of the fun of using an object oriented code is that this is even possible with some degree of confidence.

The code is written in C++, one of the most powerful and most widely used high-performance programming languages, and one of the hardest to master. After about 3 years of using it, I can say with confidence I still have not plumbed the depths of my ignorance on the topic. Common knowledge suggests a decade of incidental use will result in reasonably good understanding. I must emphasize, however, that even the most bizarre, difficult to use programming language is still orders of magnitude more simple than the simplest natural languages, even made-up ones like Esperanto.

All that said, what does the paper actually talk about? It's not the sort of scientific paper that announces a new result, or some great discovery. It's more representative of the bulk of scientific papers, representing incremental progress. It's also a methods paper, in that it establishes a new method for solving a known problem. It's also a general paper, in that future papers (probably from me, maybe from other people) will refer back to it as establishing the basic method, and laying out the formalism in modern notation.

The paper contains a lot of equations. None of the equations are particularly new, in the sense that I did not have to discover 'the equation of gravity' in order to do this work. That equation (The Einstein Field Equation) has been known for nearly 100 years. Any new equation would not be correct, because that is the correct equation. But it's actually a horrendously complicated equation, and we have to slice it up into (about 50 different) manageable parts to solve on the computer. People have been trying since the 1960s to solve this equation on the computer, but only succeeded starting in 2005. Many things can make computers go wrong when solving equations, and this equation can and does produce all those problems, plus some new ones. For example, when simulating black holes, the simplest choice of coordinate system is called inertial, which means the coordinate system you'd get if each grid point was in freefall. It's simple because there are no accelerations, and it's the coordinate system you get in space when there's no apparent gravity. But a blackhole quickly swallows that coordinate system, which is very sad.

Fortunately, I work with an existing code base of more than 100,000 lines, written by many collaborators in my group over the last decade or so. This provides a LOT of functionality which just doesn't exist in many other codes or programming languages. This allows me to focus on the algorithms themselves, instead of getting bogged down with a lot of low level functions.

The paper starts with a general discussion of the problem that my work directly addresses. The existing code is tremendously powerful and successful, and simulates black holes that inspiral, merge, and ring down, emitting gravitational waves in the process. We hope to detect those waves on earth, which will provide us with new insight into these peculiar and high-energy regimes. The simulation volume is typically similar in size to earth's moon. Even at its outer edge, the grid points are very close to the action, and the coordinate system is shaken by the gravitational waves. This makes unique determination of the gravitational waves difficult, because it's difficult to see what's going on when everything is shaking around. 

Unfortunately, even a little bit of error is unacceptable if we want to have a hope of getting useful physics out of gravitational wave astronomy. So there's an equation which allows extraction of gravitational radiation from the simulations, by simulating the space between the simulation and infinity. Unfortunately, the equation is so complicated that there is only one other code that does it. Written in the 90s, it is now too slow. Moreover, it does not exploit the advances that have been made with the code we use in our group. 

Often in science progress is made when a fresh face (i.e. me) approaches a problem they don't know everyone else thinks is impossible. Or at least extremely technically annoying. And the first 10 or so attempts at solving the problem didn't work. At first, I tried to understand the process on paper, and later by prototyping using a simpler computer code and a simplified problem. Eventually I built up my understanding of the tools and system behaviour, which allowed better guesses at avenues for exploration. At the same time I was learning how the code worked and how to write new parts of it. 

There were two central technical tricks I developed and exploited for this research. To the best of my knowledge, they've never been employed together in this fashion, and for the well being of grad students everywhere, I hope they never are again!

The first trick is one that just shouldn't work on computers, but somehow it does. The problem arises because the equations explicitly diverge on both sides of the equals sign. Eg, we want to solve for Q, where F(Q) = RHS(other stuff). Q is finite in the domain, but F(Q) goes to infinity as radius goes to infinity. Using some moderately complicated maths it's possible to invert F and write Q = F^(-1)(RHS(other stuff)), but getting it to work on a computer, where numbers can't be infinite and so on, took some finagling. I explain this in the paper in great detail, but suffice to say, without the code's ability to perform extremely accurate and quick radial integrals, this method would not work at all. All hail the Fast Fourier Transform!

The second trick is both better established in the literature and less well known among my advisers, and it's called the Magnus expansion. As it happened I worked out the basics for how the method would work, tried to devise a name for it, googled the name, and found someone else (Magnus, as it turns out) had already discovered this rather obvious technique and, what's more, had written a paper doing all the hard sums for me. In this case, the Magnus expansion was applied to a matrix which represented some complex numbers. The equation had an annoying part which could be thought of as a phase. The Magnus expansion allowed this phase to be set to zero, and ignored until it could be dealt with in a consistent fashion. Again, incredibly seat-of-the-pants numerical methods, and I'm just lucky that when all was said and done I'd only wasted about 14 bits of precision, leaving me with perhaps 6 orders of magnitude to play with. For people who know the difference between single and double precision floating point numbers, the difference makes ALL the difference.

Okay, I had the basics of this method finalized a year ago, and permitted myself a shy smile. There were a bunch of other technicalities covered in the appendices related to translating information from the simulations into the form where my algorithm could deal with it. I crushed that problem on only the third try. At this point I could have published yet another theory paper, but if you want the method to be used, or for anyone to take notice, you need to actually prove that it works. Especially when the underlying methodology is so left-field and the apparent efficiency increase (hundreds or thousands of times faster) so unlikely seeming.

To prove that it works you need to show that its stable by feeding it garbage and watching it resolutely refusing to explode. You also need to show that it converges. That is, as you increase the resolution, it gets closer and closer to a particular value in a predictable and most importantly, extremely rapid way. The prescription for these sorts of graphs is well known, and its in this phase that cryptic bugs are eventually discovered. Sometimes they even have a comment nearby saying "\\this will cause problems I bet...".

When the code behaves itself, you finally need to compare it to results produced with another similar code. Afterall, it could converge to garbage, which would be useful to noone. In this case, it was compared to the code it was designed to replace. First I had to convince this no-nonsense older code to do what I wanted it to do, and then get the parameters right to prevent its own menagerie of stability and convergence issues. Principle problems include it never being designed to do what we had in mind. Similarly, my code was never designed to operate with parameters that could compare to the older code. After a few decades of wasted CPU time and more than a few false starts, we generated comparable data sets. By this time, enough of the problems had been worked out that they were close enough to publish, as you can see in Figure 8 of the paper. This figure shows that over the absurdly long time scale we ran this code, both codes were pretty accurate to begin with. Moreover, when the resolution was increased, the error dropped by a certain, predictable amount. Lastly, the difference between the two codes is also really tiny. Of course, there are always a few problems. For instance, the older code begins to lose convergence toward the end. Also, the codes resolve the junk radiation to differing extents.

Junk radiation is nonsense signal at the beginning caused by the fact that the simulations have to start somewhere. This insult to well behaved computers causes unphysical fake waves to bounce around inside the simulation and sometimes even stop it working properly. We think my code might be able to stop or reduce junk radiation in the future. Stay tuned!

It bears mentioning that this work was a team effort! I don't name names on this blog, but the most overt contributors are listed on the paper. For a more complete list, you'll have to wait for my thesis! But you know who you are, and without you, there is a good chance we would have never known this problem was, indeed, possible to solve.