Saturday, August 10, 2013

Space-filling experiment

I remember reading in a paper once that contour advection simulations are not space-filling.  I thought this was interesting, but my normal reaction is, first, to be skeptical and, second, to ask: why not try it out myself and see? There are actually practical applications for this result--I once did a derivation attempting to link Lyapunov exponents to the diffusivity, in which I assumed that the fractal dimension of an advected contour stays constant, i.e. that it is not space-filling.
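
To pin down what 'space-filling' means here, the fractal dimension I have in mind is the box-counting dimension (the paper may well have used a different estimator):

    D = \lim_{\epsilon \to 0} \frac{\log N(\epsilon)}{\log(1/\epsilon)}

where N(ε) is the number of boxes of side ε needed to cover the contour. A smooth curve gives D = 1; a contour that genuinely fills the plane approaches D = 2.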

And so was born the space-filling experiment.  My first attempt used a swap file to hold the advected contour as it grew to (hopefully) epic proportions. I realized much later that this was a dumb approach, especially in the world of multiple cores and distributed memory.  Instead of working on one massive contour--which required a specialized executable--why not just string together a whole bunch of smaller contours?  Each contour is advected as a separate process.  Once a contour gets too big, it is halved and each half is given its own process.  Later on, the contours can be spliced back together.  To facilitate this, we use an aggregate naming system: every time a contour is split, the output file names are updated by appending a '0' to the first file and a '1' to the second file.  Simple lexical ordering then tells you which contour goes where.
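
A minimal shell sketch of the scheme, assuming a hypothetical advect executable that updates a contour file in place and returns a special exit status (2, say) once the contour exceeds the size limit, plus a hypothetical split_contour helper that writes the two halves--neither name comes from the real code:

    #!/bin/bash
    # Recursively advect a contour, splitting it in two whenever it grows too large.
    # File names accumulate a binary suffix: contour -> contour0, contour1 -> contour00, ...
    advect_tree () {
        local f=$1
        ./advect "$f"                             # hypothetical: advect in place until done or too big
        if [ $? -eq 2 ]; then                     # assumed exit status 2: size limit exceeded
            ./split_contour "$f" "${f}0" "${f}1"  # hypothetical: halve into two new files
            rm "$f"                               # keep only the leaves of the tree
            advect_tree "${f}0" &                 # each half becomes its own process
            advect_tree "${f}1" &
            wait
        fi
    }
    advect_tree contour

    # Splicing: the glob expands in lexical order, which is also contour order.
    cat contour* > full_contour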

The main issue with this exercise is the recursive, exponential process allocation: if things go wrong and start to run away, it can eat your computer rather quickly.  It is possible, of course, to kill off the whole process tree; someday I might even write that script.  The other issue, unfortunately, was that it didn't work: even at 60 million nodes, the fractal dimension was still well under two and showed no signs of flattening out.
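
As for the process-tree script I keep promising myself, it only takes a few lines.  This is an untested sketch: pgrep -P lists the direct children of a given PID, so the tree can be walked recursively and frozen before anything is killed.

    #!/bin/bash
    # killtree.sh: kill a process and all of its descendants.
    # Usage: ./killtree.sh <root pid>
    killtree () {
        local pid=$1
        kill -STOP "$pid" 2>/dev/null        # freeze it so it can't spawn new children
        for child in $(pgrep -P "$pid"); do  # pgrep -P prints the direct children of a PID
            killtree "$child"
        done
        kill -KILL "$pid" 2>/dev/null        # SIGKILL works fine on stopped processes
    }
    killtree "$1"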

What I find most interesting about this exercise, however, is how it unlocked for me a big part of the philosophy of Unix programming.  I had already realized how much better small, specialized programs strung together work for most applications than monolithic, general programs, but hadn't realized how well the philosophy 'scales' to parallel programming. Here, instead of running one program which subsequently spreads itself over several processors, we split the program into multiple processes, each of which is run separately.  This works especially well for "trivially parallel" problems such as this one; Unix does, however, also provide a plethora of message-passing facilities--pipes, named pipes, sockets, signals--that allow processes to communicate with one another.
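
The contour experiment never needed it, but that message-passing side is just as reachable from the shell.  A named pipe, for instance, lets two otherwise independent processes stream data to one another (worker and collector are hypothetical stand-ins):

    mkfifo results               # a named pipe: one of Unix's inter-process channels
    ./worker input > results &   # hypothetical producer writes into the pipe
    ./collector < results &      # hypothetical consumer reads as the data arrives
    wait                         # block until both processes have exited
    rm results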

The main point is the division of labour: processor-intensive problems are solved by compiled executables, while the non-processor-intensive work of stringing them together is handled by shell scripts calling those executables.  Each represents a distinct "level", and there often seems to be a point at which the problem handled by a single executable--normally viewed as a discrete granule by the operating system--is about the right size. The real irony is that I spent quite a long time devising an extremely clever algorithm for parallel processing of contour advection. It involved splitting the contour over a fixed number of processors and passing points between them to keep the load balanced, so that each processor would finish in roughly the same amount of time.  Presumably it would have been written using MPI. Screw that.  Just divide the problem into a larger number of processes than there are processors and let the operating system balance the load for you.
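
At the shell level the lazy version really is just a few lines: launch every piece as its own background job, however many cores there happen to be, and let the scheduler sort it out (advect again standing in for whatever compiled executable does the heavy lifting):

    # One background process per contour file--usually far more processes than cores.
    for f in contour*; do
        ./advect "$f" &      # hypothetical number-cruncher
    done
    wait                     # the kernel's scheduler, not my code, balances the load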

One of the reasons the methods I'm discussing here are less known and used than in the past is the increase in computing power.  In the past, to perform a series of operations on matrices, it would have made sense to write one result to a file and then call another executable to operate on that result. Today, when operations on even large matrices are trivial and the storage requirements even more so, you can do everything in a single interactive program, storing the results in RAM as you go along. Nonetheless, it's interesting to examine the parallels between a programming language and the Unix operating system--Unix is, in effect, a programming language designed to operate at the level of the file system: files are akin to variables, executables to subroutines, and pipes to nested function calls.
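
The analogy is easy to make concrete: the pipeline below is the file-system-level spelling of the nested call c(b(a(input))), with the intermediate 'variables' never touching the disk (a, b and c are hypothetical filter programs):

    # result = c(b(a(input))), written at the level of the file system
    ./a < input | ./b | ./c > result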
