Wednesday, August 7, 2013

Miscellaneous ramblings: ctraj and higher-order functions in C

Paul Graham is a well-known personality in the tech world and he has a number of outspoken opinions about computer programming. While I enjoy reading his essays, I don't often agree with him. One of his constant claims is that some programming languages are more powerful than others because they are more expressive. For instance, in Lisp you can create a higher-order function--a function that returns another function--whereas in C this is quite a lot more difficult. You can also perform similar kinds of meta-programming--programs that write programs--very easily in Lisp, whereas in C this is a bit more tricky.

I would argue that this is, in fact, a deliberate design decision in C, and a very wise one at that. The chief reason is efficiency--the style of programming possible in Lisp is often very difficult to optimize. Suppose we have a function that returns a function. What if it returns another function that itself returns a function? It's impossible for a compiler to predict how far this recursion will unwind, so such higher-order functions would almost by necessity be left uncompiled until run time.

In fact, a C program can extend itself at run time--with higher-order functions and whatnot--and it's actually quite easy to implement. The latest version of ctraj demonstrates how it can be done, more or less. All the main algorithms in ctraj--trajectory integration, tracer simulation and contour advection--have been refactored to make them more general. All of them call a base class that represents a velocity field and that, at least in theory, could be implemented in just about any way.

I was most interested in integrating velocity fields with simple, analytic representations, such as the blinking vortex. An obvious, but naive, way of doing this would be to write a new class for each velocity field. Instead, I wrote a class that is just a wrapper for a function that, from the program's point of view, doesn't even exist yet. When the program is called, the user must supply the name of the object file containing the function, and its entry point. The function is then dynamically linked in, allowing the main routine to integrate any arbitrary function, without recompiling the main "engine."
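The wrapper idea can be sketched with the POSIX dynamic-linking calls, dlopen and dlsym. This is not the actual ctraj code--the function name load_field and the field_t type are my own inventions for the example--just a minimal illustration of the mechanism, assuming a Unix-like system.

```c
#include <stdio.h>
#include <dlfcn.h>   /* POSIX dynamic linking: dlopen, dlsym, dlerror */

/* A "velocity field" is just a function pointer of the right shape. */
typedef double (*field_t)(double);

/* Load a field function from a shared object at run time.  In a real
   program, path and symbol would come from the command line. */
field_t load_field(const char *path, const char *symbol) {
  void *handle = dlopen(path, RTLD_LAZY);
  if (handle == NULL) {
    fprintf(stderr, "dlopen: %s\n", dlerror());
    return NULL;
  }
  field_t f = (field_t) dlsym(handle, symbol);
  if (f == NULL)
    fprintf(stderr, "dlsym: %s\n", dlerror());
  return f;
}
```

For instance, on a glibc Linux system, load_field("libm.so.6", "cos") hands back the math library's cosine as a "field"--the main engine never needs to know which function it is integrating.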

In this example, the user must both write and compile the function to be linked in--the software is, after all, designed for scientific researchers--but it's easy to see how both could be done automatically by the program itself. In other words, once the programming language has access to the operating system and a dynamic linking library, writing higher-order functions is always possible.
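To make the "done automatically" step concrete, here is a hypothetical sketch: the program writes a tiny C function to disk, shells out to the compiler, and links the result back in. It assumes a Unix system with a C compiler called cc on the path and a writable working directory; the file and symbol names (gen_field.c, vfield, make_field) are all made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

/* Generate, compile and link a field function at run time.  The body
   argument is a C expression in x, e.g. "2.0*x + 1.0". */
double (*make_field(const char *body))(double) {
  FILE *f = fopen("gen_field.c", "w");
  if (f == NULL) return NULL;
  fprintf(f, "double vfield(double x) { return %s; }\n", body);
  fclose(f);
  /* a program that writes and compiles a program: */
  if (system("cc -shared -fPIC -o gen_field.so gen_field.c") != 0)
    return NULL;
  void *h = dlopen("./gen_field.so", RTLD_LAZY);
  if (h == NULL) return NULL;
  return (double (*)(double)) dlsym(h, "vfield");
}
```

From the caller's point of view, make_field is a higher-order function: it takes a description of a function and returns a function--in C.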

To download the latest version of ctraj, click here.

Another puzzling claim that Graham makes, right at the beginning of "On Lisp," is that Lisp somehow makes bottom-up design--writing a language on top of the language that better solves your problem--easier. Every good programmer is very familiar with bottom-up design and uses it extensively, and any 3GL allows it. What are subroutines other than user-defined commands--commands that extend the language and allow you to implement a new language on top of the old one? Within limits, obviously: the syntax is fixed, after all. Relatively few languages actually allow you to extend the syntax, and Lisp is no different: unlike most languages, it has in fact only one syntactical construct!--which is perhaps both its greatest advantage and its greatest limitation. Perhaps it is easier to write a compiler or interpreter in Lisp--it probably is; I've never tried it, so I don't know--but until those nuts and bolts are sorted out, you're stuck with the usual prefix operator form--commonly called a function--just like in every other language. I am actually working on a language that allows the syntax to be arbitrarily extended. It's not much of a priority, so I don't know when I'll finish it. The main issue is what the base language should look like.

It's not just 3GLs that allow abstracting larger pieces of code into a single command, or at least into fewer of them. Any software tool that contains something like functions, procedures or macros will allow this--even a gosub command will do. Practically the whole business of computer programming is the act of cleverly abstracting a problem into a set of more convenient sub-problems. The very nature of programming languages means that any language can be re-written in any other language. One of the reasons that I haven't bothered to learn Lisp yet is that it would take a lot less time to write a Lisp-like interpreter than to learn all of the commands and syntactic sugar of the actual language.

In the past I thought that using C and C++ instead of Fortran 77 (still popular to this day) was a great boon for scientific programming, since dynamic memory allocation and pointers made it easy to create powerful and flexible data structures such as linked lists. I thought switching to Lisp or even Prolog might offer similar advantages. It's easy, however, to exaggerate such advantages. In practice, I find that I rarely use specialized data structures except within algorithms that are well encapsulated to begin with (in a sort routine, for instance).

Going back once again to ctraj, a good example is the contour advection algorithm. My first attempt at contour advection, written in IDL, used a linked list. The version in ctraj, however, uses cubic spline interpolation to calculate the curvature, so the coordinate list either has to be converted to an array or must already be in the form of one. It therefore uses a pair of arrays that are constantly re-allocated.
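The re-allocated pair of arrays might look something like this--a sketch with names of my own choosing, not the actual ctraj code. The x and y arrays grow together as the contour acquires points, doubling capacity so that each push is cheap on average.

```c
#include <stdlib.h>

/* A contour as a pair of parallel coordinate arrays. */
typedef struct {
  double *x, *y;
  size_t n;     /* points currently stored */
  size_t cap;   /* allocated capacity */
} contour;

/* Append a point, growing both arrays when full.
   Returns 0 on success, -1 on allocation failure. */
int contour_push(contour *c, double xi, double yi) {
  if (c->n == c->cap) {
    size_t ncap = c->cap ? 2 * c->cap : 16;
    double *nx = realloc(c->x, ncap * sizeof *nx);
    if (nx == NULL) return -1;
    c->x = nx;
    double *ny = realloc(c->y, ncap * sizeof *ny);
    if (ny == NULL) return -1;
    c->y = ny;
    c->cap = ncap;
  }
  c->x[c->n] = xi;
  c->y[c->n] = yi;
  c->n++;
  return 0;
}
```

Starting from a zero-initialized contour, repeated calls to contour_push keep the two arrays in lock-step--which is exactly what the spline fitting needs, since it wants contiguous arrays rather than a linked list.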

In Fortran, I would solve the problem by just using fixed-length arrays. Since the program bums out at about a million points, there's our length. Of course, arbitrary limits are another pet peeve of real computer programmers. It's not as bad as all that: in real life there is always a limit, whether it be the maximum size of executable allowed in RAM, or the RAM itself. The fact that the contour advection simulation bums out at a million points or so is not so bad either--it's quite easy to split the simulation into two or more separate processes which can then be farmed out to different processors. More on this in the next article.

As another example, it's quite easy to write a linked list in Fortran 77. It takes a bit more thought and requires passing a "work-space" variable to the subroutines (common practice in Fortran 77 programming) but once it's done it can be re-used as needed, just like any suite of subroutines. Maybe if I get around to it I'll post my version of a linked list written in Fortran 77. I guarantee that from the interface level it won't look much different from the equivalent code written in C.
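The work-space trick translates directly into C, so here is a sketch of what I mean (written in C rather than Fortran 77, and with illustrative names of my own): no malloc, no pointers--the "work-space" is a fixed set of arrays, and integer indices stand in for links, with -1 playing the role of a null pointer.

```c
#define WS 1000   /* fixed capacity, the Fortran 77 way */

/* The work-space: everything the list needs lives in one struct,
   which in Fortran 77 would be passed to every subroutine. */
typedef struct {
  double val[WS];   /* payload */
  int    next[WS];  /* index of the following node, -1 = end of list */
  int    head;      /* first node, -1 = empty list */
  int    nused;     /* next free slot */
} list_ws;

void list_init(list_ws *w) {
  w->head = -1;
  w->nused = 0;
}

/* Push at the front; returns 0 on success, -1 if the work-space is full. */
int list_push(list_ws *w, double v) {
  if (w->nused == WS) return -1;
  int i = w->nused++;
  w->val[i] = v;
  w->next[i] = w->head;
  w->head = i;
  return 0;
}

/* Walk the list, summing the payloads--the traversal reads just like
   the pointer-based version, with w->next[i] in place of p->next. */
double list_sum(const list_ws *w) {
  double s = 0.0;
  for (int i = w->head; i != -1; i = w->next[i])
    s += w->val[i];
  return s;
}
```

Swap the struct for a handful of array arguments and this is essentially the Fortran 77 version; at the interface level the two really don't look much different.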
