Friday, July 20, 2012

My design philosophy

I've generally found that the software I write seems to work better in many ways than other free software I've found on the 'net.  Part of this is just familiarity: if something goes wrong, I know exactly where to go to fix it and if something's missing, I know exactly how to extend it.  But it's also that I know how to code.  I remember once writing a subroutine for another scientist to help him process his data.  When we compared it to what he had done, we found that my code was better in just about every way: faster, smaller and more general.

Here are a bunch of thoughts on software design and development.  At this point it could hardly be called a unified or integrated philosophy, just a roughed-out, loosely connected set of ideas.  For instance, #6 says, "Test your algorithms with the problem at hand."  Well, this is obvious, and to properly test a program you usually need a lot more test cases.

In the process of laying this down, I realized my ideas on program development most closely matched those of the Unix/Linux communities.  For more information, I would recommend that interested readers read up on the Unix Philosophy, which is now quite mature and has many adherents.
Not that I'm advocating this, but I picked up both The Art of Unix Programming and The Unix Programming Environment as free pdf e-books.  Wikipedia states:
This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. (Doug McIlroy)
I tend to first write a suite of libraries.  I then encapsulate those libraries in simple, stand-alone executables, which I string together using a makefile that defines a set of data dependencies.

Without further ado, here is what I came up with:

1. Primitive types exist for a reason.  Do not use an object class or defined type if a primitive type will do.  Using primitive types makes the code easier to understand, produces less overhead when calling subroutines and makes it easier to call from different languages (e.g. calling C from Fortran).

2. By the same token, if an algorithm can work with primitive types, write it for primitive types.  When using the algorithm with defined types, translate from those to the primitive types rather than adding unnecessary indirection.

 E.g. when working with dates from climate data, I almost always perform Runge-Kutta integrations using a floating point value for the times even though the templated routines will work with a more complex date type.

3. Do not use an object or class hierarchy when a function or subroutine will do.  Again, the tendency is to reduce overhead, e.g. associated with the class definition itself and with initializing and setting up each object instantiation.

4. Build highly general tools and use those tools as building blocks for more specific algorithms.

5. Form reusable libraries from the general tools.

 E.g. (4. & 5.), my trajectory libraries have very little in them; they are mostly pieced together from bits and pieces taken from several other libraries.  In the semi-Lagrangian tracer scheme, the wind fields are interpolated using generalized data structures from one library and integrated using a 4th-order Runge-Kutta subroutine from still another.  Intermediate results are output using a general sparse matrix class, and the final fields are integrated using sparse matrix multiplication, an entirely separate piece of software.

6. Test your algorithms with the problem at hand.

7. Use makefiles for your test cases.

8. In the beginning, write a single main routine that solves the simplest case.  Later, once it is working, you can form it into a subroutine or object class and flesh it out with extra parameters and other features.

9. If a parameter usually takes on the same or similar values, make it default to that value/one of those values.  This is easy if the syntactical structure is a main routine called from the command line or an object class, but can be difficult if it is a subroutine and the language does not support keyword parameters.

10. Do not use more indirection than absolutely necessary.  The call stack should rarely be more than three or four levels high (excluding system calls or recursive algorithms).

11. Enforced data hiding is rarely necessary.  If you need to access the fields in your object class directly from unrelated classes or subroutines, maybe the data shouldn't be inside of a class in the first place.

12. Generally, the closer the structural match of the syntactical structures (data, subroutines and classes) to the problem at hand, the more transparent the program and the better it works.

E.g., simulation software for the Gaspard-Rice scattering system has two classes: one for the individual discs and one for the scattering system as a whole.

13. Given the choice between a simple solution and a complex solution, always choose the simpler one.  The gains in speed or functionality from the complex solution are rarely worth the added complexity.

E.g., to overcome the small time step required by the presence of gravity waves in an ocean GCM, one can use an inverse method to solve for surface pressure for a "flat-top" ocean, or one can have a separate simulation for the surface-height with a smaller time step, or one can have layers with variable thickness in which the gravity waves move independently (and more slowly because the layers are thin).  I would choose the last of the three options because it is the simplest and most symmetric, that is, elegant.

14. It is not always possible or desirable to strive for the most general solution.  Sometimes you have to pick a design from amongst many possible different and even divergent designs and stick with it.  Once you have more information, the code can be refactored later.

E.g. in my libagf software, there are only two choices of kernel function (the kernel function is not that critical) and with the Gaussian kernel, there is only one way to solve for the bandwidth.  These choices were not easy to make but the alternative (make it more general) was just too fiddly and had too many divergent paths for the user to easily choose amongst.

15. Unless the old code is written by an expert (and I will let the reader decide what I mean by that) and well documented, it is usually quicker and easier to rewrite it from scratch rather than work with someone else's code.  This is especially true in relation to scientific software which is not usually written by professional programmers and tends to be poorly documented.

16. If you do decide to use someone else's code, it is usually better to encapsulate it using function and system calls rather than delve deep into the bowels.

17. Do not be afraid to re-invent the wheel.  Re. 4. and 16., there is now a plethora of standard libraries available for just about every language, but something you write yourself will often fulfill your needs better, provide fewer surprises and give you a greater understanding of your own program as there will be no "black boxes".  Obviously, it will also improve your understanding of computer programming in general.  If you are a good programmer, your implementation may also be better in every way.

18. Avoid side effects: write subroutines and executables that take a set of inputs and return a set of outputs and that's it.

19. Try to separate the major components of the program into modules: IO should be separated from the data processing or "engine."  By the same token, the GUI should also be separated.

20. Despite recent improvements in memory management, core memory that creeps into the swap space is a sure way to destroy performance.  Customized paging algorithms that operate directly on the input/output files can significantly alleviate this problem.
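
As a rough illustration of guidelines 2 and 18, here is a minimal sketch in Python (not taken from my own libraries, which are not shown here).  The integrator works purely on floating point values and has no side effects; dates are translated to and from floats at the boundary.  The names rk4, date_to_days, days_to_date and the reference date EPOCH are all hypothetical, chosen only for this example.

```python
from datetime import datetime, timedelta

# Hypothetical reference date; any fixed epoch would do.
EPOCH = datetime(2012, 1, 1)


def date_to_days(d):
    """Translate a date into a plain float: days since EPOCH."""
    return (d - EPOCH).total_seconds() / 86400.0


def days_to_date(t):
    """Translate days since EPOCH back into a date."""
    return EPOCH + timedelta(days=t)


def rk4(f, t0, y0, dt, nsteps):
    """Fixed-step 4th-order Runge-Kutta for dy/dt = f(t, y).

    Times and values are primitive floats.  Returns a list of
    (t, y) pairs and modifies nothing outside itself.
    """
    t, y = t0, y0
    out = [(t, y)]
    for i in range(nsteps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt * k1 / 2)
        k3 = f(t + dt / 2, y + dt * k2 / 2)
        k4 = f(t + dt, y + dt * k3)
        y = y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = t0 + (i + 1) * dt  # avoids accumulating round-off in t
        out.append((t, y))
    return out


if __name__ == "__main__":
    # Integrate a simple decay equation over one day, starting on 20 July 2012.
    t_start = date_to_days(datetime(2012, 7, 20))
    for t, y in rk4(lambda t, y: -y, t_start, 1.0, 0.25, 4):
        print(days_to_date(t), y)
```

The date type only appears in the two small conversion routines, so the integrator itself could just as easily be called from another language or reused for a problem that has nothing to do with dates.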

Download these guidelines as an ASCII text file.

Wednesday, July 4, 2012

Support free software, support free science

For those of you who have been following the Peteysoft sites and software, and I know there are quite a few of you, you know that I have been dutifully posting free software and free scientific content since at least 2007. Recently I've taken the step of monetizing one of my software websites by including advertising, in an attempt to recoup some of the costs of my time and effort.  If you want your free software to remain truly free, please do us all a favour by clicking on the donation button. Even if all you can spare is $5, that may make all the difference in helping to keep the Peteysoft projects free for all and free of advertising.

Monday, July 2, 2012

Thoughts on music

When I was in Vancouver, living in its notorious East Side, I spent a lot of time hanging out with an aspiring musician.  He said he was going to teach me to play bass and wanted me to manage his band.  He even restrung his guitar left-handed for me.  At the time I didn't take this too seriously as I'd hardly picked up an instrument and didn't think I had any talent to speak of.  A couple of years later my sister and her husband gave me a guitar for my birthday and I've been playing ever since.

One of the things I remember about my time with Adam was a silly argument we got into.  I was willing to go along with his plans as long as he was willing to go along with mine--being at loose ends with no prior commitments, I wanted to go travelling.

"We can take it to the road," I would tell him.  "Bring guitars and advertise our open air concerts on the internet."

"Will we bring amps?" he would ask.  "We've gotta have amps..."

I thought this was silly.  Like Tony Hawks's fridge, a pair of 25-pound amplifiers would, I figured, be difficult to hitch with.

"C'mon," I would reply, "Musicians have been playing instruments for thousands of years and they didn't need amps..."

Later on I realized there was more to this argument than meets the eye.  What is it that makes good music?  A fundamental idea in classical music theory is that of consonant versus dissonant intervals.  That is, to produce a pleasant-sounding chord, the ratio between the fundamental frequencies of any two notes must be a simple rational fraction, say 3:2.

To understand what I mean by this, we must go back to our basic physics: the mechanics of standing waves.  If we have a vibrating string (such as on a guitar) that is fixed at both ends, the fundamental mode will be a wave with a wavelength twice the length of the string.  Of course the string won't just vibrate at this frequency; there will also be standing waves with wavelengths the length of the string, 2/3 the length of the string, 1/2 the length of the string and so on.  Thus all the frequencies (or harmonics) can be predicted to first order by a simple arithmetic sequence: whole-number multiples of the fundamental frequency.

If two strings are vibrating at a rational-fraction interval, say 3:2, then every second harmonic of the first (higher) string will have the same frequency as, and so constructively interfere with, every third harmonic of the second string.  To demonstrate this effect, try taking a guitar and fretting the low 'E' string (topmost, thickest string) on the fifth fret.  It is now at the simplest rational-fraction interval with the 'A' string (the one below it), 1:1.  If you pluck one of the two strings, the other will start to vibrate in sympathy, assuming your guitar is well tuned.
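
To make the coinciding harmonics concrete, here is a small sketch in Python.  It assumes ideal strings whose harmonics are exact whole-number multiples of the fundamental, and the function name shared_harmonics and the cutoff n_max are just choices for this example.

```python
from fractions import Fraction


def shared_harmonics(ratio, n_max=12):
    """List the coinciding harmonics of two ideal strings.

    ratio is the frequency of the higher string divided by that of
    the lower string, given as an exact Fraction.  Returns pairs
    (m, n) such that the m-th harmonic of the higher string has the
    same frequency as the n-th harmonic of the lower string, with
    both m and n limited to n_max.
    """
    return [
        (m, n)
        for m in range(1, n_max + 1)
        for n in range(1, n_max + 1)
        if m * ratio == n
    ]


if __name__ == "__main__":
    print(shared_harmonics(Fraction(3, 2)))  # a fifth
    print(shared_harmonics(Fraction(1, 1)))  # unison
```

For a ratio of 3:2 this returns (2, 3), (4, 6), (6, 9) and (8, 12): every second harmonic of the higher string lines up with every third harmonic of the lower one.  For 1:1, every harmonic coincides, while a more complicated fraction such as 45:32 shares none of its first twelve harmonics.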

The question this leads me to is: is good music simply louder than bad music?  This makes a certain brutal and obvious sense: louder music will shout down quieter music.  Hence Adam's desire for amplification.

Ever since the sixties, rock 'n' rollers have been on a quest for ever more volume.  This has led to some interesting developments.  First, when you try to amplify a standard guitar, you frequently get feedback, that squealing noise often heard from microphone PA systems, as the sound from the amplifiers gets picked up again by the guitar and re-amplified.  This led to the development of solid-body electric guitars, which don't suffer from this problem as much.  Also, when you try to drive an amplifier too hard, it goes outside of its linear range, resulting in a distortion of the signal as the wave-forms get clipped.  Rock 'n' rollers decided that they liked this sound, resulting in the development of devices, such as effects pedals, to produce the effect artificially at much lower volumes.

With the development of equally-tempered tunings, much of the preceding discussion about consonance and dissonance is fairly moot.  In the past, it was common to use a just tuning, that is, every note in a scale is a rational-fraction interval from every other note, with consonant intervals being simple fractions while dissonant intervals are more complex fractions.  Older music is based on a 7-note scale which defines the key of the piece--music is still written in this way.  When we switch keys, a consonance in one key may become a dissonance in another.  This led to the development of equal temperament.  That is, we take the number of notes in the scale and divide the octave into that number of equal intervals.  Modern Western music uses a chromatic, or twelve-note scale, meaning that the frequency of each note is the twelfth root of two times the frequency of the one below it.  If we now go back to our basic maths, the twelfth root of two is not a rational number.  The older, diatonic, or seven-note scale, is now picked out from the chromatic scale.  All keys sound the same, just sharpened or flattened by a certain interval.

A fretted, stringed instrument such as a guitar is almost by necessity tuned in equal temperament.  Most pianos are tuned somewhere between a just and equal temperament.  The implication is that, except for perfect octave intervals and their multiples, no two notes are ever perfectly consonant as they only ever approximate a rational-fraction interval.
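
To get a feel for how close the approximation is, here is a rough sketch in Python that compares twelve-note equal temperament with some simple fractions.  Apart from 3:2, the just ratios in the table are the commonly quoted values rather than anything defined above, and the names JUST, SEMITONES, cents, equal_tempered_ratio and deviation_cents are made up for this example.  Differences are given in cents, where 1200 cents make an octave.

```python
import math

# Commonly quoted just-intonation ratios (an assumption of this sketch).
JUST = {
    "minor third": (6, 5),
    "major third": (5, 4),
    "perfect fourth": (4, 3),
    "perfect fifth": (3, 2),
    "octave": (2, 1),
}

# Width of each interval in steps of the twelve-note chromatic scale.
SEMITONES = {
    "minor third": 3,
    "major third": 4,
    "perfect fourth": 5,
    "perfect fifth": 7,
    "octave": 12,
}


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def equal_tempered_ratio(semitones, notes_per_octave=12):
    """Frequency ratio spanned by a number of equal-tempered steps."""
    return 2.0 ** (semitones / notes_per_octave)


def deviation_cents(name):
    """Equal-tempered interval minus the just interval, in cents."""
    p, q = JUST[name]
    return cents(equal_tempered_ratio(SEMITONES[name])) - cents(p / q)


if __name__ == "__main__":
    for name in JUST:
        print(f"{name:15s} {deviation_cents(name):+8.3f} cents")
```

The equal-tempered fifth comes out about 2 cents flat of 3:2 and the major third nearly 14 cents sharp of 5:4, while the octave is exact.  The tritone, six steps of the chromatic scale, is exactly the square root of two.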

All chords on a guitar are somewhat dissonant.  Hence modern musicians' reliance on electronic amplifiers and the feedback they produce for generating volume.  Heavy metal musicians in particular are fond of what was traditionally considered the most dissonant interval: the tritone or one-half octave.