Day: November 17, 2016

Writing scientific software

RFon asked:
“I’d love to hear your thoughts on what makes good scientific software. I strive to write correct software that’s intuitive to use, but would love examples of useful tools you’ve come across.”

RFon, I must admit I was not entirely sure what type of scientific software you had in mind: the code that a student or postdoc would write, community-supported and maintained open-source codes, or commercial codes. So I will give a general answer based on my experiences and hopefully not be too vague.

I am a theorist/computational scientist, and my work falls under applied physics; the problems I look at lie at the interface of physics, chemistry, and several branches of engineering. In the context of code development, my work falls under "scientific computing," but our style is such that the emphasis is much more on "scientific" than on "computing." In short, pure CS folks would probably scoff at much of our work as not pretty or clean enough (a common issue; academic science codes are often not pretty by CS standards). However, the main goal of our work is to properly describe relevant physical phenomena, so physics comes first and numerics is a means to that end. Our work typically proceeds in three steps: develop theory (write complicated partial differential or integro-differential equations, or coupled systems thereof) that relates to interesting properties of a class of systems –> develop and/or implement algorithms to solve these systems of equations –> end up with a code that captures the underlying physics well enough that we can perform numerical experiments and understand a class of systems quite well.

In my group, we write our own code. FORTRAN FTW! We also use interpreted languages like Python and MATLAB for some smaller-scale calculations, but FORTRAN is extremely fast for the type of work we do (lots of matrix/array manipulation), we have a lot of legacy code, and modern compilers (such as the free gfortran) are great. [Please, no proselytizing here about how everything should be written in C++ or whatever. I have no patience with the "one true programming language" silliness, especially because much of scientific computing fits naturally into the procedural (rather than object-oriented) programming paradigm.] That said, sending students to take certain undergrad computer science courses (e.g., data structures) really helps them adopt good programming habits and stay organized.

In the work we do, I cannot say we pay much attention to the potential user experience (beyond commenting and documenting code), because we assume people will work with the source code. Given how science in my field is funded, there is really no money for developing a user interface (unless you are part of one of the big centers with permanent staff), let alone for providing user support. Again, our focus is on solving certain physics problems.

There is some commercial software that experimentalists in my field use (I can't really go into detail without revealing what I do), and those packages are well done and fairly intuitive, but the operative word is "commercial." I would not say that someone who uses commercial code does theoretical/computational work. It's fine to use it if you want to test something or are an experimentalist comparing with a measurement, but this "computational" work by itself is not publication-worthy. If it seems like I am bitter, that's because I am. I cannot tell you how many times I have encountered someone who thinks that modeling and simulation are trivial because they equate them with using "canned" software and have no idea what it actually takes to simulate a complex physical system on the computer in enough detail that you can perform numerical experiments on it.

In my field, several teams have tried packaging and selling their code, and they are generally puzzled that other computational people don't want to buy it. Why would I buy it? I can write the same thing based on publications and have my own source code. If that sounds like needless duplication of work, that's because it is. But most computational scientists have no use for a fancy GUI; give me the source code and some documentation, and that will be useful, though we'll probably still rewrite most of it. We are working on cleaning up some of our larger codes and then putting them on GitHub.

I have some experimental colleagues for whom we have written small amounts of specialized code, and really all you need is to work closely with some representative users for a little while, because what they want or need is often not what you'd think.

RFon, does this somewhat address your question? Readers, what say you?