A friend from college found my blog, and to my delight made some suggestions. I had to promise, though, to include a diatribe against “offside-rule” languages, scripting, and automatic memory allocation. I may never again get a job writing Python or Go applications, but here I go…
Offside-rule languages, such as Python and F#, use whitespace indentation to delimit blocks of statements. It’s a nice clean syntax and a maintenance nightmare. I would have suffered less in my life without the hours spent deciphering the change in logic caused by cutting and pasting code between different indentation levels. It’s especially bad when you’re trying to find a change in logic that someone else introduced with an indentation error.
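A minimal sketch of that failure mode, with made-up function names and data: the second function is the first one after a paste that left the last statement one indentation level too shallow. Both versions are valid Python, so nothing flags the change.

```python
def count_and_sum_original(values):
    """Count and sum only the non-negative values."""
    count = 0
    total = 0
    for v in values:
        if v >= 0:
            count += 1
            total += v
    return count, total


def count_and_sum_pasted(values):
    """Same code after a paste that dedented the last statement by one level."""
    count = 0
    total = 0
    for v in values:
        if v >= 0:
            count += 1
        total += v  # now outside the `if`: negative values are summed too
    return count, total


print(count_and_sum_original([3, -1, 4]))  # (2, 7)
print(count_and_sum_pasted([3, -1, 4]))    # (2, 6)
```

In a brace-delimited language the same paste would leave the statement inside its block however it was indented.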
Taking it to an extreme, the humorous Edwin Brady and Chris Morris at the University of Durham created the language Whitespace (https://en.wikipedia.org/wiki/Whitespace_(programming_language)). The Wikipedia page is prettier than the official page, which only seems to be available on the Wayback Machine (http://archive.org/web/).
For full disclosure, I do use Python when I’m playing around with Project Euler (https://projecteuler.net/). It is the ideal language for quick number theory problems. In a professional context, though, Python has proven to be a nightmare, starting with the compiler crashing with segmentation faults on what I thought were simple constructs and continuing with a lack of asynchronous and multi-threaded features (try implementing an interactive read with a timeout, or fetching the standard and error output from a child process). A lack of compatibility between Python releases completes the nightmare.
How To Get a Legacy Project Under Test.
You’re smart, so I’ll just give the outline and let you fill in the blanks:
0. Given: you have a project of 300K to millions of lines of code largely without tests.
1. Look at your source control and find the areas undergoing the most change. With Subversion, use StatSVN’s heatmap. With Perforce, just look at the revision numbers of the files to detect the files undergoing the most change. With git, use gource or StatGit. The areas under the most change are the areas you want to refactor first.
2. In your chosen area of code, look at the dependencies. Go to the leaves of the dependency tree of just that section of code. Create mock function replacements for the system functions and other external APIs, like databases and file I/O, that the leaf routines use.
3. Even at this level, you’ll find circular dependencies and compilation units dependent on dozens of header files and libraries. Create dummy replacements for some of your headers that aren’t essential to your test. Use macro definitions to replace functions; use every trick in the book to get just what you want under test. Notice that so far you haven’t actually changed any of the code you’re supposed to fix. You may spend a week or several weeks getting to this point, depending on the spaghetti factor of the code. Compromise a little: for example, don’t worry at first about how to simulate an out-of-memory condition. Hopefully you’ll start reaching a critical mass where it gets easier and easier to write tests against your code base.
4. Now you get to refactor. Follow the Law of Demeter. Avoid “train wrecks” of expressions where you use more than one dot or arrow to get at something. Don’t pass a whole object to a routine when all it needs is one member. This step will change the interfaces of your leaf routines, so you’ll need to go up one level in the dependency tree and refactor that, so rinse and repeat at step 3.
5. At each step in the process, keep adding to your testing infrastructure. Use coverage analysis to work towards 100% s-path coverage (not just lines or functions). Accept that you’re not going to get everything at first.
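Steps 2 and 4 can be sketched together in a few lines. The example below is in Python purely for brevity; in a C or C++ code base the same seam would more likely be a macro or a link-time replacement, as in step 3. Every name here (`Session`, `Config`, `count_log_lines`, `fake_opener`) is hypothetical and stands in for whatever your leaf routine and its external dependency happen to be.

```python
import io
from dataclasses import dataclass


@dataclass
class Config:
    log_path: str


@dataclass
class Session:
    config: Config


def count_log_lines_before(session):
    # Before: a "train wreck" (two dots) that always hits the real file system.
    with open(session.config.log_path) as handle:
        return sum(1 for _ in handle)


def count_log_lines(log_path, opener=open):
    # After: the leaf takes only the member it needs, and the file I/O
    # goes through a parameter that a test can replace.
    with opener(log_path) as handle:
        return sum(1 for _ in handle)


def fake_opener(_path):
    # Mock replacement for open(): returns canned, in-memory content.
    return io.StringIO("first\nsecond\nthird\n")


def test_count_log_lines():
    assert count_log_lines("unused.log", opener=fake_opener) == 3


test_count_log_lines()
```

The caller one level up now has to pull `log_path` out of the session itself, which is exactly the interface change that sends you back to step 3 for the next layer.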
What does this buy you? You can now add features and modify the code with impunity because you have tests for that code. You’ll find the rate of change due to bug fixes disappears, to be replaced with changes for new salable features.
On the couple of projects where I applied this methodology, the customer escalation rate due to bugs went from thousands a month to zero. I have never seen a bug submitted against code covered with unit tests.
I assume your diatribes against scripting and automatic memory management are in the queue?
I would like to point out that we may be up against it if we are trying to get rid of the offside rule. It is a very old rule and some of the coolest and most academic languages use it.
Peter Landin invented the offside rule in 1966, if I’m not mistaken, for the conceptual programming language ISWIM (https://en.wikipedia.org/wiki/ISWIM), which inspired all the well-known functional languages, and all of them adopted the rule, with slight variations. Of course ISWIM was described in Landin’s paper, “The Next 700 Programming Languages”. Haskell’s version of the offside rule amounts to (1) accept curly braces, (2) use the offside rule to generate curly braces, and (3) if there are syntax errors at certain keywords, insert curly braces to see if that fixes the error. Scheme has SRFI 119, which proposes to adopt the rule into a LISP dialect.
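The second of those steps, generating braces from indentation, can be illustrated with a toy translator. This is a simplified sketch and not Haskell’s actual layout algorithm (which is triggered by particular keywords and includes the error-recovery step); the function name `indent_to_braces` is invented for the example, and it assumes indentation uses spaces only.

```python
def indent_to_braces(source):
    """Toy offside rule: turn indentation changes into explicit braces."""
    out = []
    stack = [0]  # indentation columns of the blocks currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines neither open nor close a block
        col = len(line) - len(line.lstrip(" "))
        if col > stack[-1]:
            stack.append(col)  # deeper indentation opens a block
            out.append("{")
        while col < stack[-1]:
            stack.pop()  # shallower indentation closes open blocks
            out.append("}")
        out.append(line.strip() + ";")
    while len(stack) > 1:
        stack.pop()  # close whatever is still open at end of input
        out.append("}")
    return " ".join(out)


print(indent_to_braces("a\n  b\n  c\nd"))  # a; { b; c; } d;
```

The sketch does not reject a dedent to a column that never opened a block; a real implementation would have to decide what that means.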
Rust uses curly braces, and depending on the lifetimes you need for your state variables, doesn’t use a heap. Plus it’s compiled. Is it the ideal language?