50 suggestions on correctness and testing for scientific software for PDEs

I think testing and debugging are among the harder aspects of scientific and numerical software. It’s easy to get buried in a giant pile of code and have no idea where to look for bugs. It’s even harder when the bug isn’t in the code at all, but in the concepts, the math, or the data.

The fundamental problem with testing and debugging scientific software is that we often don’t know the correct intermediate values or even the correct final output. I helped write a previous post on similar topics, but I wanted to say more. So, here are fifty suggestions that probably would’ve helped me at some point. I’d like to expand quite a bit more on individual points in the future.

  1. Don’t trust your code. Seriously, never trust your code.
  2. Treasure correct code and use version control. Don’t lose your treasure.
  3. Squeeze maximum value out of existing correct code: it lets you check intermediate values that would otherwise be untestable.
  4. Hunt the internet for existing correct code. But don’t trust it.
  5. Add one feature at a time.[^1]
  6. Define “one feature” as narrowly as possible.[^2]
  7. Edge cases shouldn’t be an afterthought. Sometimes solving an edge case is informative about the common case.
  8. It’s okay to spend a couple hours just thinking about how to design a test problem that tests only one new feature.
  9. Any time you can, compare with an analytical solution.
  10. The method of manufactured solutions (MMS) is amazing (sketch after this list).
  11. Use MMS even if it requires implementing some extra features. It will be worth it.
  12. Check the order of accuracy/convergence rate! (There’s a sketch after this list.)
  13. Even on problems where the true solution is unknown, you can check the convergence rate by comparing against a very high accuracy reference solution.
  14. Be a more careful programmer. Sometimes, it’s okay to spend thirty minutes just thinking through each line of code.
  15. Learn your debugging tools and IDE! Line-by-line step-through debugging is incredibly helpful.
  16. Don’t stay stuck for more than a couple hours. Try something different.
  17. When you notice that something looks wrong, try to encode the meaning of “wrong” as a test or an assertion.
  18. Be a faster programmer. Sometimes, just trying a bunch of things is the right approach.
  19. Use smaller and simpler test problems so that you can iterate really fast.
  20. Don’t trust evaluation code. Look at the output. Bugs in evaluation code can make you waste a huge amount of time.
  21. Make lots of figures and videos. Visualizing a problem is often very effective for debugging.
  22. Test derivatives with finite differences (sketch after this list).
  23. Test symmetries and invariances. Many problems are rotationally symmetric or invariant under rigid body motion.
  24. Test more algorithmic properties. Optimization algorithms often guarantee a decrease in the objective at each step! (Sketches for both after this list.)
  25. Log lots of info. It’s okay to save a full matrix at each time step.
  26. Prototype! The first version of the code should not be well designed or fast.
  27. Write at least the first version in a language like Python or Julia where edit-(compile)-run cycles are on the order of a second.
  28. Doing the first phases of development in a Jupyter notebook (or similar) is really fast. Iteration time is extremely important.
  29. Write a second version from scratch, maybe in a different language or style, and compare the results. You might catch some bugs.
  30. Fast tests are more useful than slow tests.
  31. Characterization tests (aka “golden master tests”, aka “freeze tests”) are useful for preventing unexpected changes (sketch after this list). But they can’t define correctness and they’re brittle.
  32. Robust tests are more useful than brittle tests. False alarms are a bummer.
  33. Automated tests are more useful than manual tests.
  34. Continuous integration is normally worth the effort.
  35. Remove randomness in testing by setting a random seed, even if true randomness is necessary for correctness in production. Consistency in testing is very important (sketch after this list).
  36. Use symbolic algebra tools to develop test cases. I frequently use Sage and sympy.
  37. Use multiprecision and arbitrary-precision arithmetic to develop test cases. I’ve used mpmath and MPFR (sketch after this list).
  38. Sign errors can often be solved with guess and check. Verify with the math later! I’ve found a lot of mild math errors or misunderstandings this way.
  39. Guess and check also works in some other areas. But don’t flail around in the dark for very long.
  40. Ablation testing: remove some component of your system and verify that the performance degrades as expected.
  41. Look for gaps, overlaps, and other problems in your meshes.
  42. Check your normal vectors. Should they point inwards or outwards?
  43. Corners are scary. Start with something smooth like a circle or sphere.
  44. The math and theory are often more important than you think.
  45. Violating function space and regularity requirements can bite you. See “corners”.
  46. Start with direct linear solvers. Iterative solvers introduce a whole new class of problems with preconditioning.
  47. Check the condition number of your matrices.
  48. Use standard tools and libraries where they are sufficient.
  49. Test single-threaded first. Then on two cores. Then many. And write the CPU version before the GPU version!
  50. Finally, remember not to trust your code.
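
To ground a few of these, here are some sketches in Python. Every `solve_*`/`run_*` name below is a hypothetical stand-in for your own code. First, a minimal method of manufactured solutions example for items 10–11 (and item 36): pick an arbitrary solution to a 1D Poisson problem -u'' = f, derive the forcing term symbolically with sympy, and compare your solver against the exact solution.

```python
import sympy as sp

x = sp.symbols("x")
u = sp.sin(sp.pi * x) * sp.exp(x)      # arbitrary manufactured solution
f = sp.simplify(-sp.diff(u, x, 2))     # forcing term so that -u'' = f holds exactly

u_exact = sp.lambdify(x, u, "numpy")   # callable exact solution for computing errors
forcing = sp.lambdify(x, f, "numpy")
# u_h = solve_poisson(forcing, bc=(u_exact(0.0), u_exact(1.0)))  # hypothetical solver
# error = abs(u_h - u_exact(grid)).max()
```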
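
For items 12–13, a common trick is to fit the slope of log(error) against log(h); the slope is the observed order of accuracy. `solve_and_measure_error` is hypothetical and would compute the error via MMS, an analytical solution, or a high accuracy reference.

```python
import numpy as np

def observed_order(hs, errors):
    # Least-squares slope of log(error) vs. log(h): the empirical convergence rate.
    return np.polyfit(np.log(hs), np.log(errors), 1)[0]

# hs = [0.1, 0.05, 0.025, 0.0125]
# errors = [solve_and_measure_error(h) for h in hs]   # hypothetical
# assert abs(observed_order(hs, errors) - 2.0) < 0.2  # expecting second order
```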
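
A sketch of item 22: check an analytical gradient against central differences. The tolerances are deliberately loose, since central differences only give roughly half of double precision.

```python
import numpy as np

def check_gradient(f, grad, x, eps=1e-6, rtol=1e-4):
    g = grad(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        fd = (f(x + eps * e) - f(x - eps * e)) / (2 * eps)   # central difference
        assert abs(fd - g[i]) <= rtol * max(1.0, abs(g[i])), (i, fd, g[i])

check_gradient(lambda x: np.sum(x**2), lambda x: 2 * x, np.array([1.0, -2.0, 3.0]))
```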
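
Sketches for items 23–24, assuming a hypothetical `solve` that maps sample points to a scalar field: for a rotationally symmetric problem, evaluating at rotated points should give the same values, and an optimizer with a descent guarantee should never increase the objective.

```python
import numpy as np

def check_rotation_symmetry(solve, points, angle=0.7, rtol=1e-8):
    # For a rotationally symmetric 2D problem, u(Rx) should equal u(x).
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    assert np.allclose(solve(points @ R.T), solve(points), rtol=rtol)

def check_monotone_decrease(objective_history, tol=0.0):
    # Many optimizers guarantee the objective never increases between iterates.
    for prev, cur in zip(objective_history, objective_history[1:]):
        assert cur <= prev + tol, (prev, cur)
```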
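
A minimal characterization test for item 31, with a hypothetical `run_simulation` entry point: the first run freezes the output to disk, and later runs must reproduce it (bit-for-bit, or within a tolerance you choose).

```python
import pathlib
import numpy as np

def test_freeze():
    result = run_simulation()                        # hypothetical entry point
    golden = pathlib.Path("tests/golden/solution.npy")
    if not golden.exists():                          # first run records the golden output
        golden.parent.mkdir(parents=True, exist_ok=True)
        np.save(golden, result)
    assert np.allclose(result, np.load(golden), rtol=1e-12, atol=0.0)
```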
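
Item 35 is nearly a one-liner with numpy: seed a generator so every test run sees identical “random” data.

```python
import numpy as np

rng = np.random.default_rng(20240101)    # fixed seed: the same data on every run
perturbation = 1e-3 * rng.standard_normal(100)
```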
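
And for item 37, mpmath produces reference values far beyond double precision. Here’s a 50-digit integral you might test a quadrature routine against; `my_quadrature` is hypothetical.

```python
import mpmath as mp

mp.mp.dps = 50                                       # 50 decimal digits of precision
ref = mp.quad(lambda t: mp.exp(-t**2), [0, mp.inf])  # equals sqrt(pi)/2
print(ref)
# assert abs(my_quadrature(lambda t: math.exp(-t * t), 0.0, 10.0) - float(ref)) < 1e-12
```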

[^1]: We should try to find the smallest testable unit of code. For example, when writing a fast multipole method implementation, you could start by writing code for a 3x3 subdivision of the domain with only source approximation. This would test the source-side approximation code in isolation. Then, you write the target-side approximation code separately and test that. Then, you move to a tree structure and test that component separately.

[^2]: As an aside, I think most test driven development (TDD) advice is insane, particularly when applied to code for numerical methods. TDD focuses on tiny, (almost?) meaningless tests. In most numerical methods work, a unit of testable work is often an entire solution to a particular physical problem, perhaps including convergence tests. It’s perfectly fine and maybe even good to write that “test” before writing the implementation of the numerical method. I do it often. But that sometimes still means I have several hundred lines of code to write before I can see whether the test passes or even runs at all. The problem isn’t really in the spirit of TDD, but in the focus in books and examples on tiny increments of work and on requiring the tests to be written first.
