Concrete Examples: Red Tests
18 November 2024
This post is a continuation of the concrete examples series, which documents concrete examples I have seen while working as a 'real' software engineer. Today's topic is unit tests that have likely never gone red.
As a reminder, 'seeing a unit test go red' is an integral part of the red, green, refactor cycle close to the heart of all TDD aficionados.
I present five examples of dubious tests I have seen in enterprise codebases, discuss how seeing each test go red would have prevented it from being checked into source control, and finally show how to fix or avoid each scenario.
The examples are:
- Swallowed runtime errors
- Asynchronous tests with negative assertions
- Cryptic assertion failures
- Assertions inside callbacks
- Miscellaneous mistakes
1. Swallowed runtime errors
Imagine you saw the following green test. Would it give you confidence that the SendCommunicationService is sending messages with the correct text?
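The original test was RSpec; the sketch below is a hedged, framework-free reconstruction (the eq matcher and the stubbing are hand-rolled so it runs standalone, and the service internals are invented to illustrate the mechanism):

```ruby
# Minimal stand-in for RSpec's eq matcher, so the sketch runs without rspec
def eq(expected)
  ->(actual) { actual == expected }
end

class CommunicationQueue
  def self.enqueue(message_payload)
    # real queueing elided
  end
end

class SendCommunicationService
  def call
    CommunicationQueue.enqueue(text: 'Some Message2')
    true
  rescue StandardError => e
    $swallowed_error = e # app code silently discards errors from enqueue
    true
  end
end

# The dubious 'test': the assertion lives inside the stubbed enqueue, so any
# error it raises is rescued (and discarded) by SendCommunicationService#call.
def CommunicationQueue.enqueue(message_payload)
  # misplaced parenthesis -- this calls String#to, which doesn't exist:
  message_payload[:text].to eq('Some Message')
end

puts SendCommunicationService.new.call # => true: the 'test' passes
puts $swallowed_error.class            # => NoMethodError -- raised, rescued, discarded
```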
Can you see the problem?
I only saw it because I got lucky during a long and frustrating debugging session (debugging a different test, in the same file). I had added some extra console output to the code under test as part of debugging.
To my surprise I saw the following error spat out when I re-ran the whole test file with the new logging present.
undefined method `to' for "Some Message2":String
A misplaced parenthesis (the assertion should read expect(message_payload[:text]).to eq('Some Message')) caused a NoMethodError. The String class doesn't have a to method, but the error was never raised by the test - it was being caught by the app code then silently discarded (silently until I added the extra logging, at least).
The test had been giving false-positive feedback for 6 years. A message with any text could have been passed to CommunicationQueue.enqueue(...) and the test would still have reported 'no problems here'.
How would seeing a red test have helped?
If we were strictly following the red-green-refactor process while developing the SendCommunicationService, we would have taken the following steps:
- Write the above test (before the code to retrieve the message text has been written)
- See the test go green straight away
- Realise the test is dubious and re-write it
How to re-write?
Use an argument matcher.
The root cause is the assertion being performed inside the supplied mock implementation. To avoid this problem entirely, don't pair a mock implementation with embedded assertions - use argument matchers instead.
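In RSpec this typically means something like expect(CommunicationQueue).to have_received(:enqueue).with(hash_including(text: 'Some Message')) against a spy. The framework-free sketch below (class names reused from the example, internals invented) shows why the shape is safer: the recorded arguments are asserted on in the test body, outside any rescued code path:

```ruby
class CommunicationQueue
  class << self
    # record every enqueued payload so the test can inspect it afterwards
    def enqueue(message_payload)
      (@enqueued ||= []) << message_payload
    end

    def enqueued
      @enqueued || []
    end
  end
end

class SendCommunicationService
  def call
    CommunicationQueue.enqueue(text: 'Some Message')
  rescue StandardError
    # still swallows errors, but the assertion no longer lives in here
  end
end

SendCommunicationService.new.call

# the assertion runs unconditionally in the test body; if it raises, the
# test goes red instead of the error being silently rescued
recorded = CommunicationQueue.enqueued.last
raise 'wrong message text' unless recorded && recorded[:text] == 'Some Message'
puts 'enqueue received the expected text'
```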
Now a mistake in the test assertion such as an extra parenthesis…
… will throw a syntax error before the tests run.
2. Asynchronous tests with negative assertions
I still get caught out by this scenario sometimes.
Let's start with a react component that makes a network request to fetch some communications then displays them in a list:
We've picked up a ticket to hide draft communications by default. We're in a bit of a rush, but have the time to quickly knock up a unit test. The test first sets up a fake network request handler using msw that returns a draft communication, then ensures the draft communication isn't visible in the rendered list component:
And remember the rush - the test is so simple that we won't need to see it go red, right? So we'll quickly add some client-side filtering to hide drafts by default:
The test goes green straight away after we add the filtering code, and we push up the changes feeling pleased with ourselves. Just like that, we've not only added a useless test but also a new feature not covered by tests! Can you see where the problem is?
Let's take a closer look at the test's lifetime:
On the 1st render cycle, the communications state variable is null (the network request to /communications handled by msw still takes some time to complete, like a normal network request), so the component will render:
<div>Loading...</div>
Next the assertion will run...
… and rightly pass after the 1st render cycle, because there is no text on the screen containing 'Some Draft Communication'. If we remove the filtering code the unit test still passes - useless.
How would seeing a red test have helped?
If we were strictly following the red-green-refactor process while developing the filtering, we would have taken the following steps:
- Write the above test (before adding the filtering code)
- See the test go green straight away
- Realise the test is dubious and re-write it
How to re-write?
Wait for the asynchronous action to complete.
By waiting for the loading to finish (e.g. using waitForElementToBeRemoved) before the assertion runs, we are interacting with the component in the same way end users interact with the app, thus giving us more confidence the app works 'for real'.
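The same timing bug can be reproduced without React or msw at all. In the sketch below (all names invented), visible() stands in for what the component renders and load for the in-flight request; the draft filtering is deliberately absent, so a good test should fail:

```javascript
// Stand-in for the list component: starts in a loading state, fills in the
// communications once the (fake) network request resolves.
function createCommunicationList(fetchCommunications) {
  let communications = null; // 1st render: still loading
  const visible = () =>
    communications === null
      ? ['Loading...']
      : communications.map((c) => c.text); // note: no draft filtering yet!
  const load = fetchCommunications().then((data) => { communications = data; });
  return { visible, load };
}

async function demo() {
  const fetchDraft = () =>
    Promise.resolve([{ text: 'Some Draft Communication', draft: true }]);
  const list = createCommunicationList(fetchDraft);

  // the dubious negative assertion: runs on the 1st 'render cycle', while
  // the screen still shows 'Loading...', so it passes vacuously
  const earlyPass = !list.visible().includes('Some Draft Communication');

  // the fix: wait for loading to finish (waitForElementToBeRemoved in
  // testing-library terms) before asserting -- now the missing filter shows
  await list.load;
  const latePass = !list.visible().includes('Some Draft Communication');

  return { earlyPass, latePass };
}

demo().then((result) => console.log(result)); // { earlyPass: true, latePass: false }
```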
3. Cryptic assertion failures
A sign of a good test is an assertion failure that clearly describes why the test has failed. Here's a nice example:
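Something along these lines (the calculate call, values, and failure message are invented for illustration):

```javascript
function calculate(a, b) {
  return a + b; // bug: if an argument arrives as a string, + concatenates
}

const received = calculate('2', 2); // '22', not 4
if (received !== 4) {
  // a clear failure message: surfacing the received value and its type
  // points straight at a runtime type problem, not an arithmetic one
  console.log(
    `expected calculate('2', 2) to equal 4, ` +
      `but received ${JSON.stringify(received)} (a ${typeof received})`
  );
}
```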
The self-explanatory message tells us exactly why the test has failed, and gives us a clue as to what the underlying problem with the calculate function is (i.e. probably NOT an arithmetic error, but a runtime type problem).
So try your hand at telling me what might trip up future developers in the following test, which is a very small part of de-duplicating long-running requests:
The test looks completely innocuous when it goes green. When the dedupe function returns the incorrect request, on the other hand, here's the assertion failure:
and here are the expected/received object graphs that are spat out to the console when the test fails. The object diff is so large it takes a couple of seconds to appear in the console, and took up 40 pages when I copied it into the first draft of this post.
Just imagine being confronted with this assertion failure and a 40 page diff while trying to fix a time critical issue 😕.
How would seeing a red test have helped?
If we were strictly following the red-green-refactor process while developing the dedupe function, we would have taken the following steps:
- Write the above test
- See the test go red and see the confusing assertion failure/long object diff
- Realise the test is dubious and re-write it
How to re-write?
Either use a custom assertion, or supply a custom assertion failure message if possible (the latter isn't possible with vitest as far as I know).
Many testing frameworks allow custom assertions, for example in vitest:
Now use the new assertion…
…and see the much more appropriate error message:
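Putting the pieces together as a framework-free sketch: vitest and jest register custom matchers via expect.extend, with each matcher returning { pass, message }. The same shape is hand-rolled below so it runs standalone (the id-based comparison is an invented detail - compare whatever uniquely identifies your requests):

```javascript
// Custom matcher shape: return { pass, message }, as expect.extend expects.
function toBeRequestWithId(received, expectedId) {
  const pass = Boolean(received) && received.id === expectedId;
  return {
    pass,
    // one short, targeted line instead of a 40-page object diff
    message: () =>
      `expected request with id '${expectedId}', ` +
      `but received request with id '${received && received.id}'`,
  };
}

// hand-rolled harness standing in for the test framework
function assertRequestWithId(received, expectedId) {
  const { pass, message } = toBeRequestWithId(received, expectedId);
  if (!pass) throw new Error(message());
}

const wrongRequest = { id: 'req-2', payload: { /* huge object graph */ } };
try {
  assertRequestWithId(wrongRequest, 'req-1');
} catch (e) {
  console.log(e.message); // expected request with id 'req-1', but received request with id 'req-2'
}
```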
4. Assertions inside callbacks
Sometimes we want to assert that 'something has happened correctly' - for example, that a callback has been invoked with the correct parameters. Say we're partway through creating a custom React modal component:
We've already added some buttons (plus tests that ensure the buttons exist), but now it's time to make 'something happen' when the buttons are clicked. We want to add the following functionality and corresponding tests:
- When the cancel button is clicked, onClose is called with 'clicked cancel'
- When the ok button is clicked, onClose is called with 'clicked ok'
Here's a (bad) example of the first scenario:
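A framework-free sketch of the pattern (the modal stand-in and its wiring are invented; in the real test this is a rendered React component and a simulated click):

```javascript
// Stand-in modal whose cancel button is not wired up yet -- the very bug
// the test is supposed to catch.
function createModal(onClose) {
  return {
    clickCancel: () => { /* does nothing: onClose is never invoked */ },
  };
}

let observed; // records what the callback saw, if it ever ran

const modal = createModal((reason) => {
  observed = reason;
  // the assertion lives *inside* the callback, so it only runs if the
  // callback itself runs...
  if (reason !== 'clicked cancel') {
    throw new Error(`expected 'clicked cancel', got '${reason}'`);
  }
});

modal.clickCancel();
// ...and since the callback never ran, no assertion ran either:
console.log(observed === undefined); // true -- yet the 'test' goes green
```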
The test is clearly well intentioned - if the modal supplied 'clicked ok' (or anything else incorrect) to the callback the test would fail. However, there's another scenario the test doesn't account for - can you see it?
What if onClose is never invoked at all? The assertion would never run and the test would go green, despite the modal not behaving as we intended.
How would seeing a red test have helped?
If we were strictly following the red-green-refactor process while developing the cancel button, we would have taken the following steps:
- Write the above test (assuming the cancel button already existed, but does nothing)
- See the test go green straight away (clicking the button does nothing, so the test assertion never runs)
- Realise the test is dubious and re-write it
How to re-write?
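Record the calls with a spy, then assert on the recording in the test body. With vitest the same shape is const onClose = vi.fn() plus expect(onClose).toHaveBeenCalledWith('clicked cancel'); the sketch below hand-rolls it (modal stand-in invented as before):

```javascript
// minimal spy: a callable that records every invocation's arguments
function makeSpy() {
  const calls = [];
  const spy = (...args) => { calls.push(args); };
  spy.calls = calls;
  return spy;
}

// stand-in modal with the cancel button correctly wired up
function createModal(onClose) {
  return { clickCancel: () => onClose('clicked cancel') };
}

const onClose = makeSpy();
createModal(onClose).clickCancel();

// assertions run unconditionally in the test body: if onClose was never
// invoked, the first check fails loudly instead of silently passing
if (onClose.calls.length !== 1) throw new Error('onClose was never invoked');
if (onClose.calls[0][0] !== 'clicked cancel') {
  throw new Error(`onClose called with '${onClose.calls[0][0]}'`);
}
console.log('onClose invoked once with', onClose.calls[0][0]);
```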
Now in the 'nothing happens' scenario, we get a clear assertion failure that the onClose callback hasn't been invoked.
N.b. using expect.hasAssertions (discussed later in the post) might also help here, but it's no replacement for seeing a red test.
5. Miscellaneous mistakes
Mistakes are inevitable. Here's a concrete example I have seen quite a few times in the wild. Similar to example 4, we want to ensure a supplied callback is called when a button is clicked:
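The no-op mechanism is easy to demonstrate without a framework (makeExpect is a toy stand-in for vitest's expect):

```javascript
// toy expect: toHaveBeenCalled is a method that throws if the spy's call
// log is empty
function makeExpect(received) {
  return {
    toHaveBeenCalled() {
      if (received.calls.length === 0) {
        throw new Error('expected function to have been called');
      }
    },
  };
}

const onClick = () => {};
onClick.calls = []; // spy call log; empty because the button is unwired

// the mistake: the matcher is referenced but never invoked. This is a
// perfectly legal expression that evaluates to a function object and is
// immediately discarded -- no assertion runs.
makeExpect(onClick).toHaveBeenCalled;

console.log('test finished green, despite onClick never being called');

// the fix -- actually invoking the matcher -- would throw here:
// makeExpect(onClick).toHaveBeenCalled();
```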
When skim-reading the test it looks fine (hands up who else has ticked a PR after only skim-reading it 🙃), but on closer inspection you'll notice that the toHaveBeenCalled matcher isn't actually invoked by the test. Another case of a test with no assertion, which will go green straight away.
How would seeing a red test have helped?
If we were strictly following the red-green-refactor process while developing the CustomButton component, we would have taken the following steps:
- Write the above test (assuming the onClick callback isn't yet respected by CustomButton)
- See the test go green straight away
- Realise the test is dubious and re-write it
How to re-write?
Although actually invoking the matcher (toHaveBeenCalled()) fixes the specific issue, it could conceivably happen again. To ensure such a dubious test can't possibly be written, some testing frameworks can be configured to fail tests in which no assertions run. Jest and vitest both support expect.hasAssertions(), which will fail such tests with a message similar to:
Error: expected any number of assertion, but got none.
For testing frameworks without a similar feature, static analysis is the way to go - look for a linter/code analyzer that will flag assertion-free tests (such tools exist for JVM languages, for example).
A test with no assertions is just one example of a mistake leading to a dubious test. Other examples include:
- Forgetting to mark a test as a test (e.g. not adding a Test annotation)
- Tautological assertions (expect(received).toBe(received))
- Using so many mocks there's no prod code actually being tested
All of which would be caught by checking for a red test and finding none. And all of which I have seen in real codebases.
In summary - save future developers working with your code from seeing red by checking for it yourself first.