Cyclomatic Complexity Complexity
05 September 2019
This blog will give a brief introduction to cyclomatic complexity and how it might change over time.
There are all sorts of metrics we can use to measure or reason about our code: something simple such as the number of lines, or perhaps something a little more involved - the runtime complexity of our code (how an algorithm scales as the input size increases). Maybe we’ve considered the cyclomatic complexity (CC) of our code (the number of distinct paths through it - refresher coming up!). But have we ever considered the cyclomatic complexity... complexity of our code? Is that even a thing?? I’m going to at least think about it.
First let’s recap how to calculate CC using some simple examples I have seen all too many times (mostly in very old production code) concerning search form validation. The business logic is to always require at least one search criterion, to prevent open searches grinding the system to a halt.
And here’s our validation code (recall that returning false stops the form submitting), which ensures a user always fills in at least one field.
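Something along these lines - a minimal sketch, assuming an HTML form named Form (with Name, Description and Date inputs) wired up via onsubmit="return validateSearch()"; the exact names are illustrative:

```javascript
// Assumed markup: <form name="Form" onsubmit="return validateSearch()">
// containing Name, Description and Date inputs.
function validateSearch() {
  if (Form.Name.value === "") {
    if (Form.Description.value === "") {
      if (Form.Date.value === "") {
        // every field is empty - show an error and stop the form submitting
        alert("Please enter at least one search criterion");
        return false;
      }
    }
  }
  // at least one field was filled in - let the form submit
  return true;
}
```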
There are a few methods of calculating CC; drawing out a directed graph of the code is often a good way to start. The numbers in each node (circle) do NOT have anything to do with the order the graph is traversed when our code runs; rather, they are there so we can refer to each node later on. Each node roughly represents one action our program will take, and each edge (arrow) represents a possible path from one node to the next. For example, node 2 represents the Form.Name.value === "" check, and the edges marked true and false are the two possible paths of execution after evaluating the expression.
1. Start
2. Name empty?
3. Description empty?
4. Date empty?
5. Show error, don’t submit form (by returning false)
6. Submit Form (by returning true)
7. End
Method 1, listing the distinct paths through our code (without cycles!)...
- 1,2,3,4,5,7
- 1,2,6,7
- 1,2,3,6,7
- 1,2,3,4,6,7
...that’s a CC of 4.
Method 2, using the formula E − N + 2 where E is the number of edges in the graph and N is the number of nodes. That's
9 - 7 + 2 = 4
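(Where do the 9 edges come from? Reading them off the paths above: 1→2, 2→3, 2→6, 3→4, 3→6, 4→5, 4→6, 5→7 and 6→7.)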
Method 3, counting the number of faces (regions) in the graph, which again gives 4 (including the region around the outside of the other 3 bounded regions). These are displayed on the graph below in the coloured boxes.
It's often a goal of programmers to keep the CC of a program low: the fewer routes through some code, the fewer tests are required to completely cover it. Keeping it low will also help future colleagues (or worse, yourself) understand your code - the less complex it is, (hopefully) the easier it is to understand.
Now that we’ve reminded ourselves how to calculate CC and why one might care about it, let’s define cyclomatic complexity complexity (CCC) as how the CC of our code will change as more functionality is added. Doesn’t make sense? Here’s an example working with the code above.
We know our CC at this moment in time is 4, but how will it change as more functionality is added? (Phrased another way, what is the CCC of our code?)
A new ticket comes in - “Users should be able to search on colour”. This looks like a quick win: we only need to type for a few seconds to get another pull request under our belts and please the end-users with a new feature we’ve worked ever so hard on. Here’s the new code:
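Again a sketch under the same assumptions as before, with the Colour check bolted on to the end of the chain:

```javascript
function validateSearch() {
  if (Form.Name.value === "") {
    if (Form.Description.value === "") {
      if (Form.Date.value === "") {
        if (Form.Colour.value === "") {
          // every field is empty - show an error and stop the form submitting
          alert("Please enter at least one search criterion");
          return false;
        }
      }
    }
  }
  // at least one field was filled in - let the form submit
  return true;
}
```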
And here’s the new graph to help us calculate the CC (the actual calculation is left as an exercise for the reader):
1. Start
2. Name empty?
3. Description empty?
4. Date empty?
5. Colour empty?
6. Show error, don’t submit form (by returning false)
7. Submit Form (by returning true)
8. End
It's a huge disaster: by adding one measly piece of functionality our complexity is now 5 - an increase of 1. And it doesn’t stop there. Consider adding more and more search fields in a similar manner - the CC of our code is going to scale linearly with the number of search conditions, which is undesirable. One might quantify this by stating that our code has CCC(n) - the complexity scales at the same rate at which we add inputs, much like an O(n) algorithm. How can we fix this? How would you fix this?
Here’s one very simple attempt - adding a data structure.
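Roughly like this (again a sketch; the array of field names is an assumption - the point is that the inputs to check now live in data rather than in branches):

```javascript
// The fields a user can search on, kept in one place.
var searchFields = ["Name", "Description", "Date", "Colour"];

function validateSearch() {
  // loop over the inputs; the first non-empty one means the search is valid
  for (var i = 0; i < searchFields.length; i++) {
    if (Form[searchFields[i]].value !== "") {
      return true; // let the form submit
    }
  }
  // every field was empty - show an error and stop the form submitting
  alert("Please enter at least one search criterion");
  return false;
}
```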
Graphing out the code, we can see that the CC is immediately reduced to 3. Note that if you count the paths through the graph in this example you might arrive at an infinite cyclomatic complexity, by way of there being an infinite number of cycles bouncing between nodes 2 and 3. For this reason cycles are omitted when calculating cyclomatic complexity.
1. Start
2. Are there more inputs in the list to check?
3. Is the current input empty?
4. Submit Form (by returning true)
5. Show error, don’t submit form (by returning false)
6. End
So we’ve already had one win, but let’s consider what our CCC metric is for the new version. To help us imagine, let’s add another search condition:
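Sticking with the sketch above (and inventing a "Category" field purely for illustration), the only edit is one more entry in the array:

```javascript
var searchFields = ["Name", "Description", "Date", "Colour", "Category"];
```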
We don’t even have to put the effort into graphing this code out. The only change we’ve made is adding to a data structure, which won’t change how the program flows. Our code has CCC(1), that is to say the CC will scale constantly (i.e. won’t change) when we add more inputs.
So cool, we’ve invented a new quality metric, but what does it tell us? CC gives us a rough estimate of the number of test cases we need to ‘fully’ test a section of code. Going back to our second example, we need at least 5 test cases to test every possible path through the code, listed exhaustively:
1. Start
2. Name empty?
3. Description empty?
4. Date empty?
5. Colour empty?
6. Show error, don’t submit form (by returning false)
7. Submit Form (by returning true)
8. End
- Test Case 1 - Non-empty name: 1,2,7,8
- Test Case 2 - Non-empty description: 1,2,3,7,8
- Test Case 3 - Non-empty date: 1,2,3,4,7,8
- Test Case 4 - Non-empty colour: 1,2,3,4,5,7,8
- Test Case 5 - Empty everything: 1,2,3,4,5,6,8
Our test cases scale the same way as our CCC: for every new input we add, we’re forced to add a new test case.
In our updated example the story has a happier outlook. Since our code has CCC(1), to test every path through our program we will only ever need three test cases:
1. Start
2. Are there more inputs in the list to check?
3. Is the current input empty?
4. Submit Form (by returning true)
5. Show error, don’t submit form (by returning false)
6. End
- Test case 1 - Empty list of inputs to check: 1,2,5,6
- Test case 2 - Non-empty list of inputs containing a valid (non-empty) input: 1,2,3,4,6
- Test case 3 - Non-empty list of invalid (empty) inputs: 1,2,3,2,5,6
(Note: cases 2 and 3 could have more cycles between nodes 2 and 3 here, e.g. 1,2,3,2,3,4,6, which still tests the same path as 1,2,3,4,6.)
So testability gets better as our CCC drops - what about the code’s maintainability?
Perhaps we need to change the logic of what constitutes a valid search term. An arbitrary function isValidSearchTerm has been provided which we’re to use instead of empty string comparisons. What changes must we make to each version?
Version 1
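A sketch of what version 1 looks like after the change - each of the four empty-string comparisons has to be swapped for a call to isValidSearchTerm:

```javascript
function validateSearch() {
  if (!isValidSearchTerm(Form.Name.value)) {            // change 1
    if (!isValidSearchTerm(Form.Description.value)) {   // change 2
      if (!isValidSearchTerm(Form.Date.value)) {        // change 3
        if (!isValidSearchTerm(Form.Colour.value)) {    // change 4
          alert("Please enter at least one search criterion");
          return false;
        }
      }
    }
  }
  return true;
}
```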
That’s 4 changes, and we don’t need a math degree to reason that, again, the number of changes scales with respect to our CCC metric. The more search terms we add, the more code changes we require - a side effect of duplicating our empty string check all those times. I think the code smell is easier to spot when we use isValidSearchTerm, but that doesn’t stop either version committing the sin of code duplication to the same degree. Version 1 is also susceptible to sloppy copy and paste errors - heck, I even forgot to change Form.Date to Form.Colour in the initial draft when I was adding the new search term, so shoutout to all of my English teachers advising a proof-read.
Version 2
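And a sketch of version 2 after the change (reusing the searchFields array from before) - only the single comparison inside the loop needs to be touched:

```javascript
function validateSearch() {
  for (var i = 0; i < searchFields.length; i++) {
    if (isValidSearchTerm(Form[searchFields[i]].value)) { // the only change
      return true;
    }
  }
  alert("Please enter at least one search criterion");
  return false;
}
```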
That’s just the 1 change; again, the number of changes scales with respect to our CCC metric. I know which version I’d rather be responsible for maintaining!
So in summary, thinking about the CC and CCC of our code will help to keep it easy to test and easy to understand.