Author Topic: Cyclomatic complexity  (Read 3570 times)


Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4120
  • Country: gb
  • Doing electronics since the 1960s...
Cyclomatic complexity
« on: July 18, 2023, 09:06:50 pm »
What is a good cyclomatic complexity?
Any function with a cyclomatic complexity below 10 can be considered simple and testable, while a cyclomatic complexity greater than 20 indicates an overly complex function, so an acceptance threshold for functions should be defined between 10 and 20, depending on the application domain.
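
For reference, the textbook definition behind those thresholds: McCabe computes cyclomatic complexity from a function's control-flow graph as M = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. For a single structured function this works out to the number of decision points (if, while, for, case; many tools also count && and ||) plus one.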


Looking at the Cube IDE analysis of my project:

[attached image: Cube IDE code-metrics report listing cyclomatic complexity per function]

My code is the top one, and the 2nd from the bottom.

Does this mean anything?

The GPS function is large - about 500 lines.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4231
  • Country: gb
Re: Cyclomatic complexity
« Reply #1 on: July 18, 2023, 09:29:50 pm »
I described something similar here, briefly summarizing why it's useful for my team and me, and in which context.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1349
  • Country: pl
Re: Cyclomatic complexity
« Reply #2 on: July 19, 2023, 08:08:35 am »
In practical terms higher cyclomatic complexity means one has to create more test cases to check a unit of code.

At some point the task of testing the code becomes infeasible. The values presented are way above that point. This is just a rough estimate, in particular for automated evaluation. For example, execution paths in separate segments of code being highly correlated with each other would quickly bring the number of required test cases down, though that might indicate a problem of a different kind. If exception-handling fragments are counted as branches, CC may be an order of magnitude higher than the value bearing realistic significance. An automated tool can’t tell that these 10 handlers are just choosing a different message and are not affecting the primary logic of the program.
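
As a made-up C illustration of that last point (a sketch, not from any real project): the function below has a dozen switch branches, so an analyzer reports a double-digit CC, yet each branch only selects a message and none of them affects any other logic, so one trivial test per code covers it.

Code: [Select]
#include <stdio.h>

/* Map an error code to a human-readable message.
 * An analyzer counts each case as a branch (CC ~ 12 here),
 * but one test per code is trivial: no branch affects any
 * state outside the returned string. */
static const char *err_msg(int code)
{
    switch (code)
    {
        case 0:  return "OK";
        case 1:  return "checksum mismatch";
        case 2:  return "unknown sentence type";
        case 3:  return "field missing";
        case 4:  return "field out of range";
        case 5:  return "buffer overrun";
        case 6:  return "timeout";
        case 7:  return "no fix";
        case 8:  return "stale data";
        case 9:  return "parser busy";
        case 10: return "internal error";
        default: return "unknown error";
    }
}

int main(void)
{
    printf("%s\n", err_msg(2)); /* prints "unknown sentence type" */
    return 0;
}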
People imagine AI as T1000. What we got so far is glorified T9.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4120
  • Country: gb
  • Doing electronics since the 1960s...
Re: Cyclomatic complexity
« Reply #3 on: July 19, 2023, 08:39:11 am »
OK so this is for automated testing.

But software doesn't degrade over time - old jokes notwithstanding :)

So why do you need production testing? Let's say you have a 500-line function which takes in an NMEA packet and handles, say, 10 different cases of something. You just test those cases during development. As regards different data exposing issues, you could not test that in production anyway, but during development you could test it with pseudo-random data.

Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4231
  • Country: gb
Re: Cyclomatic complexity
« Reply #4 on: July 19, 2023, 09:30:02 am »
In practical terms higher cyclomatic complexity means one has to create more test cases to check a unit of code.

For military projects, where a failure can cause a mission to fail, and, worse still, for civilian projects, where it can cause people to die, the measure of complexity is important not only for "automatic tests" where you click, wait, and see: it also matters for the test-case and test-report activities(3)!

The diagram I illustrated in the link makes it clear that the higher the measured complexity, the more time the testing team needs to conclude the testing phases: not only because more tests are needed(1), but also because the tests for triplets become "exponentially harder"(2) as the measured complexity grows.


(1) it's not "let's inject some random patterns from /dev/random and see what happens"
(2) the testing team has to carefully plan, cleverly implement, and rigorously verify and report.
(3) I remind you that it is strongly discouraged for the same people who develop the code to test it. Two teams are needed, { developers, testers } (ideally speaking to each other ONLY through test-reports); when it is not possible to divide these two activities between two teams, at least a week's break is needed, a "mental context switch". If you don't care about that, you don't care about any measure of complexity either; it's just not something you need, as your customers can evidently survive your bugs without catastrophic consequences.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 
The following users thanked this post: newbrain, golden_labels

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1768
  • Country: se
Re: Cyclomatic complexity
« Reply #5 on: July 20, 2023, 12:03:29 pm »
OK so this is for automated testing.

Not only that , though.
High cyclomatic complexity means - very roughly - that the code is not maintainable, because it cannot be understood by someone new, or even by your future self.
It also means that, even once understood, the code becomes extremely hard to modify to correct bugs and implement new functionality.

Those numbers are way too high.

I've seen this first hand with a FOSS product I was the PO for; we were the first contributors after inheriting it from the previous maintainers.
It had some parts with CC up in the 130 range  :palm:.
Each correction or update would take ages and cause cascading effects in the jungle of nested corner cases - the definition of brittle code.
Testing was a nightmare, as DiTBho very clearly explained - we are not held to his safety standards, but reliability is very important in our products.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4120
  • Country: gb
  • Doing electronics since the 1960s...
Re: Cyclomatic complexity
« Reply #6 on: July 21, 2023, 10:44:41 am »
Maybe the large number of lines in some of my code is due to me using C in a simple way, to make it easily readable for me in the future.

That 500-line function is a large switch statement, with each branch containing lots of weird code, picking the values out of GPS packets of different kinds.

I can see that a clever coder would have made it table-driven, with tables of offsets into the packets for the different values, with some values read with scanf and others with atoi etc., and then maybe a table of function pointers for actually picking the values out.
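
A minimal sketch of that table-driven style (all field names and offsets below are invented for illustration, not taken from the real GPS code):

Code: [Select]
#include <stdlib.h>

/* Hypothetical example: one descriptor per field of one packet type.
 * Offsets, names and types are invented for illustration only. */
typedef struct {
    int    offset;                                  /* character offset into the packet */
    void (*extract)(const char *src, void *dst);    /* field-specific parser */
    void  *dest;                                    /* where to store the result */
} field_desc_t;

static void get_int(const char *src, void *dst)
{
    *(int *)dst = atoi(src);
}

static void get_float(const char *src, void *dst)
{
    *(float *)dst = (float)atof(src);
}

static int   fix_quality;
static float latitude;

static const field_desc_t gga_fields[] = {
    { 43, get_int,   &fix_quality },  /* invented offset */
    { 17, get_float, &latitude    },  /* invented offset */
};

static void parse_gga(const char *pkt)
{
    for (size_t i = 0; i < sizeof gga_fields / sizeof gga_fields[0]; i++)
        gga_fields[i].extract(pkt + gga_fields[i].offset, gga_fields[i].dest);
}

The per-packet switch collapses into a data table plus one loop; as discussed further down the thread, though, the branching hasn't disappeared, it has only moved into the table.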
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15337
  • Country: fr
Re: Cyclomatic complexity
« Reply #7 on: July 21, 2023, 09:23:23 pm »
Apart from what has been said:

Note that a large number of lines in a function doesn't necessarily imply a large number of execution paths. It just makes it likely, of course, but you could have a very long function with just one execution path.

Generally speaking:
- More execution paths => harder to maintain, harder to test and harder to prove correct (if that's a goal);
- More LOCs (irrespective of # of execution paths) => harder to read, thus potentially harder to reason about (humans can only remember and reason about relatively short pieces of text).

Often, high cyclomatic complexity means that the code has not been factored enough (meaning, there's a lot of redundant code, even when it's not plain copy-and-paste, and overall not enough structure).

Now of course, if you go to the extreme of splitting your code into ultra-small functions, each function will be much easier to test/read/reason about, but you'll have shifted the problem to the interfaces between them, so all the effort will go into defining interfaces that make sense, which is also difficult.
 

Offline cfbsoftware

  • Regular Contributor
  • *
  • Posts: 122
  • Country: au
    • Astrobe: Oberon IDE for Cortex-M and FPGA Development
Re: Cyclomatic complexity
« Reply #8 on: July 22, 2023, 12:12:43 am »
That 500-line function is a large switch statement, with each branch containing lots of weird code, picking the values out of GPS packets of different kinds.
'500 lines' and 'weird code' are immediate red flags to me. As a general rule of thumb I limit the size of a function so that I can see it all on the screen at once without having to scroll. In my case that's about 60 lines of code.

It sounds to me like the design of your program needs to be changed, not the way you have coded it. Before you do that, read the book Programming Pearls by Jon Bentley that I recommended recently in the discussion here: Familiar with programming, not prolific, any advice?

Chris Burrows
CFB Software
https://www.astrobe.com
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4231
  • Country: gb
Re: Cyclomatic complexity
« Reply #9 on: July 22, 2023, 01:16:14 am »
In my case that's about 60 lines of code.

my-c refuses to compile:
- a function with more than 50 lines of code (comments don't count)
- a function with more than 5..10 (1) linearly independent paths

this enforces low complexity, but doesn't help with interfaces, a problem that my-c tries to address with monads
(this part is research and development, entirely experimental)


(1) my-c uses the parameter "level";
"level1" imposes stricter restrictions than "level3".
With complexity = { 1 .. 10 } you should get highly testable, structured and well-written code.
« Last Edit: July 22, 2023, 09:14:10 am by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 
The following users thanked this post: cfbsoftware

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1349
  • Country: pl
Re: Cyclomatic complexity
« Reply #10 on: July 22, 2023, 02:09:36 am »
I can see a clever coder would have made it table-driven, with tables of offsets into the packets for different values (…)
Table-based state machines, some cases of LUTs and similar solutions may bring down the cyclomatic complexity value as calculated by an automated analyzer. But this is just “cheating” the metrics, not really improving them.

An analyzer is a naïve tool, which counts conditionals.(1) You can “hide” conditionals in data and thus make the tool “believe” there are fewer paths than there actually are. But I hope you see that this does not help in practice.
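
For instance (an illustrative sketch, not from the post): the two functions below behave identically, but a typical analyzer reports roughly CC 5 for the switch and only 2 or 3 for the table version, depending on how it counts the range check. The four-way choice has merely been hidden in data.

Code: [Select]
/* Switch version: four cases plus default, so roughly CC = 5. */
int f_switch(int sel)
{
    switch (sel)
    {
        case 0: return 3;
        case 1: return 1;
        case 2: return 4;
        case 3: return 15;
        default: return -1;
    }
}

/* Table version: the same five outcomes, but the analyzer only
 * sees the range check, so the reported CC drops to 2 or 3. */
int f_table(int sel)
{
    static const int table[] = { 3, 1, 4, 15 };
    return (sel >= 0 && sel < 4) ? table[sel] : -1;
}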


(1) In all the flavors used by the language. For C that includes switch and loops.
People imagine AI as T1000. What we got so far is glorified T9.
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8774
  • Country: fi
Re: Cyclomatic complexity
« Reply #11 on: July 22, 2023, 05:59:01 am »
If you have 1000 small functions, then the problem is just pushed elsewhere. Now you need to create a test case for every one of them, and the complexity reappears in the interactions between those functions, which you then again need to test with a larger integration test. As usual, there is some kind of golden middle way.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4120
  • Country: gb
  • Doing electronics since the 1960s...
Re: Cyclomatic complexity
« Reply #12 on: July 22, 2023, 11:36:40 am »
Quote
An analyzer is a naïve tool, which counts conditionals.(1) You can “hide” conditionals in data and thus make the tool “believe” there are fewer paths than there actually are. But I hope you see that this does not help in practice.

Exactly what I had in mind.

I don't see how to make this particular problem "simpler". I am presented with a range of packets which contain different data in different places. One has to code it "somehow".
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 
The following users thanked this post: Siwastaja

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8774
  • Country: fi
Re: Cyclomatic complexity
« Reply #13 on: July 22, 2023, 11:50:23 am »
Yes. There is nothing complex in a long switch case which makes a simple X->Y mapping based on a selection variable:
Code: [Select]
int f(int sel)
{
   switch(sel)
   {
      case 0: return 5;
      case 1: return 2;
      case 2: return 1;
      case 3: return 77;
      default: return -1;
   }
}

(Or the equivalent of calling a parser function.)

Yet according to cyclomatic complexity reference tables, if that grows to 50 elements then it's an "unmaintainable, untestable mess". If you then refactor it into a weird tree:

Code: [Select]
int f_1to10(int sel)
{
   switch(sel)
      // cases 1 to 10
}

int f_11to20(int sel)
{
   switch(sel)
      // cases 11 to 20
}
...

int f(int sel)
{
    if(sel < 11)
        return f_1to10(sel);
    else if( /* fuck that, you got the point */
}

then it suddenly becomes Very Good Code because the Cyclomatic Complexity of each function stays below 10. What utter bullshit. I don't think this metric is useful at all.
 

Offline AndyBeez

  • Frequent Contributor
  • **
  • Posts: 856
  • Country: nu
Re: Cyclomatic complexity
« Reply #14 on: July 22, 2023, 05:57:23 pm »
Cyclomatic complexity? Please do not mention that term in front of I.T. and System Development managers, as they will all be using the tool to root out "bad coders". [ Just keep them foaming over the A.I. hysteria for now. ]

Efficient code is sometimes hard-to-read code, or near-impossible-to-read code for a newbie. I wonder what cyclomatic complexity rating an electronic circuit diagram would have. OMG, they're using MOSFETs! Simple to me means modular, functional and efficient. And if that means using clever stuff that's complicated, rather than hundreds of IF constructs, then so be it.



 

Offline cfbsoftware

  • Regular Contributor
  • *
  • Posts: 122
  • Country: au
    • Astrobe: Oberon IDE for Cortex-M and FPGA Development
Re: Cyclomatic complexity
« Reply #15 on: July 22, 2023, 10:11:22 pm »
I don't see how to make this particular problem "simpler". I am presented with a range of packets which contain different data in different places. One has to code it "somehow".
Without knowing the details of your data it is difficult to say. However, a general approach to a problem of this type might be:

1. Write a function that transforms the input into a number of data structures that are easier to process. This might be a distinct record for each type of packet, if they are all very different, or a smaller number of variant records (unions), if the packets contain a lot of common data types.
2. Write functions to extract values from each type of packet. GetDate, GetCoordinates etc.
3. Call the extraction routine with the parameters needed for your application.

If your data were in XML format, for example, then code would already exist for step 1 and the problem would become much simpler.
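
A minimal C sketch of steps 1 and 2 (packet types and field names invented for illustration, not from the actual project):

Code: [Select]
/* Step 1: one tagged variant record for the parsed packets (names invented). */
typedef enum { PKT_GGA, PKT_RMC } pkt_type_t;

typedef struct {
    pkt_type_t type;
    union {
        struct { int hh, mm, ss; float lat, lon; int fix;   } gga;
        struct { int dd, mo, yy; float lat, lon; float speed; } rmc;
    } u;
} packet_t;

/* Step 2: extraction functions; each is small and trivially testable. */
static void GetCoordinates(const packet_t *p, float *lat, float *lon)
{
    if (p->type == PKT_GGA) { *lat = p->u.gga.lat; *lon = p->u.gga.lon; }
    else                    { *lat = p->u.rmc.lat; *lon = p->u.rmc.lon; }
}

static void GetDate(const packet_t *p, int *dd, int *mo, int *yy)
{
    /* only RMC carries a date in this sketch */
    *dd = p->u.rmc.dd; *mo = p->u.rmc.mo; *yy = p->u.rmc.yy;
}

Step 3 then reduces to calling these accessors from the application; the raw-text parsing is confined to the single function that fills in packet_t.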
Chris Burrows
CFB Software
https://www.astrobe.com
 
The following users thanked this post: DiTBho

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3898
  • Country: us
Re: Cyclomatic complexity
« Reply #16 on: July 22, 2023, 11:57:41 pm »
Nobody is seriously arguing that reducing code complexity in the abstract is bad.  The argument is whether cyclomatic complexity is good for anything other than putting a veneer of graph theory respectability on what's at best a very rough heuristic.

I think it's a pretty weak measure. For functions I think just dumb "lines of code" and "maximum nesting depth" are simpler, more obvious, and more useful. Of course, none of them is useful without context. A giant switch statement in a parser is a great example of a construct that is often better, more testable and more maintainable than any refactoring that "improves" your code metrics.
« Last Edit: July 23, 2023, 04:04:36 am by ejeffrey »
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1349
  • Country: pl
Re: Cyclomatic complexity
« Reply #17 on: July 23, 2023, 05:07:51 am »
Cyclomatic complexity is a tool. As with any other tool, it may be inappropriate in some cases. This truism is neither an excuse for fallacies in reasoning about the tool, nor a permission to ignore what the tool indicates.

Cyclomatic complexity isn’t a proxy variable chosen merely because somebody noticed a correlation with the value of interest.(1) CC has been constructed as a mathematical formulation of the measured value itself. This gives it a strong theoretical foundation. As a result of deductive reasoning, it can only be challenged in a limited number of ways, even when the arguments make sense; much less with arguments of the form “there exist corner cases, therefore it is wrong” or “I constructed a synthetic and obviously inadequate case,(2) therefore it is wrong”.

It has shortcomings. There are corner cases where it performs poorly. But corner cases are rare and can’t be used to explain a large number of functions having high CC. As I mentioned earlier, automated evaluators may not be able to “understand” some patterns: for example, that multiple execution paths are equivalent from the program’s logic perspective. But these patterns are objectively recognizable. “I wrote this code” is not one of them. Neither is “this tool is stupid and can’t understand my genius”. CC assumes mutual independence of edges and it will overestimate if this condition is not met. But if you have a big function with many edges and they are highly dependent on each other, that is very often an indication of a larger problem.

Finally, cyclomatic complexity does not make statements about some arbitrarily chosen interpretation of code quality. It tells you something about testability and maintainability. Yes, highly optimized code segments may have high CC. But guess what: they are also a testability and maintainability nightmare. They may be a necessary evil, but they remain evil.

Code metrics are misused by management. This alone does not make them invalid. If that were the case, we would need to reject all maths beyond basic arithmetic. “My boss uses a screwdriver to hit nails, so I will not use it for screws” seems to me like a poor choice. I think it is smarter to understand the tool and make the best use of it.

Even if this means accepting, that our code is not as amazing as we might have thought it is.


(1) In opposition to — for example — LoC.
(2) Expressing data as code.

« Last Edit: July 23, 2023, 05:16:04 am by golden_labels »
People imagine AI as T1000. What we got so far is glorified T9.
 
The following users thanked this post: newbrain, DiTBho

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4120
  • Country: gb
  • Doing electronics since the 1960s...
Re: Cyclomatic complexity
« Reply #18 on: July 23, 2023, 06:47:00 am »
Quote
1. Write a function that transforms the input into a number of data structures that are easier to process. This might be a distinct record for each type of packet if they were all very different or a smaller number of variant records (unions) if the packets contain a lot of common data types.
2. Write functions to extract values from each type of packet. GetDate, GetCoordinates etc.
3. Call the extraction routine with the parameters needed for your application.

Only an expert programmer is going to understand that sort of code :) But yeah, many people love structures of structures and typedefs and structures of typedef structures.

Especially if one does what real experts do: put in almost no comments ;)
« Last Edit: July 23, 2023, 08:38:43 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3898
  • Country: us
Re: Cyclomatic complexity
« Reply #19 on: July 23, 2023, 03:42:10 pm »
Cyclomatic complexity isn’t a proxy variable chosen merely because somebody noticed a correlation with the value of interest.(1) CC has been constructed as a mathematical formulation of the measured value itself. This gives it a strong theoretical foundation.

This is exactly my problem with CC: this statement is wrong. It has no more theoretical foundation than lines of code, and it doesn't measure the property of interest any more directly. But because it's fancy graph-theory math, it makes people think it's somehow special.

Nobody has that sort of misconception about lines of code.
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1349
  • Country: pl
Re: Cyclomatic complexity
« Reply #20 on: July 23, 2023, 04:59:31 pm »
Cyclomatic complexity is based on fancy graph theory?

I am not sure where this came from. If it is because I wrote “strong theoretical foundation”, that is not what I meant. Yes, McCabe uses the Betti number as the value. But calculating the circuit rank is not the goal of the paper. The goal is to give an estimator for, putting it in his own words, the “intuitive notion of ‘minimum number of paths’”, which is a quasi-formal way of writing “number of test cases required”.

I do not know what kind of grime cyclomatic complexity has accumulated in the past 40 years. Given how inventive the “we will sell you the next great management methodology” people are, I assume you might have heard some wild promises about CC. But here, in this thread, we talked about how high CC numbers indicate problems with, specifically, code testability and maintainability.

In case you meant the later sections, where McCabe applies the measure, these are: a discussion of a particular calculation algorithm (not the measure itself), splitting a program into subgraphs (irrelevant, as we are already talking about a program decomposed into functions), and applications to unstructured programming (again, not discussed in this thread). In case you mean the general “mathsy” language: not only is this the language expected by the people to whom the paper was addressed, but in the 70s nobody would have treated McCabe’s publication seriously if it had used different wording.
People imagine AI as T1000. What we got so far is glorified T9.
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3898
  • Country: us
Re: Cyclomatic complexity
« Reply #21 on: July 23, 2023, 05:44:13 pm »
But corner cases are rare and can’t be used to explain a large number of functions having high CC

CC gives a fairly precise indication about the number of test-cases for white-box branch coverage.

That is true, but I'm not sure why it's so important.  Either you have test coverage or you don't.  If you have coverage, you don't need CC to tell you how many test cases you need.  Refactoring may or may not reduce the number of total test cases, even when it improves understandability or maintainability.

I'm much more concerned about how legitimately hard a function is to understand than about the number of test cases, and while CC purports to provide an objective measure of that, I think it is fairly weak.  Not zero value, but its value is perhaps eclipsed by its propensity to be treated as more rigorous than it is.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4231
  • Country: gb
Re: Cyclomatic complexity
« Reply #22 on: July 23, 2023, 06:54:13 pm »
Either you have test coverage or you don't.  If you have coverage, you don't need CC to tell you how many test cases you need.

Between the draft and the first engineering version you have no plan for any test cases.
During these steps, CC gives you an idea of the effort required to prepare test cases for branch coverage, which is just one of the analyses to do.

If it doesn't sound useful, I don't think you need it.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3898
  • Country: us
Re: Cyclomatic complexity
« Reply #23 on: July 23, 2023, 07:12:23 pm »
I do not know, what kind of grime cyclomatic complexity has accumulated in the past 40 years. Given how inventive “we will sell you the next great management methodology” people are, I assume you might have heard some wild promises about CC.

Frankly, in working with professional software engineers and engineering managers, I basically never hear about it.  I don't think most of the software engineering world treats it as having much value at all.  Certainly many people know what it is, and it is a feature a lot of code-analysis tools include; it's just not something people pay much attention to.  It's like something you learned about once by reading the sidebar of a CS textbook.  I'm sure there are domains where it is more widely used, but in terms of the code-health metrics people actually use, I think it's pretty far down the list.

Quote
But here, in this thread, we talked about how high CC numbers indicate problems with — specifically — code testability and maintainability.

I'm only responding to what was said in this thread.  As I said, in my professional career nobody talks about it much. What I am saying is that I don't think looking at CC provides much value in most cases.  It's not that there is no correlation, just that there are much more important metrics, testing methodology has changed dramatically, and the added value of CC is relatively small.
 

Offline cfbsoftware

  • Regular Contributor
  • *
  • Posts: 122
  • Country: au
    • Astrobe: Oberon IDE for Cortex-M and FPGA Development
Re: Cyclomatic complexity
« Reply #24 on: July 23, 2023, 09:46:28 pm »
Quote
1. Write a function that transforms the input into a number of data structures that are easier to process.
Only an expert programmer is going to understand that sort of code :)
Far from it. Here's an exercise for you to do which illustrates how transforming data makes it easier to work with. You are old enough (but not too old, I hope ;)) to remember dealing with pounds, shillings and pence (LSD). Write a handful of functions to be used in an application that has numerous instances of:

1. Adding two LSD values together
2. Multiplying an LSD value by an integer
3. Dividing an LSD value by an integer (with a remainder if necessary)
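
One possible solution sketch (not from the post), showing the transform-first idea: convert each LSD value to plain pence, do ordinary integer arithmetic, and convert back.

Code: [Select]
/* Pre-decimal UK money: 1 pound = 20 shillings, 1 shilling = 12 pence. */
typedef struct { int pounds, shillings, pence; } LSD;

static int to_pence(LSD x)
{
    return (x.pounds * 20 + x.shillings) * 12 + x.pence;
}

static LSD from_pence(int p)
{
    LSD r;
    r.pence     = p % 12;  p /= 12;
    r.shillings = p % 20;  p /= 20;
    r.pounds    = p;
    return r;
}

static LSD lsd_add(LSD a, LSD b)         { return from_pence(to_pence(a) + to_pence(b)); }
static LSD lsd_mul(LSD a, int n)         { return from_pence(to_pence(a) * n); }
static LSD lsd_div(LSD a, int n, int *r) { int p = to_pence(a); *r = p % n; return from_pence(p / n); }

Once the representation is a single integer, the three required operations stop being special cases; the only LSD-specific code left is the pair of conversion functions.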

Chris Burrows
CFB Software
https://www.astrobe.com
 

