Groovy code coverage issues
Lately, while working on a microservice in Groovy, I discovered interesting issues with measuring code coverage for Groovy projects. If you struggle with suspiciously low coverage yourself and don’t know what’s happening, read on.
After loading the Cobertura report for my rather thoroughly tested code into Sonar, I saw surprising numbers: line coverage 95%, branch coverage 43%. Such low branch coverage? Something must be wrong…
Missed branch coverage
For the record, the project is in Groovy 2.3.2, without indy, on Java 8 and analyzed with Cobertura 2.0.3. Have a look at some of the examples of branches not covered as shown in Sonar. Line coverage in these files is reported as 100%.
Example 1:
Example 2:
How come calling super in the constructor of a RuntimeException subclass counts as 6 branches? And a simple if has as many as 10 branches? No wonder the reported branch coverage is so low! But where do these extra branches come from? Well, what you see in the source code is not what you get in the bytecode.
Dive into the bytecode
Let’s have a look at the first case, the call to super:
59: ldc #4 // class java/lang/RuntimeException
61: invokestatic #45 // Method org/codehaus/groovy/runtime/ScriptBytecodeAdapter.selectConstructorAndTransformArguments:([Ljava/lang/Object;ILjava/lang/Class;)I
64: aload_0
65: swap
66: lookupswitch { // 5
    -2020310112: 116
    -1428966913: 137
    -947674026: 156
    -255735978: 201
    39797: 232
    default: 241
}
Calling super compiles into a call to ScriptBytecodeAdapter.selectConstructorAndTransformArguments from the Groovy runtime package, which selects the matching constructor among the 5 constructors of the RuntimeException class. Together with the default case, that makes 6 branches in the bytecode, of which only one is covered by the tests.
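To make the count of 5 concrete, here is a minimal sketch. The exception class is a hypothetical stand-in (its name is not taken from my project); the reflection call simply lists what RuntimeException declares on Java 7 and later, including the protected four-argument constructor.

```groovy
// Hypothetical exception class, similar in shape to the one discussed above.
class OrderNotFoundException extends RuntimeException {
    OrderNotFoundException(String message) {
        // In non-indy Groovy 2.x this call is compiled into a runtime
        // constructor selection followed by a switch over the candidates.
        super(message)
    }
}

// List every constructor RuntimeException declares, public or not.
def constructors = RuntimeException.class.declaredConstructors
constructors.each { println it }

// One switch case per constructor; the default case adds the sixth branch.
assert constructors.length == 5

// A test like this exercises only the (String) case of that switch.
assert new OrderNotFoundException('no such order').message == 'no such order'
```

A test suite that only ever constructs the exception with a message can therefore reach just one of the six branches, no matter how thorough it is.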
Now, the bytecode for the second example is quite lengthy, so I decided not to include it here. In short, that simple if statement with a throw compiles into around 80 bytecode instructions, among which you can find 5 branch instructions: ifeq and ifne. Each conditional jump has two outcomes, which gives the 10 reported branches. You would expect only a single branch instruction, but Groovy complicates the compiled code by including some performance optimizations. Unfortunately, some of the resulting optimization branches were not reached from the unit tests.
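For illustration only, here is the kind of code this is about. The class below is hypothetical and is not the code from my project. The comments describe, as far as I understand the Groovy 2.x compiler, why a single source-level condition can end up as several conditional jumps.

```groovy
// Hypothetical class with a simple guard clause: one if, one throw.
class Account {
    int balance

    void withdraw(int amount) {
        // Source level: one condition, two outcomes.
        // With optimizations enabled, the compiler typically emits this
        // comparison more than once: in a fast path using primitive int
        // operations and in a fallback path using dynamic dispatch, plus
        // guard checks that decide which path runs. Each of those checks
        // is a conditional jump that a coverage tool counts as two branches.
        if (amount > balance) {
            throw new IllegalArgumentException('Insufficient funds')
        }
        balance -= amount
    }
}

// Both source-level outcomes are exercised here, yet a bytecode-based
// tool may still report branches of the unused path as missed.
def account = new Account(balance: 10)
account.withdraw(3)
assert account.balance == 7

try {
    account.withdraw(100)
    assert false : 'expected an exception'
} catch (IllegalArgumentException expected) {
    assert expected.message == 'Insufficient funds'
}
```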
Partial solution: disable Groovy optimizations
While we cannot do anything about the first case, we can control the optimizations performed by Groovy. It is possible to compile the sources with all optimizations disabled. This way the bytecode for the second case is reduced to 40 instructions with only a single branch instruction. As a result, only 2 branches are reported by Cobertura, both of them covered.
You probably don’t want to disable the optimizations in your production build, just in your separate Sonar build. If you use Gradle, you can achieve this, for example, like this:
gradle.taskGraph.whenReady { graph ->
    if (graph.hasTask(':cobertura')) {
        compileGroovy.groovyOptions.optimizationOptions.all = false
    }
}
After disabling the optimizations, the number of false negatives in the Cobertura report decreases. In my project this led to an increase of over 5% in the reported branch coverage. Not much, but closer to the truth.
Enabling invokedynamic support increased the branch coverage by 3%, with or without the optimizations. I haven’t analyzed this deeply though.
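If you want to try the invokedynamic variant yourself, a build script fragment along these lines should do it. Treat it as a sketch under two assumptions: that your Gradle version passes the indy key of optimizationOptions through to the Groovy compiler, and that the indy flavour of the Groovy jar is on the classpath (for Groovy 2.x it is published with the indy classifier).

```groovy
// build.gradle fragment (sketch, not verified against every Gradle version)

// Assumption: Groovy 2.x needs the indy variant of the runtime jar, e.g.
//   compile 'org.codehaus.groovy:groovy-all:2.3.2:indy'

gradle.taskGraph.whenReady { graph ->
    // Same trick as above: only touch the compiler settings
    // when the coverage task is part of the build.
    if (graph.hasTask(':cobertura')) {
        compileGroovy.groovyOptions.optimizationOptions.indy = true
    }
}
```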
Trying JaCoCo
JaCoCo is the other of the two main open-source coverage tools. I ran its latest version, 0.7.1.201405082137, on my project, and the reported coverage was even further off: line coverage 33%, branch coverage 9%. Disabling the optimizations, however, increased the line coverage to 48% and the branch coverage to 12%. The difference from the Cobertura results comes from JaCoCo detecting Groovy-generated code (e.g. from the @Immutable transformation) and reporting it as not covered, which Cobertura didn’t do.
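To get a feel for how much code a transformation adds, you can list the methods of an annotated class. The Money class below is a hypothetical example; the exact set of generated members depends on the Groovy version, so the sketch only checks for the three methods @Immutable is documented to provide.

```groovy
import groovy.transform.Immutable

// Hypothetical value class: two lines of source, many generated members.
@Immutable
class Money {
    BigDecimal amount
    String currency
}

// Print everything the compiled class declares, written by hand or not.
def methodNames = Money.class.declaredMethods*.name as Set
methodNames.sort().each { println it }

// None of these appear in the source, but all of them exist in the
// bytecode, where a bytecode-based coverage tool can see them.
assert methodNames.containsAll(['equals', 'hashCode', 'toString'])

// Unless tests happen to compare, hash or print the objects,
// the generated methods stay unexecuted.
assert new Money(amount: 1.5, currency: 'EUR') == new Money(1.5, 'EUR')
```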
Conclusions
There’s not much you can do about incorrect code coverage reported by these open-source tools. They are written with Java in mind, and support for other languages is still somewhere on their roadmap. Actually, I haven’t tried another popular tool: Clover. It’s not free, but perhaps it supports Groovy better.
On the bright side, I read that Peter Niederwieser, an active member of the Groovy community, said the situation might improve with Groovy 3.0, which will be designed around Java’s invokedynamic, meaning that less “magic” bytecode will have to be generated. And that gives us hope for the future.