How much code coverage is enough?

Code coverage is an important benchmark for the quality of software tests. But how much coverage really makes sense for your project?


Code coverage, or test coverage, measures how much of your code is covered by tests - but what does high coverage actually say about the quality of your software? High test coverage can reduce the defect rate, but is 100% coverage always useful? This article looks at how code coverage works, what tools are available and why a well thought-out test strategy is often more important than achieving a certain percentage. Find out which coverage targets make sense and how to find a balance between effort and code quality.

What is code coverage?

In software development, metrics are quantitative measures that are used to evaluate the quality and efficiency of code. Code coverage is an important metric in this context, measuring how much of your code is covered by tests.

Typically, code coverage is defined as the percentage of lines executed by a test or a collection of tests (line coverage). There are also other, sometimes less common types, such as branch coverage, function coverage, statement coverage or condition coverage (also called predicate coverage). Each metric is the ratio of tested elements (lines, statements, etc.) to the total number of elements. For example, if 96 out of 120 lines of code are covered by unit tests, this results in a line coverage of 80%.
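
The calculation itself is simple. The following is a minimal sketch of it; the function name line_coverage_percent is made up for illustration and is not part of any coverage tool:

```rust
/// Line coverage in percent: covered lines divided by all measurable lines.
/// Returns `None` for inputs that make no sense (no lines at all, or more
/// covered lines than lines in total) instead of dividing by zero.
/// The function name is hypothetical and only used for this illustration.
fn line_coverage_percent(covered: u32, total: u32) -> Option<f64> {
    if total == 0 || covered > total {
        return None;
    }
    Some(100.0 * f64::from(covered) / f64::from(total))
}
```

Called with the numbers from the example, line_coverage_percent(96, 120) yields Some(80.0).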

Measuring code coverage is part of software quality assurance. High test coverage means that a large part of the code has been executed by tests, which tends to reduce the likelihood of undetected errors. Analysing the test coverage shows which parts of the code have not been tested, so that tests can be extended specifically to increase coverage. In CI/CD pipelines, code coverage can be used as a metric to ensure that the code is continuously well tested. For example, merge requests can be rejected if they would reduce the code coverage of the overall project.
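
The merge request rule can be pictured as a simple comparison. This is only a sketch of the idea, not how any CI system implements it; the names GateResult and coverage_gate and the tolerance parameter are assumptions made for this example:

```rust
/// Outcome of comparing a merge request's coverage with the target branch.
/// All names in this sketch are hypothetical.
#[derive(Debug, PartialEq)]
enum GateResult {
    Pass,
    /// Coverage dropped by `drop` percentage points, more than tolerated.
    Fail { drop: f64 },
}

/// Fails when coverage falls by more than `tolerance` percentage points.
fn coverage_gate(baseline: f64, candidate: f64, tolerance: f64) -> GateResult {
    let drop = baseline - candidate;
    if drop > tolerance {
        GateResult::Fail { drop }
    } else {
        GateResult::Pass
    }
}
```

A small tolerance avoids rejecting merge requests over insignificant fluctuations; a tolerance of 0.0 rejects every decrease.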

How do you collect code coverage?

Depending on the programming language and the tools used, there are various ways to collect code coverage. As a rule, a special tool or library is used that monitors the execution of the code during the tests and counts the covered branches, statements or lines. Some common tools for different programming languages are:

Java: JaCoCo, Cobertura
Python: coverage.py, pytest-cov
JavaScript: Istanbul, Jest
Rust: Tarpaulin, grcov
C++: gcov, lcov
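
Conceptually, such a tool places probes in the code and records which of them are reached while the tests run. The following sketch imitates this by hand to make the idea tangible. Real tools insert the probes automatically and work quite differently in detail; LineRecorder, hit and the line numbers are invented for this illustration:

```rust
use std::collections::BTreeSet;

/// Hypothetical recorder that tracks which probe lines were reached.
struct LineRecorder {
    total_lines: BTreeSet<u32>,
    hit_lines: BTreeSet<u32>,
}

impl LineRecorder {
    /// Registers all measurable lines up front.
    fn new(lines: impl IntoIterator<Item = u32>) -> Self {
        Self {
            total_lines: lines.into_iter().collect(),
            hit_lines: BTreeSet::new(),
        }
    }

    /// Probe: marks a registered line as executed.
    fn hit(&mut self, line: u32) {
        if self.total_lines.contains(&line) {
            self.hit_lines.insert(line);
        }
    }

    /// Lines that no test has executed so far, in ascending order.
    fn uncovered(&self) -> Vec<u32> {
        self.total_lines.difference(&self.hit_lines).copied().collect()
    }
}

/// Function under test with manually placed probes.
fn sign(x: i32, rec: &mut LineRecorder) -> &'static str {
    rec.hit(1);
    if x < 0 {
        rec.hit(2);
        return "negative";
    }
    rec.hit(3);
    "non-negative"
}
```

If a test only calls sign with a positive number, probe 2 is never reached and shows up in uncovered(), which is the kind of information a coverage report provides.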

What should my code coverage be?

The ideal code coverage (here line coverage) is almost a philosophical question and can vary from project to project. It depends on several factors, including the nature of the project, the risks associated with defects, and the practices of the development team. However, there are some general guidelines and considerations that can be helpful in setting a target value for code coverage.

Cost-benefit ratio: Achieving the last few percentage points of code coverage usually requires a disproportionate amount of time and resources, while the additional benefit decreases. It should therefore be weighed up whether the additional effort is economically justified.
Code complexity: Complex code should tend to have higher coverage as it is more prone to errors. Simple, well-structured sections of code may require less extensive testing.
Risk analysis: Areas of code that pose a higher risk (e.g. safety-critical functions, business logic, error handling routines) should have higher coverage than less critical parts.

These two guidelines are widely used:

At least 80%: A frequently cited guideline is that the code coverage should be at least 80%. This value is considered a good compromise between the effort required to write tests and the certainty that the code is well tested.
100% coverage: In safety-critical or high-reliability systems, such as aerospace, medical technology or financial applications, 100% test coverage may be required. This ensures that every line of code has been executed by at least one test and reduces the risk of undetected errors, although it does not prove that the code behaves correctly.

Average code coverage in software projects can vary, but studies offer some insights. For example, an analysis of 1,270 open source projects using TravisCI in 2019 found that the average code coverage was 78%, with Ruby projects having higher coverage (86%) than Java projects (63%). Another study of 100 large open source Java projects did not give an average value, but showed that code coverage does not correlate significantly with the number of bugs occurring after release.

High code coverage is generally worth aiming for, but as these findings suggest, it can never be the sole criterion for the quality of the tests. The depth and quality of the tests matter as well: tests should not only achieve high coverage, but also cover meaningful and relevant scenarios. In particular, testing edge cases is crucial to ensure the robustness and reliability of an application. In practice, a pragmatic approach with a target of around 80-90% and higher coverage in critical areas can be a good balance.
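
A differentiated target of this kind could be checked as in the following sketch. The struct, the function and the concrete thresholds (80% in general, 95% for critical areas) are assumptions chosen for illustration, not recommendations taken from a tool or standard:

```rust
/// Coverage measured for one part of the code base.
/// Struct and field names are hypothetical.
struct AreaCoverage {
    name: &'static str,
    percent: f64,
    critical: bool,
}

/// Assumed targets for this sketch: 80% in general, 95% for critical areas.
const DEFAULT_TARGET: f64 = 80.0;
const CRITICAL_TARGET: f64 = 95.0;

/// Returns the names of all areas whose coverage is below their target.
fn areas_below_target(areas: &[AreaCoverage]) -> Vec<&'static str> {
    areas
        .iter()
        .filter(|a| {
            let target = if a.critical { CRITICAL_TARGET } else { DEFAULT_TARGET };
            a.percent < target
        })
        .map(|a| a.name)
        .collect()
}
```

With such a rule, a critical area with 91% coverage would be flagged, while a non-critical one with 82% would pass.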

High coverage does not provide any information about test quality

High coverage can be deceptive if important boundary conditions or error cases are not tested. A simple example of this is the division of two numbers. Consider the following example in the programming language Rust:

// src/lib.rs
fn divide(a: i32, b: i32) -> i32 {
    a / b
}

#[test]
fn test_divide() {
    assert_eq!(divide(42, 2), 21);
}

The test_divide function achieves 100% coverage with Tarpaulin, as all lines of code are executed:

cargo tarpaulin
...
INFO cargo_tarpaulin::report: Coverage Results:
|| Uncovered Lines:
|| Tested/Total Lines:
|| src/lib.rs: 2/2
|| 
100.00% coverage, 2/2 lines covered, +0.00% change in coverage

Obviously, however, this test alone is insufficient, as it does not cover the case of division by zero.

fn main() {
    let result = divide(42, 0);
    println!("Result is: {result}");
}

If this function is called with b = 0, the program panics at runtime and terminates:

cargo run --release
...
thread 'main' panicked at src/lib.rs:2:5:
attempt to divide by zero
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
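
One possible way to close this gap is to make the failure case explicit in the return type and to test it. The following is a sketch, not the only sensible design; the name safe_divide is made up, while checked_div is the standard library method on integer types:

```rust
/// Division that reports the failure case instead of panicking.
/// `checked_div` returns `None` for a zero divisor and for the
/// overflow case `i32::MIN / -1`. The function name is hypothetical.
fn safe_divide(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn divides_regular_values() {
        assert_eq!(safe_divide(42, 2), Some(21));
    }

    #[test]
    fn rejects_zero_divisor() {
        assert_eq!(safe_divide(42, 0), None);
    }

    #[test]
    fn rejects_overflow() {
        assert_eq!(safe_divide(i32::MIN, -1), None);
    }
}
```

The first test alone would already execute every line of safe_divide. The two additional tests add no line coverage, yet they are the ones that check the edge cases.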

In addition to such boundary conditions, side effects such as writing to a database or sending emails are also important aspects that are not covered by code coverage alone. Accordingly, code coverage should always be considered in combination with other quality metrics and a sensible test strategy.

Integration in GitLab CI/CD

A common use case for code coverage is integration into CI/CD pipelines. Here is an example of a GitLab CI/CD configuration that uses Tarpaulin to measure the coverage of a Rust project:

# .gitlab-ci.yml
test:
  image: xd009642/tarpaulin
  script:
    - cargo tarpaulin --all-features --out xml
  coverage: /\d+\.\d+% coverage/
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: cobertura.xml

Further examples of integration in GitHub Actions and CircleCI can be found in the Tarpaulin documentation.

The integration with GitLab also allows direct visualisation of the code coverage in the merge request view, so developers and reviewers can see at a glance whether a merge request affects coverage. Alternatively, tools such as Tarpaulin can generate HTML reports for viewing coverage locally.

Conclusion

Code coverage is a valuable tool, but it does not guarantee bug-free software. High coverage can help you cover many code paths, but it should not be your only goal. Instead, develop a targeted, risk-based testing strategy that considers critical scenarios and edge cases. With a realistic coverage target of around 80-90% and particularly high coverage in security-relevant areas, you can achieve the optimum balance between test depth and efficiency - and thus ensure the quality and reliability of your software.
