The DO-178C assurance gap for machine learning in airborne systems

Certification Methods • 2026-05-29 • 9 min

DO-178C was designed for deterministic software with traceable requirements and structural coverage analysis. Machine learning introduces learned representations, probabilistic outputs, and data-dependent failure modes that these verification methods cannot address directly. RTCA SC-240 and EUROCAE WG-114 are developing guidance, but no settled certification method exists yet for ML in airborne systems. The gap widens when a certified model is updated, compressed, or swapped after approval, which is the part of the lifecycle Fionn Labs focuses on.

Why DO-178C cannot assure ML directly

DO-178C assumes that software behavior can be traced from requirements through design to executable code, and that structural coverage analysis can confirm that testing has exercised the implementation adequately. For deterministic software, this model works: every code path exists because a requirement demanded it, and coverage metrics confirm that testing reached it.

Machine learning systems violate both assumptions. A neural network's behavior is not derived from a requirements decomposition; it is learned from training data through optimization. There is no traceability chain from a safety requirement to a specific weight matrix. Structural coverage of the source code (the training loop and the inference pipeline) does not measure whether the learned function behaves correctly across its operational input space, because the same code paths execute whatever values the weights hold.
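The coverage point can be made concrete with a toy example. The sketch below is illustrative only: `infer` is a hypothetical one-neuron classifier, the two parameter sets are invented, and the "correct" behavior (flag when the two inputs sum to more than zero) is an assumption chosen for the demonstration. Two test inputs reach every statement and both branch outcomes for either parameter set, yet only one parameter set behaves correctly.

```python
from itertools import product

def infer(weights, bias, x):
    """Hypothetical one-neuron classifier: True when the input is flagged."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    if activation > 0.0:
        return True
    return False

# Two inputs are enough to execute every statement and both branch outcomes.
COVERAGE_TESTS = [(1.0, 0.0), (-1.0, 0.0)]

def branches_hit(weights, bias):
    """Set of branch outcomes reached by the coverage tests."""
    return {infer(weights, bias, x) for x in COVERAGE_TESTS}

def error_rate(weights, bias):
    """Fraction of grid points where the output disagrees with the assumed
    intended behavior: flag when x0 + x1 > 0."""
    grid = list(product([-2.0, -1.0, 0.0, 1.0, 2.0], repeat=2))
    wrong = sum(infer(weights, bias, x) != (x[0] + x[1] > 0.0) for x in grid)
    return wrong / len(grid)

# Same code, different learned parameters (both sets are invented).
GOOD = ([1.0, 1.0], 0.0)
BAD = ([1.0, -5.0], 0.0)

if __name__ == "__main__":
    for name, (w, b) in (("good", GOOD), ("bad", BAD)):
        print(name, sorted(branches_hit(w, b)), f"error rate {error_rate(w, b):.2f}")
```

Both parameter sets produce identical coverage results, so the coverage report alone cannot tell them apart. Any evidence that distinguishes them has to come from evaluating the learned function over its input space.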

The practical consequence is that DO-178C's core verification apparatus (requirements-based testing, structural coverage analysis, and traceability matrices) cannot be applied to the learned components of an ML system without significant methodological extension.

Where the standards community is working

RTCA SC-240 and EUROCAE WG-114 are jointly developing guidance (anticipated as a supplement or new document in the DO-178 family) for ML/AI in airborne systems. Under its AI Roadmap 2.0, EASA has published concept papers on learning assurance and AI trustworthiness. These propose a W-shaped development lifecycle that separates data management, model training, and model verification into distinct assurance activities. SAE G-34 is working on ARP6983, which addresses ML considerations in aerospace applications.

The working positions across RTCA SC-240, EUROCAE WG-114, SAE G-34, and EASA's AI Roadmap converge on a common set of new evidence types beyond what DO-178C defines: dataset sufficiency evidence, training process validation, operational domain coverage analysis, and runtime monitoring for out-of-distribution detection. These do not replace DO-178C. They extend the assurance framework for components whose behavior is learned rather than specified.
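Of these evidence types, operational domain coverage analysis is perhaps the easiest to picture in code. The following is a minimal sketch, not a method taken from any of the draft standards: the two ODD axes (`altitude_ft`, `visibility_km`), their bin edges, and the `min_count` threshold are all hypothetical. It partitions the domain into cells and reports the cells where the dataset holds fewer samples than the threshold.

```python
from collections import Counter
from itertools import product

# Hypothetical ODD axes and bin edges, chosen only for illustration.
ODD_BINS = {
    "altitude_ft": [0, 1000, 5000, 10000],
    "visibility_km": [0, 2, 5, 10],
}

def bin_index(edges, value):
    """Index of the half-open bin [edges[i], edges[i+1]) holding value, else None."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None

def coverage_gaps(samples, min_count):
    """Return the ODD cells that contain fewer than min_count samples."""
    counts = Counter()
    for sample in samples:
        cell = tuple(bin_index(edges, sample[name]) for name, edges in ODD_BINS.items())
        if None not in cell:  # samples outside the ODD are not counted
            counts[cell] += 1
    all_cells = product(*(range(len(edges) - 1) for edges in ODD_BINS.values()))
    return sorted(cell for cell in all_cells if counts[cell] < min_count)

if __name__ == "__main__":
    data = [
        {"altitude_ft": 500, "visibility_km": 1},
        {"altitude_ft": 500, "visibility_km": 8},
        {"altitude_ft": 7000, "visibility_km": 3},
    ]
    print(coverage_gaps(data, min_count=1))
```

A real analysis would need a justified choice of axes, bin widths, and sufficiency thresholds. A per-cell count is only the simplest possible proxy for representativeness.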

What programs should do now

Programs integrating ML into safety-critical or mission-critical aerospace systems should not wait for final standards. The assurance gap is real and the regulatory timeline is uncertain. Programs should build assurance evidence that anticipates the emerging framework while staying compatible with current DO-178C and ARP4754A processes.

• Document the operational design domain (ODD) for every ML component, specifying input conditions, environmental assumptions, and performance boundaries the model is expected to operate within.
• Build dataset assurance records that trace training and validation data to operational requirements, including data provenance, labeling confidence, and representativeness analysis.
• Implement runtime monitoring for ML inference: confidence calibration, input distribution monitoring, and graceful degradation triggers that activate deterministic fallback behavior.
• Maintain a traceability structure that connects safety requirements to ML component specifications, even where the implementation is learned rather than designed.
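The ODD and runtime monitoring items above can be sketched together as a guard around inference. This is an assumption-laden illustration, not a certified pattern: the input names, limits, and `MIN_CONFIDENCE` threshold are hypothetical, `model` is assumed to return an `(output, confidence)` pair with a calibrated confidence, and `fallback` stands in for whatever deterministic behavior the system defines.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OddLimits:
    low: float
    high: float

# Hypothetical ODD limits and confidence threshold, for illustration only.
ODD = {
    "airspeed_kt": OddLimits(60.0, 250.0),
    "pitch_deg": OddLimits(-15.0, 20.0),
}
MIN_CONFIDENCE = 0.9

def monitored_inference(inputs, model, fallback):
    """Run the model only inside the ODD and above the confidence threshold.

    Returns (output, source, reasons), where source is "model" or "fallback"
    and reasons records why the fallback was used, as loggable evidence.
    """
    reasons = []
    for name, limits in ODD.items():
        value = inputs.get(name)
        # A missing or NaN value fails the range check and triggers the fallback.
        if value is None or not (limits.low <= value <= limits.high):
            reasons.append(f"{name} missing or outside ODD")
    if reasons:
        return fallback(inputs), "fallback", reasons

    output, confidence = model(inputs)
    if confidence < MIN_CONFIDENCE:
        return fallback(inputs), "fallback", [f"confidence {confidence:.2f} below threshold"]
    return output, "model", []

if __name__ == "__main__":
    def stub_model(inputs):      # placeholder for a trained model
        return "climb", 0.95

    def stub_fallback(inputs):   # placeholder deterministic behavior
        return "hold"

    print(monitored_inference({"airspeed_kt": 120.0, "pitch_deg": 5.0}, stub_model, stub_fallback))
    print(monitored_inference({"airspeed_kt": 300.0, "pitch_deg": 5.0}, stub_model, stub_fallback))
```

Returning the reasons alongside the output gives every fallback activation a recorded cause, which is the kind of trace an assurance argument can later draw on. Simple range checks catch only the crudest out-of-distribution inputs, so a production monitor would need stronger distribution tests.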

Action checklist

• Audit current ML integration plans against DO-178C objectives to identify where traditional verification methods break down.
• Establish dataset management and provenance practices before RTCA SC-240 guidance is finalized.
• Design runtime monitoring architectures that can produce certification-compatible evidence of ML behavior in operation.
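As one possible starting point for the dataset provenance item, the sketch below builds a hashed manifest using only the Python standard library. The record fields (`source`, `label_confidence`, `requirement_ids`) and the requirement ID format are hypothetical. They echo the dataset assurance attributes discussed earlier and are not a schema from any standard.

```python
import hashlib
import json

def record_entry(name, content, source, label_confidence, requirement_ids):
    """Build one provenance record for a data item (content is bytes)."""
    return {
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "source": source,
        "label_confidence": label_confidence,
        "requirement_ids": sorted(requirement_ids),
    }

def manifest_digest(entries):
    """Digest over the whole manifest, independent of entry order."""
    canonical = json.dumps(
        sorted(entries, key=lambda e: e["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def untraced(entries):
    """Names of data items not linked to any operational requirement."""
    return [e["name"] for e in entries if not e["requirement_ids"]]

if __name__ == "__main__":
    entries = [
        record_entry("frame_0001.png", b"example bytes", "flight test campaign", 0.98, ["REQ-ODD-012"]),
        record_entry("frame_0002.png", b"other bytes", "simulator", 0.80, []),
    ]
    print(manifest_digest(entries))
    print(untraced(entries))
```

Storing the manifest digest alongside a trained model makes it possible to show later which exact dataset produced that model. The `untraced` check flags data that has no stated link to an operational requirement.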