The frozen-model assumption: what ED-324 / ARP6983 does not yet cover

Certification Methods • 2026-06-24 • 10 min


Two standards documents arrived within weeks of each other in mid-2026, and the space between them is the problem this lab works on. EASA released the proposed final issue of its AI Concept Paper, extending learning assurance toward reinforcement learning and higher autonomy. The joint EUROCAE/SAE process standard meant to make machine learning certifiable in practice, ED-324 / ARP6983, is still in draft, and its first issue is scoped to a model that never changes after training. The day a certified model is retrained, compressed, or swapped, the program falls outside that scope and faces a recertification question that no published standard yet answers.

Two documents from mid-2026, and the seam between them

On 3 June 2026, EASA released Proposed Issue 03 of its Concept Paper on artificial intelligence, the final Concept Paper deliverable under its AI Roadmap 2.0. Issue 03 extends the agency's learning-assurance guidance beyond the supervised models of earlier issues to reinforcement learning, symbolic AI, and Level 3 'advanced automation', the category where the human operator may be remote or absent. The consultation runs until 12 August 2026. The direction is unambiguous: the regulator is now writing assurance objectives for AI that adapts, and for AI that acts with high authority.

The process standard that programs need in order to generate certification evidence is not keeping pace. ED-324, the EUROCAE designation for the joint EUROCAE WG-114 and SAE G-34 standard (ARP6983 on the SAE side), is listed by EUROCAE as a draft with a target publication date of 31 December 2026. Public briefings on the standard describe its first issue as scoped to frozen, non-adaptive, supervised-learning models, up to design assurance level C, with reinforcement learning and other techniques deferred to a later issue.

Read together, the two documents describe a seam. One side defines objectives that increasingly assume the model can change. The other side, the part a certification authority can hold a program to, assumes the model does not.

Why frozen is the load-bearing word

Every objective in a learning-assurance process is anchored to a specific model instance. The operational design domain is characterized for that model. Dataset sufficiency, data management, and learning-process verification are evidence about how that model was trained. The verification activities confirm that this model, the one under evaluation, behaves acceptably across its stated input space. The assurance case is, in effect, a statement about one artifact at one point in time.
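The sketch below makes that anchoring concrete. It binds an evidence record to a content digest of the serialized weights, which reduces "does this evidence apply to this model?" to a byte-level comparison. `EvidenceRecord`, `model_digest`, and `evidence_applies` are hypothetical names chosen for illustration, not terms from ED-324 / ARP6983. A real configuration-management scheme would also cover architecture, preprocessing, and runtime code, not only the weights file.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical record tying assurance evidence to one model instance."""
    model_digest: str              # SHA-256 of the exact serialized weights
    odd_reference: str             # identifier of the ODD characterization (hypothetical)
    evidence_items: tuple[str, ...]  # identifiers of verification/data artifacts (hypothetical)


def model_digest(weights: bytes) -> str:
    """Content digest of a serialized model; any byte change yields a new digest."""
    return hashlib.sha256(weights).hexdigest()


def evidence_applies(record: EvidenceRecord, candidate_weights: bytes) -> bool:
    """True only if the candidate is byte-identical to the certified instance."""
    return record.model_digest == model_digest(candidate_weights)


if __name__ == "__main__":
    certified_weights = bytes([1, 2, 3, 4])  # stand-in for a real weights file
    record = EvidenceRecord(model_digest(certified_weights), "ODD-v1", ("VER-001", "DATA-001"))
    print(evidence_applies(record, certified_weights))     # True
    print(evidence_applies(record, bytes([1, 2, 3, 5])))   # False: one byte changed
```

A single changed byte produces a different digest, whether it comes from a retrain or a quantization pass. That is the frozen-model assumption expressed in code.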

This is a reasonable foundation, and it mirrors how DO-178C and DO-330 treat airborne software and the tools that produce it. The difficulty is not that the frozen-model approach is wrong. The difficulty is that a learned component rarely stays frozen for the life of the system.

The day after type certification

A model gets retrained when in-service data exposes a gap. It gets compressed or quantized so it fits the target processor and its timing budget. It gets swapped when a better architecture appears or a supplier changes. Each of these is a change to the exact artifact the assurance case was written about, and each one, under a frozen-model standard, moves the system outside the scope the evidence was built for.

Today there is no agreed evidence model for that transition. The practical consequence is that a change a software team would treat as routine becomes a recertification question, and the program either absorbs a long manual review cycle or holds the update. Decoupling a model update from full system recertification, without lowering the assurance bar, is the bottleneck that gates how fast AI can be adopted in airworthy and mission-critical systems.

What the public research is starting to show

The mechanism for closing this gap is beginning to appear in the open literature, which a lab can cite without exposing anything proprietary. The relevant question is not only whether an updated model is accurate. It is whether the updated model is behaviorally equivalent to the certified original, within a stated and defensible bound.

A March 2026 paper, SimCert (arXiv 2603.14818), certifies exactly that property for compressed networks: it produces quantitative, confidence-bounded guarantees that a quantized or pruned model preserves the behavior of the original, and it demonstrates the method on the public ACAS Xu collision-avoidance benchmark. Related work on semantics preservation argues that the certified artifact is a precise description of the model's behavior rather than a particular weights file, so any re-implementation is sound only if it provably preserves that description. Runtime out-of-distribution monitoring supplies the complementary piece: an in-operation check that flags when live inputs leave the domain the certification evidence actually covers.

• Behavioral-equivalence evidence: prove an updated model stays within a bounded distance of the certified baseline, rather than re-justifying it from scratch.
• Semantics preservation: treat the certified description, not the weights file, as the object that re-implementation and quantization must not break.
• Runtime envelope: monitor for out-of-distribution inputs so a model that drifts past its certified domain is caught in operation.
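Behavioral equivalence can be illustrated with a much simpler technique than SimCert's. The approach samples inputs from the operational domain and counts how often the updated model disagrees with the certified baseline. It then reports a one-sided Hoeffding upper bound on the true disagreement rate, p ≤ p̂ + sqrt(ln(1/δ) / (2n)), which holds with probability at least 1 − δ. This is a statistical sketch, not a formal guarantee and not the method the SimCert paper describes. The input sampler, the toy linear classifiers standing in for networks, and the acceptance threshold `EPSILON` are all assumptions.

```python
import math
import random
from typing import Callable, Sequence

# A model here is any function mapping an input vector to a discrete decision.
Model = Callable[[Sequence[float]], int]
Sampler = Callable[[random.Random], Sequence[float]]


def disagreement_upper_bound(baseline: Model, updated: Model, sample_input: Sampler,
                             n: int, delta: float, seed: int = 0) -> tuple[float, float]:
    """Estimate P[baseline(x) != updated(x)] over n sampled inputs.

    Returns (observed rate, one-sided Hoeffding upper bound that holds with
    probability at least 1 - delta, assuming the n samples are i.i.d.).
    """
    rng = random.Random(seed)
    disagreements = sum(
        baseline(x) != updated(x) for x in (sample_input(rng) for _ in range(n))
    )
    p_hat = disagreements / n
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return p_hat, min(1.0, p_hat + margin)


def linear_classifier(weights: Sequence[float]) -> Model:
    # Toy stand-in for a network: class 1 if the weighted sum is positive.
    return lambda x: int(sum(w * v for w, v in zip(weights, x)) > 0.0)


def sample_unit_box(rng: random.Random) -> Sequence[float]:
    # Hypothetical ODD: inputs uniform on [-1, 1] x [-1, 1].
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))


if __name__ == "__main__":
    W_BASE = (0.731, -0.412)                       # "certified" weights
    W_QUANT = tuple(round(w, 1) for w in W_BASE)   # crude quantization: (0.7, -0.4)
    EPSILON = 0.05                                 # hypothetical acceptance threshold
    p_hat, bound = disagreement_upper_bound(
        linear_classifier(W_BASE), linear_classifier(W_QUANT),
        sample_unit_box, n=20_000, delta=1e-3)
    verdict = "bounded change" if bound <= EPSILON else "needs further analysis"
    print(f"observed={p_hat:.4f} upper_bound={bound:.4f} -> {verdict}")
```

The bound is only as good as the assumption that sampled inputs are drawn independently from the distribution the operational design domain describes. Rare but safety-relevant regions may need targeted or formal analysis instead.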

What a program can do before the standard lands

The standard is not final, and the regulator's objectives are still moving. Programs integrating machine learning into safety-critical aerospace and defense systems do not have to wait for either to settle. The work that holds up later is the work that treats each model version as a controlled configuration item and records, per change, what was modified and what evidence was regenerated.
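A change record of that kind can be sketched as a small data structure. It names the change kind and the digests of the outgoing and incoming models, and it tracks which evidence categories had to be regenerated versus reused. The category names and the `IMPACTED` map are hypothetical placeholders. ED-324 / ARP6983 is still in draft, so a program would derive the real mapping from its own assurance impact analysis.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangeKind(Enum):
    RETRAIN = "retrain"
    COMPRESS = "compress"
    SWAP = "swap"


# Hypothetical impact map: evidence categories each change kind invalidates.
IMPACTED: dict[ChangeKind, frozenset[str]] = {
    ChangeKind.RETRAIN: frozenset({"data_management", "learning_verification",
                                   "model_verification"}),
    ChangeKind.COMPRESS: frozenset({"implementation_verification", "behavioral_equivalence"}),
    ChangeKind.SWAP: frozenset({"data_management", "learning_verification",
                                "model_verification", "implementation_verification"}),
}


@dataclass
class ModelChange:
    """One change event against a controlled model configuration item."""
    kind: ChangeKind
    from_digest: str   # digest of the certified baseline
    to_digest: str     # digest of the candidate model
    rationale: str
    regenerated: set[str] = field(default_factory=set)

    def required(self) -> frozenset[str]:
        return IMPACTED[self.kind]

    def outstanding(self) -> frozenset[str]:
        # Required evidence that has not been regenerated yet.
        return self.required() - self.regenerated

    def reused(self, all_evidence: frozenset[str]) -> frozenset[str]:
        # Evidence carried forward unchanged because this change does not touch it.
        return all_evidence - self.required()

    def can_close(self) -> bool:
        return not self.outstanding()


if __name__ == "__main__":
    change = ModelChange(ChangeKind.COMPRESS, "sha256:baseline", "sha256:int8",
                         "int8 quantization to meet the target timing budget")
    change.regenerated.add("behavioral_equivalence")
    print(sorted(change.outstanding()))   # ['implementation_verification']
    change.regenerated.add("implementation_verification")
    print(change.can_close())             # True
```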

Action checklist

• Treat every retrain, compression, or model swap as a change event with its own scoped assurance impact analysis, mapped to the lifecycle data items the emerging standard already defines.
• Build behavioral-equivalence evidence against a frozen, certified baseline, so an update can be argued as a bounded change rather than a new certification.
• Instrument runtime out-of-distribution monitoring so a fielded model that leaves its certified operating domain is detected and can fall back to a defined safe state.
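The runtime-monitoring item can be sketched with the simplest possible monitor. It computes per-feature z-scores against statistics of the certified training inputs and routes any input beyond a threshold to a predefined safe action instead of the model's output. `EnvelopeMonitor`, `guarded_inference`, the `z_max` default, and the `"SAFE_STATE"` label are assumptions for illustration. Published out-of-distribution monitors are considerably more sophisticated.

```python
import statistics
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class EnvelopeMonitor:
    """Hypothetical monitor: flags inputs far outside the certified training data."""

    def __init__(self, training_inputs: Sequence[Sequence[float]], z_max: float = 4.0):
        columns = list(zip(*training_inputs))
        self.means = [statistics.fmean(c) for c in columns]
        # Guard against zero spread so a constant feature does not divide by zero.
        self.stdevs = [statistics.pstdev(c) or 1e-12 for c in columns]
        self.z_max = z_max

    def in_domain(self, x: Sequence[float]) -> bool:
        if len(x) != len(self.means):
            raise ValueError("input dimension does not match the monitored domain")
        return all(abs(v - m) / s <= self.z_max
                   for v, m, s in zip(x, self.means, self.stdevs))


def guarded_inference(model: Callable[[Sequence[float]], T], monitor: EnvelopeMonitor,
                      safe_action: T, x: Sequence[float]) -> Tuple[T, bool]:
    """Return (decision, used_model); fall back to safe_action when x leaves the envelope."""
    if monitor.in_domain(x):
        return model(x), True
    return safe_action, False


def stub_model(x: Sequence[float]) -> str:
    # Stand-in for the certified model.
    return "proceed"


if __name__ == "__main__":
    training = [(0.0, 10.0), (1.0, 12.0), (2.0, 11.0), (1.0, 9.0)]
    monitor = EnvelopeMonitor(training)
    print(guarded_inference(stub_model, monitor, "SAFE_STATE", (1.5, 10.0)))  # ('proceed', True)
    print(guarded_inference(stub_model, monitor, "SAFE_STATE", (9.0, 10.0)))  # ('SAFE_STATE', False)
```

A per-feature check ignores correlations between inputs, so a combination of individually plausible values can still pass. That gap is one reason practical monitors use density-, distance-, or feature-space-based methods instead.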