compliance · 9 min read

OSFI B-13 and E-23: What AI Agent Enforcement Evidence Actually Needs to Prove

How OSFI B-13 Technology and Cyber Risk Management expectations and E-23 model governance (published September 2025, effective May 1, 2027) map to pre-execution evidence for capital markets AI agents.

Published 2026-05-04 · AI Syndicate

Primary topic: OSFI B-13 AI agent enforcement evidence

The practical audit question is not whether the AI model was reviewed. It is whether the firm can prove policy was evaluated before the AI agent touched a client record, trade workflow, KYC field, or AML disposition.

That question sits at the boundary between two control families. OSFI's B-13 Technology and Cyber Risk Management guideline speaks to technology risk, access controls, secure architecture, change control, logging, and resilience for federally regulated financial institutions. OSFI's E-23 Model Risk Management guideline speaks to model identification, model inventory, model risk rating, model review, approval, deployment, and use.

Both matter. They do not answer the same evidence question.

Source Scope

This mapping uses two OSFI anchors.

The first is B-13 Technology and Cyber Risk Management. For AI agents, the relevant parts are the technology and cyber risk management framework, secure architecture, SDLC control gates, change and release traceability, preventive security controls, identity and access management, security configuration baselines, and security logging.

The second is E-23 Model Risk Management. For AI systems, the relevant parts are model inventory, model risk rating, approved use, model limitations, independent review, model approval, deployment controls, and production use.

This article does not treat E-23 as a substitute for execution enforcement. E-23 helps establish whether a model is governed for its intended use. B-13 helps frame whether the technology environment enforces controls, blocks unauthorized access, and preserves evidence. AI agent execution needs both layers, and it needs a specific evidence artifact that neither layer produces on its own.

Where Model Governance Ends

Model governance proves the model was identified, reviewed, approved, and used within a documented lifecycle. That is necessary, especially where an AI or ML model drives decisions or supports regulated workflows.

But a model approval does not prove that a later agent action was authorized. It does not prove that the exact prompt, tool call, client record, trade parameter, KYC attribute, or AML disposition was allowed under the policy version in force at the moment of execution.

That distinction matters because AI agents turn model output into operational action. A model can be approved for a business purpose while a specific agent action remains unauthorized, over-scoped, stale, replayed, or outside the parameters a reviewer approved.

E-23 can establish that the model lifecycle was governed. It does not by itself prove pre-execution enforcement for each action the agent attempted.

Where Platform Inventory Ends

A platform inventory proves that an agent, model, tool, or integration exists. That is useful for technology asset management and model inventory. It can help a firm identify what is deployed, who owns it, what it depends on, and where it runs.

Inventory does not prove execution authorization.

An inventory can show that a trade-support agent exists. It can show that the agent has access to a retrieval service, a workflow queue, or a case management API. It cannot prove that a specific action on a specific client record was evaluated against policy before the side effect occurred.

That is the gap capital markets technology risk teams need to close.

The Enforcement Evidence Gap

The three-part distinction is the core of the mapping.

Model governance proves the model was governed.

Platform inventory proves the agent existed.

Neither proves the specific action had policy evaluation, approval binding, and parameter match before execution.

That third proof is the enforcement evidence requirement. It is not a dashboard. It is not ordinary application logging. It is a pre-execution evidence chain that shows who requested the action, which policy version evaluated it, which approval envelope bounded it, which parameters were approved, which parameters executed, and whether the system denied the action when a required condition was missing.

For a Director of Platform Engineering or VP Technology Risk, that is the difference between telling an examiner that controls exist and producing evidence that the control ran before the agent acted.

Mapping B-13 Expectations to Evidence Artifacts

B-13 expects technology and cyber risks to be governed through clear accountability, risk appetite, control domains, policies, standards, and processes. For AI agents, the evidence artifact is the policy version that evaluated the request, plus the actor identity and system identity bound to the action. Without those fields, the firm cannot show which control made the decision.

B-13 expects technology architecture and SDLC practices to support security requirements and control gates. For AI agents, the evidence artifact is the enforcement boundary: an execution path where the agent cannot reach the sensitive tool, record, queue, or transaction endpoint unless the enforcement gate (Gate) has verified policy and approval first.

B-13 expects change and release processes to be controlled, approved, and traceable. For AI agents, the evidence artifact is the policy and deployment lineage: which policy version, tool definition, approval rule, and release state were active when the action ran.

B-13 expects preventive cyber security controls and secure-by-design practices where feasible. For AI agents, the evidence artifact is fail-closed behavior: DENY records for missing approval, expired approval, replay attempt, policy miss, malformed request, and unavailable approval service. A system that only records successful actions cannot prove it blocked unauthorized ones.
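The fail-closed behavior described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `evaluate` function, the `DenyReason` names, the request fields, and the shape of `policy` and `approval` are all assumptions made for the example. The point it shows is that every missing condition produces an explicit DENY with a reason, and ALLOW is reached only when nothing is missing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DenyReason(str, Enum):
    # Hypothetical reason codes mirroring the denial cases named in the text.
    MALFORMED_REQUEST = "malformed_request"
    REPLAY_ATTEMPT = "replay_attempt"
    POLICY_MISS = "policy_miss"
    APPROVAL_SERVICE_UNAVAILABLE = "approval_service_unavailable"
    MISSING_APPROVAL = "missing_approval"
    EXPIRED_APPROVAL = "expired_approval"


@dataclass(frozen=True)
class Decision:
    outcome: str                  # "ALLOW" or "DENY"
    reason: Optional[str] = None  # set on every DENY


REQUIRED_FIELDS = ("actor", "action", "resource", "nonce")


def evaluate(request, policy, approval, approval_service_up, seen_nonces, now):
    """Return ALLOW only if every check passes; any gap yields a DENY."""
    if not isinstance(request, dict) or any(f not in request for f in REQUIRED_FIELDS):
        return Decision("DENY", DenyReason.MALFORMED_REQUEST.value)
    if request["nonce"] in seen_nonces:
        return Decision("DENY", DenyReason.REPLAY_ATTEMPT.value)
    # Consume the nonce on first sight so a retry must be a fresh request.
    seen_nonces.add(request["nonce"])
    rule = policy.get(request["action"])
    if rule is None:
        # No matching rule means deny, never a default allow.
        return Decision("DENY", DenyReason.POLICY_MISS.value)
    if rule["requires_approval"]:
        if not approval_service_up:
            return Decision("DENY", DenyReason.APPROVAL_SERVICE_UNAVAILABLE.value)
        if approval is None:
            return Decision("DENY", DenyReason.MISSING_APPROVAL.value)
        if now >= approval["expires_at"]:
            return Decision("DENY", DenyReason.EXPIRED_APPROVAL.value)
    return Decision("ALLOW")
```

In a real deployment each returned `Decision`, including every DENY, would be written to the evidence chain before the caller is answered, which is what lets a reviewer see that unauthorized actions were blocked rather than merely absent.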

B-13 expects identity and access controls, including least privilege and privileged access management. For AI agents, the evidence artifact is actor attribution and service identity binding: which human, agent, service account, or delegated workflow requested the action, and whether that identity was permitted to act on that resource under the applicable policy.

B-13 expects security configuration baselines to be enforced and deviations managed. For AI agents, the evidence artifact is the boundary configuration state: whether direct provider access, direct tool access, or bypass routes were disabled or explicitly out of scope at the time of review.

B-13 expects security logging to support investigation. For AI agents, logs are secondary. The primary artifact is the enforcement evidence chain: request, decision, approval envelope, execution trace, and denial records, hash-chained or otherwise tamper-evident enough for independent review.

Mapping E-23 Expectations to the Boundary

E-23 helps define what model governance can prove. A model inventory can show model ID, owner, origin, version, risk rating, approved uses, dependencies, limitations, reviewer, and approver. Those records matter when an AI agent uses a model in a client-impacting workflow.

E-23 also expects model approval and deployment controls to address suitability for production use, residual risk, operational dependencies, stakeholder responsibility, approval hierarchy, change control, and exception handling.

Those controls support the model layer. They do not replace the execution layer.

The useful mapping is this: E-23 tells the firm whether the model is approved for the intended use. The enforcement boundary tells the firm whether a specific action was permitted under policy before it ran. If a model is approved but the action lacks a valid approval envelope, the execution layer should deny. If a model is not approved for the workflow, the execution layer should deny. If the approved parameters differ from the execution parameters, the execution layer should deny or apply only an explicit deterministic narrowing rule.
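The three deny conditions in that mapping can be written as one small decision function. The sketch below assumes a hypothetical in-memory `inventory` keyed by model ID with an `approved_uses` set; the function name and reason strings are illustrative, and the strict parameter comparison stands in for the simplest case where no narrowing rule is defined.

```python
def execution_decision(model_id, workflow, inventory, envelope_valid,
                       approved_params, executed_params):
    """Combine the model-layer check with the action-layer checks."""
    entry = inventory.get(model_id)
    # Model layer: is this model approved for this workflow at all?
    if entry is None or workflow not in entry["approved_uses"]:
        return ("DENY", "model_not_approved_for_workflow")
    # Action layer: is there a valid approval envelope for this action?
    if not envelope_valid:
        return ("DENY", "no_valid_approval_envelope")
    # Action layer: do the executed parameters equal the approved ones?
    if approved_params != executed_params:
        return ("DENY", "parameter_mismatch")
    return ("ALLOW", "model_and_action_approved")
```

Keeping the model-layer check and the action-layer checks as separate branches means the evidence record can say which layer refused the action, which matters when model risk and technology risk are owned by different teams.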

That is how model governance and execution enforcement complement each other without being conflated.

What the Evidence Chain Should Contain

A capital markets reviewer should be able to inspect the evidence chain without relying on the operator's interpretation of internal state.

The minimum useful chain starts with the raw request: actor identity, agent identity, action, target resource, parameters, timestamp, nonce, and request signature input.
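One way to make the raw request usable as signature input is to serialize it canonically and hash it. The sketch below is an assumption about how that could look, not a prescribed format: the field names, `build_request`, and `request_hash` are hypothetical, and canonicalization here is simply JSON with sorted keys and fixed separators.

```python
import hashlib
import json
import secrets


def build_request(actor_id, agent_id, action, resource, parameters,
                  timestamp, nonce=None):
    """Assemble the raw request fields; a random nonce is generated if absent."""
    return {
        "actor_id": actor_id,
        "agent_id": agent_id,
        "action": action,
        "resource": resource,
        "parameters": parameters,
        "timestamp": timestamp,
        "nonce": nonce or secrets.token_hex(16),
    }


def canonical_request_bytes(request):
    """Serialize with sorted keys and fixed separators so equal requests hash equally."""
    return json.dumps(request, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=True).encode("utf-8")


def request_hash(request):
    """SHA-256 over the canonical bytes; this digest is the signature input."""
    return hashlib.sha256(canonical_request_bytes(request)).hexdigest()
```

Because the digest covers the parameters and the nonce, a later record can reference the request by hash, and any change to a single parameter value yields a different digest.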

It then records the policy decision: policy version, rule identifier, decision outcome, decision reason, approval requirement, and any limitation or exception that affected the decision.

If approval is required, the chain includes the approval envelope: approver identity, bounded action, bounded parameters, validity window, issuance time, expiry time, and signature.
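An approval envelope with these fields can be illustrated with a signed dictionary. This sketch makes several assumptions: it uses HMAC-SHA256 with a shared key purely to stay within the standard library (a production design would more likely use an asymmetric signature so verifiers cannot mint approvals), and the field names and the `sign_envelope` and `verify_envelope` helpers are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Tuple


def _payload(envelope: dict) -> bytes:
    """Canonical bytes of every envelope field except the signature itself."""
    body = {k: v for k, v in envelope.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_envelope(envelope: dict, key: bytes) -> dict:
    """Return a copy of the envelope with an HMAC-SHA256 signature attached."""
    signed = dict(envelope)
    signed.pop("signature", None)
    signed["signature"] = hmac.new(key, _payload(signed), hashlib.sha256).hexdigest()
    return signed


def verify_envelope(envelope: dict, key: bytes, action: str,
                    parameters: dict, now: int) -> Tuple[bool, str]:
    """Check signature, validity window, and that the action is inside the bounds."""
    expected = hmac.new(key, _payload(envelope), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(envelope.get("signature", ""), expected):
        return False, "signature_mismatch"
    if not (envelope["issued_at"] <= now < envelope["expires_at"]):
        return False, "outside_validity_window"
    if envelope["action"] != action or envelope["parameters"] != parameters:
        return False, "outside_approved_bounds"
    return True, "ok"
```

The signature is checked first so that the validity window and bounds are only ever read from an envelope that has not been altered since the approver issued it.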

At execution, the chain records the parameter comparison: approved parameters, executed parameters, and any declared narrowing rule. Silent truncation is not acceptable evidence. If the system changed the parameters, the rule must be explicit and reviewable.
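The difference between silent truncation and an explicit narrowing rule is easier to see in code. In this sketch, `compare_parameters`, the rule dictionary shape, and the example rule ID are assumptions for illustration. A parameter may differ from its approved value only if a named rule covers that parameter and the rule's check passes; the applied rule is recorded in the result so a reviewer can see it.

```python
_MISSING = object()  # sentinel for "parameter not present"


def compare_parameters(approved, executed, narrowing_rules=None):
    """Compare approved and executed parameters, recording any narrowing applied."""
    narrowing_rules = narrowing_rules or {}
    applied = []
    for key in sorted(set(approved) | set(executed)):
        a = approved.get(key, _MISSING)
        e = executed.get(key, _MISSING)
        if a == e:
            continue
        rule = narrowing_rules.get(key)
        covered = (rule is not None and a is not _MISSING
                   and e is not _MISSING and rule["check"](a, e))
        if not covered:
            # Any difference without a passing, declared rule is a denial.
            return {"outcome": "DENY",
                    "reason": "parameter_mismatch:" + key,
                    "narrowing": []}
        applied.append({"parameter": key, "rule": rule["id"],
                        "approved": a, "executed": e})
    return {"outcome": "ALLOW",
            "reason": "narrowed" if applied else "exact_match",
            "narrowing": applied}
```

For example, a hypothetical rule such as `{"quantity": {"id": "qty-at-or-below-approved", "check": lambda a, e: 0 < e <= a}}` would let an order execute for 60 units against an approval for 100, with the rule ID recorded, while 150 units, or an extra parameter the approver never saw, would be denied.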

The chain also records failures: DENY outcomes, expired approvals, replay attempts, approval service unavailability, signature mismatch, policy miss, direct access rejection, and malformed requests.

Finally, the chain needs continuity: stable record identifiers, ordering, trace identifiers, request hashes, and previous-hash or equivalent tamper-evidence. The point is not to create more logs. The point is to preserve the attributable chain needed to verify that policy was evaluated before execution.
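The continuity properties above (ordering, record identifiers, previous-hash linkage) can be sketched as a simple hash chain. This is one possible form of tamper-evidence, written under assumptions: the record layout, the all-zero genesis value, and the `append_record` and `verify_chain` names are illustrative only.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first record's previous hash


def _digest(body: dict) -> str:
    """SHA-256 over the canonical JSON form of a record body."""
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_record(chain: list, event: dict) -> dict:
    """Append an event, binding it to its position and to the previous record."""
    prev_hash = chain[-1]["record_hash"] if chain else GENESIS
    body = {"seq": len(chain), "prev_hash": prev_hash, "event": event}
    record = dict(body, record_hash=_digest(body))
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; any edit, reorder, or deletion fails."""
    prev = GENESIS
    for i, record in enumerate(chain):
        body = {"seq": record["seq"], "prev_hash": record["prev_hash"],
                "event": record["event"]}
        if (record["seq"] != i or record["prev_hash"] != prev
                or record["record_hash"] != _digest(body)):
            return False
        prev = record["record_hash"]
    return True
```

A chain like this detects modification of any exported record, but on its own it does not stop someone with write access from rebuilding the whole chain, so the latest hash would typically also be anchored somewhere the operator cannot rewrite.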

What Enforcement Does Not Prove

This mapping is credible only if the limitation disclosures are as specific as the capability claims.

Enforcement evidence does not govern actions that bypass the enforcement boundary. If a worker, integration, administrator, or agent can reach the execution layer directly, the evidence chain covers only the routed path, not the bypass.

Enforcement evidence does not prove provider-side behavior after the inference call is made. It can show what request was approved, what provider or model was selected, and what response or tool call returned to the controlled path. It does not prove every internal operation performed by the provider.

Enforcement evidence does not prove model output correctness. It proves the action path, policy decision, approval envelope, and parameter binding. Model accuracy, explainability, bias, and performance remain model risk questions under E-23 and related controls.

Enforcement evidence does not cover traffic routed through direct provider access instead of through Gate. If a team keeps an ungoverned API key or direct route outside the boundary, that path is outside the claim.

Enforcement evidence is not a legal opinion. It proves what was requested, which policy evaluated the request, what was approved, what ran, and what was denied. Whether that action was legally sufficient, commercially appropriate, or compliant in the full legal sense remains a legal and compliance determination.

The Consultation Leave-Behind Question

The control question to take into an OSFI B-13 or internal audit discussion is direct.

For each AI agent workflow touching client records, trade execution, KYC, AML, or reportable transaction activity, can the firm export independently verifiable evidence that policy was evaluated before execution?

If the answer depends on dashboards, screenshots, summary reports, or reconstructed application logs, the evidence is not yet at the enforcement boundary.

If the answer includes policy version, actor identity, approval envelope, parameter binding, fail-closed denial behavior, and a hash-chained evidence record, the firm can move the conversation from assertion to verification.
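A reviewer working from an exported chain could test the "evaluated before execution" claim mechanically. The sketch below assumes exported records carry a sequence number `seq`, a `trace_id`, a `type` of `decision` or `execution`, and an `outcome` on decisions; those field names and the `unproven_executions` helper are hypothetical.

```python
def unproven_executions(records):
    """Return trace IDs whose execution has no earlier ALLOW decision in that trace."""
    allowed = set()
    failures = []
    for record in sorted(records, key=lambda r: r["seq"]):
        trace = record["trace_id"]
        if record["type"] == "decision" and record.get("outcome") == "ALLOW":
            allowed.add(trace)
        elif record["type"] == "execution" and trace not in allowed:
            # Execution seen before any ALLOW for this trace: not provable.
            failures.append(trace)
    return failures
```

An empty result means every execution in the export was preceded by an ALLOW decision in the same trace. A non-empty result lists the traces where the evidence shows execution without a prior allow, including cases where the decision was recorded only afterwards or was a DENY.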

Frequently Asked Questions

How does OSFI B-13 map to AI agent enforcement evidence?

OSFI B-13 maps to the technology control layer: accountability, secure architecture, access control, change traceability, preventive controls, and investigation records. For AI agents, those expectations translate into policy version, actor identity, approval binding, parameter comparison, fail-closed denial records, and independently verifiable evidence.

Does E-23 model governance replace pre-execution enforcement?

No. E-23 helps establish whether a model is identified, reviewed, approved, limited, and deployed under model lifecycle controls. Pre-execution enforcement proves whether a specific agent action was allowed under policy before it ran.

What is the main evidence gap for AI agents in capital markets workflows?

Model governance proves the model was governed. Platform inventory proves the agent existed. Neither proves the specific action had policy evaluation, approval binding, and parameter match before execution.

What artifacts should a reviewer expect for an AI agent action?

A reviewer should expect the raw request, policy decision, policy version, actor identity, approval envelope, approved parameters, executed parameters, denial records where applicable, and a tamper-evident evidence chain linking the events.

What does enforcement evidence not prove?

It does not prove model output correctness, provider-side internal behavior, legal sufficiency, or actions that bypass the enforcement boundary. It proves what was requested, evaluated, approved, executed, or denied within the controlled path.
