Financial Crime Operations: Agentic AI Needs Evidence, Not Autonomy

Financial-crime work is full of repetitive investigation, but the judgment is too consequential to hand to an unbounded agent.

The right shape is not “AI closes cases.” The right shape is an evidence-bound casework system: agents gather facts, resolve identity, compare policy, draft narratives, highlight contradictions, recommend disposition, and route high-risk decisions to accountable humans.

That is exactly the Financial Crime Operations playbook in ContextOS.

Why this is agentic-first

KYC, AML, sanctions, and fraud workflows are not single prompts. A case may require customer identity resolution, beneficial ownership, transaction analysis, sanctions screening, adverse-media review, policy interpretation, narrative writing, supervisor review, and regulatory filing.

McKinsey describes agentic AI in financial-crime contexts across client onboarding, KYC checks and refreshes, transaction monitoring, sanctions, and fraud investigations from alert to case closure. The opportunity is real because these workflows are high-volume, evidence-heavy, and cross-system.

The control problem is equally real. A false negative creates regulatory risk. A false positive can harm a customer. A weak narrative can fail supervisory or regulator review. ContextOS narrows the agent’s role: assemble and reason over evidence, then hand the decision across the approval boundary rather than around it.

The Context Pack

The pack declares the casework boundary:

Layer	Required entries
`decision_layer`	`fincrime.alert.triage`, `fincrime.kyc.refresh`, `fincrime.case.disposition`, `fincrime.report.file`
`policy_layer`	AML policy, sanctions policy, fraud policy, customer risk policy, data-retention policy.
`approval_gates`	`GATE_INVESTIGATOR_REVIEW`, `GATE_MLRO_APPROVAL`, `GATE_REGULATORY_REPORT`.
`tooling_layer`	core banking lookup, KYC fetch, transaction analysis, sanctions screen, case update, report filing.
`memory_layer`	case pattern, policy correction, decision outcome candidates.
`evaluation_layer`	false-negative rate, escalation quality, narrative completeness, audit acceptance.

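The pack should also pin jurisdiction overlays and retention rules. Financial-crime decisions are not portable across policy contexts: a disposition that is defensible under one jurisdiction's rules may not be defensible under another's.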
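
As a rough illustration of what "declaring the boundary" could look like in code, the sketch below checks that a pack names every layer from the table and pins a jurisdiction and a retention rule. The `ContextPack` class, its field names, and `validate_pack` are hypothetical and are not taken from ContextOS; only the layer names come from the table above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Layer names come from the table above; everything else is illustrative.
REQUIRED_LAYERS = (
    "decision_layer",
    "policy_layer",
    "approval_gates",
    "tooling_layer",
    "memory_layer",
    "evaluation_layer",
)


@dataclass
class ContextPack:
    """Hypothetical in-memory shape for a pack; not the ContextOS schema."""
    layers: Dict[str, List[str]]
    jurisdiction: str = ""                  # assumed field for the jurisdiction overlay
    retention_days: Optional[int] = None    # assumed field for the retention rule


def validate_pack(pack: ContextPack) -> List[str]:
    """Return a list of human-readable problems; an empty list means the pack passes."""
    problems = []
    for name in REQUIRED_LAYERS:
        if not pack.layers.get(name):
            problems.append(f"missing or empty layer: {name}")
    if not pack.jurisdiction:
        problems.append("jurisdiction overlay is not pinned")
    if pack.retention_days is None:
        problems.append("retention rule is not pinned")
    return problems
```

A pack that fails this kind of check should not be deployed at all, which is cheaper than discovering a missing gate in the middle of a case.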

Agent roles

Agent	Responsibility	Boundary
Alert Triage Agent	clusters alerts and resolves customer, account, and transaction identity.	cannot close cases.
Evidence Agent	fetches KYC, sanctions, transaction, and adverse-media evidence.	read-only except case notes.
Policy Agent	maps evidence to policy rules and required case fields.	cannot override policy.
Narrative Agent	drafts case summary, rationale, and regulator-ready timeline.	draft only.
Supervisor Agent	checks completeness, contradictions, and approval requirements.	can block disposition.

The Supervisor Agent is the casework equivalent of the Critic. It does not have to be smarter than every specialist; it has to enforce the contract.
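
One way to make "enforce the contract" concrete is an allow-list per agent role. The sketch below is an assumption about how such a check might be written: the role keys and action names are invented for illustration, and only the boundaries themselves (no agent can close a case, the Evidence Agent writes nothing but case notes, the Supervisor can block) follow the table above.

```python
# Hypothetical action names; the boundaries mirror the agent-roles table.
ALLOWED_ACTIONS = {
    "alert_triage": {"cluster_alerts", "resolve_identity"},
    "evidence": {"fetch_evidence", "write_case_note"},
    "policy": {"map_evidence_to_policy"},
    "narrative": {"draft_narrative"},
    "supervisor": {"check_completeness", "block_disposition"},
}


class BoundaryViolation(Exception):
    """Raised when an agent attempts an action outside its declared role."""


def authorize(agent: str, action: str) -> bool:
    """Allow the action only if it is explicitly listed for the agent."""
    if action not in ALLOWED_ACTIONS.get(agent, set()):
        raise BoundaryViolation(f"{agent} may not perform {action}")
    return True
```

Because the check is deny-by-default, an action such as `close_case` is rejected for every agent simply because no role lists it.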

Decision gates

Low-risk false positives may be recommended for batch approval. Suspicious activity, sanctions hits, high-risk jurisdictions, and regulatory filings should stay behind named approval.
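
The routing rule above can be sketched as a small function. The flag names and the mapping from flags to gates are assumptions made for illustration; the gate identifiers are the ones declared in the Context Pack.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical flag names for the high-risk conditions named in the text.
MLRO_FLAGS = {"suspicious_activity", "sanctions_hit", "high_risk_jurisdiction"}
FILING_FLAG = "regulatory_filing"


def route(case_flags: Iterable[str]) -> Dict[str, Union[str, List[str]]]:
    """Pick the approval path for a case from its risk flags (assumed mapping)."""
    flags = set(case_flags)
    gates: List[str] = []
    if flags & MLRO_FLAGS:
        gates.append("GATE_MLRO_APPROVAL")
    if FILING_FLAG in flags:
        gates.append("GATE_REGULATORY_REPORT")
    if gates:
        # Any high-risk flag keeps the case behind a named approver.
        return {"mode": "named_approval", "gates": gates}
    # No high-risk flag: the case is only recommended for batch approval.
    return {"mode": "batch_recommendation", "gates": ["GATE_INVESTIGATOR_REVIEW"]}
```

Note that even the low-risk path ends in a recommendation to an investigator, not in an automatic close.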

What a useful DecisionRecord contains

For financial-crime operations, the DecisionRecord should include:

{
  "decision_key": "fincrime.case.disposition",
  "subject_ids": [
    "ceid_customer_8f4a",
    "ceid_account_1309",
    "ceid_alert_cluster_77"
  ],
  "outputs": {
    "recommended_disposition": "escalate_to_mlro",
    "risk_factors": ["sanctions_name_similarity", "unusual_velocity"],
    "narrative_ref": "artifact_case_narrative_481"
  },
  "evidence_refs": [
    "evidence_kyc_snapshot_11",
    "evidence_txn_graph_91",
    "evidence_sanctions_screen_42"
  ],
  "policy_decisions": [
    "policy_decision_aml_v7_rule_19",
    "policy_decision_sanctions_v4_rule_03"
  ],
  "approvals": [
    "approval_mlro_203"
  ],
  "controls_active": [
    "GATE_MLRO_APPROVAL",
    "NO_AUTO_CLOSE_HIGH_RISK"
  ]
}

The record is not a compliance afterthought. It is how the case gets reviewed, queried, replayed, and improved.
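
To show how a record like this could be queried mechanically, here is a minimal sketch of a structural check. The `check_record` function and its rules are assumptions, not ContextOS behavior: it requires the top-level fields used in the example, at least one evidence reference, and at least one approval whenever a `GATE_` control is active.

```python
from typing import Dict, List

# Top-level fields taken from the example record above.
REQUIRED_KEYS = (
    "decision_key", "subject_ids", "outputs", "evidence_refs",
    "policy_decisions", "approvals", "controls_active",
)


def check_record(record: Dict) -> List[str]:
    """Return the issues found in a DecisionRecord; an empty list means none."""
    issues = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in record]
    if not record.get("evidence_refs"):
        issues.append("no evidence_refs")
    # Assumed rule: any active control named GATE_* needs a recorded approval.
    gated = [c for c in record.get("controls_active", []) if c.startswith("GATE_")]
    if gated and not record.get("approvals"):
        issues.append("gate active but no approval recorded: " + ", ".join(gated))
    return issues


# Abbreviated copy of the example record.
record = {
    "decision_key": "fincrime.case.disposition",
    "subject_ids": ["ceid_customer_8f4a"],
    "outputs": {"recommended_disposition": "escalate_to_mlro"},
    "evidence_refs": ["evidence_kyc_snapshot_11"],
    "policy_decisions": ["policy_decision_aml_v7_rule_19"],
    "approvals": ["approval_mlro_203"],
    "controls_active": ["GATE_MLRO_APPROVAL", "NO_AUTO_CLOSE_HIGH_RISK"],
}
print(check_record(record))
```

The same check can run at write time and again during replay, so a record that was accepted once can still be re-examined later.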

Failure modes to block

Failure	Control
Identity collision	CEID/SID resolution proves customer, account, and transaction relationships.
Source conflict	contradictory KYC or sanctions evidence triggers escalation.
Policy version mismatch	Compiler refuses packs that mix incompatible policy versions.
Narrative without evidence	Supervisor blocks claims without `evidence_refs`.
Over-automation	high-risk disposition remains recommendation until approval.
Memory poisoning	case patterns enter review before promotion.
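
As one example of how a control in this table might be implemented, the sketch below covers "Narrative without evidence". It assumes each narrative claim is a dictionary with a `text` and a list of `evidence_refs`; that shape, and the function name, are hypothetical.

```python
from typing import Dict, Iterable, List


def unsupported_claims(claims: Iterable[Dict], known_evidence: Iterable[str]) -> List[str]:
    """Return the text of every claim the Supervisor should block.

    A claim is blocked if it cites nothing, or cites a reference that is
    not among the evidence actually attached to the case.
    """
    known = set(known_evidence)
    blocked = []
    for claim in claims:
        refs = claim.get("evidence_refs", [])
        if not refs or any(ref not in known for ref in refs):
            blocked.append(claim["text"])
    return blocked


claims = [
    {"text": "Customer name resembles a listed entity.",
     "evidence_refs": ["evidence_sanctions_screen_42"]},
    {"text": "Funds were moved offshore.", "evidence_refs": []},
]
blocked = unsupported_claims(claims, ["evidence_sanctions_screen_42"])
print(blocked)
```

Checking that cited references actually exist on the case, not just that a reference field is filled in, also catches a narrative that cites evidence from a different case.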

Metrics that matter

case package completeness,
false-positive reduction without false-negative increase,
investigator acceptance rate,
time from alert to first decision,
regulatory filing defect rate,
supervisor block rate,
policy gaps discovered through operator correction,
replay pass rate on sampled cases.
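
Several of these metrics are simple ratios, and the second one is a guard rather than a single number. A minimal sketch, assuming false positives and false negatives are counted on the same labeled sample before and after a change (the function names and dictionary keys are illustrative):

```python
from typing import Dict


def rate(numerator: int, denominator: int) -> float:
    """Plain ratio that returns 0.0 instead of failing on an empty sample."""
    return numerator / denominator if denominator else 0.0


def fp_reduction_is_safe(baseline: Dict[str, int], candidate: Dict[str, int]) -> bool:
    """True only if false positives fell and false negatives did not rise."""
    return (
        candidate["false_positives"] < baseline["false_positives"]
        and candidate["false_negatives"] <= baseline["false_negatives"]
    )


# Example: supervisor block rate over a hypothetical sample of 12 dispositions.
supervisor_block_rate = rate(3, 12)

baseline = {"false_positives": 400, "false_negatives": 5}
candidate = {"false_positives": 310, "false_negatives": 5}
safe = fp_reduction_is_safe(baseline, candidate)
```

Treating the false-negative count as a hard constraint keeps a drop in alert volume from being reported as a win when it was bought with missed cases.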

The goal is not maximum autonomy. The goal is higher-quality casework with a shorter path from alert to accountable decision.

Research base

ContextOS use case: Financial Crime Operations.
ContextOS contracts: Decision Record, Decision Catalog, Governance, and Identity Layer.
McKinsey: How agentic AI can change the way banks fight financial crime, covering KYC, transaction monitoring, sanctions, fraud, and investigations from alert to case closure.
OWASP Top 10 for LLM Applications for risks around prompt injection, data exposure, excessive agency, and unbounded consumption in tool-using systems.

What to read next

Agentic Incident Command Center: Agents Can Coordinate, Boundaries Still Decide

The AI Software Delivery Squad: From Ticket to Proof-Carrying Pull Request
