Why Extracted Obligations Never Activate Themselves

Missing an obligation would have been visible. Accepting a fake one was harder to catch. From one signed MSA, the RAG Platform returned nine real payment schedules, two duplicate renewal windows, and one hallucinated reporting duty the contract never created.

The contract engine inserts all 12 rows. Every single one is set to Status = ObligationStatus.Pending. None of them will fire an alert. None will appear on the deadline dashboard. None will publish a contract.obligation.breached event if their date passes. The hourly scanner walks right past them.

This is deliberate. The rule lives in ExtractionResultParser.BuildObligation, where the comment above the method calls it a load-bearing invariant:

// Invariants (load-bearing - do NOT weaken without a test):
// - Every obligation returned carries ObligationStatus.Pending - no
//   auto-activation, extract-then-confirm per PRD 5.2.
// - Source is always ObligationSource.RagExtraction.
// - Malformed JSON returns an empty list.
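
A minimal sketch of how those three invariants might look in code. The JSON shape, the ExtractedItem type, and the Parse method name are assumptions made for illustration; only the Pending status, the RagExtraction source, and the empty list on malformed JSON come from the comment above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

public enum ObligationStatus { Pending, Active, Dismissed }
public enum ObligationSource { Manual, RagExtraction }

// Hypothetical shape of one item in the extraction payload.
public sealed class ExtractedItem
{
    public string? Description { get; set; }
    public double Confidence { get; set; }
}

public sealed class Obligation
{
    public string Description { get; init; } = "";
    public double Confidence { get; init; }
    public ObligationStatus Status { get; init; }
    public ObligationSource Source { get; init; }
}

public static class ExtractionResultParserSketch
{
    private static readonly JsonSerializerOptions Options =
        new() { PropertyNameCaseInsensitive = true };

    public static IReadOnlyList<Obligation> Parse(string? json)
    {
        if (string.IsNullOrWhiteSpace(json)) return Array.Empty<Obligation>();

        List<ExtractedItem?>? items;
        try
        {
            items = JsonSerializer.Deserialize<List<ExtractedItem?>>(json, Options);
        }
        catch (JsonException)
        {
            // Invariant: malformed JSON yields an empty list, never a partial one.
            return Array.Empty<Obligation>();
        }

        var result = new List<Obligation>();
        foreach (var item in items ?? new List<ExtractedItem?>())
        {
            if (item is null) continue;
            result.Add(BuildObligation(item));
        }
        return result;
    }

    // Invariants: status is always Pending, source is always RagExtraction.
    // Confidence is carried along but never influences the status.
    private static Obligation BuildObligation(ExtractedItem item) => new()
    {
        Description = item.Description ?? "",
        Confidence = item.Confidence,
        Status = ObligationStatus.Pending,
        Source = ObligationSource.RagExtraction
    };
}
```

There is no code path in this sketch that can produce an Active row, so weakening the invariant would require an explicit change that a test can catch.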

A human has to click confirm on every row before the system treats it as real. Nine clicks to accept the true positives. Two clicks to dismiss the duplicates. One click to dismiss the hallucination. Twelve interactions for a single contract extraction. That is a lot of friction for what could have been a fully automatic workflow.

I put the friction there on purpose.

The Failure Mode I Was Optimising For

AI extraction has two failure modes: false negatives (the model misses a real obligation) and false positives (the model invents one). In most domains, false negatives are the more forgiving problem. Missed entries do not generate phantom work. A search system that fails to surface a document just returns fewer results; the user notices and reformulates.

Legal obligations change that calculus. A missed obligation is a real legal exposure, but so is a phantom one. If the AI invents a 90-day renewal notice clause and the system auto-activates the obligation, the scanner will fire alerts on a fabricated deadline. The operator sees the alert, files the notice, and now there is a formally documented paper trail attesting to a clause that does not exist in the underlying contract.

That is worse than missing the real clause. A missed obligation creates a gap. A phantom obligation creates a fiction in the compliance record.

The scanner cannot tell the difference between a real deadline and a hallucinated one, because by the time a status is Active, the system treats it as legally binding. The only place the distinction can be enforced is at the boundary where AI output becomes domain state. That boundary is the Pending status.

Four Terminal States, Not One

The obligation state machine has eleven states. Seven are non-terminal: Pending, Active, Upcoming, Due, Overdue, Escalated, Disputed. Four are terminal: Dismissed, Fulfilled, Waived, Expired. The easy design is one terminal state called Closed with a reason column.

I rejected that. The four terminal values each answer a different audit question:

Dismissed. The obligation was never real. An AI false positive, or a human operator seeing a duplicate entry. The event row captures the reason verbatim.
Fulfilled. The obligation was honored. Counterparty was paid, notice was delivered, deliverable shipped. For recurring obligations this spawns the next instance.
Waived. The obligation was real but forgiven. The counterparty agreed not to enforce it, or we renegotiated. A waiver without a rationale is a compliance liability, so WaiveAsync throws if reason is empty.
Expired. The contract was archived before the obligation resolved. This is a cascade, not a decision. When a contract archives, every non-terminal obligation under it expires with actor = "system:archive_cascade".

An auditor asking "how many obligations did we waive in Q2?" should get a number, not a regex over free-text reason fields. "How many were AI-extracted and then dismissed?" should be a two-column filter (source = rag_extraction AND status = dismissed). Collapsing the four into Closed pushes that information into prose, where it stops being queryable.

The transition map enforces the distinction in code rather than by convention. Pending can only transition to Active (via confirm), Dismissed (via dismiss), or Expired (via archive cascade). There is no path from Pending to Fulfilled or Waived. You cannot fulfill something that was never confirmed as real. The ObligationStateMachine.GetValidNextStates method is a switch expression that encodes this per PRD section 4.6, and EnsureTransitionAllowed throws ObligationTransitionException on any deviation. The middleware maps that exception to 422 INVALID_TRANSITION with the list of valid next states in details[], so a caller who gets the transition wrong sees exactly what was allowed.
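
A sketch of what that switch expression could look like, limited to the rows the text actually describes: the Pending row and the four terminal states. The transitions out of the other six non-terminal states are defined by the PRD and are not reproduced here, so this sketch throws for them instead of guessing. The exception's constructor and its ValidNextStates property are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ObligationStatus
{
    Pending, Active, Upcoming, Due, Overdue, Escalated, Disputed,
    Dismissed, Fulfilled, Waived, Expired
}

public sealed class ObligationTransitionException : Exception
{
    // Carried so an API layer can echo the allowed states back to the caller.
    public IReadOnlyList<ObligationStatus> ValidNextStates { get; }

    public ObligationTransitionException(
        ObligationStatus from, ObligationStatus to,
        IReadOnlyList<ObligationStatus> validNextStates)
        : base($"Cannot transition from {from} to {to}.")
    {
        ValidNextStates = validNextStates;
    }
}

public static class ObligationStateMachine
{
    public static IReadOnlyList<ObligationStatus> GetValidNextStates(ObligationStatus current) =>
        current switch
        {
            // Confirm, dismiss, or archive cascade. Nothing else.
            ObligationStatus.Pending => new[]
            {
                ObligationStatus.Active, ObligationStatus.Dismissed, ObligationStatus.Expired
            },
            // Terminal states have no exits.
            ObligationStatus.Dismissed or ObligationStatus.Fulfilled
                or ObligationStatus.Waived or ObligationStatus.Expired
                => Array.Empty<ObligationStatus>(),
            // Remaining rows are omitted from this sketch on purpose.
            _ => throw new NotSupportedException(
                $"Transitions from {current} are not covered in this sketch.")
        };

    public static void EnsureTransitionAllowed(ObligationStatus from, ObligationStatus to)
    {
        var valid = GetValidNextStates(from);
        if (!valid.Contains(to))
        {
            throw new ObligationTransitionException(from, to, valid);
        }
    }
}
```

Calling EnsureTransitionAllowed(ObligationStatus.Pending, ObligationStatus.Fulfilled) throws, and the exception carries Active, Dismissed, and Expired as the states that would have been accepted.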

Creation Emits No Event

This is the second invariant that falls out of extract-then-confirm. The CreateAsync method on ObligationService explicitly does not write an event row:

await _obligationRepository.AddAsync(obligation, cancellationToken);
// Deliberately: NO event row here. Events represent transitions, not creation.
return obligation;

An event represents a decision about an existing thing. Creating a row is not a decision. If the AI extracts 12 obligations and a human dismisses 3, the audit log should not read like 12 creations followed by 3 dismissals. It should read like 3 dismissals, because the other 9 have not yet been acted upon. They are proposals, not facts.

The moment a human confirms an obligation, the event log records the transition pending -> active with the actor, the timestamp, and any reason they provided. Now it is a fact. If the contract is renegotiated two months later and the obligation is waived, the log reads active -> waived with the waiver reason required. The full lineage is two events.

There is exactly one exception: recurring obligations. When a monthly payment obligation is fulfilled, the system spawns a new Active instance with next_due_date advanced by the recurrence interval. The spawn writes one event on the new row with FromStatus = "" and Reason = "auto-created from fulfilled parent (recurring): {parent.Id}". Empty from_status is the signal for a creation event rather than a transition, and the metadata column carries the parent ID. The provenance matters here because otherwise the spawn looks like a ghost, an Active obligation with no origin story.
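
A sketch of the spawn under stated assumptions: the record shapes, the intervalMonths parameter, and the "system:recurrence" actor string are hypothetical. The empty FromStatus and the Reason format are the parts taken from the description above.

```csharp
using System;

public sealed record Obligation(Guid Id, string Status, DateTime NextDueDate, Guid? ParentId);

public sealed record ObligationEvent(
    Guid ObligationId, string FromStatus, string ToStatus,
    string Actor, string Reason, DateTimeOffset CreatedAt);

public static class RecurrenceSketch
{
    // Called after the parent has been fulfilled. Returns the new instance
    // together with the single creation event written on the new row.
    public static (Obligation Child, ObligationEvent CreationEvent) SpawnNext(
        Obligation parent, int intervalMonths, DateTimeOffset now)
    {
        var child = new Obligation(
            Id: Guid.NewGuid(),
            Status: "active",
            NextDueDate: parent.NextDueDate.AddMonths(intervalMonths),
            ParentId: parent.Id);

        var creationEvent = new ObligationEvent(
            ObligationId: child.Id,
            FromStatus: "",                 // empty from-status marks a creation, not a transition
            ToStatus: "active",
            Actor: "system:recurrence",     // hypothetical actor name
            Reason: $"auto-created from fulfilled parent (recurring): {parent.Id}",
            CreatedAt: now);

        return (child, creationEvent);
    }
}
```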

That exception was debated. An earlier design kept the "no creation events ever" rule strict and relied on querying obligations.metadata.parent_id to reconstruct recurrence chains. I changed it because reconstruction in the event log is cheap (one SELECT, ordered by timestamp), whereas reconstruction across the obligations table means understanding a JSONB structure that no one else will be looking at three years from now. The event log is the canonical place an auditor goes. Putting provenance anywhere else is hoping they find it.

What Confirm Actually Costs

The friction of clicking confirm on every row is real. An AI-extracted MSA produces 8 to 20 obligations depending on contract type. A legal-ops team processing 30 new contracts a month is confirming several hundred obligations a month. That is manual review work the AI was supposed to eliminate.

The response to that is to be honest about what the AI eliminated. It did not eliminate the review. It eliminated the extraction. Before the engine, a paralegal read each contract and typed deadline dates into a spreadsheet. The typing was the bottleneck, not the reading. With extraction, the typing is gone. The review still happens. The operator reads each extracted row, checks the clause reference against the source document, and either confirms or dismisses.

The confidence score on each extracted row helps prioritize. Extractions above 0.90 confidence tend to have exact clause references and pass review in seconds. Extractions under 0.70 tend to be the cases worth scrutinizing. The engine surfaces both but treats them identically at the storage layer. Confidence gates prioritization, not state. Activating high-confidence rows automatically would re-introduce the false-positive risk the whole design was avoiding.
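
One way to picture "prioritization, not state" is a review queue that sorts and labels rows by confidence without touching their status. The ExtractedRow record, the band names, and the lowest-first ordering are assumptions; the 0.90 and 0.70 thresholds come from the paragraph above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record ExtractedRow(string Description, double Confidence, string Status = "pending");

public static class ReviewQueueSketch
{
    // A display hint for the reviewer. It never feeds back into Status.
    public static string Band(double confidence) =>
        confidence > 0.90 ? "quick-check"
        : confidence < 0.70 ? "scrutinize"
        : "standard";

    // Lowest confidence first, so the rows most likely to be wrong get attention early.
    public static IReadOnlyList<ExtractedRow> Prioritize(IEnumerable<ExtractedRow> rows) =>
        rows.OrderBy(r => r.Confidence).ToList();
}
```

Every row that comes out of Prioritize still has Status "pending", whatever its band.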

The Result in the Audit Log

On a single contract with twelve extracted obligations, where a human confirms nine and dismisses three, the obligation_events table holds twelve rows. Nine rows are pending -> active with actor = user:{tenantId} and no reason. Three rows are pending -> dismissed with actor stamps and the reason the operator typed. Zero rows represent creation, because creation was not a decision.

When the contract is archived eighteen months later, the archive cascade writes another nine rows, one per non-terminal obligation, as {prior_status} -> expired with actor = system:archive_cascade. The dismissed three already terminated so the cascade skips them.
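
The cascade can be sketched as a loop that skips terminal rows, which is also what makes it safe to run twice. The type and method names are assumptions, and the sketch works on an in-memory list where the real service would go through a repository.

```csharp
using System;
using System.Collections.Generic;

public enum ObligationStatus
{
    Pending, Active, Upcoming, Due, Overdue, Escalated, Disputed,
    Dismissed, Fulfilled, Waived, Expired
}

public sealed class Obligation
{
    public Guid Id { get; } = Guid.NewGuid();
    public ObligationStatus Status { get; set; }
}

public sealed record ObligationEvent(
    Guid ObligationId, string FromStatus, string ToStatus,
    string Actor, DateTimeOffset CreatedAt);

public static class ArchiveCascadeSketch
{
    public const string Actor = "system:archive_cascade";

    public static bool IsTerminal(ObligationStatus status) =>
        status is ObligationStatus.Dismissed or ObligationStatus.Fulfilled
            or ObligationStatus.Waived or ObligationStatus.Expired;

    // Expires every non-terminal obligation and returns one event per row changed.
    // Terminal rows are skipped, so a second run produces no events.
    public static List<ObligationEvent> Expire(IEnumerable<Obligation> obligations, DateTimeOffset now)
    {
        var events = new List<ObligationEvent>();
        foreach (var obligation in obligations)
        {
            if (IsTerminal(obligation.Status)) continue;

            events.Add(new ObligationEvent(
                obligation.Id,
                obligation.Status.ToString().ToLowerInvariant(), // {prior_status}
                "expired",
                Actor,
                now));
            obligation.Status = ObligationStatus.Expired;
        }
        return events;
    }
}
```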

Six months after archive, a regulator asks the company to explain a specific obligation that was dismissed. Zero forensic work. One query:

SELECT * FROM obligation_events WHERE obligation_id = $1 ORDER BY created_at;

One row comes back: the dismissal, with the timestamp, the user actor, and the reason text. The archive cascade wrote nothing for this obligation, because the row was already terminal when the contract archived. The regulator has their answer. No reconstruction across tables, no JSON parsing, no correlating timestamps across systems. The audit is the data.

That is what the Pending status buys. A system where every legally-binding commitment in the database was put there by a human with an identity and a timestamp, and the row that records the commitment is the same row the auditor queries six months later.
