Confidence Is Not Ownership

What should a finance queue do when two credible records point at the same case?

An invoice discrepancy and a contract breach can describe the same dispute. They can also describe two different disputes with the same counterparty, the same currency, and nearly the same amount. That is the trap in a finance operations queue. The data looks related before anyone has proved ownership.

The Workbench ingests exceptions from invoice reconciliation, transaction reconciliation, contract lifecycle events, webhook dead letters, manual operator entry, and signed Hub fanout. Every source carries its own identifiers. Some are reliable. Some are only reliable inside the upstream tool that produced them.

I had to decide what the system should do when a new exception looks like it belongs to an existing dispute.

The tempting version is simple: compute a score, pick the highest dispute, attach the exception. That makes demos feel clean. Exceptions flow in, disputes become richer, and the queue stays small.

It is also how a finance system quietly corrupts its own audit trail.

Once an exception is attached to a dispute, every later action inherits that fact. SLA timers, resolution playbooks, Notification Hub events, audit PDF exports, and operator comments all treat the relationship as true. If the relationship was only probable, the system has converted probability into evidence.

That conversion is the real design problem.

The Scoring Shape

The correlator in src/drw/domain/correlator.clj uses seven signals. Source reference and entity id each carry 0.15. Counterparty carries 0.25. Currency carries 0.10. Amount carries 0.15. Date carries 0.10. Category carries 0.10.

Those weights are not secret. They are visible because this is a spec project. The structure matters more than the numbers.

Counterparty is the gate. A candidate dispute is not eligible unless it belongs to the same tenant, is not terminal, has the same counterparty, and falls within the correlation window. Only then does scoring begin.

That means the correlator is not a general similarity search. It is a tenant-scoped dispute ownership test.
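The gate can be sketched as a single predicate. This is a minimal illustration, not the real candidate-eligible?: the name eligible-sketch?, the map keys, the terminal status names, and the 30-day window in the usage are all assumptions made for the example.

```clojure
;; Assumed status names; the real terminal set is not shown in the text.
(def terminal-statuses #{:resolved :closed})

(defn eligible-sketch?
  "Hypothetical stand-in for candidate-eligible?. All four gates must pass
   before a dispute may be scored at all."
  [tenant-id exception dispute {:keys [window-ms]}]
  (boolean
   (and (= tenant-id (:tenant-id exception) (:tenant-id dispute))
        (not (contains? terminal-statuses (:status dispute)))
        (= (:counterparty exception) (:counterparty dispute))
        (<= (Math/abs (long (- (:observed-at-ms exception)
                               (:created-at-ms dispute))))
            window-ms))))

(def day-ms (* 24 60 60 1000))
(def gate-cfg {:window-ms (* 30 day-ms)}) ; hypothetical window length

(def gate-exception
  {:tenant-id "t-1" :counterparty "acme" :observed-at-ms (* 2 day-ms)})

(def gate-dispute
  {:tenant-id "t-1" :status :open :counterparty "acme" :created-at-ms 0})

(eligible-sketch? "t-1" gate-exception gate-dispute gate-cfg)
;; => true
(eligible-sketch? "t-1" gate-exception (assoc gate-dispute :tenant-id "t-2") gate-cfg)
;; => false
(eligible-sketch? "t-1" gate-exception (assoc gate-dispute :status :closed) gate-cfg)
;; => false
```

Nothing in the predicate looks at amount, date proximity, or category. Those belong to scoring, which only runs for disputes that pass.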

The amount signal has a 10 percent tolerance and only scores when the currency also matches. The date signal checks whether the exception was observed within 72 hours of the dispute creation time. The source reference and entity id signals compare against exceptions already attached to the candidate dispute, not just fields on the dispute itself.

That last part matters. A dispute becomes easier to recognize as it accumulates evidence. The first invoice mismatch may create the dispute. A later webhook dead letter with the same upstream reference can now match the attached evidence, even if the dispute record itself does not carry that reference.
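A rough sketch of how the seven signals could add up, using the weights stated above. The function names, the map keys, the choice of the dispute amount as the base for the 10 percent tolerance, and the symmetric 72-hour check are assumptions for illustration; the real correlator may differ on each.

```clojure
;; Weights as stated in the text. Key names are illustrative.
(def weights
  {:source-ref 0.15 :entity-id 0.15 :counterparty 0.25
   :currency 0.10 :amount 0.15 :date 0.10 :category 0.10})

(def window-72h-ms (* 72 60 60 1000))

(defn amount-match?
  "Scores only when currency matches and the amounts differ by at most
   10 percent of the dispute amount (the base is an assumption)."
  [exception dispute]
  (and (= (:currency exception) (:currency dispute))
       (<= (Math/abs (double (- (:amount exception) (:amount dispute))))
           (* 0.10 (Math/abs (double (:amount dispute)))))))

(defn date-match?
  [exception dispute]
  (<= (Math/abs (long (- (:observed-at-ms exception) (:created-at-ms dispute))))
      window-72h-ms))

(defn fired-signals
  "Returns the set of signals that fire. Source reference and entity id
   are compared against the dispute and every exception attached to it."
  [exception dispute attached]
  (let [evidence (cons dispute attached)
        refs     (set (keep :source-ref evidence))
        ids      (set (keep :entity-id evidence))]
    (cond-> #{}
      (contains? refs (:source-ref exception))              (conj :source-ref)
      (contains? ids (:entity-id exception))                (conj :entity-id)
      (= (:counterparty exception) (:counterparty dispute)) (conj :counterparty)
      (= (:currency exception) (:currency dispute))         (conj :currency)
      (amount-match? exception dispute)                     (conj :amount)
      (date-match? exception dispute)                       (conj :date)
      (= (:category exception) (:category dispute))         (conj :category))))

(defn score
  "Sums the weights of the fired signals."
  [fired]
  (reduce + 0.0 (map weights fired)))

;; The dispute record carries no upstream reference of its own.
(def example-dispute
  {:counterparty "acme" :currency "EUR" :amount 1000
   :created-at-ms 0 :category :billing})

;; The first invoice mismatch, already attached as evidence.
(def example-attached [{:source-ref "inv-7781" :entity-id "e-1"}])

;; A later webhook dead letter with the same upstream reference.
(def example-exception
  {:source-ref "inv-7781" :entity-id "e-9" :counterparty "acme"
   :currency "EUR" :amount 1050
   :observed-at-ms (* 24 60 60 1000) :category :billing})

(fired-signals example-exception example-dispute example-attached)
;; => #{:source-ref :counterparty :currency :amount :date :category}
```

With the attached evidence removed, :source-ref no longer fires and the same exception scores 0.15 lower. That gap is the accumulated evidence doing its work.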

The core function is plain Clojure:

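(defn score-candidates
  "Returns review-worthy candidate disputes for an exception, best first."
  ([tenant-id exception disputes attached]
   (score-candidates tenant-id exception disputes attached {}))
  ([tenant-id exception disputes attached opts]
   (let [cfg (merge default-config opts)
         review (get-in cfg [:thresholds :review])]
     (->> disputes
          ;; Remember input position so ties sort deterministically.
          (map-indexed vector)
          ;; Gate first: ineligible disputes never receive a score.
          (filter (fn [[_ dispute]]
                    (candidate-eligible? tenant-id exception dispute cfg)))
          (map (fn [[idx dispute]]
                 (assoc (score-candidate tenant-id exception dispute attached cfg)
                        :sort-index idx)))
          ;; Keep only candidates at or above the review threshold.
          (filter #(>= (:score %) review))
          ;; Highest score first; equal scores keep input order.
          (sort-by (juxt (comp - :score) :sort-index))
          (mapv #(dissoc % :sort-index))))))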

There are two details in that function I care about.

First, eligibility happens before scoring. A cross-tenant dispute receives no score. A terminal dispute receives no score. A different counterparty receives no score. The function does not let a high amount or date match compensate for a broken boundary.

Second, ties preserve input order through :sort-index. That is not glamorous. It prevents unstable review queues where two equal candidates swap positions between renders and make operators think the system changed its mind.
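The tie-breaking step is easy to see in isolation. The snippet below reuses the same sort key on made-up data; the dispute ids and scores are invented for the example, and rank-candidates is a hypothetical helper, not a function from the project.

```clojure
(def scored
  [{:dispute-id "d-1" :score 0.80}
   {:dispute-id "d-2" :score 0.91}
   {:dispute-id "d-3" :score 0.80}])

(defn rank-candidates
  "Sorts by descending score, then by original position for equal scores."
  [candidates]
  (->> candidates
       (map-indexed (fn [idx c] (assoc c :sort-index idx)))
       (sort-by (juxt (comp - :score) :sort-index))
       (mapv #(dissoc % :sort-index))))

(mapv :dispute-id (rank-candidates scored))
;; => ["d-2" "d-1" "d-3"]
```

The two 0.80 candidates come back in the order they went in, on every call, so the review queue renders the same way each time.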

What I Was Wrong About

I initially treated the correlation score as the hard part.

It was not. The harder question was what the system does with the score.

There are three possible outcomes in src/drw/domain/exceptions.clj. If no candidate passes review, the exception creates a new dispute and attaches immediately. If the best candidate hits the auto-merge band and auto-merge is explicitly enabled, the exception attaches and records an auto-merged correlation. Otherwise, the system creates pending correlation records and emits a dispute.correlation_pending event.

That middle branch is intentionally hard to reach. The .env.example values set AUTO_MERGE_THRESHOLD=0.92 and REVIEW_THRESHOLD=0.70, while the source correlator defaults are lower for unit-level behavior. The runtime config is stricter because this is finance operations. False attachment costs more than a larger review queue.
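The three outcomes reduce to a small decision function. This is a sketch under assumptions: decide-outcome, the config keys, and the shape of the returned maps are hypothetical, and the input is taken to be the already filtered, best-first vector that score-candidates returns. Only the event name and the 0.92 threshold come from the text above.

```clojure
(defn decide-outcome
  "candidates: scored candidates at or above the review threshold,
   sorted best first. Returns a description of what should happen."
  [candidates {:keys [auto-merge-enabled? auto-merge-threshold]}]
  (let [best (first candidates)]
    (cond
      ;; Nothing passed review: the exception starts its own dispute.
      (nil? best)
      {:outcome :new-dispute}

      ;; Auto-merge needs both the policy switch and the higher band.
      (and auto-merge-enabled? (>= (:score best) auto-merge-threshold))
      {:outcome :auto-merged :dispute-id (:dispute-id best)}

      ;; Everything else becomes operator work.
      :else
      {:outcome     :pending-review
       :event       "dispute.correlation_pending"
       :dispute-ids (mapv :dispute-id candidates)})))

(def strong-match [{:dispute-id "d-1" :score 0.95}])

(:outcome (decide-outcome [] {:auto-merge-enabled? true :auto-merge-threshold 0.92}))
;; => :new-dispute
(:outcome (decide-outcome strong-match {:auto-merge-enabled? false :auto-merge-threshold 0.92}))
;; => :pending-review
(:outcome (decide-outcome strong-match {:auto-merge-enabled? true :auto-merge-threshold 0.92}))
;; => :auto-merged
```

A 0.95 score with auto-merge disabled still lands in review. The score alone never attaches anything.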

What surprised me is that pending correlation became a domain object, not a UI convenience.

The queue needed an id, a tenant id, an exception id, a target dispute id, a score, a rationale, a status, a decided-by user, and decision timestamps. That is a lot of structure for something that could have been a modal row.

But the moment an operator accepts or rejects a candidate, that decision becomes part of the case history. A rejected match is useful evidence. It says someone looked at the overlap and decided the exception did not belong there. If the same upstream source sends a related item later, the prior rejection explains why the system did not combine the cases earlier.

That is why correlation records live next to exceptions and disputes instead of inside a transient UI response.
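As a rough picture of that record, here is one possible shape. The key names, the status keywords, and both helper functions are assumptions for illustration; only the list of fields comes from the description above.

```clojure
(defn pending-correlation
  "Builds an undecided correlation record. `now` is supplied by the caller."
  [tenant-id exception-id dispute-id score rationale now]
  {:id                (str (java.util.UUID/randomUUID))
   :tenant-id         tenant-id
   :exception-id      exception-id
   :target-dispute-id dispute-id
   :score             score
   :rationale         rationale
   :status            :pending
   :created-at        now
   :decided-by        nil
   :decided-at        nil})

(defn decide-correlation
  "Records an operator decision. A record can be decided only once, so a
   rejection stays in the history instead of being overwritten."
  [correlation decision user-id now]
  {:pre [(= :pending (:status correlation))
         (contains? #{:accepted :rejected} decision)]}
  (assoc correlation
         :status     decision
         :decided-by user-id
         :decided-at now))

(def example-correlation
  (pending-correlation "t-1" "x-42" "d-7" 0.81
                       "counterparty, currency, amount, date" 1000))

(def example-rejection
  (decide-correlation example-correlation :rejected "u-3" 2000))

(select-keys example-rejection [:status :decided-by :decided-at])
;; => {:status :rejected, :decided-by "u-3", :decided-at 2000}
```

The precondition is the part that makes this a domain object rather than a UI row: once decided, the record cannot be re-decided through this function, and an attempt fails with an AssertionError.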

The Failure Mode Hidden In Good Matches

The most dangerous false match is not a ridiculous one.

It is the match that looks reasonable.

Same counterparty. Same currency. Amount within 10 percent. Observed inside three days. Category is billing. If those signals point to the wrong open dispute, the system does not look broken. It looks efficient.

The damage appears later. A Workflow Engine playbook starts against the wrong case. A Notification Hub event tells an operator that the dispute is ready for resolution. The audit PDF now contains an exception that belongs somewhere else. Nobody sees the root mistake because every downstream artifact is internally consistent.

That is the kind of bug that worries me more than a 500 response.

A 500 stops the flow. A wrong attachment keeps moving.

The design answer was to make confidence create a decision, not mutate the dispute. A review-band candidate becomes work for an operator. The UI exposes accept and reject actions. The API carries the same boundary. The audit log records correlation creation and later decisions.

The Workbench still supports auto-merge, but it is a policy choice. It has to be enabled. The score has to clear the higher band. The code does not pretend the existence of a scoring function means the business has accepted the risk.

Why Tenant Scope Belongs Inside The Algorithm

Tenant isolation is usually discussed at the HTTP layer. API key comes in, tenant id gets attached to the request, handlers filter queries.

That is necessary, but it is not enough here.

Correlation is a cross-entity operation by nature. It compares a new exception against many existing disputes and attached exceptions. If the algorithm accepts a list that accidentally contains another tenant's disputes, the HTTP layer is already too far away to save it.

So score-candidate checks tenant equality itself, and score-candidates filters candidates through candidate-eligible?, which repeats the tenant, status, counterparty, and time-window checks before scoring.

This is defensive duplication with a purpose. The route should pass tenant-scoped collections. The domain should still reject anything outside the tenant boundary. In a single-tenant fixture, this looks redundant. In a two-tenant test, it is the difference between "the route behaved" and "the invariant held."
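A two-tenant test of that invariant might look like the sketch below. The helper names, the fixture data, and the choice to return nil for a broken boundary are assumptions; what score-candidate actually returns on a tenant mismatch is not shown in this post.

```clojure
(defn same-tenant?
  [tenant-id exception dispute]
  (= tenant-id (:tenant-id exception) (:tenant-id dispute)))

(defn guarded-score
  "Returns nil, meaning no score, when the tenant boundary is broken.
   score-fn is a hypothetical stand-in for the real scoring logic."
  [tenant-id exception dispute score-fn]
  (when (same-tenant? tenant-id exception dispute)
    (score-fn exception dispute)))

;; Two-tenant fixture: the caller "forgot" to filter by tenant.
(def mixed-disputes
  [{:id "d-1" :tenant-id "t-1"}
   {:id "d-2" :tenant-id "t-2"}])

(def t1-exception {:id "x-1" :tenant-id "t-1"})

(defn scored-ids
  "Ids of the disputes that received any score at all."
  [tenant-id exception disputes score-fn]
  (->> disputes
       (keep (fn [d]
               (when (some? (guarded-score tenant-id exception d score-fn))
                 (:id d))))
       vec))

;; Even a scorer that calls everything a perfect match cannot cross tenants.
(scored-ids "t-1" t1-exception mixed-disputes (constantly 1.0))
;; => ["d-1"]
```

The fixture deliberately passes an unfiltered list. If the test only ever receives tenant-scoped collections, it proves the route behaved and says nothing about the domain.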

The same philosophy appears in reports. The audit PDF renderer captures a tenant snapshot, renders with strict token lookup, and the setup check renders two tenants to make sure one tenant's identity literals never appear in the other tenant's output.

The theme is the same: do not trust a boundary because a previous layer probably handled it.

The Result

The finished local build passed 160 tests with 869 assertions. The full-flow E2E drives invoice adapter polling into exception creation, assignment, investigation, Workflow Engine resolution polling, Notification Hub event capture, and audit PDF generation.

The number I care about most is smaller, and it comes from the dashboard guard. The guard caught a practical operations failure: the first dashboard shape rendered every dispute link in the tenant fixture. The fix capped the overview at 50 open disputes and kept totals intact.
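The shape of that fix can be sketched in a few lines. The function name, the :open status keyword, and the returned keys are assumptions; only the cap of 50 and the intact totals come from the build described here.

```clojure
(def overview-limit 50)

(defn dashboard-overview
  "Counts every open dispute but returns at most overview-limit of them
   for rendering."
  [disputes]
  (let [open (filterv #(= :open (:status %)) disputes)]
    {:total-open (count open)
     :shown      (vec (take overview-limit open))}))

;; A fixture with 120 open disputes and 5 closed ones.
(def fixture-disputes
  (concat
   (for [i (range 120)] {:id (str "d-" i) :status :open})
   (for [i (range 5)]   {:id (str "c-" i) :status :closed})))

(let [{:keys [total-open shown]} (dashboard-overview fixture-disputes)]
  [total-open (count shown)])
;; => [120 50]
```

The count is computed before the cap is applied, so the operator still sees how much work exists even though the page only links to the first 50.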

That is the Workbench in miniature. Preserve the facts. Limit the surface. Make the operator decide when the machine only has a probability.

Confidence is useful. Ownership is a human or policy decision.
