Architectural Brief: Dispute Resolution Workbench
Finance disputes stop being manageable when the evidence lives in five different tools and the audit packet has to be assembled after the fact. This architecture puts ingestion, correlation, ownership, resolution, notification, and report export behind one tenant boundary.
The core queue has to work when every ecosystem client is turned off. That constraint shaped the system more than the language choice.
System Topology
Infrastructure Decisions
- Compute: Clojure 1.12 with Pedestal 0.7.2. Chose it over a larger MVC framework because the application is mostly data transitions, interceptors, and background jobs. Pedestal keeps the tenant and audit chain explicit on every route.
- Data Layer: Datomic schema plus PostgreSQL 16 storage config. Chose it over a plain SQL CRUD model because dispute history matters as much as current dispute state. Domain writes are still process-local until durable Datomic transactions are wired. That is a stated current limit, not a hidden claim of durability.
- Cache and Queue: Redis 7 through Carmine. Chose it over a required broker because most work is local polling and state transition, not high-volume event transport. Redis is enough for cache and queue behavior without making NATS mandatory for local operation.
- UI: Server-rendered Hiccup with HTMX and Tailwind. Chose it over a SPA because operators need forms, queues, detail pages, and downloads. The dashboard is a work surface, not a consumer app.
- Ecosystem Boundaries: Notification Hub and Workflow Engine clients are disabled by default. Chose that over required service wiring because the manual dispute queue must still run when the portfolio services are not configured.
- Testing: clojure.test, Kaocha, Testcontainers, and real HTTP E2E. Chose container-backed upstream tests over in-memory adapter fakes because request paths, headers, cursor parameters, and stored source refs all sit at the HTTP boundary.
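The disabled-by-default boundary for ecosystem clients can be sketched as a small factory. This is a Python illustration of the control flow, not the project's Clojure code. The names `ClientSettings`, `NoOpClient`, `HttpClient`, `build_client`, and the `base_url` and `api_key` settings are hypothetical. It assumes the behavior stated under Hard Constraints: a disabled client returns a safe no-op result, and an enabled client with missing configuration fails.

```python
from dataclasses import dataclass
from typing import Optional


class MissingConfigError(Exception):
    """Raised when a client is enabled but not fully configured."""


@dataclass(frozen=True)
class ClientSettings:
    enabled: bool = False            # disabled unless explicitly turned on
    base_url: Optional[str] = None   # hypothetical setting names
    api_key: Optional[str] = None


class NoOpClient:
    """Stands in for a disabled integration; never touches the network."""

    def send(self, event: dict) -> dict:
        return {"status": "skipped", "reason": "client disabled"}


class HttpClient:
    """Placeholder for a real client; the transport is omitted in this sketch."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        self.api_key = api_key

    def send(self, event: dict) -> dict:
        raise NotImplementedError("transport omitted in this sketch")


def build_client(settings: ClientSettings):
    # Disabled: the core queue keeps working and callers get a safe result.
    if not settings.enabled:
        return NoOpClient()
    # Enabled but incomplete: fail at wiring time instead of at first send.
    if not settings.base_url or not settings.api_key:
        raise MissingConfigError("enabled client requires base_url and api_key")
    return HttpClient(settings.base_url, settings.api_key)
```

With this shape, callers never branch on whether an integration is configured. They call `send` and the factory decides what that means.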
Constraints That Shaped the Design
- Input: Exceptions arrive from manual API calls, signed Hub ingress, invoice discrepancies, contract breaches, transaction mismatches, and webhook dead letters.
- Output: The Workbench produces tenant-scoped disputes, correlation candidates, workflow execution triggers, Notification Hub events, SLA breach events, and dispute audit PDF artifacts.
- Scale Handled: The dashboard guard uses a five-figure dispute fixture and keeps the overview bounded to 50 links while preserving totals. The adapter batch cap is 100 exceptions per run from .env.example.
- Hard Constraints: Correlation cannot merge across tenants. Terminal disputes cannot accept new exceptions or comments. Hub ingress has a 300-second replay window and duplicate delivery memory. Enabled adapters fail when configuration is missing, while disabled adapters return safe no-op results.
- Operational Boundary: Deployment files exist with a Traefik label for disputes.kingsleyonoh.com, but the registry marks the project as not deployed and the host is unreachable. Content should treat it as shipped code, not a live service.
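The Hub ingress constraint above (signed payload, 300-second replay window, duplicate delivery memory) can be sketched in Python as follows. This is an illustration, not the project's Clojure code. The signing scheme (HMAC-SHA256 over `timestamp.body`), the `delivery_id` parameter, and the class and exception names are assumptions; the brief only states that ingress is signed, time-bounded, and deduplicated.

```python
import hashlib
import hmac
import time

REPLAY_WINDOW_SECONDS = 300  # from the hard constraints


class IngressRejected(Exception):
    """Raised when a Hub delivery fails any ingress check."""


class HubIngressVerifier:
    def __init__(self, secret: bytes, clock=time.time) -> None:
        self._secret = secret
        self._clock = clock
        # Process-local duplicate memory; eviction is omitted in this sketch.
        self._seen = set()

    def sign(self, timestamp: int, body: bytes) -> str:
        # Assumed scheme: HMAC-SHA256 over "<timestamp>.<body>".
        message = str(timestamp).encode() + b"." + body
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def verify(self, delivery_id: str, timestamp: int, body: bytes, signature: str) -> None:
        # 1. Reject anything outside the replay window.
        if abs(self._clock() - timestamp) > REPLAY_WINDOW_SECONDS:
            raise IngressRejected("timestamp outside replay window")
        # 2. Constant-time signature comparison.
        if not hmac.compare_digest(self.sign(timestamp, body), signature):
            raise IngressRejected("bad signature")
        # 3. Remember accepted deliveries so a retry is not processed twice.
        if delivery_id in self._seen:
            raise IngressRejected("duplicate delivery")
        self._seen.add(delivery_id)
```

Binding the timestamp into the signed message means a captured request cannot be replayed later with a fresh timestamp, so the window check and the signature check reinforce each other.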
Core Contracts
The tenant boundary is the first contract. Protected API routes bind a tenant from X-API-Key, resolve only through the tenant store, and return cross-tenant misses as not found. Hub ingress uses a different path: signed payload plus X-Hub-Tenant-Slug. That separation matters because public Hub delivery should never inherit the normal operator auth model.
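A minimal Python sketch of the protected-route rule, assuming an in-memory tenant store and dispute map. It is an illustration, not the project's Clojure code. `TenantStore`, `get_dispute`, and the 401 response for an unknown key are hypothetical; the brief only specifies that the tenant is bound from `X-API-Key` and that cross-tenant misses look like not found.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dispute:
    id: str
    tenant_id: str
    status: str


class TenantStore:
    """Hypothetical store mapping API keys to tenant ids."""

    def __init__(self, keys_to_tenants: dict) -> None:
        self._keys = dict(keys_to_tenants)

    def tenant_for_key(self, api_key: Optional[str]) -> Optional[str]:
        return self._keys.get(api_key) if api_key else None


def get_dispute(headers: dict, tenants: TenantStore, disputes: dict, dispute_id: str):
    """Return (http_status, dispute_or_None) for a protected read."""
    tenant_id = tenants.tenant_for_key(headers.get("X-API-Key"))
    if tenant_id is None:
        return 401, None  # assumed status for a missing or unknown key
    dispute = disputes.get(dispute_id)
    # A dispute owned by another tenant is indistinguishable from a missing one.
    if dispute is None or dispute.tenant_id != tenant_id:
        return 404, None
    return 200, dispute
```

Returning the same 404 for "does not exist" and "belongs to someone else" keeps dispute ids from leaking across tenants.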
The correlation boundary is the second contract. The correlator scores seven signals: source reference, entity id, counterparty, currency, amount, date, and category. It will not even consider a dispute unless tenant, counterparty, status, and correlation window checks pass. Scoring starts after eligibility, not before it.
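The eligibility-then-scoring order can be sketched in Python as follows. This is an illustration, not the project's Clojure correlator. The equal signal weights, the 14-day window, the terminal status names, and the single `Record` shape used for both exceptions and disputes are all assumptions; the seven signal names, the four eligibility checks, and the 0.70 review threshold come from this brief.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

REVIEW_THRESHOLD = 0.70                       # from the decision log
WINDOW_DAYS = 14                              # hypothetical correlation window
TERMINAL = frozenset({"resolved", "closed"})  # hypothetical terminal statuses
SIGNALS = ("source_ref", "entity_id", "counterparty",
           "currency", "amount", "occurred_on", "category")


@dataclass(frozen=True)
class Record:
    tenant_id: str
    counterparty: str
    source_ref: str
    entity_id: str
    currency: str
    amount: int            # minor units, to avoid float comparison
    occurred_on: date
    category: str
    status: str = "open"


def eligible(exc: Record, dispute: Record) -> bool:
    """Gate: tenant, counterparty, status, and window must all pass."""
    return (
        exc.tenant_id == dispute.tenant_id
        and exc.counterparty == dispute.counterparty
        and dispute.status not in TERMINAL
        and abs((exc.occurred_on - dispute.occurred_on).days) <= WINDOW_DAYS
    )


def score(exc: Record, dispute: Record) -> Optional[float]:
    """Fraction of matching signals, or None when the pair is ineligible."""
    if not eligible(exc, dispute):
        return None
    matched = sum(getattr(exc, s) == getattr(dispute, s) for s in SIGNALS)
    return matched / len(SIGNALS)


def needs_review(exc: Record, dispute: Record) -> bool:
    s = score(exc, dispute)
    return s is not None and s >= REVIEW_THRESHOLD
```

Because counterparty appears in both lists, the counterparty signal always matches for an eligible pair in this sketch. Returning `None` rather than `0.0` for ineligible pairs keeps "never considered" distinct from "considered and scored low".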
The audit boundary is the third contract. Tenant snapshots are captured before PDF rendering, missing template values throw, and each report artifact moves from generating to either ready or failed. The code also keeps two-tenant leakage tests close to the report domain so a tenant identity leak fails before a generated artifact is trusted.
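The snapshot-then-render order and the artifact states can be sketched in Python as follows. This is an illustration, not the project's Clojure report code: `string.Template` stands in for PDF rendering, and `ReportArtifact`, `generate_report`, and the `error` field are hypothetical names. The three states and the rule that missing template values throw come from this brief.

```python
from dataclasses import dataclass
from string import Template
from types import MappingProxyType

# generating is the only state with outgoing transitions.
ALLOWED = {"generating": {"ready", "failed"}, "ready": set(), "failed": set()}


@dataclass
class ReportArtifact:
    tenant_snapshot: MappingProxyType
    status: str = "generating"
    body: str = ""
    error: str = ""

    def transition(self, new_status: str) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        self.status = new_status


def render(template: str, snapshot) -> str:
    # substitute() raises KeyError on any missing placeholder value.
    return Template(template).substitute(snapshot)


def generate_report(tenant: dict, template: str) -> ReportArtifact:
    # Copy and freeze tenant identity before rendering starts.
    snapshot = MappingProxyType(dict(tenant))
    artifact = ReportArtifact(tenant_snapshot=snapshot)
    try:
        artifact.body = render(template, snapshot)
    except KeyError as missing:
        artifact.error = f"missing template value: {missing}"
        artifact.transition("failed")
        return artifact
    artifact.transition("ready")
    return artifact
```

Because the snapshot is a frozen copy, renaming the tenant after generation changes neither the stored identity nor the rendered body.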
Decision Log
| Decision | Alternative Rejected | Why |
|---|---|---|
| Pedestal interceptors for request path | Per-handler tenant and audit checks | Tenant binding, request ids, rate limits, auth, and audit context need one ordered path before handlers run. |
| Pending correlation queue | Auto-attach every plausible match | A match that clears the 0.70 review threshold is a hint, not a fact. Operators need accept or reject decisions when evidence is ambiguous. |
| Disabled-by-default ecosystem clients | Require Hub and Workflow Engine for startup | The queue is valuable even when integrations are off. Feature flags let the same code run standalone. |
| HMAC Hub ingress with replay headers | Treat Hub as a trusted internal caller | Public ingress needs signature verification, timestamp checks, tenant slug resolution, and duplicate delivery handling. |
| Frozen tenant snapshots for reports | Re-query tenant profile at export or reprint time | Audit packets must describe the tenant identity at generation time, not whatever the tenant renamed itself to later. |
| Bounded dashboard preview | Render all disputes on the root page | The high-volume dashboard guard proved the dashboard should summarize workload and link to the full queue. |
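The bounded dashboard preview from the last row can be sketched in Python as follows. This is an illustration, not the project's Clojure dashboard code. The function name, the `/disputes/<id>` link shape, and the choice to preview the first items in the given order are assumptions; the 50-link bound and the requirement to preserve totals come from the Scale Handled constraint.

```python
from collections import Counter

PREVIEW_LIMIT = 50  # overview link bound from the Scale Handled constraint


def dashboard_overview(disputes: list, limit: int = PREVIEW_LIMIT) -> dict:
    """Summarize the whole workload but link to at most `limit` disputes."""
    # Totals are computed over every dispute, not just the preview slice.
    by_status = Counter(d["status"] for d in disputes)
    # Hypothetical link shape; ordering is whatever the caller supplies.
    links = [f"/disputes/{d['id']}" for d in disputes[:limit]]
    return {
        "total": len(disputes),
        "by_status": dict(by_status),
        "links": links,
    }
```

A five-figure fixture exercises exactly this split: the counts reflect every dispute while the rendered page stays a fixed size.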
Scaling Limit
The current design is shaped for a tenant operations team, not a marketplace-scale dispute clearinghouse. At higher volume, three pieces would move first: process-local domain state would become durable Datomic transactions, adapter polling would move from per-source runs to partitioned queues, and duplicate Hub delivery memory would move out of an atom into Redis or Datomic.
The architecture already points that way because the boundaries are named. The risk is not a rewrite. The risk is promoting the current local stores to durable stores without weakening the tenant and audit contracts that made the Workbench coherent in the first place.