PO-2026-001. PO 2026 001. po2026001. PO#2026-001. 2026-001.
Every one of those strings refers to the same purchase order. The first came from our internal PO system. The rest came from vendors. One vendor's accounting package strips dashes. Another inserts hash prefixes. A third just writes the sequence number and assumes the buyer will figure it out. A database equality query on po_number finds zero matches on four of those five.
That's the starting condition for the matching engine. The po_reference field on the invoice exists, is populated, and does not equal any PO number in the database. The naive response is to call the invoice unmatched and let a human deal with it. The whole point of the system is to not do that.
The Matcher's Real Job
The pipeline I inherited from my own PRD had a single phase called "PO resolution." Given an invoice, find its PO. The implementation I started writing was also single-step: normalize both sides of the comparison, compare, return a match or null. Clean code, one function, one SQL query.
It handled about 60% of real invoices.
The remaining 40% broke in three different ways. Some vendors omitted the PO reference entirely, writing it into the line item description instead. Some vendors used their own internal sales order number, which had no textual overlap with our PO number at all. Some vendors sent the right reference but with enough format noise that neither exact nor normalized comparison found it.
Three different failure modes. One matching algorithm. The ratio was not going to get better.
The Cascade, Not a Fallback
I rewrote PoResolver as an ordered chain of strategies, each one with a decreasing confidence ceiling. The signature looks the same from the outside: give me a tenantId, a poReference, a vendorId, an invoiceAmount, and an invoiceDate, and I'll hand back a PoResolution with a PO ID, a confidence score, and a method label. What changed is the inside.
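Before the strategies, the contract. This is a sketch of what I take `PoResolution` and the resolver signature to be, reconstructed from the description above; the exact types and the `PoResolver` interface name are my assumption, not the real declaration:

```kotlin
import java.math.BigDecimal
import java.time.LocalDate
import java.util.UUID

// Sketch of the resolver contract described in the prose. Field names
// come from the article; types are assumptions.
data class PoResolution(
    val poId: UUID?,        // null when no strategy matched
    val confidence: Double, // 1.0 exact ... 0.0 unmatched
    val method: String,     // "exact", "normalized", "fuzzy", ...
)

interface PoResolver {
    suspend fun resolve(
        tenantId: UUID,
        poReference: String?, // may be absent on the invoice
        vendorId: UUID,
        invoiceAmount: BigDecimal,
        invoiceDate: LocalDate,
    ): PoResolution
}
```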
// Strategy 1: Exact match
if (poReference != null) {
    val exact = tenantPos.firstOrNull { row ->
        row[PurchaseOrderTable.poNumber] == poReference
    }
    if (exact != null) {
        return@newSuspendedTransaction PoResolution(
            poId = exact[PurchaseOrderTable.id],
            confidence = 1.0,
            method = "exact",
        )
    }
}

// Strategy 2: Normalized match
if (poReference != null) {
    val normalizedRef = normalize(poReference)
    val normalized = tenantPos.firstOrNull { row ->
        normalize(row[PurchaseOrderTable.poNumber]) == normalizedRef
    }
    if (normalized != null) {
        return@newSuspendedTransaction PoResolution(
            poId = normalized[PurchaseOrderTable.id],
            confidence = 0.95,
            method = "normalized",
        )
    }
}
The full resolver in PoResolver.kt runs five of these in order. Exact at confidence 1.0. Normalized (strip whitespace, remove the PO prefix, lowercase) at 0.95. Jaro-Winkler fuzzy match against every PO in the tenant, accepted above a 0.70 score, with the Jaro-Winkler score itself becoming the confidence. Vendor plus amount within 5% tolerance at 0.65. Vendor plus date within 90 days at 0.50. Then no match at 0.0.
Each strategy catches a failure the previous one could not.
Why the Confidence Decays on Purpose
The obvious objection to a cascade is that you could just run the weakest algorithm first and skip the rest. Fuzzy match solves both the exact case and the noisy case, right?
It doesn't. Not for the reason people expect.
Jaro-Winkler's 0.70 threshold is tuned for normal PO numbers. Two different POs issued a week apart to the same vendor will often score 0.78 against each other. PO-2026-001 and PO-2026-008 share the prefix, share the date segment, and differ only in the sequence number. Run fuzzy match first and you will occasionally match an invoice to a PO that is not the correct one. The system will have no way to know it's wrong, because the score says "probable match."
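You can see the problem with a standalone Jaro-Winkler, which is easy to write from the textbook definition. (The production code presumably uses a library implementation; this minimal version is mine, for illustration only.)

```kotlin
// Minimal Jaro similarity: counts matching chars within a sliding
// window, then penalizes transpositions among the matched chars.
fun jaro(a: String, b: String): Double {
    if (a == b) return 1.0
    val window = maxOf(a.length, b.length) / 2 - 1
    val aMatched = BooleanArray(a.length)
    val bMatched = BooleanArray(b.length)
    var matches = 0
    for (i in a.indices) {
        val lo = maxOf(0, i - window)
        val hi = minOf(b.length - 1, i + window)
        for (j in lo..hi) {
            if (!bMatched[j] && a[i] == b[j]) {
                aMatched[i] = true; bMatched[j] = true; matches++; break
            }
        }
    }
    if (matches == 0) return 0.0
    var transpositions = 0
    var j = 0
    for (i in a.indices) {
        if (!aMatched[i]) continue
        while (!bMatched[j]) j++
        if (a[i] != b[j]) transpositions++
        j++
    }
    val m = matches.toDouble()
    return (m / a.length + m / b.length + (m - transpositions / 2.0) / m) / 3
}

// Winkler variant: boost for a shared prefix of up to 4 chars, which is
// exactly what sibling PO numbers always have.
fun jaroWinkler(a: String, b: String): Double {
    val j = jaro(a, b)
    val prefix = a.zip(b).takeWhile { (x, y) -> x == y }.count().coerceAtMost(4)
    return j + prefix * 0.1 * (1 - j)
}
```

On `jaroWinkler("PO-2026-001", "PO-2026-008")` this scores above 0.9, far past the 0.70 threshold, precisely because the shared prefix and date segment dominate the one digit that actually distinguishes the orders.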
Running exact match first means any invoice that quotes the correct PO number verbatim gets it unambiguously, at confidence 1.0. The fuzzy matcher never runs on the 60% of invoices where exact works. It only runs on the minority where no exact or normalized match exists, and among those, only the cases with a close-enough string similarity get scored. The cascade constrains the search space before the weaker algorithms touch it.
Confidence 0.95 for normalized, 0.70-ish for fuzzy, 0.65 for vendor+amount, 0.50 for vendor+date. Those numbers are not arbitrary rankings. They are honest confidence statements about the evidence the match is built on. A 0.50 vendor+date match means: this invoice arrived from a vendor we have a PO with, issued within 90 days of the invoice date. That is weak evidence. It might be the right PO. It's enough to avoid the unmatched state, enough to surface for human review, and honest about how certain the system is.
The Normalization Function That Does Three Things
normalize() is short enough to fit on a screen. It strips whitespace, regex-removes the PO prefix variants, and lowercases. That's it:
private val PO_PREFIX_REGEX = Regex(
    """^(PO[-#\s]*)""",
    RegexOption.IGNORE_CASE,
)

internal fun normalize(value: String): String {
    return value
        .trim()
        .replace(PO_PREFIX_REGEX, "")
        .replace("\\s+".toRegex(), "")
        .lowercase()
}
Three lines of transformation. It cost me two afternoons.
What surprised me in the data was how many ways the same four characters can vary. PO-, PO , PO#, PO., po-, P.O.-, P.O. . The regex handles the dash, space, and hash forms, upper- or lowercase. The dotted forms we either accept as exact-match failures (they won't pass normalization either) or catch with fuzzy match at a lower confidence. I deliberately did not try to normalize every typographic variation. The regex would grow and the edge cases never end.
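A standalone copy of the function makes the coverage concrete. Note one subtlety: normalization keeps dashes, so spaced and dashed renderings of the same PO still produce different keys and fall through to fuzzy.

```kotlin
// Copy of the normalize() shown above, self-contained for demonstration.
val poPrefixRegex = Regex("""^(PO[-#\s]*)""", RegexOption.IGNORE_CASE)

fun normalize(value: String): String = value
    .trim()
    .replace(poPrefixRegex, "")
    .replace("\\s+".toRegex(), "")
    .lowercase()

// Variants the regex collapses to the same key:
//   normalize("PO-2026-001")   == "2026-001"
//   normalize("po#2026-001")   == "2026-001"
// Variants it deliberately does not:
//   normalize("PO 2026 001")   == "2026001"        (dashes are kept, so the
//                                                   spaced form differs)
//   normalize("P.O.-2026-001") == "p.o.-2026-001"  (dotted prefix missed)
```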
The better question was: what fraction of realistic PO references fails each strategy? I wrote a one-off script against a synthetic test set of 2,000 generated invoice references and measured how many failed each strategy in turn. Exact caught 58%. Normalized added another 27%. Fuzzy caught another 8%. Vendor+amount caught 3%. Vendor+date caught 2%. Unmatched: 2%.
The 58% exact match result was not a surprise. Most vendors copy-paste the PO number into their system once and then their system emits the same string forever. The 27% for normalized was the argument for writing the regex at all. The 3% vendor+amount rescue was the argument for keeping strategies 4 and 5 even though they look weak.
The One That Has No PO Reference at All
Strategies 4 and 5 run with no poReference string. The invoice arrived without a PO number written on it. Strategy 4 asks: are there any open POs for this vendor whose total matches the invoice total within 5%? Strategy 5 asks: are there any open POs for this vendor issued within 90 days of the invoice date?
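The two predicates are simple enough to sketch. The `OpenPo` shape and the field names are mine; the real code reads Exposed result rows, but the 5% tolerance and 90-day window come straight from the text:

```kotlin
import java.math.BigDecimal
import java.time.LocalDate
import java.time.temporal.ChronoUnit
import kotlin.math.abs

// Simplified stand-in for a purchase order row (names are illustrative).
data class OpenPo(val id: String, val total: BigDecimal, val issuedOn: LocalDate)

// Strategy 4: invoice total within 5% of the PO total.
fun amountMatches(po: OpenPo, invoiceAmount: BigDecimal): Boolean {
    val tolerance = po.total.toDouble() * 0.05
    return abs(po.total.toDouble() - invoiceAmount.toDouble()) <= tolerance
}

// Strategy 5: PO issued within 90 days of the invoice date, either side.
fun dateMatches(po: OpenPo, invoiceDate: LocalDate): Boolean =
    abs(ChronoUnit.DAYS.between(po.issuedOn, invoiceDate)) <= 90
```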
The confidence on these is deliberately low. 0.65 and 0.50. Both are below the auto-approve threshold (0.95) and below the standard-approval threshold (0.70). A match at this confidence routes to escalated review, every time, with no exception. Which is the right behavior.
What these strategies buy you is the difference between "unmatched, no PO" and "possibly matched, here's a weak candidate." The human reviewer gets a starting point. Instead of opening the invoice in an empty state and searching manually, they open it with a suggested PO pre-attached and a 0.50 confidence label that tells them: we think this is the one, don't trust us.
I was wrong about this initially. My first instinct was that low-confidence matches are worse than no match at all. A human opening an invoice with a suggested PO will often accept the suggestion without checking. That's a real risk. I solved it at the approval routing layer: any match below 0.70 confidence is force-routed to the escalated queue, and the approval UI shows the confidence score prominently with the breakdown. The reviewer sees {po_match: 0.50, method: "vendor_date"} and knows to verify the PO number manually before clicking approve.
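The routing rule itself is a three-way threshold check. The 0.95 and 0.70 cutoffs are from the text; the enum, the function name, and the decision that 0.95 itself auto-approves are my assumptions:

```kotlin
enum class ApprovalRoute { AUTO_APPROVE, STANDARD, ESCALATED }

// Force-route anything below 0.70 to the escalated queue, as described.
// Whether the boundaries are inclusive is an assumption on my part.
fun routeFor(confidence: Double): ApprovalRoute = when {
    confidence >= 0.95 -> ApprovalRoute.AUTO_APPROVE
    confidence >= 0.70 -> ApprovalRoute.STANDARD
    else -> ApprovalRoute.ESCALATED
}
```

A vendor+date match at 0.50 can never reach the first two branches, which is what makes the weak candidate safe to surface at all.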
The system does not make the low-confidence decision. It surfaces the low-confidence candidate.
The One Tradeoff I Accepted
The cascade loads every PO for the tenant into memory on every invoice. For the normalized and fuzzy strategies, there is no index I can use: normalization happens after the row is in memory, and Jaro-Winkler needs the full string of both candidates. The query is SELECT * FROM purchase_orders WHERE tenant_id = ? and then the filtering is in Kotlin.
At 10,000 active POs per tenant, this is fine. The loop is a few milliseconds. At 100,000 active POs, it would start being noticeable. At 1 million, the cascade would need a different shape: pre-compute a normalized PO number column, index it, and do the normalized lookup in SQL. The fuzzy match would need a trigram index (pg_trgm is already enabled in migration V010) backing a SIMILARITY query.
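Roughly, the scaled-up path would look like the following. The column and index names are my invention; the text only establishes that pg_trgm is enabled (migration V010) and that `similarity` would back the fuzzy lookup:

```kotlin
// Normalized lookup moves into SQL once a po_number_normalized column
// exists and is indexed (column name is hypothetical).
val normalizedLookupSql = """
    SELECT id FROM purchase_orders
    WHERE tenant_id = ? AND po_number_normalized = ?
""".trimIndent()

// Trigram-backed candidate fetch: % is pg_trgm's similarity operator.
// A short candidate list comes back and Jaro-Winkler re-scores it in
// application code, keeping the existing confidence semantics.
val fuzzyCandidatesSql = """
    SELECT id, po_number, similarity(po_number, ?) AS sim
    FROM purchase_orders
    WHERE tenant_id = ? AND po_number % ?
    ORDER BY sim DESC
    LIMIT 10
""".trimIndent()
```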
I have not built that yet. The largest tenant on the system has about 3,000 open POs at any time, and the current approach takes under 10 milliseconds per invoice on that workload. Building the indexed version now would be optimization ahead of the constraint.
The scale plan is in the code comments. When a tenant crosses the threshold, the resolver gets a second implementation path. The interface stays the same.
What the Match Record Actually Stores
Every match result writes a JSONB breakdown. {po_match: 0.95, method: "normalized"}. The method string is the strategy that matched. That field is what makes the whole design defensible to operators.
When a finance lead looks at a low-confidence invoice in the approval queue, they don't just see "0.73 confidence." They see the breakdown:
{
  "po_match": 0.95,
  "line_match": 0.72,
  "receipt_match": 0.50,
  "price_match": 0.78,
  "overall": 0.73
}
Plus the PO resolution method label: "normalized". Now the reviewer knows the PO was found with reasonable confidence (0.95, normalized match), but the receipt verification dropped to 0.50 (no receipt found on that PO line yet), which pulled the overall score down. The remediation is obvious: wait for the goods receipt and rerun matching. Not reject the invoice.
That information existed in the old binary matcher too. It just wasn't exposed. The system knew why a match was weak and threw the reason away. Writing it to JSONB was the cheap part. Deciding to write it at all was the design decision.
What I'd Reconsider
The 0.50 confidence floor on vendor+date matching is doing work, but it's doing work that's not fully visible in the match record. A "vendor_date" method label with 0.50 confidence tells an operator that the invoice had no PO reference and we fell back to the weakest strategy. It does not tell them whether the vendor has three other open POs in the same date range that were almost as good a match.
If I redid this, I'd return the top three candidates from each strategy, not just the best one. The alternatives field on ConfidenceScore exists for exactly this and is currently always empty. The code comment in ConfidenceScorer.kt references it as planned work. An invoice that fell to strategy 4 with three equally plausible POs should surface all three to the reviewer with their individual confidence scores. Right now the reviewer sees one candidate and either accepts or searches manually.
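The plumbing really is small. A sketch of what populating that alternatives list might look like; the `Candidate` type and the PO numbers are illustrative, not from the codebase:

```kotlin
// Hypothetical scored candidate from a single strategy pass.
data class Candidate(val poId: String, val confidence: Double)

// Keep the top three candidates instead of only the best one, so the
// reviewer sees a ranked list rather than a single weak suggestion.
fun topCandidates(scored: List<Candidate>, limit: Int = 3): List<Candidate> =
    scored.sortedByDescending { it.confidence }.take(limit)
```

For a strategy-4 invoice with three near-tied POs, the reviewer would see all three with their scores instead of one candidate that implies more certainty than the system has.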
That's the kind of detail that looks trivial and isn't. A single weak candidate is worse than a ranked list of weak candidates, because the single candidate suggests certainty the system doesn't have. Building the alternatives list is two afternoons of plumbing. Shipping without it is the tradeoff I made to hit the first release. It's the first thing I'd add in a second pass.