An invoice arrives. OCR pulls the supplier name off the letterhead: ACME INDUSTRIAL SVCS. Your Oracle vendor master has the same company stored as Acme Industrial Services Pty Ltd. To a human, those are obviously the same vendor. To a naive string comparison, they're two different companies.
Resolving the extracted supplier name to the correct Oracle vendor is one of the quietly hard problems in AP automation. Get it wrong in one direction and you create a duplicate vendor or block a legitimate invoice; get it wrong in the other and you pay the wrong supplier. The robust answer isn't one clever algorithm — it's a tiered strategy that tries the safest match first and only loosens its criteria when it has to.
Why exact matching is never enough
Supplier names diverge from the vendor master for entirely mundane reasons:
- Case and punctuation: ACME vs Acme, Smith & Co. vs Smith and Co.
- Legal suffixes: the invoice says Acme Industrial, the vendor master says Acme Industrial Pty Limited.
- Short names and DBAs: the printed name is a trading name; the ERP stores the registered entity.
- OCR noise: a stray character, a dropped space, a misread letter.
A single case-sensitive exact-match query against the vendor name will miss a large share of real invoices. But the opposite extreme, fuzzy-matching everything, is dangerous, because it will happily map Acme Industrial onto Acme International and route a payment to the wrong company. The discipline is to escalate deliberately, never to start loose.
Tier 1: Exact match
The first pass is a case-insensitive exact comparison against the supplier name in the vendor master. When it hits, you're done — this is the highest-confidence outcome and it should always be tried first.
In practice you normalize case on both sides (LOWER() on the column and the input) so ACME and Acme collapse to the same string. If a unique vendor comes back, accept it. Most invoices from established, repeat suppliers resolve right here, which is exactly what you want: the cheapest, safest tier handles the common case.
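As a rough illustration of this tier, the sketch below runs the case-insensitive comparison against an in-memory SQLite table. The `vendor_master` layout, the `vendor_id` column, and the sample rows are assumptions made for the example, not the actual Oracle schema.

```python
import sqlite3
from typing import Optional


def exact_match(conn: sqlite3.Connection, extracted: str) -> Optional[int]:
    """Return the vendor_id for a unique case-insensitive exact match, else None."""
    rows = conn.execute(
        "SELECT vendor_id FROM vendor_master "
        "WHERE LOWER(supplier_name) = LOWER(?) AND is_active = 1",
        (extracted.strip(),),
    ).fetchall()
    # Accept only an unambiguous hit; zero or several rows fall through to the next tier.
    return rows[0][0] if len(rows) == 1 else None


# Hypothetical in-memory vendor master, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_master (vendor_id INTEGER, supplier_name TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO vendor_master VALUES (?, ?, ?)",
    [
        (1, "Acme Industrial Services Pty Ltd", 1),
        (2, "Smith and Co", 1),
    ],
)

print(exact_match(conn, "ACME INDUSTRIAL SERVICES PTY LTD"))  # 1
print(exact_match(conn, "ACME INDUSTRIAL SVCS"))              # None
```

The abbreviated name returns `None` here, which is the signal to move on to the next tier rather than to loosen this one.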
Tier 2: Normalized prefix / LIKE match
When exact match fails, the cause is usually a suffix or trailing-token difference rather than a genuinely different company. This is where a normalized prefix match earns its keep.
The pattern that works well in production: take the first few significant characters of the extracted name and look up vendors whose stored name begins with that prefix:
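-- Tier 2: prefix match on the first four characters, compared case-insensitively
SELECT supplier_name
FROM vendor_master
WHERE LOWER(supplier_name) LIKE CONCAT(LOWER(LEFT(:extracted, 4)), '%')
  AND is_active = 1                  -- ignore retired vendors
ORDER BY LENGTH(supplier_name) ASC   -- prefer the shortest candidate
LIMIT 1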
This deliberately matches Acme against Acme Industrial Services Pty Ltd even though the full strings differ. Ordering by length and taking the shortest candidate biases toward the cleanest canonical record rather than an oddly-suffixed site variant. The same gap shows up in extraction pipelines: OCR produces a Textract-style short name while the training and vendor data are stored under the canonical ERP name. A prefix lookup bridges the two so downstream field mapping doesn't silently return nulls.
The prefix length matters. Too short (one or two characters) and you pull in unrelated vendors; four characters is a reasonable floor for most Latin-script names. Always constrain to active vendors so retired records don't win the match.
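The same lookup can be sketched in application code, which makes the prefix floor and the active-vendor filter explicit. This is a minimal sketch: `normalize`, `prefix_match`, the dictionary keys, and the sample vendors are all hypothetical, and the normalization is an ASCII-only simplification.

```python
import re
from typing import Optional

MIN_PREFIX_LEN = 4  # the floor suggested above; shorter prefixes pull in unrelated vendors


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace (ASCII only)."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def prefix_match(extracted: str, vendors: list[dict]) -> Optional[dict]:
    """Return the shortest active vendor whose normalized name starts with the prefix."""
    prefix = normalize(extracted)[:MIN_PREFIX_LEN]
    if len(prefix) < MIN_PREFIX_LEN:
        return None  # too little signal to match on
    candidates = [
        v for v in vendors
        if v["is_active"] and normalize(v["supplier_name"]).startswith(prefix)
    ]
    # Shortest stored name wins, mirroring ORDER BY LENGTH(...) ASC LIMIT 1.
    return min(candidates, key=lambda v: len(v["supplier_name"]), default=None)


# Hypothetical vendor records, for illustration only.
vendors = [
    {"supplier_name": "Acme Industrial Services Pty Ltd", "is_active": True},
    {"supplier_name": "Acme Industrial Services Pty Ltd - Perth Depot", "is_active": True},
    {"supplier_name": "Acme", "is_active": False},  # retired record, must not win
]

match = prefix_match("ACME INDUSTRIAL SVCS", vendors)
print(match["supplier_name"])       # Acme Industrial Services Pty Ltd
print(prefix_match("AB", vendors))  # None
```

The retired `Acme` record is the shortest name in the list, yet it is skipped because the filter runs before the length ordering.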
Tier 3: Fuzzy match — with a confidence floor
Only when both exact and normalized matching fail do you reach for fuzzy comparison — a similarity ratio (Levenshtein-style or sequence-matching) between the extracted name and candidate vendor names. This is the tier that handles OCR noise and word-order differences.
The non-negotiable rule here is a confidence floor. A fuzzy match below a high threshold (a similarity ratio in the region of 0.9) should not auto-resolve — it should flag for human review instead. Auto-accepting a 0.7 match is how Acme Industrial becomes Acme International in your payment run. Fuzzy matching exists to catch the near-misses, not to guess.
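One way to express the floor is sketched below with `difflib.SequenceMatcher` from the Python standard library, which is a sequence-matching ratio rather than a true Levenshtein distance. The function name, the 0.9 constant, and the candidate list are assumptions for illustration; a real threshold should be tuned against your own vendor data.

```python
from difflib import SequenceMatcher
from typing import Optional

AUTO_ACCEPT_FLOOR = 0.9  # assumed threshold, in line with the "region of 0.9" above


def fuzzy_match(extracted: str, candidates: list[str]) -> tuple[Optional[str], float, bool]:
    """Return (best candidate, similarity, auto_accept) for the extracted name."""
    best_name, best_score = None, 0.0
    for name in candidates:
        score = SequenceMatcher(None, extracted.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    # Below the floor the caller should route to human review, not resolve.
    return best_name, best_score, best_score >= AUTO_ACCEPT_FLOOR


# Hypothetical candidates, for illustration only.
candidates = ["Acme Industrial Services Pty Ltd", "Acme International Pty Ltd"]

# OCR misread "I" as "l": a one-character difference clears the floor.
name, score, auto_accept = fuzzy_match("Acme lndustrial Services Pty Ltd", candidates)
print(name, auto_accept)  # Acme Industrial Services Pty Ltd True

# A genuinely different company stays below the floor and is flagged for review.
name, score, auto_accept = fuzzy_match("Acme Industrial", ["Acme International"])
print(auto_accept)  # False
```

Returning the score alongside the decision lets the review queue show a reviewer why a candidate was proposed but not accepted.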
A strong identity anchor, when the invoice carries one, can short-circuit all three tiers. In jurisdictions with a business registration number — an ABN in Australia, for example — extracting and validating that number and matching it against the vendor master is far more reliable than any name comparison, because it's an exact key rather than a string. When you have it, use it first; fall back to the name tiers when you don't.
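For the ABN case, validation can be sketched with the published modulus-89 checksum: subtract 1 from the first digit, weight the eleven digits, and check that the sum divides by 89. The `match_by_abn` helper and the `vendors_by_abn` mapping are hypothetical stand-ins for a lookup against the vendor master.

```python
import re
from typing import Optional

ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(raw: str) -> bool:
    """Check that the input is 11 digits and passes the ABN checksum."""
    digits = re.sub(r"\s+", "", raw)
    if not re.fullmatch(r"[0-9]{11}", digits):
        return False
    values = [int(d) for d in digits]
    values[0] -= 1  # the checksum subtracts 1 from the leading digit
    return sum(w * v for w, v in zip(ABN_WEIGHTS, values)) % 89 == 0


def match_by_abn(raw: str, vendors_by_abn: dict[str, int]) -> Optional[int]:
    """Return a vendor id only for a checksum-valid ABN present in the mapping."""
    if not is_valid_abn(raw):
        return None  # likely an OCR misread; fall back to the name tiers
    return vendors_by_abn.get(re.sub(r"\s+", "", raw))


# Hypothetical mapping of ABN to vendor id, for illustration only.
vendors_by_abn = {"51824753556": 42}

print(is_valid_abn("51 824 753 556"))                  # True
print(is_valid_abn("51 824 753 557"))                  # False
print(match_by_abn("51 824 753 556", vendors_by_abn))  # 42
```

Validating the checksum before the lookup means a single misread digit fails fast instead of silently missing the vendor master.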
Tying the tiers together
The full resolution flow is an ordered cascade:
- Identity number (if present and valid) — match on the registration number; highest confidence.
- Tier 1 — exact, case-insensitive name match against active vendors.
- Tier 2 — normalized prefix / LIKE match, shortest canonical candidate wins.
- Tier 3 — fuzzy similarity, auto-accept only above a strict floor, otherwise route to a human.
- No confident match — never invent a vendor; surface the invoice for manual supplier assignment.
The principle throughout is that each tier is safer than the one below it, so you only loosen the criteria when the stricter tier has genuinely found nothing. This is what keeps a high auto-match rate from turning into a stream of mispayments.
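The cascade itself can be sketched as an ordered list of tier functions, where the first confident answer wins and an empty result goes to manual review. `Resolution`, `resolve_vendor`, and the stub tiers are hypothetical names for this example; real tiers would wrap the lookups described in the sections above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A tier takes the extracted invoice fields and returns a vendor id,
# or None when it has no confident match.
Tier = Callable[[dict], Optional[int]]


@dataclass
class Resolution:
    vendor_id: Optional[int]
    tier: str  # name of the tier that resolved it, or "manual_review"


def resolve_vendor(invoice: dict, tiers: Sequence[tuple[str, Tier]]) -> Resolution:
    """Try each tier in order, strictest first, and stop at the first confident hit."""
    for name, tier in tiers:
        vendor_id = tier(invoice)
        if vendor_id is not None:
            return Resolution(vendor_id, name)
    # Nothing was confident enough: never invent a vendor, hand off to a person.
    return Resolution(None, "manual_review")


# Stub tiers standing in for real lookups (hypothetical data).
by_abn = {"51824753556": 42}
by_name = {"acme industrial services pty ltd": 42}

tiers = [
    ("identity_number", lambda inv: by_abn.get(inv.get("abn", ""))),
    ("exact", lambda inv: by_name.get(inv["supplier_name"].lower())),
]

print(resolve_vendor({"supplier_name": "ACME INDUSTRIAL SVCS", "abn": "51824753556"}, tiers))
# Resolution(vendor_id=42, tier='identity_number')
print(resolve_vendor({"supplier_name": "Unknown Co"}, tiers))
# Resolution(vendor_id=None, tier='manual_review')
```

Recording which tier produced the match gives an audit trail and makes it straightforward to measure how often each tier is actually needed.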
Where EZ Cloud fits
EZ Cloud resolves every extracted supplier name against the live Oracle vendor master using exactly this escalating strategy — identity number first where available, then exact, then normalized prefix, then bounded fuzzy matching — and it refuses to auto-resolve a weak match rather than risk paying the wrong company. Reading the vendor master directly, the way Oracle exposes it, is part of our native ERP integration approach, and it's what makes touchless posting trustworthy rather than just fast.
Supplier resolution is foundational to everything downstream — duplicate detection, PO matching, posting — so it's worth getting precisely right. It's one of the details we built our Oracle EBS and Fusion integration to handle without creating duplicate vendors or misrouted payments.