Self-improving AP extraction: turning user corrections into better accuracy

Most invoice-capture tools extract a field the same way forever. The first time a supplier's invoice comes through, the engine guesses where the invoice number is. The ten-thousandth time, it makes the same guess — even if your AP team has corrected that exact field hundreds of times in between. The corrections get saved, the invoice gets paid, and the knowledge evaporates.

That's a broken feedback loop. The data is there; nothing reads it back. Closing that loop is what turns a static OCR step into an extraction engine that gets measurably better at your suppliers over time.

Where the signal already lives

Every time an AP clerk fixes an extracted field — changes "Reference #" being mapped to the wrong place, or corrects which line on the page is the grand total — that's a labeled training example. A well-built pipeline already stores these as user-correction records: for a given supplier, the raw label that appeared on the document and the canonical field it should map to.

The problem is purely architectural: the extraction step never consults that history. Fixing it doesn't require retraining a model from scratch on every invoice (an expensive, fragile pattern). The modern document-AI services — including AWS Bedrock Data Automation, which sits behind a Claude model — don't expose fine-tuning at all. What they do expose is the blueprint: a JSON schema that tells the model which fields to extract and how to recognize them on the page.

That blueprint is the lever. A field instruction like "the invoice total — this supplier prints it as 'Total Due'" converts an accumulated pile of corrections into a concrete extraction hint the model applies on every future document from that supplier.

Building supplier-specific hints from corrections

The mechanism is straightforward once you frame it correctly:

Group corrections by supplier and target field. For each field that's been corrected enough times, find the most common raw label the supplier actually uses.
Append, don't replace. Start from a generic base schema and add the supplier-specific hint to each field description. If that label happens to be absent on a given invoice, the generic instruction still works — the blueprint degrades gracefully.
Set a minimum threshold. A single correction is noise; it could be one clerk's one-off mistake. Require a handful of consistent corrections before a pattern is trusted enough to bake into the blueprint. Suppliers below the threshold keep using the generic default — so there's never a regression for low-volume vendors.

The accuracy lift comes almost entirely from eliminating label-mismatch misses — cases where the field is plainly visible on the document but the model fails to associate it because the supplier uses non-standard wording. Pointing the model at the right label is often the difference between a field that's found 70% of the time and one found 90% of the time.

The part that makes it safe: regression-gated rollout

A self-improving system that can also self-degrade is a liability. The discipline that makes continuous learning trustworthy is treating every blueprint update like a code deploy — with a test suite and a rollback path.

The pattern looks like this:

Version every change. Before updating a supplier blueprint, preserve the previous version's identifier. The current version is always recoverable.
Test against known-good invoices. Pull a small set of recently approved invoices for that supplier where the correct values are already in the database — your ground truth. Re-run extraction with the new blueprint and compare field-by-field.
Promote only if accuracy holds. Compute the new blueprint's accuracy against ground truth. If it meets or beats the established baseline (within a small tolerance), promote it and update the baseline. If accuracy drops past the threshold, automatically roll back to the previous version and raise an alert.

This is what separates "AI that learns" marketing copy from a system you'd actually trust to touch production extraction. New corrections can only improve a supplier's blueprint, never silently make it worse — because a worse blueprint never gets promoted.

Triggering the loop without slowing anything down

Two triggers, working together, keep the loop current without adding latency to the path an invoice takes:

Threshold-based. When a supplier crosses a correction-count milestone, fire an asynchronous refresh — fire-and-forget, so the clerk's correction is saved instantly and the blueprint rebuild happens in the background.
Scheduled sweep. A weekly job picks up any supplier that accumulated new corrections but never hit the inline threshold. Active suppliers improve quickly; dormant ones still get swept.

Crucially, the rebuild work runs entirely outside the invoice-processing path. The clerk never waits; the next invoice from that supplier simply arrives with a sharper blueprint.

Where this fits

This kind of feedback loop is part of what makes ERP-native AP automation durable rather than brittle. The same extraction layer that posts cleanly into Oracle EBS and Fusion also has to keep getting better at the long tail of supplier formats no generic template ever covers — and it does that by learning from the corrections your team is already making. It's the same philosophy behind cross-source confidence scoring: real automation reasons about its own outputs instead of trusting a single static guess.

Where EZ Cloud fits

EZ Cloud's extraction layer treats every user correction as training signal. Corrections accumulate per supplier, get translated into blueprint hints once a real pattern emerges, and roll out only through a regression gate that compares the new blueprint against known-good invoices before promoting it. Accuracy climbs on the suppliers you actually process — and a bad update can never make it to production.

If your current capture tool extracts the same way on day one and day five hundred, the corrections your team keeps making are worth far more than they're currently getting credit for. That's a conversation worth having.