Tags: AI · Claude · LLM · SaaS · Restaurant Tech

What I've Learned Integrating Claude AI into Production Applications

There's a gap between the demo and production for every AI integration I've worked on. The demo is five lines of code and a screenshot. Production is weeks of edge cases, cost management, latency handling, and prompt iteration. Here's what I've actually learned.

Receipt OCR with Claude Vision

The first real Claude integration I shipped was in Restaurant OS: parsing handwritten and printed bills from suppliers using Claude Vision. The use case is simple — a restaurant manager photographs a delivery receipt, the system extracts line items, quantities, and prices, and pre-fills the inventory update form.

The naive approach works 80% of the time. Send the image, ask for JSON, parse the response. The remaining 20% is where the real work lives:
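The naive version really is only a few lines. A minimal sketch, assuming the Anthropic Python SDK's Messages API shape; the model id, prompt wording, and function names here are illustrative, not our exact production values:

```python
import base64

# Illustrative extraction prompt -- the production prompt is much longer (see below).
EXTRACTION_PROMPT = (
    "Extract every line item from this supplier receipt as JSON with keys "
    '"description", "quantity", "unit_price". Respond with JSON only.'
)

def build_receipt_request(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Build the kwargs for a single Claude Vision call over one receipt photo."""
    return {
        "model": "claude-3-haiku-20240307",  # assumed model id for illustration
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                # Images are sent inline as base64 content blocks.
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": EXTRACTION_PROMPT},
            ],
        }],
    }

# With an API key configured, the actual call would look roughly like:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_receipt_request(image_bytes))
#   items = json.loads(response.content[0].text)
```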

Inconsistent document layouts. Supplier invoices have no standard format. Some are printed, some handwritten. Some have totals at the top, some at the bottom. Some list items with codes, some with descriptions, some with both.

The prompt engineering for this took longer than the integration itself. The prompt that works in production is structured: it describes the task, gives examples of the output schema, explicitly handles ambiguity ("if a quantity is unclear, use null"), and asks the model to flag low-confidence extractions.

Validation layer. We never pass Claude's output directly to the database. Every extracted value runs through a validator: numbers are within plausible ranges, item codes match known SKUs where possible, totals are arithmetically consistent with line items. Claude makes mistakes. The validator catches most of them.

Human-in-the-loop for failures. When confidence is low or validation fails, the form is pre-filled but flagged for manual review. The manager can correct any field. Corrected records go into a feedback loop — we log what Claude extracted vs. what the manager corrected, which informs prompt revisions.
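The feedback logging reduces to a field-level diff between what the model extracted and what the manager saved. A sketch, with hypothetical field names:

```python
def log_corrections(extracted: dict, corrected: dict) -> dict:
    """Return {field: (model_value, human_value)} for every field the manager changed."""
    return {k: (extracted.get(k), corrected[k])
            for k in corrected
            if extracted.get(k) != corrected[k]}
```

Aggregated over time, these diffs show which fields the model gets wrong most often, which is exactly the signal that informs prompt revisions.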

Structuring prompts for reliability

The most important thing I've learned: treat prompts like code. Version them. Test them. Don't change them without measuring the impact.

For production prompts, the structure that works:

  1. Role and context — brief, but specific. "You are extracting structured data from a supplier invoice photograph."
  2. Task description — what you want, stated clearly.
  3. Output schema — JSON schema or a literal example of the expected structure.
  4. Edge cases — explicitly tell the model how to handle ambiguity, missing data, and multiple valid interpretations.
  5. Confidence signalling — ask the model to indicate uncertainty rather than guess.
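A hypothetical prompt following that five-part structure (not our production prompt, which also carries worked examples):

```python
# Role/context, task, schema, edge cases, and confidence signalling, in order.
RECEIPT_PROMPT = """\
You are extracting structured data from a supplier invoice photograph.

Extract every line item with its description, quantity, and unit price,
plus the invoice total if one is printed.

Respond with JSON only, matching this schema exactly:
{"items": [{"description": str, "quantity": float | null,
            "unit_price": float | null, "confidence": "high" | "low"}],
 "total": float | null}

Rules for ambiguity:
- If a quantity or price is unclear or illegible, use null.
- If an item could plausibly be read more than one way, pick the most
  likely reading and mark its "confidence" as "low".
- Never invent items that are not on the document.
"""
```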

System prompts that work in the playground often fail on real-world inputs. Build a test set of real documents (anonymised if needed) before any prompt change, and run the whole set after.
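The regression harness can be very small. A sketch: `extract` is whatever function wraps the model call (here stubbed), and `golden` pairs each test document with its human-verified expected output:

```python
def run_prompt_regression(extract, golden: list[tuple[str, dict]]) -> float:
    """Run `extract` over (document, expected) pairs; return the pass rate."""
    passed = sum(1 for doc, expected in golden if extract(doc) == expected)
    return passed / len(golden)
```

Before shipping a prompt change, compare the pass rate of the new prompt against the old one on the same golden set; a regression on any previously passing document blocks the change.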

Latency and cost

Claude Vision calls for receipt OCR average around 3–4 seconds on the Haiku model, which is fast enough for the use case — managers aren't waiting at a terminal. For use cases where sub-second responses are required, Vision latency changes the equation entirely.

On cost: per-call pricing is fine at low volume. As volume scales, you start watching token counts. Compressing images before sending (within the quality bounds needed for accurate OCR), batching where possible, and caching responses for identical inputs all matter.

For the receipt case, we don't cache — invoices are always different. But for features like "summarise this menu item description" or "suggest a category for this product", caching on the input hash eliminates a large fraction of API calls.
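A minimal sketch of that input-hash cache, assuming exact-match inputs (any whitespace or wording change produces a new key):

```python
import hashlib

class HashCache:
    """Cache model responses keyed by a hash of the exact input text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, prompt: str, call):
        """Return the cached response for `prompt`, calling the model only on a miss."""
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = call(prompt)  # `call` wraps the real API request
        return self._store[key]
```

In production this would sit in front of a shared store (e.g. Redis) rather than an in-process dict, but the hit/miss accounting is the same — and it's worth tracking, because the hit rate tells you how much the cache is actually saving.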

The part no one talks about: user trust

The hardest part of shipping AI features isn't the AI. It's getting users to trust the output enough to actually use it.

Restaurant OS managers are busy people. They will not spend time correcting AI mistakes — they'll abandon the feature and go back to manual entry. The product has to be right often enough, and obvious enough about when it might be wrong, that it's reliably faster than the alternative.

That means: default to showing the AI output alongside the source document so users can verify at a glance. Use confidence indicators. Make corrections easy. Track correction rates per feature and per user segment.
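Tracking correction rates needs nothing fancy — a counter pair per (feature, segment). A sketch with hypothetical feature and segment names:

```python
from collections import defaultdict

class CorrectionTracker:
    """Track how often users correct AI output, per feature and user segment."""

    def __init__(self):
        self._shown = defaultdict(int)
        self._corrected = defaultdict(int)

    def record(self, feature: str, segment: str, was_corrected: bool):
        """Record one AI suggestion shown to a user, and whether they corrected it."""
        key = (feature, segment)
        self._shown[key] += 1
        if was_corrected:
            self._corrected[key] += 1

    def correction_rate(self, feature: str, segment: str) -> float:
        key = (feature, segment)
        return self._corrected[key] / self._shown[key] if self._shown[key] else 0.0
```

A rising correction rate for one segment is an early warning that the feature is about to be abandoned by that segment.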

A feature that's right 90% of the time with clear flagging of the 10% is usable. A feature that's right 95% of the time with no error signals gets abandoned the first time it's wrong without warning.

What I'd build next

The receipt OCR feature is in production and working. The next logical step is active anomaly detection: compare the AI-extracted delivery totals against historical purchase orders and flag discrepancies automatically. That's a small model or rule layer on top of the extraction pipeline, not another Vision call.
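That rule layer could be as simple as a z-score check against the supplier's historical totals. A sketch under that assumption — the threshold and the choice of statistic are illustrative, not a built feature:

```python
from statistics import mean, stdev

def flag_anomalous_total(history: list[float], new_total: float,
                         z_threshold: float = 3.0) -> bool:
    """Flag a delivery total that deviates sharply from historical purchase orders."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_total != mu  # perfectly constant history: any change is notable
    return abs(new_total - mu) / sigma > z_threshold
```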

The broader lesson: AI integration isn't a feature, it's an infrastructure layer. The value compounds when you build the data loop (extract → validate → correct → feedback) from day one, rather than treating it as a one-way street from model to UI.

If you're building an AI integration and want to talk through the architecture, reach out at hello@rgcs.ca.

RG

RGCS

Vancouver-based software & AI studio