scriva — documentation index

scriva is a composable, engine-agnostic OCR framework for Python built on two pillars:

Worker — converts image regions or cells into text. Worker = image region → text + confidence
Orchestrator — handles everything around the Worker. Orchestrator = preprocess → split → assign workers → run OCR → review → reconstruct → export
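
The two signatures above can be made concrete with a small, self-contained sketch in plain Python. It does not use scriva's API: Recognition, fake_worker, and run_pipeline are hypothetical names chosen for illustration, and the "image" is just a string so the example runs without any OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical stand-ins for illustration; these are NOT scriva classes.
@dataclass
class Recognition:
    text: str
    confidence: Optional[float]  # None when the engine reports no score


# A "worker" here is any callable: image region -> text + confidence.
WorkerFn = Callable[[str], Recognition]


def fake_worker(region: str) -> Recognition:
    """Pretend OCR: the region is already a string, so strip and echo it."""
    text = region.strip()
    return Recognition(text=text, confidence=0.9 if text else None)


def run_pipeline(page: str, worker: WorkerFn) -> str:
    """Toy orchestrator covering only split -> run OCR -> reconstruct."""
    regions = page.split("|")                   # split: one region per cell
    results = [worker(r) for r in regions]      # run OCR on every region
    return "\t".join(r.text for r in results)   # reconstruct: cells back into a row


row = run_pipeline(" Name | Qty | 3 ", fake_worker)
```

The point of the split is that run_pipeline never needs to know which engine sits behind the worker callable, which is what "engine-agnostic" amounts to in this sketch.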

The one-liner

import scriva
text = scriva.read("scan.png")

This call uses sensible defaults end to end. When you want to swap a model, change a prompt, or stream events, drop down to a Worker / Orchestrator recipe: every default is one chained method away.

Three ways in

import scriva
from scriva.schemas import Invoice

scriva.read("scan.png")                          # text out
scriva.extract("acme.pdf", schema=Invoice)       # typed pydantic instance out
scriva.presets.invoice("acme.pdf")               # tuned recipe + schema

Pick the highest-level entry that fits and drop a rung when you need more control. See scriva.extract and Presets.

A full recipe

from scriva import Orchestrator, Worker

worker = (
    Worker.openai("gpt-4o")
    .cache(".scriva_cache")
    .score(method="rendering")
)

recipe = (
    Orchestrator()
    .deskew()
    .split.grid()
    .classify(blank=True, merged=True)
    .recognize.by_kind({
        "text":     worker,
        "number":   Worker.number(),
        "date":     Worker.date(),
        "checkbox": Worker.checkbox(),
        "blank":    Worker.skip(),
    })
    .review.hitl(when=lambda r: (r.confidence or 0) < 0.7)
    .reconstruct.grid()
    .export.xlsx("out.xlsx")
)

result = recipe("scan.png")
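
The when= predicate in the recipe uses (r.confidence or 0) so that a result with no confidence at all is treated as 0 and therefore sent to review. The sketch below isolates that logic. Cell is a hypothetical stand-in for whatever result object scriva passes to the predicate; only the confidence attribute and the 0.7 threshold come from the recipe above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:  # hypothetical result object, not a scriva type
    text: str
    confidence: Optional[float]


def needs_review(r: Cell, threshold: float = 0.7) -> bool:
    # A missing (None) confidence falls back to 0, so it is always below
    # the threshold and the cell is routed to a human.
    return (r.confidence or 0) < threshold


cells = [Cell("42", 0.95), Cell("4Z", 0.41), Cell("", None)]
flagged = [c.text for c in cells if needs_review(c)]
```

Here flagged contains the low-confidence "4Z" and the unscored empty cell, while "42" passes through untouched.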

Three pillars for high-accuracy production OCR

When the project’s bar is “wrong answers are unacceptable,” scriva exposes three composable accuracy pillars, separate from the two structural pillars (Worker and Orchestrator) above. Every other reference page slots into one of these three.

Learn from your environment. Keep a SampleStore of your own corrections. The same store powers few-shot exemplars on the Worker (Worker.few_shot(store)) and derived dictionaries on the Orchestrator’s reconstruct phase (reconstruct.dictionary.from_samples(...)). One store, two roads.
Cross-check, then route. Three accuracy levers — human-in-the-loop review, cross-check against the original crop (round-trip rendering or multi-Worker consensus), and cross-check against ground-truth data — compose freely. See Orchestrator › High-accuracy patterns.
Manipulate the pixels the Worker actually sees. Page-level preprocessing (orientation, deskew, dewarp) and region-level preprocessing (per-cell binarisation, horizontal/vertical slicing, per-role padding, glare removal, whiteboard clean-up) are both first-class. See Preprocessors.
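
As an illustration of the second pillar, multi-Worker consensus can be sketched as a majority vote over the texts several workers return for the same crop, with the cell routed to review when no reading wins enough votes. This is a standalone sketch under stated assumptions, not scriva's implementation: the function name consensus, the min_agreement parameter, and the (text, needs_review) return shape are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def consensus(
    readings: Sequence[str], min_agreement: float = 0.5
) -> Tuple[Optional[str], bool]:
    """Return (winning_text, needs_review) for one crop.

    A reading wins only if strictly more than `min_agreement` of the
    workers returned it; otherwise the crop is flagged for review.
    """
    if not readings:
        return None, True
    text, votes = Counter(readings).most_common(1)[0]
    if votes / len(readings) > min_agreement:
        return text, False
    return None, True


agreed = consensus(["1,250", "1,250", "1,260"])  # two of three workers agree
split = consensus(["1,250", "1,260"])            # a tie: no majority
```

Using a strict majority means a two-way tie is never silently resolved; it becomes a review item, which matches the "cross-check, then route" idea described above.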

Read in order

Concepts — the two pillars and the primitives.
Quickstart — a runnable end-to-end example.
Architecture — how the pillars compose.

Reference — high level

scriva.extract — schema-first one-liner, classify, batch, watch.
Presets — pre-tuned recipes per document kind.
Schemas — built-in pydantic models (Invoice, Receipt, IdCard, …).
Working with results — DocumentResult accessors and serialisation.
CLI — scriva extract, scriva watch, scriva eval, scriva annotate.

Reference — the two pillars

Worker — engines, kinds, decorators, composition.
Orchestrator — steps, options, events, accuracy patterns.

Reference — adapters

Preprocessors — .rotate() .crop() .deskew() .denoise() plus region refine.
Detectors — .split.grid() .split.vertical() .split.horizontal() .split.boxes().
Post-processors — .classify(...) and .reconstruct.* adapters.
Exporters — .export.xlsx() .csv() .json() .html() .parquet() .markdown() .debug() .samples().
Caching — Worker.cache(...) in depth.
Sample stores — Worker.few_shot(...) and active learning.

Reference — production

Reliability — retries, rate limits, timeouts, cost caps, PII redaction, telemetry.
Evaluation — scriva.eval, ground-truth format, calibration.

Recipes

Cookbook — worked recipes per document kind and per workflow.
Domain packs — pre-built recipes for forms, P&ID, agentic extraction, annotation.
Orchestrator › Accuracy patterns — HITL, confidence-driven re-OCR, Worker diff.
Cookbook › Rebuilding ocr-agent — case study porting a Japanese-forms / P&ID / annotation app, plus FastAPI hosting.

One-line map

You want to…	Read…
just read an image to text	Quickstart › one-liner
extract a typed pydantic instance	`scriva.extract`
understand the design	Concepts
OCR an invoice / receipt / ID	Cookbook
OCR a tabular form to Excel	Domain packs › forms
route an unknown document	`scriva.classify_document`
ingest a folder continuously	Cookbook › Batch & watch
swap OpenAI for Anthropic	Worker › engine factories
OCR numbers / dates / checkboxes specifically	Worker › kind factories
add a new output format	Exporters
serialise a result on the fly	Results › serialisation verbs
cut API spend	Caching
set a hard cost cap	Reliability › Cost caps
retry on 429 / 5xx	Worker › `.retry()`
redact PII before sending to a VLM	Reliability › PII redaction
stream pages from a 100-page PDF	Cookbook › Long PDFs
score a recipe against ground truth	Evaluation
OCR pages in parallel	Orchestrator › options
straighten a rotated scan	Preprocessors › deskew/orient
binarise / sharpen each cell separately	Preprocessors › Region refine
slice a tall cell into rows	Preprocessors › Slicers
use a few-shot exemplar from past corrections	Worker › `.few_shot()`
derive a correction dictionary from a labelled store	Post-processors › `dictionary.from_samples`
let a human review detected cells	Orchestrator › Lever 1: HITL
cross-check the recognised text against the crop	Orchestrator › Lever 2: Cross-check vs original
cross-check a run against ground truth	Orchestrator › Lever 3: Cross-check vs ground-truth
re-OCR only the low-confidence cells	Orchestrator › 2c Confidence-driven re-OCR
plug in your own ML detector	Detectors › Writing your own
write a one-off step	Orchestrator › Writing your own step
use scriva from the shell	CLI
port a FastAPI OCR app onto scriva	Cookbook › Rebuilding ocr-agent