Architecture

scriva has two pillars: a Worker that converts an image region to text, and an Orchestrator that owns everything around it. This page describes how they fit together.

Layering

┌──────────────────────────────────────────────────────────────────┐
│  domains/     forms.py    pid.py    agentic.py                   │  ← opinionated, ready-to-run recipes
├──────────────────────────────────────────────────────────────────┤
│  Worker factories                Orchestrator step adapters       │
│    engines: openai, anthropic,     preprocess: deskew, denoise…   │
│             bedrock, tesseract     split:      grid, vertical…    │
│    kinds:   number, date, text,    classify:   rule_based, ml…    │
│             checkbox, handwriting  review:     hitl_json, hitl_ui │
│    decor.:  cache, few_shot,       reconstruct: grid, document    │
│             score, retry, fallback export:     xlsx, csv, json…   │
├──────────────────────────────────────────────────────────────────┤
│  worker.py       orchestrator.py     types.py     events.py       │  ← core protocols + machinery
│  errors.py       __init__.py: Worker, Orchestrator, read, extract │  ← top-level surface
└──────────────────────────────────────────────────────────────────┘

Each upper layer depends only on the layer below it. The core layer has no dependency on any OCR engine, image library, or storage backend — those live in the adapter layer behind optional extras.

The flow of a single run

A recipe is read top-to-bottom; at runtime the Orchestrator drives the flow and the Worker is called once per region.

                          ┌────────────────┐
        recipe(source) ──►│   Document     │
                          └───────┬────────┘
                                  │ pages()
                                  ▼
                          ┌────────────────┐
                          │   Page (n)     │
                          └───────┬────────┘
                                  │
       ┌──────────────────────────┴───────────────────────────┐
       │                  ORCHESTRATOR                        │
       │                                                      │
       │   .deskew() / .crop() / .rotate() / .denoise()       │   page-level pixel prep
       │            ▼                                         │
       │   .split.grid() / .split.vertical() / …              │   page → regions
       │            ▼                                         │
       │   .classify()                                        │   blank / merged / role / kind
       │            ▼                                         │
       │   .recognize(worker)   or   .recognize.by_kind({…})  │   ┌───────────────┐
       │            │                                         │ ◄─┤    WORKER     │
       │            │  per region, parallel up to budget      │   │ crop ► text   │
       │            │                                         │ ◄─┤ + confidence  │
       │            ▼                                         │   └───────────────┘
       │   .review.hitl(when=…)                               │   (optional pause)
       │            ▼                                         │
       │   .reconstruct.grid() / .reconstruct.document()      │   regions → structure
       │            ▼                                         │
       │   .export.xlsx() / .csv() / .json() / .html() / …    │   structure → file(s)
       └──────────────────────────────────────────────────────┘
                                  │
                                  ▼
                          ┌────────────────┐
                          │ DocumentResult │  ← knows how to .to_excel(), .to_dict(), …
                          └────────────────┘

Every Orchestrator step is replaceable, and every Worker decorator is stackable. Two kinds of step can be repeated within one chain: multiple .export.* calls and multiple .classify() rules compose freely.

Where each capability lives

Capability	Pillar	Surface
Rotation, cropping, deskewing, denoising	Orchestrator	`.rotate() .crop() .deskew() .denoise()`
Vertical / horizontal / grid splitting	Orchestrator	`.split.vertical() .split.horizontal() .split.grid()`
Blank-cell and merged-cell detection	Orchestrator	`.classify(blank=True, merged=True)`
Cell classification → Worker assignment	Orchestrator	`.recognize.by_kind({...})` / `.by_role({...})`
Parallel OCR execution	Orchestrator	`.options(concurrency=N)`
HITL review for uncertain results	Orchestrator	`.review.hitl(when=…)`
Reconstruction into table / document	Orchestrator	`.reconstruct.grid() / .document()`
Export to CSV / JSON / XLSX / HTML / Parquet	Orchestrator	`.export.csv() .json() .xlsx() .html() .parquet() .debug()`
OCR per region or cell	Worker	`Worker.openai() / .anthropic() / .tesseract()`
Cell-type-specific OCR algorithms	Worker	`Worker.number() / .date() / .text() / .checkbox() / .handwriting()`
Confidence scoring	Worker	`.score(method="rendering" | "logprob" | …)`
Few-shot OCR	Worker	`.few_shot(store)`
Caching	Worker	`.cache(".scriva_cache")`
Retry / fallback / cascading	Worker	`.retry() / .fallback() / .escalate_to()`

The split is deliberate: anything specific to one crop belongs on the Worker; anything that depends on the whole page or document belongs on the Orchestrator.

Concurrency model

The Orchestrator is internally async. recipe(doc) is the sync entry point that wraps the coroutine; await recipe.aio(doc) is the async one.
Calling recipe(doc) from within a running event loop raises an error that points you to aio. We don’t try to be clever.
Each Worker exposes max_concurrency; the Orchestrator enforces it with an asyncio.Semaphore. You never call the Worker directly: the Orchestrator submits regions and the Worker batches them.
Per-page work runs sequentially by default; pass .options(page_concurrency=N) to parallelise across pages. Per-region work inside a Worker is always parallel up to its budget.
When .recognize.by_kind({...}) dispatches to multiple Workers, each Worker has its own semaphore — slow Workers don’t starve fast ones.
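
The rules above can be illustrated with plain asyncio. This is a minimal sketch, not scriva's implementation: ToyWorker, run_regions, and run_sync are hypothetical names, and the error type raised inside a running loop is an assumption. It shows one semaphore per Worker and a sync entry point that refuses to run inside an existing event loop.

```python
import asyncio


class ToyWorker:
    """Hypothetical stand-in for a Worker; not scriva's real class."""

    def __init__(self, name: str, max_concurrency: int, delay: float) -> None:
        self.name = name
        self.max_concurrency = max_concurrency
        self.delay = delay
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous calls observed

    async def recognize(self, region: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(self.delay)  # pretend to call an engine
            return f"{self.name}:{region}"
        finally:
            self.in_flight -= 1


async def run_regions(assignments: list[tuple[ToyWorker, str]]) -> list[str]:
    # One semaphore per worker, so a slow worker only throttles itself.
    semaphores = {id(w): asyncio.Semaphore(w.max_concurrency) for w, _ in assignments}

    async def submit(worker: ToyWorker, region: str) -> str:
        async with semaphores[id(worker)]:
            return await worker.recognize(region)

    # gather() preserves the order of the submitted coroutines.
    return await asyncio.gather(*(submit(w, r) for w, r in assignments))


def run_sync(assignments: list[tuple[ToyWorker, str]]) -> list[str]:
    """Sync entry point: refuses to run inside an already running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running, so it is safe to start one.
        return asyncio.run(run_regions(assignments))
    raise RuntimeError("event loop already running; await the async entry point instead")
```

Keying the semaphores by Worker keeps the budgets independent: a Worker with max_concurrency=1 serialises its own regions while a Worker with a larger budget keeps running in parallel.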

Cancellation

recipe.cancel() is cooperative; it sets an asyncio.Event the Orchestrator and every standard Worker check at yield points. A long-running custom Worker that ignores the cancel flag will simply run to completion after recipe.cancel() — it will not deadlock, but it will not honour the cancel either. Wrap any blocking subprocess or HTTP call in asyncio.wait_for(..., timeout=...) if you cannot insert your own checkpoint.
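
A sketch of both options, using only asyncio. The names cooperative_worker, bounded_call, and demo are hypothetical and the sleeps stand in for real OCR work; how scriva hands the cancel flag to a custom Worker is not shown here.

```python
import asyncio


async def cooperative_worker(regions: list[str], cancel: asyncio.Event) -> list[str]:
    """Checks the cancel flag between regions (one checkpoint per region)."""
    done = []
    for region in regions:
        if cancel.is_set():
            break  # honour the cancel at the next checkpoint
        await asyncio.sleep(0.01)  # stands in for one unit of OCR work
        done.append(region)
    return done


async def bounded_call(awaitable, timeout: float):
    """For calls with no checkpoint: bound them with a timeout instead."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout)
    except asyncio.TimeoutError:
        return None  # the awaited call was cancelled by wait_for


async def demo() -> tuple[list[str], object]:
    cancel = asyncio.Event()
    regions = [f"r{i}" for i in range(20)]
    task = asyncio.create_task(cooperative_worker(regions, cancel))
    await asyncio.sleep(0.015)  # let some work happen first
    cancel.set()                # request cancellation
    finished = await task       # the worker stops at its next checkpoint
    timed_out = await bounded_call(asyncio.sleep(1, result="late"), timeout=0.01)
    return finished, timed_out
```

Note that asyncio.wait_for can only interrupt an awaitable. A synchronous blocking call has to be moved off the event loop first (for example with asyncio.to_thread or an asyncio subprocess) before a timeout can take effect, and a thread that is already running is not stopped by the timeout.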

Errors

Three error classes, all under scriva.errors:

ConfigurationError — raised at recipe-build time. Wrong wiring, missing capabilities, malformed config. Never raised mid-run.
EngineError — raised by an adapter when its upstream (an HTTP API, a binary on $PATH, a model file) failed. Carries engine.name and the original exception.
RecognitionError — raised when a region was attempted but no Worker could produce text. The Orchestrator’s error_policy decides whether this aborts the page or attaches an empty Recognition with error set.

Workers and steps never raise raw exceptions to the caller — they wrap, attach context, and emit a structured error event before re-raising.
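
The wrap, attach, emit, re-raise pattern might look like the sketch below. The class bodies are assumptions written for illustration: the ScrivaError base, the engine_name attribute, and the call_engine helper are hypothetical and may not match the real definitions in scriva.errors.

```python
class ScrivaError(Exception):
    """Hypothetical common base; the real hierarchy may differ."""


class EngineError(ScrivaError):
    """Carries the engine's name and the original exception."""

    def __init__(self, engine_name: str, original: BaseException) -> None:
        super().__init__(f"{engine_name} failed: {original}")
        self.engine_name = engine_name
        self.original = original


def call_engine(engine_name, fn, emit):
    """Run fn(); on failure emit an error event, then raise a wrapped error."""
    try:
        return fn()
    except Exception as exc:
        # Structured event first, so subscribers see the failure even if
        # the caller swallows the exception.
        emit({"kind": "error", "error": type(exc).__name__, "message": str(exc)})
        # "from exc" keeps the original traceback reachable via __cause__.
        raise EngineError(engine_name, exc) from exc
```

A caller can then catch EngineError alone and still inspect the underlying failure through its original attribute.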

Observability

Events are typed:

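from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel

class Event(BaseModel):
    stage: str                   # "split", "recognize", "export", …
    kind: Literal["started", "progress", "finished", "error", "cache_hit", "paused", "resumed"]
    timestamp: datetime
    payload: dict[str, Any]      # step-specific; see table below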

Three subscription paths, one canonical recommendation each:

Callback — recipe(doc, on_event=callback). Best default.
Async iterator — async for item in recipe.events(doc): yields events and the final result through one channel.
SSE — scriva.events.to_sse(recipe.events(doc)) for web servers.

Every standard adapter emits events shaped for the SSE form so a UI can render a progress bar without further translation.
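
To make the SSE path concrete, here is a rough sketch of turning an async stream of events into Server-Sent Events frames. It does not reproduce scriva.events.to_sse: ToyEvent, format_sse, to_sse_frames, and the "stage.kind" event name are all assumptions chosen for illustration.

```python
import asyncio
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, AsyncIterator


@dataclass
class ToyEvent:
    """Hypothetical stand-in with the same four fields as Event."""
    stage: str
    kind: str
    payload: dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def format_sse(event: ToyEvent) -> str:
    """Serialise one event as a single SSE frame."""
    data = json.dumps({
        "stage": event.stage,
        "kind": event.kind,
        "timestamp": event.timestamp.isoformat(),
        "payload": event.payload,
    })
    # "event:" names the frame, "data:" carries the body, a blank line ends it.
    return f"event: {event.stage}.{event.kind}\ndata: {data}\n\n"


async def to_sse_frames(events: AsyncIterator[ToyEvent]) -> AsyncIterator[str]:
    """Map an async stream of events to an async stream of SSE frames."""
    async for event in events:
        yield format_sse(event)
```

json.dumps emits no raw newlines by default, so each frame needs only one data: line. A web framework's streaming response can consume the resulting async iterator directly.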

Standard payloads

The payload schema for every built-in step is stable and versioned alongside the JSON export schema. Custom Workers and steps may emit extra keys; UI code should ignore unknown keys.

stage	kind	payload keys
`preprocess`	`finished`	`ms`, plus step-specific keys (e.g. `angle` from `deskew`)
`split`	`finished`	`rows`, `cols`, `cells`, `ms` (grid) / `regions`, `ms` (other)
`classify`	`finished`	`blank`, `merged`, `ambiguous`, `ms`
`recognize`	`started`	`total`, `workers` (when dispatching)
`recognize`	`progress`	`done`, `total`, `cache_hits`, `region_id`, `worker`, `cell` (grid)
`recognize`	`progress`	`region_id`, `partial_text`, `tokens` (streaming workers only)
`recognize`	`cache_hit`	`region_id`, `tier` (`"exact"` or `"semantic"`), `similarity`
`recognize`	`finished`	`done`, `total`, `cache_hits`, `ms`
`review`	`paused`	`regions`, `sidecar_path`
`review`	`resumed`	`changed`, `ms_paused`
`reconstruct`	`finished`	`shape`, `regions`, `ms`
`export`	`finished`	`path`, `format`, `bytes`
(any)	`error`	`error`, `message`, `traceback`, `region_id` (when applicable)
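
As an illustration of consuming these payloads defensively, the sketch below folds recognize events into a completion fraction and ignores any key it does not know. ProgressBar and its on_event signature are hypothetical; only the stage, kind, and payload key names come from the table above.

```python
from typing import Any


class ProgressBar:
    """Folds recognize events into a completion fraction."""

    def __init__(self) -> None:
        self.done = 0
        self.total = 0
        self.cache_hits = 0

    def on_event(self, stage: str, kind: str, payload: dict[str, Any]) -> None:
        if stage != "recognize":
            return  # other stages do not move this bar
        if kind == "started":
            self.total = payload.get("total", 0)
        elif kind in ("progress", "finished"):
            # Streaming progress events carry no "done"; keep the last value.
            # Unknown keys are never read, so extra keys are harmless.
            self.done = payload.get("done", self.done)
            self.total = payload.get("total", self.total)
            self.cache_hits = payload.get("cache_hits", self.cache_hits)

    @property
    def fraction(self) -> float:
        return self.done / self.total if self.total else 0.0
```

Assuming the callback receives one Event per call, a bridge such as on_event=lambda e: bar.on_event(e.stage, e.kind, e.payload) would connect it to a recipe.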

The cell payload on a recognize/progress event is present when the region has region.grid set. Its shape is:

{
    "row": int,
    "col": int,
    "rowspan": int,
    "colspan": int,
    "text": str | None,
    "is_blank": bool,
    "confidence": float | None,
    "cache": {"tier": "exact" | "semantic", "similarity": float} | None,
}

This is the same shape recognition.to_event_dict() returns, so a UI that consumes one can consume the other.
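
For UI code that wants static typing, the shape above can be written down as a TypedDict. This is a sketch under assumptions: CellPayload, CacheInfo, and fill_grid are hypothetical names, and row and col are assumed to be zero-based, which the text above does not state.

```python
from typing import Optional, TypedDict


class CacheInfo(TypedDict):
    tier: str          # "exact" or "semantic"
    similarity: float


class CellPayload(TypedDict):
    row: int
    col: int
    rowspan: int
    colspan: int
    text: Optional[str]
    is_blank: bool
    confidence: Optional[float]
    cache: Optional[CacheInfo]


def fill_grid(rows: int, cols: int, cells: list[CellPayload]) -> list[list[str]]:
    """Place each cell's text at its anchor; blank and spanned slots stay ""."""
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for cell in cells:
        if not cell["is_blank"] and cell["text"] is not None:
            # Assumes zero-based row/col indices.
            grid[cell["row"]][cell["col"]] = cell["text"]
    return grid
```

A renderer can call fill_grid incrementally as progress events arrive, since each payload is self-describing.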

Caching, in one sentence

The cache sits inside the Worker as a decorator: the Worker looks up each crop before calling its engine, records hits/misses in the event stream, and writes back on miss. Cache adapters never reach into engine adapters; engine adapters never know whether they are cached. The hit provenance shows up on recognition.cache.
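
The decorator arrangement can be sketched as a wrapper that owns the lookup and the write-back while the wrapped engine stays unaware of it. CachedWorker is a hypothetical illustration with an in-memory dict as the store and a SHA-256 digest of the crop bytes as an exact-match key; scriva's actual cache keys, tiers, and adapters are not shown here.

```python
import hashlib
from typing import Callable


class CachedWorker:
    """Hypothetical wrapper: looks up a crop before delegating to the engine."""

    def __init__(self, engine: Callable[[bytes], str], emit: Callable[[dict], None]) -> None:
        self._engine = engine             # knows nothing about caching
        self._emit = emit                 # event sink
        self._store: dict[str, str] = {}  # in-memory stand-in for a cache adapter

    def recognize(self, crop: bytes) -> str:
        key = hashlib.sha256(crop).hexdigest()  # exact-match key on the crop bytes
        if key in self._store:
            self._emit({
                "stage": "recognize",
                "kind": "cache_hit",
                "payload": {"tier": "exact", "similarity": 1.0},
            })
            return self._store[key]
        text = self._engine(crop)  # miss: call the engine
        self._store[key] = text    # write back on miss
        return text
```

Because the engine is just a callable passed in, swapping the store or the engine does not require either side to change.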

What the core layer does not do

It does not render PDFs. PDF rendering is a Document plugin behind the pdf extra.
It does not call any HTTP API. Every API call lives in a Worker adapter.
It does not assume a storage backend. Cache, SampleStore, .export.*, and Document loading are all swappable.
It does not impose a database. Persistence of jobs/results is out of scope; if you need it, your application owns it and uses scriva for the pure OCR transform.

This is what makes the same library suitable for a CLI, a background worker process, a notebook, or a web server.