How AI Extraction Fits Into Your Archival Workflow
AI proposes, curators decide: how a three-layer review keeps machine-generated metadata accountable and ensures nothing reaches the public portal without human approval.
Automation in the archive raises a fair question: if a machine writes the metadata, who is accountable for it? The honest answer is that the machine should never have the last word. AI is useful for the slow, mechanical parts of description, but a catalogue record is an assertion about the historical record, and assertions need a person behind them. The approach that works treats AI as a drafting assistant, not an authority.
AI proposes
When material is ingested, extraction runs in the background and produces draft enrichments rather than finished metadata. In practice that means a handful of distinct tasks:
- Text extraction (OCR) from images and scanned PDFs
- Transcription of audio and video, with speaker separation and timestamps
- Named-entity recognition for people, organisations, places and dates
- Keyword and subject suggestions to support discovery
- Caption generation for visual material, in brief, detailed or archival styles
Each output is a proposal attached to the item. Nothing is written into the authoritative description automatically, and nothing is exposed to the public on the strength of the model alone.
Curators decide: three layers of review
The decision sits with the curator, and the workflow is built around three distinct layers so that the AI's contribution is always separable from the human's.
- AI draft. The model's raw output is captured as an immutable snapshot. It is never silently overwritten, so you can always see exactly what was extracted and when.
- Curator review. Each extracted field can be accepted, edited or rejected. A word-level diff shows precisely where a curator changed the machine's text, so corrections are visible rather than buried.
- Published snapshot. Only material a curator has approved is frozen into a published version, with its own history. Earlier published states are retained, so a record's public-facing description can be traced over time.
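The article doesn't say how its word-level diff is computed. As a rough stand-in, Python's standard-library `difflib.SequenceMatcher` can compare the AI draft and the curator's version word by word. The function names `word_changes` and `render_inline` and the `[-removed-]` / `{+added+}` markers are illustrative choices for this sketch, not the product's own.

```python
import difflib
from typing import List, Tuple

# Illustrative sketch only: difflib stands in for whatever diff engine the
# real system uses. Words are whitespace-split, so punctuation stays attached
# to its word ("Board," is one token) and original spacing is not preserved.


def word_changes(ai_text: str, curated_text: str) -> List[Tuple[str, str, str]]:
    """List (operation, ai_words, curated_words) for each changed span.

    Operations are difflib's 'replace', 'delete' and 'insert'; unchanged
    runs ('equal') are skipped.
    """
    ai_words, cur_words = ai_text.split(), curated_text.split()
    matcher = difflib.SequenceMatcher(a=ai_words, b=cur_words, autojunk=False)
    return [
        (op, " ".join(ai_words[i1:i2]), " ".join(cur_words[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def render_inline(ai_text: str, curated_text: str) -> str:
    """Show the curated text with removed words as [-...-] and added words as {+...+}."""
    ai_words, cur_words = ai_text.split(), curated_text.split()
    matcher = difflib.SequenceMatcher(a=ai_words, b=cur_words, autojunk=False)
    out: List[str] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(ai_words[i1:i2])
            continue
        if i2 > i1:  # words from the AI draft that the curator removed or replaced
            out.append("[-" + " ".join(ai_words[i1:i2]) + "-]")
        if j2 > j1:  # words the curator added
            out.append("{+" + " ".join(cur_words[j1:j2]) + "+}")
    return " ".join(out)
```

For a draft reading "Letter from J. Smyth to the Harbour Board, 1921" that a curator corrects to "Letter from J. Smith to the Harbour Board, 1912", `render_inline` produces `Letter from J. [-Smyth-] {+Smith+} to the Harbour Board, [-1921-] {+1912+}`, so both corrections stay visible next to the text they replaced. Diffing on words rather than characters keeps the output readable for catalogue fields, at the cost of flagging a whole word when only one letter changed.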
Keeping these layers separate is what makes the system answerable. The AI draft, the reviewed working copy and the published version are three different things, and at any point you can ask what the model said, what the curator decided, and what the public can see.
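One way to picture that separation is as three distinct types. The sketch below is written in Python under stated assumptions: the class names (`AIDraft`, `Review`, `PublishedSnapshot`, `Record`), the per-field granularity and the `trace()` helper are hypothetical and are not the system's actual schema. What it illustrates is that `publish()` reads only curator-approved text, so a pending or rejected AI draft has no path into the public layer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional

# Hypothetical schema for illustration; not the real system's data model.


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    EDITED = "edited"
    REJECTED = "rejected"


@dataclass(frozen=True)
class AIDraft:
    """Layer 1: the model's raw output for one field, frozen when captured."""
    field_name: str
    text: str
    extracted_at: datetime


@dataclass
class Review:
    """Layer 2: the curator's working decision about one AI draft."""
    draft: AIDraft
    decision: Decision = Decision.PENDING
    curated_text: Optional[str] = None
    curator: Optional[str] = None

    def accept(self, curator: str) -> None:
        self.decision, self.curator, self.curated_text = Decision.ACCEPTED, curator, None

    def edit(self, curator: str, text: str) -> None:
        self.decision, self.curator, self.curated_text = Decision.EDITED, curator, text

    def reject(self, curator: str) -> None:
        self.decision, self.curator, self.curated_text = Decision.REJECTED, curator, None

    def approved_text(self) -> Optional[str]:
        """Text a curator has signed off on, or None if pending or rejected."""
        if self.decision is Decision.ACCEPTED:
            return self.draft.text
        if self.decision is Decision.EDITED:
            return self.curated_text
        return None


@dataclass(frozen=True)
class PublishedSnapshot:
    """Layer 3: the public-facing description, frozen at publication."""
    version: int
    fields: Mapping[str, str]
    published_at: datetime


@dataclass
class Record:
    reviews: Dict[str, Review] = field(default_factory=dict)
    history: List[PublishedSnapshot] = field(default_factory=list)

    def add_draft(self, field_name: str, text: str) -> Review:
        """Attach an AI proposal; refuse to replace an existing draft silently."""
        if field_name in self.reviews:
            raise ValueError(f"draft for {field_name!r} already exists")
        review = Review(AIDraft(field_name, text, datetime.now(timezone.utc)))
        self.reviews[field_name] = review
        return review

    def publish(self) -> PublishedSnapshot:
        """Freeze only curator-approved fields; pending or rejected ones never cross."""
        approved = {}
        for name, review in self.reviews.items():
            text = review.approved_text()
            if text is not None:
                approved[name] = text
        if not approved:
            raise ValueError("no curator-approved fields to publish")
        snapshot = PublishedSnapshot(
            version=len(self.history) + 1,
            fields=MappingProxyType(approved),  # read-only view for the portal
            published_at=datetime.now(timezone.utc),
        )
        self.history.append(snapshot)  # earlier published versions are retained
        return snapshot

    def trace(self, field_name: str) -> dict:
        """What the model said, what the curator decided, what the public sees."""
        review = self.reviews[field_name]
        public = self.history[-1].fields.get(field_name) if self.history else None
        return {
            "model_said": review.draft.text,
            "curator_decided": review.decision.value,
            "curator_text": review.approved_text(),
            "public_sees": public,
        }
```

In use, a curator calls `accept`, `edit` or `reject` on each `Review`; `publish()` then freezes only the accepted or edited fields into a new numbered snapshot, and `trace()` answers the three questions for a single field. Because `trace()` compares the working copy with the latest snapshot, it also shows when a curator has revised a field that has not yet been republished.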
Provenance and the public boundary
This structure exists to protect provenance, which is the discipline's first commitment. Because every extraction carries a trail from machine proposal to human decision to published record, you can demonstrate not just what a description says but how it came to say it. That distinction matters when a researcher, an auditor or a future archivist asks why a record reads the way it does.
The model can read a thousand pages overnight. It cannot be accountable for what they mean. That is, and remains, the curator's work.
The boundary is deliberate: nothing crosses into the public portal without curator approval. AI shortens the distance between a box of unprocessed material and a usable draft, sometimes dramatically. What it does not do is decide what the archive publishes. The work moves faster; the judgement stays human.