Files & AI extraction

Every file type. Every AI layer.

A single file detail surface that adapts to images, PDFs, audio, video, Office documents and tabular data, and email. Six AI extractions per file, kept in a dual AI / Reviewed pattern so curators always control what publishes.

6 file families · 6 AI extractions · Dual-version pattern · Per-field status

Book a demo · See pricing

File detail surface with viewer and extractions

The right viewer for the right file.

A unified detail page that knows what a TIFF is, what an MP3 is, and what an EML is — and shows the right tools, fields, and extractions for each.

Image

IIIF deep-zoom viewer; width, height, DPI, color space, bit depth, EXIF — preserved on import.

PDF

Native PDF viewer that detects whether a PDF carries real text or is a scanned image.

Audio

Styled waveform player with timestamped transcript and proofreader.

Video

Native player with VTT captions, client-side thumbnails, full-viewport proofreader.

Word / Spreadsheet

Office documents and CSV / TSV files with row, column, and font detection.

Email

EML / MSG with header parsing, attachment listing, body extraction.

AI says "here's a draft." Curators say "here's the truth."

Every AI extraction lives as two versions side by side — an immutable AI snapshot and an editable Reviewed version. Each field carries its own status: pending, accepted, modified, or rejected. Nothing publishes without explicit curator sign-off.

AI Generated — immutable, read-only, what the model produced
Reviewed — your working copy, with pristine / dirty tracking
Per-field status: pending · accepted · modified · rejected
Copy from AI to seed the review, then edit by hand
Accept All for fast triage on long extractions
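
For readers who want to picture the dual-version pattern as data, here is a minimal sketch. It is not the product's actual schema: the names `ExtractionField`, `FieldStatus`, `accept_all`, and `publishable` are hypothetical, and the rule that an edit identical to the AI value counts as "accepted" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FieldStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    MODIFIED = "modified"
    REJECTED = "rejected"


@dataclass
class ExtractionField:
    """One extracted field: an AI snapshot plus a reviewed working copy."""
    name: str
    ai_value: str  # AI snapshot; no method in this sketch ever writes to it
    reviewed_value: Optional[str] = None
    status: FieldStatus = FieldStatus.PENDING

    def accept(self) -> None:
        """Copy the AI value into the reviewed version unchanged."""
        self.reviewed_value = self.ai_value
        self.status = FieldStatus.ACCEPTED

    def edit(self, new_value: str) -> None:
        """Store a curator edit; only the reviewed copy changes."""
        self.reviewed_value = new_value
        # Assumption: an edit identical to the AI value counts as accepted.
        if new_value == self.ai_value:
            self.status = FieldStatus.ACCEPTED
        else:
            self.status = FieldStatus.MODIFIED

    def reject(self) -> None:
        """Discard the field for publishing purposes."""
        self.reviewed_value = None
        self.status = FieldStatus.REJECTED

    @property
    def publishable(self) -> bool:
        """Only fields a curator has signed off on may publish."""
        return self.status in (FieldStatus.ACCEPTED, FieldStatus.MODIFIED)


def accept_all(fields: List[ExtractionField]) -> None:
    """Accept every field still pending; leave reviewed fields untouched."""
    for f in fields:
        if f.status is FieldStatus.PENDING:
            f.accept()
```

In this sketch a pending field is never publishable, which mirrors the sign-off rule above, and `accept_all` skips anything a curator has already modified or rejected.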

Six extractions per file

The metadata an archivist actually wants.

Not just an OCR dump. Six structured extractions land on every processed file — each with its own review surface.

Caption

One-sentence AI description, with an editable reviewed version.

Keywords

Auto-extracted tags with add/remove on review and live count comparison.

Transcript

Timestamped, speaker-diarized, per-entry confidence — accept or edit per line.

Entities

People, organizations, locations, dates, events, terms — confidence + source.

Locations

Geo-extracted places with type, coordinates, and confidence.

Sentiment

Overall tone (positive / neutral / negative / mixed) with breakdown bar.

Technical metadata grid for a file: per-file-type technical metadata, extracted at upload and preserved for PREMIS

Technical metadata, automatically.

Everything PREMIS expects, lifted from the file itself on upload — no manual entry, no spreadsheet round-trip.

Image: Width · Height · DPI · Color space · EXIF
Audio: Duration · Codec · Bitrate · Sample rate · Channels
Video: Duration · Codec · Bitrate · Frame rate · Resolution
PDF: Page count · Word count · Fonts · Language
Data: Row count · Column count · Column names
All: Checksum + algorithm for fixity
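
To make "checksum + algorithm for fixity" concrete, the sketch below records a checksum alongside the algorithm that produced it, then re-checks the file later. The helper names `file_checksum` and `verify_fixity` are hypothetical, and SHA-256 is an assumed default; the page does not say which algorithm the product uses.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks and return the checksum with its algorithm name."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large video or TIFF files never
        # have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {"algorithm": algorithm, "checksum": digest.hexdigest()}


def verify_fixity(path, recorded):
    """Recompute with the recorded algorithm and compare to the stored value."""
    current = file_checksum(path, recorded["algorithm"])
    return current["checksum"] == recorded["checksum"]
```

Storing the algorithm name next to the digest is what lets a later fixity check recompute the value the same way, even if the default changes over time.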

Upload a thousand files. Walk away.

Drag and drop a folder. Watch a queue with per-file progress, speed, and status. Pause, resume, retry, or remove individual files. Clear completed files or cancel everything in one batch when you need to.

Status tracking: Queued · Uploading · Processing · Completed · Failed · Paused
Per-file Pause / Resume / Retry / Remove
Batch operations: Clear Completed · Cancel All
Overall progress with active / completed / error counts
Target an existing item or auto-create a new one
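
The queue statuses above can be read as a small state machine. The sketch below is one plausible reading, not the product's implementation: the allowed transitions, the choice to count paused files as "active", and the names `UploadStatus`, `transition`, `summarize`, and `clear_completed` are all assumptions made for illustration.

```python
from collections import Counter
from enum import Enum


class UploadStatus(Enum):
    QUEUED = "queued"
    UPLOADING = "uploading"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    PAUSED = "paused"


# Assumed transitions: Pause/Resume move between UPLOADING and PAUSED,
# Retry sends a FAILED file back to QUEUED, COMPLETED is terminal.
ALLOWED = {
    UploadStatus.QUEUED: {UploadStatus.UPLOADING, UploadStatus.PAUSED},
    UploadStatus.UPLOADING: {
        UploadStatus.PROCESSING,
        UploadStatus.PAUSED,
        UploadStatus.FAILED,
    },
    UploadStatus.PAUSED: {UploadStatus.UPLOADING},
    UploadStatus.PROCESSING: {UploadStatus.COMPLETED, UploadStatus.FAILED},
    UploadStatus.FAILED: {UploadStatus.QUEUED},
    UploadStatus.COMPLETED: set(),
}

# Assumption: everything not yet completed or failed counts as active.
ACTIVE = {
    UploadStatus.QUEUED,
    UploadStatus.UPLOADING,
    UploadStatus.PROCESSING,
    UploadStatus.PAUSED,
}


def transition(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def summarize(statuses):
    """Overall counts for the progress header: active, completed, error."""
    counts = Counter(statuses)
    return {
        "active": sum(counts[s] for s in ACTIVE),
        "completed": counts[UploadStatus.COMPLETED],
        "error": counts[UploadStatus.FAILED],
    }


def clear_completed(statuses):
    """Batch operation: drop completed entries, keep everything else."""
    return [s for s in statuses if s is not UploadStatus.COMPLETED]
```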

Drag-and-drop upload queue with per-file progress: drop in a folder, watch every file through the queue

When the file changes

Re-scan, re-render, never re-catalogue.

Send the photograph back through the scanner at higher resolution? Re-export the oral history with cleaner audio? Drop the new version straight on top of the old one. Your title, your reviewed transcript, the published snapshot, the item it's attached to — all of it stays put. A small version tag tells future readers the file was refreshed.

The new file inherits the old file's catalogue record
Reviewed metadata, item links, and published snapshots survive the swap
A version tag (v2, v3 …) appears in the file header so the change is visible
Crops and annotations reset — they belong to the pixels you just replaced
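
As a rough illustration of what survives a file swap, the sketch below models a replacement as a new record that copies the catalogue fields, bumps the version, and clears anything tied to the old pixels. The `FileRecord` fields and the `replace_file` and `version_tag` helpers are hypothetical names, not the product's data model.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalogue record for one file."""
    title: str
    checksum: str
    version: int = 1
    reviewed_transcript: str = ""
    item_id: Optional[str] = None
    crops: Tuple[str, ...] = ()
    annotations: Tuple[str, ...] = ()


def replace_file(record: FileRecord, new_checksum: str) -> FileRecord:
    """Swap in a new file: keep catalogue data, reset pixel-bound data."""
    return replace(
        record,
        checksum=new_checksum,
        version=record.version + 1,
        crops=(),        # crops referred to the old pixels
        annotations=(),  # so did annotations
    )


def version_tag(record: FileRecord) -> str:
    """Label for the file header, e.g. 'v2' after the first swap."""
    return f"v{record.version}"
```

Because the record is frozen, the original stays untouched and the swap yields a new value, which makes it easy to see exactly which fields changed.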

File detail action menu with Replace and Delete: Replace · Delete · Crop & annotate, all from the file detail page

Network view

See how your files relate to each other.

Switch the Media Library from grid to network and the files arrange themselves into clusters — the same survey, the same photographer, the same recurring subject. Drag a node to reposition it. Click one to make it the centre of the world and watch the rest re-orient around it. Hit Find similar on any file to open a focused star-graph of its closest matches.

Three view modes on the same library: Grid · Table · Network
Files cluster automatically by shared subjects, tags, and AI signals
Pick any file to recentre the graph — its closest matches pull in
Find similar opens a star-graph centred on the source file

Media library

Spring-clean without holding your breath.

One grid of every file in your archive. Find the test uploads, the duplicate scans, the abandoned drafts — select a hundred at a time and clear them. The system quietly protects anything that's already attached to a catalogue record, so a busy afternoon doesn't end with an apology email to your director.

Filter by file type, processing status, or creator
Select hundreds of files at once and act on them in batches
Files attached to items are protected from accidental cleanup
A confirmation step on every destructive action — no surprises
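
The protection rule can be pictured as a filter that runs before any batch delete. This is a sketch under assumptions: `MediaFile`, `plan_cleanup`, and `delete_batch` are hypothetical names, and "attached" is modelled simply as having an `item_id`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MediaFile:
    """Hypothetical library entry; item_id is set when attached to an item."""
    file_id: str
    item_id: Optional[str] = None


def plan_cleanup(selected: List[MediaFile]) -> Tuple[List[MediaFile], List[MediaFile]]:
    """Split a selection into files safe to delete and files that are protected."""
    deletable = [f for f in selected if f.item_id is None]
    protected = [f for f in selected if f.item_id is not None]
    return deletable, protected


def delete_batch(selected: List[MediaFile], confirmed: bool) -> List[str]:
    """Return the ids that would be deleted; nothing happens without confirmation."""
    if not confirmed:
        return []
    deletable, _protected = plan_cleanup(selected)
    return [f.file_id for f in deletable]
```

Selecting a hundred files and confirming removes only the unattached ones; the attached files come back in the protected list so the interface can explain why they were skipped.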

Throw a real file at it.

Upload an oral history, a manuscript, a TIFF, an EML — see the extractions land in minutes.

Start free · Book a demo