Archival practice · 8 April 2026 · 8 min read

Why we built Archively.AI on standards, not promises

Every catalog tool claims to be 'ISAD(G) compliant'. Here's what that actually means in our data model — and what we refused to compromise.

Rafiq Hossain

Walk into any catalog software demo and the first slide will tell you the product is "ISAD(G) compliant", "Dublin Core ready", and "MARC-friendly". The second slide shows a flat form with twenty-six fields and an export button. The third slide is a logo wall.

A field labelled Reference code is not compliance. A dropdown of Dublin Core element names is not interoperability. What standards actually require is a data model that can express the relationships the standards are written to preserve. We built Archively.AI starting from that data model, and the product surface followed.

This post is about the four places we refused to compromise on the model, what each cost us in time-to-first-record, and what each unlocks at publish time.

Compromise one we refused: multi-level description as a tree, not a tag

ISAD(G) is built on the principle that description happens at multiple levels — fonds, sub-fonds, series, file, item — and that a record only makes sense in the context of its parent. The easy way to support this is to add a "level" dropdown to a flat record and call it done.

The honest way is to give every record a parent and let the hierarchy be a real tree. We picked the honest way. Our Item and Fonds entities use a closure-table pattern so an ancestor lookup is one indexed query, regardless of how deep the tree gets, and inheritance of context (creators, dates, rights) propagates down the tree at read time.
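As a rough illustration of the closure-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (record, record_closure, creator) and the sample fonds are assumptions made up for this example, not the Archively.AI schema. It shows an ancestor lookup as a single query against the closure table's primary key, and a read-time lookup of the nearest ancestor that supplies a creator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (
    id      INTEGER PRIMARY KEY,
    level   TEXT NOT NULL,      -- fonds, series, file, item ...
    title   TEXT NOT NULL,
    creator TEXT                -- NULL means "inherit from an ancestor"
);
-- One row per (ancestor, descendant) pair, including each record with itself.
CREATE TABLE record_closure (
    ancestor_id   INTEGER NOT NULL REFERENCES record(id),
    descendant_id INTEGER NOT NULL REFERENCES record(id),
    depth         INTEGER NOT NULL,
    PRIMARY KEY (descendant_id, ancestor_id)
);
""")

def add_record(record_id, level, title, parent_id=None, creator=None):
    conn.execute("INSERT INTO record VALUES (?, ?, ?, ?)",
                 (record_id, level, title, creator))
    # Every record is its own ancestor at depth 0.
    conn.execute("INSERT INTO record_closure VALUES (?, ?, 0)",
                 (record_id, record_id))
    if parent_id is not None:
        # Copy the parent's ancestor rows (its self row included), one level deeper.
        conn.execute(
            """INSERT INTO record_closure (ancestor_id, descendant_id, depth)
               SELECT ancestor_id, ?, depth + 1
               FROM record_closure WHERE descendant_id = ?""",
            (record_id, parent_id))

def ancestors(record_id):
    """All ancestors, outermost first, in one query."""
    return conn.execute(
        """SELECT r.level, r.title
           FROM record_closure c JOIN record r ON r.id = c.ancestor_id
           WHERE c.descendant_id = ? AND c.depth > 0
           ORDER BY c.depth DESC""", (record_id,)).fetchall()

def inherited_creator(record_id):
    """Creator from the record itself or its nearest ancestor, resolved at read time."""
    row = conn.execute(
        """SELECT r.creator
           FROM record_closure c JOIN record r ON r.id = c.ancestor_id
           WHERE c.descendant_id = ? AND r.creator IS NOT NULL
           ORDER BY c.depth ASC LIMIT 1""", (record_id,)).fetchone()
    return row[0] if row else None

# Illustrative data only.
add_record(1, "fonds", "Example family papers", creator="Example family")
add_record(2, "series", "Correspondence", parent_id=1)
add_record(3, "file", "Letters, 1950-1955", parent_id=2)
add_record(4, "item", "Letter, 3 March 1952", parent_id=3)

print(ancestors(4))
print(inherited_creator(4))
```

The trade-off in this sketch is the usual one for closure tables: inserts write one extra row per ancestor, and moving a subtree means rewriting its closure rows, in exchange for reads that never recurse.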

The cost: archivists who came from spreadsheet catalogs have to think about parents. The payoff: a published finding aid actually shows the structure the records have, not a flat list with a "Series" badge on each row.

Compromise two we refused: authorities are entities, not strings

Most tools store a creator as a free-text field. "Jane Goodall" appears 41 times in your catalog. One of them is misspelled. Two have her death date wrong. None of them link to anything.

EAC-CPF, ISAAR(CPF), and every aggregator we want to publish to assume an authority record — a Person or Corporate Body with its own description, dates, biography, sources, and a stable identifier — and assume your item records link to it. We built that. Person and Organization are their own modules; items reference them with foreign keys; the same Jane Goodall is one row.

The cost: when a curator types "Jane Goodall" into the Creator field, we have to disambiguate against existing rows, and that interaction has to be fast. The payoff: when you publish an EAC-CPF record for Jane Goodall, every item the institution holds about her shows up automatically. There is no shadow list to reconcile.
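A minimal sketch of that arrangement, again with sqlite3: one person row, items that point at it by foreign key, and an indexed lookup on a normalized name to find existing candidates when a curator types a creator. The schema, the normalize helper, and the item titles are all hypothetical, and real disambiguation would need more than exact matching on a normalized key (dates, variant names, fuzzy matching).

```python
import sqlite3
import unicodedata

def normalize(name):
    """Fold case, accents, and extra whitespace so near-identical typing matches."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
db.executescript("""
CREATE TABLE person (
    id           INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL,
    name_key     TEXT NOT NULL           -- normalized form, used for lookup
);
CREATE INDEX person_name_key ON person(name_key);
CREATE TABLE item (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    creator_id INTEGER NOT NULL REFERENCES person(id)
);
""")

def add_person(display_name):
    cur = db.execute("INSERT INTO person (display_name, name_key) VALUES (?, ?)",
                     (display_name, normalize(display_name)))
    return cur.lastrowid

def candidates(typed):
    """Existing authority rows the typed name might refer to (indexed lookup)."""
    return db.execute("SELECT id, display_name FROM person WHERE name_key = ?",
                      (normalize(typed),)).fetchall()

def add_item(title, creator_id):
    db.execute("INSERT INTO item (title, creator_id) VALUES (?, ?)",
               (title, creator_id))

def items_for_person(person_id):
    rows = db.execute("SELECT title FROM item WHERE creator_id = ? ORDER BY id",
                      (person_id,))
    return [title for (title,) in rows]

# Illustrative data only.
goodall = add_person("Jane Goodall")
add_item("Field notebook", goodall)
add_item("Lecture recording", goodall)

print(candidates("  jane   GOODALL "))   # matches the single existing row
print(items_for_person(goodall))         # every linked item, no shadow list
```

Because items hold a key rather than a string, a misspelled creator cannot exist on an item: in this sketch, an insert that references a person id with no matching row fails with an integrity error.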

Compromise three we refused: persistent identifiers are minted, not borrowed

Every item in Archively.AI can be assigned an ARK or a DOI. The minting service is built in, the resolver is configured, and the identifier survives renames, moves, and even tenant migration. Minting is disabled by default, since most pilot tenants do not need persistent identifiers on day one, but the schema is already wired for it, so turning it on is a settings toggle, not a migration.

The cost: extra columns and a quiet dependency on an ARK NAAN or a DOI prefix. The payoff: when a researcher cites your record in a journal article and your URL structure changes three years later, the citation still resolves.
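To make the "minted once, independent of everything else" property concrete, here is a small sketch of an ARK minter behind an on/off setting. The class and field names are hypothetical. The NAAN 99999 and the fk4 shoulder are placeholders conventionally used for test identifiers, and a production minter would use the institution's registered NAAN, persist issued names, and usually add a check character, none of which this sketch does.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# Digits plus consonants, no vowels, so random names cannot spell words.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"

@dataclass
class Item:
    title: str
    path: str
    ark: Optional[str] = None   # stored on its own, not derived from title or path

class ArkMinter:
    def __init__(self, naan, shoulder, enabled=False):
        self.naan = naan
        self.shoulder = shoulder
        self.enabled = enabled      # the settings toggle; off by default
        self._issued = set()        # in-memory only in this sketch

    def mint(self, item):
        if not self.enabled:
            return None
        if item.ark is not None:
            return item.ark         # an item is never re-minted
        while True:
            blade = "".join(secrets.choice(ALPHABET) for _ in range(8))
            ark = f"ark:/{self.naan}/{self.shoulder}{blade}"
            if ark not in self._issued:
                break
        self._issued.add(ark)
        item.ark = ark
        return ark

minter = ArkMinter(naan="99999", shoulder="fk4")   # placeholder values
letter = Item(title="Letter, 3 March 1952", path="/fonds/1/series/2/item/4")

assert minter.mint(letter) is None   # disabled: nothing is minted

minter.enabled = True
first = minter.mint(letter)

# Renaming and moving the item leaves the identifier alone.
letter.title = "Letter to the editor, 3 March 1952"
letter.path = "/fonds/9/item/4"
print(first, letter.ark == first)
```

The point of the design is that the identifier is a stored value with its own lifecycle, so nothing about the record's title, location, or URL structure has to stay fixed for a citation to keep resolving.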

Compromise four we refused: provenance preserved through AI

When the AI generates a description, we store the AI output in its own immutable column (AiExtractionJson). When the curator edits it, the edits go to a separate draft. When they publish, the published snapshot is stored in its own immutable column as well. Three layers, each the source of truth for its own stage, and none of them overwrites the others.

This sounds like ceremony. It is the only way to honestly support PREMIS-style provenance for an AI-augmented workflow. When someone asks "did a human approve this description?" or "what changed between version 2 and version 3?", we can answer.
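The three layers can be sketched in a few lines of Python. Only the AiExtractionJson name comes from the product; the class names, the shape of the JSON, and the approved_by field are assumptions for illustration. The sketch keeps the AI output read-only, lets the draft change freely, and appends a frozen snapshot on every publish, which is enough to answer both questions above.

```python
import difflib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PublishedSnapshot:
    version: int
    text: str
    approved_by: str            # the human who published this version
    published_at: datetime

class DescriptionRecord:
    def __init__(self, ai_extraction_json):
        self._ai_extraction_json = ai_extraction_json   # layer 1: written once
        # Assumed JSON shape: {"description": "..."}
        self.draft = json.loads(ai_extraction_json)["description"]  # layer 2: editable
        self._snapshots = []                             # layer 3: append-only

    @property
    def ai_extraction_json(self):
        return self._ai_extraction_json   # no setter, so it cannot be reassigned

    @property
    def snapshots(self):
        return tuple(self._snapshots)

    def edit(self, text):
        self.draft = text

    def publish(self, approved_by):
        snap = PublishedSnapshot(len(self._snapshots) + 1, self.draft,
                                 approved_by, datetime.now(timezone.utc))
        self._snapshots.append(snap)
        return snap

    def diff(self, old_version, new_version):
        """What changed between two published versions, as unified-diff lines."""
        old = self._snapshots[old_version - 1].text.splitlines()
        new = self._snapshots[new_version - 1].text.splitlines()
        return list(difflib.unified_diff(old, new, f"v{old_version}",
                                         f"v{new_version}", lineterm=""))

raw = '{"description": "Letter, undated."}'
rec = DescriptionRecord(raw)
rec.publish(approved_by="curator@example.org")   # v1: AI text approved as-is
rec.edit("Letter, dated 3 March 1952.")
rec.publish(approved_by="curator@example.org")   # v2: curator's correction
changes = rec.diff(1, 2)
print("\n".join(changes))
```

In a database the same guarantees would come from write-once columns and an append-only snapshot table rather than Python properties, but the separation of layers is the same.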

What this costs, and why it is worth it

A vendor who flattens these four things ships faster. The first ten records get into their tool faster than they get into ours. We accept that. The cost compounds for them at publish time, when the aggregator rejects the export, the citation breaks, the authority records have to be hand-built, and the AI provenance is gone. Our cost compounds in the other direction: every record we hold is one we can publish without translation.

Standards are not features. They are the shape of the data the field has agreed on so that records survive the institution that holds them.

If you are evaluating archival software in 2026, the question is not "does it support EAD?" but "what does it export, and will the aggregator accept the file without manual cleanup?" The first question has a marketing answer. The second has only an engineering one.

Tags: standards, ISAD(G), data model, archival practice
