ARCHANGEL: Trusted Archives of Digital Public Documents
We present ARCHANGEL, a de-centralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation: trust taken at face value in a centralised authority, such as a national government archive or a university. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and hence the integrity of archived documents. We describe the ARCHANGEL architecture and report on a prototype of that architecture built over the Ethereum infrastructure. We also report an early evaluation of ARCHANGEL and feedback on it from stakeholders in the research data archives space.
💡 Research Summary
The paper introduces ARCHANGEL, a decentralized platform designed to guarantee the long‑term integrity of digital documents stored in public archives. Recognizing that traditional archives rely on institutional reputation to inspire trust, the authors propose shifting that trust to cryptographic guarantees provided by distributed ledger technology (DLT). The core idea is to store a compact “content evidence” – essentially a hash of the document’s substantive content – together with identifying metadata on a blockchain, thereby creating an immutable, publicly verifiable record of each document’s state at the moment of deposition.
The architecture consists of four main stages. First, when a document arrives, a file‑format identification tool (the UK National Archives’ DROID) automatically determines its type regardless of filename. Second, a content‑extraction process computes a hash. The simplest implementation uses a standard binary hash (SHA‑256), but the design allows for format‑specific or machine‑learning‑based feature extraction (e.g., deep neural networks for scanned images) that produce robust, invariant signatures. Third, the system bundles the hash, a globally unique identifier (GUID), a hash‑algorithm identifier, and supplemental metadata (deposition date, curator notes, etc.) into a single transaction. Fourth, this transaction is appended to a blockchain, making the evidence permanently readable and searchable.
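The deposit stages above can be sketched as follows. This is a minimal illustration assuming the simplest configuration the summary mentions (a binary SHA‑256 of the file); the field names, the `puid` parameter standing in for the format identifier DROID would report in stage 1, and the helper functions are all hypothetical, not the prototype's actual schema.

```python
import hashlib
import json
import uuid
from datetime import date
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Binary SHA-256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_deposit_record(path: Path, puid: str, curator_notes: str = "") -> dict:
    """Bundle content evidence and metadata into one (hypothetical) transaction payload.

    `puid` is a placeholder for the format identifier from stage 1.
    """
    return {
        "guid": str(uuid.uuid4()),      # globally unique identifier
        "hash": sha256_file(path),      # content evidence
        "hash_algorithm": "sha256",     # lets a verifier pick the same algorithm later
        "metadata": {
            "format_puid": puid,
            "deposition_date": date.today().isoformat(),
            "curator_notes": curator_notes,
        },
    }


def to_payload(record: dict) -> bytes:
    """Canonical JSON (sorted keys, no extra whitespace) so a record always serialises identically."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

Canonical serialisation matters here: if the same record could be encoded in two byte-different ways, any hash taken over the payload would not be reproducible by a later verifier.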
Two consensus models are discussed. In a permissioned private network, multiple archives collectively run a proof‑of‑work (PoW) chain; tampering with the record would require the participating institutions to collude (enough of them to control a majority of the network's mining power), which is considered unlikely. In a public network such as Ethereum, a smart contract records the data, with write access restricted to holders of a secret key while the recorded entries remain readable by anyone. Here, an attacker would need to control a majority of the global mining power, an economically prohibitive scenario.
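Why rewriting history requires majority mining power can be seen in a toy hash-linked PoW chain. The sketch below is a single-process illustration only: it assumes plain SHA‑256 over JSON-serialised blocks and a "leading hex zeros" difficulty rule, which is not Ethereum's actual Ethash algorithm, and the `Block`, `mine`, `append` and `chain_is_valid` names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    index: int
    prev_hash: str
    records: list
    nonce: int = 0

    def digest(self) -> str:
        """SHA-256 over the block contents, including the link to the previous block."""
        body = json.dumps(
            {"index": self.index, "prev": self.prev_hash,
             "records": self.records, "nonce": self.nonce},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def mine(block: Block, difficulty: int) -> Block:
    """Increment the nonce until the digest starts with `difficulty` hex zeros."""
    target = "0" * difficulty
    while not block.digest().startswith(target):
        block.nonce += 1
    return block


def append(chain: List[Block], records: list, difficulty: int) -> None:
    """Mine a new block that commits to the digest of the current chain tip."""
    prev = chain[-1].digest() if chain else "0" * 64
    chain.append(mine(Block(len(chain), prev, records), difficulty))


def chain_is_valid(chain: List[Block], difficulty: int) -> bool:
    """Each block must meet the work target and link to its predecessor's digest."""
    target = "0" * difficulty
    for i, block in enumerate(chain):
        if not block.digest().startswith(target):
            return False
        if i > 0 and block.prev_hash != chain[i - 1].digest():
            return False
    return True
```

Altering a stored hash in an early block changes that block's digest, which breaks the link held by its successor. Re-mining the altered block alone does not help, because the successor still commits to the old digest, so an attacker must redo the work for every later block faster than honest miners extend the chain.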
The prototype is built on the Ethereum testnet (Rinkeby) using Solidity smart contracts. A web‑based UI enables users to deposit documents, search by GUID, hash, or metadata, and verify integrity by recomputing the hash and comparing it to the on‑chain value. If a bespoke hashing algorithm was used, the hash of the algorithm’s code (or model) is also stored, allowing verification of both content and the method that produced it.
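The verification step can be sketched as below. This assumes the on-chain entry has already been read into a plain dictionary; the `hash` and `algorithm_hash` keys and the `verify` helper are hypothetical names, and the real prototype's contract interface may differ. When a bespoke algorithm is recorded, the sketch first checks that the verifier holds exactly the registered code or model, then recomputes the content hash with it.

```python
import hashlib
from typing import Callable, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(
    document: bytes,
    on_chain: dict,
    algorithm: Callable[[bytes], str] = sha256_hex,
    algorithm_source: Optional[bytes] = None,
) -> bool:
    """True only if the content evidence (and, when recorded, the algorithm's own hash) match."""
    registered = on_chain.get("algorithm_hash")
    if registered is not None:
        # A bespoke algorithm was used: we must hold the exact code/model that was registered.
        if algorithm_source is None or sha256_hex(algorithm_source) != registered:
            return False
    # Recompute the content evidence and compare it with the recorded value.
    return algorithm(document) == on_chain["hash"]
```

For example, `verify(doc, {"hash": sha256_hex(doc)})` returns `True`, while a single changed byte in `doc` makes it return `False`.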
To assess practical relevance, the authors convened a workshop with 13 stakeholders from government archives, legal firms, and university research data management units. Participants interacted with the prototype and discussed its implications. Four major themes emerged: (1) Blockchain provides a “defender of the record,” offering cryptographic evidence that can counter growing public skepticism about digital authenticity; (2) Archives are increasingly open to emerging technologies and see blockchain as a useful tool for handling the deluge of digital material; (3) The immutable ledger can serve as an audit trail for curatorial actions, enhancing internal transparency and accountability; (4) The cross‑institutional, decentralized model encourages collaboration among archives, fostering a community‑wide standard for provenance verification.
The paper also acknowledges limitations. Recording hashes on a public blockchain incurs gas costs that may become significant at scale. Maintaining the reproducibility of custom machine‑learning hash functions requires secure storage and versioning of the models themselves. Legal acceptance of blockchain‑based evidence remains uncertain, and public education will be needed to convey the security properties of DLT.
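How gas costs grow with scale follows from the standard Ethereum fee rule: each transaction costs gas used × gas price, with 1 gwei = 10⁹ wei and 1 ETH = 10¹⁸ wei. The sketch assumes one deposit per transaction and a fixed gas figure per deposit; both inputs are hypothetical placeholders, not measurements from the ARCHANGEL prototype. Integer wei arithmetic avoids floating-point rounding.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18


def batch_fee_wei(n_deposits: int, gas_per_deposit: int, gas_price_gwei: int) -> int:
    """Total fee in wei, assuming one transaction per deposit at a fixed gas price."""
    return n_deposits * gas_per_deposit * gas_price_gwei * WEI_PER_GWEI


def wei_to_eth(wei: int) -> Decimal:
    """Exact conversion from wei to ETH."""
    return Decimal(wei) / Decimal(WEI_PER_ETH)
```

With placeholder inputs of one million deposits, 100,000 gas each and a 10 gwei gas price, the total is 10²¹ wei, i.e. 1,000 ETH. Because the cost is linear in the number of deposits, it motivates the fee-reduction work discussed next.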
Future work outlined includes exploring layer‑2 scaling solutions to reduce transaction fees, integrating zero‑knowledge proofs for privacy‑preserving verification, aligning the metadata schema with international archival standards, and establishing governance frameworks for long‑term algorithm preservation.
In summary, ARCHANGEL demonstrates that blockchain can move trust for public digital archives from institutional reputation to mathematically provable integrity. By recording immutable content evidence alongside rich metadata, the platform offers a transparent, tamper‑evident mechanism for verifying documents decades or even centuries after they are archived, potentially reshaping archival practice and public confidence in digital records.