Developing a system for securely time-stamping and visualizing the changes made to online news content
Nowadays, the Internet is indispensable for disseminating information. People rely on it to stay informed about current events and to verify facts. As a community, we are quickly approaching an ‘electronic information age’ in which most information will be distributed electronically, and tools to preserve this information will become essential. Archiving online information is a good way to preserve it for future generations, but archiving has several disadvantages, including how easily archived information can be manipulated, e.g. by the archiving authority itself. Online information can also be hacked or taken offline. It is therefore necessary that archived online news is securely time-stamped, in a way that cannot be manipulated, with the date and time at which it was first archived. ‘Trusted timestamping’ is an established process for demonstrating that particular digital information existed at a particular ‘point in time’ in the past. However, traditional approaches to trusted timestamping depend on the trustworthiness of the time-stamping authority. A more recent method, directly embedding the hash of a digital file into the blockchain of a cryptocurrency, allows secure time-stamping because the hash is stored as part of the transaction data in a public blockchain, e.g. Bitcoin’s, rather than at a centralized time-stamping authority. However, no system is yet available that uses this approach to archive and time-stamp online news articles. The aim of this thesis is therefore to develop a system that 1) enables decentralized trusted time-stamping of web and news articles as a means of making future manipulation of online information identifiable, and 2) allows users to determine the authenticity of articles by checking different versions of the same article online.
💡 Research Summary
The paper addresses the growing problem of trust and integrity in online news content by proposing a decentralized, blockchain‑based trusted timestamping system. Recognizing that traditional digital archiving solutions rely on centralized timestamping authorities, which can be compromised or manipulated, the authors aim to create a method that records the existence of a news article at a specific point in time without depending on any single entity. The core idea is to hash the content of a web article and embed that hash directly into a cryptocurrency blockchain (specifically Bitcoin) in an OP_RETURN output, thereby leveraging the immutable, publicly verifiable nature of blockchain transactions.
The system architecture consists of four layers. The first layer is a web‑crawling component built on headless browsers and HTTP clients that periodically fetches articles from selected news sites while respecting robots.txt and handling dynamically loaded pages. The second layer normalizes the retrieved HTML, strips non‑essential scripts and styles, and computes a SHA‑256 digest of the textual body as well as separate digests for embedded media (images, videos). These digests are combined into a composite hash, so any alteration to the article’s substantive text or media changes the composite hash, whereas edits confined to the stripped scripts and styles do not.
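The second layer can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say how the text and media digests are combined, so the scheme used here (SHA-256 over the text digest followed by the media digests sorted by URL) and the helper names normalize_text and composite_hash are hypothetical. Media bytes are assumed to have been downloaded by the crawler already.

```python
from __future__ import annotations

import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def normalize_text(html: str) -> str:
    """Strip tags, scripts and styles, then collapse all whitespace runs."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def composite_hash(html: str, media: dict[str, bytes]) -> str:
    """Hypothetical composite: SHA-256 over the text digest and the media
    digests (sorted by URL so the result does not depend on crawl order)."""
    digests = [hashlib.sha256(normalize_text(html).encode("utf-8")).hexdigest()]
    digests += [hashlib.sha256(media[url]).hexdigest() for url in sorted(media)]
    return hashlib.sha256("\n".join(digests).encode("ascii")).hexdigest()


page = "<p>Hello   world</p><script>track()</script><style>p{color:red}</style>"
print(normalize_text(page))  # Hello world
```

Because scripts and styles are dropped before hashing, a changed tracking snippet leaves the composite hash untouched, while a single edited word or a replaced image changes it. Joining text nodes with spaces is crude, but it is deterministic, which is the property the hash needs.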
The third layer interacts with the Bitcoin network. Using Bitcoin Core’s RPC interface, the system creates a transaction that includes the article hash in an OP_RETURN output (limited to 80 bytes). Once the transaction has been broadcast and included in a block, the transaction ID, block hash, block height, and timestamp are stored in a relational database together with a link to a public block explorer for third‑party verification. Because altering a confirmed block would require redoing Bitcoin’s proof‑of‑work for that block and for every block mined after it, faster than the rest of the network extends the chain, the recorded timestamp becomes practically tamper‑proof once the transaction is buried under a few blocks. The system also supports alternative networks (testnet, Litecoin, private Ethereum chains) to reduce costs or enable experimental deployments.
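The OP_RETURN output itself is a short script: the OP_RETURN opcode (0x6a), a push opcode giving the payload length, and the payload. The Python sketch below builds and parses such a script and prepares a JSON-RPC body for Bitcoin Core's createrawtransaction call, whose outputs may include a {"data": hex} entry. The helper names are hypothetical, the code does not contact a node, and the 80-byte cap mirrors the limit stated above rather than any particular node's current relay policy.

```python
from __future__ import annotations

import hashlib
import json

OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_DATA = 80  # limit stated in the text (long-standing Bitcoin Core default)


def op_return_script(data: bytes) -> bytes:
    """Build the scriptPubKey 'OP_RETURN <data>'."""
    if len(data) > MAX_DATA:
        raise ValueError(f"{len(data)} bytes exceeds the {MAX_DATA}-byte limit")
    if len(data) <= 75:
        push = bytes([len(data)])  # opcodes 0x01-0x4b push that many bytes
    else:
        push = bytes([OP_PUSHDATA1, len(data)])  # explicit 1-byte length
    return bytes([OP_RETURN]) + push + data


def parse_op_return(script: bytes) -> bytes:
    """Recover the payload from a script built by op_return_script."""
    if len(script) < 2 or script[0] != OP_RETURN:
        raise ValueError("not an OP_RETURN script")
    if script[1] == OP_PUSHDATA1:
        if len(script) < 3:
            raise ValueError("truncated script")
        length, start = script[2], 3
    else:
        length, start = script[1], 2
    payload = script[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated script")
    return payload


def createrawtransaction_body(digest: bytes) -> str:
    """JSON-RPC request with no inputs and a single data output; inputs and
    change would then be added with fundrawtransaction."""
    return json.dumps({
        "jsonrpc": "1.0",
        "id": "timestamp",
        "method": "createrawtransaction",
        "params": [[], [{"data": digest.hex()}]],
    })


digest = hashlib.sha256("normalized article text".encode("utf-8")).digest()
print(op_return_script(digest).hex()[:4])  # 6a20
```

A 32-byte SHA-256 digest needs a single push byte (0x20), so the whole script is 34 bytes. In Bitcoin Core, the unsigned transaction would typically pass through fundrawtransaction, signrawtransactionwithwallet, and sendrawtransaction before being broadcast.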
The fourth layer is a user‑facing web dashboard. Users input an article URL, and the platform retrieves the most recent hash entry, displays the associated blockchain metadata, and presents a chronological list of all recorded versions. For each version pair, a diff algorithm highlights textual changes, allowing readers to instantly see what has been added, removed, or modified. If a newly fetched article’s hash does not match any stored hash, the system flags a potential manipulation and provides the relevant blockchain proof for independent verification.
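The dashboard's check-and-diff step can be sketched with Python's difflib module. The Version record and the check_article function are hypothetical stand-ins for the database rows described above, and the sample texts are made up. The sketch hashes normalized text only and compares the current text line by line against the most recent stored version.

```python
from __future__ import annotations

import difflib
import hashlib
from dataclasses import dataclass


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Version:
    """One recorded version of an article (hypothetical schema)."""
    fetched_at: str  # ISO-8601 UTC crawl time; same format, so strings sort chronologically
    text: str        # normalized article text
    txid: str        # transaction carrying this version's hash


def check_article(current: str, history: list[Version]) -> tuple[Version | None, list[str]]:
    """Return the stored version whose hash equals the current hash (None
    signals a possible manipulation) and a unified diff against the latest
    stored version."""
    digest = sha256_hex(current)
    match = next((v for v in history if sha256_hex(v.text) == digest), None)
    latest = max(history, key=lambda v: v.fetched_at)
    diff = difflib.unified_diff(
        latest.text.splitlines(),
        current.splitlines(),
        fromfile=f"{latest.fetched_at} (tx {latest.txid[:8]})",
        tofile="current",
        lineterm="",
    )
    return match, list(diff)


history = [
    Version("2019-03-01T08:00:00", "Headline.\nFirst draft.", "a1" * 32),
    Version("2019-03-01T12:00:00", "Headline.\nSecond draft.", "b2" * 32),
]
match, diff = check_article("Headline.\nEdited draft.", history)
if match is None:
    print("\n".join(diff))
```

A current text that matches an older stored version (for example, a reverted edit) is not flagged, but the diff against the latest version still shows the change. Word-level highlighting, as a dashboard would display it, could be built on difflib.SequenceMatcher instead.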
Security considerations are discussed in depth. Since only cryptographic digests are stored on‑chain, the original content cannot be reconstructed from the blockchain, mitigating privacy concerns. However, anyone holding a copy of an article can hash it and search the chain for that digest, and the transaction metadata (transaction ID, timestamp) is public. The authors therefore limit on‑chain data to the hash and optionally encrypt it (e.g., with AES‑GCM) before embedding, so that even the fact that a particular article was timestamped is not trivially disclosed. The paper also proposes integrating the InterPlanetary File System (IPFS) to store the actual article payload off‑chain, with the IPFS Content Identifier (CID) linked to the blockchain hash. This hybrid approach preserves the benefits of decentralization while keeping storage costs low.
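The link between an on-chain hash and IPFS can be illustrated by deriving a CID by hand. The Python sketch below builds a CIDv1 for a single raw block (multicodec raw 0x55, multihash sha2-256 0x12, base32 multibase prefix "b") following the multiformats CID layout. For such a CID, the embedded digest is exactly the SHA-256 of the payload, so it can be checked against a hash recorded on-chain when that hash is stored unencrypted. This is a sketch under assumptions: ipfs add chunks larger files and, depending on its settings, wraps content in UnixFS, so the CID it reports for a whole article will usually differ from this one. The function names are hypothetical.

```python
from __future__ import annotations

import base64
import hashlib

# Multiformats codes; all are below 0x80, so each varint is a single byte.
CID_V1 = 0x01
RAW_CODEC = 0x55    # multicodec "raw"
SHA2_256 = 0x12     # multihash function code
DIGEST_LEN = 0x20   # 32-byte digest
PREFIX = bytes([CID_V1, RAW_CODEC, SHA2_256, DIGEST_LEN])


def raw_cidv1(payload: bytes) -> str:
    """CIDv1 (raw codec, sha2-256, base32 multibase) for a single block."""
    binary = PREFIX + hashlib.sha256(payload).digest()
    # Multibase prefix "b" = lowercase RFC 4648 base32 without padding.
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def digest_from_cid(cid: str) -> bytes:
    """Recover the SHA-256 digest from a CID produced by raw_cidv1."""
    if not cid.startswith("b"):
        raise ValueError("expected base32 multibase prefix 'b'")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the stripped base32 padding
    binary = base64.b32decode(body)
    if binary[:4] != PREFIX:
        raise ValueError("not a raw sha2-256 CIDv1")
    return binary[4:]


article = "normalized article text".encode("utf-8")
cid = raw_cidv1(article)
print(cid[:7])  # bafkrei
```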
Performance evaluation involved processing 100 recent news articles. The average time to crawl, normalize, and compute the composite hash was 2.3 seconds, while broadcasting the transaction and receiving confirmation took 5–7 seconds. Transaction fees averaged 0.0001 BTC (approximately 0.5 USD at the time of testing), which is substantially cheaper than commercial timestamping services. The system demonstrated reliable operation over both the Bitcoin mainnet and testnet, confirming its flexibility.
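The reported fee figures can be sanity-checked with a few lines of Python. Assuming one transaction per article at the average fee above, 0.0001 BTC is 10,000 satoshis, the quoted 0.5 USD implies an exchange rate of about 5,000 USD/BTC at testing time, and timestamping all 100 evaluated articles individually would cost about 0.01 BTC, or roughly 50 USD. The helper name fee_summary is hypothetical; Decimal avoids binary floating-point rounding on currency amounts.

```python
from __future__ import annotations

from decimal import Decimal

SATS_PER_BTC = 100_000_000


def fee_summary(n_articles: int, fee_btc: Decimal, fee_usd: Decimal) -> dict[str, Decimal]:
    """Totals for one OP_RETURN transaction per article at a fixed average fee."""
    return {
        "fee_sats": fee_btc * SATS_PER_BTC,   # fee expressed in satoshis
        "usd_per_btc": fee_usd / fee_btc,     # exchange rate implied by the quote
        "total_btc": fee_btc * n_articles,
        "total_usd": fee_usd * n_articles,
    }


summary = fee_summary(100, Decimal("0.0001"), Decimal("0.5"))
for key, value in summary.items():
    print(key, f"{value.normalize():f}")
```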
The contributions of the work are threefold: (1) a fully automated pipeline for collecting, hashing, and immutably recording news articles; (2) a practical demonstration that blockchain‑based timestamps can replace trusted third‑party authorities for content integrity verification; and (3) an intuitive visualization tool that empowers end‑users to detect and understand article modifications over time. The authors conclude by outlining future research directions, including the adoption of quantum‑resistant hash functions (e.g., SHA‑3 or post‑quantum alternatives), multi‑blockchain consensus to improve redundancy, and the exploration of legal frameworks that could recognize blockchain timestamps as admissible evidence in misinformation litigation.