Multiple Presents: How Search Engines Re-write the Past

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Internet search engines function in a present which changes continuously. The search engines update their indices regularly, overwriting Web pages with newer ones, adding new pages to the index, and losing older ones. Some search engines can be used to search for information at the internet for specific periods of time. However, these ‘date stamps’ are not determined by the first occurrence of the pages in the Web, but by the last date at which a page was updated or a new page was added, and the search engine’s crawler updated this change in the database. This has major implications for the use of search engines in scholarly research as well as theoretical implications for the conceptions of time and temporality. We examine the interplay between the different updating frequencies by using AltaVista and Google for searches at different moments of time. Both the retrieval of the results and the structure of the retrieved information erodes over time.

💡 Research Summary

The paper “Multiple Presents: How Search Engines Re‑write the Past” investigates a largely overlooked dimension of web search: the way search engines continuously reshape the temporal record of the Internet. While search engines are praised for their ability to retrieve the most recent information, the authors demonstrate that the timestamps attached to indexed pages do not reflect the moment a page first appeared on the web, but rather the moment the engine’s crawler last observed a change. Consequently, older versions of pages are overwritten, and pages that have not been updated for a long time may disappear from results altogether. This “temporal rewriting” has profound implications for scholars who rely on search engines as primary data‑collection tools, as well as for theoretical understandings of time and memory in digital environments.

Methodologically, the study conducts a longitudinal experiment using two historically significant search engines—AltaVista and Google. Over a five‑year span (2000‑2005), the authors issue identical keyword queries (e.g., “digital divide,” “climate change,” “e‑learning”) at six‑month intervals. For each query they archive the full result set, including URLs, snippets, and the date metadata displayed by the engine. They then compare successive snapshots to quantify (1) the rate at which URLs vanish, (2) the proportion of newly introduced URLs, and (3) changes in the structural clustering of results (i.e., how topics group together). To verify whether a page’s content truly changed, they cross‑reference the URLs with the Internet Archive’s Wayback Machine.

The empirical findings are striking. Approximately 35 % of the URLs present in an initial snapshot are gone after two years, and more than half have disappeared after three years. Meanwhile, each new snapshot introduces roughly 20–25 % of URLs that were absent in the previous round; half of these “new” entries are only tangentially related to the original query, effectively adding noise. Clustering analysis shows that early‑stage result sets form clear topical communities, but over time the boundaries between clusters blur, inter‑cluster links weaken, and the overall network becomes fragmented. In other words, the engines’ emphasis on freshness not only pushes older pages out of the index but also erodes the semantic architecture that once organized the information.

A crucial observation concerns the date stamps themselves. The study documents numerous cases where a page first published in 2001 is indexed with a 2004 date because the page was edited or merely re‑crawled at that later time. Researchers who cite such a page as evidence of the state of knowledge in 2001 would therefore be misled. The authors argue that search engines function as “digital memory institutions” that selectively preserve, modify, or discard past content, thereby producing multiple, overlapping “presents” rather than a single, stable historical record.

The discussion expands these results to the practice of scholarly research. Because search engines cannot be trusted to provide an accurate temporal snapshot of the web, scholars must complement engine queries with independent archiving strategies—such as using dedicated web archives, running custom crawlers, or storing local copies of retrieved pages. The paper also situates its findings within broader debates on temporality in digital culture, suggesting that traditional historiography’s linear chronology is challenged by a web where the past is continuously rewritten by algorithmic processes.

In conclusion, the authors call for a reconceptualization of search engines: not merely as tools for locating information, but as active agents that co‑construct the temporal dimension of knowledge. They recommend that future work examine a wider range of engines (including non‑English and region‑specific services) and track algorithmic updates to assess how changes in ranking or crawling policies further affect the “rewriting” of the past. Ultimately, the paper warns that without explicit awareness of these dynamics, scholars risk building arguments on a shifting foundation, mistaking the present‑biased output of search engines for a faithful representation of historical web content.

Multiple Presents: How Search Engines Re-write the Past

💡 Research Summary

Comments & Academic Discussion

Leave a Comment