The leveled approach. Using and evaluating text mining tools AVResearcherXL and Texcavator for historical research on public perceptions of drugs

We introduce our explorative historical leveled approach that we use to understand drug debates in the Royal Dutch Library’s digital newspaper archive. In this approach we alternate between distant reading and close reading. Furthermore, we use this approach to evaluate two text mining tools: AVResearcherXL and Texcavator.


💡 Research Summary

The paper presents a methodological framework called the “leveled approach” for conducting historical research on public perceptions of drugs using large‑scale newspaper archives, and it uses this framework to evaluate two text‑mining platforms: AVResearcherXL and Texcavator. The authors begin by outlining a persistent challenge in digital humanities: how to balance the breadth of distant reading (computational analysis of massive corpora) with the depth of close reading (human interpretation of individual texts). To address this, they design a four‑stage iterative workflow that alternates between the two modes.

In the first stage (Level 1), researchers employ both AVResearcherXL and Texcavator to run keyword queries, generate frequency time series, and produce co‑occurrence networks for the entire digital newspaper collection of the Royal Dutch Library. This “distant reading” phase provides a macro‑level map of when and how drug‑related terms (e.g., “opium,” “cocaine,” “heroin”) appear across decades. In Level 2, a subset of articles identified as salient by the quantitative analysis is examined through close reading. Scholars read the full texts, annotate contextual cues, and assess the rhetorical framing of drug debates (e.g., links to public health, crime, or socioeconomic conditions).
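The Level 1 step described above can be illustrated with a minimal Python sketch. It is not the way AVResearcherXL or Texcavator work internally; the toy corpus, the keyword list, and the function names are all assumptions made for illustration. It computes, per year, the share of articles mentioning each keyword, and counts how often two keywords appear in the same article.

```python
from collections import Counter, defaultdict
from itertools import combinations
import re

# Hypothetical toy corpus: (year, text) pairs standing in for newspaper articles.
ARTICLES = [
    (1920, "Police seize opium in Rotterdam harbour."),
    (1923, "Cocaine use among workers is rising, doctors warn."),
    (1923, "Debate on cocaine and heroin trade continues."),
    (1925, "Harvest festival draws large crowds."),
]

# Assumed query terms, taken from the examples named in the summary.
KEYWORDS = {"opium", "cocaine", "heroin"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_frequencies(articles, keywords):
    """Return {year: {keyword: share of that year's articles mentioning it}}."""
    totals = Counter()            # number of articles per year
    hits = defaultdict(Counter)   # per year: articles mentioning each keyword
    for year, text in articles:
        totals[year] += 1
        for kw in keywords & set(tokenize(text)):
            hits[year][kw] += 1
    return {
        year: {kw: hits[year][kw] / totals[year] for kw in sorted(keywords)}
        for year in sorted(totals)
    }


def cooccurrences(articles, keywords):
    """Count keyword pairs that appear together in the same article."""
    pairs = Counter()
    for _, text in articles:
        present = sorted(keywords & set(tokenize(text)))
        pairs.update(combinations(present, 2))
    return pairs


freqs = keyword_frequencies(ARTICLES, KEYWORDS)
pairs = cooccurrences(ARTICLES, KEYWORDS)
```

Normalizing by the number of articles per year, rather than reporting raw counts, is a design choice of this sketch: it keeps a growing archive from looking like growing interest in a term.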

Level 3 is a feedback loop: insights from close reading generate new search terms, refined filters, or alternative conceptual clusters, which are fed back into the distant‑reading tools for a second round of query execution. Level 4 repeats the process, allowing the researcher to iteratively converge on a nuanced narrative that is both data‑driven and interpretively rich. The authors argue that this cyclical structure prevents the “black‑box” problem of distant reading (where patterns are identified without explanation) and the “myopia” problem of close reading (where analysis is limited to a tiny sample).
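The feedback loop between close and distant reading can be sketched in the same spirit. In the sketch below, `close_read` is a hypothetical callback standing in for the human reading step, and `NOTES`, the sample articles, and the `max_rounds` cap are illustrative assumptions rather than anything taken from the paper.

```python
def leveled_search(articles, seed_terms, close_read, max_rounds=4):
    """Alternate between querying (distant) and reading (close) until
    close reading suggests no new search terms or the round cap is hit."""
    terms = set(seed_terms)
    for _ in range(max_rounds):
        # Distant reading: select every article matching the current terms.
        selected = [a for a in articles if terms & set(a.lower().split())]
        # Close reading: a human would read `selected`; here a callback
        # returns the terms that reading suggests.
        new_terms = set(close_read(selected)) - terms
        if not new_terms:
            break
        terms |= new_terms
    return terms


# Hypothetical reading notes: a word the reader encounters suggests new terms.
NOTES = {"cocaine": {"smuggling"}, "smuggling": {"harbour"}}


def toy_close_read(selected):
    """Stand-in for close reading: look up each word in the notes table."""
    found = set()
    for text in selected:
        for word in text.lower().split():
            found |= NOTES.get(word, set())
    return found


SAMPLE = [
    "cocaine trade grows",
    "smuggling ring exposed in harbour",
    "harbour workers strike",
]

result = leveled_search(SAMPLE, {"cocaine"}, toy_close_read)
```

Starting from the single seed term "cocaine", the loop picks up "smuggling" in the first round and "harbour" in the second, then stops because the third round yields nothing new.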

The paper then turns to a systematic comparison of the two tools. AVResearcherXL excels in handling very large datasets, offering fast query response times, granular metadata filters (year, publisher, geographic region), and robust export options for statistical software. Its limitations include a relatively steep learning curve, a static set of visualizations (mostly line charts and bar graphs), and limited support for custom scripting or plug‑ins, which can hinder more sophisticated network or topic‑model analyses.

Texcavator, by contrast, provides an intuitive web‑based dashboard, interactive word clouds, and network graphs that can be manipulated in real time. Its visual emphasis makes it especially suitable for exploratory phases and for communicating findings to non‑technical audiences. However, Texcavator’s query language is less expressive, its backend indexing is slower for very large corpora, and it offers fewer options for exporting raw data, which can be a bottleneck for downstream statistical work.

Applying the leveled approach to the Dutch newspaper corpus, the authors trace a pronounced surge in drug‑related coverage during the early 1920s. Distant reading shows a sharp increase in the frequency of “cocaine” and “heroin” mentions between 1920 and 1925, with peaks aligning with legislative debates on narcotics control. Close reading of selected articles reveals that journalists framed the drug issue within broader social concerns: urban poverty, public health crises, and the moral panic surrounding “degenerate” youth. Articles from 1923, for instance, link cocaine use to declining worker productivity, while 1927 pieces discuss the upcoming “Opium Law” and its expected impact on colonial trade. These qualitative insights validate and enrich the quantitative patterns uncovered in Level 1.

The authors conclude that the leveled approach successfully bridges the gap between macro‑level pattern detection and micro‑level textual interpretation. Moreover, the comparative evaluation suggests that AVResearcherXL and Texcavator are complementary: AVResearcherXL is preferable when precision, speed, and extensive metadata filtering are paramount; Texcavator is advantageous for rapid visual exploration and stakeholder communication. Tool selection should therefore follow from the research question, depending on whether the priority is analytical rigor or exploratory visualization.

Overall, the study contributes a replicable workflow for historians and digital humanists working with massive newspaper archives, demonstrates how to critically assess text‑mining platforms in practice, and offers new historical insight into how early‑20th‑century Dutch society perceived and debated illicit drugs.

