Disclosing Generative AI Use in Digital Humanities Research
This survey study investigates how digital humanists perceive and approach generative AI (GenAI) disclosure in research. The results indicate that while digital humanities scholars acknowledge the importance of disclosing GenAI use, the actual rate of disclosure in research practice remains low. Respondents differ in their views on which activities most require disclosure and on the most appropriate methods for doing so. Most also believe that safeguards for AI disclosure should be established through institutional policies rather than left to individual decisions. The findings offer empirical guidance to scholars, institutional leaders, funders, and other stakeholders responsible for shaping effective disclosure policies.
💡 Research Summary
This paper reports the results of a large‑scale survey investigating how scholars in the Digital Humanities (DH) perceive and practice disclosure of generative artificial intelligence (GenAI) use in their research. Conducted between January and March 2024, the online questionnaire attracted 312 respondents from a variety of institutions, career stages, and geographic regions. The instrument comprised six sections: demographic information; experience with GenAI tools (e.g., GPT‑4, DALL·E, Stable Diffusion); attitudes toward the necessity of disclosure; preferred disclosure targets (data preprocessing, text generation, model building, interpretation, etc.); preferred disclosure venues (main text, appendix, code repository, metadata); and policy preferences (institutional guidelines versus individual discretion). Quantitative analysis (frequency distributions, cross‑tabulations, logistic regression) was complemented by qualitative coding of open‑ended comments using NVivo.
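The paper itself does not include analysis code, but a minimal sketch of the descriptive portion of this pipeline (frequency distributions and cross‑tabulations), assuming hypothetical column names such as `used_genai`, `disclosed`, and `career_stage` for the survey export, might look like this:

```python
# Minimal sketch of the descriptive analysis under assumed column names;
# the file name and codebook below are illustrative, not from the paper.
import pandas as pd

# Hypothetical export of the 312 survey responses.
df = pd.read_csv("dh_genai_survey_2024.csv")

# Frequency distribution: share of respondents who used GenAI at all.
print(df["used_genai"].value_counts(normalize=True))

# Cross-tabulation: disclosure behaviour broken down by career stage,
# normalized within each row so stages are directly comparable.
print(pd.crosstab(df["career_stage"], df["disclosed"], normalize="index"))
```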
Key findings reveal a striking gap between belief and practice. While 78 % of participants reported having used GenAI in at least one research project, only 34 % indicated that they had actually disclosed this use in a published output. Among those who did disclose, the majority embedded the information in supplemental material or a code repository; only 12 % of all respondents disclosed in the main body of the article. Activities judged most in need of disclosure were text generation (e.g., automated summarisation, narrative creation) and model construction (e.g., topic modelling, clustering). In contrast, routine preprocessing steps such as data cleaning received comparatively lower urgency scores.
When asked about the most appropriate disclosure mechanism, respondents favoured a standardized metadata record (45 %) and the inclusion of scripts or notebooks alongside the data (38 %); only a minority preferred a dedicated “AI usage” section within the manuscript (22 %). (The percentages sum to more than 100 %, suggesting respondents could endorse multiple mechanisms.) Regarding governance, 62 % argued that clear institutional or disciplinary policies should dictate disclosure requirements, whereas 28 % believed that individual researchers should retain full autonomy.
Statistical modeling showed that higher academic rank (PhD or above), affiliation with large research‑intensive institutions, and frequent GenAI usage were positively associated with actual disclosure (p < 0.01). Qualitative comments highlighted three primary motivations for disclosure: (1) safeguarding reproducibility, (2) mitigating bias or error introduced by opaque AI models, and (3) upholding scholarly transparency. Conversely, perceived barriers included the administrative burden of detailed reporting, lack of explicit journal or conference mandates, and fear of negative perception or criticism for “relying on AI”.
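The paper does not publish its model specification; a hedged sketch of a logistic regression like the one reported, with illustrative variable names (`rank`, `institution_type`, `usage_frequency`) and `disclosed` assumed to be coded 0/1, could be fit with statsmodels as follows:

```python
# Illustrative logistic regression of disclosure on the predictors named
# above; variable names and coding are assumptions, not the paper's own.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("dh_genai_survey_2024.csv")  # hypothetical data export

# C(...) treats rank and institution type as categorical predictors;
# disclosed is assumed to be a 0/1 indicator of actual disclosure.
model = smf.logit(
    "disclosed ~ C(rank) + C(institution_type) + usage_frequency",
    data=df,
).fit()

print(model.summary())  # the paper reports these associations at p < 0.01
```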
Based on these insights, the authors propose a three‑tiered, DH‑specific disclosure framework. The first tier introduces a concise “AI‑use declaration” template to be completed at project inception, documenting model name, version, parameters, and intended role. The second tier recommends the adoption of a community‑wide metadata schema (aligned with DARIAH/CLARIAH initiatives) that captures AI‑related provenance alongside traditional data descriptors. The third tier calls for institutional and scholarly societies to provide training, ethical guidelines, and exemplar repositories that lower the cost of compliance. The paper also outlines a longitudinal follow‑up plan: a three‑year tracking study combined with automated text‑mining of published DH articles to assess policy impact and evolving disclosure practices.
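The paper does not reproduce the tier‑one template itself; as a rough illustration, the declaration record it describes (model name, version, parameters, intended role) could be captured in a small, serialisable structure like the following, where all field names and values are hypothetical:

```python
# Hypothetical rendering of the tier-one "AI-use declaration" as a
# dataclass; the paper's actual template and field names may differ.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class AIUseDeclaration:
    model_name: str                                 # e.g. "GPT-4"
    model_version: str                              # e.g. "gpt-4-0613"
    parameters: dict = field(default_factory=dict)  # temperature, prompts, etc.
    intended_role: str = ""                         # role of the model in the project

declaration = AIUseDeclaration(
    model_name="GPT-4",
    model_version="gpt-4-0613",
    parameters={"temperature": 0.7},
    intended_role="automated summarisation of oral-history transcripts",
)

# Serialising to JSON lets the record travel with the dataset or code
# repository, in the spirit of the tier-two metadata schema.
print(json.dumps(asdict(declaration), indent=2))
```

A JSON or YAML serialisation of this kind could in principle be aligned with a DARIAH/CLARIAH‑style schema, though the paper leaves the schema's exact shape to the community.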
Limitations are acknowledged. The reliance on self‑reported data introduces social desirability bias; the sample skews toward English‑speaking, European‑centric institutions; and the rapid evolution of GenAI tools may outpace the survey’s relevance. Future work should expand the geographic and disciplinary scope, incorporate case‑study analyses of actual publications, and explore the effectiveness of the proposed metadata standards in real‑world settings.
In sum, the study demonstrates that DH scholars recognise the ethical and methodological importance of disclosing GenAI use, yet actual practice remains limited. Institutional policies, standardized reporting tools, and community education are identified as critical levers to bridge this gap, thereby enhancing research transparency, reproducibility, and trust in an era of rapidly advancing AI technologies.