DiSCoKit: An Open-Source Toolkit for Deploying Live LLM Experiences in Survey Research
Advancing social-scientific research on human-AI interaction dynamics and outcomes often requires researchers to deliver experiences with live large language models (LLMs) to participants through online survey platforms. However, technical and practical challenges (from logging chat data to manipulating AI behaviors for experimental designs) often inhibit survey-based deployment of AI stimuli. We developed DiSCoKit, an open-source toolkit for deploying live LLM experiences (e.g., ones based on models delivered through the Microsoft Azure portal) through JavaScript-enabled survey platforms (e.g., Qualtrics). This paper introduces the toolkit, explaining its scientific impetus and describing its architecture and operation, as well as its deployment possibilities and limitations.
💡 Research Summary
The paper introduces DiSCoKit, an open‑source toolkit that enables researchers to embed live large‑language‑model (LLM) conversational experiences directly into online survey platforms, primarily Qualtrics. The authors motivate the work by highlighting a persistent trade‑off in social‑science research on human‑AI interaction: laboratory studies provide rich, observable behavior but are limited in scale, while traditional surveys reach large samples but lack real‑time interactive stimuli. LLMs, with their stochastic output, further complicate experimental control, making it difficult to deliver consistent, manipulable AI behavior at scale.
DiSCoKit addresses these challenges through a three‑layer architecture. The front‑end component is a block of JavaScript that researchers embed in a Qualtrics question (or any web‑based survey supporting JavaScript). This script extracts the participant’s unique survey ID and any experimental condition assigned by Qualtrics’ randomization logic, then sends this metadata to a back‑end middleware service.
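A minimal sketch of what such an embedded front-end script might look like. The `Qualtrics.SurveyEngine.addOnload` hook and piped-text fields like `${e://Field/ResponseID}` are standard Qualtrics features; the payload shape, field names, and middleware URL are illustrative assumptions, not DiSCoKit's actual interface.

```javascript
// Hypothetical sketch of the front-end script embedded in a Qualtrics question.
// Qualtrics pipes the response ID and the randomized condition into the page;
// the script forwards them to the middleware as session metadata.

// Build the session metadata sent to the middleware (field names illustrative).
function buildSessionPayload(responseId, condition) {
  return {
    session_id: responseId, // Qualtrics' unique response ID
    condition: condition,   // experimental condition from randomization logic
  };
}

// Inside Qualtrics, piped-text fields such as "${e://Field/ResponseID}" are
// substituted before the script runs. Guarded so the sketch also loads outside
// the Qualtrics environment.
if (typeof Qualtrics !== "undefined") {
  Qualtrics.SurveyEngine.addOnload(function () {
    const payload = buildSessionPayload(
      "${e://Field/ResponseID}", // piped response ID
      "${e://Field/condition}"   // piped embedded-data condition
    );
    // The middleware URL is an assumed deployment-specific endpoint.
    fetch("https://middleware.example.edu/session", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
  });
}
```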
The middleware is a Flask‑based Python application that mediates each survey session. Upon receiving a request, it selects a condition‑specific system prompt (the “master prompt” that defines the AI’s persona, tone, and constraints) and the appropriate LLM endpoint. The toolkit’s default implementation uses Azure OpenAI’s serverless endpoints, which are stateless: the middleware must explicitly provide the full conversational context with each API call. This design gives researchers fine‑grained control, allowing them to inject new system prompts at any survey page transition, thereby changing the AI’s behavior mid‑conversation. The middleware can also chain multiple LLM calls, augment responses, or perform post‑processing before returning the text to the front‑end chat window.
All dialogue turns are logged in a relational database (e.g., PostgreSQL). Each record includes a timestamp, the participant’s survey‑session ID, the experimental condition, and the content of both user and AI messages. Researchers can export the data as per‑turn CSV files or as whole‑conversation records, facilitating downstream analysis that links conversational content with traditional survey responses.
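The per-turn logging schema described above might look like the following. SQLite is used here only to keep the sketch self-contained; the paper's default targets a relational database such as PostgreSQL, and the column names are illustrative.

```python
# Sketch of a per-turn dialogue log (column names are assumptions; DiSCoKit's
# default backend is a relational database such as PostgreSQL).
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dialogue_turns (
        ts         TEXT NOT NULL,  -- ISO-8601 timestamp
        session_id TEXT NOT NULL,  -- participant's survey-session ID
        condition  TEXT NOT NULL,  -- experimental condition
        role       TEXT NOT NULL,  -- 'user' or 'assistant'
        content    TEXT NOT NULL   -- message text
    )
""")

def log_turn(session_id, condition, role, content):
    """Record one message (user or AI) as a row in the log."""
    conn.execute(
        "INSERT INTO dialogue_turns VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(),
         session_id, condition, role, content),
    )

# One user/AI exchange produces two rows; rows can be exported per turn
# or regrouped per session_id into whole-conversation records.
log_turn("R_abc123", "3", "user", "Hello!")
log_turn("R_abc123", "3", "assistant", "Hi, how can I help?")
```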
The paper details the data workflow through a concrete example: a participant clicks a survey link, receives a random ID, completes consent, and is randomly assigned to condition “3”. When the participant reaches the page containing the chat, Qualtrics passes the ID and condition to DiSCoKit. The middleware constructs the appropriate system prompt, renders the chat iframe, and logs each exchange. If the study design requires the AI to change demeanor after a certain number of turns, a subsequent survey page can trigger a new middleware request with a different system prompt, seamlessly altering the AI’s behavior.
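The mid-conversation demeanor change at the end of this workflow amounts to replacing the leading system message while retaining the dialogue history. A hedged sketch of that operation (the function name and message format are illustrative):

```python
# Hypothetical helper for the mid-conversation prompt switch: a later survey
# page supplies a new system prompt, and the middleware swaps it in while
# keeping the accumulated user/assistant turns.
def switch_system_prompt(messages, new_prompt):
    """Replace the leading system message, preserving the dialogue history."""
    history = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": new_prompt}] + history
```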
Installation requirements are modest: a Linux or Docker environment with Python 3.12, access to an Azure OpenAI subscription (or another LLM with a compatible API), and a Qualtrics license. Setup involves configuring secret files for API keys, defining condition‑specific JSON prompt files, and deploying the Flask app behind HTTPS. The authors emphasize that non‑technical research teams can accomplish this with standard university IT support.
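A condition-specific prompt file of the kind mentioned above might look like the following. This is purely illustrative; the field names and values are assumptions, not DiSCoKit's documented schema.

```json
{
  "condition": "3",
  "system_prompt": "You are a customer-service assistant. Keep replies under two sentences and maintain a neutral tone.",
  "model_deployment": "gpt-4o-mini",
  "temperature": 0.7
}
```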
Limitations are acknowledged. Extending the toolkit to non‑Azure LLM providers requires custom adapters for authentication and request formatting. Because Azure’s endpoints are stateless, any long‑term context must be managed by the middleware, adding complexity for multi‑turn, stateful dialogues. Survey platforms that restrict JavaScript injection or enforce strict Content‑Security‑Policy headers may block the embedded chat. Ethical concerns are also discussed: uncontrolled LLM outputs could produce offensive or misleading content, so researchers should implement response filtering and ensure that logged data are fully anonymized to protect participant privacy.
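The response-filtering safeguard suggested above could be as simple as screening model output before it reaches the participant. The sketch below uses a placeholder blocklist; a real deployment would rely on a proper moderation service or classifier rather than substring matching.

```python
# Illustrative response filter (not part of DiSCoKit): block-listed terms
# trigger a neutral fallback message instead of the raw model output.
BLOCKLIST = {"blocked_term_a", "blocked_term_b"}  # placeholder terms

def filter_response(text, fallback="I'm sorry, I can't respond to that."):
    """Return the model's text, or a fallback if it trips the blocklist."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return fallback
    return text
```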
In conclusion, DiSCoKit provides a practical, scalable solution for integrating live, controllable AI chat experiences into survey‑based research. By combining open‑source code, a clear middleware architecture, and detailed documentation, it lowers the barrier for social scientists to conduct large‑scale, ecologically valid experiments on human‑AI interaction, while preserving experimental control, data integrity, and reproducibility.