The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle
Lara Russell-Lasalandra,*† Hudson Golino,*† Luis Garrido,*‡ and Alexander Christensen*¶

† Department of Psychology, University of Virginia, Charlottesville, VA 22903, USA
‡ Department of Psychology, Pontificia Universidad Madre y Maestra, Dominican Republic
¶ Department of Psychology, Vanderbilt University, USA
* Corresponding author. Email: LLR7CB@Virginia.Edu; hfg9s@virginia.edu; luisgarrido@pucmm.edu.do; alexander.christensen@vanderbilt.edu

Abstract

Psychological scale development has traditionally required extensive expert involvement, iterative revision, and large-scale pilot testing before psychometric evaluation can begin. The AIGENIE R package implements the AI-GENIE framework (Automatic Item Generation with Network-Integrated Evaluation), which integrates large language model (LLM) text generation with network psychometric methods to automate the early stages of this process. The package generates candidate item pools using LLMs, transforms them into high-dimensional embeddings, and applies a multi-step reduction pipeline (Exploratory Graph Analysis (EGA), Unique Variable Analysis (UVA), and bootstrap EGA) to produce structurally validated item pools entirely in silico. This tutorial introduces the package across six parts: installation and setup, understanding Application Programming Interfaces (APIs), text generation, item generation, the AIGENIE function, and the GENIE function. Two running examples illustrate the package's use: the Big Five personality model (a well-established construct) and AI Anxiety (an emerging construct).
The package supports multiple LLM providers (OpenAI, Anthropic, Groq, HuggingFace, and local models), offers a fully offline mode with no external API calls, and provides the GENIE() function for researchers who wish to apply the psychometric reduction pipeline to existing item pools regardless of their origin. The AIGENIE package is freely available on R-universe at https://laralee.r-universe.dev/AIGENIE.

Keywords: AIGENIE, Generative Psychometrics, tutorial, LLMs, AI, EGA, UVA, R package

1. Introduction

Large language models (LLMs) have become invaluable tools for scale development. Modern LLMs can generate extremely high-quality text (Chakrabarty & Dhillon, 2026; Tengler & Brandhofer, 2025) and, crucially, today's models function as powerful, expert-level writing tools straight out of the box. In other words, they are powerful enough to be used as-is, without fine-tuning or retraining (Carlini et al., 2021). Text generation, however, is only one piece of the puzzle. Encoder LLMs are also extremely powerful tools for psychometric applications (Asudani et al., 2023; Tao et al., 2025), translating the context and meaning of human language into numeric, computer-readable vectors (Vaswani et al., 2017). Given the considerable cost of traditional scale development (see Boateng et al., 2018, and Clark & Watson, 2016), researchers have begun leveraging LLMs to streamline this process. A growing body of literature demonstrates that LLM-generated items can meet the same quality benchmarks expected of expert-authored items (Götz et al., 2024; Hommel et al., 2022; Keane & McNaughton, 2026; Shin et al., 2025), and in some cases even surpass them (Martin Kowal et al., 2025).
The present paper details how scale developers and methodologists can harness text generation models, embedding models, or both to accelerate scale development using an R package called AIGENIE (Automatic Item Generation with Network-Integrated Evaluation; Russell-Lasalandra et al., 2024). AIGENIE is free for non-commercial purposes, completely open-source, and available on R-universe at https://laralee.r-universe.dev/AIGENIE.

This package serves an emerging research domain called Generative Psychometrics (Garrido et al., 2025; Russell-Lasalandra et al., 2024; Russell-Lasalandra & Golino, 2026), in which language itself is treated as a resource that can be evaluated and assessed algorithmically. For example, Garrido et al. (2025) used LLM-generated item pools and their embeddings to compare Principal Component Analysis (PCA; Pearson, 1901) with network-based methods for recovering dimensional structure, demonstrating that psychometric questions can be investigated entirely through generated text without collecting human responses, and demonstrating the superiority of Exploratory Graph Analysis (EGA; H. F. Golino & Epskamp, 2017) with item filtering (Russell-Lasalandra et al., 2024) when compared to PCA. In AIGENIE, item content produced by LLMs (or by humans) is subjected to rigorous quantitative evaluation prior to any human data collection, enabling the development and structural validation of entire scales in silico. When the in silico structural organization of items is compared to the structural organization of variables obtained from nationally representative samples, the best-performing LLM models achieve a perfect match (Russell-Lasalandra et al., 2024). That is, in silico structural validity can be equivalent to the structural validity recovered from human response data. This approach substantially reduces the resource barriers that have long characterized measurement development.
The AIGENIE methodology combines optional item generation with network psychometric techniques for structural validation. The package uses LLMs to generate large candidate item pools (or accepts an existing pool of human-authored items), embeds them as high-dimensional vectors via LLM embeddings, and then applies a multi-step psychometric pipeline to identify and remove redundant or unstable items. This pipeline includes Exploratory Graph Analysis (EGA; H. F. Golino & Epskamp, 2017) for estimating dimensionality, Unique Variable Analysis (UVA; Christensen et al., 2023) for detecting item redundancy, and bootstrap EGA (bootEGA; Christensen & Golino, 2021) for evaluating the stability of items and dimensions within the EGA framework. The resulting item pool is a concise, structurally validated set ready for empirical testing. The efficacy of AIGENIE has been demonstrated through multiple large-scale Monte Carlo simulations across several LLMs and temperature settings, with results showing consistent improvements in structural validity across all conditions (Russell-Lasalandra et al., 2024; Russell-Lasalandra & Golino, 2026).

The six steps of the AI-GENIE pipeline are as follows (see Figure 1):

• Step 0: Generate or Write Your Initial Item Pool. The item reduction process must begin with a sizable item pool. These initial items can either be generated seamlessly within the package using an LLM or directly supplied by the user.
• Step 1: Embed items. Each item is transformed into a high-dimensional numeric vector (an embedding) using an encoder LLM.
• Step 2: Assess the initial item pool. An EGA model is run on the initial item pool to estimate its dimensional structure before any reduction takes place. The detected communities (clusters of items) are compared to the known, intended structure using Normalized Mutual Information (NMI; Danon et al., 2005).
NMI can be thought of as an accuracy metric; in AIGENIE, NMI is displayed as a percentage value on a scale from 0–100%, where 0% indicates complete dissimilarity and 100% indicates perfect community detection. Higher NMI values indicate that the clusters within the EGA network better match the known item communities. This step establishes a baseline measure of structural validity.

Figure 1. The six steps of the AI-GENIE item pool reduction pipeline. Step 0 (which occurs before item reduction) is to obtain the initial item pool (either by generating items using an LLM or manually writing them). Step 1 is to generate the item embeddings using an LLM. Steps 2–6 involve using network psychometric techniques to whittle down the item pool algorithmically.

• Step 3: Remove redundant items. UVA is used iteratively to detect and remove items with excessive semantic overlap. UVA identifies redundant item pairs or sets based on weighted topological overlap (wTO; Zhang & Horvath, 2005) within the network, retaining only the most unique representative from each redundant cluster. This step repeats until no further redundancies are found.
• Step 4: Select sparse or full embeddings. An EGA model is run on the full embedding matrix and on a sparsified version of the matrix (in which only the most informative embedding dimensions are retained). The embedding type that corresponds to the EGA network with the best NMI is retained for all subsequent steps.
• Step 5: Find the most stable items. BootEGA is used to assess the structural stability of each item. In this step, 100 new embedding matrices are generated by drawing from a multivariate normal distribution parameterized by the original embedding matrix, and then EGA is applied to each one.
If an item is consistently assigned to the same dimension across the 100 resamples, it is considered stable; items that frequently shift between communities are considered unstable and are removed. This step repeats until all remaining items demonstrate high stability.
• Step 6: Final pool is ready for review. A final EGA model is run on the reduced item pool, and a final NMI is calculated to assess the quality of the community detection after item pool reduction. This final NMI can be compared to the baseline.

The AIGENIE R package is an open-source toolkit that integrates artificial intelligence and network psychometrics. By automating labor-intensive stages of scale development such as item writing, redundancy pruning, and structural validation, the package reduces the time and financial burden of creating new measurement tools. Users can build a scale from scratch or refine an existing item pool, offering a flexible entry point into the emerging field of Generative Psychometrics. The tutorial that follows is intended to make every step of this process accessible, regardless of the reader's prior experience with LLMs or network psychometric analysis.

2. Tutorial Overview

The present tutorial provides a comprehensive, step-by-step guide to using the AIGENIE R package. This tutorial is organized into six parts that progressively introduce the package's capabilities:

• Installation and Setup covers package installation from R-universe, Python environment configuration, and management of API-based (recommended for most researchers) and local LLMs.
• Understanding APIs (or Application Programming Interfaces) details the use of APIs to remotely access very powerful LLMs to generate items and embeddings.
• Text Generation introduces the text generation capabilities of the package through the chat() function, which allows users to interact with LLMs directly from R.
Additionally, this section covers hyperparameter tuning and defining system roles.
• Item Generation demonstrates standalone item generation using the AIGENIE() function's items.only mode, which generates items without running the reduction pipeline. This section details the difference between the recommended "in-built" prompt and the ability to write custom prompts.
• The AIGENIE() Function walks through the complete AI-GENIE pipeline from start to finish, first with a simple Big Five personality (John & Srivastava, 1999) example and then with a novel, more extended AI Anxiety construct (AIA; Wang & Wang, 2022) demonstration.
• The GENIE() Function provides users with detailed examples of how they can apply only the psychometric evaluation and reduction pipeline to user-supplied item pools. This capability is for researchers who already have a large set of items that need to be checked for redundancy and structurally validated.

Throughout this tutorial, two running examples are used. The first is the well-established Big Five personality model (John & Srivastava, 1999), used to illustrate basic functionality, as many readers will have familiarity with this personality framework. The second is AI Anxiety (AIA), used to illustrate the package's utility for developing scales for novel or underrepresented constructs. While AIA has received some growing research attention (Güven et al., 2024; Li & Huang, 2020; X. Liu & Liu, 2025; Wang & Wang, 2022), it nonetheless lacks the robust literature presence of the "Big Five." In other words, this construct will likely be poorly represented in, or entirely absent from, the training data of many LLMs.

3. Installation and Setup

Setting up the AIGENIE package requires several steps: installing the package and its R dependencies, configuring the Python backend (which the package uses to communicate with LLMs), and, optionally, installing support for fully local models (relevant only if your machine has enough compute to run LLMs; this is not necessary to use the package). Although AIGENIE is an R package, it relies on Python in the background to communicate with LLMs; many of the software libraries used to call LLM APIs and run local models (e.g., the llama-cpp-python inference engine) are developed and maintained primarily in Python. AIGENIE uses the reticulate R package (Ushey et al., 2026) to call Python functions directly from R, giving users access to the full capabilities of these libraries without ever needing to write or see Python code. To be clear, AIGENIE offers a seamless, entirely R-native experience. It does, however, leverage the mature Python infrastructure that underlies much of the available modern LLM tooling.

Note that, for this section, there are platform-specific instructions for macOS/Linux and Windows.

3.1 Set up the Python Virtual Environment

AIGENIE communicates with LLMs through a Python backend. This Python environment is accessed seamlessly from R via the reticulate package (Ushey et al., 2026). To manage the Python virtual environment, the package uses uv (Astral, 2024), a fast, Rust-based utility that the reticulate ecosystem relies on to create virtual environments. Once the uv utility is successfully installed, close R (if running) and fully restart your machine.

3.1.1 macOS/Linux Users

macOS users need Apple's Command Line Tools to compile certain dependencies (e.g., the uv utility).
If running macOS (not Linux), open the Terminal app and run the following command:

xcode-select --install

Note that Linux users should skip this particular command, as the necessary build tools are typically pre-installed or available through the system package manager. If already installed, a message beginning with "Command line tools are already installed" will appear. Otherwise, a different dialog will appear prompting you to install the tools. Follow the on-screen instructions.

Next, install the uv utility on your system. In the Terminal, run the following command:

curl -LsSf https://astral.sh/uv/install.sh | sh

If successful, you should see a message ending in "everything's installed!" It should not say "permission denied" anywhere in the output. If you see a permission error, run the following commands in your Terminal:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

Then, retry the curl -LsSf https://astral.sh/uv/install.sh | sh command.

3.1.2 Windows Users

If on a Windows computer, you will need to open the pre-installed PowerShell application to install uv. In a PowerShell window, run the following command:

irm https://astral.sh/uv/install.ps1 | iex

If successful, you should see a message ending in "everything's installed!" For further troubleshooting, see the uv installation documentation.

3.2 Install Package and Package Dependencies

Before downloading the AIGENIE package, all users should close R (if running) and fully restart their computer to ensure that R can locate the uv installation. We recommend installing all package dependencies explicitly before installing AIGENIE itself.
In an R script, run the following lines of code:

install.packages("reticulate")
install.packages("ggplot2")
install.packages("igraph")
install.packages("patchwork")
install.packages("tm")
install.packages("R.utils")
install.packages("jsonlite")
install.packages("EGAnet")

With the dependencies and uv utility in place, you can install the AIGENIE package. The package is available on R-universe, which provides pre-built binaries for all major operating systems. To install the package from R-universe, run the following:

# Install AIGENIE from R-universe
install.packages(
  "AIGENIE",
  repos = c("https://laralee.r-universe.dev", "https://cloud.r-project.org")
)

3.3 Confirm uv Installation

Once AIGENIE is installed, you can load the library and confirm the status of your uv installation:

library(AIGENIE)

check_installation <- python_env_info()
check_installation[["uv_available"]] # This should be TRUE

If uv_available returns FALSE despite a successful installation, R may not be able to find uv on your system's PATH. If running macOS, open the Terminal and run the following:

PATH="/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:$HOME/.local/bin:$HOME/.cargo/bin:$PATH"

If running Windows, open PowerShell and execute this command (replace with your computer Admin username):

[Environment]::SetEnvironmentVariable("PATH", $env:PATH + ";C:\Users\ \.local\bin", "User")

After applying either PATH fix, you must close your R script and restart your machine (not just the R session) for the changes to take effect.

3.3.1 Python Environment Setup

The first time you call any AIGENIE function that requires Python, the package will automatically create a dedicated Python virtual environment and install the necessary dependencies. This process typically takes 2–3 minutes on a standard internet connection and only needs to happen once. To set up the environment explicitly, run the ensure_aigenie_python() function.
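For readers who prefer to surface any backend problems up front rather than during a long item-generation session, the explicit setup and the earlier uv check can be combined in a short script. This is a minimal sketch using only functions already introduced in this section (ensure_aigenie_python() and python_env_info()):

```r
# Explicitly build the Python backend once, then confirm R can see it
library(AIGENIE)

ensure_aigenie_python()       # creates the virtual environment on first run

env_info <- python_env_info() # same check used above to confirm uv
env_info[["uv_available"]]    # should be TRUE after a successful setup
```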
You can also use this function to customize the installation, if necessary. To reinstall the Python environment from scratch if you suspect something is awry, use the reinstall_python_env() function.

3.4 (Optionally) Install Local Model Support

If running LLMs on your own machine instead of relying on API service providers, you must install additional local model support. This support is required for the local_AIGENIE(), local_GENIE(), and local_chat() functions.

However, running models locally is neither necessary nor recommended for most users. Remotely accessing LLMs from providers like OpenAI and Anthropic will ensure access to frontier models that are substantially more powerful than the ones that can feasibly run on a personal computer. Models that run comfortably on typical machines (typically 7–13 billion parameters) are considerably less capable, and their generated items will generally be of lower quality. Running larger, more capable models locally (e.g., 70 billion parameters or above) demands significant computational resources (e.g., a high-end GPU). Local models are best suited for situations where data privacy requirements prohibit sending item content to external servers.

On macOS and Linux, local model support can be installed directly from within R by running the install_local_llm_support() function. For Apple Silicon Macs, this function automatically compiles llama-cpp-python with Metal acceleration for GPU-accelerated inference. For users with NVIDIA GPUs, additionally run install_gpu_support() to install CUDA-enabled PyTorch for accelerated embedding generation. Windows users need one prerequisite before this function will work: the Visual Studio C++ Build Tools from Microsoft.

With local support installed, you can download GGUF-format models from HuggingFace using the get_local_llm() function.
For example, to download the Mistral 7B Instruct model, which is capable of text generation, run the following lines of code:

# The function returns the path where the model was saved
model_path <- get_local_llm(
  repo_id = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
  filename = "mistral-7b-instruct-v0.2.Q4_K_M.gguf"
)

The message "model downloaded successfully" will appear in the console to indicate that the model has been downloaded to your machine. You can also verify the install by running check_local_llm_setup(model_path).

4. Understanding APIs

An Application Programming Interface (API) is a set of protocols that allows machines to communicate (see Auger & Saroyan, 2024). When AIGENIE generates items or embeddings using a cloud-based model, it does so by sending a request to the model provider's API. For example, AIGENIE could ask OpenAI's servers to run GPT-4o on a given prompt. The API then returns the model output to your computer. The user never interacts with the API directly; the package handles all communication implicitly behind the scenes.

To authenticate these requests, API providers require an API key: a unique string of characters that functions simultaneously as a password and an ID. When AIGENIE sends a request to, say, OpenAI's servers, it includes your API key so that the server can verify your identity, check that you have permission to use the requested model, and log the usage for billing purposes.

4.1 Best Practices for Using API Keys

There are several important practices to keep in mind when working with API keys.

1. Save your key immediately, since providers will only display your key once. If you navigate away without copying it, you will need to generate a new one.
2. If you lose your key or suspect it may have been stolen, you can easily create a new key. Just remember to revoke (delete) your old key.
3. Never share your key or include it in code that others can see (e.g., public GitHub repositories). Anyone with your key can make requests that will be billed to your account.
4. Set a spending limit if possible. A spending cap protects against unexpected charges in the event that a key is compromised.
5. Be aware of rate limits. Each provider imposes limits on how frequently you can send requests within a given time window, which is tied to your API key. If you exceed these limits, the API will temporarily reject your requests.
6. Note that API usage is tracked in tokens, not words or characters. A token is a short word or word fragment. As a rule of thumb, one token is equivalent to roughly three-quarters of a word (OpenAI, 2024b). So, one million tokens is equivalent to about 750,000 words. Providers typically charge separately for input tokens (the prompt you send) and output tokens (the text the model generates), with output tokens being priced higher.

4.2 Supported API Providers

The AIGENIE package supports API keys from five providers.

• OpenAI provides access to the GPT-family models (e.g., GPT-4o, GPT-5.1) for text generation. OpenAI also has excellent embedding models (recommended). The default embedding model in the package is their text-embedding-3-small model. A valid payment method is required to use an OpenAI API key, though costs for typical AIGENIE usage are extremely minimal (think cents, not dollars).
• Anthropic provides access to the Claude-family models for text generation. Anthropic requires that users prepay for API credits (minimum $5). By default, Anthropic will not overcharge you: once your prepaid credits are exhausted, requests will stop until you purchase more. However, $5 should be more than plenty for typical AIGENIE usage.
• Groq provides access to several open-source models (e.g., Llama, Mixtral, Gemma) for text generation.
Groq created the Language Processing Unit (LPU; Groq, n.d.), a hardware technology that enables extremely fast text generation. In fact, text generation is so efficient that low to moderate usage is completely free.
• HuggingFace is an open-source repository hosting thousands of LLMs for text generation and embedding. On HuggingFace, API keys are called tokens (not to be confused with the tokens that represent small words or word fragments).
• Jina AI provides access to open-source embedding models. The free tier covers the first 10 million tokens.

4.3 Discovering Available Models

To see which models you have access to based on your API keys, use the list_available_models() function. For example, say you have valid OpenAI, Groq, and Anthropic API keys. The following code chunk could be used to determine which models your keys give you access to:

# List all available models across all providers
list_available_models(
  openai.API = "your-key",    # ADD A VALID KEY!
  groq.API = "your-key",      # ADD A VALID KEY!
  anthropic.API = "your-key"  # ADD A VALID KEY!
)

# Filter by provider
list_available_models(provider = "groq", # shows only Groq models
                      groq.API = groq.API)

# Filter by type (text generation models vs. embedding models)
list_available_models(type = "chat", # text models available
                      openai.API = openai.API,
                      groq.API = groq.API,
                      anthropic.API = anthropic.API)

list_available_models(type = "embedding", # embedding models available
                      openai.API = openai.API,
                      groq.API = groq.API,
                      anthropic.API = anthropic.API)

5. Text Generation

Before diving into how to run the reduction pipeline on AI-generated (or human-written) items, we will first outline simple text generation. The chat() functions provide a straightforward interface for sending prompts to language models and receiving responses. These functions are useful for exploratory work or prompt testing.
Throughout this section, the chat() function is used; however, local_chat() could just as easily have been used to demonstrate the same ideas with a locally installed model.

As a simple example, let's say we wanted to see what kind of items an LLM would produce given minimal instructions, to establish a sort of baseline. More specifically, we want the model to generate items that target conscientiousness from the Big Five model of personality. A bare-bones command might be "generate 5 items measuring conscientiousness for a personality scale." The following code chunk and associated output show GPT-4o's response to this prompt:

# Basic usage with OpenAI
response <- chat(
  prompts = "Generate 5 items measuring conscientiousness for a personality scale.",
  model = "gpt-4o",
  openai.API = "REDACTED" # the API key was removed
)

# See the response
response$response

[1] "To measure conscientiousness on a personality scale, you might consider items that assess traits such as organization, dependability, diligence, and attention to detail. Here are five example items:\n\n1. I plan my tasks carefully and stick to my schedule to ensure everything gets done on time.\n2. I pay close attention to details and make few mistakes in my work.\n3. I am always prepared and rarely find myself scrambling at the last minute.\n4. I follow through on my commitments and can be relied upon to meet deadlines.\n5. I keep my personal and work spaces organized and tidy.\n\nEach item can be rated using a Likert scale, such as 1 (Strongly Disagree) to 5 (Strongly Agree), to assess the level of conscientiousness in individuals."

5.1 Top-p and Temperature

Two parameters that appear throughout the package, temperature and top-p, influence how an LLM selects its next token during text generation. They directly affect the diversity and creativity of the items the model produces. In AIGENIE, both parameters default to 1.0, meaning no additional constraints are imposed beyond the model's own learned distribution. In this tutorial, we will demonstrate how output is affected by changing both the temperature and top-p from their defaults. However, whenever a change is made to one of the two parameters, the other is left unchanged. In practice, it is common to adjust one parameter while leaving the other at its default rather than tuning both simultaneously (OpenAI, 2024a).

5.1.1 Temperature

LLMs generate text one token at a time. At each step, the model assigns a probability to every possible next token in its vocabulary. A model's temperature value rescales this probability distribution before a token is sampled. Most models have a default temperature of 1.0; at this temperature, the model samples directly from its learned probabilities. Lowering the temperature sharpens the distribution, making high-probability tokens even more likely to be chosen. Doing so produces more predictable, repetitive, and less complex output. Raising the temperature flattens the distribution, giving lower-probability tokens a better chance of being selected and producing more varied (but potentially less coherent) output.
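The sharpening and flattening just described can be seen numerically. Assuming the standard softmax formulation, temperature divides each token's logit before the probabilities are normalized; the three toy logits below are invented purely for illustration:

```r
# Toy demonstration of temperature rescaling (the logits are made up)
softmax_with_temp <- function(logits, temperature = 1.0) {
  scaled <- logits / temperature
  exp(scaled) / sum(exp(scaled))
}

logits <- c(token_a = 4, token_b = 2, token_c = 1)

round(softmax_with_temp(logits, 0.5), 3) # sharpened: the top token dominates
round(softmax_with_temp(logits, 1.0), 3) # the learned distribution, unchanged
round(softmax_with_temp(logits, 1.5), 3) # flattened: rarer tokens gain probability
```

Running this sketch shows the most likely token's probability rising toward 1 at low temperature and the distribution spreading across all three tokens at high temperature, which is exactly the behavior that makes high-temperature output more varied.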
The effect of temperature on output quality (if there even is an effect; Patel et al., 2024) is not straightforward and depends on the context as well as the specific model (Renze, 2024; Zhu et al., 2024). Temperature can be thought of as a "creativity knob," where tasks that require higher creativity may benefit from higher temperatures (though even this relationship to creativity is not always a given; Peeperkorn et al., 2024). For item generation within AIGENIE, simulation results (Russell-Lasalandra & Golino, 2026) showed that AI-GENIE reliably improved structural validity across all temperature settings (0.5, 1.0, and 1.5), though the relationship between temperature and item quality was not always straightforward and depended on the model used.

In the example above, where GPT-4o was asked to generate 5 items measuring conscientiousness for a personality scale, the temperature was left at the default of 1. However, we can modify the optional temperature argument to see how temperature tuning influences the output. Let's increase the temperature to 1.5 and observe its effects:

# Generate a response to the question at a higher temperature
response_high_temp <- chat(
  prompts = "Generate 5 items measuring conscientiousness for a personality scale.",
  model = "gpt-4o",
  openai.API = "REDACTED", # API key removed
  temperature = 1.5 # Increased to a HIGHER temp
)

# View the results
response_high_temp$response

[1] "Sure, here are five items designed to measure conscientiousness for a personality scale:\n\n1. **Dutifulness**\n - I consistently meet my obligations even if they feel inconvenient or challenging.\n\n2. **Orderliness**\n - I prefer to have a well-organized environment and consistently maintain order in my daily tasks and workspace.\n\n3. **Dependability**\n - People can count on me to follow through on commitments without needing reminders.\n\n4. **Attention to Detail**\n - I pay close attention to detail and take care to ensure accuracy in everything I do.\n\n5. **Self-Discipline**\n - I ensure to finish tasks I start, even if they are dull, rather than allowing distractions to undermine my productivity.\n\nThese items assess various aspects of conscientiousness, such as dutifulness, orderliness, dependability, meticulousness, and self-discipline."

In general, the items produced by the higher-temperature GPT-4o model are more lexically complex and verbose than the items produced at the default temperature. For example, the item "I keep my personal and work spaces organized and tidy" came from the default-temperature model, whereas the item "I prefer to have a well-organized environment and consistently maintain order in my daily tasks and workspace" came from the higher-temperature model.
We can also decrease the temperature argument to a value below 1 to investigate how a lower temperature setting may impact the output:

# Generate a response to the question at a lower temperature
response_low_temp <- chat(
  prompts = "Generate 5 items measuring conscientiousness for a personality scale.",
  model = "gpt-4o",
  openai.API = "REDACTED", # API key removed
  temperature = 0.5 # Decreased to a LOWER temp
)

# View the results
response_low_temp$response

[1] Certainly! Here are five items designed to measure conscientiousness for a personality scale:\n\n1. **I am meticulous and pay attention to details in my work and personal tasks.**\n - Strongly Disagree\n - Disagree\n - Neutral\n - Agree\n - Strongly Agree\n\n2. **I often plan my activities in advance and follow a structured schedule.**\n - Strongly Disagree\n - Disagree\n - Neutral\n - Agree\n - Strongly Agree\n\n3. **I am reliable and can be counted on to fulfill my commitments and responsibilities.**\n - Strongly Disagree\n - Disagree\n - Neutral\n - Agree\n - Strongly Agree\n\n4. **I prefer to keep my living and working spaces organized and tidy.**\n - Strongly Disagree\n - Disagree\n - Neutral\n - Agree\n - Strongly Agree\n\n5. **I am persistent and work diligently to achieve my goals, even when faced with obstacles.**\n - Strongly Disagree\n - Disagree\n - Neutral\n - Agree\n - Strongly Agree\n\nThese items are designed to assess various aspects of conscientiousness, such as attention to detail, organization, reliability, planning, and persistence.

In this particular example, the items from the lower-temperature model are fairly similar to those of the default-temperature model. In fact, the item "I prefer to keep my living and working spaces organized and tidy" from the lower-temperature model is almost identical to the item "I keep my personal and work spaces organized and tidy" from the default-temperature model. However, the effects of temperature tuning may not be obvious in a single instance; increasing the reps argument (which controls the number of times the model responds to the provided prompt) from the default of 1 to 10 or 50 might reveal systematic differences that are not immediately apparent in a single output. That said, model hyperparameter tuning is far less important than model selection (Evstafev, 2025). That is, determining whether GPT-4o is the best LLM for the task is paramount; only then should one consider which temperature setting of GPT-4o to use.

5.1.2 Top p

Top p (also called nucleus sampling) truncates the next-token probability distribution rather than rescaling it (Holtzman et al., 2019). The model considers only the smallest set of tokens whose cumulative probability exceeds the threshold p, and samples from that subset. For example, a top p of 0.9 means the model considers only the most probable tokens that together account for 90% of the probability mass. The remaining 10% of unlikely tokens are completely discarded.
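Nucleus sampling is easy to sketch in a few lines of R. The following is illustrative only (it is not AIGENIE's implementation), and the probabilities are made-up values for a toy five-token vocabulary:

```r
# Toy next-token probabilities (hypothetical values for illustration only)
probs <- c(the = 0.50, a = 0.25, an = 0.15, cat = 0.06, zebra = 0.04)

# Keep the smallest set of most-probable tokens whose cumulative
# probability reaches p, then renormalize the surviving mass
top_p_filter <- function(probs, p = 1.0) {
  sorted <- sort(probs, decreasing = TRUE)
  # small tolerance guards against floating-point drift in cumsum()
  cutoff <- which(cumsum(sorted) >= p - 1e-9)[1]
  kept <- sorted[seq_len(cutoff)]
  kept / sum(kept)
}

top_p_filter(probs, p = 0.9)  # keeps "the", "a", "an"; drops the 10% tail
top_p_filter(probs, p = 1.0)  # default: the entire vocabulary survives
```

Note the contrast with temperature: temperature reshapes every token's probability, whereas top p leaves the kept probabilities' relative sizes intact and simply removes the tail.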
A top p less than the default of 1 restricts the model to a narrower set of high-probability tokens. Thus, the default setting of AIGENIE includes the LLM's entire token vocabulary and effectively disables nucleus sampling. Consider the same example from above: we want the GPT-4o model to generate items that target conscientiousness given a bare-bones prompt (i.e., generate 5 items measuring conscientiousness for a personality scale). Let's decrease the top p parameter to 0.5:

# Generate a response to the question at a lower top p
response_low_top_p <- chat(
  prompts = "Generate 5 items measuring conscientiousness for a personality scale.",
  model = "gpt-4o",
  openai.API = "REDACTED", # API key removed
  top.p = 0.5 # Decreased to a LOWER top p value
)

# View the results
response_low_top_p$response

[1] Certainly! Here are five items designed to measure conscientiousness for a personality scale:\n\n1. **Task Completion**: I always complete my tasks thoroughly and on time, even when they are challenging or tedious.\n\n2. **Organization**: I keep my workspace and personal areas organized and tidy, ensuring everything is in its proper place.\n\n3. **Attention to Detail**: I pay close attention to details and strive for accuracy in everything I do, avoiding careless mistakes.\n\n4. **Dependability**: Others can rely on me to fulfill my commitments and responsibilities consistently.\n\n5. **Goal-Oriented**: I set clear goals for myself and work diligently to achieve them, even when it requires sustained effort over time.\n\nThese items aim to capture various aspects of conscientiousness, such as reliability, organization, diligence, and attention to detail.

These items, when compared to those generated with the default top p setting, exhibit a noticeably more uniform sentence structure. Each item begins with an independent clause and ends with a clarifying subordinate phrase (e.g., "I set clear goals for myself and work diligently to achieve them, even when it requires sustained effort over time"). By contrast, when top p was left at the default value of 1.0, the syntactic structure of the generated items varied more naturally. For instance, the item "I keep my personal and work spaces organized and tidy" contains a single verb phrase, whereas "I plan my tasks carefully and stick to my schedule to ensure everything gets done on time" chains multiple verb phrases together.

5.2 Defining a System Role

Until this point in the tutorial, we have only experimented with changes to the user prompt. The user prompt, the type of prompt most people are readily familiar with, is the actual request or instruction. However, another type of input is a system role prompt. A system role is a set of background instructions that shapes how the model behaves (Chen et al., 2025). It functions like an identity or persona that the LLM should embody as it generates the output. For example, a system role might be something like "you are an expert psychometrician and test developer specializing in personality assessment". This instruction tells the model to approach item generation from the perspective of a domain expert.
This primes the model toward domain-consistent language and a professional tone that a generic prompt alone might not elicit (A. Liu et al., 2024). Research on the efficacy of persona prompting has shown that assigning a role to a model can improve the quality and coherence of its output, particularly for tasks that benefit from domain-specific framing (De Paoli, 2023; Jiang et al., 2024; Kong et al., 2024). However, in some contexts, the benefits tend to appear only when the assigned role is well aligned with the task domain (Zheng et al., 2024), and adding a system role may sometimes have unforeseen limitations (Hu et al., 2026). In the context of item generation, however, persona prompting will very likely have only a positive or negligible effect on output. To add a system role, do the following:

# Define a helpful model persona
system.role <- "You are an expert psychometrician who specializes in
personality measurement. You know how to write clear, concise items
that are robust."

# Generate a response to the question with a system role
response_system_role <- chat(
  prompts = "Generate 5 items measuring conscientiousness for a personality scale.",
  model = "gpt-4o",
  openai.API = "REDACTED", # API key removed
  system.role = system.role # System role prompt provided
)

# View the results
response_system_role$response

[1] Certainly! Here are five items designed to measure conscientiousness for a personality scale. Please rate each item on a scale from 1 (Strongly Disagree) to 5 (Strongly Agree):\n\n1. I am meticulous in my approach to organizing and completing tasks.\n2. I plan my activities in advance to ensure that I meet deadlines consistently.\n3. I pay close attention to details in both my personal and professional life.\n4. I often set goals for myself and work diligently to achieve them.\n5. I hold myself accountable for completing tasks to the best of my ability.

With a system role, the model delivers the items directly, using more precise, psychometrically conventional language (e.g., "I am meticulous in my approach to organizing and completing tasks" rather than "I pay close attention to details and make few mistakes in my work"). The system role shifts the model's posture from an educator explaining the task to a practitioner executing the task, producing output that is closer to the register expected of a finished assessment instrument.

6. Item Generation

One of the core capabilities of the AIGENIE package is automated item generation. Given a set of constructs and their attributes, the package prompts an LLM to generate novel candidate items. This section demonstrates how to generate items without running the full psychometric pipeline, using the items.only = TRUE flag within either the AIGENIE() function or its local equivalent, local_AIGENIE(). Generating items in isolation is useful for exploring what the model produces and refining prompts iteratively.

6.1 The item.attributes Parameter

The single most important parameter in the AIGENIE() function is item.attributes. This parameter is a named list in which each element represents an item type (i.e., a construct or dimension of interest), and the character vector within each element specifies the attributes. Attributes are the specific facets, themes, or content areas that the generated items should collectively cover.
Consider the five traits within the "Big Five" personality model (John and Srivastava, 1999): openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism. Each personality trait encompasses several behaviors; for example, someone who exhibits high levels of openness to experience may be very artistic, but that person could just as easily enjoy philosophical discussion. Being "creative" and "philosophical," therefore, describe two valid manifestations of the same personality trait. Building on this idea of defining trait manifestations for each of the five traits, we can create the following item.attributes object:

big5_attributes <- list(
  openness = c("creative", "perceptual", "curious", "philosophical"),
  conscientiousness = c("organized", "responsible", "disciplined", "prudent"),
  extraversion = c("friendly", "positive", "assertive", "energetic"),
  agreeableness = c("cooperative", "compassionate", "trustworthy", "humble"),
  neuroticism = c("anxious", "depressed", "insecure", "emotional")
)

Here, the five names of the list (openness, conscientiousness, etc.) are the item types, and the character vectors within each are the attributes. The model will be instructed to generate items that target each of these attributes, producing items across the full breadth of each construct. It should be noted that attributes do not need to correspond to formal subscales or established subdomains. They simply represent the range of content that the items, as a whole, should cover. A researcher developing a unidimensional scale should not think of attributes as separate subscales; they reflect the distinct thematic facets that a comprehensive set of a given type of item should span. Including attributes ensures that the LLM does not generate items that cluster around a single narrow aspect of the construct while neglecting others.
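Because item.attributes drives everything downstream, it can be worth sanity-checking the list before making a (potentially costly) API call. The check_item_attributes() helper below is hypothetical and not part of the AIGENIE package; it merely enforces the structural rules just described (a named list of character vectors, with at least two attributes per item type):

```r
# Hypothetical helper (NOT part of AIGENIE): validate an item.attributes list
check_item_attributes <- function(item.attributes) {
  stopifnot(
    is.list(item.attributes),             # must be a list...
    !is.null(names(item.attributes)),     # ...with names...
    all(nzchar(names(item.attributes)))   # ...that are all non-empty
  )
  for (type in names(item.attributes)) {
    attrs <- item.attributes[[type]]
    # each item type needs a character vector of at least two attributes
    if (!is.character(attrs) || length(attrs) < 2) {
      stop(sprintf("Item type '%s' needs a character vector of at least two attributes.",
                   type))
    }
  }
  invisible(TRUE)
}

# Passes silently: a named list with two or more attributes per type
check_item_attributes(list(
  openness = c("creative", "curious"),
  neuroticism = c("anxious", "depressed", "insecure")
))
```

Applied to the big5_attributes object above, this check would also pass silently.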
In AIGENIE, the model is instructed to produce items for each attribute, which results in a more content-diverse initial pool. Additionally, while the attributes in this tutorial are single words, attributes can just as well be richer, more verbose phrases if prudent. The number of attributes per item type throughout this tutorial is consistent (e.g., there are four attributes per OCEAN personality trait). However, the number of attributes per item type can vary (e.g., openness could have eight attributes whereas neuroticism could have only three), so long as there are at least two for any given item type.

6.2 Generating Items within the AIGENIE Function

Generating items using the AIGENIE() or local_AIGENIE() function offers two distinct modes for how prompts are constructed and sent to the model. The choice between them determines how much control the user has over the exact wording of the instructions the LLM receives.

6.2.1 The In-Built Prompt

In the built-in prompting mode (the default), the user provides as many descriptive components as possible and the package automatically assembles them into a complete, well-structured prompt behind the scenes (see Figure 2). These descriptive components are analogous to bricks; the AIGENIE() function arranges, aggregates, and assembles these "bricks" into a stable, coherent "structure." This "structure" is the primary user prompt passed to the LLM. Using the in-built prompt is recommended for most researchers because it incorporates the prompt engineering strategies shown to produce high-quality items in simulation studies (Russell-Lasalandra et al., 2024), and it avoids the risk of omitting critical prompt components that can degrade output quality. The descriptive components (or "bricks") that the user can provide in the built-in prompting mode are all optional, but users should provide as many as possible.
These components are as follows:

Figure 2. When using AIGENIE in the in-built prompt mode, the function takes the user-specified components (e.g., domain, prompt.notes, item.type.definitions, or scale.title) and uses them to build a strong prompt automatically.

• domain specifies the research domain (e.g., "personality measurement" or "child development").
• scale.title provides the name of the scale being developed.
• audience describes the scale's intended target population (e.g., "college-educated adults" or "children with ASD in second grade").
• item.type.definitions is a named list providing a brief definition of each item type, giving the model substantive context about the construct.
• response.options specifies the intended response options for the items (e.g., c("disagree", "neutral", "agree")).
• item.examples is a data frame containing high-quality example items to guide the model's generation style and format.
• prompt.notes allows the user to append custom instructions to the automatically constructed prompt without having to write the entire prompt from scratch (e.g., "ensure every item begins with the stem 'I am someone who'..." or "items should be very brief and contain no words that would exceed a fifth-grade vocabulary"). That is, the user can inject specific requirements or constraints while still benefiting from the package's built-in prompt engineering. If the built-in prompt handles most of what you need and only a small adjustment is required, prompt.notes is the right tool for the job.

Previously, we defined the big5_attributes object to demonstrate the item.attributes parameter. Now that the parameters pertaining to the in-built prompt have been defined, we can write code to generate items using the AIGENIE() function:

# First... define the important item.attributes object
big5_attributes <- list(
  openness = c("creative", "perceptual", "curious", "philosophical"),
  conscientiousness = c("organized", "responsible", "disciplined", "prudent"),
  extraversion = c("friendly", "positive", "assertive", "energetic"),
  agreeableness = c("cooperative", "compassionate", "trustworthy", "humble"),
  neuroticism = c("anxious", "depressed", "insecure", "emotional")
)

# Generate items using the built-in prompt
items_builtin <- AIGENIE(
  item.attributes = big5_attributes, # defined above
  openai.API = "REMOVED", # API Key REMOVED
  model = "gpt-4o",
  # Descriptive components for prompt construction
  domain = "personality measurement",
  scale.title = "Big Five Personality Inventory",
  audience = "college-educated adults in the United States",
  item.type.definitions = list(
    openness = "Openness reflects intellectual curiosity, aesthetic
      sensitivity, and a preference for novelty and variety.",
    conscientiousness = "Conscientiousness reflects a tendency toward
      self-discipline, goal-directed behavior, and organization.",
    extraversion = "Extraversion reflects sociability, assertiveness,
      and the tendency to seek stimulation in the company of others.",
    agreeableness = "Agreeableness reflects a tendency to be
      cooperative, compassionate, and trusting toward others.",
    neuroticism = "Neuroticism reflects emotional instability,
      including proneness to anxiety, sadness, and mood swings."
  ),
  response.options = c("strongly disagree", "disagree", "neutral",
                       "agree", "strongly agree"),
  prompt.notes = "All items should be written as first-person
    self-report statements beginning with 'I am someone who'.",
  # System role... the model's persona
  system.role = "You are an expert psychometrician and test developer
    specializing in personality assessment.",
  target.N = 8, # Generating only 8 items per item type
  items.only = TRUE # ONLY generating items in this example
)

# View some of the resulting items
items_builtin$statement[1:5]

[1] I am someone who often finds unconventional solutions to problems.
[2] I am someone who enjoys engaging in activities that allow me to express my imaginative ideas.
[3] I am someone who notices details in the environment that others might overlook.
[4] I am someone who is able to detect subtle differences in tone and mood during conversations.
[5] I am someone who seeks out new knowledge and experiences for the sake of learning.

6.2.2 Implicit Prompt Engineering within AIGENIE

The quality of AI-generated items depends heavily on how the LLM is prompted. As explained above, when no custom prompts are provided, AIGENIE() automatically constructs prompts using the information supplied through its parameters. Recent simulation work within the AI-GENIE framework has demonstrated that prompt engineering strategies can substantially influence the structural validity and redundancy of generated item pools, with effects that scale with model capability (Russell-Lasalandra and Golino, 2026). Therefore, this section notes which prompt engineering strategies are used implicitly in the in-built prompt.
The default prompt architecture incorporates several prompt engineering best practices:

• System role (persona prompting): a domain-specific expert persona is assigned to the model, built from the domain, scale.title, and audience parameters.
• Contextual instructions: the prompt includes the construct definition (from item.type.definitions), the target attributes, and the response format.
• Few-shot examples: if item.examples are provided, they are incorporated to anchor the model's output style.
• Adaptive generation: when adaptive = TRUE (the default), previously generated items are appended to the prompt to prevent repetition. A follow-up simulation study (Russell-Lasalandra and Golino, 2026) further demonstrated that the prompt engineering strategy of adaptive prompting (Lightman et al., 2023) can meaningfully shape the quality of AI-generated item pools, with effects that scale with model capability. Adaptive prompting is a strategy in which the LLM is shown a running list of everything it has already produced and explicitly instructed not to repeat or rephrase any of those earlier items. Left unconstrained, LLMs tend to regurgitate the same ideas over and over again. Adaptive prompting counteracts that tendency by making the model's output from previous iterations part of its input context.

6.2.3 The Custom Prompt

In the custom prompting mode, the user supplies fully written prompts via the main.prompts parameter. There must be exactly one prompt per item type. In this mode, the user has complete control over the exact instructions the model receives. This mode is best suited for researchers who want to incorporate specific prompt engineering strategies or exercise fine-grained control over every aspect of how the model is instructed. However, custom prompting is substantially more demanding than the built-in mode and is not generally recommended for most users.
When the package constructs prompts automatically, it includes several components that are easy to overlook when writing prompts from scratch. For instance, prompt writers must consider clear task framing and communication, the explicit mention of all of an item type's attributes, decisions on the number of items to generate per model instance, and diligent outlining of all important context the model needs. Omitting any one of these components can lead to items that are poorly formatted or off-target at best, and items that are unparseable or otherwise unusable for reduction analysis at worst. Thus, we recommend that most researchers use the built-in mode and only switch to custom prompting after gaining familiarity with the package and with prompt engineering more broadly. For researchers who do choose custom prompting, there are several requirements and best practices to follow. First, main.prompts must be a named list with one prompt per item type, and the names must match the names in item.attributes exactly. Second, each prompt must explicitly reference all of a given item type's attributes listed in the corresponding element of item.attributes. The package uses these attributes to parse and label the returned items; if an attribute is missing from the prompt, items targeting that facet will not be generated or will not be labeled correctly. Third, each prompt should be self-contained. The prompt should provide all the context the model needs for that particular item type, because each prompt is sent to the model independently. A well-constructed custom prompt typically includes the following components, in roughly this order:

• Task context: a good description of what is being generated and why.
• Contextual background: details like a strong definition of the target construct, who the scale is intended for, and all other pertinent information.
• Explicit generation instructions: the model must know how many items to generate and how they should be distributed across attributes (this component is mandatory).
• Attribute list: a list of the attributes, named exactly as they appear in item.attributes (this component is mandatory).
• Quality constraints: these constraints can include instructions to generate novel items, avoid existing measures, and use a specific item format.

Let's examine an example of a custom prompt. Recall that the big5_attributes object defined in previous examples listed attributes for each of the Big Five personality traits (e.g., "curious" and "philosophical" were named for openness, whereas "friendly" and "positive" were named for extraversion). The big5_attributes object was used to generate items using the in-built prompt, and we will use it again to guide our prompt construction for this custom prompting example. To achieve the same result using custom prompts, we must write a complete, self-contained prompt for each of the five item types. Every prompt must reference all attributes by name exactly as they appear in big5_attributes. Here's what these prompts might look like:

# Create the "main.prompts" object
custom_prompts <- list(
  openness = "You are generating novel items targeting the Big Five
    personality trait openness to experience. Openness to experience
    is a personality trait that describes how open-minded, creative,
    and imaginative a person is. Generate EXACTLY eight items total
    for openness to experience; generate two items per attribute of
    openness to experience. These attributes are as follows:
    1) creative, 2) perceptual, 3) curious, and 4) philosophical.
    Do NOT add or remove any attributes; use the attributes EXACTLY
    as provided. All items should be first-person self-report
    statements beginning with 'I am someone who'. Do NOT look for
    items that already exist in the literature; all items should be
    novel.
    Don't be afraid to push the bounds of the construct.",
  conscientiousness = "You are generating novel items targeting the
    Big Five personality trait conscientiousness. Conscientiousness
    is a personality trait that describes one's tendency toward
    self-discipline, goal-directed behavior, and organization.
    Generate EXACTLY eight items total for conscientiousness;
    generate two items per attribute of conscientiousness. These
    attributes are as follows: 1) organized, 2) responsible,
    3) disciplined, and 4) prudent. Do NOT add or remove any
    attributes; use the attributes EXACTLY as provided. All items
    should be first-person self-report statements beginning with
    'I am someone who'. Do NOT look for items that already exist in
    the literature; all items should be novel. Don't be afraid to
    push the bounds of the construct.",
  extraversion = "You are generating novel items targeting the Big
    Five personality trait extraversion. Extraversion is a
    personality trait that describes people who are more focused on
    the external world than their internal experience. Generate
    EXACTLY eight items total for extraversion; generate two items
    per attribute of extraversion. These attributes are as follows:
    1) friendly, 2) positive, 3) assertive, and 4) energetic. Do NOT
    add or remove any attributes; use the attributes EXACTLY as
    provided. All items should be first-person self-report statements
    beginning with 'I am someone who'. Do NOT look for items that
    already exist in the literature; all items should be novel. Don't
    be afraid to push the bounds of the construct.",
  agreeableness = "You are generating novel items targeting the Big
    Five personality trait agreeableness. Agreeableness is a
    personality trait that describes one's tendency to be
    cooperative, compassionate, and trusting toward others. Generate
    EXACTLY eight items total for agreeableness; generate two items
    per attribute of agreeableness.
    These attributes are as follows: 1) cooperative, 2) compassionate,
    3) trustworthy, and 4) humble. Do NOT add or remove any
    attributes; use the attributes EXACTLY as provided. All items
    should be first-person self-report statements beginning with
    'I am someone who'. Do NOT look for items that already exist in
    the literature; all items should be novel. Don't be afraid to
    push the bounds of the construct.",
  neuroticism = "You are generating novel items targeting the Big
    Five personality trait neuroticism. Neuroticism is a personality
    trait that describes one's tendency to experience negative
    emotions like anxiety, depression, irritability, anger, and
    self-consciousness. Generate EXACTLY eight items total for
    neuroticism; generate two items per attribute of neuroticism.
    These attributes are as follows: 1) anxious, 2) depressed,
    3) insecure, and 4) emotional. Do NOT add or remove any
    attributes; use the attributes EXACTLY as provided. All items
    should be first-person self-report statements beginning with
    'I am someone who'. Do NOT look for items that already exist in
    the literature; all items should be novel. Don't be afraid to
    push the bounds of the construct."
)

Notice that each prompt contains task context, construct definitions, attribute lists, quantity instructions, and quality constraints. Each prompt follows the same general template but must be individually tailored to the target construct (i.e., item type), and the attributes must be listed by name exactly as they appear in big5_attributes. If even one attribute is misspelled or omitted, the package will not be able to parse and label the resulting items correctly. Moreover, if the researcher later decides to change the item stem from "I am someone who" to, say, "I tend to," that change must be applied manually across all five prompts in the custom mode. In the built-in mode, it requires editing only the single prompt.notes string.
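Given these requirements, a quick programmatic check can catch mismatched names or missing attributes before any tokens are spent. The check_custom_prompts() helper below is hypothetical and not part of the AIGENIE package; it verifies that main.prompts and item.attributes share the same names and that every attribute appears verbatim in its prompt:

```r
# Hypothetical helper (NOT part of AIGENIE): cross-check custom prompts
# against the item.attributes list before sending anything to the model
check_custom_prompts <- function(main.prompts, item.attributes) {
  # Requirement 1: one prompt per item type, with names matching exactly
  stopifnot(setequal(names(main.prompts), names(item.attributes)))
  problems <- character(0)
  for (type in names(item.attributes)) {
    attrs <- item.attributes[[type]]
    # Requirement 2: every attribute must appear verbatim in its prompt
    found <- vapply(attrs, grepl, logical(1),
                    x = main.prompts[[type]], fixed = TRUE)
    if (!all(found)) {
      problems <- c(problems, sprintf(
        "'%s' prompt omits: %s", type, paste(attrs[!found], collapse = ", ")
      ))
    }
  }
  if (length(problems) > 0) stop(paste(problems, collapse = "\n"))
  invisible(TRUE)
}

# A minimal passing example: the prompt names every attribute verbatim
check_custom_prompts(
  main.prompts = list(openness = "Attributes: 1) creative, 2) curious."),
  item.attributes = list(openness = c("creative", "curious"))
)
```

Running it on the custom_prompts and big5_attributes objects defined above would likewise pass, since every attribute is listed by name in its corresponding prompt.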
With these prompts drafted, we can call the AIGENIE() function to generate items. Note that parameters like audience and response.options are not supplied here as they were when using the built-in prompt (the "bricks" are useless if you already have a sturdy "wall" built):

    # Generate items using the custom prompts
    items_custom <- AIGENIE(
      item.attributes = big5_attributes,  # defined previously
      openai.API = "REMOVED",             # API key removed
      model = "gpt-4o",
      main.prompts = custom_prompts,      # these are the custom prompts
      # Give the model the same persona as before
      system.role = "You are an expert psychometrician and test developer
        specializing in personality assessment.",
      target.N = 8,      # only generate 8 items per item type
      items.only = TRUE  # only return items... no reduction pipeline
    )

    # View the first 5 items generated
    items_custom$statement[1:5]

    [1] I am someone who finds novel ways to express my thoughts and ideas.
    [2] I am someone who enjoys transforming ordinary objects into something artistic.
    [3] I am someone who notices subtle details in my surroundings that others might miss.
    [4] I am someone who enjoys immersing myself in new sensory experiences, like tasting unfamiliar foods or listening to diverse music genres.
    [5] I am someone who feels a strong need to investigate and understand how things work.

7. The AIGENIE Function

The full AI-GENIE pipeline automates the entire workflow from item generation through structural validation.
A single call to the AIGENIE() function (or its local equivalent, local_AIGENIE()) generates items, embeds them, estimates the network structure, removes redundant items, assesses stability, and returns a validated item pool. This section walks through the complete pipeline using two examples: a simple Big Five demonstration and a more extended AI Anxiety application.

7.1 Understanding the Output of the AIGENIE Function

This section describes the default output that most researchers will see when they work with AIGENIE (i.e., the object returned when items.only, embeddings.only, run.overall, keep.org, and all.together are all set to their default value of FALSE). Variations introduced by each flag are described at the end of this section.

To help demonstrate the structure of the output, let's begin with a focused example generating items for the Big Five traits. Note that this code is identical to the example listed under text generation, but with the items.only flag removed, the target number of items increased to a more sizable count appropriate for reduction, and the model changed to a more capable one:

    # Generate items and run the reduction pipeline
    reduction_builtin <- AIGENIE(
      item.attributes = big5_attributes,  # defined previously
      openai.API = "REMOVED",             # API key removed
      model = "gpt-5.1",                  # using a more modern model
      # Descriptive components for prompt construction
      domain = "personality measurement",
      scale.title = "Big Five Personality Inventory",
      audience = "college-educated adults in the United States",
      item.type.definitions = list(
        openness = "Openness reflects intellectual curiosity, aesthetic
          sensitivity, and a preference for novelty and variety.",
        conscientiousness = "Conscientiousness reflects a tendency toward
          self-discipline, goal-directed behavior, and organization.",
        extraversion = "Extraversion reflects sociability, assertiveness,
          and the tendency to seek
          stimulation in the company of others.",
        agreeableness = "Agreeableness reflects a tendency to be cooperative,
          compassionate, and trusting toward others.",
        neuroticism = "Neuroticism reflects emotional instability, including
          proneness to anxiety, sadness, and mood swings."
      ),
      response.options = c("strongly disagree", "disagree", "neutral",
                           "agree", "strongly agree"),
      prompt.notes = "All items should be written as first-person self-report
        statements beginning with 'I am someone who'.",
      # System role... the model's persona
      system.role = "You are an expert psychometrician and test developer
        specializing in personality assessment.",
      target.N = 60  # Generating 60 items per item type
    )

7.1.1 Top-Level Structure

The default output is a named list with two top-level elements: item_type_level and overall:

    # See the top-level structure of the output
    names(reduction_builtin)

    [1] "item_type_level" "overall"

The item_type_level object is itself a named list containing one element per item type, named to match item.attributes. In our Big Five example, reduction_builtin$item_type_level contains five sublists corresponding to each of the five personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Each sublist holds the complete set of results from the reduction pipeline as applied to that item type in isolation.

The overall object is also a named list that aggregates the final items and embeddings across all item types into a single object. The final items remaining after the reduction analysis for items of all types are stored as a data frame in reduction_builtin$overall$final_items. The overall object also contains an element called embeddings: a list holding the sparsified and the full embedding matrices used in the reduction pipeline.
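The sparsification just mentioned zeroes out embedding values in the middle of the distribution, keeping only the most extreme entries (the package tests whether the full or the sparse matrix yields a higher NMI). A minimal base-R sketch of the idea, assuming the 95% cutoff described later in this section and a generic dimensions-by-items matrix emb (this is an illustration, not the package's internal code):

```r
# Sketch of embedding sparsification: zero out values in the middle 95%
# of the distribution, retaining only the most extreme 2.5% in each tail.
sparsify <- function(emb, keep = 0.05) {
  cuts <- quantile(emb, probs = c(keep / 2, 1 - keep / 2))
  emb[emb > cuts[1] & emb < cuts[2]] <- 0
  emb
}

emb <- matrix(rnorm(1536 * 60), nrow = 1536)  # dimensions x items
sparse_emb <- sparsify(emb)
mean(sparse_emb == 0)  # roughly 0.95 of entries are now zero
```

The intuition is that extreme embedding dimensions carry the most discriminating semantic signal, so removing the bulk of near-average values can sharpen the network structure.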
7.1.2 Results on the Item Type Level

Since items are evaluated in the reduction pipeline completely agnostic of items of other types, separate, independent results are available for each of the item types. In other words, conscientiousness items go through the reduction pipeline only with other conscientiousness items, neuroticism items only with other neuroticism items, and so on. Each per-type sublist within reduction_builtin$item_type_level contains thirteen elements. Therefore, in our example, there would be thirteen result elements pertaining to each of the five personality traits. These are the thirteen elements:

    # Pick a trait (openness) to see what results are available on the
    # item-type level in general
    names(reduction_builtin$item_type_level$openness)

    [1] "final_NMI"          "initial_NMI"        "embeddings"
    [4] "UVA"                "bootEGA"            "EGA.model_selected"
    [7] "final_items"        "final_EGA"          "initial_EGA"
    [10] "start_N"           "final_N"            "network_plot"
    [13] "stability_plot"

Here is a closer look at each of these thirteen elements:

• final_items is the most practically important element. It is a data.frame containing the items that survived the full reduction pipeline. Its columns are:
  – ID: A numeric identifier assigned during generation.
  – statement: The item's actual text (e.g., "I often lose myself in creative projects").
  – attribute: The attribute the item was generated to reflect (e.g., "creative").
  – type: The item type the item belongs to (e.g., "openness").
  – EGA_com: The community assignment from the final EGA network.
• start_N records the total number of items generated for this item type. In our example, reduction_builtin$item_type_level$agreeableness$start_N would return 64. Thus, there were 64 agreeableness items in the initial item pool before reduction.
• final_N records the total number of items retained for this item type after reduction. In our example, reduction_builtin$item_type_level$agreeableness$final_N would return 49. Thus, there were 49 agreeableness items in the final item pool after the reduction analysis.
• initial_NMI is a numeric value reporting the Normalized Mutual Information (NMI) between the network-detected community structure and the known attribute assignments before any reduction steps.
• final_NMI is a numeric value reporting the NMI between the network-detected community structure and the known attribute assignments after all reduction steps.
• UVA is a list containing results from the Unique Variable Analysis (UVA) redundancy reduction step. UVA identifies pairs or sets of items whose embeddings are so similar that retaining them all would introduce redundancy. The list contains:
  – n_removed: the total number of redundant items removed across all UVA sweeps.
  – n_sweeps: the number of iterative passes UVA required before no further redundancies were detected.
  – redundant_pairs: a data.frame logging every redundancy decision. Each row records the sweep in which the redundancy was detected, the items involved, which item was kept, and which item or items were removed.
• bootEGA is a list containing results from the bootstrap Exploratory Graph Analysis (bootEGA) step. After UVA removes semantically redundant items, bootEGA evaluates the structural stability of the remaining items by repeatedly resampling the embedding matrix and estimating the network structure. Items that frequently change community membership across bootstrap samples are pruned. The list contains:
  – initial_boot: the bootEGA object (from the EGAnet package; H. Golino & Christensen, 2025) estimated on the post-UVA item pool, before any stability-based pruning.
  – final_boot: the bootEGA object estimated after stability-based pruning.
  – n_removed: the number of items removed due to low structural stability across all bootEGA stability sweeps.
  – items_removed: a data.frame logging the specific items that were identified and pruned during the stability check.
  – initial_boot_with_redundancies: a bootEGA object estimated on the full, pre-UVA item pool. This object is useful for comparing the stability of the item pool before and after redundancy reduction.
• EGA.model_selected is a character string indicating which network estimation model was selected: the Triangulated Maximally Filtered Graph (TMFG; Massara et al., 2016) or the Extended Bayesian Information Criterion Graphical Least Absolute Shrinkage and Selection Operator (EBICglasso, or glasso; Foygel & Drton, 2010; Friedman et al., 2008). When EGA.model = NULL (the default), the package tests both models and selects whichever produces a higher NMI. If the user specifies a model via the EGA.model parameter, this field simply reflects that choice.
• initial_EGA is an EGA object (from the EGAnet package) representing the network structure of the item pool before the reduction pipeline.
• final_EGA is an EGA object representing the network structure after the reduction pipeline.
• embeddings is a list containing the embedding matrices used in the analysis. The list contains three elements:
  – full: the dense (full) embedding matrix for the final retained items. Rows are embedding dimensions, columns are items (column names correspond to item IDs).
  – sparse: the sparsified embedding matrix for the final retained items. Sparsification zeroes out values in the middle 95% of the distribution, retaining only the most extreme embedding entries.
  – selected: names the embedding type (either full or sparse) used in the analysis. The package tests both and selects whichever yields a higher NMI.
• network_plot stores the ggplot (Wickham, 2016) / patchwork (Pedersen, 2025) object displaying a side-by-side comparison of the EGA networks before and after reduction, with the initial network on the left and the final network on the right. NMI values are annotated on each panel. An example of this network plot for agreeableness items is shown in Figure 3.
• stability_plot is a ggplot/patchwork object comparing item stability before (the plot on the left) and after (the plot on the right) reduction. Item stability refers to how consistently each item is assigned to the same community across bootstrap samples. Higher stability values (closer to 1) indicate that an item reliably belongs to its assigned dimension. An example of such a plot for agreeableness items is shown in Figure 4.

Figure 3. The EGA network before item reduction (on the left) compared to the network after item reduction (on the right) for agreeableness items.

Figure 4. The stability of the EGA network before reduction (left) versus after reduction (right). The stability plot shows a massive improvement in stability post-reduction. Additionally, the NMI has increased by about 4%.

7.1.3 Output Variations by Flag

The aforementioned output structure occurs under default settings and is likely sufficient for many researchers. However, this output structure may change if any of these four flags are changed from the default of FALSE to TRUE: items.only, embeddings.only, keep.org, and run.overall. Additionally, the output changes when all.together = TRUE, but in a more fundamental way; this flag is covered in Section 7.1.4.

As discussed previously, when items.only = TRUE, the function skips embedding, network analysis, and reduction entirely. It returns a simple data.frame with four columns (ID, statement, type, and attribute).
This data frame contains the raw generated item pool. This is useful when the researcher wants to inspect or manually curate items before running the psychometric pipeline, or when items will be embedded externally and passed to GENIE().

When embeddings.only = TRUE, the function generates items and computes embeddings but skips the psychometric reduction pipeline. It returns a named list with two elements: embeddings (the embedding matrix) and items (the items data.frame described above).

When keep.org = TRUE, the top-level structure remains the same (item_type_level and overall), but each per-type sublist gains an initial_items field. This field contains a data frame of all items generated before reduction. The embeddings sublists also gain full_org and sparse_org matrices corresponding to the full pre-reduction item pool. The overall element similarly gains initial_items, full_org, and sparse_org.

When run.overall = TRUE, the item_type_level results are unchanged, but the overall element becomes a full analysis object rather than a simple aggregation. It includes its own final_NMI, initial_NMI, EGA.model_selected, final_EGA, initial_EGA, start_N, final_N, and network_plot. Critically, this overall analysis does not perform additional reduction; it evaluates the combined post-reduction item pool as a whole, which is useful for assessing cross-trait dimensionality.

7.1.4 The all.together Flag

By default, the reduction pipeline processes each item type independently. In our example, for instance, openness items are embedded, analyzed, and pruned without any knowledge of the conscientiousness items, and vice versa. This is likely the most appropriate strategy for most researchers. Setting all.together = TRUE changes this behavior fundamentally.
Instead of running separate pipelines for each item type, the function pools every generated item into a single batch and runs the full reduction pipeline on the entire pool simultaneously. This means that UVA can now detect and remove redundancies that span item types (e.g., a "warmth" item under agreeableness that is semantically indistinguishable from a "friendliness" item under extraversion), and EGA estimates a single network structure across all items at once.

Internally, when all.together = TRUE, the package reassigns each item's attribute to a concatenation of its original type and attribute (e.g., "openness creative", "neuroticism anxious") and collapses all items into a single nominal type. The pipeline then proceeds as though there were only one item type, and NMI is computed against this concatenated attribute structure.

The output structure also changes. Rather than the two-level item_type_level/overall organization, the function returns a flat named list containing: final_NMI, initial_NMI, embeddings, UVA, bootEGA, EGA.model_selected, final_items, final_EGA, initial_EGA, start_N, final_N, network_plot, and stability_plot. These are the same elements described in the per-type breakdown above, but applied to the combined pool.

When the boundaries between item types are theoretically fuzzy and the researcher suspects that constructs may share substantial semantic overlap, altering this flag may be a good exploratory step. Additionally, when the researcher wants to let the network structure emerge entirely from the data rather than imposing a type-level separation a priori, changing this flag may prove fruitful. Note that all.together is ignored when only one item type is present, since there is no distinction between per-type and pooled reduction in that case.
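To make the pooled analysis concrete, a call with this flag might look as follows (a sketch only: the parameters mirror the earlier Big Five example, the API key is a placeholder, and the printed attribute values are illustrative of the concatenated type-attribute format described above):

```r
# Run the reduction pipeline on all items pooled together, so UVA can
# remove redundancies that span item types (sketch; key is a placeholder)
pooled_results <- AIGENIE(
  item.attributes = big5_attributes,
  openai.API = "YOUR-KEY-HERE",
  model = "gpt-5.1",
  target.N = 60,
  all.together = TRUE  # pool every item type into one reduction pipeline
)

# Flat output: no item_type_level / overall split
names(pooled_results)

# Attributes are concatenated type-attribute labels, e.g. "openness creative"
head(pooled_results$final_items$attribute)
```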
Lastly, it is worth explicitly contrasting all.together with run.overall, since their names are so similar. The run.overall flag does not change how reduction works: items are still pruned independently within each type. It simply adds a post-hoc EGA analysis of the combined final pool to assess cross-trait structure. With all.together, by contrast, the reduction itself operates on the combined pool, meaning that items can be removed because of cross-type redundancy. The two flags answer different questions. The run.overall flag asks: what does the cross-trait structure look like after per-type reduction? The all.together flag asks: what happens when we let the reduction pipeline see all items at once?

7.2 Running AIGENIE on an Emerging Construct: AI Anxiety

Now that the use and output of the AIGENIE() function have been demonstrated with the established Big Five personality model, we turn to a more realistic application: developing a scale to measure anxiety about AI. AI Anxiety (AIA; Wang & Wang, 2022) is a construct of growing research interest that currently lacks a well-established, comprehensive measurement instrument. Thus, AIA is an ideal candidate for demonstrating the package's value for developing scales where well-established instruments likely do not exist within the LLM training data.

Wang and Wang (2022) identified a four-factor structure based on distinct sources of anxiety that individuals experience in response to the development, deployment, and societal integration of AI technologies. These four factors are:

• Learning anxiety is cognitive overwhelm in the face of AI complexity, including perceiving one's knowledge as insufficient for AI demands and feeling daunted by the pace of AI advancement.
• Job replacement describes fear of professional obsolescence, including anxiety that AI will eliminate one's professional role, belief that one's skills can be automated, and uncertainty about career stability.
• Sociotechnical blindness is concern about societal AI impacts, including loss of autonomy to AI systems, concerns about AI-enabled privacy violations, and worry about over-reliance on AI technology.
• AI configuration describes unease regarding AI system opacity, including lacking confidence in AI decision-making, confusion about how AI systems operate, and doubting AI's reliability and safety.

The original AI-GENIE paper included a small-scale simulation using this construct (Russell-Lasalandra et al., 2024), and we follow the same operationalization here. When creating the item.attributes object for AIA, it is important to consider the central themes of each factor:

    # Defining the item.attributes object for AIA
    ai_anxiety_attributes <- list(
      learning_anxiety = c("overwhelmed", "inadequacy", "intimidated"),
      job_replacement = c("threatened", "replaceable", "insecure"),
      sociotechnical_blindness = c("powerless", "overly dependent", "surveilled"),
      ai_configuration = c("distrustful", "uncertain", "vulnerable")
    )

With the item.attributes object defined, only a few more elements are needed before a scale can be generated:

    # Provide detailed definitions for each factor
    ai_anxiety_definitions <- list(
      learning_anxiety = paste(
        "Learning anxiety refers to cognitive overwhelm in the face of AI complexity.",
        "It includes perceiving one's knowledge as insufficient for AI demands",
        "and feeling daunted by the pace of AI advancement."
      ),
      job_replacement = paste(
        "Job replacement anxiety refers to fear of professional obsolescence.",
        "It includes fearing that AI will eliminate one's professional role,",
        "believing one's skills can be automated, and experiencing uncertainty",
        "about career stability."
      ),
      sociotechnical_blindness = paste(
        "Sociotechnical blindness refers to concern about societal AI impacts.",
        "It includes loss of autonomy to AI systems, concerns about AI-enabled",
        "privacy violations, and worrying about over-reliance on AI technology."
      ),
      ai_configuration = paste(
        "AI configuration anxiety refers to anxiety about AI system opacity.",
        "It includes lacking confidence in AI decision-making, confusion about",
        "how AI systems operate, and doubting AI's reliability and safety."
      )
    )

    # Run the full pipeline
    ai_anxiety_results <- AIGENIE(
      item.attributes = ai_anxiety_attributes,
      openai.API = "REMOVED",  # API key removed
      model = "gpt-5.1",
      embedding.model = "text-embedding-3-small",
      domain = "technology-related psychological assessment",
      scale.title = "AI Anxiety Scale",
      audience = "adults who use or are exposed to AI technologies in daily life",
      item.type.definitions = ai_anxiety_definitions,
      response.options = c("strongly disagree", "disagree", "slightly disagree",
                           "slightly agree", "agree", "strongly agree"),
      target.N = 80
    )

The results from this single run were promising. The NMI consistently trended upward post-reduction (see Figures 5, 6, 7, and 8).

8. The GENIE Function

Not all researchers need AI-generated items; hence, the advent of GENIE. This function performs all the same reduction steps as AIGENIE without the LLM-generation step (that is, AIGENIE without the "AI"). The GENIE() function (or its local equivalent, local_GENIE()) applies the full network psychometric evaluation pipeline (embedding, EGA, UVA, bootEGA) to any user-supplied set of items, without generating any new content.

Figure 5. The EGA network before (left) vs. after (right) reduction for AI Learning Anxiety items. The NMI improved by 9.48%. The final NMI is 86.91%.

Figure 6. The EGA network before (left) vs. after (right) reduction for AI Job Replacement items. The NMI improved by 9.39%. The final NMI is 100%.

Figure 7.
The EGA network before (left) vs. after (right) reduction for AI Sociotechnical Blindness items. The NMI improved by 6.88%. The final NMI is 90.93%.

Figure 8. The EGA network before (left) vs. after (right) reduction for AI Configuration items. The NMI improved by 9.37%. The final NMI is 100%.

8.1 Using GENIE with AIA Items

The GENIE() function requires a data frame with four columns: statement (the item text), attribute (the sub-facet or characteristic the item targets), type (the overarching construct), and ID (a unique identifier). The structure of this data frame is identical to the data frame returned by AIGENIE when the items.only flag is set to TRUE.

To better explicate the use of the GENIE function, we will continue with the emerging construct AIA discussed in Section 7.2. First, an initial item pool needs to be specified and loaded into the environment. We have provided a miniature initial item pool. This item pool contains too few items (only 5 per item type) for a meaningful reduction analysis, but is nonetheless useful for understanding the expected structure of the required data frame:

    # Example structure showcasing how GENIE expects the items to be
    # formatted
    my_ai_anxiety_items <- data.frame(
      statement = c(
        # Learning anxiety items
        "I feel overwhelmed by how quickly AI technology is advancing",
        "I worry that my knowledge is not enough to keep up with AI",
        "The complexity of AI systems makes me feel inadequate",
        "I am intimidated by the amount I would need to learn about AI",
        "I feel daunted when I try to understand how AI works",
        # Job replacement items
        "I worry that AI will make my job obsolete",
        "The thought of AI replacing human workers makes me anxious",
        "I am concerned that AI will take over my career field",
        "I feel insecure about my professional future because of AI",
        "I fear that my skills will become irrelevant as AI improves",
        # Sociotechnical blindness items
        "I feel
        powerless in the face of AI-driven decisions that affect me",
        "I worry about becoming too dependent on AI technology",
        "It bothers me that AI can track and predict my behavior",
        "I worry about losing my privacy to AI-powered surveillance",
        "I am concerned about how much autonomy I am giving up to AI",
        # AI configuration items
        "I do not trust AI systems to make important decisions",
        "I am uncertain about how AI systems actually arrive at their answers",
        "It concerns me that I cannot verify whether AI output is reliable",
        "I feel vulnerable when I have to rely on AI I do not understand",
        "I doubt that AI systems are safe enough to be widely deployed"
      ),
      attribute = c(
        rep("overwhelmed", 2), rep("inadequacy", 2), "intimidated",
        rep("threatened", 2), rep("replaceable", 2), "insecure",
        "powerless", "overly dependent", rep("surveilled", 2), "powerless",
        "distrustful", rep("uncertain", 2), "vulnerable", "distrustful"
      ),
      type = c(
        rep("learning_anxiety", 5),
        rep("job_replacement", 5),
        rep("sociotechnical_blindness", 5),
        rep("ai_configuration", 5)
      ),
      ID = paste0("AI_", 1:20),
      stringsAsFactors = FALSE
    )

With items prepared, running the GENIE function is straightforward (a much larger dataset must be used in practice, so this code serves only as a reference and was not actually run):

    # Run the full GENIE pipeline on your items
    genie_results <- GENIE(
      items = my_ai_anxiety_items,  # ENSURE LARGE ENOUGH
      openai.API = "REDACTED"       # API key removed
    )

The output of the GENIE() function is the same as the output of the AIGENIE() function, so reviewing the GENIE results is straightforward, especially for anyone already familiar with the AIGENIE() function.
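Because GENIE() expects a specific data frame shape, a small pre-flight check can catch structural problems before the pipeline runs. The helper below is hypothetical (not part of the package), and the minimum-items-per-type threshold is purely illustrative:

```r
# Hypothetical pre-flight check for GENIE() input: confirm the four
# required columns exist, IDs are unique, and each type has enough items.
# The threshold of 20 items per type is illustrative, not a package rule.
check_genie_input <- function(items, min_per_type = 20) {
  required <- c("statement", "attribute", "type", "ID")
  stopifnot(all(required %in% names(items)),
            !anyDuplicated(items$ID))
  counts <- table(items$type)
  if (any(counts < min_per_type))
    warning("Some item types may be too small for a meaningful reduction: ",
            paste(names(counts)[counts < min_per_type], collapse = ", "))
  invisible(counts)
}

check_genie_input(my_ai_anxiety_items)  # warns: only 5 items per type here
```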
8.2 Using GENIE with Existing Embeddings

If you have already embedded your items (perhaps using a different tool or during a previous session), you can supply the embedding matrix directly to skip the embedding step:

    # Suppose you have a pre-computed embedding matrix
    # (rows = embedding dimensions, columns = items, colnames = item IDs)
    # my_embeddings <- [your embedding matrix here]

    genie_results_precomputed <- GENIE(
      items = my_ai_anxiety_items,
      embedding.matrix = my_embeddings  # Provide your own embeddings
    )

Skipping the embedding step is particularly useful for researchers who want to compare how different embedding models affect the structural analysis, or who want to embed items using a specialized model not natively supported by the package. However, the vast majority of users will likely want to use one of the native methods. Additionally, if embeddings are provided to the GENIE() function, no API calls (nor their keys) are required. Therefore, using GENIE() in this way may be ideal for replicating and sharing results.

9. Discussion

This tutorial has introduced the AIGENIE R package, a comprehensive tool for automated psychological scale development and structural validation. The package implements the AI-GENIE framework (Russell-Lasalandra et al., 2024), integrating LLM-based item generation with network psychometric methods to produce structurally validated item pools efficiently. This R package gives users an easy and practical way to dive into the emerging realm of generative psychometrics. We have demonstrated the package's core capabilities from basic installation and setup, through text generation and embedding, to the full psychometric pipeline.
Two running examples (the well-established Big Five personality model, John & Srivastava, 1999, and the emerging construct of AI Anxiety, Wang & Wang, 2022) illustrated the package's utility across varying levels of construct maturity in the literature.

The most immediate advantage of working within this framework is the substantial reduction in the time and cost required to produce a structurally valid initial item pool that has been checked for redundancy. Traditional scale development typically requires a team of content experts to draft items, multiple rounds of review and revision, and large-scale pilot testing before psychometric evaluation can even begin: a process that can span months or years and cost tens of thousands of dollars (Fenn et al., 2020). AIGENIE compresses much of this early-stage work into a single function call that generates, embeds, and psychometrically evaluates an item pool in minutes.

By packaging state-of-the-art prompt engineering, text embedding, and network psychometric methods into an accessible R interface, AIGENIE lowers the barrier to entry for scale construction. This could lead to a broader and more diverse set of psychological assessments being available, particularly for constructs that are culturally specific, newly emerging, or otherwise underserved by existing instruments.

Additionally, the deterministic and source-agnostic nature of the reduction pipeline has implications beyond AI-generated items. Any researcher with an existing item pool, whether expert-authored, adapted from prior instruments, or compiled from qualitative research, can submit it to the same embedding, UVA, and bootEGA pipeline for an objective structural evaluation. In terms of replicability, the AIGENIE approach thus surpasses any method that uses the output of LLMs to evaluate item quality (e.g., asking an LLM whether an item is robust).
Additionally, AIGENIE avoids the problem of using an LLM to evaluate an LLM, which raises a host of potential red flags.

9.1 Limitations and Considerations

Several limitations should be kept in mind when using this tool. First, while AI-GENIE provides structural validation in silico, it does not replace the need for empirical validation with human participants. The generated and reduced item pools should be considered strong candidates for empirical testing, not finished instruments. The original AI-GENIE paper demonstrated that in silico structural validity closely tracks empirical structural validity for the best-performing models (Russell-Lasalandra et al., 2024), but this correspondence should not be taken for granted in every application.

However, while AIGENIE does not eliminate the need for expert oversight, it does fundamentally change where human judgment is most needed. AIGENIE shifts the focus away from the laborious drafting of initial items and toward the higher-order tasks of reviewing, refining, and empirically validating a pre-curated pool.

Additionally, the quality of generated items depends heavily on the LLM used. Newer and more capable models generally produce better results, particularly when combined with advanced prompting strategies (Russell-Lasalandra & Golino, 2026). However, the LLM landscape is evolving rapidly: models that are state-of-the-art at the time of writing will almost certainly be superseded by the time this tutorial is read. The package's provider-agnostic architecture is designed to accommodate this evolution.

Finally, the AIGENIE package is under active development. The latest version of AIGENIE can always be found on R-universe at https://laralee.r-universe.dev/AIGENIE, and the full source code is publicly available for inspection and contribution. We welcome feedback, feature requests, and contributions from the research community.
Appendix 1. Quick Reference

Appendix 1.1 Core Functions

Table 1 provides a complete list of the functions available in the AIGENIE package.

Table 1. Functions available in the AIGENIE package.

  Function                      Description
  AIGENIE()                     Full pipeline: generate items, embed, EGA, UVA, bootEGA
  GENIE()                       Validation pipeline for user-provided items
  chat()                        Send prompts to supported LLMs
  list_available_models()       List models available across providers
  local_AIGENIE()               Run full pipeline with locally hosted LLMs
  local_GENIE()                 Validate items with local models
  local_chat()                  Chat with local models
  ensure_aigenie_python()       Configure Python environment
  python_env_info()             Show environment details
  reinstall_python_env()        Rebuild Python environment
  set_huggingface_token()       Configure Hugging Face access
  install_local_llm_support()   Install local LLM dependencies
  install_gpu_support()         Enable GPU acceleration
  check_local_llm_setup()       Verify local LLM configuration
  get_local_llm()               Download local LLM models

Appendix 1.2 Combining Providers

The AIGENIE package allows mixing providers for text generation and embedding. For example, a researcher could use Groq for fast item generation with an open-source model while relying on OpenAI for embeddings:

results <- AIGENIE(
  item.attributes = item_attributes,
  groq.API = "your-groq-key",
  openai.API = "your-openai-key",
  model = "llama-3.3-70b-versatile",
  embedding.model = "text-embedding-3-small",
  target.N = 60
)

Alternatively, Anthropic's Claude models can be paired with Jina AI embeddings:

results <- AIGENIE(
  item.attributes = item_attributes,
  anthropic.API = "your-anthropic-key",
  jina.API = "your-jina-key",
  model = "sonnet",
  embedding.model = "jina-embeddings-v3",
  target.N = 60
)

Appendix 1.3 Querying Available Models

Model availability changes as providers update their catalogs.
The current list of available models can be queried directly:

# Per-provider queries
list_available_models("openai", openai.API = openai_key)
list_available_models("groq", groq.API = groq_key)
list_available_models("anthropic", anthropic.API = anthropic_key)
list_available_models("jina")

# All providers at once
list_available_models(
  openai.API = openai_key,
  groq.API = groq_key,
  anthropic.API = anthropic_key
)

# Filter by type
list_available_models(openai.API = openai_key, type = "chat")
list_available_models(openai.API = openai_key, type = "embedding")

Appendix 1.4 Supported Chat Models

Table 2 lists commonly used chat models and their shorthand aliases supported by AIGENIE. Note that this is a reference snapshot; the list_available_models() function always returns the current catalog.

Table 2. Commonly used chat models and their shorthand aliases.

  Provider    Models                                   Aliases
  OpenAI      gpt-4o, gpt-4-turbo, gpt-5.1, gpt-5.2    gpt4o, chatgpt
  Anthropic   claude-opus-4, claude-opus-4.6           sonnet, opus, haiku, claude
  Groq        llama-3.3-70b-versatile, qwen-2.5-72b    llama3, mixtral, gemma, qwen

Appendix 1.5 Supported Embedding Models

Table 3 lists commonly used embedding models.

Table 3. Commonly used embedding models.

  Provider       Models
  OpenAI         text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
  Jina AI        jina-embeddings-v4, jina-embeddings-v3, jina-embeddings-v2-base-en
  Hugging Face   BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, thenlper/gte-small
  Local          sentence-transformers/all-MiniLM-L6-v2, bert-base-uncased

References

Astral. (2024). uv: An extremely fast Python package and project manager. https://github.com/astral-sh/uv
Asudani, D. S., Nagwani, N. K., & Singh, P. (2023). Impact of word embedding models on text analytics in deep learning environment: A review. Artificial Intelligence Review, 56(9), 10345–10425.
Auger, T., & Saroyan, E. (2024). Overview of the OpenAI APIs.
In Generative AI for web development: Building web applications powered by OpenAI APIs and Next.js (pp. 87–116). Springer.
Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, H. R., & Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: A primer. Frontiers in Public Health, 6, 149.
Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, U., et al. (2021). Extracting training data from large language models. 30th USENIX Security Symposium (USENIX Security 21), 2633–2650.
Chakrabarty, T., & Dhillon, P. S. (2026). Can good writing be generative? Expert-level AI writing emerges through fine-tuning on high-quality books.
Chen, B., Zhang, Z., Langréné, N., & Zhu, S. (2025). Unleashing the potential of prompt engineering in large language models: A comprehensive review. Patterns, 6(6), 101260.
Christensen, A. P., Garrido, L. E., & Golino, H. (2023). Unique variable analysis: A network psychometrics method to detect local dependence. Multivariate Behavioral Research, 58(6), 1165–1182. https://doi.org/10.1080/00273171.2023.2194606
Christensen, A. P., & Golino, H. (2021). Estimating the stability of psychological dimensions via bootstrap exploratory graph analysis: A Monte Carlo simulation and tutorial. Psych, 3(3), 479–500. https://doi.org/10.3390/psych3030032
Clark, L. A., & Watson, D. (2016). Constructing validity: Basic issues in objective scale development. In A. E. Kazdin (Ed.), Methodological issues and strategies in clinical research (4th ed., pp. 187–203). American Psychological Association. https://doi.org/10.1037/14805-012
Danon, L., Diaz-Guilera, A., Duch, J., & Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09), P09008.
De Paoli, S. (2023). Improved prompting and process for writing user personas with LLMs, using qualitative interviews: Capturing behaviour and personality traits of users. arXiv preprint.
Evstafev, E. (2025). The paradox of stochasticity: Limited creativity and computational decoupling in temperature-varied LLM outputs of structured fictional data. arXiv preprint.
Fenn, J., Tan, C.-S., & George, S. (2020). Development, validation and translation of psychological tests. BJPsych Advances, 26(5), 306–315.
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. Advances in Neural Information Processing Systems, 23.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.
Garrido, L. E., Russell-Lasalandra, L., & Golino, H. (2025). Estimating dimensional structure in generative psychometrics: Comparing PCA and network methods using large language model item embeddings. PsyArXiv Preprints.
Golino, H., & Christensen, A. (2025). EGAnet: Exploratory graph analysis – a framework for estimating the number of dimensions in multivariate data using network psychometrics [R package version 2.1.1]. https://doi.org/10.32614/CRAN.package.EGAnet
Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PLoS ONE, 12(6), e0174035. https://doi.org/10.1371/journal.pone.0174035
Götz, F. M., Maertens, R., Loomba, S., & Van Der Linden, S. (2024). Let the algorithm speak: How to use neural networks for automatic item generation in psychological scale development. Psychological Methods, 29(3), 494.
Groq. (n.d.). LPU architecture [Accessed: 2026-03-29]. https://groq.com/lpu-architecture
Güven, G. Ö., Yilmaz, Ş., & Inceoğlu, F. (2024). Determining medical students' anxiety and readiness levels about artificial intelligence. Heliyon, 10(4).
Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2019). The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
Hommel, B. E., Wollang, F.-J. M., Kotova, V., Zacher, H., & Schmukle, S. C. (2022). Transformer-based deep neural language modeling for construct-specific automatic item generation. Psychometrika, 87(2), 749–772.
Hu, Z., Rostami, M., & Thomason, J. (2026). Expert personas improve LLM alignment but damage accuracy: Bootstrapping intent-based persona routing with PRISM. arXiv preprint.
Jiang, H., Zhang, X., Cao, X., Breazeal, C., Roy, D., & Kabbara, J. (2024). PersonaLLM: Investigating the ability of large language models to express personality traits. Findings of the Association for Computational Linguistics: NAACL 2024, 3605–3627.
John, O. P., & Srivastava, S. (1999). The Big Five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin & O. P. John (Eds.), Handbook of personality: Theory and research (2nd ed., pp. 102–138). Guilford Press.
Keane, D., & McNaughton, R. B. (2026). Using generative AI to enhance psychometric scale development in market research. International Journal of Market Research, 68(2), 194–218.
Kong, A., Zhao, S., Chen, H., Li, Q., Qin, Y., Sun, R., Zhou, X., Wang, E., & Dong, X. (2024). Better zero-shot reasoning with role-play prompting. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 4099–4113.
Li, J., & Huang, J.-S. (2020). Dimensions of artificial intelligence anxiety based on the integrated fear acquisition theory. Technology in Society, 63, 101410.
Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., & Cobbe, K. (2023). Let's verify step by step.
Liu, A., Diab, M., & Fried, D. (2024). Evaluating large language model biases in persona-steered generation. Findings of the Association for Computational Linguistics: ACL 2024, 9832–9850.
Liu, X., & Liu, Y. (2025). Developing and validating a scale of artificial intelligence anxiety among Chinese EFL teachers. European Journal of Education, 60(1), e12902.
Martin Kowal, J., Hurley Bryant, K., Segall, D., & Kantrowitz, T. (2025). Harnessing generative AI for assessment item development: Comparing AI-generated and human-authored items. International Journal of Selection and Assessment, 33(3), e70021.
Massara, G. P., Di Matteo, T., & Aste, T. (2016). Network filtering for big data: Triangulated maximally filtered graph. Journal of Complex Networks, 5(2), 161–178.
OpenAI. (2024a). API reference: Chat completions. https://platform.openai.com/docs/api-reference/chat/create
OpenAI. (2024b). What are tokens and how to count them? https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
Patel, D., Timsina, P., Raut, G., Freeman, R., Levin, M. A., Nadkarni, G. N., Glicksberg, B. S., & Klang, E. (2024). Exploring temperature effects on large language models across various clinical tasks. medRxiv, 2024–07.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559–572.
Pedersen, T. L. (2025). patchwork: The composer of plots [R package version 1.3.2]. https://doi.org/10.32614/CRAN.package.patchwork
Peeperkorn, M., Kouwenhoven, T., Brown, D., & Jordanous, A. (2024). Is temperature the creativity parameter of large language models? arXiv preprint.
Renze, M. (2024). The effect of sampling temperature on problem solving in large language models. Findings of the Association for Computational Linguistics: EMNLP 2024, 7346–7356.
Russell-Lasalandra, L. L., Christensen, A. P., & Golino, H. (2024). Generative psychometrics via AI-GENIE: Automatic item generation and validation via network-integrated evaluation. PsyArXiv Preprints.
Russell-Lasalandra, L. L., & Golino, H. (2026). Prompt engineering for scale development in generative psychometrics. arXiv preprint arXiv:2603.15909.
Shin, D., Kwon, S. K., & Lee, Y. (2025). Examining the efficacy of generative artificial intelligence in item generation: Comparative analysis of human-developed and AI-generated reading tests. Education and Information Technologies, 30(16), 23981–24007.
Tao, C., Shen, T., Gao, S., Zhang, J., Li, Z., Hua, K., Hu, W., Tao, Z., & Ma, S. (2025). LLMs are also effective embedding models: An in-depth overview.
Tengler, K., & Brandhofer, G. (2025). Exploring the difference and quality of AI-generated versus human-written texts. Discover Education, 4(1), 113.
Ushey, K., Allaire, J., & Tang, Y. (2026). reticulate: Interface to 'Python' [R package version 1.45.0]. https://rstudio.github.io/reticulate/
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wang, Y.-Y., & Wang, Y.-S. (2022). Development and validation of an artificial intelligence anxiety scale: An initial application in predicting motivated learning behavior. Interactive Learning Environments, 30(4), 619–634.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
Zhang, B., Horvath, S., et al. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), 1128.
Zheng, M., Pei, J., Logeswaran, L., Lee, M., & Jurgens, D. (2024). When "a helpful assistant" is not really helpful: Personas in system prompts do not improve performances of large language models. Findings of the Association for Computational Linguistics: EMNLP 2024, 15126–15154.
Zhu, Y., Li, J., Li, G., Zhao, Y., Jin, Z., & Mei, H. (2024). Hot or cold? Adaptive temperature sampling for code generation with large language models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 437–445.