Grok in the Wild: Characterizing the Roles and Uses of Large Language Models on Social Media

Notice: This research summary and analysis were automatically generated using AI. For authoritative details, refer to the original arXiv source.

xAI’s large language model, Grok, is called by millions of people each week on the social media platform X. Prior work characterizing how large language models are used has focused on private, one-on-one interactions. Grok’s deployment on X represents a major departure from this setting, with interactions occurring in a public social space. In this paper, we systematically sample three months of interaction data to investigate how, when, and to what effect Grok is used on X. At the platform level, we find that Grok responds to 62% of requests, that the majority (51%) are in English, and that engagement is low, with half of Grok’s responses receiving 20 or fewer views after 48 hours. We also inductively build a taxonomy of 10 roles that LLMs play in mediating social interactions and use these roles to analyze 41,735 interactions with Grok on X. We find that Grok most often serves as an information provider but, in contrast to LLM use in private one-on-one settings, also takes on roles related to dispute management, such as truth arbiter, advocate, and adversary. Finally, we characterize the population of X users who prompted Grok and find that their self-expressed interests are closely related to the roles the model assumes in the corresponding interactions. Our findings provide an initial quantitative description of human-AI interactions on X, and a broader understanding of the diverse roles that large language models might play in our online social spaces.


💡 Research Summary

This paper presents a systematic, three‑month observational study of Grok, xAI’s large language model (LLM), as it is publicly invoked on the social media platform X (formerly Twitter). Unlike prior work that focuses on private, one‑to‑one chat logs, Grok operates in a public, many‑to‑many environment, allowing the authors to examine how millions of users interact with an LLM in real‑world social discourse.

Data collection leveraged X's official APIs. From August 15 to November 17, 2025, the authors gathered 41,735 "conversation chains" comprising 142,895 posts. Each chain includes a user's @grok prompt, Grok's reply, and, when present, the immediate parent post and the root of the thread, preserving conversational context. In parallel, the recent-tweet-counts endpoint supplied aggregate metrics on total @grok mentions, Grok replies, and language distribution. Profile data for 31,111 distinct accounts were also retrieved.

Key quantitative findings: Grok responded to 62% of all @grok mentions, and 51% of prompts were in English, with the next ten most common languages accounting for most of the remainder. Engagement with Grok's replies was low: half received 20 or fewer views within 48 hours.

To move beyond raw counts, the authors performed a mixed-methods analysis. First, manual content coding of 500 sampled interactions identified four primary use categories: information provision (51%), fact-checking (21.7%), opinion/advice seeking (12.8%), and creative/generative tasks (12.4%). These categories were then used to train an LLM-based classifier that labeled the full dataset.

Building on role theory, the authors inductively derived a taxonomy of ten social roles that Grok assumes on X. The most frequent role is “Information Provider,” but several dispute‑management roles emerge prominently: “Truth Arbiter” (deciding contested facts), “Advocate” (supporting a user’s position), “Adversary” (taking an opposing stance), and “Platform Insider” (supplying detailed knowledge about X itself). The analysis shows that fact‑checking tends to occur deeper in conversation threads, whereas opinion/advice and creative requests appear near the thread’s start, suggesting that users enlist Grok as a mediator during ongoing debates and as a brainstorming partner at the outset.

User‑level analysis reveals that Grok’s callers are highly active: over 75 % have posted at least 1,000 times and have accounts older than 1.6 years. Topic modeling of user bios indicates that expressed interests (e.g., politics, technology, culture) correlate with the roles Grok plays in their interactions. Politically engaged users more often invoke Grok as a “Truth Arbiter” or “Advocate,” while tech‑oriented users lean toward “Information Provider.”
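The bio-to-role association above can be illustrated with a deliberately simplified keyword tagger. The paper used topic modeling of user bios; the lexicon below is a hypothetical substitute for illustration only, not the authors' method.

```python
# Hypothetical interest lexicon; the paper derived topics from the
# data via topic modeling rather than a fixed keyword list.
INTEREST_KEYWORDS: dict[str, set[str]] = {
    "politics": {"politics", "policy", "election", "campaign"},
    "technology": {"tech", "software", "ai", "developer", "engineer"},
}

def tag_interests(bio: str) -> set[str]:
    """Return the interest labels whose keywords appear in a user bio."""
    tokens = {t.strip(".,!#@").lower() for t in bio.split()}
    return {label for label, kws in INTEREST_KEYWORDS.items() if tokens & kws}
```

Tagging each prompting user's bio this way and cross-tabulating labels against Grok's role per interaction would yield the kind of interest-role correlation the authors report (e.g., politics-tagged users invoking the "Truth Arbiter" role more often).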

The paper acknowledges limitations: API rate limits forced a stratified hourly sampling that may introduce temporal bias; about 15 % of posts were deleted before the 48‑hour engagement snapshot, potentially skewing role frequencies; and low engagement metrics do not capture subtler influences such as decision‑making impact.

Overall, the study provides the first large‑scale quantitative portrait of LLM behavior in a public social media setting. It demonstrates that LLMs can function not only as knowledge engines but also as active participants in social negotiation, opinion shaping, and even adversarial discourse. These findings have direct implications for AI ethics, platform governance, and the design of policies governing LLM deployment in public communication spaces. Future work is suggested to track longitudinal engagement dynamics, incorporate user surveys on perceived AI roles, and explore mitigation strategies for potential misuse in dispute contexts.

