Pet-Bench: Benchmarking the Abilities of Large Language Models as E-Pets in Social Network Services


As interest in using Large Language Models for interactive and emotionally rich experiences grows, virtual pet companionship emerges as a novel yet underexplored application. Existing approaches focus on basic pet role-playing interactions without systematically benchmarking LLMs for comprehensive companionship. In this paper, we introduce Pet-Bench, a dedicated benchmark that evaluates LLMs across both self-interaction and human-interaction dimensions. Unlike prior work, Pet-Bench emphasizes self-evolution and developmental behaviors alongside interactive engagement, offering a more realistic reflection of pet companionship. It features diverse tasks such as intelligent scheduling, memory-based dialogues, and psychological conversations, with over 7,500 interaction instances designed to simulate pet behaviors. Evaluation of 28 LLMs reveals significant performance variations linked to model size and inherent capabilities, underscoring the need for specialized optimization in this domain. Pet-Bench serves as a foundational resource for benchmarking pet-related LLM abilities and advancing emotionally immersive human-pet interactions.


💡 Research Summary

As Large Language Models (LLMs) continue to evolve, the potential for creating emotionally intelligent virtual companions, such as “E-Pets” in Social Network Services (SNS), has become a significant area of interest. However, traditional LLM evaluation frameworks are predominantly designed to measure cognitive tasks like reasoning, coding, and mathematical problem-solving, failing to capture the nuanced, emotional, and autonomous behaviors required for true companionship. To bridge this gap, this paper introduces “Pet-Bench,” a specialized benchmark designed to evaluate the capabilities of LLMs acting as virtual pets.

The core innovation of Pet-Bench lies in its dual-dimensional evaluation framework: Self-interaction and Human-interaction. Unlike previous benchmarks that focus on static role-playing, Pet-Bench emphasizes “self-evolution” and “developmental behaviors.” In the Self-interaction dimension, the benchmark assesses the model’s ability to simulate an autonomous life, including “intelligent scheduling”—where the model manages its own routines—and the ability to evolve its persona over time, mimicking the growth of a living creature.

In the Human-interaction dimension, the focus shifts to the quality of emotional engagement. This includes “memory-based dialogues,” which test the model’s ability to maintain long-term relational consistency by recalling past interactions, and “psychological conversations,” which evaluate the model’s capacity for empathy and emotional support. To ensure a rigorous evaluation, the researchers developed a dataset of over 7,500 interaction instances specifically designed to simulate realistic pet-like behaviors and emotional exchanges.
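The two-dimension, task-based organization described above can be sketched as a simple data model. This is a hypothetical illustration only: the summary does not specify Pet-Bench’s actual dataset schema, and the class, field, and task names below are assumptions.

```python
from dataclasses import dataclass

# Hypothetical schema for one Pet-Bench interaction instance.
# The real dataset format is not given in this summary.
@dataclass
class PetBenchInstance:
    dimension: str    # "self-interaction" or "human-interaction"
    task: str         # e.g. "intelligent_scheduling", "memory_dialogue"
    context: list     # prior turns or state shown to the model
    reference: str    # expected pet-like behavior or response

def group_by_dimension(instances):
    """Bucket instances by dimension for per-dimension scoring."""
    buckets = {}
    for inst in instances:
        buckets.setdefault(inst.dimension, []).append(inst)
    return buckets

# Toy examples mirroring the task types named in the text.
examples = [
    PetBenchInstance("self-interaction", "intelligent_scheduling",
                     ["It is 8 a.m."], "Wake up and ask for breakfast."),
    PetBenchInstance("human-interaction", "memory_dialogue",
                     ["Owner: my exam is on Friday."],
                     "Wish the owner luck before Friday's exam."),
]
buckets = group_by_dimension(examples)
```

A per-dimension split like this would let an evaluator report separate scores for autonomous behavior and emotional engagement, matching the paper’s framing.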

The study conducted an extensive evaluation of 28 different LLMs, revealing that while model scale and inherent linguistic capabilities significantly impact performance, a larger model does not automatically equate to a superior virtual pet. The findings suggest that while larger models possess the necessary foundation for understanding complex emotional contexts, specialized optimization and domain-specific fine-tuning are essential to achieve the unique persona, emotional consistency, and autonomous traits required for high-quality pet companionship.

Ultimately, Pet-Bench serves as a foundational resource for the AI community, providing a standardized metric to advance the development of emotionally immersive and autonomous digital companions. This work paves the way for the next generation of social AI, where LLMs can move beyond being mere information providers to becoming meaningful, interactive, and evolving digital companions in our daily lives.

