PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks

📝 Original Info

  • Title: PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
  • ArXiv ID: 2510.12409
  • Date: 2025-10-14
  • Authors: Not provided (the source does not include an author list).

📝 Abstract

We present PricingLogic, the first benchmark that probes whether Large Language Models (LLMs) can reliably automate tourism-related pricing when multiple, overlapping fare rules apply. Travel agencies are eager to offload this error-prone task onto AI systems; however, deploying LLMs without verified reliability could result in significant financial losses and erode customer trust. PricingLogic comprises 300 natural-language questions based on booking requests derived from 42 real-world pricing policies, spanning two levels of difficulty: (i) basic customer-type pricing and (ii) bundled-tour calculations involving interacting discounts. Evaluations of a range of LLMs reveal a steep performance drop on the harder tier, exposing systematic failures in rule interpretation and arithmetic reasoning. These results highlight that, despite their general capabilities, today's LLMs remain unreliable in revenue-critical applications without further safeguards or domain adaptation. Our code and dataset are available at https://github.com/EIT-NLP/PricingLogic.
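To make the two difficulty tiers concrete, the following is a minimal Python sketch of what a deterministic reference calculator for such questions might look like. Every price, customer type, attraction name, and discount rule in it is a hypothetical placeholder chosen for illustration; none is taken from the PricingLogic dataset or its 42 policies, and the function names `basic_price` and `bundled_price` are likewise assumptions.

```python
from decimal import Decimal

# Hypothetical per-head prices keyed by (attraction, customer type).
# These values are illustrative only and do not come from the benchmark.
BASE_PRICES = {
    ("museum", "adult"): Decimal("40"),
    ("museum", "student"): Decimal("20"),
    ("boat", "adult"): Decimal("60"),
    ("boat", "student"): Decimal("45"),
}


def basic_price(attraction, party):
    """Tier (i): sum per-head prices according to customer type."""
    return sum(
        (BASE_PRICES[(attraction, ctype)] * count for ctype, count in party.items()),
        Decimal("0"),
    )


def bundled_price(attractions, party):
    """Tier (ii): several attractions with discounts that interact.

    Assumed rule for this sketch: the discounts do not stack, and the
    customer pays the cheapest applicable total.
    """
    subtotal = sum((basic_price(a, party) for a in attractions), Decimal("0"))
    headcount = sum(party.values())
    candidates = [subtotal]
    if {"museum", "boat"} <= set(attractions):
        # Hypothetical bundle rule: 10% off when both attractions are booked.
        candidates.append(subtotal * Decimal("0.90"))
    if headcount >= 10:
        # Hypothetical group rule: 5 off per person for parties of 10 or more.
        candidates.append(subtotal - Decimal("5") * headcount)
    return min(candidates)


family = {"adult": 2, "student": 1}
print(basic_price("museum", family))              # tier (i) example
print(bundled_price(["museum", "boat"], family))  # tier (ii) example
```

A calculator of this kind could serve as ground truth against which a model's free-text answer is checked. Even in this toy version, the harder tier requires deciding which rules apply and how they combine before any arithmetic is done, which is the kind of rule interpretation the abstract reports models failing at.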

Reference

This content is AI-processed based on open access ArXiv data.