When more precision is worse: Do people recognize inadequate scene representations in concept-based explainable AI?


Explainable artificial intelligence (XAI) aims to help uncover flaws in an AI model’s internal representations. But do people draw the right conclusions from its explanations? Specifically, do they recognize an AI’s inability to distinguish between relevant and irrelevant features? In the present study, a simulated AI classified images of railway trespassers as dangerous or not. To explain which features it used, the AI showed other images from the dataset that activated it in a similar way. These concept images varied in three relevant features (i.e., a person’s distance to the tracks, direction, and action) and in an irrelevant feature (i.e., scene background). When the AI uses a feature in its decision, that feature is retained in the concept images; otherwise, the images randomize over it (e.g., same distance, varied backgrounds). Participants rated the AI more favorably when it retained relevant features. For the irrelevant feature, they were largely indifferent and sometimes even preferred it to be retained. This suggests that people may fail to recognize when an AI model relies on irrelevant features to make its decisions.


💡 Research Summary

The paper investigates whether users can detect when a concept‑based explainable AI (XAI) system relies on irrelevant visual features, and how this influences their overall evaluation of the AI. The authors simulate an AI that classifies photographs of railway trespassers as “dangerous” or “not dangerous.” To explain its decisions, the AI presents five “concept images” from the same dataset that activate the model in a similar way. Four visual attributes are manipulated across these concept images: (1) the person’s distance to the tracks, (2) the direction the person faces relative to the tracks, (3) the person’s action (walking versus other actions), and (4) the scene background. Distance, direction, and action are deemed relevant to human judgments of danger, whereas background is considered irrelevant (a typical source of shortcut learning or dataset bias). For each attribute, the concept images either all match the classified image (“same”) or vary across the five images (“varied”).
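To make this manipulation concrete, here is a minimal sketch of how such concept-image sets could be sampled from an annotated image pool. This is hypothetical code, not the authors’ materials; the pool structure and function names are assumptions. An attribute set to “same” constrains every concept image to match the classified image, while an attribute set to “varied” leaves it unconstrained, so it effectively randomizes across the five images.

```python
import random

# The four manipulated attributes from the paper; the annotation
# format (one dict per image) is an illustrative assumption.
ATTRIBUTES = ["distance", "direction", "action", "background"]

def sample_concept_images(pool, classified, condition, n=5, seed=0):
    """Sample n concept images for one explanation.

    pool:       list of dicts with the four attribute keys
    classified: attribute dict of the image being classified
    condition:  maps each attribute to "same" or "varied"
    """
    rng = random.Random(seed)
    # "same" attributes must match the classified image;
    # "varied" attributes are left free, i.e. randomized over.
    candidates = [
        img for img in pool
        if all(img[a] == classified[a]
               for a in ATTRIBUTES if condition[a] == "same")
    ]
    return rng.sample(candidates, n)

# Example: the AI "uses" distance, direction, and action, while the
# background varies across the five concept images.
condition = {"distance": "same", "direction": "same",
             "action": "same", "background": "varied"}
```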

A within‑subjects 2 × 2 × 2 × 2 factorial design (distance × direction × action × background) was combined with a binary “agreement” factor indicating whether participants judged the AI’s danger classification to be correct. Fifty‑nine participants (mean age 28 years) completed 128 trials, each consisting of a classified image, the AI’s decision, the accompanying concept images, a binary judgment of whether the decision was correct, and finally a slider (0–100) rating of the AI’s overall performance. Trial order was randomized per participant.
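Crossing the four manipulated attributes yields 16 explanation conditions, so the 128 trials imply eight repetitions per condition. Note that this split is an inference from the reported numbers, and the sketch below is illustrative rather than the authors’ experiment script.

```python
import itertools
import random

ATTRIBUTES = ["distance", "direction", "action", "background"]

def build_trial_list(n_trials=128, seed=0):
    """Cross the four same/varied factors (16 cells), repeat to fill
    the trial count, and shuffle the order for one participant."""
    cells = [dict(zip(ATTRIBUTES, combo))
             for combo in itertools.product(["same", "varied"], repeat=4)]
    trials = cells * (n_trials // len(cells))  # 8 repetitions per cell
    random.Random(seed).shuffle(trials)
    return trials

trials = build_trial_list(seed=42)  # a distinct seed per participant
assert len(trials) == 128
```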

Statistical analysis employed a repeated‑measures ANOVA with the five factors. The main findings are:

  1. Relevant features drive higher ratings when they are consistent. For distance, direction, and action, the “same” condition yielded significantly higher performance ratings than the “varied” condition (distance: 60.1 vs. 48.5, ηp² ≈ 0.46; direction: 58.4 vs. 50.2, ηp² ≈ 0.40; action: 57.4 vs. 51.2, ηp² ≈ 0.47). This confirms that participants reward the AI for preserving information that humans consider important for assessing danger.

  2. Irrelevant background does not affect ratings. The mean rating for “same” background (55.0) versus “varied” background (53.6) was not statistically different (F(1,58)=2.80, p≈0.10, ηp²≈0.05). Thus, participants neither penalized nor rewarded the AI for keeping a non‑informative feature constant across concept images.

  3. Agreement with the AI amplifies the benefit of consistency for relevant features. Significant interactions emerged between agreement and distance (F=4.46, p=0.039) and between agreement and direction (F=19.10, p<0.001), along with a three‑way interaction among distance, direction, and agreement (F=5.28, p=0.025). In other words, when participants thought the AI’s decision was correct, they were even more sensitive to whether the AI preserved relevant visual cues.

  4. No interaction involving background. The background factor did not interact significantly with any other factor, reinforcing the notion that participants were largely indifferent to the presence or absence of a consistent irrelevant cue.
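For readers who want to run this style of analysis on their own data, the sketch below assumes ratings in long format with one row per trial; all column names are assumptions, and the participant-derived agreement factor would need to be balanced (or cells aggregated) within subjects for a repeated‑measures ANOVA to apply. It also shows how a reported F statistic converts to partial eta squared, using the background effect above as a sanity check.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Assumed long format: subject, the four same/varied factors,
# the agreement factor, and the 0-100 performance rating.
df = pd.read_csv("ratings_long.csv")

# aggregate_func="mean" collapses repeated trials within each
# subject x condition cell, as AnovaRM requires one value per cell.
res = AnovaRM(
    df, depvar="rating", subject="subject",
    within=["distance", "direction", "action", "background", "agreement"],
    aggregate_func="mean",
).fit()
print(res)

def partial_eta_squared(f, df1, df2):
    """Partial eta squared recovered from an F statistic."""
    return f * df1 / (f * df1 + df2)

# Sanity check against the reported background effect, F(1, 58) = 2.80:
print(round(partial_eta_squared(2.80, 1, 58), 3))  # 0.046, i.e. ~0.05
```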

The authors interpret these results as evidence that current concept‑based XAI methods may fail to alert users to shortcut learning or over‑fitting on irrelevant cues. Because the repeated background can be perceived as a sign of “precision,” users may mistakenly infer that the AI has a robust, detailed understanding of the scene, even when the underlying model is actually relying on a spurious correlation.

The paper discusses several implications:

  • Design of explanations: Simply showing similar images is insufficient when the similarity includes non‑informative attributes. XAI designers should consider explicitly randomizing irrelevant features or providing meta‑information that flags which attributes the model actually uses (see the sketch after this list).

  • Human‑AI trust calibration: Users appear to equate consistency with competence, especially when they already agree with the AI’s decision. This can lead to over‑trust in models that are, in fact, brittle.

  • Limitations: All stimuli were captured at a single railway testing facility, limiting the ecological validity of background variation. Participants had no prior knowledge of the AI’s internal workings, which may differ from expert users. Moreover, only one irrelevant attribute (background) was examined; other potential confounds such as lighting or color were not controlled.

  • Future work: Extending the paradigm to other domains (medical imaging, autonomous driving), incorporating multiple irrelevant cues, and comparing novice versus expert user groups would deepen our understanding of how explanation design influences bias detection.
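One way to act on the explanation-design recommendation above is to attach explicit feature-usage flags to every explanation, so that users do not have to infer the model’s reliance from image similarity alone. The sketch below is purely illustrative; the data structure and caption wording are not proposed in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Explanation:
    decision: str                                       # e.g., "dangerous"
    concept_images: list = field(default_factory=list)  # paths or arrays
    features_used: dict = field(default_factory=dict)   # attribute -> bool

    def caption(self) -> str:
        used = [a for a, f in self.features_used.items() if f]
        ignored = [a for a, f in self.features_used.items() if not f]
        return (f"This decision relied on: {', '.join(used) or 'none'}; "
                f"it ignored: {', '.join(ignored) or 'none'}.")

exp = Explanation(
    decision="dangerous",
    features_used={"distance": True, "direction": True,
                   "action": False, "background": True},
)
print(exp.caption())
# This decision relied on: distance, direction, background; it ignored: action.
```

A caption of this kind would make a spurious reliance on background explicit, rather than leaving it implicit in the visual consistency of the concept images.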

In conclusion, the study demonstrates that while people readily reward AI for preserving relevant visual information, they are largely blind to the AI’s reliance on irrelevant features when those features are presented consistently. This finding cautions against assuming that concept‑based XAI automatically enhances model transparency; careful attention must be paid to how explanations are constructed so that users can correctly infer both the strengths and the hidden weaknesses of AI systems.

