How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?

Reading time: 5 minutes
...

📝 Original Info

  • Title: How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
  • ArXiv ID: 2602.16253
  • Date: 2026-02-18
  • Authors: Not listed (the source data does not provide author information, so the authors are marked as unknown.)

📝 Abstract

Anomalous sound detection (ASD) benchmarks typically assume that the identity of the monitored machine is known at test time and that recordings are evaluated in a machine-wise manner. However, in realistic monitoring scenarios with multiple known machines operating concurrently, test recordings may not be reliably attributable to a specific machine, and requiring machine identity imposes deployment constraints such as dedicated sensors per machine. To probe this assumption, we consider a minimal modification of the ASD evaluation protocol in which test recordings from multiple machines are merged and evaluated jointly, without access to machine identity at inference time. Training data and evaluation metrics remain unchanged, and machine identity labels are used only for post hoc evaluation. Experiments with representative ASD methods show that relaxing this assumption reveals performance degradations and method-specific differences in robustness that are hidden under standard machine-wise evaluation, and that these degradations are strongly related to implicit machine identification accuracy.


📄 Full Content

Anomalous sound detection (ASD) aims to detect unusual sounds and is widely used in machine condition monitoring. The DCASE Challenge [1] has become a central benchmark for ASD. Recent editions emphasize robustness to domain shifts [2]–[6], motivating extensive work on domain adaptation and generalization [7]. Despite these advances, current ASD evaluations rely on an implicit assumption: test recordings are associated with a known machine identity and evaluated in a machine-wise manner. While the first-shot setting [4]–[6] reduces overly optimistic evaluation by enforcing disjoint machine types between system development and evaluation, it still assumes access to machine identity at test time.

This assumption does not always hold in practice. In realistic monitoring scenarios, multiple known machines may operate concurrently, and test recordings cannot always be reliably attributed to a specific machine. From an operator perspective, it is therefore desirable for a system to identify which machine behaves abnormally, rather than assuming that machine identity is given. One alternative is to use microphone arrays and sound source localization to infer the originating machine, but such solutions increase cost, system complexity, and installation effort, and are often impractical in existing industrial settings. Moreover, relying on dedicated sensing infrastructure for each machine limits flexibility when machine layouts or inventories change. Evaluating ASD without machine identity therefore supports more scalable and reusable monitoring systems that can be deployed across different factories and operating conditions.

These practical constraints are not reflected in current evaluation protocols. Many approaches rely on machine-specific models [8], machine-specific operating-condition information [9], [10], or machine-specific test-set statistics for anomaly score normalization [11]. All of these require access to machine identity or delayed aggregation of test data. Such assumptions may not hold in realistic monitoring scenarios, causing standard evaluations to hide robustness differences that only emerge when they are violated.

Throughout this work, we consider the same scope assumptions as current DCASE benchmarks: single-channel recordings that contain sounds from a single machine, and a fixed, known set of machines. The only assumption that is relaxed is the availability of machine identity at test time. Test recordings from multiple known machines are merged into a single test set and evaluated jointly, while training data and evaluation metrics remain unchanged. Machine identity labels are used only for post hoc evaluation. Rather than proposing new detection models, our goal is to analyze how representative ASD methods behave when this single evaluation assumption is relaxed. The contributions of this work are:

• We make explicit the implicit assumption of known machine identity in current ASD evaluation protocols;
• We propose a minimal evaluation protocol that removes machine identity at inference time while keeping all other aspects unchanged (see the sketch after this list);
• We empirically show that relaxing this assumption exposes method-specific robustness differences and link anomaly detection performance degradation to implicit machine identification accuracy, quantified via an auxiliary post hoc identification task.
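To make the protocol concrete, the sketch below contrasts standard machine-wise evaluation with the merged evaluation on toy anomaly scores. It is an illustrative assumption of how such an experiment could be set up, not the paper's code: the score distributions, machine names, and the use of scikit-learn's roc_auc_score are our own choices.

```python
# Minimal sketch of the relaxed evaluation protocol, under our own toy
# assumptions: `scores` are anomaly scores from any ASD model, `y` are
# normal/anomaly labels (0 = normal, 1 = anomalous), and `machine` holds
# identity labels that the relaxed protocol uses only post hoc.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
machines = ["fan", "valve", "slider"]

# Toy scores: each machine has its own score scale, which is exactly
# what breaks once test sets are merged without identity information.
scores, y, machine = [], [], []
for shift, m in zip([0.0, 2.0, 4.0], machines):
    normal = rng.normal(shift, 1.0, 100)           # normal recordings
    anomalous = rng.normal(shift + 1.5, 1.0, 100)  # anomalous recordings
    scores += list(normal) + list(anomalous)
    y += [0] * 100 + [1] * 100
    machine += [m] * 200
scores, y, machine = np.array(scores), np.array(y), np.array(machine)

# Standard machine-wise evaluation: identity is known, AUC per machine.
for m in machines:
    idx = machine == m
    print(m, roc_auc_score(y[idx], scores[idx]))

# Relaxed protocol: test sets of all machines are merged and scored
# jointly; machine labels never partition or normalize the scores.
print("merged", roc_auc_score(y, scores))
```

Because each machine's scores live on a different scale in this toy example, the per-machine AUCs stay high while the merged AUC drops, which is the kind of hidden degradation the relaxed protocol is designed to expose.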

We briefly summarize the standard ASD formulation and evaluation protocol in the DCASE Challenge to make explicit the assumptions relevant to this work and position our setting relative to existing benchmarks.

In acoustic machine condition monitoring, ASD aims to detect sounds that deviate from a machine’s normal operating behavior, typically under the assumption that anomalous examples are rare or unavailable during training. The DCASE Challenge on unsupervised ASD [2]–[6], [12] has become the de facto benchmark for this task.

In the standard DCASE setting, datasets are organized into development and evaluation splits, each containing machine-specific training and test data. Training data consist exclusively of normal recordings from individual machines and are associated with machine IDs. At evaluation time, test recordings are grouped by machine and include both normal and anomalous samples, and performance is reported using machine-wise metrics. Models are therefore trained either separately for each machine or as a shared model with access to machine identity at inference time.

An implicit but central assumption in this protocol is that machine identity is known at test time. Each test recording is assumed to be associated with a specific machine ID, and anomaly scores are computed and evaluated within machine-wise partitions. While this assumption simplifies both modeling and evaluation, it does not necessarily hold in practical monitoring scenarios, where multiple known machines may operate concurrently and recordings cannot always be reliably attributed to a specific machine.
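The paper quantifies the effect of violating this assumption via an auxiliary post hoc identification task. As a hedged sketch of one way such a task could look, assume a setup (our assumption, not necessarily the paper's exact definition) in which each test recording receives one anomaly score per candidate machine, e.g. under each machine's model of normal sound; implicit identification accuracy is then the fraction of recordings whose lowest score belongs to the true machine.

```python
# Hedged sketch of an auxiliary post hoc identification task. The
# (n_clips, n_machines) score matrix is our illustrative assumption:
# lower score = more consistent with that machine's normal behavior.
import numpy as np

def implicit_identification_accuracy(score_matrix, true_machine_idx):
    """Fraction of clips attributed to the correct machine. The true
    machine indices are used only post hoc, mirroring the relaxed
    protocol, which never exposes identity at inference time."""
    predicted = score_matrix.argmin(axis=1)
    return float(np.mean(predicted == true_machine_idx))

# Toy example: 4 clips, 3 machines; each clip scores lowest under its
# own machine except the last one, which is misattributed.
S = np.array([[0.1, 0.9, 0.8],
              [0.7, 0.2, 0.9],
              [0.8, 0.9, 0.3],
              [0.4, 0.9, 0.5]])
print(implicit_identification_accuracy(S, np.array([0, 1, 2, 2])))  # 0.75
```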

