Matched Filter-Based Molecule Source Localization in Advection-Diffusion-Driven Pipe Networks with Known Topology

Timo Jakumeit¹,*, Bastian Heinlein¹,²,*, Vukašin Spasojević¹, Vahid Jamali², Robert Schober¹, and Maximilian Schäfer¹
¹ Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
² Technical University of Darmstadt, Darmstadt, Germany

Abstract

Synthetic molecular communication (MC) has emerged as a powerful framework for modeling, analyzing, and designing communication systems where information is encoded into properties of molecules. Among the envisioned applications of MC is the localization of molecule sources in pipe networks (PNs) like the human cardiovascular system (CVS), sewage networks (SNs), and industrial plants. While existing algorithms mostly focus on simplified scenarios, in this paper, we propose the first framework for source localization in complex PNs with known topology, by leveraging the mixture of inverse Gaussians for hemodynamic transport (MIGHT) model as a closed-form representation for advection-diffusion-driven MC in PNs. We propose a matched filter (MF)-based approach to identify molecule sources under realistic conditions such as unknown release times, random numbers of released molecules, sensor noise, and limited sensor sampling rate. We apply the algorithm to localize a source of viral markers in a real-world SN and show that the proposed scheme outperforms randomly guessing sources even at low signal-to-noise ratios (SNRs) at the sensor and achieves error-free localization under favorable conditions, i.e., high SNRs and sampling rates. Furthermore, by identifying clusters of frequently confused sources, reliable cluster-level localization is possible at substantially lower SNRs and sampling rates.

1 Introduction

In recent years, synthetic molecular communication (MC) has emerged as a powerful framework for modeling, analyzing, and designing information exchange in systems where molecules act as information carriers, and numerous innovative applications within the human body [1], between plants [2], or in industrial environments have been envisioned [3–5].

One application that has attracted sustained interest is anomaly localization in MC systems [6]. In particular, molecule source localization, i.e., the estimation of the location of a transmitter (Tx) emitting signaling molecules within an MC system, has proven relevant for various application scenarios. In biomedical applications, source localization shall enable the localization of diseased tissue within the human body or, more precisely, the cardiovascular system (CVS), with the potential to identify pathologies much earlier than classical medical imaging techniques [1]. In epidemiological surveillance, source localization based on molecular signals in sewage networks (SNs) can support the early detection and spatial attribution of infections, offering a cost-effective alternative to population-wide testing [7].

* Both authors contributed equally to this work. This work was funded in part by the German Federal Ministry of Research, Technology and Space (BMFTR) through Project Internet of Bio-Nano-Things (IoBNT) – grant numbers 16KIS1987 and 16KIS1992, in part by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) under GRK 2950 – ProjectID 509922606 and under grant number SCHA 2350/2-1, in part by the European Union's Horizon Europe – HORIZON-EIC-2024-PATHFINDEROPEN-01 under grant agreement Project N. 101185661, and in part by the Horizon Europe Marie Skłodowska-Curie Actions (MSCA)-UNITE under Project 101129618.

In industrial systems, molecule source localization could provide a complementary approach for identifying leaks or contaminant releases in pipe networks (PNs) by exploiting concentration measurements of specific chemical species.

The aforementioned application domains span different size scales (see Fig. 1a)) but share a common structural characteristic: they can be represented as PNs, i.e., networks of interconnected conduits through which molecules are transported by flowing liquids or gases. In the CVS, blood vessels form PNs carrying biomarkers, nutrients, and metabolic waste. In SNs, pipes transport wastewater and potential infection markers. In industrial gas or fluid networks, pipes convey process media, while released chemicals, pollutants, or tracers propagate through the PN and are detectable downstream. PNs thus represent a particularly important class of environments for molecule source localization. However, the structural features of PNs induce complex molecule transport dynamics. Therefore, accurate molecule source localization in PNs requires mathematical models that explicitly account for advection, diffusion, and the underlying network topology [8].

Most of the existing literature on molecule source localization in MC is focused on unbounded diffusion environments, and only a limited number of studies have considered source localization in single pipes or individual pipe branchings, where molecule propagation is governed by advection and diffusion. In [9], a one-dimensional (1-D) analytical source localization approach for a single pipe is presented, where the source (i.e., Tx) position is inferred via Tx-receiver (Rx) distance estimation. In [10], a 1-D analytical framework for cooperative abnormality detection and localization is proposed, in which mobile sensors propagate in a single pipe and release signaling molecules upon sensing an abnormality.
In [11], a learning-based distance estimation approach for branched MC systems is proposed, where multiple Txs emit molecules that propagate by advection only through a simple Y-shaped pipe topology towards the Rx. Recurrent neural networks are used to infer Tx-Rx distances from the received signal. In [12], a learning-based framework for infection source localization in the CVS is proposed. Using the BloodVoyagerS simulator, biomarkers released by static infection sources propagate through a PN of the major arteries via advection and are subject to chemical degradation. Localization is formulated as a multi-class classification problem based on received biomarker signals using simulation data and stacked ensemble learning.

In summary, existing works on molecule source localization in MC predominantly consider highly simplified channel topologies and transport dynamics. Most approaches are limited to diffusion-only or advection-only models, with only a few studies accounting for their combined effects. Moreover, no existing works provide analytically tractable and physically interpretable localization frameworks that are applicable to PNs across different scales and different application scenarios. To overcome the drawbacks of existing approaches, we propose the first molecule source localization framework for advective-diffusive MC in PNs. Our approach builds on the mixture of inverse Gaussians for hemodynamic transport (MIGHT) model [13], which provides a closed-form description of molecule transport in PNs and is extended here to explicitly account for key system uncertainties, including unknown release times and quantities, sensor noise, and different sampling rates at the Rx. Based on this model, we develop a matched filter (MF)-based localization algorithm and demonstrate that clustering frequently confused Txs can significantly enhance localization performance at the cluster level.
The effectiveness of the proposed framework is illustrated by applying it to localize a source of viral markers in a real-world SN in the Zolkiewka Commune in Poland, highlighting its practical relevance and applicability.

The main contributions of this work are as follows:
• We extend the deterministic MIGHT channel model from [13] to account for log-normal-distributed random release quantities at the Txs and sensor noise at the Rx.
• We propose an MF-based algorithm for molecule source localization in advection-diffusion-driven PNs of arbitrary sizes and structural complexities.
• We introduce methods for the clustering of commonly confused Txs to improve cluster-level localization accuracy.
• We illustrate the accuracy of the proposed localization framework for different sensor sampling rates by applying it to source localization in a real-world SN.

The remainder of the paper is organized as follows: Section 2 introduces the system model, reviews the MIGHT model, and presents the Rx noise model. Section 3 presents the MF-based localization approach and the Tx clustering method. Section 4 discusses numerical results, and Section 5 concludes the paper.

2 System and Channel Model

In the following, we first define PNs, then describe the assumptions on the molecule sources, i.e., Txs, including the randomness of the number of released molecules, and the advective-diffusive transport. Subsequently, we review the MIGHT channel model. Finally, we discuss the Rx model and the associated sensor noise.

2.1 Pipe Network Definition

To model molecule transport in PNs, we adopt the definition from [13] and approximate the PN topology using three segment types¹:

¹ Please refer to Fig. 2a) in [13] for a visual illustration of the segment types.

(1) Pipe: A pipe $p_i$ is a cylindrical conduit transporting fluid from its inlet to its outlet, with length $l_i$ and radius $r_i$. Pipes connect to other pipes, bifurcations, or junctions.
The PN contains $E$ pipes.
(2) Bifurcation: A bifurcation $b_m$ is a connection where one or more inflow pipe(s) split(s) into multiple outflow pipes. The set of its outflow pipes is denoted by $\mathcal{O}(b_m)$. A PN contains $B$ bifurcations.
(3) Junction: A junction is a connection where multiple inflow pipes merge into one outflow pipe.

Bifurcations, junctions, inlet(s), outlet(s), and any connection point are modeled as nodes. We differentiate between three node types:
(1) Inlet node: Inlet nodes $n_{\mathrm{in},a} \in \mathcal{N}_{\mathrm{in}} = \{n_{\mathrm{in},1}, \ldots, n_{\mathrm{in},I}\}$ with $I \in \mathbb{N}$ exist at the points of the PN where fluid flow is introduced into the PN. Here, $\mathbb{N}$ denotes the set of natural numbers.
(2) Outlet node: Outlet nodes $n_{\mathrm{out},b} \in \mathcal{N}_{\mathrm{out}} = \{n_{\mathrm{out},1}, \ldots, n_{\mathrm{out},O}\}$ with $O \in \mathbb{N}$ exist at the points of the PN where fluid flow leaves the PN.
(3) Connecting node: All other $C \in \mathbb{N}$ points in the PN where pipes are connected to one another are referred to as connecting nodes $n_i \in \mathcal{N}_{\mathrm{con}} = \{n_1, \ldots, n_C\}$.

Pipes are represented as directed edges between nodes, aligned with the fluid flow direction, which is determined in Section 2.3. For any node type, the nodes at the inlet and outlet of a pipe $p_i$, i.e., the source and destination nodes, are denoted by $\mathcal{S}(p_i)$ and $\mathcal{D}(p_i)$, respectively². The representation based on nodes and directed edges allows any PN to be described as a directed multigraph. The set of all distinct directed paths between a given (inlet/connecting) node $n_a \in \mathcal{N}_{\mathrm{in}} \cup \mathcal{N}_{\mathrm{con}}$ and another (connecting/outlet) node $n_b \in \mathcal{N}_{\mathrm{con}} \cup \mathcal{N}_{\mathrm{out}}$ is denoted by $\mathcal{P}(n_a, n_b)$. Each path $P_k$ comprises a subset of pipes and bifurcations given by

$$P_k = \{p_i \mid i \in \mathcal{E}_k\} \cup \{b_m \mid m \in \mathcal{B}_k\}, \tag{1}$$

where $\mathcal{E}_k \subseteq \{1, \ldots, E\}$ and $\mathcal{B}_k \subseteq \{1, \ldots, B\}$ are the index sets of the pipes and bifurcations³ included in $P_k$. Any path must contain at least two pipes, i.e., $|\mathcal{E}_k| > 1$, where $|\cdot|$ denotes the cardinality.

2.2 Molecule Sources

We assume that there are $U$ Txs.
Any $\mathrm{Tx}_g \in \mathcal{N}_{\mathrm{Tx}} = \{\mathrm{Tx}_1, \ldots, \mathrm{Tx}_U\}$ is a zero-dimensional (0-D) point at position $z_{\mathrm{Tx}_g} \in [0, l_i]$ in pipe $p_i$. We denote the source node of the pipe $p_i$ containing $\mathrm{Tx}_g$ as $\mathcal{S}_{\mathrm{Tx}}(\mathrm{Tx}_g)$. Moreover, we assume that each $\mathrm{Tx}_g$ can impulsively release $M_g$ molecules. Here, the number of released molecules $M_g$ is modeled as a random variable (RV) with mean $\bar{M}_g$, realization $m_g$, and probability density function (PDF) $f_{M_g}(m_g)$, chosen according to the application scenario (see Section 4).

2.3 Advective-Diffusive Molecule Transport

In many practical PNs, including SNs, the CVS, and gas pipelines, see Fig. 1a), molecule transport is governed primarily by advection, i.e., transport with the bulk fluid motion, and diffusion, see Fig. 1b), encompassing both molecular and turbulent dispersive effects. We summarize the advective-diffusive transport assumptions underlying the proposed model below.

² Please refer to Figs. 1a) and 1c) in [13] for a visual illustration of the notation.
³ Potential bifurcations at $n_a$ or $n_b$ are excluded from the set in (1), as molecules in any path from $n_a$ to $n_b$ do not travel through $n_a$ or $n_b$, but start at $n_a$ and end at $n_b$.

Figure 1: System model. a) Advection-diffusion-driven PNs across different scales. b) (Multi-)graph representation of a PN.
c) MF bank-based molecule source localization algorithm, exploiting the MIGHT model for MF design.

Each inlet node $n_{\mathrm{in},a} \in \mathcal{N}_{\mathrm{in}}$ is assigned a flow rate $Q_{\mathrm{in},a} > 0$, inducing flow in each pipe $p_i$ in the PN, characterized by the flow rate $Q_i$ and the cross-sectional average velocity $\bar{u}_i = Q_i / (\pi r_i^2)$ as determined using an equivalent electrical circuit model [13]. Additionally, molecules propagate via diffusion, characterized by the effective diffusion coefficient in pipe $p_i$ [14, Eq. (26)]

$$\bar{D}_i = \frac{r_i^2 \bar{u}_i^2}{48 D} + D, \tag{2}$$

where $D$ denotes the diffusion coefficient.

2.4 MIGHT Channel Model

In this work, we adopt MIGHT [13] to model the PN. MIGHT exploits that the first passage times (FPTs) of molecules in advective-diffusive channels follow an inverse Gaussian (IG) distribution, and yields tractable, closed-form expressions for molecule transport in PNs, making it well-suited for applications that require a mathematically well-behaved framework, such as source localization [13]. Below, we briefly introduce the core equations of MIGHT.

2.4.1 Path Molecule Flux. Let $\mathrm{Tx}_g \in \mathcal{N}_{\mathrm{Tx}}$ be at position $z_{\mathrm{Tx}_g}$ in pipe $p_q$ and let the Rx be at position $z_{\mathrm{Rx}}$ in pipe $p_w$. Moreover, any path contained in $\{P_k \in \mathcal{P}(\mathcal{S}_{\mathrm{Tx}}(\mathrm{Tx}_g), \mathcal{D}(p_w)) \mid p_q, p_w \in P_k\}$ leads from $\mathrm{Tx}_g$ to the Rx and contains pipes $p_q$ and $p_w$. Then, the molecule flux observed at the Rx position $z_{\mathrm{Rx}}$ in pipe $p_w$ due to the release of $m_g = 1$ molecule at $\mathrm{Tx}_g$ follows an IG distribution and is obtained in $\mathrm{s}^{-1}$ as [13, Eqs. (8), (14), (15), (17)]

$$\bar{j}_k(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_g}) = \frac{\bar{\mu}_k(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g})}{\sqrt{2\pi\, \bar{\theta}_k(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g})\, t^3}} \exp\left(-\frac{\left(t - \bar{\mu}_k(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g})\right)^2}{2\, \bar{\theta}_k(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g})\, t}\right), \tag{3}$$

with mean, variance, and scale parameter of path $P_k$ given by [13, Eqs. (15), (16), (17)]

$$\bar{\mu}_k(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g}) = \mu_q(l_q - z_{\mathrm{Tx}_g}) + \mu_w(z_{\mathrm{Rx}}) + \sum_{i \in \mathcal{E}_k \setminus \{q, w\}} \mu_i(l_i), \tag{4}$$

$$\bar{\sigma}_k^2(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g}) = \sigma_q^2(l_q - z_{\mathrm{Tx}_g}) + \sigma_w^2(z_{\mathrm{Rx}}) + \sum_{i \in \mathcal{E}_k \setminus \{q, w\}} \sigma_i^2(l_i), \tag{5}$$

$$\bar{\theta}_k(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g}) = \frac{\bar{\sigma}_k^2(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g})}{\bar{\mu}_k(z_{\mathrm{Rx}}; z_{\mathrm{Tx}_g})}, \tag{6}$$

and mean and variance of pipe $p_i$ given by [13, Eq. (7)]

$$\mu_i(z_i) = \frac{z_i}{\bar{u}_i}, \qquad \sigma_i^2(z_i) = \frac{2 \bar{D}_i z_i}{\bar{u}_i^3}, \tag{7}$$

where $z_i \in [0, l_i]$ denotes the longitudinal position within pipe $p_i$.

2.4.2 Channel Impulse Response. Given the path molecule flux in (3), the channel impulse response (CIR) between $\mathrm{Tx}_g$ and position $z_{\mathrm{Rx}}$ in pipe $p_w$ in $\mathrm{s}^{-1}$ is obtained by summing up the weighted path fluxes of all paths between $\mathrm{Tx}_g$ and the Rx [13, Eq. (19)], i.e.,

$$h_{\mathrm{Tx}_g, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_g}) = \sum_{\{P_k \in \mathcal{P}(\mathcal{S}_{\mathrm{Tx}}(\mathrm{Tx}_g), \mathcal{D}(p_w)) \,\mid\, p_q, p_w \in P_k\}} \gamma_{P_k}\, \bar{j}_k(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_g}), \tag{8}$$

where the fraction of molecules $\gamma_{P_k}$ propagating through path $P_k$ is given as [1, Eq. (22)]

$$\gamma_{P_k} = \prod_{\substack{p_i, b_m \in P_k, \\ p_i \in \mathcal{O}(b_m)}} \frac{Q_i}{\sum_{p_v \in \mathcal{O}(b_m)} Q_v}. \tag{9}$$

2.5 Receiver Model

In this work, we focus on localization based on the signal received at a single sensor, i.e., Rx. We model the sensor as a transparent molecule counting Rx positioned in pipe $p_w$ and characterized by its length $l_{\mathrm{Rx}} \in (0, l_w]$ and its longitudinal center position $z_{\mathrm{Rx}} \in [l_{\mathrm{Rx}}/2,\; l_w - l_{\mathrm{Rx}}/2]$. Then, assuming the advective flux in pipe $p_w$ dominates the diffusive flux, the expected number of molecules at the Rx due to a release of $M_g$ molecules at $\mathrm{Tx}_g$ follows as [13, Eqs. (23), (24)]

$$N_{\mathrm{Rx}, \mathrm{Tx}_g}(t) = \frac{M_g}{\bar{u}_w} \int_{z_{\mathrm{Rx}} - l_{\mathrm{Rx}}/2}^{z_{\mathrm{Rx}} + l_{\mathrm{Rx}}/2} h_{\mathrm{Tx}_g, w}(z_w, t; z_{\mathrm{Tx}_g})\, \mathrm{d}z_w \approx \frac{M_g\, l_{\mathrm{Rx}}}{\bar{u}_w}\, h_{\mathrm{Tx}_g, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_g}), \tag{10}$$

where the latter approximation is valid under the uniform concentration assumption (UCA).

In practice, the Rx does not have access to time-continuous molecule counts $N_{\mathrm{Rx}, \mathrm{Tx}_g}(t)$.
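The channel model in (2) to (8) can be evaluated numerically with a few lines of code. The following is a minimal sketch, not the authors' implementation: the pipe radius and diffusion coefficient are taken from Section 4, whereas the two paths, their travelled lengths, velocities, and weights $\gamma_{P_k}$ are hypothetical values chosen only for illustration.

```python
import math

def effective_diffusion(r, u, D):
    """Effective diffusion coefficient of a pipe, Eq. (2)."""
    return r**2 * u**2 / (48.0 * D) + D

def pipe_moments(z, r, u, D):
    """First-passage-time mean and variance over distance z in one pipe, Eq. (7)."""
    return z / u, 2.0 * effective_diffusion(r, u, D) * z / u**3

def path_moments(segments, r, D):
    """Sum the per-pipe moments along a path, Eqs. (4) and (5).
    segments: list of (travelled distance in m, mean velocity in m/s)."""
    mu = var = 0.0
    for z, u in segments:
        m, v = pipe_moments(z, r, u, D)
        mu += m
        var += v
    return mu, var

def path_flux(t, mu, var):
    """Inverse Gaussian path flux in 1/s, Eq. (3), with theta = var / mu, Eq. (6)."""
    if t <= 0.0:
        return 0.0
    theta = var / mu
    return mu / math.sqrt(2.0 * math.pi * theta * t**3) * math.exp(-(t - mu) ** 2 / (2.0 * theta * t))

def cir(t, weighted_paths):
    """CIR as the weighted sum of path fluxes, Eq. (8).
    weighted_paths: list of (gamma, mu, var)."""
    return sum(g * path_flux(t, mu, var) for g, mu, var in weighted_paths)

# Hypothetical example: two paths from one Tx to the Rx (weights are assumed, not derived via Eq. (9)).
R_PIPE, D_MOL = 0.055, 0.2  # radius in m and diffusion coefficient in m^2/s, as in Section 4
paths = [(0.7, [(100.0, 1.0)]), (0.3, [(60.0, 0.8), (80.0, 1.2)])]
weighted = [(g, *path_moments(seg, R_PIPE, D_MOL)) for g, seg in paths]

dt = 0.05
area = sum(cir(k * dt, weighted) for k in range(1, 8001)) * dt  # Riemann sum over 400 s
print(f"integral of the CIR over 400 s: {area:.4f}")
```

Because each path flux is a probability density over the arrival time and the assumed weights sum to one, the integral of the CIR should be close to one, which serves as a quick sanity check of the implementation.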
Instead, every $T_\mathrm{s}$ seconds, it obtains a noisy sample of $N_{\mathrm{Rx}, \mathrm{Tx}_g}(t)$, which is inherently band-limited. In this paper, we neglect channel-induced counting noise in (11), as $N_{\mathrm{Rx}, \mathrm{Tx}_g}(t)$ will typically be very large and thus counting noise will be negligible [8]. However, we assume that the $k$-th sample is analyzed by a sensor which is subject to noise $N_\mathrm{s}[k]$, where $N_\mathrm{s}[k]$ is modeled as zero-mean additive white Gaussian noise (AWGN) with variance $\sigma_\mathrm{s}^2$. Considering this model, where the sensor noise power is independent of the sampling frequency $f_\mathrm{s} = 1/T_\mathrm{s}$, we arrive at the following expression for the sensor response

$$r[k] = N_{\mathrm{Rx}, \mathrm{Tx}_g}(k T_\mathrm{s}) + N_\mathrm{s}[k] := N_{\mathrm{Rx}, \mathrm{Tx}_g}[k] + N_\mathrm{s}[k]. \tag{11}$$

3 Molecule Source Localization

We consider source localization in PNs with multiple Txs and a single Rx, where exactly one active Tx, $\mathrm{Tx}_g$, emits molecules in a single impulsive release⁴, see Fig. 1b). While the PN topology and all Tx positions are known, the active $\mathrm{Tx}_g$ is unknown, and localization is defined as identifying it based on the received signal in (11).

3.1 Matched Filter Bank Algorithm

We propose an MF-based algorithm that aims to distinguish between individual Txs based on the received signal by exploiting the known CIR in (8) between any Tx and the Rx, see Fig. 1c).

3.1.1 Signal Pre-Processing. For a single active $\mathrm{Tx}_g$ releasing $M_g$ molecules, the discrete expected number of molecules arriving at the Rx in sampling interval $k$ is given as $N_{\mathrm{Rx}, \mathrm{Tx}_g}[k]$, see (10). Since $M_g$ is an RV whose realization is unknown at the Rx and may vary over several orders of magnitude [15, 16], the amplitude of the received signal in (11) cannot be reliably exploited for localization, see Figs. 3c) and 3d).
Normalizing the received signal by its maximum value helps to mitigate this problem, but due to the AWGN at the Rx, it is not possible to know the exact signal maximum. Therefore, the received signal is first filtered with a low-pass (LP) filter $h_{\mathrm{LP}}[k]$, whose cut-off frequency is chosen based on the frequency contents of the known CIRs, resulting in the filtered received signal

$$r_{\mathrm{LP}}[k] = r[k] * h_{\mathrm{LP}}[k] \overset{(11)}{=} \underbrace{N_{\mathrm{Rx}, \mathrm{Tx}_g}[k] * h_{\mathrm{LP}}[k]}_{:= N_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[k]} + \underbrace{N_\mathrm{s}[k] * h_{\mathrm{LP}}[k]}_{:= N_{\mathrm{s}, \mathrm{LP}}[k]}, \tag{12}$$

where $*$ denotes convolution with respect to time, and $N_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[k]$ and $N_{\mathrm{s}, \mathrm{LP}}[k]$ are the LP-filtered signal and sensor noise components of $r_{\mathrm{LP}}[k]$, respectively. Afterwards, $r_{\mathrm{LP}}[k]$ is normalized by its maximum value, which is now much less impacted by noise, yielding an LP-filtered, normalized version of the received signal

$$\tilde{r}_{\mathrm{LP}}[k] = \frac{r_{\mathrm{LP}}[k]}{\max_k \left(r_{\mathrm{LP}}[k]\right)}. \tag{13}$$

The normalization of the received signal removes all information about the number of emitted molecules in $r_{\mathrm{LP}}[k]$.

3.1.2 Matched Filter Design. The signal used to construct the MF for a given $\mathrm{Tx}_g$ does not rely directly on the noisy received signal in (11). Rather, it is calculated using the analytical channel model in (10). Since the MF is applied after the LP filtering of the received signal, in the first step of the MF design, the number of observed molecules in (10) is LP-filtered with the same filter $h_{\mathrm{LP}}[k]$ as in (12). This yields the LP-filtered expected number of molecules $N_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[k]$ arriving at the Rx (see also (12)). To account for the normalization of the received signal in (13), we also normalize the expected number of molecules by its maximum value, i.e.,

$$\tilde{N}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[k] = \frac{N_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[k]}{\max_k \left(N_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[k]\right)}. \tag{14}$$

⁴ The latter assumption is justified in cases where emissions at the active Tx are rare.

Figure 2: MFs $v_{\mathrm{Tx}_i}[k]$ used for source localization in the PN shown in Fig. 3a) with $U = 33$ Txs at a sensor sampling frequency of $f_\mathrm{s} = 2\,\mathrm{Hz}$ (horizontal axes: sample index $k$ and time $t$ in s).

Because we use an LP filter in (12), the originally white sensor noise becomes colored. To ensure the MF accounts for this, we first compute the colored noise autocorrelation function (ACF) as

$$R_{nn}[k] = \sigma_\mathrm{s}^2\, \delta[k] * h_{\mathrm{LP}}[k] * h_{\mathrm{LP}}[-k] = \sigma_\mathrm{s}^2\, h_{\mathrm{LP}}[k] * h_{\mathrm{LP}}[-k], \tag{15}$$

where $\delta[k]$ denotes the Kronecker delta function. From the ACF in (15), the colored noise autocorrelation matrix follows as a Toeplitz matrix [17]

$$\mathbf{R}_{nn} = \begin{bmatrix} R_{nn}[0] & R_{nn}[1] & R_{nn}[2] & \cdots & R_{nn}[M-1] \\ R_{nn}[1] & R_{nn}[0] & R_{nn}[1] & \cdots & R_{nn}[M-2] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R_{nn}[M-1] & R_{nn}[M-2] & R_{nn}[M-3] & \cdots & R_{nn}[0] \end{bmatrix}. \tag{16}$$

Next, we represent the expected, LP-filtered, and normalized signal $\tilde{N}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[k]$ as a vector of finite length whose elements are the sampled signal values, i.e.,

$$\tilde{\mathbf{N}}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}} = \left[\tilde{N}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[0], \ldots, \tilde{N}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}[M-1]\right]^\top, \tag{17}$$

where $(\cdot)^\top$ denotes the transpose operator and $M$ is chosen large enough such that the expected signal has decayed to zero. Finally, the MF for $\mathrm{Tx}_g$ is⁵ a vector of length $M$ [18, Eq. (9.19)]

$$\mathbf{v}_{\mathrm{Tx}_g} = \frac{\mathbf{R}_{nn}^{-1}\, \tilde{\mathbf{N}}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}}{\tilde{\mathbf{N}}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}^\top\, \mathbf{R}_{nn}^{-1}\, \tilde{\mathbf{N}}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}}, \quad g \in \{1, \ldots, U\}, \tag{18}$$

where the numerator maximizes the signal-to-noise ratio (SNR) for the received signal from $\mathrm{Tx}_g$, and the denominator is a normalization scalar ensuring a maximum MF output amplitude of 1 when convolved with the corresponding received signal. We denote the $k$-th element of the MF as $v_{\mathrm{Tx}_g}[k]$, $k \in \{0, \ldots, M-1\}$. See Fig. 2 for exemplary MFs. Negative values arise from the LP filtering in (12).

3.1.3 Matched Filtering of Received Signal.
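Before the filter bank is applied, it may help to see the design in (15) to (18) in executable form. The sketch below is only an illustration under assumptions: the three-tap LP filter and the eight-sample template are hypothetical stand-ins for $h_{\mathrm{LP}}[k]$ and $\tilde{\mathbf{N}}_{\mathrm{Rx}, \mathrm{Tx}_g, \mathrm{LP}}$, and a small Gaussian elimination routine replaces a linear algebra library.

```python
def noise_acf(h_lp, sigma2, M):
    """Colored-noise ACF for lags 0..M-1, Eq. (15): sigma2 * sum_n h[n] * h[n + k]."""
    L = len(h_lp)
    return [sigma2 * sum(h_lp[n] * h_lp[n + k] for n in range(L - k)) for k in range(M)]

def toeplitz(acf):
    """Symmetric Toeplitz autocorrelation matrix, Eq. (16)."""
    M = len(acf)
    return [[acf[abs(i - j)] for j in range(M)] for i in range(M)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def design_mf(R, template):
    """MF vector of Eq. (18): R^{-1} s / (s^T R^{-1} s)."""
    w = solve(R, template)
    scale = dot(template, w)
    return [wi / scale for wi in w]

# Hypothetical inputs (assumptions, not values from the paper's simulation).
h_lp = [0.6, 0.3, 0.1]                                  # short FIR low-pass filter
template = [0.0, 0.2, 0.7, 1.0, 0.6, 0.3, 0.1, 0.0]     # normalized, LP-filtered expected signal
sigma2 = 1e3                                            # sensor noise power as in Section 4

R = toeplitz(noise_acf(h_lp, sigma2, len(template)))
v = design_mf(R, template)
peak = dot(v, template)   # response of the MF to its own noise-free template
print(f"MF response to its own template: {peak:.6f}")
```

By construction, the inner product of the MF with its own template equals one, which is the property that the decision rule of the filter bank relies on.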
To perform the matched filtering, we collect for all time steps in the observation interval the past $M$ samples of the LP-filtered, normalized received signal $\tilde{r}_{\mathrm{LP}}[k]$ in a vector $\tilde{\mathbf{r}}_{\mathrm{LP}}[k]$ according to

$$\tilde{\mathbf{r}}_{\mathrm{LP}}[k] = \left[\tilde{r}_{\mathrm{LP}}[k], \ldots, \tilde{r}_{\mathrm{LP}}[k - M + 1]\right]^\top. \tag{19}$$

⁵ The MF design in (18) is motivated by radar signal processing [18], where such filters are known as Capon or Minimum Variance Distortionless Response (MVDR) beamformers. The Tx decision rule in (21) is motivated by the same framework.

To identify the active Tx, the signal in (19) is filtered by a filter bank comprising the MFs of all Txs for all time steps in the observation interval, i.e.,

$$y_i[k] = \mathbf{v}_{\mathrm{Tx}_i}^\top\, \tilde{\mathbf{r}}_{\mathrm{LP}}[k], \quad i \in \{1, \ldots, U\}. \tag{20}$$

Then, the Tx whose MF output has the maximum amplitude closest to 1, denoted by $\mathrm{Tx}_{i^*}$, is selected as the candidate active Tx, i.e.,

$$i^* = \arg\min_i \left| y_i^{\max} - 1 \right|, \quad y_i^{\max} = \max_k \left(y_i[k]\right), \quad i \in \{1, \ldots, U\}. \tag{21}$$

Lastly, to prevent pure noise at the Rx from being misidentified as a signal, the ratio of the peak power to the noise variance, $(y_{i^*}^{\max})^2 / \hat{\sigma}^2$, is computed for the MF output corresponding to the candidate active Tx. The noise variance $\hat{\sigma}^2$ is estimated within an observation window around the peak, where samples are included only once their amplitude drops below 10 % of the peak amplitude.

3.1.4 Basic Localization Algorithm. The individual steps of the localization algorithm are illustrated in Fig. 1c). The received signal is first passed through an LP filter, see (12), and then normalized by its maximum value, see (13). Subsequently, an MF bank comprising the MFs of all Txs, see (20), is applied.
Finally, the MF whose maximum output is closest to 1 is identified, see (21), and the Tx corresponding to this filter is declared as the localized source only if the ratio of peak power to noise variance surpasses a threshold. For our simulations, we set this threshold to 20.

3.2 Clustering Likely Confused Sources

Localization based on an MF bank performs well when the received signals associated with different Txs exhibit sufficiently distinct temporal shapes. However, depending on the PN topology and the spatial proximity of the Txs, signals from different Txs may have highly similar shapes, see Section 4. In such cases, it can be advantageous to group Txs with similar CIRs into clusters and perform localization at the cluster level, where localization is considered successful if the estimated Tx belongs to the same cluster as the active Tx. Although this approach may not identify the exact active Tx, it narrows the set of prospective Txs to a small subset of all Txs.

In this work, clustering is performed using the Louvain community detection algorithm [19] applied to a graph whose nodes represent all Txs in the PN and whose edge weights quantify the similarity between their corresponding CIRs. This similarity graph can be constructed either empirically from the confusion matrix (CM) or analytically from the cosine similarity matrix (CSM). Because the Louvain algorithm chooses clusters that maximize modularity, a measure of how well a graph is separated into tightly-knit groups, it groups very similar CIRs together, as these have high edge weights. In the following, we describe the construction of the CM and CSM, and the subsequent evaluation of the clustering.

3.2.1 Confusion Matrix. The CM is denoted by $\mathbf{CM} \in [0, 1]^{U \times U}$. It is constructed by applying the localization algorithm in Section 3.1.4 $N_{\mathrm{sim}}$ times for each $\mathrm{Tx}_i \in \mathcal{N}_{\mathrm{Tx}}$, using independent noisy realizations of the received signal in (11).
After a total of $U \cdot N_{\mathrm{sim}}$ runs, the entry $CM_{ij} \in [0, 1]$ of $\mathbf{CM}$ represents the fraction of simulation runs where, instead of the active $\mathrm{Tx}_i$, $\mathrm{Tx}_j$ was classified as the signal source. Entries on the main diagonal represent the fractions of simulation runs where the active Tx was correctly identified.

We find that binarizing $\mathbf{CM}$ reduces variability across simulation runs and improves clustering quality (see Section 3.2.3). The binarized matrix $\mathbf{CM}^\mathrm{b}$ is obtained by setting entries with fewer than 5 % confusions to zero and all others to one, i.e.,

$$CM^\mathrm{b}_{ij} = \begin{cases} 1, & \text{if } CM_{ij} \geq 0.05 \\ 0, & \text{if } CM_{ij} < 0.05. \end{cases} \tag{22}$$

Matrix $\mathbf{CM}^\mathrm{b}$ is interpreted as a directed graph, on which the Louvain community detection algorithm [19] is applied to obtain clusters.

3.2.2 Cosine Similarity Matrix. Contrary to the CM, the CSM, denoted by $\mathbf{CSM} \in [0, 1]^{U \times U}$, does not rely on empirical simulations. It is given by the pairwise cosine similarities of the Tx CIRs, i.e.,

$$CSM_{ij} = \frac{\int_{-\infty}^{\infty} h_{\mathrm{Tx}_i, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_i})\, h_{\mathrm{Tx}_j, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_j})\, \mathrm{d}t}{\sqrt{\int_{-\infty}^{\infty} h^2_{\mathrm{Tx}_i, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_i})\, \mathrm{d}t}\; \sqrt{\int_{-\infty}^{\infty} h^2_{\mathrm{Tx}_j, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_j})\, \mathrm{d}t}}. \tag{23}$$

Binarization is performed similarly as for the CM, with an adjusted threshold, as the cosine similarity metric assigns values close to one for similar signals, i.e.,

$$CSM^\mathrm{b}_{ij} = \begin{cases} 1, & \text{if } CSM_{ij} \geq 0.95 \\ 0, & \text{if } CSM_{ij} < 0.95. \end{cases} \tag{24}$$

Note that $\mathbf{CSM}^\mathrm{b}$ is symmetric and is interpreted as an undirected graph in the Louvain community detection algorithm [19], which is applied to obtain clusters of likely confused Txs.

3.2.3 Clustering Quality. Txs are considered well clustered if CIRs within each cluster are highly similar, while differing significantly across different clusters.
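The analytical similarity graph in (23) and (24) can be approximated from sampled CIRs. The following is a minimal sketch assuming all CIRs are sampled on a common, uniform time grid, so that the sampling interval cancels in the ratio in (23); the three short sequences are hypothetical placeholders for sampled CIRs, not data from the paper.

```python
import math

def cosine_similarity(a, b):
    """Discrete counterpart of Eq. (23) for two CIRs sampled on the same time grid."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def similarity_matrix(cirs):
    """Pairwise cosine similarities of all sampled CIRs."""
    return [[cosine_similarity(a, b) for b in cirs] for a in cirs]

def binarize(matrix, threshold):
    """Thresholding as in Eqs. (22) and (24): 1 if entry >= threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]

# Hypothetical sampled CIRs: the first two are nearly identical, the third differs.
cirs = [
    [0.0, 1.0, 3.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 3.0, 2.1, 1.0, 0.0],
    [3.0, 1.0, 0.0, 0.0, 0.0, 0.5],
]
csm = similarity_matrix(cirs)
csm_b = binarize(csm, 0.95)
for row in csm_b:
    print(row)
```

The binarized matrix can then be handed to any community detection routine as an adjacency matrix; the Louvain step itself is not reproduced here.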
To quantify clustering quality, we employ the silhouette score [20] based on the squared Euclidean distance between the normalized CIRs, i.e.,

$$d(\mathrm{Tx}_i, \mathrm{Tx}_j) = \left\| \tilde{h}_{\mathrm{Tx}_i, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_i}) - \tilde{h}_{\mathrm{Tx}_j, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_j}) \right\|_2^2, \tag{25}$$

where $\|\cdot\|_2$ denotes the Euclidean norm, with

$$\tilde{h}_{\mathrm{Tx}_i, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_i}) = \frac{h_{\mathrm{Tx}_i, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_i})}{\max_t \left(h_{\mathrm{Tx}_i, w}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_i})\right)}. \tag{26}$$

The silhouette score ranges from $-1$ to $1$, with higher values indicating compact, well-separated clusters and lower values reflecting poor clustering quality. To compare the similarity of the clustering results obtained from $\mathbf{CM}^\mathrm{b}$ and $\mathbf{CSM}^\mathrm{b}$, we use the adjusted Rand index (ARI) [21, Eq. (21)], ranging from $-0.5$ to $1$, where a value of 1 indicates identical clustering based on $\mathbf{CM}^\mathrm{b}$ and $\mathbf{CSM}^\mathrm{b}$.

4 Results

To investigate the performance of the proposed localization approaches, we first introduce a real-world SN as an exemplary application domain and then assess the localization performance.

4.1 Sewage Pipe Network

We simulate part of an SN in the Zolkiewka Commune, Poland [22], as shown in Fig. 3a). We make the following simplifying assumptions: Each house connected to the sewage system is modeled as an inlet node $n_{\mathrm{in},g}$ with a constant inflow rate $Q_{\mathrm{in},g} = 5 \times 10^{-3}\,\mathrm{m^3\,s^{-1}}$, resulting in realistic flow rates within the PN, see Fig. 3a). At each inlet $n_{\mathrm{in},g}$, a potentially active $\mathrm{Tx}_g$ is placed at $z_{\mathrm{Tx}_g} = 0$, representing a source of viral markers. Molecule transport in the SN is assumed to be advection-diffusion-driven with diffusion coefficient $D = 0.2\,\mathrm{m^2\,s^{-1}}$ [23], while gravitational effects and half-filled pipes typical of SNs are neglected [22].

Figure 3: a) Topology of an SN in Zolkiewka Commune, Poland [22]. Pipe lengths $l_i$ are drawn to scale (see scale bar) and $r_i = 5.5\,\mathrm{cm}$, $\forall p_i$. Inlet, connecting, and outlet node positions are illustrated in orange, blue, and red, respectively. At each inlet node $n_{\mathrm{in},g} \in \mathcal{N}_{\mathrm{in}}$, a $\mathrm{Tx}_g$ is present with $Q_{\mathrm{in},g} = 5 \times 10^{-3}\,\mathrm{m^3\,s^{-1}}$. Flow directions are given by arrows, flow rates (between $0.005$ and $0.165\,\mathrm{m^3\,s^{-1}}$) are color-coded in the edge colors. The Rx is located at the outlet node. b) Tx-Rx distances (orange) and Tx-Rx path mean arrival times (purple) for all Txs. c) Log-normal PDF for the number of released viral particles $M_g$ at the active $\mathrm{Tx}_g$. d) Four exemplary received signals from different active Txs ($\mathrm{Tx}_1$, $\mathrm{Tx}_7$, $\mathrm{Tx}_{18}$, and $\mathrm{Tx}_{30}$) with and without additive Rx noise.

At the active $\mathrm{Tx}_g$, we assume a single impulsive release of viral markers. The number of released markers, denoted by $M_g$, is modeled as log-normally distributed, i.e., $\ln(M_g) \sim \mathcal{N}(\mu_{\mathrm{Tx}}, \sigma^2_{\mathrm{Tx}})$, with mean $\mu_{\mathrm{Tx}} = 18.69$ and variance $\sigma^2_{\mathrm{Tx}} = 2.46$ [15, 16], see Fig. 3c).
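To get a feeling for how strongly the release quantity varies under this model, one can draw samples of $M_g$ directly. The sketch below assumes only the stated parameters of $\ln(M_g)$; the sample size, the seed, and the choice of the central 95 % range as a spread measure are illustrative assumptions.

```python
import math
import random

MU_TX = 18.69    # mean of ln(M_g), as stated in Section 4.1
VAR_TX = 2.46    # variance of ln(M_g), as stated in Section 4.1

def sample_release(rng):
    """Draw one realization m_g with ln(M_g) ~ N(MU_TX, VAR_TX)."""
    return math.exp(rng.gauss(MU_TX, math.sqrt(VAR_TX)))

rng = random.Random(0)  # fixed seed for reproducibility (arbitrary choice)
samples = sorted(sample_release(rng) for _ in range(10_000))

median = samples[len(samples) // 2]
low = samples[int(0.025 * len(samples))]
high = samples[int(0.975 * len(samples))]
decades = math.log10(high / low)  # width of the central 95 % range in orders of magnitude

print(f"empirical median of M_g: {median:.3e}")
print(f"central 95 % range spans about {decades:.1f} orders of magnitude")
```

The spread over multiple orders of magnitude is the reason why the received amplitude is discarded by the normalization in (13) rather than used as a localization feature.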
At the Rx, a default sampling rate of $f_{\mathrm{s}} = 2\,\mathrm{Hz}$ and a noise power of $\sigma_{\mathrm{s}}^2 = 10^3$ are assumed. We emphasize that MIGHT serves as a simplified proxy for sewage transport, as our focus is on generic localization techniques in PNs rather than high-fidelity SN modeling.

4.2 Simulation Results

Below, we first present results on the CM and CSM and investigate the Tx clustering. Subsequently, we illustrate the localization accuracy for various sensor noise powers and sampling frequencies.

4.2.1 Transmitter Confusion Matrix and Cosine Similarity Matrix. Due to the similarity of the temporal shapes of the received signals associated with different Txs, see Fig. 5, several Tx pairs are likely to be confused by the localization algorithm in Section 3.1.4. For $N_{\mathrm{sim}} = 100$ simulation runs per Tx, the resulting confusion rates are shown in Fig. 4 using the CM (left) and its binarized version (middle). In the CM, the observed confusion rates off the main diagonal range from 0 % to 36 %, with the highest confusion rate occurring between Tx 8 and Tx 28 (36 %). Note that $\mathrm{CM}_{\mathrm{b}}$ is approximately symmetric.

[Figure 4, three 33 × 33 matrices over Tx 1 to Tx 33: CM (fraction, color scale 0 to 0.66), $\mathrm{CM}_{\mathrm{b}}$ (threshold 0.05), and $\mathrm{CSM}_{\mathrm{b}}$ (threshold 0.95).]

Figure 4: Likely confused Txs, illustrated using the empirically determined CM (left) and its binarized version (middle). Confusions predicted by the binarized CSM are shown on the right.

The right plot in Fig. 4 shows the confusions predicted by $\mathrm{CSM}_{\mathrm{b}}$. The agreement between the clustering based on $\mathrm{CM}_{\mathrm{b}}$ and two other clustering methods is quantified in Table 1 using the ARI, while clustering quality is evaluated via the silhouette score based on the distance in (25). The clusterings based on $\mathrm{CM}_{\mathrm{b}}$ and $\mathrm{CSM}_{\mathrm{b}}$ both yield good silhouette scores (0.49 and 0.60, respectively). Moreover, $\mathrm{CM}_{\mathrm{b}}$ and $\mathrm{CSM}_{\mathrm{b}}$ show good agreement (ARI = 0.70), confirming that the CSM is a useful proxy for the actual CM. As a baseline, we include K-means clustering based on the path means $\bar{\mu}_g$, with the number of clusters set equal to that obtained with the $\mathrm{CM}_{\mathrm{b}}$ and $\mathrm{CSM}_{\mathrm{b}}$ methods. Compared to these approaches, K-means yields poorer clusters (silhouette score $-0.2763$) and does not agree with the CM (ARI = $-0.0053$).

Table 1: Clustering evaluation metrics.

Clustering                  Silhouette score   ARI (relative to CM_b)
CM_b                         0.49               1.00
CSM_b                        0.60               0.70
K-means ($\bar{\mu}_g$)     -0.2763            -0.0053

[Figure 5: SN topology with Tx 1 to Tx 33 colored by cluster, the Rx, and a 200 m scale bar; below, eight panels (Cluster 1 to Cluster 8) showing the normalized, time-shifted CIRs over $t$ [s] from 0 to 120 s.]

Figure 5: Spatial distribution of Tx clusters. Below the SN, the normalized and time-shifted CIRs of all clusters are shown.

4.2.2 Spatial Spread of Source Clusters. In the top part of Fig. 5, the cluster assignment of each Tx is indicated by its color, as determined using $\mathrm{CSM}_{\mathrm{b}}$ in (24). The bottom part shows the normalized and time-shifted CIRs of all Txs within each cluster. The time shift accounts for the unknown emission time at the Tx. Interestingly, while the CIRs of Txs within a cluster are very similar, the corresponding Txs are often not geographically close.
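The notion of CIR similarity used here can be made concrete with the peak-normalized distance in (25) and (26). The following is a minimal sketch, assuming the CIRs are available as equally long lists of samples on a common time grid, so that the squared norm becomes a sum over samples. The function names are hypothetical, and the time alignment of the CIRs is assumed to have been done beforehand.

```python
def peak_normalize(cir):
    """Scale a sampled CIR so that its maximum equals 1, as in (26)."""
    peak = max(cir)
    if peak <= 0:
        raise ValueError("CIR must have a positive peak")
    return [value / peak for value in cir]


def squared_distance(cir_i, cir_j):
    """Squared Euclidean distance between two peak-normalized CIRs, as in (25)."""
    if len(cir_i) != len(cir_j):
        raise ValueError("CIRs must be sampled on the same time grid")
    norm_i = peak_normalize(cir_i)
    norm_j = peak_normalize(cir_j)
    return sum((a - b) ** 2 for a, b in zip(norm_i, norm_j))


if __name__ == "__main__":
    # Toy samples, not MIGHT outputs: same shape at different amplitudes.
    weak = [0.0, 1.0, 2.0, 1.0, 0.0]
    strong = [0.0, 5.0, 10.0, 5.0, 0.0]
    delayed = [0.0, 0.0, 1.0, 2.0, 1.0]
    print(squared_distance(weak, strong))   # identical shapes
    print(squared_distance(weak, delayed))  # shifted shape
```

Peak normalization removes the dependence on the number of released molecules, so two Txs whose CIRs differ only in amplitude have zero distance, regardless of how far apart they are in the network.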
This geographic spread arises because many parts of the SN exhibit heterogeneous flow rates in their immediate neighborhood, see Fig. 3a), leading to strongly varying CIRs of the Txs in these neighborhoods. The same effect is visible in Fig. 3b), which shows that the Tx-Rx distances are not necessarily correlated with the mean arrival times, due to the varying flow rates. This has direct implications when using an MF-based localization algorithm to identify, e.g., an infected individual in a community. Investigating only the members of the identified household and geographically neighboring households is insufficient. Instead, households that are close in terms of molecule propagation time to the Rx must be considered, as the molecule release time is not known.

4.2.3 Localization Accuracy. First, in Fig. 6, we show the localization accuracy of the MF-based localization algorithm proposed in Section 3.1.4 as a function of the SNR for sampling rates of $f_{\mathrm{s}} = 2\,\mathrm{Hz}$ (pink solid line) and $f_{\mathrm{s}} = 0.2\,\mathrm{Hz}$ (blue solid line). For this algorithm, the localization accuracy is defined as the percentage of simulation runs in which the active Tx is correctly identified. Second, Fig. 6 shows the localization accuracy for the CSM-based clustering approach in Section 3.2 for both sampling rates (thin dashed pink and blue lines) and for clustering baselines based on K-means clustering (thick dashed pink and blue lines). In this setting, the localization accuracy is defined as the percentage of simulation runs in which the identified Tx lies within the cluster containing the active Tx. Third, Fig. 6 shows the missed detection probability computed from the simulation runs (cyan solid lines), i.e., the probability that a signal is confused with noise and no localization is performed, which depends only on the sampling rate and the SNR.
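The three metrics defined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: each simulation run is assumed to yield either an estimated Tx index or `None` when no detection occurs, missed detections are counted as incorrect in both accuracies, and `cluster_of` is a hypothetical mapping from Tx index to cluster label.

```python
def exact_accuracy(true_tx, est_tx):
    """Percentage of runs in which the active Tx is identified exactly."""
    hits = sum(1 for t, e in zip(true_tx, est_tx) if e is not None and e == t)
    return 100.0 * hits / len(true_tx)


def cluster_accuracy(true_tx, est_tx, cluster_of):
    """Percentage of runs in which the estimate lies in the active Tx's cluster."""
    hits = sum(
        1
        for t, e in zip(true_tx, est_tx)
        if e is not None and cluster_of[e] == cluster_of[t]
    )
    return 100.0 * hits / len(true_tx)


def missed_detection_probability(est_tx):
    """Percentage of runs in which no Tx was detected (estimate is None)."""
    return 100.0 * sum(1 for e in est_tx if e is None) / len(est_tx)


if __name__ == "__main__":
    # Toy example with four Txs in two clusters (illustrative values only).
    cluster_of = {1: 0, 2: 0, 3: 1, 4: 1}
    true_tx = [1, 1, 2, 3, 4]
    est_tx = [1, 2, 2, None, 3]
    print(exact_accuracy(true_tx, est_tx))              # 40.0
    print(cluster_accuracy(true_tx, est_tx, cluster_of))  # 80.0
    print(missed_detection_probability(est_tx))         # 20.0
```

In the toy example, two of the three wrong estimates still fall into the correct cluster, which is the effect that cluster-level localization exploits.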
[Figure 6: Localization accuracy [%] (left axis) and missed detection probability [%] (right axis) versus $\mathrm{SNR}_{\mathrm{dB}}$ [dB] from $-20$ to $50$ dB, for $f_{\mathrm{s}} = 2\,\mathrm{Hz}$ and $f_{\mathrm{s}} = 0.2\,\mathrm{Hz}$. Curves: random Tx guessing, basic algorithm (Sec. 3.1.3), K-means clustering ($\bar{\mu}_g$), cluster algorithm (Sec. 3.2), and missed detection probability for both sampling rates.]

Figure 6: Localization accuracy of the proposed algorithms and missed detection probability at different sensor sampling rates.

The SNR is obtained by first calculating, for each Tx, the ratio between the average power of the expected received signal in (10) (using the mean number of released molecules $\bar{M}_g$) and the sensor noise power, and then averaging over all $U$ Txs in the PN, i.e.,

$$
\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}\!\left( \frac{1}{U} \sum_{\mathrm{Tx}_g \in \mathcal{N}_{\mathrm{Tx}}} \frac{\frac{1}{t_{2,g} - t_{1,g}} \int_{t_{1,g}}^{t_{2,g}} N_{\mathrm{Rx},\mathrm{Tx}_g}^{2}(z_{\mathrm{Rx}}, t; z_{\mathrm{Tx}_g}) \,\mathrm{d}t}{\sigma_{\mathrm{s}}^{2}} \right), \tag{27}
$$

where $[t_{1,g}, t_{2,g}]$ represents the time interval during which the signal of $\mathrm{Tx}_g$ carries most of its energy. In our simulations, $\mathrm{SNR}_{\mathrm{dB}}$ is varied by varying the sensor noise power $\sigma_{\mathrm{s}}^2$.

Starting with the sensor with sampling frequency $f_{\mathrm{s}} = 2\,\mathrm{Hz}$ (pink lines), the basic algorithm outperforms random guessing (green line, accuracy $1/U = 1/33$) already at $\mathrm{SNR}_{\mathrm{dB}} = -10\,\mathrm{dB}$. For lower SNRs, the algorithm cannot distinguish signal from noise and thus fails to detect any Txs, resulting in large missed detection probabilities (cyan line with round markers) and zero correct identifications. As the SNR increases, the algorithm eventually achieves perfect localization accuracy. When considering the ability to predict the correct cluster (thin pink dashed line), we observe a gain of more than 10 dB compared to predicting the exact Tx (pink solid line), confirming the intuition from Section 3.2 that most confusions occur within small clusters of Txs. Moreover, at both sampling rates, the clustering algorithm from Section 3.2 (thin dashed lines) outperforms the algorithm based on K-means clustering (thick dashed lines).
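For reference, the SNR definition in (27) can be evaluated numerically along the following lines. This is a sketch under stated assumptions: the expected received signals are given as uniformly sampled lists with spacing `dt`, the energy-carrying interval of each Tx is passed as a pair of sample indices, and the integral is approximated with the trapezoidal rule. All function and parameter names are hypothetical.

```python
import math


def average_power(signal, dt, i1, i2):
    """Time-averaged power of `signal` between sample indices i1 < i2.

    Approximates (1 / (t2 - t1)) * integral of signal(t)^2 dt by the
    trapezoidal rule on a uniform grid with spacing dt.
    """
    if i2 <= i1:
        raise ValueError("window must contain at least two samples")
    squared = [s * s for s in signal[i1:i2 + 1]]
    integral = dt * (sum(squared) - 0.5 * (squared[0] + squared[-1]))
    return integral / (dt * (i2 - i1))


def snr_db(signals, windows, dt, noise_power):
    """Average the per-Tx power-to-noise ratios, then convert to dB, as in (27)."""
    ratios = [
        average_power(sig, dt, i1, i2) / noise_power
        for sig, (i1, i2) in zip(signals, windows)
    ]
    return 10.0 * math.log10(sum(ratios) / len(ratios))


if __name__ == "__main__":
    # Toy expected signals (constant amplitude), not MIGHT outputs.
    signals = [[10.0] * 5, [10.0] * 5]
    windows = [(0, 4), (0, 4)]
    print(snr_db(signals, windows, dt=0.5, noise_power=10.0))
    print(snr_db(signals, windows, dt=0.5, noise_power=100.0))
```

Scaling the noise power by a factor of 10 lowers the result by 10 dB, which mirrors how the SNR axis is swept in the simulations by varying $\sigma_{\mathrm{s}}^2$.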
At the lower sampling rate of $f_{\mathrm{s}} = 0.2\,\mathrm{Hz}$ (blue lines), both the basic and clustering algorithms achieve significantly lower accuracy than for $f_{\mathrm{s}} = 2\,\mathrm{Hz}$. This is due to two effects: the missed detection probability curve shifts by approximately 10 dB toward higher SNRs (see cyan lines), and even when the missed detection probability reaches zero, perfect localization accuracy cannot be achieved regardless of the SNR, since only limited information about the signal shape can be recovered at lower sampling rates.

5 Conclusion

In this work, we proposed the first analytical framework for molecule source localization in advection-diffusion-driven PNs with known topology based on the MIGHT channel model. The proposed MF-based approach reliably identifies a single active Tx among multiple known Txs from the signal of a single Rx, as illustrated for a model of a real-world SN. Furthermore, frequently confused Txs can be grouped into clusters that remain reliably distinguishable, drastically reducing the number of Txs requiring manual inspection in, e.g., epidemiological applications. Future work will study the transferability of the framework beyond SNs across application domains and scales, and investigate inter-Tx interference and experimental validation in branched MC testbeds.

References

[1] Reza Mosayebi et al. Early cancer detection in blood vessels using mobile nanosensors. IEEE Trans. NanoBiosci., 18:103–116, April 2019.
[2] Bige D. Unluturk and Ian F. Akyildiz. An end-to-end model of plant pheromone channel for long range molecular communication. IEEE Trans. NanoBiosci., 16(1):11–20, 2017.
[3] Luca Felicetti et al. Applications of molecular communications to medicine: A survey. Nano Commun. Netw., 7:27–45, 2016.
[4] Nariman Farsad et al. A comprehensive survey of recent advancements in molecular communication. IEEE Commun. Surv. Tutor., 18(3):1887–1919, 2016.
[5] Dadi Bi et al. A survey of molecular communication in cell biology: Establishing a new hierarchy for interdisciplinary applications. IEEE Commun. Surv. Tutor., 23(3):1494–1545, 2021.
[6] Ali Etemadi et al. Abnormality detection and localization schemes using molecular communication systems: A survey. IEEE Access, 11:1761–1792, 2023.
[7] Yu Deng et al. Use of sewage surveillance for COVID-19 to guide public health response: A case study in Hong Kong. Sci. Total Environ., 821:153250, 2022.
[8] Timo Jakumeit, Lukas Brand, Jens Kirchner, Robert Schober, and Sebastian Lotter. Vessel network topology in molecular communication: Insights from experiments and theory. arXiv preprint, December 2025.
[9] Meriç Turan et al. Transmitter localization in vessel-like diffusive channels using ring-shaped molecular receivers. IEEE Commun. Lett., 22(12):2511–2514, 2018.
[10] Ladan Khaloopour et al. Theoretical concept study of cooperative abnormality detection and localization in fluidic-medium molecular communication. IEEE Sens. J., 21(15):17118–17130, August 2021.
[11] Martin Schottlender, Maximilian Schäfer, and Ricardo A. Veiga. Neural network-based distance estimation for branched molecular communication systems. In Proc. 12th Annu. ACM Int. Conf. Nanoscale Comput. Commun., pages 28–33, 2025.
[12] Saswati Pal et al. Machine learning-driven localization of infection sources in the human cardiovascular system. IEEE Trans. Mol. Biol. Multi-Scale Commun., 11(4):524–530, 2025.
[13] Timo Jakumeit, Bastian Heinlein, Nunzio Tuccitto, Robert Schober, Sebastian Lotter, and Maximilian Schäfer. Mixture of Inverse Gaussians for Hemodynamic Transport (MIGHT) in multiple input multiple output vascular networks. arXiv preprint arXiv:2510.11743v2, February 2026.
[14] Rutherford Aris. On the dispersion of a solute in a fluid flowing through a tube. Proc. R. Soc. (London) A, 235:67–77, April 1956.
[15] P. Foladori et al. SARS-CoV-2 from faeces to wastewater treatment: What do we know? A review. Sci. Total Environ., 743:140444, 2020.
[16] C. Rose, A. Parker, B. Jefferson, and E. Cartmell. The characterization of feces and urine: A review of the literature to inform advanced treatment technology. Crit. Rev. Environ. Sci. Technol., 45(17):1827–1879, 2015.
[17] Monson H. Hayes. Statistical Digital Signal Processing and Modeling. Wiley, 1996.
[18] Mark A. Richards. Fundamentals of Radar Signal Processing. McGraw-Hill Electronic Engineering. McGraw-Hill, 2005.
[19] Vincent D. Blondel et al. Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp., 2008(10):P10008, 2008.
[20] Peter J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20:53–65, 1987.
[21] Lawrence Hubert and Phipps Arabie. Comparing partitions. J. Classification, 2(1):193–218, 1985.
[22] Tadeusz Nawrot et al. A case study of a small diameter gravity sewerage system in Zolkiewka Commune, Poland. Water, 10(10), 2018.
[23] Fred Sonnenwald et al. Quantifying mixing in sewer networks for source localization. J. Environ. Eng., 149(5):04023019, 2023.
