On the Value of Base Station Motion Knowledge for Goal-Oriented Remote Monitoring with Energy-Harvesting Sensors

Sehani Siriwardana, Jean M. S. Sant'Ana, Richard Demo Souza, Abolfazl Zakeri, Onel L. A. López

Centre for Wireless Communications, University of Oulu, Oulu, Finland
Electrical & Electronics Engineering Department, Federal University of Santa Catarina, Florianópolis, Brazil
stharindra@gmail.com, {jean.desouzasantana, abolfazl.zakeri, onel.alcarazlopez}@oulu.fi, richard.demo@ufsc.br

Abstract—This paper investigates goal-oriented remote monitoring of an unobservable Markov source using energy-harvesting sensors that communicate with a mobile receiver, such as a Low Earth Orbit (LEO) satellite or an Unmanned Aerial Vehicle (UAV). Unlike conventional systems that assume stationary base stations, the proposed framework explicitly accounts for receiver mobility, which induces time-varying channel characteristics modeled as a finite-state Markov process. The remote monitoring problem is formulated as a partially observable Markov decision process (POMDP), which is transformed into a tractable belief-state MDP and solved using relative value iteration to obtain optimal sampling and transmission policies. Two estimation strategies are considered: Maximum Likelihood (ML) and Minimum Mean Distortion (MMD). Numerical results demonstrate that incorporating receiver mobility and channel state information into the optimization reduces the average distortion by 10% to 42% compared to baseline policies and constant-channel assumptions, highlighting the importance of base station motion knowledge for effective goal-oriented communication.

Keywords—Goal-oriented remote monitoring; mobile receiver; energy harvesting; Markov decision process (MDP).

I. INTRODUCTION

Internet of Things (IoT) applications are present in many areas of the world.
Some are deployed in remote areas, beyond cellular network coverage or energy grid availability, or in areas with difficult maintenance access, which may require on-site energy generation methods such as energy harvesting [1]. In such cases, reducing energy consumption is crucial, motivating research on efficient sampling and data transmission, so that only the information necessary to achieve a specific objective is collected and transmitted; this is referred to as goal-oriented remote monitoring [2].

In cases with limited cellular coverage, other solutions can serve as base stations. Examples include Low Earth Orbit (LEO) satellites [3], Unmanned Aerial Vehicles (UAVs) [4], [5], and maritime vessels [6], which can act as enablers of IoT connectivity in remote regions. All of the aforementioned examples are mobile base stations. In those cases, the performance of the transmission between the devices and the monitor at the base station is directly tied to the base station's movement and position. Therefore, this paper investigates goal-oriented remote monitoring of an unobservable source using sensors deployed in remote regions that transmit to mobile monitors.

In [7], a goal-oriented real-time remote monitoring framework is presented for autonomous systems. Its system model consists of an information source, a sampler, a transmitter, and a stationary receiver. Zakeri et al. [8] considered a system wherein a transmitter sends information from multiple sources to a monitor through a relay. They formulated a constrained Markov decision process (CMDP), which is then transformed into a standard Markov decision process (MDP) and solved using a structure-aware relative value iteration algorithm (RVIA). Later, the work in [9] presented a goal-oriented remote tracking system with energy constraints, where the system comprises an information source, a sensor, and a monitor.
Therein, the information source and sensor are deployed in a remote area, and the sensor is powered by energy harvesting. The information source is modeled as a Markov chain with several states, and the overall system is formulated as a partially observable MDP (POMDP). This POMDP is then transformed into an MDP and solved using an RVIA.

The aforementioned works considered a stationary monitor, where the channel state is constant. This may hold in standard cellular communication or in GEO satellite systems, where the satellite orbit follows Earth's rotation and covers the same area of the globe. However, in scenarios like LEO satellite systems or UAV communication, the monitor appears to move from the transmitter's point of view, generating a time-varying channel. To better model this scenario, the motion of the base station should be incorporated. In previous research [10]–[14], satellite motion is modeled as a binary Markov chain with two states: good and bad. The good state represents a line-of-sight (LoS) condition, whereas the bad state corresponds to a non-line-of-sight (NLoS) condition. Lutz [10] further proposes a four-state Markov chain that models satellite diversity and channel correlation. In another study [15], a three-state Markov chain was introduced, where the states are defined based on the variation of the channel quality as the satellite moves: state 1 represents a clear LoS condition between the satellite and the transmitter, state 2 corresponds to moderate shadowing, and state 3 denotes a complete absence of LoS. Furthermore, [16] proposed a five-state Markov chain model, with the states defined according to the motion dynamics of the satellite.

In this work, we propose a goal-oriented remote monitoring system for satellite communications that accounts for the base station mobility inherent in LEO and UAV systems.
To that end, we extend the framework proposed in [9]. The general operation of the system is formulated as a POMDP, which is then transformed into an MDP. The resulting MDP is solved to obtain the optimal sensor policies for sampling and transmission. We analyze the impact of the optimal policies in terms of distortion for a given source application. Our results show that the optimized average distortion can be reduced by 10% to 42% when the base station motion behavior is considered in the optimization.

II. SYSTEM MODEL

We consider a system model similar to the one in [9], as illustrated in Figure 1. The model consists of an information source, a sensor, and a monitor. The sensor comprises a sampler, a controller, a buffer, and a transmitter. The sensor operation is powered by a battery that is continuously charged by energy harvesting. The sampler first samples the information from the information source and stores the sampled information in the buffer; the transmitter then sends the buffered information to the moving monitor through a wireless channel. Within the sensor, the controller regulates the sampling and transmission processes by issuing commands to the sampler and the transmitter. It is assumed that the controller observes the battery level, the information in the buffer, and the success or failure of each transmission event. Since the monitor is in motion, the wireless channel varies with its position in the sky relative to the sensor; consequently, the reception success rate (RSR) also varies.

The information source is modeled as a Markov source with $M$ states. The entire system operates in discrete time with time slots $t \in \{0, 1, 2, \ldots\}$. The information state at the source at each time slot $t$ is denoted by $X_t$, where $X_t \in S_X = \{1, 2, \ldots, M\}$ and $S_X$ is the state space.
The information source state transition matrix is $P = [p_{ij}]$, where $p_{ij}$ is the probability of transitioning from state $i$ to state $j$. The information state in the buffer is denoted by $\tilde{X}_t$, while the most recent information received at the monitor is denoted by $\bar{X}_t$. The channel is modeled as a Markov source with $K$ states to mimic the monitor motion, under which the RSR varies over time. The motion transition matrix is denoted by $Q = [p_{kl}]$, where $p_{kl}$ is the probability of transitioning from state $k$ to state $l$, and the state of the monitor at time $t$ is denoted by $Q_t \in \{1, 2, \ldots, K\}$. Finally, the RSR at time $t$ is represented as $q_s(Q_t)$, where $q_s$ is a vector of length $K$ containing the RSR of each respective state. Note that when $K = 1$ there is only one monitor state, the RSR is constant, and the model converges to the one in [9].

The controller's commands for sampling and transmission at each time slot are formulated as a pair of binary decisions, represented by $\alpha_t$ and $\beta_t$, respectively. Precisely, $\alpha_t = 1$ means that the sampler collects a sample from the source, while $\alpha_t = 0$ indicates that no sampling occurs. Similarly, $\beta_t = 1$ denotes that the transmitter sends the currently stored sample $\tilde{X}_t$ from the buffer to the monitor, while $\beta_t = 0$ indicates that no transmission occurs. This yields four possible actions in each time slot $t$, as shown in Table I.

We assume energy is consumed only during two operations: (i) sampling the source information state, and (ii) transmitting the buffered information state to the monitor. The corresponding energy costs are denoted by $\kappa$ for sampling and $\tau$ for transmission. Moreover, energy must be harvested from the environment and is stored in a battery with discrete energy levels, where the maximum capacity is $E$.
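As a concrete illustration of these dynamics, the sketch below simulates the two independent chains, the $M$-state source and the $K$-state monitor, over a few slots. The transition matrices here are small hypothetical placeholders, not the ones used later in the numerical evaluation.

```python
import random

def step(state, P):
    """Draw the next state of a Markov chain with row-stochastic matrix P."""
    r, acc = random.random(), 0.0
    for nxt, p in enumerate(P[state]):
        acc += p
        if r < acc:
            return nxt
    return len(P[state]) - 1  # guard against floating-point rounding

# Hypothetical 2-state source and 2-state monitor chains (illustrative only).
P_src = [[0.9, 0.1], [0.2, 0.8]]
Q_mon = [[0.1, 0.9], [1.0, 0.0]]
q_s   = [0.0, 0.9]            # RSR per monitor state

random.seed(0)
X, Qt = 0, 0
for t in range(5):
    X, Qt = step(X, P_src), step(Qt, Q_mon)
    print(t, X, Qt, q_s[Qt])  # slot, source state, monitor state, current RSR
```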
The energy arrival process, denoted by $u_t$, is modeled as a Bernoulli process following [17], with energy arrival rate $\mu$, such that $\mathbb{P}\{u_t = 1\} = \mu$. The battery state at time slot $t$, denoted by $e_t \in \{0, \ldots, E\}$, evolves according to

$$e_{t+1} = \min\{e_t + u_t - (\kappa \alpha_t + \tau \beta_t), E\}. \quad (1)$$

At each time slot $t$, executing an action requires sufficient energy in the battery, which can be expressed as

$$e_t - \alpha_t \kappa - \beta_t \tau \geq 0, \quad \forall t. \quad (2)$$

We define a cost function $d_t$ that compares an estimate of the information source state with the information state at the source during the same time slot $t$:

$$d_t \triangleq f(X_t, \hat{X}_t), \quad (3)$$

where $\hat{X}_t$ denotes the monitor's estimate of the source information state $X_t$. Two approaches are considered to obtain the estimate of $X_t$: Maximum Likelihood (ML) [18] and Minimum Mean Distortion (MMD) estimation. The cost function reflects the cost incurred at each time slot in which estimation is required, i.e., whenever the source information is not sampled, transmitted, and received successfully by the monitor within that time slot. For optimized system performance, the cost must be minimized subject to the constraint in (2). The system problem can be formulated as

$$\text{minimize} \ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{d_t\} \quad \text{subject to} \ e_t - \alpha_t \kappa - \beta_t \tau \geq 0, \ \forall t, \quad (4)$$

where the expectation $\mathbb{E}\{\cdot\}$ is taken with respect to the randomness of the system, which includes the source, the battery charging, and the channel. By solving this problem, we obtain the optimal actions $\alpha_t$ and $\beta_t$ to be taken at each time slot.

Table I: Controller actions.
$\alpha_t$, $\beta_t$              Action $a$    Cost $c(a)$       Controller action
$\alpha_t = 0$, $\beta_t = 0$      0             $0$               No sampling and no transmitting
$\alpha_t = 0$, $\beta_t = 1$      1             $\tau$            No sampling and just transmitting
$\alpha_t = 1$, $\beta_t = 0$      2             $\kappa$          Just sampling and no transmitting
$\alpha_t = 1$, $\beta_t = 1$      3             $\tau + \kappa$   Both sampling and transmitting

Figure 1: System model illustrating the sensor components (sampler, buffer, transmitter, controller), the energy harvesting module, and the mobile monitor connected through a time-varying wireless channel.

III. PROBLEM SOLUTION

Since the controller does not directly access the state of the information source, the system is only partially observable. To account for this, the system dynamics are modeled as a POMDP, which is subsequently transformed into an equivalent Markov Decision Process (MDP) and solved using the RVIA, which yields the optimal control policies. The elements of the POMDP are defined next.

State Space ($S$): In each time slot $t$, the system state is represented by $s_t = (e_t, X_t, \tilde{X}_t, \bar{X}_t, \theta_t, \delta_t, Q_t)$, and $S$ is an infinite set, where $\theta_t$ is the age of information (AoI) at the transmitter (the age of the last sampled state in the buffer) and $\delta_t$ is the AoI at the monitor (the age of the last received sample at the monitor).

Action Space ($A_s$): Let $A_s$ be the action space consisting of all admissible actions the controller can take, i.e., actions satisfying the energy constraint, when the system is at state $s$. The action selected at time slot $t$ is represented as $a_t \in \{0, 1, 2, 3\}$, where $a_t = 0$ indicates that the sampler and transmitter stay idle, $a_t = 1$ indicates that the transmitter re-transmits the sample in the buffer, $a_t = 2$ indicates that the sampler takes a new sample, and $a_t = 3$ indicates that the sampler takes a new sample and the transmitter transmits it. The actions are determined by a policy $\pi$, which maps $S$ to $A_s$.
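The admissible action set and the battery bookkeeping can be made concrete with a short sketch. The action costs $c(a)$ follow Table I, the update follows (1), and admissibility follows (2); the values $\kappa = \tau = 1$ and $E = 5$ are illustrative.

```python
KAPPA, TAU, E_MAX = 1, 1, 5                        # illustrative costs and capacity
COST = {0: 0, 1: TAU, 2: KAPPA, 3: TAU + KAPPA}    # c(a) from Table I

def admissible_actions(e):
    """Actions satisfying the energy constraint e - c(a) >= 0, cf. (2)."""
    return [a for a, c in COST.items() if e >= c]

def battery_step(e, a, u):
    """Battery evolution e_{t+1} = min(e + u - c(a), E) from (1); a must be admissible."""
    assert a in admissible_actions(e)
    return min(e + u - COST[a], E_MAX)

print(admissible_actions(1))   # with one energy unit, a = 3 is not admissible -> [0, 1, 2]
print(battery_step(5, 3, 1))   # full battery, sample+transmit, one arrival -> 4
```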
Observation Space ($\Omega$): The observation space $\Omega$ is the set of all observations available to the controller. The observation $o_t$ at time slot $t$ is given by $o_t = (e_t, \tilde{X}_t, \bar{X}_t, \theta_t, \delta_t, Q_t)$.

State Transition Probabilities ($P_{\mathrm{POMDP}}$): The state transition probabilities characterize the chance of moving from the current state $s_t = (e_t, X_t, \tilde{X}_t, \bar{X}_t, \theta_t, \delta_t, Q_t)$ to the next state $s_{t+1} = (e_{t+1}, X_{t+1}, \tilde{X}_{t+1}, \bar{X}_{t+1}, \theta_{t+1}, \delta_{t+1}, Q_{t+1})$, conditioned on the action $a_t$ selected at time slot $t$. This probability is denoted by $\mathbb{P}\{s_{t+1} \mid s_t, a_t\}$. Since the system processes, including the dynamics of the information source, monitor motion, AoI evolution, battery state, and source estimation, are assumed to evolve independently, the overall transition probability factorizes into the transition probabilities of the individual components as

$$\mathbb{P}\{s_{t+1} \mid s_t, a_t\} = \mathbb{P}\{\theta_{t+1} \mid \theta_t, a_t\}\, \mathbb{P}\{\delta_{t+1} \mid \theta_t, \delta_t, Q_t, a_t\}\, \mathbb{P}\{e_{t+1} \mid e_t, a_t\}\, \mathbb{P}\{\tilde{X}_{t+1} \mid X_t, \tilde{X}_t, a_t\}\, \mathbb{P}\{X_{t+1} \mid X_t\}\, \mathbb{P}\{Q_{t+1} \mid Q_t\}\, \mathbb{P}\{\bar{X}_{t+1} \mid X_t, \tilde{X}_t, \bar{X}_t, Q_t, a_t\}, \quad (5)$$

where

$$\mathbb{P}\{\theta_{t+1} \mid \theta_t, a_t\} = \mathbb{1}\{\theta_{t+1} = 1\}\,\mathbb{1}\{a_t \in \{2,3\}\} + \mathbb{1}\{\theta_{t+1} = \theta_t + 1\}\,\mathbb{1}\{a_t \in \{0,1\}\}, \quad (6)$$

$$\mathbb{P}\{\delta_{t+1} \mid \theta_t, \delta_t, Q_t, a_t\} = \begin{cases} q(Q_t), & \text{if } a_t = 3,\ \delta_{t+1} = 1, \\ q(Q_t), & \text{if } a_t = 1,\ \delta_{t+1} = \theta_t + 1, \\ \bar{q}(Q_t), & \text{if } a_t \in \{1,3\},\ \delta_{t+1} = \delta_t + 1, \\ 1, & \text{if } a_t \in \{0,2\},\ \delta_{t+1} = \delta_t + 1, \\ 0, & \text{otherwise}, \end{cases} \quad (7)$$

$$\mathbb{P}\{\tilde{X}_{t+1} \mid X_t, \tilde{X}_t, a_t\} = \begin{cases} 1, & \text{if } a_t \in \{2,3\},\ \tilde{X}_{t+1} = X_t, \\ 1, & \text{if } a_t \in \{0,1\},\ \tilde{X}_{t+1} = \tilde{X}_t, \\ 0, & \text{otherwise}, \end{cases} \quad (8)$$

$$\mathbb{P}\{X_{t+1} \mid X_t\} = p_{X_t, X_{t+1}}, \quad (9)$$

$$\mathbb{P}\{Q_{t+1} \mid Q_t\} = p_{Q_t, Q_{t+1}}, \quad (10)$$

$$\mathbb{P}\{\bar{X}_{t+1} \mid X_t, \tilde{X}_t, \bar{X}_t, Q_t, a_t\} = \begin{cases} q(Q_t), & \text{if } a_t = 3,\ \bar{X}_{t+1} = X_t, \\ q(Q_t), & \text{if } a_t = 1,\ \bar{X}_{t+1} = \tilde{X}_t, \\ \bar{q}(Q_t), & \text{if } a_t \in \{1,3\},\ \bar{X}_{t+1} = \bar{X}_t, \\ 1, & \text{if } a_t \in \{0,2\},\ \bar{X}_{t+1} = \bar{X}_t, \\ 0, & \text{otherwise}, \end{cases} \quad (11)$$

$$\mathbb{P}\{e_{t+1} \mid e_t, a_t\} = \begin{cases} \mu, & \text{if } e_{t+1} = \min\{e_t + 1 - c(a_t), E\}, \\ \bar{\mu}, & \text{if } e_{t+1} = e_t - c(a_t), \\ 0, & \text{otherwise}, \end{cases} \quad (12)$$

where $\bar{q}(Q_t) = 1 - q(Q_t)$, $\bar{\mu} = 1 - \mu$, $Q_t \in \{1, 2, \ldots, K\}$, and $c(a)$ is the energy cost of action $a$.

Observation Function ($O$): The observation function specifies the likelihood of observing $o_t$ at time $t$, conditioned on the current state $s_t$ and the action $a_{t-1}$ executed in the previous time slot:

$$O(o_t \mid s_t, a_{t-1}) = \mathbb{P}\{o_t \mid s_t, a_{t-1}\}. \quad (13)$$

Cost Function ($C$): Following [9], the cost function $C(s_t)$, representing the cost at time slot $t$, is given by

$$C(s_t) = f(X_t, \hat{X}_t), \quad (14)$$

where the estimate $\hat{X}_t$ depends on the most recently received sample at the monitor, $\bar{X}_t$, and the AoI at the monitor, $\delta_t$.

Initial Belief State ($b_0$): The initial belief state $b_0$ defines the probability distribution over all possible system states at the start of the process.

A. MDP Reformulation

As discussed in [19], a POMDP can be solved by reformulating it as an MDP through the introduction of a complete information state $\mathcal{I}_t$. The complete information state $\mathcal{I}_t$ comprises three elements: the initial probability distribution over the state space, the sequence of observations up to time $t$, i.e., $\{o_0, o_1, \ldots, o_t\}$, and the set of actions executed by the controller up to time $t-1$, i.e., $\{a_0, \ldots, a_{t-1}\}$. Based on this representation, a belief state $b^i_t$ is defined at each time slot $t$, describing the probability that the source is in state $i$ conditioned on the complete information state $\mathcal{I}_t$:

$$b^i_t \triangleq \mathbb{P}\{X_t = i \mid \mathcal{I}_t\}, \quad i = 1, \ldots, M. \quad (15)$$

According to Proposition 1 in [9], given the current belief state $b^i_t$ and the chosen action $a_t$ at time slot $t$, the subsequent belief state $b^i_{t+1}$ is obtained by

$$b^i_{t+1} = \begin{cases} b^i_t p_{ii} + \sum_{j \neq i} b^j_t p_{ji}, & \text{if } a_t \in \{0,1\}, \\ p_{\tilde{X}_{t+1}, i}, & \text{if } a_t \in \{2,3\}, \end{cases} \quad (16)$$

where $i, j \in \{1, \ldots, M\}$. In the transformation from the POMDP into an equivalent MDP, the belief state is integrated into the original POMDP structure, and the components of the resulting MDP are defined as follows.

State Space ($Z$): The MDP state space is denoted by $Z$, and the state at time slot $t$ is

$$z_t \triangleq \big(e_t, \{b^i_t\}_{i=1,\ldots,M}, \tilde{X}_t, \bar{X}_t, \theta_t, \delta_t, Q_t\big), \quad (17)$$

where $\{b^i_t\}_{i=1,\ldots,M}$ denotes the belief distribution over the source states. Since the time horizon $T$ is infinite and the belief state is continuous, the state space $Z$ is inherently infinite.

Action Space ($A$): The action space of the belief MDP remains identical to that of the POMDP model.

State Transition Probabilities ($P_{\mathrm{MDP}}$): The transition probability from state $z_t$ to state $z_{t+1}$ is given by

$$\mathbb{P}\{z_{t+1} \mid z_t, a_t\} = \sum_{X_t \in S_X} \mathbb{P}\{z_{t+1} \mid z_t, a_t, X_t\}\, \mathbb{P}\{X_t \mid z_t, a_t\}. \quad (18)$$

Here, $\mathbb{P}\{z_{t+1} \mid z_t, a_t, X_t\}$ can be obtained using (5) together with (16), while the second term is simply the belief, i.e., $\mathbb{P}\{X_t = i \mid z_t, a_t\} = b^i_t$ for $i = 1, \ldots, M$.
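The belief recursion in (16) either propagates the current belief one step through $P$ (when no new sample is taken) or collapses it to the transition row of the freshly buffered state. A minimal sketch with a hypothetical 3-state source:

```python
def belief_update(b, P, a, x_buf_next):
    """Belief update from (16): propagate through P if a in {0,1},
    else reset to the transition row of the newly buffered state."""
    M = len(b)
    if a in (0, 1):
        return [sum(b[j] * P[j][i] for j in range(M)) for i in range(M)]
    return list(P[x_buf_next])   # a in {2,3}: b^i = p_{X~_{t+1}, i}

P = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]            # hypothetical source transition matrix
b = [1.0, 0.0, 0.0]              # initially certain the source is in state 0
b = belief_update(b, P, a=0, x_buf_next=None)   # no sampling: propagate one step
print(b)                                        # [0.7, 0.2, 0.1]
b = belief_update(b, P, a=2, x_buf_next=1)      # a new sample of state 1 enters the buffer
print(b)                                        # row 1 of P: [0.1, 0.8, 0.1]
```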
Cost Function ($C$): The cost function evaluating the expected cost at time slot $t$ is

$$C(z_t) = \sum_{i} b^i_t \, f(i, \hat{X}_t), \quad (19)$$

where $f(i, \hat{X}_t)$ denotes the cost incurred when the source state $i$ is estimated as $\hat{X}_t$. Using these definitions, we focus on solving the belief MDP problem

$$\pi^* = \arg\min_{\pi \in \Pi} \ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{C(z_t)\}, \quad (20)$$

where $\pi^*$ is the optimal policy and $\Pi$ is the set of all policies satisfying the energy constraint (2).

The presence of the continuous state variable $\{b^i_t\}$ in (17) makes finding an optimal policy extremely challenging, since the associated state space $Z$ is infinite. Therefore, the state space must be reduced to a finite set. According to Proposition 2 in [9], the belief at time slot $t$, conditioned on observing $\tilde{X}_t$ and $\theta_t$, is given by

$$b^i_t = p^{\theta_t}_{\tilde{X}_t, i}, \quad i = 1, \ldots, M, \quad (21)$$

where $p^{\theta_t}_{ij}$ is the $(i,j)$-th entry of the matrix $P^{\theta_t}$. As $\theta_t$ increases, $P^{\theta_t}$ converges to its steady state, and consequently the belief $b^i_t$ approaches the steady-state probabilities for all $\tilde{X}_t$ and $i \in S_X$. Similarly, since the computation of $\hat{X}_t$ depends on $P^{\delta_t}$, the range of $\delta_t$ can be bounded by a large constant $\delta_{\max}$. With suitable choices of $\theta_{\max}$ and $\delta_{\max}$, the belief MDP problem can be reformulated as a finite-state MDP. As a result, the belief terms $\{b^i_t\}_{i=1}^{M}$ can be removed from (17), since the belief is a function of $\theta_t$ and $\tilde{X}_t$, and the system state can be redefined as

$$z_t \triangleq \big(e_t, \tilde{X}_t, \bar{X}_t, \theta_t, \delta_t, Q_t\big). \quad (22)$$

Similarly to [20], one can show that the MDP with states in (22) is communicating, under which the Bellman optimality equation

$$U^* + V^*(z_t) = \min_{a_t \in A} \Big[ C(z_t) + \sum_{z_{t+1} \in Z} \mathbb{P}\{z_{t+1} \mid z_t, a_t\}\, V^*(z_{t+1}) \Big], \quad \forall z_t \in Z, \quad (23)$$

with $U^*$ the optimal average cost and $V^*(\cdot)$ the relative value function, has a solution and can be solved via the RVIA.
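To make the average-cost solution concrete, the following is a generic sketch of relative value iteration on a tiny hypothetical two-state, two-action MDP, not the paper's full belief-MDP state space.

```python
# Generic relative value iteration (RVI) for an average-cost finite MDP.
# C[z][a]: per-stage cost; P[z][a][zp]: transition probability; state 0 is z_ref.
def rvia(C, P, iters=500):
    nZ, nA = len(C), len(C[0])
    V = [0.0] * nZ
    for _ in range(nZ and iters):
        h = [V[z] - V[0] for z in range(nZ)]            # relative values h_n(z)
        V = [min(C[z][a] + sum(P[z][a][zp] * h[zp] for zp in range(nZ))
                 for a in range(nA)) for z in range(nZ)]
    policy = [min(range(nA),
                  key=lambda a: C[z][a] + sum(P[z][a][zp] * (V[zp] - V[0])
                                              for zp in range(nZ)))
              for z in range(nZ)]
    return V[0], policy   # V[0] approximates the optimal average cost U*

# Hypothetical 2-state MDP: action 1 pays 1 now but moves to the cheap state 0.
C = [[0.0, 1.0], [4.0, 1.0]]
P = [[[1.0, 0.0], [1.0, 0.0]],    # transitions from state 0
     [[0.0, 1.0], [1.0, 0.0]]]    # transitions from state 1
avg_cost, policy = rvia(C, P)
print(avg_cost, policy)           # 0.0 [0, 1]
```

Here the optimal policy idles in the cheap state and pays the one-shot cost to leave the expensive one, giving zero long-run average cost.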
The RVIA transforms Bellman's optimality equation into the following iterative process, such that for all $z \in Z$ and iteration index $n = 1, 2, \ldots$, we have

$$V_{n+1}(z) = \min_{a \in A} \Big[ C(z) + \sum_{z' \in Z} \mathbb{P}\{z' \mid z, a\}\, h_n(z') \Big], \qquad h_n(z) = V_n(z) - V_n(z_{\mathrm{ref}}). \quad (24)$$

A detailed description of the RVIA can be found in, e.g., Algorithm 1 of [20].

IV. NUMERICAL EVALUATION

In this section we evaluate the system model under a few scenarios. We assume a source with $M = 5$ states, whose state transition matrix $P$ is given by

$$P = \begin{bmatrix} 0.1 & 0.6 & 0.2 & 0.05 & 0.05 \\ 0.8 & 0.05 & 0.05 & 0.03 & 0.07 \\ 0.0125 & 0.0125 & 0.95 & 0.0125 & 0.0125 \\ 0.1 & 0.7 & 0.1 & 0.05 & 0.05 \\ 0.005 & 0.3 & 0.0475 & 0.0475 & 0.6 \end{bmatrix}. \quad (25)$$

The distortion matrix $d = [d_{ij}]$ is given in (26), where $d_{ij}$ is the distortion incurred if $\hat{X}_t = i$ is estimated while the source is in state $X_t = j$:

$$d = \begin{bmatrix} 0 & 383 & 183 & 70 & 529 \\ 611 & 0 & 714 & 419 & 281 \\ 55 & 271 & 0 & 88 & 611 \\ 327 & 967 & 245 & 0 & 491 \\ 818 & 820 & 30 & 258 & 0 \end{bmatrix}. \quad (26)$$

In addition, we assume the transmission cost $\tau$ and sampling cost $\kappa$ are both 1, the battery capacity is $E = 5$, and the AoI bounds are $\theta_{\max} = \delta_{\max} = 30$. Finally, we showcase two monitor motion transition matrices $Q$ and their respective RSR vectors $q_s$:

$$Q = \begin{bmatrix} 0.1 & 0.9 \\ 1 & 0 \end{bmatrix}, \quad q_s = \begin{bmatrix} 0 & 0.9 \end{bmatrix}, \quad (27)$$

$$Q = \begin{bmatrix} 0.1 & 0.9 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad q_s = \begin{bmatrix} 0 & 0.25 & 0.6 & 0.9 & 0.6 & 0.25 \end{bmatrix}. \quad (28)$$

The first represents a very basic scenario where the channel alternates between good and bad, with a small chance of remaining in the bad state. The second mimics LEO satellite behavior, where the success probability increases and then decreases, like a satellite going from its rise, reaching its culmination (maximum elevation), and setting afterwards.
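Before comparing the two motion models, one can verify that they are fair to compare: the long-run average RSR is the stationary distribution of $Q$ weighted by $q_s$, and it comes out near 0.42 in both cases. A minimal sketch via power iteration:

```python
def stationary(Q, iters=2000):
    """Stationary distribution of a row-stochastic matrix via power iteration."""
    n = len(Q)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[k] * Q[k][l] for k in range(n)) for l in range(n)]
    return pi

# Motion models from (27) and (28) with their RSR vectors.
Q1, qs1 = [[0.1, 0.9], [1.0, 0.0]], [0.0, 0.9]
Q2 = [[0.1, 0.9, 0, 0, 0, 0],
      [0, 0, 1, 0, 0, 0],
      [0, 0, 0, 1, 0, 0],
      [0, 0, 0, 0, 1, 0],
      [0, 0, 0, 0, 0, 1],
      [1, 0, 0, 0, 0, 0]]
qs2 = [0.0, 0.25, 0.6, 0.9, 0.6, 0.25]

avg1 = sum(p * q for p, q in zip(stationary(Q1), qs1))
avg2 = sum(p * q for p, q in zip(stationary(Q2), qs2))
print(round(avg1, 3), round(avg2, 3))   # both close to 0.42
```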
Note that, in both cases, we considered a probability (0.1) of the first state remaining in itself. This represents the case without coverage, where no satellite appears in the following slot. It is important to note that the average (steady-state) RSR is approximately 0.42 in both cases, so that they can be compared fairly. Also, in the following analysis, as in [9], we call baseline the policy that follows the rule: if $e_t \geq \tau + \kappa$, then $a_t = 3$; else $a_t = 0$. That is, the controller samples and transmits whenever it has the energy to do so, and stays idle otherwise. The optimal policy is the one obtained through the RVIA optimization. Finally, we compare the results of the moving-monitor case with the case where the controller has no knowledge of the monitor motion and thus uses a constant average RSR $q_s$ in all time slots.

Figure 2 depicts the average distortion for ML and MMD estimation as a function of the energy arrival rate $\mu$ for the simplified motion model in (27). First, we can see that the optimal policy in the moving scenario produces a smaller distortion than any of the other cases. We attribute this to the policy's ability to identify the best moments to transmit, yielding distortion reductions from 38% to 42% compared to the cases without channel knowledge. This is expected: with a simple two-state motion matrix, it is straightforward to learn that the controller should not transmit when the channel is at $q_s = 0$. However, the difference between the two baseline schemes is minimal, although the moving scenario may produce more failures in a row due to poor transmission choices. Finally, similar to the results in [9], MMD estimation outperforms ML estimation, mostly because it uses knowledge of the source distortion to better estimate the states. However, because of the simplified motion matrix, when more energy is available in the optimal moving scenario, the monitor performs fewer estimations, and thus the estimation technique has a smaller impact.

Figure 2: Average distortion with (a) ML and (b) MMD estimation as a function of the energy arrival rate $\mu$ for different policies and the motion model from (27).

Figure 3 shows a 60-slot time window from a simulation run using the baseline and RVIA policies for $\mu = 0.9$. The energy arrivals and channel realizations are the same in both scenarios. The advantage of the RVIA policy in choosing the right moments to transmit, avoiding the bad states (red background), becomes clear. However, some outages still occur, since the success probability in the good state (green background) is 0.9, as given by $q_s$ in (27). The baseline, on the other hand, transmits whenever possible and will eventually transmit in the bad state.

Figure 4 presents the same analysis as before, but for the motion model in (28). The first thing to notice is that the optimal policy underperforms compared to the simplified model, despite both having the same average RSR. This degradation occurs because the motion matrix is larger, causing the system to remain in suboptimal transmission states for longer. Thus, the controller must decide to transmit in states other than the $q_s = 0.9$ case, unlike in the previous scenario. The baseline performs slightly better for the same reason: it now encounters more opportunities to transmit when $q_s \neq 0$. In this setting, incorporating channel knowledge (moving) yields distortion reductions between 9% and 11% compared to the case without such knowledge (constant).

V. CONCLUSION

This work presented a goal-oriented remote monitoring framework for mobile receiver scenarios, extending previous stationary models to account for time-varying channels induced by receiver mobility. The system was formulated as a POMDP and solved via relative value iteration to obtain optimal sampling and transmission policies.
Numerical results showed that exploiting receiver motion knowledge reduces average distortion by 10% to 42%, depending on the motion model complexity. These gains highlight the value of incorporating channel dynamics into policy optimization for energy-harvesting sensors in remote monitoring applications. Future directions include investigating imperfect channel state information and extending the framework to multi-sensor scenarios.

Figure 3: Simulation time window of 60 slots for the baseline and RVIA policies for $\mu = 0.9$. Green and red background colors indicate the good and bad channel states, respectively, as defined in (27). Green circles denote successful transmissions, while red crosses indicate transmission outages.

Figure 4: Average distortion with (a) ML and (b) MMD estimation as a function of the energy arrival rate $\mu$ for different policies and the motion model from (28).

ACKNOWLEDGEMENTS

This research was partially supported in Finland by the European Union through the Interreg Aurora project ENSURE-6G (Grant 20361812) and by the Research Council of Finland (RCF) through the projects 6G Flagship (Grant 369116), ECO-LITE (Grant 362782), and DYNAMICS (Grant 367702); and in Brazil by CNPq (INCT STREAM 409179/2024-8, 305021/2021-4) and RNP/MCTI Brasil 6G (01245.020548/2021-07).

REFERENCES

[1] L. Liu, X. Guo, W. Liu, and C. Lee, "Recent progress in the energy harvesting technology—from self-powered sensors to self-sustained IoT, and new applications," Nanomaterials, vol. 11, no. 11, p. 2975, 2021.
[2] T. M. Getu, G. Kaddoum, and M. Bennis, "A survey on goal-oriented semantic communication: Techniques, challenges, and future directions," IEEE Access, 2024.
[3] P. He, H. Lei, D. Wu, R. Wang, Y. Cui, Y. Zhu, and Z. Ying, "Non-terrestrial network technologies: Applications and future prospects," IEEE Internet of Things Journal, 2024.
[4] G. K. Ijemaru, K. L.-M. Ang, and J. K. P. Seng, "Mobile collectors for opportunistic internet of things in smart city environment with wireless power transfer," Electronics, vol. 10, no. 6, 2021.
[5] Z. Wei, M. Zhu, N. Zhang, L. Wang, Y. Zou, Z. Meng, H. Wu, and Z. Feng, "UAV-assisted data collection for internet of things: A survey," IEEE Internet of Things Journal, vol. 9, no. 17, pp. 15460–15483, 2022.
[6] T. Wei, W. Feng, Y. Chen, C.-X. Wang, N. Ge, and J. Lu, "Hybrid satellite-terrestrial communication networks for the maritime internet of things: Key technologies, opportunities, and challenges," IEEE Internet of Things Journal, vol. 8, no. 11, pp. 8910–8934, 2021.
[7] N. Pappas and M. Kountouris, "Goal-oriented communication for real-time tracking in autonomous systems," in 2021 IEEE International Conference on Autonomous Systems (ICAS). IEEE, 2021, pp. 1–5.
[8] A. Zakeri, M. Moltafet, M. Leinonen, and M. Codreanu, "Minimizing the AoI in resource-constrained multi-source relaying systems: Dynamic and learning-based scheduling," IEEE Transactions on Wireless Communications, vol. 23, no. 1, pp. 450–466, 2024.
[9] A. Zakeri, M. Moltafet, and M. Codreanu, "Goal-oriented remote tracking of an unobservable multi-state Markov source," in 2024 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2024, pp. 1–6.
[10] E. Lutz, "A Markov model for correlated land mobile satellite channels," International Journal of Satellite Communications, vol. 14, no. 4, pp. 333–339, 1996.
[11] O. Aboderin and I. A. Alimi, "Modeling land mobile satellite channel and mitigation of signal fading," American Journal of Mobile Systems, Applications and Services, vol. 1, no. 1, pp. 46–53, 2015.
[12] S. Rougerie, F. Lacoste, and B. Montenegro-Villacieros, "Mobile satellite propagation channels for Ku and Ka band," in 2016 10th European Conference on Antennas and Propagation (EuCAP). IEEE, 2016, pp. 1–5.
[13] A. O. Akinniyi, A. T. Tola, and O. Olatunde, "Modelling of land mobile satellite channel to counter channel outage," Int. J. Distrib. Parallel Syst., vol. 8, pp. 1–21, 2017.
[14] Y. Lee and J. P. Choi, "Performance evaluation of high-frequency mobile satellite communications," IEEE Access, vol. 7, pp. 49077–49087, 2019.
[15] F. P. Fontan, M. Vázquez-Castro, C. E. Cabado, J. P. Garcia, and E. Kubista, "Statistical modeling of the LMS channel," IEEE Transactions on Vehicular Technology, vol. 50, no. 6, pp. 1549–1567, 2001.
[16] M. Hui, D. Shen, Y. Cui et al., "A new five-state Markov model for land mobile satellite channels," in 8th International Symposium on Antennas, Propagation and EM Theory. IEEE, 2008, pp. 1512–1515.
[17] N. Privault, "Stochastic analysis of Bernoulli processes," 2008.
[18] S. R. Eliason, Maximum Likelihood Estimation: Logic and Practice. Sage Publications, 1993.
[19] O. Sigaud and O. Buffet, Markov Decision Processes in Artificial Intelligence. John Wiley & Sons, 2013.
[20] A. Zakeri, M. Moltafet, and M. Codreanu, "Semantic-aware sampling and transmission in real-time tracking systems: A POMDP approach," IEEE Transactions on Communications, vol. 73, no. 7, pp. 4898–4913, 2025.
