Relaxed Efficient Acquisition of Context and Temporal Features
Yunni Qu¹, Dzung Dinh¹, Grant King², Whitney Ringwald³, Bing Cai Kok¹, Kathleen Gates¹, Aidan Wright², Junier Oliva¹
¹ University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
² University of Michigan, Ann Arbor, MI, USA
³ University of Minnesota Twin Cities, Minneapolis, MN, USA
quyunni@cs.unc.edu, ddinh@cs.unc.edu, grking@umich.edu, wringwal@umn.edu, bingcai@unc.edu, kmgates@unc.edu, aidangcw@umich.edu, joliva@cs.unc.edu

Abstract

In many biomedical applications, measurements are not freely available at inference time: each laboratory test, imaging modality, or assessment incurs financial cost, time burden, or patient risk. Longitudinal active feature acquisition (LAFA) seeks to optimize predictive performance under such constraints by adaptively selecting measurements over time, yet the problem remains inherently challenging due to temporally coupled decisions (missed early measurements cannot be revisited, and acquisition choices influence all downstream predictions). Moreover, real-world clinical workflows typically begin with an initial onboarding phase, during which relatively stable contextual descriptors (e.g., demographics or baseline characteristics) are collected once and subsequently condition longitudinal decision-making. Despite its practical importance, the efficient selection of onboarding context has not been studied jointly with temporally adaptive acquisition. We therefore propose REACT (Relaxed Efficient Acquisition of Context and Temporal features), an end-to-end differentiable framework that simultaneously optimizes (i) selection of onboarding contextual descriptors and (ii) adaptive feature-time acquisition plans for longitudinal measurements under cost constraints.
REACT employs a Gumbel-Sigmoid relaxation with straight-through estimation to enable gradient-based optimization over discrete acquisition masks, allowing direct backpropagation from prediction loss and acquisition cost. Across real-world longitudinal health and behavioral datasets, REACT achieves improved predictive performance at lower acquisition costs compared to existing longitudinal acquisition baselines, demonstrating the benefit of modeling onboarding and temporally coupled acquisition within a unified optimization framework.

1 Introduction

Longitudinal data—from digital health monitoring to behavioral assessments—create new opportunities for early detection and personalized decision-making [34, 36]. In deployment, however, the challenge is not only prediction but also resource-constrained measurement at inference time: acquiring every variable at every time point is often costly, burdensome, and unnecessary. This motivates frameworks that adaptively decide what to measure under budget constraints, explicitly trading off predictive performance and acquisition cost rather than assuming fully observed inputs [32, 33, 39]. In practice, such methods better reflect real longitudinal workflows by prioritizing the most informative measurements at each occasion, deferring costly assessments until warranted, and stopping further collection when its expected value is low.

Consider a mobile behavioral health application that provides just-in-time adaptive support for risk reduction. During onboarding, users report contextual descriptors such as demographics, clinical history, and baseline psychometric assessments. After enrollment, the application administers ecological momentary assessments (EMAs) at scheduled intervals to capture time-varying signals such as affect, cravings, stressors, and social context for short-term risk forecasting [34].
In both phases, however, exhaustive data collection is costly and burdensome: long onboarding surveys can deter engagement and limit adoption [1, 30], while frequent or repetitive EMAs can induce fatigue, reduce adherence, and degrade response quality [3]. Rather than collecting all onboarding descriptors and EMA measurements, our approach learns an adaptive acquisition policy that decides what to measure and when, explicitly balancing predictive accuracy against user burden and measurement cost over time.

This same two-stage structure appears more broadly across healthcare onboarding and follow-up. Intake gathers contextual descriptors such as demographics, family history, comorbidities, medication lists, and baseline screening instruments, while subsequent care relies on longitudinal data collection tailored to the evolving clinical picture, including follow-up labs, imaging, symptom inventories, and specialist referrals. Exhaustive onboarding can increase administrative burden, prolong visits, raise costs, and reduce completion or response quality [1, 30]. Likewise, repeatedly acquiring all possible follow-up measurements is costly, time-intensive, and strains limited clinical resources, including staff time, imaging capacity, and diagnostic equipment. More generally, many healthcare and bioinformatic applications involve one-time acquisition of relatively stable contextual descriptors followed by longitudinal measurement decisions over time. These settings motivate methods that prioritize the most informative and cost-effective acquisitions across both phases.

To address this problem, we propose REACT (Relaxed Efficient Acquisition of Context and Temporal features), a relaxation-based framework for longitudinal active feature acquisition (LAFA) that makes cost-aware acquisition decisions for both a priori contextual descriptors and temporally acquired features.
Our contributions are:

• Formalizing Onboarding + Longitudinal AFA. We introduce a practical LAFA formulation that explicitly distinguishes one-time contextual descriptors acquired at onboarding from sequential, time-indexed measurements acquired during follow-up, better reflecting real-world clinical workflows.

Figure 1: Overview of REACT. REACT first performs a one-time onboarding acquisition of contextual descriptors at t = 0. Shaded feature entries denote acquired measurements. Using acquired context and prior longitudinal observations, the planner then adaptively decides both when to acquire next and which features to obtain. Between acquisition steps, the model predicts temporal labels without collecting additional measurements, yielding personalized, nonuniform acquisition trajectories.

• Joint Context-Temporal Acquisition Learning. We develop a unified policy that jointly learns (i) which contextual descriptors to acquire upfront and (ii) a structured feature × time acquisition plan over the longitudinal horizon, optimizing predictive performance under explicit cost constraints.

• Differentiable Discrete Acquisition via Relaxation. We enable end-to-end training of discrete acquisition decisions using Gumbel-Sigmoid relaxations [19] with straight-through gradients, allowing gradient-based optimization of both the acquisition policy and predictive model with a self-iterative training procedure.

• Empirical Accuracy-Cost Gains. Across real-world longitudinal datasets, we demonstrate consistent improvements in the accuracy-cost tradeoff relative to competitive baselines.
2 Related Work

Existing machine learning approaches address important aspects of cost-aware feature acquisition, but key gaps remain for longitudinal settings. Classical feature selection typically operates in a static regime, selecting a population-level subset of variables rather than learning subject-specific acquisition strategies that adapt to previously observed temporal values [12, 21, 38]. Likewise, active learning focuses primarily on acquiring labels during training, rather than acquiring features at inference time [8, 14, 18]. To position REACT, we review prior work on test-time feature acquisition and highlight limitations in handling the two-phase structure of contextual onboarding followed by longitudinal monitoring.

2.1 Active Feature Acquisition

Active Feature Acquisition (AFA) [32, 33] studies test-time prediction when features are not freely available and each measurement incurs a cost. The goal is to sequentially select which features to acquire for a given instance, adapting decisions to previously observed values. Many recent approaches cast AFA as a reinforcement learning problem [10, 35, 39], but RL-based training can be difficult to optimize due to the large action space and challenging credit assignment [16, 37]. Some methods assuage issues with RL by using generative surrogates to impute missing features and score candidate acquisitions [16], or by relying on greedy acquisition rules [4, 7, 17], which may fail to capture interactions among feature groups whose utility depends on joint acquisition. To address these limitations, Valancius et al. [37] propose a non-parametric oracle-based method, Norcliffe et al. [23] optimize feature acquisition in a stochastic latent space via an expected gradient-based objective, and Ghosh and Lan [6] use a differentiable policy for end-to-end training of both the acquisition policy and predictor.
However, conventional AFA assumes that features are static once acquired and does not model measurements that evolve over time.

2.2 Longitudinal Active Feature Acquisition

Longitudinal active feature acquisition (LAFA) extends AFA by requiring the agent to decide not only what to acquire, but also when to acquire it [15, 32]. This introduces additional challenges, since missed measurements at earlier time points may become permanently unavailable. Several recent works cast LAFA as a Markov Decision Process (MDP) [15, 27]. For example, ASAC [40] uses an actor-critic architecture to jointly learn acquisition and prediction, while Qin et al. [27] study continuous-time acquisition policies for timely prediction of adverse outcomes. Related approaches also optimize acquisition timing, but may assume that all measurements available at a selected time point are acquired together [22]. As in these sequential formulations, we consider a finite decision horizon with a bounded number of acquisition opportunities.

Although the MDP formulation is natural, RL-based LAFA can be difficult to optimize in practice. The action space is combinatorial, since the policy must jointly determine which features to acquire and when. The state evolves as new measurements are collected, further complicating policy learning. In addition, supervision is typically provided only through downstream predictive performance, creating a difficult credit-assignment problem for individual acquisitions and decision times [16].

Beyond these optimization challenges, prior LAFA formulations generally treat all features uniformly and do not explicitly model the common two-phase workflow in which relatively stable contextual descriptors are acquired once at onboarding and then inform subsequent temporal acquisition decisions. In contrast, we study a practical LAFA setting that separates a priori contextual descriptors from temporally acquired measurements.
We then develop a relaxation-based framework, REACT, that avoids standard RL training by using Gumbel-Sigmoid relaxations with straight-through gradients to enable end-to-end optimization of discrete acquisition decisions.

3 Method

In longitudinal active feature acquisition (LAFA), an agent must sequentially choose which measurements to obtain over time, trading predictive accuracy against acquisition cost. We propose REACT, an end-to-end framework that jointly learns (i) a one-time selection of onboarding context and (ii) a policy for patient-personalized longitudinal acquisitions. Rather than relying on standard reinforcement learning, REACT uses Gumbel-Sigmoid relaxation with straight-through gradients [19] to enable differentiable optimization of discrete acquisition decisions. The model comprises three jointly trained components—a global Context Selector, an adaptive Longitudinal Planner, and a Predictor—optimized under a unified objective that balances prediction loss and acquisition cost. We next formalize the LAFA setting and describe the optimization of each component.

3.1 Problem Formulation

We consider a training dataset D = {(s^(i), x^(i), y^(i))}_{i=1}^N, where s^(i) ∈ R^{d_s} denotes the onboarding contextual descriptors for instance i, x^(i) ∈ R^{T×d} denotes its temporal features across T discrete timesteps, and y^(i) = (y_1^(i), ..., y_T^(i)) denotes the corresponding target sequence with y_t^(i) ∈ {1, ..., C}. For brevity, we omit the superscript (i) when discussing a single instance. While alternative masking schemes could be used—e.g., by passing acquisition masks explicitly to downstream models—we adopt simple elementwise masking for notational convenience.
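As a concrete illustration of this elementwise masking, the NumPy sketch below zeroes out unacquired entries of the context vector s and the temporal feature matrix x; all dimensions and values are hypothetical:

```python
import numpy as np

# Hypothetical sizes: d_s = 4 contextual descriptors, T = 5 timesteps, d = 3 temporal features.
s = np.array([52.0, 1.0, 27.5, 3.0])          # contextual descriptors s
x = np.arange(15, dtype=float).reshape(5, 3)  # temporal features x (T x d)

m_s = np.array([1.0, 0.0, 1.0, 0.0])          # context mask: acquire descriptors 0 and 2
m = np.zeros((5, 3))                          # feature-time mask over the horizon
m[1] = [1.0, 0.0, 1.0]                        # acquire features 0 and 2 at the second timestep

s_tilde = m_s * s   # masked context: unacquired entries stay zero
x_tilde = m * x     # masked temporal history: only acquired feature-time pairs survive
```

Unacquired entries are indistinguishable from observed zeros under this scheme, which is why the text notes that masks could alternatively be passed to downstream models explicitly.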
We now describe the iterative acquisition process, which begins with onboarding at t = 0 and continues until the planner terminates (∅) or the final timepoint is reached.

Onboarding and First Temporal Acquisition. At onboarding (see Fig. 1), REACT's Context Selector outputs a static, non-personalized binary mask m_s ∈ {0, 1}^{d_s} specifying which contextual descriptors to acquire at cost, yielding s̃ = m_s ⊙ s, where ⊙ denotes element-wise multiplication. Conditioned on the observed values in s̃, the planner then selects both the next acquisition timepoint, t_next, and the temporal feature mask m_next ∈ {0, 1}^d for the first temporal acquisition. Thus, the first temporal acquisition is personalized through the instance's observed contextual descriptors. For any t < t_next, predictions ŷ_t are made using only s̃. When t = t_next, the selected temporal features are acquired at cost, yielding x̃_{t_next} = m_next ⊙ x_{t_next}, and prediction proceeds using both s̃ and x̃_{t_next}. The planner then repeats this process, iteratively selecting subsequent acquisition times and feature subsets as described below.

Iterative Acquisition Process. At each decision step, the planner selects either the next acquisition (t_next, m_next) or termination ∅ in a dynamic, instance-specific manner based on the information acquired so far (see Fig. 1). Specifically, it conditions on the contextual descriptors s̃ and the masked temporal history

  H_t = (x̃_1, ..., x̃_t, 0, ..., 0) ∈ R^{T×d},

where unacquired or skipped timepoints remain zero. Thus, REACT's planner π maps π(H_t, s̃, t) → (t_next, m_next) or π(H_t, s̃, t) → ∅. As before, for all t' < t_next (or all remaining t' ≤ T if the planner terminates), predictions ŷ_{t'} are made using only the information acquired so far, namely (H_t, s̃).
If the planner does not terminate, time advances to t ← t_next, the selected temporal features are acquired, yielding x̃_{t_next} = m_next ⊙ x_{t_next}, and ŷ_{t_next} is predicted from the updated history H_{t_next} = (x̃_1, ..., x̃_{t_next}, 0, ..., 0) together with s̃. The process then repeats.

Prediction Model. At timestep t, a predictor network f_φ : R^{T×d} × R^{d_s} × {1, ..., T} → R^C takes the temporal history H_t, acquired context s̃, and target timesteps t' ≥ t as input, and outputs class probabilities

  ŷ_{t'} = f_φ(H_t, s̃, t').   (1)

3.2 The REACT Objective

Our goal is to acquire contextual and temporal features cost-efficiently while maintaining predictive accuracy and improving the information available for future acquisition decisions. Thus, an acquisition is valuable not only for its immediate predictive benefit, but also for how it improves later planning and prediction. To this end, we directly train a context selector and planner to output binary acquisition masks that optimize the accuracy-cost tradeoff.

We first define acquisition cost abstractly, allowing it to capture monetary cost, time, patient burden, risk, or combinations thereof. Let c_s ∈ R_+^{d_s} denote the one-time costs of the contextual descriptors, and c_x ∈ R_+^d denote the per-timepoint costs of the temporal features. For an acquisition trajectory {m_t}_{t=1}^T, the total cost is

  Cost(m_s, {m_t}_{t=1}^T) = c_s^⊤ m_s [contextual] + Σ_{t=1}^T c_x^⊤ m_t [temporal].   (2)

Using the predictor in Eq. (1), we obtain ŷ_t from the accrued history up to time t. This yields the following accuracy-cost tradeoff objective:

  Σ_{t=1}^T L_pred(ŷ_t, y_t) + λ ( c_s^⊤ m_s + Σ_{t=1}^T c_x^⊤ m_t ),   (3)

where λ > 0 is an application-specific parameter controlling the tradeoff between supervised prediction loss L_pred (e.g., cross-entropy) and the cost of contextual and temporal acquisitions.
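To make the cost terms in Eqs. (2)-(3) concrete, here is a small NumPy sketch; the costs, masks, and prediction-loss value are all hypothetical:

```python
import numpy as np

# Hypothetical costs: 3 contextual descriptors, 2 temporal features, T = 4 timesteps.
c_s = np.array([0.25, 0.25, 0.5])   # one-time contextual costs
c_x = np.array([1.0, 0.5])          # per-timepoint temporal feature costs

m_s = np.array([1.0, 0.0, 1.0])     # context mask: acquire descriptors 0 and 2
m_t = np.array([[1.0, 1.0],         # t=1: acquire both features
                [0.0, 0.0],         # t=2: skip (no acquisition)
                [0.0, 1.0],         # t=3: cheap feature only
                [0.0, 0.0]])        # t=4: skip

# Eq. (2): one-time contextual cost plus per-timepoint temporal cost.
total_cost = c_s @ m_s + (m_t @ c_x).sum()   # 0.75 + 2.0 = 2.75

# Eq. (3): prediction loss traded off against acquisition cost via lambda.
lam = 0.1
pred_loss_sum = 2.3                          # stand-in for sum_t L_pred(y_hat_t, y_t)
objective = pred_loss_sum + lam * total_cost
```

Sweeping `lam` upward pushes the learned masks toward sparser, cheaper acquisition plans, which is how the performance-cost curves in the experiments are traced out.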
3.3 The REACT Model

REACT is an end-to-end framework that jointly learns a cost-effective subset of onboarding descriptors and personalized temporal acquisition plans under the objective in Eq. (3). The architecture consists of three interacting modules:

• Context Selector α ∈ R^{d_s}: a learned vector α that parameterizes a binary context mask m_s ∈ {0, 1}^{d_s} at t = 0, specifying which onboarding descriptors to acquire for the best downstream accuracy-cost tradeoff.

• Longitudinal Planner π_θ: a neural network that takes the temporal history up to t, the current timestep t, and the acquired onboarding context s̃ as input, and outputs logits that parameterize future temporal acquisition masks m_{t'} ∈ {0, 1}^d for t' > t.

• Predictor f_φ: a classification network as defined in subsection 3.1. For a target timestep t' ≥ t, it maps the acquired context s̃ and temporal history H_t to class probabilities ŷ_{t'} ∈ R^C.

The Challenge of Discrete Decisions. While this architecture intuitively models the two-phase acquisition process, jointly training the parameters α, θ, and φ presents a fundamental optimization challenge. The target masks m_s and m_t are discrete variables (i.e., binary masks). These discrete variables are not differentiable, making it impossible to route gradients from the downstream prediction loss and acquisition cost back to the planner π_θ and context selector α using standard backpropagation. Therefore, to enable end-to-end training of this framework, we employ Gumbel-Sigmoid relaxation [19] with a straight-through estimator (Sec. 3.4).

3.4 Differentiable Optimization via Relaxation

To handle discrete acquisition decisions, REACT replaces direct binary sampling with the Gumbel-Sigmoid relaxation [9, 19], enabling gradients from the prediction loss and acquisition cost to flow through the acquisition masks.
For any logit l_j produced by α or π_θ, we inject stochasticity by sampling Gumbel noise

  g_j = −log(−log u_j),  u_j ∼ Uniform(0, 1).   (4)

Given temperature τ > 0, we then compute a continuous relaxation m̃_j ∈ (0, 1) as

  m̃_j = σ((l_j + g_j) / τ).   (5)

During the forward pass, we discretize via thresholding,

  m̂_j = I[m̃_j > 0.5].   (6)

To retain differentiability during backpropagation, we use the straight-through estimator

  G(l_j) = m̃_j + sg(m̂_j − m̃_j),   (7)

where sg(·) denotes the stop-gradient operator, so m̂_j − m̃_j is treated as constant during backpropagation. Thus, G(l_j) = m̂_j in the forward pass, while gradients are taken with respect to the continuous proxy m̃_j in the backward pass. Applying G(·) element-wise to the d_s contextual logits yields the binary context mask m̂_s ∈ {0, 1}^{d_s}, and applying it to the d temporal logits at timestep t yields the binary temporal mask m̂_t ∈ {0, 1}^d.

Differentiable Loss. We train the context selector and planner (see Fig. 2) directly to produce future acquisition plans that optimize the accuracy-cost objective in Eq. (3). Consider an instance with temporal features x ∈ R^{T×d}, contextual descriptors s ∈ R^{d_s}, labels y ∈ {1, ..., C}^T, and a given temporal mask M_t^prev ∈ {0, 1}^{t×d} representing an observation state at time t. For notation, let EX_T(v) denote zero-padding of v ∈ R^{t×d} to length T, let K_{>t}(v) set entries at times 1, ..., t to zero while keeping values after t, and let K_{≤t}(v) set entries at times t + 1, ..., T to zero while keeping values up to and including t. We then define the relaxed future acquisition plan

  P_{>t} = K_{>t}( G( π_θ( EX_T(M_t^prev) ⊙ x, t, s̃ ) ) ),

Figure 2: Longitudinal planner.
Given the onboarding context and longitudinal measurements observed up to time t, the planner outputs a binary acquisition mask over future feature-time pairs. The left portion of the mask corresponds to previously acquired measurements (unused), while the right portion specifies the future acquisition plan, judged against the cost-benefit objective. The earliest selected future time defines the next acquisition time, t_next.

the induced mask available up to time t' > t as

  M_{≤t'} = EX_T(M_t^prev) + K_{≤t'}(P_{>t}),

and the relaxed context mask as m̂_s = G(α). The total relaxed loss for planning after time t is

  L_REACT(α, θ, φ; x, s, y, t, M_t^prev) = Σ_{t'=t+1}^T L_pred( f_φ(M_{≤t'} ⊙ x, m̂_s ⊙ s, t'), y_{t'} ) [prediction loss for plan]
    + λ 1_T^⊤ P_{>t} c_x [temporal acquisition cost] + λ c_s^⊤ m̂_s [context acquisition cost].   (8)

This relaxed objective enables stable joint gradient updates of α, θ, and φ. Notably, α is trained not only to improve prediction directly, but also to select onboarding context that supports better future acquisition decisions by the planner π_θ. Moreover, Eq. (8) corresponds to a relaxed form of the cost-benefit objective in Eq. (3). Below, we describe the training procedure, which uses mini-batches and dynamic rollouts of the current planner to generate observation states for minimizing Eq. (8).

3.5 Training Procedure

Self-Iterative Training. The planner is trained via self-iterative training, where the current planner rolls out on-policy trajectories that are then reused as training states. This allows the planner to improve beyond the offline reference plans by training on states it induces itself—similar in spirit to DAgger-style iterative imitation learning [31], but driven by direct gradient optimization with ground-truth labels rather than imitation targets.
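The gate of Eqs. (4)-(7) can be sketched in NumPy as below, forward pass only: the straight-through backward pass needs an autodiff framework (e.g., the common PyTorch idiom `m_tilde + (m_hat - m_tilde).detach()`), and the logits and temperature here are hypothetical:

```python
import numpy as np

def gumbel_sigmoid_forward(logits, tau=1.0, rng=None):
    """Forward pass of a Gumbel-Sigmoid gate as in Eqs. (4)-(6)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(logits))
    g = -np.log(-np.log(u))                                     # Gumbel(0,1) noise, Eq. (4)
    m_tilde = 1.0 / (1.0 + np.exp(-(np.asarray(logits) + g) / tau))  # relaxed mask, Eq. (5)
    m_hat = (m_tilde > 0.5).astype(float)                       # hard binary mask, Eq. (6)
    # Eq. (7): the forward value is m_hat, while gradients would flow through the
    # continuous proxy m_tilde, since (m_hat - m_tilde) sits inside stop-gradient.
    return m_hat, m_tilde

m_hat, m_tilde = gumbel_sigmoid_forward(np.array([4.0, -4.0, 0.2]), tau=0.5)
```

Lower temperatures sharpen m̃ toward {0, 1}, trading gradient smoothness for a smaller gap between the relaxed and discretized masks.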
Training alternates between the following steps (details in Algorithm 1):

(1) On-policy rollout: sample a batch B from D_train, and execute the current planner π_θ and context selector α to collect trajectories;

(2) Joint gradient update: compute L from Eq. (8) and jointly update α, θ, and φ by backpropagating through the ST-Gumbel-Sigmoid gates.

Note that the rollouts in Algorithm 1 may contain skipped timepoints or early termination when the planner outputs zero-masks (line 8).

Algorithm 1 Self-Iterative Training of REACT
Require: Training data D_train, planner π_θ, context logits α, predictor f_φ, learning rate η.
 1: for iteration i = 1 to N do
 2:   Sample batch B ∼ D_train
 3:   // Step 1: Forward pass to roll out current policy to collect on-policy states
 4:   for each (x, s, y) ∈ B do
 5:     m̂_s ← G(α);  s̃ ← m̂_s ⊙ s
 6:     Initialize Ĥ_0 ← 0 ∈ R^{T×d}, M_0^prev ← 0 ∈ R^{T×d}
 7:     for t = 1 to T do
 8:       m̂_t ← G(π_θ(Ĥ_{t−1}; t; s̃))[t]   // query planner output for t based on past history
 9:       x̂_t ← m̂_t ⊙ x_t
10:       Ĥ_t ← Ĥ_{t−1};  Ĥ_t[t] ← x̂_t   // update history
11:       M_t^prev ← M_{t−1}^prev;  M_t^prev[t] ← m̂_t   // update history mask
12:     end for
13:     Store trajectory τ = {(x, s, y, t, M_t^prev)}_{t=0}^T in B_roll
14:   end for
15:   // Step 2: Compute loss over on-policy trajectories and the ground-truth label y
16:   L = (1/|B_roll|) Σ_{(x, s, y, t, M_t^prev) ∈ B_roll} L_REACT(α, θ, φ; x, s, y, t, M_t^prev)
17:   // Step 3: Update with gradient descent (or another gradient-based optimizer)
18:   α ← α − η∇_α L;  θ ← θ − η∇_θ L;  φ ← φ − η∇_φ L
19: end for

3.6 Inference-Time Acquisition Algorithm

At inference time, REACT operates sequentially as summarized in Algorithm 2. Gumbel noise is removed, and all acquisition masks are obtained deterministically by hard-thresholding the learned logits.
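The deterministic replanning step at inference, thresholding the planner's logits and selecting the earliest future time with a nonzero mask, can be sketched as follows; the planner logits are hypothetical stand-ins for π_θ's output:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical planner logits over future feature-time pairs (T = 5, d = 3).
plan_logits = np.array([[-2.0, -1.5, -3.0],
                        [ 1.2, -0.5,  2.0],
                        [-1.0, -2.0, -1.0],
                        [ 0.8,  0.3, -0.2],
                        [-2.5, -2.5, -2.5]])

t = 0                                           # current timestep (0-indexed here)
M = (sigmoid(plan_logits) > 0.5).astype(int)    # hard threshold, no Gumbel noise
M[: t + 1] = 0                                  # K_{>t}: keep strictly future rows only

nonzero_rows = np.flatnonzero(M.sum(axis=1) > 0)
if nonzero_rows.size == 0:
    t_next, m_next = None, None                 # empty plan: planner terminates
else:
    t_next = int(nonzero_rows[0])               # earliest future time with a nonzero mask
    m_next = M[t_next]                          # features to acquire at that time

print(t_next, m_next)   # 1 [1 0 1]
```

Because rows between t and t_next are all-zero, no acquisition happens at those intermediate timesteps; predictions there use only the history acquired so far.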
The process begins at onboarding (t = 0), where the learned context mask m_s = I[σ(α) > 0.5] determines which contextual features to acquire. The adaptive longitudinal phase then proceeds over t ∈ {1, ..., T}. At each step, the planner π_θ evaluates the current history H_t together with the acquired context s̃ to produce a future temporal acquisition plan M. REACT acquires features only at the next selected timepoint, skipping acquisition at intermediate timesteps, and may terminate early if no further acquisitions are planned. Predictions ŷ_t are made at each timestep using the information available up to t, and the process continues until termination or the horizon T is reached.

Algorithm 2 REACT Inference-Time Acquisition
Require: trained context selector α; trained planner π_θ; predictor f_φ.
Ensure: Predictions {ŷ_t}_{t=1}^T
 1: Initialize: predictions ← []
 2: Initialize acquired history: H ← (0, ..., 0) ∈ R^{T×d}
 3: // Stage 1: Contextual descriptor acquisition
 4: m_s ← I[σ(α) > 0.5]
 5: s̃ ← 0;  Acquire s_j for m_{s,j} > 0
 6: M ← I[σ(π_θ(H, 0, s̃)) > 0.5]
 7: t_next ← min{t' | ‖M[t']‖ > 0}   // next time with nonzero mask
 8: m_next ← M[t_next]   // next acquisition mask
 9: // Stage 2: Adaptive temporal acquisition and prediction
10: for t = 1 to T do
11:   if t == t_next then
12:     // Acquire and replan
13:     x ← 0
14:     Acquire x_j for m_{next,j} > 0
15:     H[t] ← x
16:     M ← K_{>t}( I[σ(π_θ(H, t, s̃)) > 0.5] )   // future mask plan
17:     t_next ← min{t' | ‖M[t']‖ > 0}   // or ∅ for termination
18:     m_next ← M[t_next]
19:   end if
20:   // Predict
21:   ŷ_t ← f_φ(H, s̃, t)
22:   append ŷ_t to predictions
23: end for
24: Return predictions

4 Experiment

4.1 Datasets

We evaluate REACT on four real-world longitudinal datasets spanning two application families: behavioral longitudinal datasets (CHEEARS, ILIADD) and clinical longitudinal datasets (OAI, ADNI). Full dataset and feature details can be found in Appx. E.

4.1.1 Behavioral Longitudinal Datasets.

CHEEARS. The CHEEARS dataset [28] comprises ecological momentary assessment (EMA) data from 204 college students. Each participant completed daily surveys assessing affect, drinking behaviors, and social context. For this study, we divided the data into sliding windows of 10 consecutive days and set the target to be next-day drinking behavior prediction (binary classification). We split the data temporally by a fixed cutoff date, such that training instances correspond to observations before the cutoff and test instances correspond to observations after it. The onboarding data contains demographics, AUDIT alcohol screening scores, drinking motives (DMQ: social, coping, enhancement, conformity), alcohol consequences (YAACQ: 9 subscales), personality traits (NEO Big Five), and interpersonal functioning (IIP: dominance, affiliation, elevation). The temporal data contains daily affect (10 items: happy, stressed, anxious, etc.), drinking urges/plans/quantity expectations, and social experiences. We assigned an equal feature cost of 1 to all features. The dataset is split into train/val/test (60.6%, 18.9%, 20.5%).

ILIADD. The ILIADD (Intensive Longitudinal Investigation of Alternative Diagnostic Dimensions) [29] dataset is a fully remote ambulatory assessment study (N = 544 participants with ≥ 5 EMA surveys completed) for mental health treatment history (81% reporting prior treatment). The contextual features consist of demographics and scales derived from the HiTOP-SR, factor scales, symptom scales, and alcohol use scales. In the study, this set of questions was prompted to participants 8 times per day.
Therefore, we used these collected answers throughout the day as temporal features; these assessed momentary positive affect, energy, stress, and impulsivity (4 items). We split the data temporally by a fixed cutoff date/time. For this study, we divided the data into consecutive sliding windows of 10 time steps, and the label negative affect is predicted for each timepoint. We assigned an equal feature cost of 1 to all features. The dataset is split into train/val/test (69.5%, 15.2%, 15.2%).

4.1.2 Clinical Longitudinal Datasets.

OAI. The Osteoarthritis Initiative (OAI)¹ is a longitudinal cohort that tracks knee osteoarthritis progression with annual follow-up up to 96 months for 4,796 patients using imaging and clinical assessments. We select d_s = 10 contextual features (e.g., sex, race, age) and d = 17 temporal features (including clinical measurements and 10 joint space width (JSW) features extracted from knee radiography) over T = 7 visits. Following Chen et al. [2] and Nguyen et al. [22], we consider two prediction targets at each visit: (i) Kellgren-Lawrence grade (KLG [11]; range 0-4), where we merge grades 0 and 1, and (ii) WOMAC pain [20] (range 0-20), where we define WOMAC < 5 as no pain and ≥ 5 as pain. We assigned lower costs to low-effort questionnaire/clinical variables (e.g., 0.3-0.5) and higher costs to JSW features extracted from knee radiographs (e.g., 0.8-1.0). Following Chen et al. [2], we split the dataset into train/val/test (50%, 12.5%, 37.5%).

ADNI. The Alzheimer's Disease Neuroimaging Initiative (ADNI)² [26] is a longitudinal, multi-center, observational study for tracking Alzheimer's disease progression. Following [27], we use a benchmark of N = 1,002 participants with regular follow-up every six months and consider the first T = 12 visits.
The onboarding context consists of d_s = 7 descriptors (e.g., age, gender, Functional Activities Questionnaire), while the temporal feature bank contains d = 4 imaging biomarkers: FDG and AV45 from PET, and Hippocampus and Entorhinal from MRI. At each visit, the goal is to predict the patient's current disease status as normal cognition, mild cognitive impairment, or Alzheimer's disease [24, 25]. Context acquisition costs are set to 0.3, and higher costs are assigned to PET biomarkers (1.0 each) and MRI biomarkers (0.5 each) to reflect the greater expense and burden of PET and MRI imaging. Following Qin et al. [27], we split the dataset into train/val/test (64%, 16%, 20%).

¹ https://nda.nih.gov/oai/
² https://adni.loni.usc.edu/

4.2 Baselines

We compare our framework against several strong AFA baselines, focusing on RL-based methods for longitudinal settings. These include ASAC [40], an actor-critic approach that jointly optimizes feature selection and prediction; RAS [27], which dynamically determines acquisition timing and feature subsets over continuous time; and its variant AS [27], which enforces uniform acquisition intervals.

We also include DIME [5], a representative non-longitudinal AFA framework. To adapt it to our setting, we restrict its action space to the current and future timesteps only. This gives DIME an advantage, as it may acquire multiple features at the present timestep before advancing.

Importantly, existing baselines do not natively distinguish the onboarding context s from the temporal measurements x_t. To evaluate them fairly, we modify their observation space at every timestep t to include static features as x'_t = [s, x_t] ∈ R^{d_s + d}. Thus, the baselines can acquire the context at any time if they missed it earlier, giving them an advantage.
However, the baselines must learn to acquire context early and to avoid redundantly spending the acquisition cost c_s on the same unchanging features at future steps. To ensure a fair comparison, we started from the authors' official implementations and applied the above adaptations. We tuned the hyperparameters for each method on the validation set and report the resulting test performance; exhaustive setup details are in Appx. B.

4.3 Implementation Details

4.3.1 Predictor Architecture & Training. For prediction at target timestep t', the predictor takes as input the acquired temporal history H_{t'}, the acquired context s̃, and a normalized time indicator for t'. These are passed through a 3-layer MLP producing class logits (see details in Appx. C). The predictor is pre-trained with random masking, then jointly trained with the planner during acquisition policy learning.

4.3.2 Planner Architecture & Training. The planner π_θ consists of 3 hidden layers with dimensions [512, 256, 128] and ReLU activations. The planner maps the acquired data to a mask over future acquisitions (see Sec. 3.3). We trained the planner network with a total of 1K batches using our self-iterative training procedure (Sec. 3.5). Additional details may be found in Appx. C. Please see timing results in Appx. Fig. 13, where REACT achieves training and inference times that are faster than or comparable to recent deep learning approaches such as DIME, RAS, and AS. We will open-source the REACT code upon publication.

4.4 Performance-Cost Tradeoff

In Fig. 3, we evaluate REACT on five longitudinal prediction tasks and plot the AUROC vs. the total average acquisition cost (see AUPRC figures, which mostly follow the same trends, in Appx. A). Here, the total cost includes both the one-time onboarding context costs and the temporal feature acquisition costs. For REACT, one can learn the policy at different cost budget ranges by sweeping the cost coefficient λ in Eq. (3), which controls the tradeoff between prediction loss and acquisition cost (see Appx. Tab. 1 for the values we use for λ). For the baselines, we analogously vary each method's cost-sensitive hyperparameter to obtain different average costs (more details in Appx. B). Thus, each data point in a curve corresponds to a different performance-cost setting, and a method is preferred if it achieves higher performance at a lower total cost (i.e., toward the upper left). That is, the y-axes show the predictive performance (e.g., AUROC or AUPRC) of the models (at various cost/benefit tradeoffs) over the test set. Correspondingly, we record the average cost incurred from acquisitions during inference (at various cost/benefit tradeoffs).

Since the total acquisition cost includes the one-time onboarding context cost, part of REACT's advantage comes from learning a selective subset of contextual descriptors rather than acquiring all onboarding variables. This learned context selector α provides an informative initialization for the downstream temporal planner π_θ; examples of dataset-level context selection are provided in Appx. F, and feature descriptions are given in Appx. E.

Figure 3: AUROC/total cost of models across various average acquisition costs (budgets) on test data. The dashed line evaluates the pretrained classifier from REACT with all features available.

Figure 4: Example feature acquisition rollouts using REACT for two distinct instances from the (a) CHEEARS and (b) WOMAC test sets. For visual clarity, only the selected features for these instances are displayed.

Behavioral Datasets. On CHEEARS and ILIADD, REACT consistently provides the best overall tradeoff. The gains are particularly clear on ILIADD, where it outperforms DIME, RAS, and ASAC across all budgets, approaching the performance of the all-features baseline at a moderate cost.

Clinical Datasets.
On KLG and WOMAC, REACT again yields the strongest performance across all budget constraints compared to the baselines. On ADNI, the performance gap narrows, with DIME also performing strongly; however, REACT remains highly competitive and achieves slightly better overall results. For additional results on the AUPRC metric, please see Appx. A. With AUPRC, REACT shows a similar trend, outperforming the baselines.

4.5 Analysis of Learned Acquisition Behavior

To better understand the acquisition policies learned by REACT, we provide additional analyses across datasets. Before discussing these analyses, we first detail the construction of the visualizations. Fig. 5 and Fig. 6 show the learned acquisition trajectories across datasets. For each dataset, the top panel displays the average temporal acquisition cost incurred per timestep, while the middle panel shows the termination (i.e., stop acquiring) probability distribution. The bottom panel illustrates the temporal acquisition plan as a directed graph across timesteps (y-axis). Nodes represent specific features acquired at a given timestep, with size and color intensity reflecting the overall acquisition frequency. Edges show the directed transitions between acquired nodes: if a sample acquires feature m at one timestep t and feature n at a future acquisition step t' > t, a directed edge m@t → n@t' is added. The visual weight of the edge (thickness and darkness) corresponds to the fraction of samples exhibiting that transition. For visual clarity, we only show the features actually acquired across the trajectories.

Fig. 5 shows that REACT learns markedly different longitudinal acquisition policies across datasets, rather than applying a uniform sparsification strategy.
Across tasks, acquisition is often front-loaded, with higher measurement cost in early steps followed by progressively sparser follow-up, suggesting that early observations are frequently most useful for routing later decisions. The stop-probability distributions further indicate adaptive termination behavior: in CHEEARS and ADNI, stopping mass is spread across intermediate and later steps, consistent with selective early stopping once sufficient evidence is gathered, whereas in ILIADD, WOMAC, and KLG, termination under the policy shown here is concentrated closer to the end of the horizon, indicating that continued monitoring is often preferred at this operating point. The trajectory graphs reinforce this interpretation by showing structured transition patterns, where early acquisitions branch into different later feature sequences rather than repeatedly selecting the same variables in a fixed order. Fig. 4 further shows that REACT adapts feature acquisition at the instance level: (a) for CHEEARS, only the first instance acquires the drink_expectancies and happy features, and (b) for WOMAC, only the second instance acquires JSW_3, and the policy stops acquiring once predictions stabilize, i.e., when additional measurements are unlikely to change the predicted label, saving budget. We note that these visualizations correspond to REACT policies learned for a single respective acquisition cost tradeoff parameter λ.

Figure 5: Qualitative visualization of longitudinal acquisition dynamics by REACT across datasets. Metrics are reported per dataset on the test set as (Total/Longitudinal Costs | AUROC/AUPRC): ILIADD (35.739/23.993 | 0.842/0.706), CHEEARS (13.275/11.275 | 0.673/0.540), ADNI (12.635/10.535 | 0.823/0.678), WOMAC (30.217/28.141 | 0.670/0.355), KLG (29.243/26.869 | 0.812/0.621). For visual clarity, only the selected features are displayed, and the stop probabilities are rounded.

Figure 6: Qualitative comparison of acquisition by REACT, DIME, and RAS on the ADNI dataset. Metrics are reported per framework on the test set as (Total/Longitudinal Costs | AUROC/AUPRC): REACT (5.335/3.235 | 0.824/0.683), DIME (5.619/3.630 | 0.644/0.483), RAS (8.675/4.193 | 0.817/0.684).

The dataset-specific patterns are consistent with the underlying measurement types. In CHEEARS, REACT concentrates on a relatively small subset of daily EMA variables, including affective states, drinking-related expectations, and social-context items, with acquisition frequency declining over time. This suggests that for next-day drinking prediction, a limited set of recent mood, motivation, and context measurements often captures much of the useful signal, after which additional daily acquisitions provide diminishing value. In ILIADD, the learned policy repeatedly acquires the compact set of momentary EMA items (positive affect, energy, stress, and impulsivity) across the horizon. Because the target is negative affect at each timepoint and the temporal bank itself consists of repeatedly sampled within-day EMA variables, this sustained reuse of a small dynamic feature set is behaviorally plausible and aligns with the strong predictive performance on this dataset. Notably, this comparatively static acquisition pattern illustrates that REACT adapts the degree of temporal policy dynamicity to the dataset, rather than enforcing diverse trajectories when repeated measurement of the same small feature set is most useful. In ADNI, REACT focuses on the four longitudinal imaging biomarkers (Hippocampus, Entorhinal, FDG, and AV45), with a transition structure suggesting earlier use of lower-cost MRI markers and more selective progression to higher-cost PET biomarkers when needed. The spread of stopping mass across visits is likewise consistent with adaptive escalation when later acquisitions differ substantially in cost and burden.
In OAI (WOMAC and KLG), the learned policies (under the chosen λ) show broader early use of the available longitudinal clinical and radiographic variables, followed by gradual narrowing over later visits. These patterns suggest that, when allowed a less restrictive budget, REACT can capitalize on that flexibility by acquiring a wider set of early clinical and radiographic measurements before concentrating on a smaller follow-up subset. The denser early graphs likely reflect the increased budget of the selected cost-benefit tradeoff λ rather than an inherent need for broader acquisition. Overall, these visualizations suggest that REACT allocates longitudinal acquisition effort in a task-dependent manner, adapting both feature choice and stopping behavior to the temporal structure, feature semantics, and relative acquisition costs of each dataset.

Acquisition Trajectories of REACT vs. Baselines. We compare against the DIME and RAS baselines at a similar average cost in Fig. 6. REACT achieves better performance than DIME at a similar cost and comparable performance to RAS at a lower cost. Moreover, one may see that REACT acquires informative biomarkers early and then terminates quickly (98% of trajectories terminate by timestep two).

Figure 7: Ablation study of the self-iterative training used for training REACT (Alg. 1). We compare AUROC/total cost when the planner π_θ is trained with (i) self-iterative training from random initialization (Self Training) and (ii) only offline reference states (Offline Reference).

Figure 8: Ablation study on how the learned context selector affects temporal feature acquisition. AUROC of REACT with learned context descriptors compared to when context descriptors are all acquired (REACT-all) or not acquired at all (REACT-none) across datasets.
This behavior suggests a time-aware strategy, where the policy invests early to reduce the chance of missing high-value signals and stops once additional measurements are unlikely to change the prediction. In contrast, DIME and RAS spread their acquisitions over time, showing later stopping distribution mass and extended acquisition chains. This suggests the baselines struggle to stop (even) when they have enough information.

4.6 Ablation of Context Selection

We ablate the onboarding context selector α against two variants: (i) REACT-all, which acquires all context features, and (ii) REACT-none, which uses none. As shown in Fig. 8, the learned selector achieves higher AUROC at comparable cost on CHEEARS, ILIADD, and WOMAC. On KLG, all three variants perform similarly, suggesting context contributes little predictive value. On ADNI, removing context hurts performance, while REACT remains competitive with REACT-all, indicating the selector learns to acquire useful context without needing all of it. Overall, the benefit comes not from more context but from a learned selective subset that better supports downstream temporal acquisition. AUPRC results in Appx. Fig. 11 show a consistent pattern.

4.7 Ablation of Self-Iterative Training

To validate self-iterative training, we ablate it against a variant trained solely on static offline reference plans (see Appx. A.2). As shown in Fig. 7, self-iterative training consistently outperforms the offline reference across all datasets. The gap is most pronounced on the clinical datasets (WOMAC, KLG, ADNI), where the offline variant produces unstable, non-monotonic curves at higher cost budgets, suggesting increasing misalignment between reference plans and the states the planner actually induces at test time. This confirms that exposing the planner to its own rollout states is critical for effective policy learning. Additional AUPRC results are in Appx. Fig. 10.
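The training schedule behind this ablation (as described in Sec. 4.3.2 and Appx. C: 1K total batches, a short warmup on offline reference states, then on-policy self-training) can be sketched in a few lines. This is a hedged illustration; the function and phase names are ours, not the authors' code.

```python
# Hedged sketch of REACT's two-phase self-iterative schedule: a short warmup
# on offline reference states, followed by on-policy self-training batches.
# Names ("offline_reference", "on_policy") are illustrative placeholders.

def training_phases(total_batches=1000, warmup_batches=50):
    """Return the phase label for each training batch, in order."""
    return [
        "offline_reference" if b < warmup_batches else "on_policy"
        for b in range(total_batches)
    ]
```

With the defaults above, the first 50 batches would use offline reference states and the remaining 950 would be self-training, matching the counts reported in Appx. C.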
5 Conclusion

We presented REACT, which formalizes a practical gap in the LAFA literature: the explicit separation of onboarding context from temporal features, a two-phase structure common to real clinical workflows. REACT yields an end-to-end differentiable framework for LAFA that jointly optimizes onboarding context selection and adaptive temporal acquisition planning under cost constraints. By replacing discrete optimization with a Gumbel-Sigmoid relaxation, REACT trains a unified acquisition policy without the instability and credit-assignment difficulties of RL-based approaches. Importantly, the context selector is optimized not in isolation, but to select onboarding descriptors that best support the downstream planner's cost-benefit tradeoff, learning a population-level context strategy tailored to longitudinal acquisition efficiency.

We provide both quantitative results, such as cost-performance curves, and qualitative analyses of the feature acquisition process on five real-world longitudinal tasks with context features spanning behavioral and clinical health domains. Across these datasets, REACT consistently achieves superior predictive performance at the same acquisition cost compared to existing LAFA baselines. Qualitative analysis reveals that REACT learns meaningful, dataset-specific acquisition strategies: acquiring informative features early and terminating once predictions stabilize. Ablation studies confirm that both the learned context selector and the self-iterative training procedure contribute meaningfully to REACT's superior performance, by identifying a selective subset of onboarding descriptors that best support downstream temporal decisions and by exposing the planner to on-policy states that better reflect test-time conditions, respectively. We anticipate that this work serves as a foundation for future research on cost-aware longitudinal decision-making.
In the longer term, approaches such as REACT may support more efficient, patient-centered care by reducing unnecessary measurement burden, lowering costs, and improving the allocation of expensive diagnostic resources.

References

[1] OL Aiyegbusi, J Roydhouse, SC Rivera, P Kamudoni, P Schache, R Wilson, R Stephens, and M Calvert. 2022. Key considerations to reduce or address respondent burden in patient-reported outcome (PRO) data collection. Nature Communications 13, 1 (2022), 6026.
[2] Boqi Chen, Junier Oliva, and Marc Niethammer. 2024. A unified model for longitudinal multi-modal multi-view prediction with missingness. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 410–420.
[3] Diane Cook, Aiden Walker, Bryan Minor, Catherine Luna, Sarah Tomaszewski Farias, Lisa Wiese, Raven Weaver, Maureen Schmitter-Edgecombe, et al. 2025. Understanding the Relationship Between Ecological Momentary Assessment Methods, Sensed Behavior, and Responsiveness: Cross-Study Analysis. JMIR mHealth and uHealth 13, 1 (2025), e57018.
[4] Ian Connick Covert, Wei Qiu, Mingyu Lu, Na Yoon Kim, Nathan J White, and Su-In Lee. 2023. Learning to maximize mutual information for dynamic feature selection. In International Conference on Machine Learning. PMLR, 6424–6447.
[5] Soham Gadgil, Ian Connick Covert, and Su-In Lee. 2024. Estimating Conditional Mutual Information for Dynamic Feature Selection. In The Twelfth International Conference on Learning Representations.
[6] Aritra Ghosh and Andrew Lan. 2023. DiFA: Differentiable feature acquisition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 7705–7713.
[7] Wenbo Gong, Sebastian Tschiatschek, Sebastian Nowozin, Richard E Turner, José Miguel Hernández-Lobato, and Cheng Zhang. 2019. Icebreaker: Element-wise efficient information acquisition with a Bayesian deep latent Gaussian model. Advances in Neural Information Processing Systems 32 (2019).
[8] Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. 2011. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745 (2011).
[9] Eric Jang, Shixiang Shane Gu, and Ben Poole. 2016. Categorical Reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144 (2016).
[10] Jaromír Janisch, Tomáš Pevný, and Viliam Lisý. 2020. Classification with costly features as a sequential decision-making problem. Machine Learning 109, 8 (2020), 1587–1615.
[11] Jonas H Kellgren, JS Lawrence, et al. 1957. Radiological assessment of osteo-arthrosis. Annals of the Rheumatic Diseases 16, 4 (1957), 494–502.
[12] Utkarsh Mahadeo Khaire and R Dhanalakshmi. 2022. Stability of feature selection algorithm: A review. Journal of King Saud University-Computer and Information Sciences 34, 4 (2022), 1060–1073.
[13] Patrick Kidger, James Morrill, James Foster, and Terry Lyons. 2020. Neural controlled differential equations for irregular time series. Advances in Neural Information Processing Systems 33 (2020), 6696–6707.
[14] Ksenia Konyushkova, Raphael Sznitman, and Pascal Fua. 2017. Learning active learning from data. Advances in Neural Information Processing Systems 30 (2017).
[15] Jannik Kossen, Cătălina Cangea, Eszter Vértes, Andrew Jaegle, Viorica Patraucean, Ira Ktena, Nenad Tomasev, and Danielle Belgrave. 2023. Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task. Transactions on Machine Learning Research (2023).
[16] Yang Li and Junier Oliva. 2021. Active feature acquisition with generative surrogate models. In International Conference on Machine Learning. PMLR, 6450–6459.
[17] Chao Ma, Sebastian Tschiatschek, Konstantina Palla, José Miguel Hernández-Lobato, Sebastian Nowozin, and Cheng Zhang. 2019. EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE. In International Conference on Machine Learning. PMLR, 4234–4243.
[18] David JC MacKay. 1992. Information-based objective functions for active data selection. Neural Computation 4, 4 (1992), 590–604.
[19] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. 2016. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. arXiv preprint arXiv:1611.00712 (2016).
[20] Sara McConnell, Pamela Kolopack, and Aileen M Davis. 2001. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC): a review of its utility and measurement properties. Arthritis Care & Research 45, 5 (2001), 453–461.
[21] Jianyu Miao and Lingfeng Niu. 2016. A survey on feature selection. Procedia Computer Science 91 (2016), 919–926.
[22] Khanh Nguyen, Huy Hoang Nguyen, Egor Panfilov, and Aleksei Tiulpin. 2024. Active Sensing of Knee Osteoarthritis Progression with Reinforcement Learning. arXiv preprint arXiv:2408.02349 (2024).
[23] Alexander Norcliffe, Changhee Lee, Fergus Imrie, Mihaela van der Schaar, and Pietro Liò. 2025. Stochastic Encodings for Active Feature Acquisition. arXiv preprint arXiv:2508.01957 (2025).
[24] Sid E O'Bryant, Laura H Lacritz, James Hall, Stephen C Waring, Wenyaw Chan, Zeina G Khodr, Paul J Massman, Valerie Hobson, and C Munro Cullum. 2010. Validation of the new interpretive guidelines for the Clinical Dementia Rating scale Sum of Boxes score in the National Alzheimer's Coordinating Center database. Archives of Neurology 67, 6 (2010), 746–749.
[25] Sid E O'Bryant, Stephen C Waring, C Munro Cullum, James Hall, Laura Lacritz, Paul J Massman, Philip J Lupo, Joan S Reisch, Rachelle Doody, Texas Alzheimer's Research Consortium, et al. 2008. Staging dementia using Clinical Dementia Rating Scale Sum of Boxes scores: a Texas Alzheimer's Research Consortium study. Archives of Neurology 65, 8 (2008), 1091–1095.
[26] Ronald Carl Petersen, Paul S Aisen, Laurel A Beckett, Michael C Donohue, Anthony Collins Gamst, Danielle J Harvey, CR Jack Jr, William J Jagust, Leslie M Shaw, Arthur W Toga, et al. 2010. Alzheimer's Disease Neuroimaging Initiative (ADNI) clinical characterization. Neurology 74, 3 (2010), 201–209.
[27] Yuchao Qin, Mihaela van der Schaar, and Changhee Lee. 2024. Risk-averse active sensing for timely outcome prediction under cost pressure. Advances in Neural Information Processing Systems 36 (2024).
[28] Whitney R. Ringwald, Kasey G. Creswell, Carissa A. Low, Afsaneh Doryab, Tammy Chung, Junier Oliva, Zachary F Fisher, Kathleen M Gates, and Aidan G. C. Wright. 2025. Common and uncommon risky drinking patterns in young adulthood uncovered by person-specific computational modeling. Psychology of Addictive Behaviors (2025).
[29] Whitney R. Ringwald, Colin E. Vize, and Aidan G. C. Wright. 2025. Do you feel what I feel? The relation between congruence of perceived affect and self-reported empathy in daily life social situations. Emotion (2025).
[30] Sindre Rolstad, John Adler, and Anna Rydén. 2011. Response burden and questionnaire length: is shorter better? A review and meta-analysis. Value in Health 14, 8 (2011), 1101–1108.
[31] Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 627–635.
[32] Maytal Saar-Tsechansky, Prem Melville, and Foster Provost. 2009. Active feature-value acquisition. Management Science 55, 4 (2009), 664–684.
[33] Victor S Sheng and Charles X Ling. 2006. Feature value acquisition in testing: a sequential batch test algorithm. In Proceedings of the 23rd International Conference on Machine Learning. 809–816.
[34] Saul Shiffman, Arthur A Stone, and Michael R Hufford. 2008. Ecological momentary assessment. Annual Review of Clinical Psychology 4, 1 (2008), 1–32.
[35] Hajin Shim, Sung Ju Hwang, and Eunho Yang. 2018. Joint active feature acquisition and classification with variable-size set encoding. Advances in Neural Information Processing Systems 31 (2018).
[36] Laura Swinckels, Frank C Bennis, Kirsten A Ziesemer, Janneke FM Scheerman, Harmen Bijwaard, Ander de Keijzer, and Josef Jan Bruers. 2024. The use of deep learning and machine learning on longitudinal electronic health records for the early detection and prevention of diseases: scoping review. Journal of Medical Internet Research 26 (2024), e48320.
[37] Michael Valancius, Maxwell Lennon, and Junier Oliva. 2024. Acquisition Conditioned Oracle for Nongreedy Active Feature Acquisition. In International Conference on Machine Learning. PMLR, 48957–48975.
[38] B Venkatesh and J Anuradha. 2019. A review of feature selection and its methods. Cybernetics and Information Technologies 19, 1 (2019), 3–26.
[39] Haiyan Yin, Yingzhen Li, Sinno Jialin Pan, Cheng Zhang, and Sebastian Tschiatschek. 2020. Reinforcement learning with efficient active feature acquisition. arXiv preprint arXiv:2011.00825 (2020).
[40] Jinsung Yoon, James Jordon, and Mihaela van der Schaar. 2019. ASAC: Active sensing using Actor-Critic models. In Machine Learning for Healthcare Conference. PMLR, 451–473.

Appendix

A Additional Results for Experiments and Ablations

A.1 AUPRC Results

We provide additional AUPRC results for the cost/performance tradeoff experiments presented in the main text. Fig. 9 shows the experiment results, Fig. 11 the context selector ablation results, and Fig. 10 the oracle vs. self-training ablation results.
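The curves in these figures report AUROC and AUPRC at each cost setting. As a reference point, AUROC is equivalent to the normalized Mann-Whitney U statistic, which can be computed without any ML library; a minimal self-contained sketch (our helper, not the paper's evaluation code):

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the probability that a random
    positive example is scored above a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, matching standard library implementations on the same inputs.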
A.2 Warmup Ablation

While offline references can provide states to train the planner network, they have a fundamental limitation: one trajectory per instance. Online refinement solves this via on-policy rollouts: executing π_θ generates diverse acquisitions, exposing the policy to more states. We ablate on the type of states seen during training to evaluate whether training on states output by the planner network π_θ helps generalization to test data. We ablate three strategies: (1) Offline Reference (OR) Warmup (REACT), (2) No Warmup (pure online), and (3) Offline Reference Only. We detail later in this section how the reference plan datasets for warmup are created. Fig. 12 shows that OR Warmup achieves cost-performance tradeoffs comparable to No Warmup on most datasets, and better on WOMAC.

Figure 12: Ablation on states seen during policy training.

Offline Reference Warmup. In this section, we detail how to create the dataset of reference plans, which is then used to warm up the planner π_θ. At a high level, because training trajectories are fully observed, we can evaluate candidate acquisition plans under the same accuracy-cost tradeoff used in our objective and select the best-scoring plan for each training instance. We then initialize training episodes with these instance-specific reference plans, rather than random states, to provide a stronger starting point for the planner before subsequent self-iterative refinement with on-policy rollouts.

Let m_s ∈ {0, 1}^{d_s} denote the contextual acquisition mask, and let M ∈ {0, 1}^{T × d} denote the temporal acquisition plan, with row m_t ∈ {0, 1}^d corresponding to the acquired temporal features at time t. Given a training trajectory (s, x, y) ∈ D_train, where y = (y_1, ..., y_T), we define s̃ = m_s ⊙ s and x̃_t = m_t ⊙ x_t for t = 1, ..., T. Let

    H_t(M) = (x̃_1, ..., x̃_t, 0, ..., 0)    (9)

denote the masked temporal history available up to time t.
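The masked quantities above, together with the plug-in scoring described next, can be sketched directly in code. This is a hedged illustration: the predictor, loss, and cost vectors passed in are hypothetical placeholders for the paper's trained components f_φ, L_pred, c_s, and c_x.

```python
# Hedged sketch of the reference-plan machinery: the zero-padded masked
# history H_t(M) of Eq. (9), and a plug-in score in the spirit of Eq. (10)
# (prediction loss summed over time plus lambda-weighted acquisition costs).

def masked_history(x, M, t):
    """H_t(M): rows m_u * x_u for observed steps u <= t, zeros afterwards."""
    T, d = len(x), len(x[0])
    return [
        [M[u][j] * x[u][j] if u < t else 0.0 for j in range(d)]
        for u in range(T)
    ]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def plan_score(m_s, M, x, s, y, predict, loss, c_s, c_x, lam):
    """J_lambda-style score for a candidate plan (m_s, M) on one trajectory."""
    T = len(x)
    s_tilde = [m * si for m, si in zip(m_s, s)]  # masked context
    pred_term = sum(
        loss(predict(masked_history(x, M, t), s_tilde, t), y[t - 1])
        for t in range(1, T + 1)
    )
    cost_term = lam * (dot(c_s, m_s) + sum(dot(c_x, M[t]) for t in range(T)))
    return pred_term + cost_term
```

A reference plan would then be the argmin of `plan_score` over K randomly sampled candidate masks, as in Eqs. (11)-(12).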
We then score a candidate plan (m_s, M) using the plug-in objective

    J_λ(m_s, M; x, s, y) = Σ_{t=1}^{T} L_pred( f_φ(H_t(M), s̃, t), y_t ) + λ ( c_s^⊤ m_s + Σ_{t=1}^{T} c_x^⊤ m_t )    (10)

Because (s, x, y) is fully observed during training, J_λ is directly computable and can therefore be used to compare candidate discrete acquisition plans, as described below. For each training instance i, we generate a set of candidate acquisition plans (K = 1000 in our experiments)

    Candidate_i = { (m_s^(k), M^(k)) }_{k=1}^{K},    (11)

and select the best plan, referred to as the reference plan, by

    (m_s^*, M^*) ∈ argmin_{(m_s, M) ∈ Candidate_i} J_λ(m_s, M; x_i, s_i, y_i).    (12)

In practice, rather than performing an exhaustive search over Candidate_i, we construct the candidate set by sampling discrete plans, e.g., uniformly at random. After obtaining reference plans for all training instances, we form a reference dataset

    D_ref = { ((s_i, x_i, y_i), (m_s^*, M^*)) }_{i=1}^{N},    (13)

which maps each fully observed training trajectory to its best-scoring acquisition plan and is used to warm-start the planner π_θ. We use D_ref to warm-start the policy parameters α and θ of the planner π_θ by minimizing the prediction loss of the classifier when the planner sees the states from the reference masks.

B Baseline Implementations

We evaluate our proposed framework against several strong baseline methods for traditional AFA and LAFA settings. Importantly, existing baselines do not natively distinguish the onboarding context s from the temporal measurements x_t. To ensure a fair comparison, we restructure the observation space for all baseline models by concatenating both feature types at every step, obtaining the new state vector x'_t = [s, x_t] ∈ R^{d_s + d}. This grants the baselines flexibility, as they can acquire contextual descriptors they missed earlier.
Consequently, the baseline policies have to discover that context should be acquired early, while actively learning to avoid re-measuring (and re-incurring the cost penalty c_s for) the same context data in subsequent visits.

Figure 9: AUPRC/total cost of models across various average acquisition costs (budgets) on test data. The dashed line evaluates the pretrained classifier from REACT with all features available.

Figure 10: Ablation study of the self-iterative training for REACT (Alg. 1). We compare AUPRC/total cost when the planner π_θ is trained with (i) self-iterative training from random initialization (Self Training) and (ii) only offline reference states (Offline Reference).

Figure 11: Ablation study on how the learned context selector affects temporal feature acquisition. AUPRC of REACT with learned context descriptors compared to when context descriptors are all acquired (REACT-all) or not acquired at all (REACT-none) across datasets.

To ensure a fair comparison, we evaluated all baselines by adapting the authors' official implementations. For ASAC, RAS, and AS, we use the implementation available at https://github.com/yvchao/cvar_sensing, which is under the BSD-3-Clause license. For DIME, building on the official implementation available at https://github.com/suinleelab/DIME, we adapt the method to the longitudinal setting by restricting acquisition to current or future timesteps. Following [27], ASAC, RAS, and AS share the same neural CDE predictor [13]. Following the released code, the drop rate p ∈ {0.0, 0.3, 0.5, 0.7} of the auxiliary observation strategy π_0 is treated as a hyperparameter and selected according to the outcome predictor's average accuracy over multiple randomly masked evaluation sets generated by π_0. The selected drop rates are: p = 0.7 (ILIADD), 0.7 (CHEEARS), 0.3 (ADNI), 0.3 (WOMAC), 0.0 (KLG).

ASAC [40]. For ASAC, we tune the acquisition-cost coefficient μ on the validation set.
The search grids are: μ ∈ {0.02, 0.005, 0.002} (ILIADD), {0.01, 0.005, 0.003, 0.002} (CHEEARS), {0.02, 0.01, 0.001} (ADNI), {0.1, 0.01, 0.005, 0.001} (WOMAC), {0.00175, 0.00125} (KLG). Note that sweeping the μ parameter yields a diverse set of ASAC policies, each corresponding to a different average acquisition cost.

RAS [27]. For RAS, we set the acquisition interval range (Δ_min, Δ_max) = (0.5, 1.5) for all datasets. We further tune the diagnostic-error coefficient γ_RAS on the validation set via grid search: γ_RAS ∈ {5, 10, 15} (ILIADD), {5, 10, 15} (CHEEARS), {50, 75, 100, 175} (ADNI), {500, 750, 1000, 1250} (WOMAC), {500, 750, 1000, 1250} (KLG). The final selected values of γ_RAS depend on the target acquisition budget. For all datasets, we use a tail-risk quantile of 0.1, an invalid-visit penalty of 10, and a discount factor of 0.99, following the authors' defaults.

AS [27]. For AS, we use the same minimum and maximum acquisition-interval constraints as in the RAS setting. In addition, we tune a fixed acquisition interval Δ̃ for each dataset: Δ̃ ∈ {0.5, 1.0, 1.5} (ILIADD), {1.0, 1.5, 2.0} (CHEEARS), {0.2, 0.4, 0.5, 1.0, 1.5} (ADNI), {1.0, 1.5, 2.0} (WOMAC), {0.2, 0.4, 0.5, 1.0, 1.5} (KLG). As with RAS, different final values of Δ̃ may be reported for different acquisition budgets.

DIME [5]. We extend DIME to the longitudinal setting while preserving its original greedy acquisition mechanism. In our implementation, DIME is restricted to selecting features from the current or future time points only. Since DIME acquires one feature at a time, it may make repeated selections within the same time step before moving forward in time. To keep the comparison controlled, both the prediction network and value network of DIME share the same architecture as REACT.
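The observation-space adaptation shared by all baselines in this appendix (concatenating the static onboarding context with each temporal measurement) can be sketched in one helper. This is an illustrative sketch of the data layout only; the function name is ours.

```python
# Hedged sketch of the baseline adaptation: at every timestep, the static
# onboarding context s is concatenated with the temporal measurement x_t,
# so context-agnostic AFA baselines can still target (and re-acquire)
# contextual descriptors at any visit.

def augment_observations(s, xs):
    """Build x'_t = [s, x_t] in R^{d_s + d} for each timestep t."""
    return [list(s) + list(x_t) for x_t in xs]
```

For example, with d_s = 2 context values and d = 1 temporal value per visit, each augmented observation has dimension d_s + d = 3, and the context block repeats unchanged across timesteps, which is exactly the redundancy the baselines must learn not to re-purchase.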
C Additional Implementation and Training Details on REACT

Details of the architecture and training procedure for REACT are as follows:
• Predictor: 3-layer MLP with ReLU activations; all hidden layers have size 64; pre-trained with cross-entropy loss, dropout with 𝜌 = 0.4, random masking of the input where each subset size is equally likely, and early stopping on the validation set.
• Contextual descriptor selector: a vector of logits with dimension equal to the number of context descriptors.
• Planner: 3-layer MLP with ReLU activations; hidden layer sizes are [512, 256, 128]; the input time indicator is embedded with a sinusoidal time embedding of dimension 64.
• Training procedure: For the results in Fig. 3 and Fig. 9, we jointly trained the above three components for 1000 mini-batches of size 64 on all datasets, optimizing L_REACT. The first 50 are warm-up batches using the offline reference, and the remaining 950 are self-training.

Hyperparameters we used for REACT are as follows:
• Cost-benefit tradeoff hyperparameter 𝜆: in Table 1, we report the 𝜆 used for each cost shown in our plots.
• Learning rate for pretraining the predictor: 0.001.
• Learning rate for training the planner and context selector: 0.001.
• Learning rate for jointly training the predictor: 0.0001.

D Training and Evaluation Wall-Clock Runtime

Hardware and Setup. We measured the wall-clock runtime for training and evaluation across the five longitudinal tasks (ILIADD, CHEEARS, ADNI, WOMAC, and KLG). To ensure a fair comparison, all methods had access to the same hardware: a single NVIDIA L40S GPU and an Intel Xeon Gold 6526Y CPU. To prevent timing mismatches caused by asynchronous CUDA execution, we synchronized the device before and after all timed regions and recorded the durations using time.perf_counter().

Training Runtime. We normalized the training runtime as seconds per full train-set pass and used the same batch size during both training and evaluation. We define one iteration as one full train-set pass.
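The predictor's pretraining masks inputs so that each observed-subset *size* is equally likely, rather than masking each feature independently (which would concentrate subset sizes near d/2). A minimal sketch of such a scheme, with a function name of our own choosing rather than from the released code:

```python
import numpy as np

def mask_uniform_subset_size(x, rng):
    """Mask a feature vector so that every subset size is equally likely.

    First draw the number of observed features k uniformly from {0, ..., d},
    then choose which k features stay observed uniformly at random.
    Returns the masked vector and the binary observation mask.
    """
    d = x.shape[-1]
    k = rng.integers(0, d + 1)                 # subset size, uniform on {0..d}
    idx = rng.choice(d, size=k, replace=False)  # which features stay observed
    mask = np.zeros(d)
    mask[idx] = 1.0
    return x * mask, mask

rng = np.random.default_rng(0)
x_masked, mask = mask_uniform_subset_size(np.ones(8), rng)
```

Sampled repeatedly, every subset size from 0 to d appears with equal probability, so the predictor sees both nearly-empty and nearly-full observation patterns during pretraining.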
The specific iterations for each method are 2 classifier pretraining iterations and 2 policy (joint) training iterations. In Fig. 13(a), we can see REACT achieves training times that are faster than those of recent deep learning approaches (DIME, RAS, and AS).

Figure 13: Comparison of training and evaluation runtimes. (a) Training wall-clock runtime. (b) Evaluation wall-clock runtime.

Evaluation Runtime. For inference, all methods were timed on the full test sets using the same batch size of 128. We report the end-to-end wall-clock runtime and the per-acquisition cost. In Fig. 13(b), we can see REACT achieves inference time comparable with the baselines.

Dataset | 𝜆 | Total Cost | Temporal Cost | Context Cost
ADNI | 0.05 | 1.100 | 0.500 | 0.600
ADNI | 0.01 | 4.725 | 3.225 | 1.500
ADNI | 0.0001 | 5.585 | 3.485 | 2.100
CHEEARS | 0.003 | 1.000 | 0.000 | 1.000
CHEEARS | 0.002 | 6.851 | 5.851 | 1.000
CHEEARS | 0.0015 | 13.275 | 11.275 | 2.000
CHEEARS | 0.0008 | 26.044 | 18.044 | 8.000
CHEEARS | 0.0006 | 35.571 | 25.571 | 10.000
CHEEARS | 0.0004 | 45.213 | 32.257 | 12.956
CHEEARS | 0.0002 | 65.754 | 45.798 | 19.956
CHEEARS | 0.0001 | 100.824 | 78.868 | 21.956
ILIADD | 0.005 | 1.999 | 1.999 | 0.000
ILIADD | 0.003 | 11.266 | 11.266 | 0.000
ILIADD | 0.001 | 29.742 | 20.995 | 8.747
ILIADD | 0.0005 | 35.739 | 23.993 | 11.747
ILIADD | 0.0001 | 51.696 | 33.950 | 17.747
KLG | 0.05 | 0.989 | 0.989 | 0.000
KLG | 0.01 | 8.489 | 6.997 | 1.492
KLG | 0.005 | 13.179 | 11.687 | 1.492
KLG | 0.001 | 29.243 | 26.869 | 2.374
KLG | 0.0005 | 37.613 | 34.941 | 2.672
KLG | 0.0001 | 46.243 | 43.274 | 2.969
WOMAC | 0.05 | 0.297 | 0.297 | 0.000
WOMAC | 0.01 | 1.767 | 0.584 | 1.183
WOMAC | 0.005 | 1.775 | 0.297 | 1.478
WOMAC | 0.001 | 7.368 | 5.890 | 1.478
WOMAC | 0.0005 | 12.037 | 10.559 | 1.478

Table 1: REACT hyperparameters: acquisition costs and 𝜆 by dataset.

E Dataset Details

E.1 CHEEARS
More details about the features we used for the CHEEARS data can be found in Table 2 and Table 3; see [28] for information on the features and survey design.

E.2 ILIADD
More details about the features we used for the ILIADD data can be found in Table 4 and Table 5.

E.3 ADNI
More details about the ADNI context descriptors and temporal features can be found in Table 6 and Table 7, respectively.
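The timing protocol in Appendix D (synchronize the device before and after each timed region, then read time.perf_counter()) can be sketched as follows. The `timed` wrapper is our own illustrative helper; it falls back to a no-op synchronization when PyTorch or a GPU is unavailable:

```python
import time

try:
    import torch

    def sync():
        # Flush pending asynchronous CUDA kernels before reading the clock,
        # so the measured interval covers the GPU work that was launched.
        if torch.cuda.is_available():
            torch.cuda.synchronize()
except ImportError:  # CPU-only fallback so the sketch still runs
    def sync():
        pass

def timed(fn, *args):
    """Wall-clock a (possibly GPU-launching) callable: synchronize before
    and after the timed region and record time.perf_counter() durations."""
    sync()
    start = time.perf_counter()
    out = fn(*args)
    sync()
    return out, time.perf_counter() - start

# Normalizing to seconds per full train-set pass (one "iteration"):
out, elapsed = timed(sum, range(1000))
seconds_per_pass = elapsed / 1  # divide by the number of passes inside fn
```

Without the surrounding synchronization, perf_counter() would only measure kernel-launch latency on a GPU, not the actual compute time.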
Data access and the corresponding Data Use Agreement can be found at https://adni.loni.usc.edu and https://adni.loni.usc.edu/terms-of-use/, respectively.

E.4 OAI
The OAI dataset involves two prediction tasks, WOMAC and KLG, which share the same set of input features. More details about the OAI context descriptors and temporal features can be found in Table 8 and Table 9, respectively. Access to the OAI data may be requested via https://nda.nih.gov/oai/.

F Onboarding Context Selection
We show examples of the selected contexts in Fig. 14. These contexts correspond to the policies illustrated in Fig. 5.

Feature Name | Acquisition Cost | Description
sex | 1.0 | Participant sex.
age | 1.0 | Participant age.
race___4 | 1.0 | Participant race (Black/African American indicator).
hispanic | 1.0 | Hispanic/Latino ethnicity.
education | 1.0 | Highest level of education attained.
current_employed | 1.0 | Current employment status.
family_income | 1.0 | Family income level.
religious_affiliation | 1.0 | Religious affiliation.
cigarette_use | 1.0 | Cigarette/tobacco use history.
alcohol_use | 1.0 | Alcohol use history.
drug_use | 1.0 | Drug use history.
mentalhealth | 1.0 | Mental health status or diagnosis.
audit_total | 1.0 | Total score on the Alcohol Use Disorders Identification Test (AUDIT).
dmq_soc | 1.0 | Drinking Motives Questionnaire (DMQ) — Social motives subscale.
dmq_cop | 1.0 | DMQ — Coping motives subscale.
dmq_enh | 1.0 | DMQ — Enhancement motives subscale.
dmq_con | 1.0 | DMQ — Conformity motives subscale.
yaacq_total | 1.0 | Young Adult Alcohol Consequences Questionnaire (YAACQ) — total score.
yaacq_social | 1.0 | YAACQ — Social/interpersonal consequences subscale.
yaacq_control | 1.0 | YAACQ — Loss of control subscale.
yaacq_selfperc | 1.0 | YAACQ — Self-perception consequences subscale.
yaacq_selfcare | 1.0 | YAACQ — Self-care consequences subscale.
yaacq_risk | 1.0 | YAACQ — Risk/safety consequences subscale.
yaacq_academic | 1.0 | YAACQ — Academic/occupational consequences subscale.
yaacq_depend | 1.0 | YAACQ — Dependence symptoms subscale.
yaacq_blackout | 1.0 | YAACQ — Blackout subscale.
neo_n | 1.0 | NEO Personality Inventory — Neuroticism subscale.
neo_e | 1.0 | NEO Personality Inventory — Extraversion subscale.
neo_a | 1.0 | NEO Personality Inventory — Agreeableness subscale.
neo_o | 1.0 | NEO Personality Inventory — Openness subscale.
neo_c | 1.0 | NEO Personality Inventory — Conscientiousness subscale.
iip_dom | 1.0 | Inventory of Interpersonal Problems (IIP) — Dominance subscale.
iip_lov | 1.0 | IIP — Love/Affiliation subscale.
iip_elev | 1.0 | IIP — Elevation (overall interpersonal distress).
DoW | 1.0 | Day of the week for the first day of the sequence.

Table 2: CHEEARS contextual descriptors.

Feature Name | Acquisition Cost | Description
happy | 1.0 | Self-reported happiness in the past 15 minutes (continuous scale).
nervous | 1.0 | Self-reported nervousness in the past 15 minutes (continuous scale).
angry | 1.0 | Self-reported anger in the past 15 minutes (continuous scale).
sad | 1.0 | Self-reported sadness in the past 15 minutes (continuous scale).
excited | 1.0 | Self-reported excitement in the past 15 minutes (continuous scale).
alert | 1.0 | Self-reported alertness in the past 15 minutes (continuous scale).
ashamed | 1.0 | Self-reported shame in the past 15 minutes (continuous scale).
relaxed | 1.0 | Self-reported relaxation in the past 15 minutes (continuous scale).
bored | 1.0 | Self-reported boredom in the past 15 minutes (continuous scale).
content | 1.0 | Self-reported contentment in the past 15 minutes (continuous scale).
stress | 1.0 | Self-reported stress in the past 15 minutes (continuous scale).
drink_plans | 1.0 | Whether participant has specific plans to drink tonight.
substance | 1.0 | Whether participant used any substances besides alcohol to get high or feel good.
dom | 1.0 | Self-rated dominance of social behavior during interactions today (continuous scale).
warm | 1.0 | Self-rated warmth of social behavior during interactions today (continuous scale).
drink_likely | 1.0 | Likelihood of drinking tonight.
drink_quantity | 1.0 | Anticipated number of drinks to consume tonight.
drink_urge | 1.0 | Strength of urge to drink in the past 15 minutes.
nondrink_likely | 1.0 | Likelihood of drinking tonight (non-drinking condition branch).
nondrink_quantity | 1.0 | Anticipated drink quantity (non-drinking condition branch).
nondrink_urge | 1.0 | Urge to drink in the past 15 minutes (non-drinking condition branch).
nondrink_plan_other | 1.0 | Free-text specification of other evening plans (non-drinking branch).
daily_activities | 1.0 | Activities completed today (9 binary features).
daily_experiences | 1.0 | Work-related experiences today (6 binary features).
drink_expectancies | 1.0 | Expected outcomes if drinking tonight (29 binary features).
drink_motives | 1.0 | Reasons for drinking tonight (13 binary features).
general_experiences | 1.0 | General experiences that occurred today (4 binary features).
nondrink_expectancies | 1.0 | Expected outcomes for tonight if not drinking (26 binary features).
nondrink_motives | 1.0 | Drinking motives, non-drinking condition branch (13 binary features).
nondrink_plans | 1.0 | Plans for the evening if not drinking (13 binary features).
social_experiences | 1.0 | Social experiences that occurred today (7 binary features).

Table 3: CHEEARS temporal features.

Feature Name | Acquisition Cost | Description
age | 1.0 | Participant age.
sex | 1.0 | Participant sex.
handedness | 1.0 | Participant handedness (e.g., left, right, ambidextrous).
multiracial | 1.0 | Multiracial identity indicator.
hispanic | 1.0 | Hispanic/Latino ethnicity.
language | 1.0 | Primary language spoken.
marital | 1.0 | Marital status.
relationship | 1.0 | Current romantic relationship status.
grade | 1.0 | Current academic grade level.
degree | 1.0 | Degree program or highest degree attained.
income | 1.0 | Family or personal income level.
cigarette | 1.0 | Cigarette/tobacco use.
substance | 1.0 | Use of substances other than alcohol to get high or feel good.
treatment | 1.0 | History of mental health or substance use treatment.
recentTreatment | 1.0 | Whether participant received treatment recently.
whoTreatment | 1.0 | Type or provider of most recent treatment.
HiTOP_Dishon | 1.0 | HiTOP — Disinhibition/Dishonesty spectrum score.
HiTOP_DisDys | 1.0 | HiTOP — Disinhibited/Dysregulated spectrum score.
HiTOP_Emot | 1.0 | HiTOP — Emotional Dysfunction spectrum score.
HiTOP_Mistrust | 1.0 | HiTOP — Mistrust/Antagonism spectrum score.
HiTOP_PhobInd | 1.0 | HiTOP — Phobic Internalizing spectrum score.
BFI_E | 1.0 | Big Five Inventory — Extraversion subscale.
BFI_A | 1.0 | Big Five Inventory — Agreeableness subscale.
BFI_C | 1.0 | Big Five Inventory — Conscientiousness subscale.
BFI_N | 1.0 | Big Five Inventory — Neuroticism subscale.
BFI_O | 1.0 | Big Five Inventory — Openness to Experience subscale.

Table 4: ILIADD contextual descriptors.

Feature Name | Acquisition Cost | Description
interaction | 1.0 | Event-contingent report of a social interaction (triggered by participant).
positiveaEMA | 1.0 | Self-reported positive affect in the past 15 minutes (0 = Neutral, 10 = Very Positive).
energyEMA | 1.0 | Self-reported energy/alertness in the past 15 minutes (0 = Not at all, 10 = Very Energetic/Awake).
stressEMA | 1.0 | Self-reported stress in the past 15 minutes (0 = Not at all, 10 = Extremely).
impulse1EMA | 1.0 | In the past hour, acted on impulse (0 = Not at all, 10 = Very Much).
impulse2EMA | 1.0 | In the past hour, did things without worrying about consequences (0 = Not at all, 10 = Very Much).
impulse3EMA | 1.0 | In the past hour, decided to put off something that needed to be done (0 = Not at all, 10 = Very Much).
impulse4EMA | 1.0 | In the past hour, avoided doing something despite knowing the consequences (0 = Not at all, 10 = Very Much).

Table 5: ILIADD temporal (EMA) features.

Feature Name | Acquisition Cost | Description
AGE | 0.3 | Participant age.
PTGENDER | 0.3 | Participant gender.
PTEDUCAT | 0.3 | Participant education.
PTETHCAT | 0.3 | Participant ethnic category.
PTRACCAT | 0.3 | Participant racial category.
PTMARRY | 0.3 | Participant marital status.
FAQ | 0.3 | Functional Activities Questionnaire total score.

Table 6: ADNI contextual descriptors.

Feature Name | Acquisition Cost | Description
FDG | 1.0 | FDG PET biomarker.
AV45 | 1.0 | AV45 PET biomarker.
Hippocampus | 0.5 | Hippocampal MRI biomarker.
Entorhinal | 0.5 | Entorhinal MRI biomarker.

Table 7: ADNI temporal features.

Feature Name | Acquisition Cost | Description
HISP | 0.3 | Hispanic/Latino ethnicity indicator.
RACE | 0.3 | Self-reported race category.
SEX | 0.3 | Participant sex.
FAMHXKR | 0.3 | Mother, father, sister, or brother had knee replacement surgery where all/part of the knee was replaced.
EDCV | 0.3 | Highest grade or year of school completed.
AGE | 0.3 | Age.
SMOKE | 0.3 | Smoking history/status.
INCOME2 | 0.3 | Household income category.
MARITST | 0.3 | Marital status.
MEDINS | 0.3 | Medical/health insurance status.

Table 8: OAI contextual descriptors.

Feature Name | Acquisition Cost | Description
DRNKAMT | 0.3 | Current alcohol consumption amount.
DRKMORE | 0.3 | Alcohol-use indicator capturing a heavier or more frequent drinking pattern.
BPSYS | 0.5 | Systolic blood pressure.
BPDIAS | 0.5 | Diastolic blood pressure.
BMI | 0.5 | Body mass index.
CEMPLOY | 0.3 | Current employment.
CUREMP | 0.3 | Currently works for pay.
JSW_1 → JSW_10 | 0.8 | Fixed-location radiographic knee joint space width (JSW) measurements.

Table 9: OAI temporal features.

Figure 14: Selected contexts for the policies shown in Fig. 5, for (a) ILIADD, (b) CHEEARS, (c) ADNI, (d) WOMAC, and (e) KLG.
Bright cells indicate selected features, while dark cells indicate those that were not selected.