Neural Network Models for Contextual Regression


Authors: Seksan Kiatsupaibul, Pakawan Chansiripas

Seksan Kiatsupaibul* and Pakawan Chansiripas*

March 26, 2026

Abstract

We propose a neural network model for contextual regression, in which the regression model depends on contextual features that determine the active submodel, together with an algorithm to fit the model. The proposed simple contextual neural network (SCtxtNN) separates context identification from context-specific regression, resulting in a structured and interpretable architecture with fewer parameters than a fully connected feed-forward network. We show mathematically that the proposed architecture is sufficient to represent contextual linear regression models using only standard neural network components. Numerical experiments support the theoretical result, showing that the proposed model achieves lower excess mean squared error and more stable performance than feed-forward neural networks with comparable numbers of parameters, while larger networks improve accuracy only at the cost of increased complexity. The results suggest that incorporating contextual structure can improve model efficiency while preserving interpretability.

Keywords: contextual regression, game theory, fictitious play, best response dynamics

1 Introduction

Contextual regression concerns a family of regression models applied to different contexts that belong to a common problem. Technically, a contextual regression model is a regression model whose parameters are functions of the explanatory variables, or features. We are interested in the setting where the contextual features, the subset of features that identifies the contexts, are known.

*Department of Statistics, Chulalongkorn University, Bangkok 10330, Thailand
In addition, we assume that there is a finite number of contexts and that the contextual features cannot readily be transformed into indicator variables that completely identify all of the contexts. Under this setting, even when the sub-model given a context is a linear regression model, the overall model is non-linear. We propose an interpretable neural network architecture for the contextual regression problem, together with a method to fit the model.

Applications of contextual regression abound in personalized products and services. In personalized medicine, pulse oximetry provides a vivid example of how a contextual regression model can be applied. A pulse oximeter determines the amount of oxygen in the blood by sending a beam of light through a person's skin, measuring the light absorption signal, and converting the signal to the blood oxygen level via a regression model. A standard pulse oximeter employs a single regression model for all skin shades. However, a growing body of evidence shows that the accuracy of the oxygen measurement is compromised when the device is applied to darker skin [Sjoding et al., 2020]. A better design is to adopt a contextual regression model for the oximetry, taking the skin shade as the context. Note that the shade of the skin can also be determined from signals extracted from the light beam; these skin shade signals can be taken as the contextual features.

When the sub-model given a context is a linear regression model, a contextual regression that is itself a linear model can be constructed when the contextual features are indicator variables. In that case, by including the cross terms between the indicator contextual features and the other features, the resulting linear regression model serves as a contextual regression model. Model fitting and analysis can then be performed following standard linear modeling methodology.
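As an illustration of the indicator-variable construction above, the following sketch (our own, not from the paper) fits a two-context linear model by ordinary least squares on a design matrix with context-by-feature cross terms:

```python
import numpy as np

# Illustrative sketch (ours, not from the paper): with a known indicator
# context, a contextual linear model reduces to ordinary least squares on
# a design matrix that includes context-by-feature cross terms.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)             # ordinary regressor feature
ctx = rng.integers(0, 2, size=n)   # indicator contextual feature, 2 contexts

# True model: y = 1 + 2x in context 0 and y = -1 + 0.5x in context 1
y = np.where(ctx == 0, 1.0 + 2.0 * x, -1.0 + 0.5 * x)

# Columns: intercept, x, context indicator, cross term ctx*x
X = np.column_stack([np.ones(n), x, ctx, ctx * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# coef = [b0, b1, d0, d1]: context 0 follows b0 + b1*x, while context 1
# follows (b0 + d0) + (b1 + d1)*x
print(coef)  # approximately [1, 2, -2, -1.5]
```

Because the context enters only through indicator columns, the fitted model remains a single linear regression, and standard linear-model diagnostics apply directly.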
The problem becomes complicated when the contextual variables are continuous, which is the case we consider here. Assuming there is a finite number of contexts, the model must be able to classify the observations into contexts based on the values of the contextual features and, subsequently, to predict the values of the responses based on the values of the other features. In this case, a contextual regression problem becomes a combination of a classification problem and a regression problem.

A neural network is a unified framework for regression and classification modeling that can accommodate both linear and non-linear models. It is therefore an ideal choice for modeling a contextual regression problem, which can be regarded as a combination of classification and regression. However, the fully connected neural network model, the so-called feed-forward network, is over-parameterized and prone to over-fitting. We propose an interpretable neural network model, called the contextual neural network (CtxtNN), that contains considerably fewer parameters than a feed-forward neural network. We will show that CtxtNN is sufficient for modeling contextual linear regression problems. We also introduce a model fitting algorithm that improves the chance of attaining a globally optimal set of parameters for the contextual neural network.

The organization of the paper is as follows. In Section 2, we mathematically express the contextual regression problem under study and define CtxtNN for modeling the problem. The proposed neural network model is shown to be sufficient in some important cases. In Section 3, we present a numerical study comparing the predictive performance of the proposed model with that of standard feed-forward networks.
A conclusion is provided in Section 4.

2 Methodology

We first describe the contextual regression problem to be addressed. Let $X = \mathbb{R}^p$, $p \ge 2$, and $Y = \mathbb{R}$ be the feature space and the output space, respectively. Contexts are defined by the contextual features, the elements of the contextual space $\tilde{X} = \mathbb{R}^q$, $0 < q < p$, a projection of the feature space $X$. Complementary to the contextual space $\tilde{X}$ is the regressor space $\hat{X} = \mathbb{R}^{p-q} = \mathbb{R}^r$, where $r = p - q$ and $X = \hat{X} \times \tilde{X} = \mathbb{R}^r \times \mathbb{R}^q$. For a feature vector $x = [x_1, \ldots, x_p]^\top \in X$, the vectors of the first $r$ and the last $q$ elements, $\hat{x} = [x_1, \ldots, x_r]^\top \in \hat{X}$ and $\tilde{x} = [x_{r+1}, \ldots, x_p]^\top \in \tilde{X}$, represent the corresponding regressor feature vector and contextual feature vector, respectively. Therefore, $x = [\hat{x}, \tilde{x}]^\top$.

We assume there is a finite (small) number of contexts, indexed by $i \in \{1, \ldots, c\} = \mathcal{I}$. The contextual space $\tilde{X}$ is partitioned into context regions $\{C_1, \ldots, C_c\}$, and $x \in X$ is said to belong to context $i \in \mathcal{I}$ if the corresponding contextual feature vector $\tilde{x}$ is in $C_i$. For a contextual regression model, we assume there is a family $F$ of functions $\{f_1, \ldots, f_c\}$, each $f_i : \hat{X} \to Y$, $i \in \mathcal{I}$, and a regression function $f$ defined by

\[
f(x) = f_i(\hat{x}), \quad \text{if } \tilde{x} \in C_i, \; i \in \mathcal{I}. \tag{1}
\]

We impose the following structure on the problem. We assume that there is a score function $g : \tilde{X} \to \mathbb{R}$ that maps each context region $C_i$ to a connected interval $I_i \subseteq \mathbb{R}$, $i \in \mathcal{I}$, such that $\{I_1, \ldots, I_c\}$ forms a partition of $\mathbb{R}$. Therefore, the regression function can equivalently be defined by

\[
f(x) = f_i(\hat{x}), \quad \text{if } g(\tilde{x}) \in I_i, \; i \in \mathcal{I}. \tag{2}
\]

2.1 Simple Contextual Neural Network

A simple yet important sub-class of the contextual regression models is the simple contextual linear regression model. It will serve as a building block for other contextual regression models in subsequent sections.
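The score-function formulation above can be sketched as follows; this is a minimal illustration of our own, with function and variable names chosen for clarity:

```python
import numpy as np

# Minimal sketch (function and variable names are ours) of the regression
# function defined via a score g: the score of the contextual features
# selects one of c left-closed, right-open intervals, which in turn
# selects the context-specific submodel f_i.
def contextual_predict(x_hat, x_tilde, g, cuts, submodels):
    """cuts: sorted interior cut points z_1 < ... < z_{c-1} splitting the
    real line into c intervals; submodels: list of c functions of x_hat."""
    i = int(np.searchsorted(cuts, g(x_tilde), side="right"))
    return submodels[i](x_hat)

# Example with c = 2 contexts, identity score, and a single cut at 0
f = [lambda xh: 1.0 + 2.0 * xh, lambda xh: -1.0 + 0.5 * xh]
print(contextual_predict(0.5, -0.3, lambda t: t, [0.0], f))  # f_1: 2.0
print(contextual_predict(0.5, 0.3, lambda t: t, [0.0], f))   # f_2: -0.75
```

Using `side="right"` in `searchsorted` makes each interval left-closed and right-open, matching the convention adopted for the intervals $I_j$ below.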
The simple contextual linear regression is a contextual regression model in which all $f_i$, $i \in \mathcal{I}$, are linear functions, there is one contextual feature, i.e., $q = 1$ and $r = p - 1$, and the score function is the identity function $g(x) = x$. That is, the contextual space $\mathbb{R}$ is partitioned into $c$ connected intervals $\{I_1, \ldots, I_c\}$, and $x = [x_1, \ldots, x_p]^\top = [\hat{x}, x_p]^\top$ is said to be in context $j$ if $x_p \in I_j$. By convention, let $I_j$, $j = 1, \ldots, c$, be left-closed and right-open intervals. The regression function is of the form

\[
f(x) = \beta_{j0} + \beta_j \cdot \hat{x}, \quad \text{if } x_p \in I_j, \tag{3}
\]

where $\beta_j = [\beta_{j1}, \ldots, \beta_{jr}]^\top \in \mathbb{R}^r$ is the coefficient vector, $\beta_{j0} \in \mathbb{R}$ is the bias term (or intercept), and $\beta_j \cdot \hat{x}$ is the dot product between $\beta_j$ and $\hat{x}$.

We propose the following neural network model, called the simple contextual neural network (SCtxtNN), for the simple contextual linear regression. The model is depicted by the network diagram in Figure 1. The architecture of SCtxtNN is designed to employ only typical neural network components without any customized components; i.e., it contains only linear components, ReLU (rectified linear unit) components, and perceptron components.

[Figure 1: Simple contextual neural network (SCtxtNN) architecture, consisting of a contextual sub-network (input $x_p$; perceptron units $a^I_1, \ldots, a^I_{2c}$) and a regression sub-network (inputs $x_1, \ldots, x_{p-1}$; hidden units $a^1_1, \ldots, a^1_{2c}$; output $y$).]

For a neural network, we adopt the following notation. We are given $n$ training feature vectors, where $x$ is a specific feature vector, $y(x)$ is the desired output for example $x$, $\ell$ is the index of the network layers of neurons or units, and $a^\ell(x)$ is the vector of activation outputs for input (feature or transformed feature) vector $x$. Let $L$ be the number of network layers excluding the input layer.

1.
$w^\ell_{kj}$ is the weight of the $k$th unit in the $(\ell-1)$st layer applied to the $j$th unit in the $\ell$th layer, $k = 1, \ldots, m_{\ell-1}$, $j = 1, \ldots, m_\ell$, $\ell = 1, \ldots, L$. By convention, $\ell = 0$ represents the input layer and $\ell = L$ represents the output layer.

2. $b^\ell_j$ is the bias of the $j$th unit in the $\ell$th layer, $j = 1, \ldots, m_\ell$, $\ell = 1, \ldots, L$.

3. $a^\ell_j$ is the activation of the $j$th unit in the $\ell$th layer, $j = 1, \ldots, m_\ell$, $\ell = 1, \ldots, L$.

The network SCtxtNN consists of two sub-networks: the contextual sub-network and the regression sub-network. The contextual sub-network has only $L = 1$ layer, i.e., the output layer, indexed by I. Its input is the single contextual feature $x_p$. The output layer I contains $2c$ units, each a perceptron unit computing a linear function of the contextual input feature. Therefore, there are $2c$ outputs $a^I_j$, $j = 1, \ldots, 2c$, from the output layer, where

\[
a^I_j = \tilde{a}\left(w^I_{1,j} x_p + b^I_j\right), \quad j = 1, \ldots, 2c, \tag{4}
\]

and $\tilde{a}(z) = 1_{\mathbb{R}_+}(z)$ is the indicator function over the set of positive real numbers.

The regression sub-network has $L = 2$ layers: one hidden layer $\ell = 1$ and the output layer $\ell = 2 = L$. Its input contains the $r = p - 1$ features $x_i$, $i = 1, \ldots, r$. The hidden layer $\ell = 1$ contains $2c$ units. For $j = 1, \ldots, 2c$, the output of unit $j$ of layer $\ell = 1$ is a ReLU activation of the linear function of the inputs plus the output from the contextual sub-network, i.e.,

\[
a^1_j = a\left(\sum_{k=1}^{r} w^1_{k,j} x_k + b^1_j + w^X_{j,j} a^I_j\right), \quad j = 1, \ldots, 2c, \tag{5}
\]

where $a(z) = \max(0, z)$ is the ReLU activation function. The output layer $\ell = 2$ contains only one unit, which represents the output. This output is a linear function of the activation units from the preceding layer, i.e.,

\[
y = a^2_1 = \sum_{k=1}^{2c} w^2_{k,1} a^1_k + b^2_1. \tag{6}
\]
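The forward pass in (4)–(6) can be sketched in NumPy as follows. This is an illustrative reading of the architecture; the parameter names mirror the paper's notation, while the shapes and example values are our own:

```python
import numpy as np

# Hedged sketch of the SCtxtNN forward pass in (4)-(6). Parameter names
# mirror the paper's notation; shapes and the example values are ours.
def sctxtnn_forward(x, wI, bI, wX, W1, b1, w2, b2):
    """x: length-p feature vector whose last entry is the contextual
    feature x_p; wI, bI, wX, b1, w2: length-2c arrays; W1: (p-1, 2c)."""
    x_hat, x_p = x[:-1], x[-1]
    # (4) contextual sub-network: 2c perceptron (0/1 indicator) units
    aI = (wI * x_p + bI > 0).astype(float)
    # (5) hidden layer: ReLU of the linear part plus the diagonally
    # wired contextual activations w^X_{j,j} a^I_j
    a1 = np.maximum(0.0, x_hat @ W1 + b1 + wX * aI)
    # (6) output layer: a single linear unit
    return a1 @ w2 + b2

rng = np.random.default_rng(1)
p, c = 2, 3
y = sctxtnn_forward(
    rng.normal(size=p),
    rng.normal(size=2 * c), rng.normal(size=2 * c),           # wI, bI
    -5.0 * np.ones(2 * c),                                    # wX: large negative
    rng.normal(size=(p - 1, 2 * c)), rng.normal(size=2 * c),  # W1, b1
    rng.normal(size=2 * c), 0.0,                              # w2, b2
)
print(float(y))
```

Note that the contextual activations enter only through the diagonal weights $w^X_{j,j}$; a sufficiently negative $w^X_{j,j}$ drives the ReLU input below zero and thereby switches off the corresponding hidden unit.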
Proposition 1. Let $S \subset \hat{X}$ be a compact subset of the regressor space and let $T$ be a bounded interval in $\mathbb{R}$ such that $T \cap I_j \neq \emptyset$ for $j = 1, \ldots, c$. For a simple contextual linear regression model, there exists an SCtxtNN with output $y$ such that

\[
y(x) = f(x) \quad \text{for all } x \in S \times T. \tag{7}
\]

Proof. By the definition of $T$ and $I_j$, there exist cut points $z_1 < z_2 < \cdots < z_c < z_{c+1}$ such that $x \in S \times T$ is in context $j$ if $z_j \le x_p < z_{j+1}$, $j = 1, \ldots, c$.

Consider first the contextual sub-network. We set $a^I_{2j-1}$ and $a^I_{2j}$ to be the identifiers for context $j$, $j = 1, \ldots, c$. That is, we set $b^I_{2j}$ and $w^I_{1,2j}$ so that

\[
w^I_{1,2j} x_p + b^I_{2j} > 0 \iff x_p < z_j, \quad \text{and} \quad w^I_{1,2j} x_p + b^I_{2j} \le 0 \iff x_p \ge z_j,
\]

and set $b^I_{2j-1} = b^I_{2j}$ and $w^I_{1,2j-1} = w^I_{1,2j}$. With this parameter setting, and since $a^I_j$ is defined as a perceptron in (4), for $j = 1, \ldots, c$,

\[
a^I_{2j-1} = a^I_{2j} =
\begin{cases}
1, & \text{if } x_p < z_j, \\
0, & \text{if } x_p \ge z_j.
\end{cases} \tag{8}
\]

We say that $a^I_j$ is in the on state if it equals zero and in the off state if it equals one; an off unit will be used to suppress the corresponding regression unit. Therefore, when $x_p \in I_j$, i.e., $z_j \le x_p < z_{j+1}$, the pairs $a^I_{2k-1}$ and $a^I_{2k}$ are in the on state for $k = 1, \ldots, j$ and in the off state for $k = j+1, \ldots, c$. We then set $w^X_{j,j}$, for all $j = 1, \ldots, 2c$, to a large negative number so that, when $a^I_j$ is in the off state, it turns off $a^1_j$ in the regression sub-network, i.e., it makes $a^1_j = 0$. In fact, in what follows, we see that it suffices, for $j = 1, \ldots, 2c$, to set

\[
w^X_{j,j} = -2 \sup_{i \in \mathcal{I}} \left( \sup_{\hat{x} \in S} \left| \beta_i \cdot \hat{x} \right| + 2 \left| \beta_{i0} \right| \right). \tag{9}
\]

Now construct the regression sub-network as follows. Consider first layer $\ell = 2$. Set

\[
b^2_1 = 0, \tag{10}
\]
\[
w^2_{2j-1,1} = 1, \quad j = 1, \ldots, c, \tag{11}
\]
\[
w^2_{2j,1} = -1, \quad j = 1, \ldots, c. \tag{12}
\]

According to (8), (9), (10), (11), and (12), if $z_j \le x_p < z_{j+1}$, then

\[
y = \sum_{k=1}^{j} \left( a^1_{2k-1} - a^1_{2k} \right). \tag{13}
\]

Lastly, we set the parameters for layer $\ell = 1$ to give the required results.
For the first unit of layer $\ell = 1$, let

\[
w^1_{k,1} = \beta_{1k}, \quad k = 1, \ldots, r. \tag{14}
\]

The value of $b^1_1$ is set to carry the value of $\beta_{10}$ and to make $\sum_{k=1}^{r} w^1_{k,1} x_k + b^1_1$ positive whenever $x \in S \times T$. It suffices to set

\[
b^1_1 = \beta_{10} + \left| \beta_{10} \right| + \sup_{\hat{x} \in S} \left| \beta_1 \cdot \hat{x} \right|. \tag{15}
\]

For the second unit of layer $\ell = 1$, let

\[
w^1_{k,2} = 0, \quad k = 1, \ldots, r. \tag{16}
\]

The value of $b^1_2$ is set to offset that of $b^1_1$:

\[
b^1_2 = \left| \beta_{10} \right| + \sup_{\hat{x} \in S} \left| \beta_1 \cdot \hat{x} \right| = b^1_1 - \beta_{10}. \tag{17}
\]

Therefore, when $z_1 \le x_p$, we have $a^I_1 = 0$ and $a^I_2 = 0$, so

\[
\sum_{k=1}^{r} w^1_{k,1} x_k + b^1_1 + w^X_{1,1} a^I_1 = \beta_1 \cdot \hat{x} + b^1_1 > 0.
\]

Therefore, $a^1_1 = \max(0, \beta_1 \cdot \hat{x} + b^1_1) = \beta_1 \cdot \hat{x} + b^1_1$. Likewise,

\[
\sum_{k=1}^{r} w^1_{k,2} x_k + b^1_2 + w^X_{2,2} a^I_2 = b^1_2 = b^1_1 - \beta_{10} > 0.
\]

Therefore, $a^1_2 = \max(0, b^1_1 - \beta_{10}) = b^1_1 - \beta_{10}$. Hence,

\[
a^1_1 - a^1_2 = \beta_{10} + \beta_1 \cdot \hat{x} = f_1(\hat{x}). \tag{18}
\]

For units $2j - 1$, $j = 2, \ldots, c$, of layer $\ell = 1$ of the regression sub-network, let

\[
w^1_{k,2j-1} = \beta_{jk} - \beta_{j-1,k}, \quad k = 1, \ldots, r. \tag{19}
\]

The value of $b^1_{2j-1}$, $j = 2, \ldots, c$, is set to carry the value of $\beta_{j0} - \beta_{j-1,0}$ and to make the linear sum positive:

\[
b^1_{2j-1} = (\beta_{j0} - \beta_{j-1,0}) + \left| \beta_{j0} - \beta_{j-1,0} \right| + \sup_{\hat{x} \in S} \left| (\beta_j - \beta_{j-1}) \cdot \hat{x} \right|. \tag{20}
\]

For units $2j$, $j = 2, \ldots, c$, of layer $\ell = 1$ of the regression sub-network, let

\[
w^1_{k,2j} = 0, \quad k = 1, \ldots, r. \tag{21}
\]

The value of $b^1_{2j}$ is set to offset that of $b^1_{2j-1}$:

\[
b^1_{2j} = \left| \beta_{j0} - \beta_{j-1,0} \right| + \sup_{\hat{x} \in S} \left| (\beta_j - \beta_{j-1}) \cdot \hat{x} \right| = b^1_{2j-1} - (\beta_{j0} - \beta_{j-1,0}). \tag{22}
\]

Fix $j \in \{2, \ldots, c\}$. When $z_j \le x_p < z_{j+1}$, from (8), we have $a^I_{2j-1} = 0$ and $a^I_{2j} = 0$. Therefore,

\[
\sum_{k=1}^{r} w^1_{k,2j-1} x_k + b^1_{2j-1} + w^X_{2j-1,2j-1} a^I_{2j-1} = (\beta_j - \beta_{j-1}) \cdot \hat{x} + b^1_{2j-1} > 0.
\]
Therefore, $a^1_{2j-1} = \max(0, (\beta_j - \beta_{j-1}) \cdot \hat{x} + b^1_{2j-1}) = (\beta_j - \beta_{j-1}) \cdot \hat{x} + b^1_{2j-1}$. Likewise,

\[
\sum_{k=1}^{r} w^1_{k,2j} x_k + b^1_{2j} + w^X_{2j,2j} a^I_{2j} = b^1_{2j} = b^1_{2j-1} - (\beta_{j0} - \beta_{j-1,0}) > 0.
\]

Therefore, $a^1_{2j} = \max(0, b^1_{2j-1} - (\beta_{j0} - \beta_{j-1,0})) = b^1_{2j-1} - (\beta_{j0} - \beta_{j-1,0})$. Hence, for $j = 2, \ldots, c$,

\[
a^1_{2j-1} - a^1_{2j} = (\beta_{j0} - \beta_{j-1,0}) + (\beta_j - \beta_{j-1}) \cdot \hat{x} = f_j(\hat{x}) - f_{j-1}(\hat{x}). \tag{23}
\]

By (13) and (18), when $z_1 \le x_p < z_2$, we have, for $\hat{x} \in S$,

\[
y = a^1_1 - a^1_2 = f_1(\hat{x}). \tag{24}
\]

By (13) and (23), when $z_j \le x_p < z_{j+1}$ for $j = 2, \ldots, c$, we have, for $\hat{x} \in S$,

\[
y = \sum_{k=1}^{j} \left( a^1_{2k-1} - a^1_{2k} \right) = f_1(\hat{x}) + \sum_{k=2}^{j} \left( f_k(\hat{x}) - f_{k-1}(\hat{x}) \right) = f_j(\hat{x}). \tag{25}
\]

This completes the proof.

3 Numerical Study

In this section, we conduct a numerical experiment to compare the proposed simple contextual neural network (SCtxtNN) with standard feed-forward neural network models. We perform an experiment for the case of one contextual feature and one regression feature, i.e., $p = 2$ and $q = 1$. The objective of the experiment is to evaluate the predictive performance of the proposed architecture relative to conventional neural networks with comparable or larger numbers of parameters.

3.1 Experimental Setup

We compare three neural network models:

• Simple Contextual Neural Network (SCtxtNN): the model described in Section 2.1 with three contexts. In the implementation, sigmoid activation functions are used in the contextual sub-network as a smooth approximation of the perceptron units, while ReLU activation is used in the regression sub-network.

• Small Feed-Forward Neural Network (Small FF): a fully connected feed-forward neural network with two hidden layers, each having four hidden units with ReLU activation functions. The architecture is chosen so that the number of parameters is comparable to that of the simple contextual neural network.
• Large Feed-Forward Neural Network (Large FF): a fully connected feed-forward neural network with two hidden layers, each having six hidden units with ReLU activation functions. The number of hidden units is chosen so that the proposed contextual neural network can be represented as a special case of this architecture; this model therefore forms a superset of the simple contextual neural network.

The total number of parameters of each model is shown in Table 1.

Table 1: Total number of parameters for each model

Model                               Number of parameters
Simple contextual neural network    37
Small feed-forward network          37
Large feed-forward network          67

Data are generated according to the model in Section 2.1 with three contexts. The contextual feature $\tilde{x}$ is generated from the uniform distribution on $[-1, 1]$ and determines the context index $j \in \{1, 2, 3\}$ according to the intervals $[-1, -1/3)$, $[-1/3, 1/3)$, and $[1/3, 1]$. The regressor feature $\hat{x}$ is generated independently from the standard normal distribution.

For each context $j$, the regression coefficient vector $\beta_j$ is generated independently from the standard normal distribution. The response variable $y$ is then generated from the corresponding linear regression model with an additive Gaussian noise term $\varepsilon \sim N(0, 0.01^2)$.

For each simulation, a dataset of size 6000 is generated and split into 1500 training samples, 1500 validation samples, and 3000 test samples. Within each simulation, the same dataset is used for all models, while a new dataset is generated for each simulation.

All models are trained using the Adam optimizer with learning rate 0.001 and mean squared error (MSE) loss. In each simulation, the three models are trained on the same training and validation sets to ensure a fair comparison. Each model is trained for 20,000 epochs, and the training and validation losses are recorded at each epoch.
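The data-generating process described above can be sketched as follows; this is our reading of the setup, and the seed and helper name are illustrative rather than the paper's:

```python
import numpy as np

# Sketch of the data-generating process in Section 3.1 (our reading of it;
# the seed and helper name are illustrative). Three contexts given by the
# intervals [-1,-1/3), [-1/3,1/3), [1/3,1]; noise standard deviation 0.01.
def generate_dataset(n, rng):
    x_tilde = rng.uniform(-1.0, 1.0, size=n)   # contextual feature
    x_hat = rng.normal(size=n)                 # regressor feature
    ctx = np.searchsorted([-1 / 3, 1 / 3], x_tilde, side="right")
    beta0 = rng.normal(size=3)                 # per-context intercepts
    beta1 = rng.normal(size=3)                 # per-context slopes
    y = beta0[ctx] + beta1[ctx] * x_hat + rng.normal(scale=0.01, size=n)
    return x_hat, x_tilde, y, ctx

rng = np.random.default_rng(42)
x_hat, x_tilde, y, ctx = generate_dataset(6000, rng)
# Split 1500 / 1500 / 3000, as in the paper
train_idx, val_idx, test_idx = np.split(np.arange(6000), [1500, 3000])
print(len(train_idx), len(val_idx), len(test_idx))  # 1500 1500 3000
```

Drawing fresh coefficients inside each simulated dataset mirrors the repeated-simulation design: every replication uses a new contextual linear model as well as new noise.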
After training, the mean squared error on the test set is computed for each model. The experiment is repeated for 50 independent simulations.

3.2 Results

Figure 2: Training and validation MSE over epochs for the SCtxtNN, Small FF, and Large FF models.

Figure 2 presents the training and validation loss across epochs for the three models, with each curve representing the mean over 50 simulations. All models demonstrate a consistent decrease in loss followed by a plateau, indicating successful convergence of the optimization process.

The small feed-forward neural network shows the smallest gap between training and validation loss, which suggests limited model complexity, but its validation loss remains higher than that of the other models. The simple contextual neural network also converges smoothly, with a slightly larger gap between the training and validation curves, while achieving lower validation loss than the small feed-forward network. The large feed-forward neural network reaches a very small training loss while its validation loss stabilizes. Overall, the loss curves mainly reflect differences in model complexity due to the different numbers of parameters and architectures defined in the experimental setup, but they do not by themselves determine the final predictive performance.

Figure 3: Excess test MSE over 50 simulations for SCtxtNN, Small FF, and Large FF. Excess MSE is defined as test MSE minus the noise variance, so that zero corresponds to the optimal achievable error.

To compare predictive performance, we evaluate the excess test MSE shown in Figure 3. The excess MSE is defined as the test mean squared error minus the variance of the noise, so that it measures the estimation error above that of the optimal predictor. With this definition, smaller values indicate better approximation of the true regression function.
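The excess-MSE criterion is straightforward to compute; the arrays below are made-up illustrations, not the paper's results:

```python
import numpy as np

# Excess MSE as defined for Figure 3: test MSE minus the noise variance,
# so that zero is the error of the optimal predictor. The arrays below
# are made-up illustrations, not the paper's results.
def excess_mse(y_true, y_pred, noise_sd=0.01):
    return float(np.mean((y_true - y_pred) ** 2) - noise_sd ** 2)

y_true = np.array([1.0, 2.0, 3.0])
print(excess_mse(y_true, y_true + 0.1))  # 0.0099 (up to float rounding)
print(excess_mse(y_true, y_true))        # -0.0001: a perfect fit lands noise-variance below zero
```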
The results show that the simple contextual neural network achieves lower excess MSE than the small feed-forward neural network. While the medians of the two models are similar, the contextual model has a lower mean excess MSE (marked by × in the figure), suggesting fewer large deviations across simulations. In addition, its distribution exhibits lower variability, as indicated by the narrower box in the figure, which implies more stable performance. Since these two models have comparable numbers of parameters, this difference suggests that the context-based architecture is better aligned with the model structure described in Section 2.1 than a generic fully connected network with similar capacity.

The large feed-forward neural network achieves the lowest excess MSE overall. This behavior is expected because the larger network has substantially higher complexity and can approximate the target function more closely. However, this improvement is obtained by increasing the number of parameters rather than by using a model structure derived from the data-generating mechanism.

4 Conclusion

We compared a simple contextual neural network (SCtxtNN) with standard fully connected networks of different sizes. The results show that the structured model consistently outperforms a fully connected network with a comparable number of parameters, while a much larger network can achieve lower error at the cost of substantially increased complexity.

These findings suggest that incorporating contextual structure into the model can improve predictive accuracy while keeping the network simple. By separating context identification from context-specific regression, the proposed architecture reflects the form of the data-generating process and can represent the relationship more efficiently than a generic fully connected network with a similar number of parameters.
In addition, the contextual model preserves interpretability, since the components of the network correspond to meaningful parts of the data-generating process, such as context identification and context-specific regression. This indicates that using contextual structure can improve model performance while maintaining a simple and interpretable representation.

References

Michael W. Sjoding, Robert P. Dickson, Theodore J. Iwashyna, Steven E. Gay, and Thomas S. Valley. Racial bias in pulse oximetry measurement. The New England Journal of Medicine, 383:2477–2478, 2020.
