Focused Relative Risk Information Criterion for Variable Selection in Linear Regression

Reading time: 6 minutes
...

📝 Original Info

  • Title: Focused Relative Risk Information Criterion for Variable Selection in Linear Regression
  • ArXiv ID: 2602.16463
  • Date: 2026-02-18
  • Authors: K. Claeskens (University of Antwerp), J. Hjort (University of Copenhagen), M. Cunen (University of Oslo). (Author order and affiliations are given as stated in the original paper.)

📝 Abstract

This paper motivates and develops a novel and focused approach to variable selection in linear regression models. For estimating the regression mean $\mu = \mathrm{E}\,(Y \mid x_0)$, for the covariate vector of a given individual, there is a list of competing estimators, say $\widehat\mu_S$ for each submodel $S$. Exact expressions are found for the relative mean squared error risks, when compared to the widest model available, say $\mathrm{mse}_S/\mathrm{mse}_{\rm wide}$. The theory of confidence distributions is used for accurate assessments of these relative risks. This leads to certain Focused Relative Risk Information Criterion scores, and associated FRIC plots and FRIC tables, as well as to Confidence plots to exhibit the confidence the data give in the submodels. The machinery is extended to handle many focus parameters at the same time, with appropriate averaged FRIC scores. The particular case where all available covariate vectors have equal importance yields a new overall criterion for variable selection, balancing complexity and fit in a natural fashion. A connection to the Mallows criterion is demonstrated, leading also to natural modifications of the latter. The FRIC and AFRIC strategies are illustrated for real data.

💡 Deep Analysis

📄 Full Content

Mrs. Jones is pregnant (again). She is white, 40 years old, of average weight 60 kg before pregnancy, and a smoker. What is the expected birthweight of her child-to-come?

To address this and similar questions, involving comparison and ranking of many submodels of a given wide regression model, we use a dataset on $n = 189$ mothers and babies, discussed and analysed in Claeskens & Hjort (2008, Ch. 2), and apply linear regression for the birthweight $y$ in terms of five covariates: $x_1$ (age), $x_2$ (weight in kg before pregnancy), $x_3$ (indicator for smoker or not), $x_4$ (ethnicity indicator 1), and $x_5$ (ethnicity indicator 2). Here 'white' corresponds to both ethnicity indicators being zero, i.e. not belonging to either of the two other ethnic groups in question. The full linear regression model has

$$y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_5 x_{i,5} + \varepsilon_i \quad\text{for } i = 1, \ldots, n, \tag{1.1}$$

with the $\varepsilon_i$ assumed i.i.d. from a normal $\mathrm{N}(0, \sigma^2)$. The task is to estimate $\mu = \mathrm{E}\,(Y \mid x_0)$, with $x_0 = (x_{0,1}, \ldots, x_{0,5})$ the covariate vector for Mrs. Jones. For each of the $2^5 = 32$ submodels, corresponding to taking covariates in and out of the above regression equation, there is a point estimate, say $\widehat\mu_S$, with $S$ a subset of $\{1, \ldots, 5\}$. These are plotted on the vertical axis of the FRIC plot of Figure 1.1, along with submodel-specific 80% confidence intervals, to indicate their relative precision. The crucial new aspect of the plot is the set of FRIC scores, plotted on the horizontal axis. These Focused Relative Risk Information Criterion scores are accurately constructed estimates of the relative risk, the ratio of mean squared errors, relative to the wide model, i.e.
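The enumeration of submodels described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic stand-ins (not the birthweight data), the intercept is kept in every submodel, and the names `submodel_estimates`, `X`, `y`, and `x0` are hypothetical.

```python
import itertools
import numpy as np

def submodel_estimates(X, y, x0):
    """Return {S: mu_hat_S} for every subset S of covariate labels 1..p.

    X is n x p without an intercept column; x0 has length p.
    Assumption of this sketch: the intercept stays in every submodel.
    """
    n, p = X.shape
    out = {}
    for k in range(p + 1):
        for S in itertools.combinations(range(p), k):
            cols = np.array(S, dtype=int)
            XS = np.column_stack([np.ones(n), X[:, cols]])
            x0S = np.concatenate([[1.0], x0[cols]])
            # Ordinary least squares fit inside submodel S
            beta_S, *_ = np.linalg.lstsq(XS, y, rcond=None)
            out[tuple(j + 1 for j in S)] = float(x0S @ beta_S)
    return out

# Synthetic stand-in data with the same dimensions as the example (n = 189, p = 5)
rng = np.random.default_rng(0)
n, p = 189, 5
X = rng.normal(size=(n, p))
beta = np.array([0.0, 0.3, -0.4, 0.0, 0.1])
y = 3.0 + X @ beta + rng.normal(scale=0.5, size=n)
x0 = np.array([1.0, 0.5, 1.0, 0.0, 0.0])

estimates = submodel_estimates(X, y, x0)
print(len(estimates))            # number of submodels
print(estimates[(1, 2, 3, 4, 5)])  # wide-model estimate of mu
```

Each key is the subset $S$ written with 1-based covariate labels, so `()` is the intercept-only model and `(1, 2, 3, 4, 5)` is the wide model.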

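$$\mathrm{rr}_S = \frac{\mathrm{mse}_S}{\mathrm{mse}_{\rm wide}} = \frac{\mathrm{E}\,(\widehat\mu_S - \mu)^2}{\mathrm{E}\,(\widehat\mu_{\rm wide} - \mu)^2} \quad\text{for } S \text{ a subset of } \{1, \ldots, 5\}. \tag{1.2}$$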
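To make the ratio in (1.2) concrete, here is a minimal sketch that evaluates it when the wide-model parameters are treated as known. It uses the standard decomposition of mean squared error into variance plus squared bias for a least squares estimator. This is an oracle calculation for illustration, not the FRIC estimator of the paper; the function name `relative_risk`, the synthetic design, and the parameter values are all assumptions.

```python
import numpy as np

def relative_risk(X, x0, beta0, beta, sigma, S):
    """Oracle rr_S = mse_S / mse_wide with known (beta0, beta, sigma).

    S holds 0-based column indices of X; the intercept is always kept
    (an assumption of this sketch).
    """
    n, p = X.shape
    Xw = np.column_stack([np.ones(n), X])
    x0w = np.concatenate([[1.0], x0])
    theta = np.concatenate([[beta0], beta])
    keep = np.array([0] + [j + 1 for j in S], dtype=int)
    XS, x0S = Xw[:, keep], x0w[keep]
    # Weight vectors a with mu_hat = a' y, for the submodel and the wide model
    a_S = XS @ np.linalg.solve(XS.T @ XS, x0S)
    a_w = Xw @ np.linalg.solve(Xw.T @ Xw, x0w)
    var_S = sigma**2 * (a_S @ a_S)
    bias_S = a_S @ (Xw @ theta) - x0w @ theta   # zero when S is the wide model
    mse_wide = sigma**2 * (a_w @ a_w)           # wide estimator is unbiased
    return (var_S + bias_S**2) / mse_wide

# Synthetic design (NOT the birthweight data); covariates 0 and 3 have zero effect
rng = np.random.default_rng(1)
n, p = 189, 5
X = rng.normal(size=(n, p))
x0 = np.array([1.0, 0.5, 1.0, 0.0, 0.0])
beta0, beta, sigma = 3.0, np.array([0.0, 0.3, -0.4, 0.0, 0.1]), 0.5

rr_wide = relative_risk(X, x0, beta0, beta, sigma, S=(0, 1, 2, 3, 4))
rr_true = relative_risk(X, x0, beta0, beta, sigma, S=(1, 2, 4))
rr_null = relative_risk(X, x0, beta0, beta, sigma, S=())
print(rr_wide, rr_true, rr_null)
```

In this setup the submodel `(1, 2, 4)` drops only covariates with zero coefficients, so it carries no bias and its relative risk cannot exceed 1. In practice the bias term is unknown and has to be estimated from the data.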

Thus models with FRIC scores below 1 are judged to be better than the full wide model of (1.1), for the given purpose of estimating the mean well for the specified covariate vector; FRIC values higher than 1 indicate that the submodel does a worse job than the wide model itself. Only the wide-model-based estimator $\widehat\mu_{\rm wide}$ is guaranteed to have zero bias, so the statistical game is to use the data to hunt for submodels leading to lower variances and biases not far from zero.

Figure 1.1: FRIC plot for the $2^5 = 32$ models for estimating the birthweight of the child-to-come, for Mrs. Jones (white, age 40, 60 kg, smoker). The FRIC scores are estimates of the relative risks $\mathrm{rr}_S = \mathrm{mse}_S/\mathrm{mse}_{\rm wide}$ of (1.2); the blue circles are the associated point estimates; and the vertical lines are submodel-based 80% confidence intervals. Here 16 submodels have FRIC scores smaller than 1 and are judged to be better than the wide model. See Table 1.1 for identification of the best models and their FRIC scores and estimates.

Inside the natural framework of linear regression models, basic formulae for the required quantities are worked out in Section 2. The operating assumption is that the wide model, with all covariates on board, is in force. The denominator of (1.2) is standard, whereas more care is needed to find a fruitful formula for the numerator, since submodels will carry biases. In Section 3 we construct several natural estimators for the relative risk quantities $\mathrm{rr}_S$, and these are indeed used to produce Figure 1.1. Importantly, as part of this development we construct informative and exact confidence distributions for the relative risks. These are data-driven cumulative distribution functions $C_S(\mathrm{rr}_S, \mathrm{data})$ with the property

$$\mathrm{pr}_{\beta,\sigma}\{\mathrm{rr}_S \colon C_S(\mathrm{rr}_S, \mathrm{data}) \le \alpha\} = \alpha \quad\text{for all } \alpha \in (0, 1). \tag{1.3}$$

In (1.3) the 'data' are random, with distribution governed by (1.1), and the identity, being valid for all $(\beta, \sigma)$, secures that accurate confidence intervals at all levels can be read off for the relative risks $\mathrm{mse}_S/\mathrm{mse}_{\rm wide}$. As we show in Section 3, these confidence distributions also start out with certain pointmasses at observable minimum positions, say $\mathrm{rr}_{S,\min}$. Natural confidence intervals for the relative risks $\mathrm{rr}_S$ therefore do not take the usual form of $\mathrm{rr}_S \pm z_\alpha$, say, an estimate along with a plus-minus estimated error. Instead, confidence intervals induced by the confidence distributions (1.3) are often asymmetric, and sometimes start out at the indicated minimum possible value; see indeed Figure 3.1.

[Figure: Confidence plot, caption start missing] … Submodels to the far right are those in which we place trust in their ability to do better than the wide model. See Table 1.1 for identification of the best models and their conf(S) scores and estimates.
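A confidence distribution with a pointmass at the minimum possible value can be illustrated in a much simpler toy problem than the one treated in the paper. The sketch below assumes a single observation $Z \sim \mathrm{N}(\theta, 1)$ and builds the confidence distribution for $\lambda = \theta^2 \ge 0$; it is an analogy only, not the paper's $C_S$, and the names `Phi`, `cd`, and `cd_quantile` are hypothetical.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cd(lam, z):
    """Confidence distribution C(lam, z) for lam = theta^2, given Z = z.

    P_lam(Z^2 <= z^2) decreases in lam, so one minus it is a cdf in lam.
    Its value at lam = 0 is the pointmass at the minimum position.
    """
    s = math.sqrt(lam)
    return 1.0 - (Phi(abs(z) - s) - Phi(-abs(z) - s))

def cd_quantile(alpha, z, hi=1e4, tol=1e-10):
    """Smallest lam with C(lam, z) >= alpha, found by bisection.

    Returns 0.0 when alpha falls inside the pointmass at zero.
    The upper bracket hi is an assumption suited to moderate |z|.
    """
    if cd(0.0, z) >= alpha:
        return 0.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cd(mid, z) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for z in (0.5, 2.0):
    pointmass = cd(0.0, z)
    median = cd_quantile(0.50, z)
    interval = (cd_quantile(0.10, z), cd_quantile(0.90, z))
    print(z, pointmass, median, interval)
```

For small $|z|$ the pointmass at zero exceeds one half, so the median of the confidence distribution sits exactly at the minimum value and the 80% interval starts there; for larger $|z|$ the interval is two-sided but asymmetric.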

Two tools flowing from these insights are as follows. First, one may use the median confidence estimators $\mathrm{rr}^{0.50}_S = C_S^{-1}(0.50, \mathrm{data})$ as FRIC scores when constructing the FRIC plots. This bypasses certain difficulties otherwise encountered, related to a decision of whether one needs to truncate nonnegative estimates of squared biases to zero or not; see the discussion of Cunen & Hjort (2020). Second

Reference

This content is AI-processed based on open access ArXiv data.
