URPDM2010
Rational Value of Information Estimation for Measurement Selection
David Tolpin
Computer Science Dept., Ben-Gurion University, 84105 Beer-Sheva, Israel
Solomon Eyal Shimony
Computer Science Dept., Ben-Gurion University, 84105 Beer-Sheva, Israel
ABSTRACT.
Computing value of information (VOI) is a crucial task in various aspects of
decision-making under uncertainty, such as in meta-reasoning for search; in selecting measurements
to make, prior to choosing a course of action; and in managing the exploration vs. exploitation
tradeoff. Since such applications typically require numerous VOI computations during a single run,
it is essential that VOI be computed efficiently. We examine the issue of anytime estimation of VOI,
as frequently it suffices to get a crude estimate of the VOI, thus saving considerable computational
resources. As a case study, we examine VOI estimation in the measurement selection problem.
Empirical evaluation of the proposed scheme in this domain shows that computational resources
can indeed be significantly reduced, at little cost in expected rewards achieved in the overall decision
problem.
1 INTRODUCTION
Problems of decision-making under uncertainty frequently contain cases where information can be
obtained using some costly actions, called measurement actions. In order to act rationally in the
decision-theoretic sense, measurement plans are typically optimized based on some form of value
of information (VOI). Computing VOI can itself be computationally intensive. Since frequently an
exact VOI is not needed in order to proceed (e.g., it is sufficient to determine that the VOI of a
certain measurement is much lower than that of another measurement, at a certain point in time),
significant computational resources can be saved by controlling the resources used for estimating
the VOI. This paper examines this tradeoff via a case study of measurement selection.
In general, computation of value of information (VOI), even under the commonly used simplifying
myopic assumption, involves multidimensional integration of a general function
[Russell and Wefald, 1991]. For some problems, the integral can be computed efficiently
[Russell and Wefald, 1989]; but when the utility function is computationally intensive or when
a non-myopic estimate is used, the time required to compute the value of information can be
significant [Heckerman et al., 1993] [Bilgic and Getoor, 2007] and must be taken into account while
computing the net value of information. This paper presents and analyzes an extension of the
known greedy algorithm that decides when to recompute VOI of each of the measurements based
on the principles of limited rationality [Russell and Wefald, 1991].
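To make the integration cost concrete, the following is a minimal sketch of a sampling-based estimate of the myopic intrinsic VOI. It is not the scheme proposed in this paper. It assumes, purely for illustration, independent Gaussian priors on a single real-valued utility per item and additive Gaussian observation noise; the function name `myopic_voi_mc` and all of its parameters are hypothetical. The sample count plays the role of the computational budget: fewer samples give a cruder but cheaper estimate.

```python
import math
import random


def myopic_voi_mc(mu, sigma, tau, j, n_samples, rng):
    """Monte Carlo estimate of the myopic intrinsic VOI of measuring item j once.

    Illustrative assumptions (not from the paper): item i has an independent
    Gaussian prior N(mu[i], sigma[i]^2) on its utility, and the measurement
    returns y = z_j + noise, with Gaussian noise of standard deviation tau.
    Requires at least two items and sigma[j]^2 + tau^2 > 0.
    """
    best_now = max(mu)  # expected utility of selecting without measuring
    best_other = max(m for i, m in enumerate(mu) if i != j)
    var_y = sigma[j] ** 2 + tau ** 2  # prior predictive variance of y
    gain = sigma[j] ** 2 / var_y      # weight of the observation in the update
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(mu[j], math.sqrt(var_y))    # simulated measurement outcome
        post_mean_j = mu[j] + gain * (y - mu[j])  # posterior mean of item j
        total += max(best_other, post_mean_j)     # best selection after seeing y
    return total / n_samples - best_now


rng = random.Random(0)
# Same query at two sample counts: a cheap crude estimate and a costly finer one.
crude = myopic_voi_mc([0.0, 0.0], [1.0, 1.0], 0.0, j=1, n_samples=100, rng=rng)
fine = myopic_voi_mc([0.0, 0.0], [1.0, 1.0], 0.0, j=1, n_samples=100_000, rng=rng)
```

In this two-item, noise-free example the exact value is 1/sqrt(2π) ≈ 0.399, so the two estimates can be compared against it to see how accuracy trades off against the number of samples.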
Although it may be possible to use this idea in more general settings, this paper mainly examines
on-line most informative measurement selection [Krause and Guestrin, 2007]
[Bilgic and Getoor, 2007], an approach which is commonly used to solve problems of optimization
under uncertainty [Zheng et al., 2005] [Krause et al., 2008]. Since this approach assumes that
the computation time required to select the most informative measurement is negligible compared
to the measurement time [Russell and Wefald, 1991], it is important in this setting to ascertain that
VOI estimation indeed does not consume excessive computational resources.
arXiv:1003.5305v2 [cs.AI] 16 Apr 2010
2 THE MEASUREMENT SELECTION PROBLEM
As our case study, we examine the following optimization problem. Given:
• A set of $N_s$ items $S = \{s_1, s_2, \ldots, s_{N_s}\}$.
• A set of $N_f$ item features $Z = \{z_1, z_2, \ldots, z_{N_f}\}$. (Each feature $z_i$ has a domain $D(z_i)$.)
• A joint distribution over the features of the items in $S$. That is, a joint distribution over the
random variables $\{z_1(s_1), z_2(s_1), \ldots, z_1(s_2), z_2(s_2), \ldots\}$.
• A set of measurement types $M = \{(c, p)_k \mid k \in 1..N_m\}$, with potentially different intrinsic
measurement cost $c$ and observation distribution $p$, conditional on the true feature values, for
each measurement type.
• A utility function $u(z): \mathbb{R}^{N_f} \to \mathbb{R}$ on features. In the simplest case, there is just one
real-valued feature, acting as the item's utility value, and $u$ is simply the identity function.
• A measurement budget $C$.
Find a policy of measurement decisions and a final selection that maximize the expected net utility
of the selection (the expected reward):
\[
\max:\; R = u(z(s_\alpha)) - \sum_{i=1}^{N_q} c_{k_i}
\qquad \text{s.t.:}\quad \sum_{i=1}^{N_q} c_{k_i} \le C
\tag{1}
\]
where $Q = \{(k_i, s_i) \mid i \in 1..N_q\}$ is the performed measurement sequence and $s_\alpha$ is the selected
item. The next measurement is selected on-line, after the outcomes of all preceding measurements
are known.
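As a worked instance of Eq. (1), the sketch below computes the reward of one completed run: the utility of the selected item minus the total cost of the measurements performed, subject to the budget constraint. The function name `net_reward` and its parameters are hypothetical and introduced only for illustration.

```python
def net_reward(selected_utility, measurement_costs, budget):
    """Reward R of Eq. (1) for one run.

    selected_utility  -- u(z(s_alpha)), the utility of the finally selected item
    measurement_costs -- costs c_{k_i} of the measurements actually performed
    budget            -- the measurement budget C
    """
    total_cost = sum(measurement_costs)
    if total_cost > budget:
        # The constraint in Eq. (1) is violated, so the sequence is infeasible.
        raise ValueError("measurement sequence exceeds the budget C")
    return selected_utility - total_cost


# Two measurements costing 1.0 and 2.0, budget 5.0, selected item worth 10.0.
reward = net_reward(10.0, [1.0, 2.0], budget=5.0)  # 10.0 - 3.0 = 7.0
```

The policy is evaluated by the expectation of this quantity over measurement outcomes and feature values; the function only scores a single realized run.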
The above selection problem is intractable, and is therefore commonly solved approximately using
a greedy heuristic algorithm. The greedy algorithm selects a measurement $m_{j_{\max}}$ with the greatest
net value of information $V_{j_{\max}}$. The net value of information is the difference between the intrinsic
value of information and the measurement cost:
\[
V_j = \Lambda_j - c_{k_j}
\tag{2}
\]
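One step of the greedy selection can be sketched as follows, taking the intrinsic VOI estimates as given. The function name `greedy_step` is hypothetical. The stopping rule used here, which returns no measurement when none is affordable or none has a positive net value, is an assumption of this sketch and is not stated in the text above.

```python
def greedy_step(intrinsic_voi, costs, remaining_budget):
    """Pick the measurement with the greatest net VOI, V_j = Lambda_j - c_{k_j}.

    intrinsic_voi    -- estimates Lambda_j, one per candidate measurement
    costs            -- costs c_{k_j}, aligned with intrinsic_voi
    remaining_budget -- the part of the budget C not yet spent

    Returns the index of the chosen measurement, or None when no affordable
    measurement has a positive net value (assumed stopping rule).
    """
    best_j, best_net = None, 0.0
    for j, (lam, c) in enumerate(zip(intrinsic_voi, costs)):
        if c > remaining_budget:
            continue  # this measurement would violate the budget constraint
        net = lam - c  # Eq. (2)
        if net > best_net:
            best_j, best_net = j, net
    return best_j


# Net values are roughly 0.5, 0.1 and 1.5, so the third measurement is chosen.
choice = greedy_step([1.0, 3.0, 2.5], [0.5, 2.9, 1.0], remaining_budget=10.0)
```

In an on-line loop this step would be repeated after each measurement outcome is observed, with the beliefs, the VOI estimates, and the remaining budget updated in between.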
The intrinsic valu
…(Full text truncated)…