
This paper offers a perspective on information-based complexity (IBC). The theory deals with infinite-dimensional problems for which complete information is not available; the information used to solve such problems may be partial and/or contaminated by error.

The paper defines the basic concepts and the standard model of IBC, defines the complexity of algorithms in two principal settings, worst case and average case, and gives examples of applying IBC to several problems.

It also contrasts information-based complexity with numerical analysis, noting that the two fields take different views of their common domain, presents IBC as part of a theoretical foundation for numerical analysis, and responds to common criticisms of IBC.


arXiv:math/9201269v1 [math.NA] 1 Jan 1992
Appeared in Bulletin of the American Mathematical Society, Volume 26, Number 1, January 1992, Pages 29-52

PERSPECTIVES ON INFORMATION-BASED COMPLEXITY

J. F. Traub and H. Woźniakowski

1. Introduction

Computational complexity studies the intrinsic difficulty of mathematically posed problems and seeks optimal means for their solutions. This is a rich and diverse field; for the purpose of this paper we present a greatly simplified picture. Computational complexity may be divided into two branches, discrete and continuous.

Discrete computational complexity studies problems such as graph-theoretic, routing, and discrete optimization problems; see, for example, Garey and Johnson [79]. Continuous computational complexity studies problems such as ordinary and partial differential equations, multivariate integration, matrix multiplication, and systems of polynomial equations.

Discrete computational complexity often uses the Turing machine model, whereas continuous computational complexity tends to use the real number model. Continuous computational complexity may again be split into two branches. The first deals with problems for which the information is complete. Problems where the information may be complete are those for which the input is specified by a finite number of parameters. Examples include linear algebraic systems, matrix multiplication, and systems of polynomial equations. Recently, Blum, Shub and Smale [89] obtained the first NP-completeness result over the reals for a problem with complete information.

The other branch of continuous computational complexity is information-based complexity, which is denoted for brevity as IBC. Typically, IBC studies infinite-dimensional problems. These are problems where either the input or the output are elements of infinite-dimensional spaces. Since digital computers can handle only finite sets of numbers, infinite-dimensional objects such as functions on the reals must be replaced by finite sets of numbers. Thus, complete information is not available about such objects. Only partial information is available when solving an infinite-dimensional problem on a digital computer. Typically, information is contaminated with errors such as round-off error, measurement error, and human error. Thus, the available information is partial and/or contaminated.

We want to emphasize this point for it is central to IBC. Since only partial and/or contaminated information is available, we can solve the original problem only approximately. The goal of IBC is to compute such an approximation as inexpensively as possible.

1991 Mathematics Subject Classification. Primary 68Q25.
Received by the editors April, 1991.
This research was supported in part by the National Science Foundation.
© 1992 American Mathematical Society, 0273-0979/92 $1.00 + $.25 per page

In Figure 1 (see p. 30) we schematize the structure of computational complexity described above. [Figure 1: the structure of computational complexity.]

Research in the spirit of IBC was initiated in the Soviet Union by Kolmogorov in the late 1940s. Nikolskij [50], then a graduate student of Kolmogorov, studied optimal quadrature. This line of research was greatly advanced by Bakhvalov; see, e.g., Bakhvalov [59, 64, 71]. In the United States research in the spirit of IBC was initiated by Sard [49] and Kiefer [53]. Kiefer reported the results of his 1948 MIT Master's Thesis that Fibonacci sampling is optimal when approximating the maximum of a unimodal function. Sard studied optimal quadrature. Golomb and Weinberger [59] studied optimal approximation of linear functionals. Schoenberg [64] realized the close connection between splines and algorithms optimal in the sense of Sard.

IBC is formulated as an abstract theory and it has applications in numerous areas. The reader may consult TWW [88]^1 for some of the applications. IBC has benefitted from research in many fields. Influential have been questions, concepts, and results from complexity theory, algorithmic analysis, applied mathematics, numerical analysis, statistics, and the theory of approximation (particularly the work on n-widths and splines).

In this paper we discuss, in particular, IBC research for two problems of numerical analysis. We first contrast IBC and numerical analysis, limiting ourselves to just one characteristic of each. IBC is a branch of computational complexity, and optimal (or almost optimal) information and algorithms are obtained from the theory. In numerical analysis, particular classes of algorithms are carefully analyzed to see if they satisfy certain criteria such as convergence, error bounds, efficiency, and stability.

Numerical analysis and IBC have different views on the problems which lie in their common domain. The authors of this paper have worked in both numerical analysis and IBC, and believe the viewpoints are not right or wrong, just different. On the other hand, in many research groups around the world, people work on both numerical analysis and IBC, and do not draw a sharp distinction between the two. They believe IBC can serve as part of the theoretical foundation of numerical analysis.

We believe there might be some profit in discussing the views of numerical analysis and IBC. Unfortunately Parlett [92]^2 does not serve this purpose since, as we shall show, this paper ignores relevant literature and is mistaken on issues of complexity theory.

^1 When one of us is a coauthor, the citation will be made using only initials.
^2 Citation to this paper will be made using only an initial.

For example, P [92] contains a central misconception about IBC which immediately invalidates large portions of the paper. P [92] assumes that the information is specified (or fixed). Indeed, the first "high level criticism" is that IBC "is not complexity theory" (see P [92, 2.A]), since "specified information" is used. But it is the very essence of IBC that both the information and the algorithms are varied. Indeed, one of the central problems of IBC is the optimal choice of information. Significant portions of three monographs, TW [80] and TWW [83, 88], all of which are cited in P [92], are devoted to this issue. We return to this issue in §3 after notation has been established.

In P [92], the author limits himself to "matrix computations, which is the area we understand best." We do not object to discussing matrix computations, although they constitute a small fraction and are atypical of IBC. For example, in the recent monograph TWW [88], some ten pages, just 2%, are devoted to matrix computations. Matrix computations are atypical since complete information can be obtained at finite cost. However, even in this particular area, P [92] ignores relevant literature and does not exhibit a grasp of the complexity issues. Since the discussion will, of necessity, assume some rather technical details concerning matrix computations, we will defer it to §§5 and 6.

We stress that we are not questioning the importance of matrix computations. On the contrary, they play a central role in scientific computation. Furthermore, we believe there are some nice results and deep open questions regarding matrix computations in IBC. But the real issue is, after all, IBC in its entirety. P [92] is merely using the two papers TW [84] and Kuczyński [86] on matrix computations to criticize all of IBC. We therefore respond to general criticisms in §§3 and 4.

To make this paper self-contained we briefly summarize the basic concepts of IBC in §2. Section 7 deals with possible refinements of IBC. A summary of our rebuttal to criticisms in P [92] is presented in §8.

2. Outline of IBC

In this section we introduce the basic concepts of IBC and define the notation which will be used for the remainder of this paper. We illustrate the concepts with the example of multivariate integration, a typical application of IBC. A more detailed account may be found in TWW [88]. Expository material may be found in W [85], PT [87], PW [87], and TW [91].

Let
$$S : F \to G,$$
where F is a subset of a linear space and G is a normed linear space. We wish to compute an approximation to S(f) for all f from F.

Typically, f is an element from an infinite-dimensional space and it cannot be represented on a digital computer. We therefore assume that only partial information^3 about f is available. We gather this partial information about f by computing information operations L(f), where L ∈ Λ. Here the class Λ denotes a collection of information operations that may be computed. We illustrate these concepts by an example.

^3 For simplicity, we will not consider contaminated information in this paper.

Example: Multivariate integration. Let F be the unit ball of the Sobolev class $W^{r,d}_p$ of real functions defined on the d-dimensional cube $D = [0,1]^d$ whose rth distributional derivatives exist and are bounded in the $L_p$ norm. Let $G = \mathbb{R}$ and
$$S(f) = \int_D f(t)\,dt.$$
Assume pr > d. To approximate S(f), we assume we can compute only function values. That is, the class Λ is the collection of $L : F \to \mathbb{R}$ such that for some x from D, L(f) = f(x), ∀f ∈ F. □

For each f ∈ F, we compute a number of information operations from the class Λ. Let
$$N(f) = [L_1(f), L_2(f), \dots, L_n(f)], \qquad L_i \in \Lambda,$$
be the computed information about f. We stress that the $L_i$ as well as the number n can be chosen adaptively. That is, the choice of $L_i$ may depend on the already computed $L_1(f), L_2(f), \dots, L_{i-1}(f)$. The number n may also depend on the computed $L_i(f)$. (This permits arbitrary termination criteria.)

N(f) is called the information about f, and N the information operator. In general, N is many-to-one, and that is why it is impossible to recover the element f knowing y = N(f) for f ∈ F. For this reason, the information N is called partial.

Having computed N(f), we approximate S(f) by an element U(f) = φ(N(f)), where $\varphi : N(F) \to G$. A mapping φ is called an algorithm.

The definition of error of the approximation U depends on the setting. We restrict ourselves here to only two settings. In the worst case setting
$$e(U) = \sup_{f \in F} \|S(f) - U(f)\|,$$
and in the average case setting, given a probability measure μ on F,
$$e(U) = \left( \int_F \|S(f) - U(f)\|^2 \,\mu(df) \right)^{1/2}.$$

Example (continued). The information is given by
$$N(f) = [f(x_1), f(x_2), \dots, f(x_n)]$$
with the points $x_i$ and the number n adaptively chosen. An example of an algorithm is a linear algorithm given by $U(f) = \varphi(N(f)) = \sum_{i=1}^{n} a_i f(x_i)$ for some numbers $a_i$. In the worst case setting, the error is defined as the maximal distance |S(f) − U(f)| over the set F. In the average case setting, the error is the $L_2$ mean of |S(f) − U(f)| with respect to the probability measure μ. The measure μ is sometimes taken as a truncated Gaussian measure. □
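The following sketch makes the objects above concrete for a low-dimensional instance of the integration example. It is not taken from the paper; the choice d = 2, the product midpoint nodes, and the equal weights are illustrative assumptions, but it shows how an approximation U(f) = φ(N(f)) is built from function values alone.

```python
import numpy as np

# Hypothetical instance: d = 2, nonadaptive information consisting of n = m*m
# function values on a product midpoint grid, and a linear algorithm with
# equal weights a_i = 1/n.
d, m = 2, 16
grid_1d = (np.arange(m) + 0.5) / m
nodes = np.array([[u, v] for u in grid_1d for v in grid_1d])   # the points x_i
weights = np.full(len(nodes), 1.0 / len(nodes))                # the numbers a_i

def information(f):
    """N(f) = [f(x_1), ..., f(x_n)]: the only access to f that is allowed."""
    return np.array([f(x) for x in nodes])

def algorithm(y):
    """phi(y) = sum_i a_i y_i, a linear algorithm mapping N(F) into G = R."""
    return float(weights @ y)

f = lambda x: np.exp(x[0] * x[1])            # one integrand f from F
U_f = algorithm(information(f))              # U(f) = phi(N(f))
print(U_f)                                   # approximates the integral over [0,1]^2
```

The worst case error e(U) would be the supremum of |S(f) − U(f)| over the whole class F, which no single run exhibits; the sketch only shows how one approximation is produced.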

To define the computational complexity we need a model of computation. It is defined by two assumptions:

(1) We are charged for each information operation. That is, for every L ∈ Λ and for every f ∈ F, the computation of L(f) costs c, where c is positive and fixed, independent of L and f.

(2) Let Ω denote the set of permissible combinatory operations including the addition of two elements in G, multiplication by a scalar in G, arithmetic operations, comparison of real numbers, and evaluations of certain elementary functions. We assume that each combinatory operation is performed exactly with unit cost.

In particular, this means that we use the real number model, where we can perform operations on real numbers exactly and at unit cost. Modulo roundoffs and the very important concept of numerical stability, this corresponds to floating point arithmetic widely used for solving scientific computational problems.

We now define the cost of the approximations. Let cost(N, f) denote the cost of computing the information N(f). Note that cost(N, f) ≥ cn, and the inequality may occur since adaptive selection of the $L_i$ and n may require some combinatory operations.

Knowing y = N(f), we compute U(f) = φ(y) by combining the information $L_i(f)$. Let cost(φ, y) denote the number of combinatory operations from Ω needed to compute φ(y). We stress that cost(N, f) or cost(φ, y) may be equal to infinity if N(f) or φ(y) use an operation outside Ω or infinitely many operations from Λ or Ω, respectively.

The cost of computing U(f), cost(U, f), is given by
$$\mathrm{cost}(U, f) = \mathrm{cost}(N, f) + \mathrm{cost}(\varphi, N(f)).$$
Depending on the setting, the cost of U is defined as follows. In the worst case setting
$$\mathrm{cost}(U) = \sup_{f \in F} \mathrm{cost}(U, f),$$
and in the average case setting
$$\mathrm{cost}(U) = \int_F \mathrm{cost}(U, f)\,\mu(df).$$

We are ready to define the basic notion of ε-complexity. The ε-complexity is defined as the minimal cost among all U with error at most ε,
$$\mathrm{comp}(\varepsilon) = \inf\{\mathrm{cost}(U) : U \text{ such that } e(U) \le \varepsilon\}.$$
(Here we use the convention that the infimum of the empty set is taken to be infinity.) Depending on the setting, this defines the worst case or average case ε-complexity.

We stress that we take the infimum over all possible U for which the error does not exceed ε. Since U can be identified with the pair (N, φ), where N is the information and φ is the algorithm that uses that information, this means that we take the infimum over all information N consisting of information operations from the class Λ, and over all algorithms φ that use N such that (N, φ) computes approximations with error at most ε.

Remark. The complexity depends on the set Λ of permissible information operations and on the set Ω of permissible combinatory operations. Both sets are necessary to define the complexity of a problem. This is beneficial because the dependence of complexity on Λ and Ω enriches the theory; it enables us to study the power of specified information or combinatory operations.

We illustrate the role of Λ and Ω by a number of examples.

We begin with the role of Λ. Assume that F is a subset of a linear space of functions. Let Λ1 consist of all linear functionals, and let Λ2 consist of function evaluations. For many applications Λ2 is more practical. Let Ω be defined as above.

Consider the integration example. For this problem, Λ1 is not a reasonable choice since any integral could be computed exactly with cost c. For Λ2, we get the multivariate integration problem discussed in this section.

Consider next the approximate solution of 2mth-order elliptic linear partial differential equations whose right-hand side belongs to the unit ball of $H^r(D)$ for a bounded simply-connected $C^\infty$ region D of $\mathbb{R}^d$. Let $G = H^m(D)$. Werschulz has shown that the worst case complexity in the class Λ1 is proportional to $\varepsilon^{-d/(r+m)}$, and in the class Λ2 it is proportional to $\varepsilon^{-d/r}$; a thorough study of this subject may be found in the research monograph Werschulz [91]. Thus, the complexity penalty for using Λ2 rather than Λ1, which is proportional to $\varepsilon^{-dm/(r(r+m))}$, goes to infinity as ε goes to zero for m > 0; see also TWW [88, Chapter 5, Theorem 5.9]. On the other hand, Werschulz has shown that the complexity of Fredholm integral equations of the second kind is roughly the same for Λ1 and Λ2; see Werschulz [91] as well as TWW [88, Chapter 5, §6].

We now illustrate the role of Ω for the approximate solution of scalar complex polynomial equations of degree d using complete information, i.e., Λ consists of the identity mapping. Let Ω1 consist of the four arithmetic operations (over the complex field), and let Ω2 consist of the four arithmetic operations and complex conjugation. We confine ourselves to purely iterative algorithms. Then for d ≥ 4, McMullen [85] proved that the problem cannot be solved for the class Ω1, whereas Shub and Smale [86] proved that the problem can be solved for the class Ω2. The positive result of Shub and Smale [86] also holds for systems of complex multivariate polynomials of degree d. Hence, the arithmetic operations are too weak for approximate polynomial zero finding, whereas also permitting complex conjugation supplies enough power to solve the problem. □

Example (continued). For the integration problem, the model of computation states that one function evaluation costs c, and each arithmetic operation, comparison of real numbers, and evaluation of certain elementary functions can be performed exactly at unit cost. Usually c ≫ 1.

The worst case ε-complexity for the unit ball of $W^{r,d}_p$ is as follows. For pr > d,
$$\mathrm{comp}(\varepsilon) = \Theta\!\left(c\,\varepsilon^{-d/r}\right) \quad \text{as } \varepsilon \to 0;$$
see Novak [88] for a recent survey. Take p = +∞. Then for d large relative to r, the worst case ε-complexity is huge even for moderate ε.
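To give a feel for what "huge" means here, the following lines evaluate the rate $\varepsilon^{-d/r}$ for a few dimensions; the cost constant c = 1 is an arbitrary placeholder, and the hidden Θ-constant is ignored.

```python
# Curse of dimensionality implied by comp(ε) = Θ(c ε^{-d/r}); c and the
# hidden Θ-constant are set to 1 for illustration only.
c, r, eps = 1.0, 2, 1e-2
for d in (2, 5, 10, 20):
    print(d, c * eps ** (-d / r))   # 1e2, 1e5, 1e10, 1e20 function values
```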

Furthermore, if only continuity of functions is assumed, then the problem cannot be solved since comp(ε) = +∞.

For the average case setting, let F be the unit ball in the sup norm of continuous functions. Let μ be a truncated classical Wiener sheet measure; see, e.g., TWW [88, p. 218]. Then using results from number theory concerning discrepancy (see Roth [54, 80]), we have
$$\mathrm{comp}(\varepsilon) = \Theta\!\left(c\,\varepsilon^{-1}(\log \varepsilon^{-1})^{(d-1)/2}\right) \quad \text{as } \varepsilon \to 0;$$
see W [87, 91]. Thus, the average case complexity depends only mildly on the dimension d. (The same Θ result holds if the unit ball is replaced by the entire space of continuous functions.) To get an approximation with cost proportional to comp(ε), it is enough to compute the arithmetic mean $n^{-1}\sum_{i=1}^{n} f(x_i)$, where $n = \Theta(\varepsilon^{-1}(\log \varepsilon^{-1})^{(d-1)/2})$, and the points $x_i$ are derived from Hammersley points. □
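As a concrete illustration of the algorithm just described, the sketch below averages function values at Hammersley points. The construction of the point set (first coordinate i/n, remaining coordinates radical inverses in the first primes) is the standard one and is our choice for illustration; the paper's average case guarantee concerns the Wiener sheet measure and is not reproduced by a single test integrand.

```python
import numpy as np

def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += f * (i % base)
        i //= base
        f /= base
    return inv

def hammersley(n, d):
    """First n Hammersley points in [0,1]^d."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][: d - 1]
    pts = np.empty((n, d))
    for i in range(n):
        pts[i, 0] = i / n
        for j, p in enumerate(primes, start=1):
            pts[i, j] = radical_inverse(i, p)
    return pts

# U(f) = arithmetic mean of the n function values f(x_i).
f = lambda x: np.prod(np.sin(np.pi * x))     # a smooth test integrand
pts = hammersley(n=4096, d=5)
approx = np.mean([f(x) for x in pts])
print(approx, (2 / np.pi) ** 5)              # estimate vs. the exact integral
```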

A goal of IBC is to find or estimate the ε-complexity, and to find an ε-complexity optimal U, or equivalently, an ε-complexity optimal pair (N, φ). By ε-complexity optimality of U we mean that the error of U is at most ε and the cost of U is equal to, or not much greater than, the ε-complexity. For a number of problems this goal has been achieved due to the work of many researchers.

Many computational problems can be formulated using the approach outlined above. For some problems, including the two matrix computation problems discussed in P [92], we need a more general formulation. We now briefly discuss this more general formulation; details can be found in TWW [83, 88].

Let F and G be given sets, and W be a given mapping
$$W : F \times [0, +\infty) \to 2^G.$$
We assume that W(f, 0) is nonempty and grows as ε increases, i.e., for any $\varepsilon_1 \le \varepsilon_2$ we have $W(f, \varepsilon_1) \subset W(f, \varepsilon_2)$, ∀f ∈ F. We now wish to compute an element U(f) which belongs to W(f, ε) for all f ∈ F. The definitions of U as well as the cost of U are unchanged. The error of U is now defined as follows. The error of U for f from F is
$$e(U, f) = \inf\{\eta : U(f) \in W(f, \eta)\}.$$
Then the error of U is defined as $e(U) = \sup_{f \in F} e(U, f)$ in the worst case setting, and $e(U) = \left(\int_F e^2(U, f)\,\mu(df)\right)^{1/2}$ in the average case setting. Note that for
$$W(f, \varepsilon) = \{g \in G : \|S(f) - g\| \le \varepsilon\}$$
we have e(U, f) = ‖S(f) − U(f)‖ and the two formulations coincide.

Finally, we illustrate how the two matrix computation problems fit in this formulation.

(i) Large linear systems. We wish to approximate the solution of a large linear system Az = b by computing a vector x with residual at most ε, ‖Ax − b‖ ≤ ε. Here, b is a given vector, ‖b‖ = 1, and A belongs to a class F of n × n nonsingular matrices. The vectors x are computed by using matrix-vector multiplications Az for any vector z. This problem corresponds to taking $G = \mathbb{R}^n$ and
$$W(A, \varepsilon) = \{x \in G : \|Ax - b\| \le \varepsilon\}, \qquad \forall A \in F.$$
The class Λ of information operations is now given by
$$\Lambda = \{L : F \to \mathbb{R}^n : \text{there exists a vector } z \in \mathbb{R}^n \text{ such that } L(A) = Az,\ \forall A \in F\}.$$

(ii) Eigenvalue problem. For a matrix A from a class F of n × n symmetric matrices, we wish to compute an approximate eigenpair (x, λ), where $x \in \mathbb{R}^n$ with ‖x‖ = 1, and $\lambda \in \mathbb{R}$, such that
$$\|Ax - \lambda x\| \le \varepsilon\,\|A\|.$$
As in (i), the pairs (x, λ) are computed by using matrix-vector multiplications. This problem corresponds to taking $G = B_n \times \mathbb{R}$, where $B_n$ is the unit sphere of $\mathbb{R}^n$, and
$$W(A, \varepsilon) = \{(x, \lambda) \in G : \|Ax - \lambda x\| \le \varepsilon\,\|A\|\}, \qquad \forall A \in F.$$
The class Λ is the same as in (i).
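To make the two formulations concrete, the sketch below phrases both acceptance criteria as membership tests in W(A, ε); the matrix A appears only inside a black-box matrix-vector product, and the particular test matrix is an arbitrary stand-in of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
A = np.diag(np.linspace(1.0, 2.0, n))         # a stand-in symmetric A from F
matvec = lambda z: A @ z                      # one information operation L(A) = Az
b = rng.standard_normal(n)
b /= np.linalg.norm(b)                        # right-hand side with ||b|| = 1

def in_W_linear_system(x, eps):
    """(i): x is in W(A, eps) iff the residual ||Ax - b|| is at most eps."""
    return np.linalg.norm(matvec(x) - b) <= eps

def in_W_eigenpair(x, lam, eps, norm_A):
    """(ii): (x, lam) is in W(A, eps) iff ||Ax - lam*x|| <= eps*||A|| and ||x|| = 1."""
    return np.isclose(np.linalg.norm(x), 1.0) and \
           np.linalg.norm(matvec(x) - lam * x) <= eps * norm_A
```

Note that test (ii) uses ‖A‖, which is not itself directly available from finitely many matrix-vector products; in the sketch it is simply passed in as a parameter.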

3. The role of information

Information is central to IBC. We indicate briefly why the distinction between information and algorithm is so powerful. We then respond to two general criticisms in P [92] regarding information.

As explained in §2, the approximation U(f) is computed by combining information operations from the class Λ. Let y = N(f) denote this computed information. In general, the operator N is many-to-one, and therefore the set $N^{-1}(y)$ consists of many elements of F that cannot be distinguished from f using N. Then the set $SN^{-1}(y)$ consists of all elements from G which are indistinguishable from S(f). Since U(f) is the same for any f from the set $N^{-1}(y)$, the element U(f) must serve as an approximation to any element g from the set $SN^{-1}(y)$. It is clear that the quality of the approximation U(f) depends on the "size" of the set $SN^{-1}(y)$.

In the worst case setting, define the radius of information r(N) as the maximal radius of the set $SN^{-1}(y)$ for y ∈ N(F). (The radius of a set A is the radius of the smallest ball which contains the set A.) Clearly, the radius of information r(N) is a sharp lower bound on the worst case error of any U. We can guarantee an ε-approximation iff r(N) does not exceed ε (modulo a technical assumption that the corresponding infimum is attained).

The cost of computing N(f) is at least cn, where c stands for the cost of one information operation, and n denotes their number in the information N. By the ε-cardinality number m(ε) we mean the minimal number n of information operations for which the information N has radius r(N) at most equal to ε. From this we get a lower bound on the ε-complexity in the worst case setting,
$$\mathrm{comp}(\varepsilon) \ge c\,m(\varepsilon).$$
For some problems (see TWW [88, Chapter 5, §5.8]) it turns out that it is possible to find an information operator $N_\varepsilon$ consisting of m(ε) information operations, and a mapping $\varphi_\varepsilon$ such that the approximation $U(f) = \varphi_\varepsilon(N_\varepsilon(f))$ has error at most ε and U(f) can be computed with cost at most (c + 2) m(ε). This yields an upper bound on the ε-complexity,
$$\mathrm{comp}(\varepsilon) \le (c + 2)\,m(\varepsilon).$$
Since usually c ≫ 1, the last two inequalities yield the almost exact value of the ε-complexity,
$$\mathrm{comp}(\varepsilon) \simeq c\,m(\varepsilon).$$
This also shows that the pair $(N_\varepsilon, \varphi_\varepsilon)$ is almost ε-complexity optimal.

In each setting of IBC one can define a radius of information such that we can guarantee an ε-approximation iff r(N) does not exceed ε; see TWW [88]. This permits one to obtain complexity bounds in other settings.

What is the essence of this approach?

The point is that the radius of information as well as the ε-cardinality number m(ε) and the information $N_\varepsilon$ do not depend on particular algorithms, and they can often be expressed entirely in terms of well-known mathematical concepts. Depending on the setting and on the particular problem, the radii of information, the ε-cardinality numbers, and the information $N_\varepsilon$ are related to Kolmogorov and Gelfand n-widths, ε-entropy, the traces of correlation operators of conditional measures, discrepancy theory, the minimal norm of splines, etc.

In summary, there are two reasons why one can sometimes obtain sharp bounds on ε-complexity in IBC. The first is the distinction between information and algorithm. The second is that, due to this distinction, one can draw on powerful results in pure and applied mathematics.

We now respond to two central criticisms in P [92] regarding information. He asserts:

(i) The information is specified (or given) and therefore this "is not complexity theory;" see P [92, 2.A].

(ii) There is an "artificial distinction between information and algorithm;" see P [92, 1].

(i) P [92] repeatedly asserts that the information is "specified" or "given." We have already referred to this misconception in our introduction and will amplify our response here. Varying the information and the algorithms is characteristic of IBC. (For problems for which information is complete, i.e., N is one-to-one, only the algorithms can be varied.) The definition of computational complexity in our work always entails varying both information and algorithms; see, for example, TW [80, Chapter 1, Definition 3.2], TWW [83, Chapter 5, §3], W [85, 2.5], PW [87, II], TWW [88, Chapter 3, §3]. Furthermore the study of optimal information, which of course makes sense only if the information is being varied, is a constant theme in our work; see, for example, TW [80, Chapters 2 and 7], TWW [83, Chapter 4], W [85, 3.5], PW [87, III D, V C], TWW [88, Chapter 4, §5.3, Chapter 6, §5.5].

Here, we have responded to criticism (i) in general. In §§5 and 6 we respond for the case of matrix computations.

(ii.1) P [92, 1] claims there is an "artificial distinction between information and algorithm." That is, he argues that writing the approximation U(f) = φ(N(f)) is sometimes restrictive. We are surprised that he does not produce a single example to back his claim.

(ii.2) P [92, Abstract] states that "a sharp distinction is made between information and algorithms restricted to this information. Yet the information itself usually comes from an algorithm and so the distinction clouds the issues and can lead to true but misleading inferences."

We once again explain our view of the issues involved here using a simple integration example.

As in §2 assume that we can compute function values. How can we approximate the integral of f? The approximation U(f) can be computed by evaluating f at a number of points, say at $x_1, x_2, \dots, x_n$, and then the computed values $f(x_i)$ are combined to get U(f). Computations involving the $f(x_i)$, the adaptive selection of the points $x_i$, and the adaptive choice of n constitute the information N(f). Denoting by φ the mapping which combines N(f), we get U(f) = φ(N(f)).

We do not understand why this is restrictive, why it clouds the issues, and why it leads to "true but misleading inferences." As explained in the first part of this section, the distinction between information and algorithm sometimes enables us to find sharp bounds on complexity.

4. The domain F

A basic concept in IBC is the domain F. A central criticism of IBC in P [92] concerns F. The assertion is that there are two difficulties with F:

(i) There is no need for F.
(ii) There should be a charge for knowing membership in F.

Concerning (i), the second "high level criticism" P [92, 2.B] states: "The ingredient of IBCT that allows it to generate irrelevant results is the problem class F. F does not appear in our brief description of the theory in the second paragraph of §1 because it is not a logically essential ingredient but rather a parameter within IBCT."

Concerning (ii), P [92, Abstract] states: "By overlooking F's membership fee the theory sometimes distorts the economics of problem solving in a way reminiscent of agricultural subsidies."

First, why is F needed?

(i.1) The set F is necessary since it is the domain of the operator S, or part of the domain of the operator W. One need not say anything further; an operator must have a domain. Nevertheless we will add a few additional points regarding the domain F.

(i.2) For discrete or finite-dimensional problems one can sometimes take the "maximal" set as F. Thus, in studying the complexity of matrix multiplications one usually takes F as the set of all n × n matrices. In graph-theoretic complexity one often takes F as the set of all graphs (V, E), where V is the set of vertices and E is the set of edges. However, for infinite-dimensional problems one cannot obtain meaningful complexity results if F is too large. For example, the largest F one might take for integration is the set of Lebesgue-integrable functions, but then comp(ε) = +∞, ∀ε ≥ 0, in the worst case setting. The ε-complexity remains infinite even if F is the set of continuous functions. To make the complexity of an infinite-dimensional problem finite, one must take a smaller F in the worst case setting or switch to the average case setting. Thus, as we saw in §2, in the average case setting with a Wiener measure, the complexity is finite even if F is the set of continuous functions.

(i.3) The use of F is not confined to IBC. In discrete computational complexity researchers often use a set F which is smaller than the maximal set. For example, if F is the set of all graphs then many problems are NP-complete. If F is a specified smaller set, then depending on the problem it may remain NP-complete or it may be solvable in polynomial time. See, for example, Garey and Johnson [79].

(i.4) We believe the dependence of complexity on F is part of the richness of IBC. For example, in the integration problem it is interesting to know how complexity depends on the number of variables and the smoothness of the integrands.

(i.5) For a moment, we specialize our remarks to matrix computations. One could study the complexity of large linear systems for the set F of all invertible matrices of order n. Then to compute an ε-approximation one would have to recover the matrix A by computing n matrix-vector multiplications; this is a negative result.

We find criticism (i) particularly odd since an entire book, Parlett [80], is devoted to only the eigenvalue problem for symmetric matrices. The reason is, of course, that the algorithms and the analysis for the symmetric eigenvalue problem are very different than for arbitrary matrices. But then why is the concept of F so elusive? Researchers in numerical linear algebra often consider other important subsets of matrices such as tridiagonal, Toeplitz, or Hessenberg matrices.

We turn to the criticism that there should be a charge for knowing membership in F.

(ii.1) Is IBC being held to a higher standard? Do researchers in other disciplines charge for F? For example, researchers in numerical analysis often analyze the cost and error of important algorithms. The analysis depends on F. To give a simple example, the analysis of the composite trapezoidal rule usually requires that the second derivative of the integrand is bounded. There is no charge for membership in F. Indeed, how would one charge for knowing that a function has a bounded second derivative?

(ii.2) We believe that P [92] confuses two different problems:

(a) approximation of S(f) for f from F,
(b) the domain membership problem; that is, does f belong to F?

Domain membership is an interesting problem which may be formulated within the IBC framework, although it has nothing to do with the original problem of approximating S(f) for f ∈ F. We outline how this may be done. First, to make the domain membership problem meaningful we must define the domain of f, say the set F̃, in such a way that the logical values of f ∈ F vary with f from F̃, i.e., ∅ ≠ F ∩ F̃ ≠ F̃. Let $S : \tilde F \to \{0, 1\} \subset \mathbb{R}$ be given by
$$S(f) = \chi_F(f), \qquad \forall f \in \tilde F,$$
where $\chi_F$ is the characteristic (indicator) function of F. Then the problem is to compute S(f) exactly or approximately. Observe that we now assume that f ∈ F̃ just as we assumed that f ∈ F for problems of type (a). For the domain membership problem we charge for computing an approximation to S(f), and the complexity of the domain membership problem is the minimal cost of verifying whether f ∈ F.

In the worst case setting, only the exact computation of S(f) makes sense since for ε ≥ 1/2 the problem is trivial, and for ε < 1/2 it is the same as for ε = 0. However for the average case or probabilistic settings, an ε-approximation may be reasonable. For instance we may wish to compute S(f) with probability 1 − ε.

It is easy to see that, in general, the domain membership problem cannot be solved in the worst case setting. To illustrate this, let F̃ be the set of continuous functions, and let F be the set of r times continuously differentiable functions, r ≥ 1. Let the class Λ of information operations consist of function values. It is obvious that knowing n values of f, no matter how large n may be, there is no way to verify whether f is a member of F.
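The impossibility argument just given can be made concrete: the sketch below builds a hypothetical continuous, non-differentiable perturbation that vanishes at every sample point, so the smooth function f and the perturbed function g yield identical information N(f) = N(g) while only one of them lies in F.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 11)            # the n = 11 sample points
f = np.cos                                 # a C-infinity function, so f is in F

def hat(t, nodes):
    """Piecewise-linear perturbation: zero at every node, kinks in between."""
    j = np.clip(np.searchsorted(nodes, t) - 1, 0, len(nodes) - 2)
    a, b = nodes[j], nodes[j + 1]
    return min(t - a, b - t)

g = lambda t: f(t) + hat(t, xs)            # continuous but not differentiable

# Identical information, different membership in F.
print(np.allclose([f(x) for x in xs], [g(x) for x in xs]))   # True
```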

The domain membership problem can be studied in the average case or probabilistic settings. Its complexity may be large or small depending on F and F̃. An example of work for this problem is Gao and Wasilkowski [90] who study a particular domain membership problem.

(ii.3) Finally, we are at a loss to understand the following sentence from P [92, 2.B], "Whenever F is very large (for example, the class of continuous functions or the class of invertible matrices) then it is realistic to assign no cost to it." Why is it realistic to assign no cost for "large" F, and why is it necessary to assign cost to "small" F? Where is the magic line which separates large F from small F?

5. Large linear systems

We briefly describe IBC research on large linear systems and then respond to the criticisms in P [92].

Let
$$Ax = b,$$
where A ∈ F, and F is a class of n × n nonsingular matrices. Here b is a known n × 1 vector normalized such that ‖b‖ = 1, and ‖·‖ stands for the spectral norm. Our problem is defined as follows. For any A ∈ F and any ‖b‖ = 1 compute an ε-approximation x,
$$\|Ax - b\| \le \varepsilon.$$
Usually A is sparse and therefore Az can be computed in time and storage proportional to n. It is therefore reasonable for large linear systems to assume that the class Λ of information operations consists of matrix-vector multiplications. That is, we can compute $Az_1, Az_2, \dots, Az_k$, where $z_i$ may depend on the known vector b and on the previously computed vectors $Az_1, \dots, Az_{i-1}$. To stress that the right-hand side vector b is known we slightly abuse the notation of §2 and denote
$$N_k(A, b) = [b, Az_1, \dots, Az_k], \qquad A \in F, \tag{5.1}$$
as the information about the problem. The number k is called the cardinality of information. For this to be of interest, we need k ≪ n.

Krylov information is the special case when we take $z_1 = b$ and $z_i = Az_{i-1}$. Thus Krylov information is given by
$$N_k^{\mathrm{Kr}}(A, b) = [b, Ab, \dots, A^k b].$$
In what follows we will use the concept of orthogonal invariance of the class F. The class F is orthogonally invariant iff A ∈ F implies $Q^T A Q \in F$ for any orthogonal matrix Q, i.e., satisfying $Q^T Q = I$. Examples of orthogonally invariant classes include many of practical interest such as symmetric matrices, symmetric positive definite matrices, and matrices with uniformly bounded condition numbers.
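A small sketch of how Krylov information is gathered, using only matrix-vector products and k of them in total; the matvec argument is whatever black box implements L(A) = Az, and the test matrix is an arbitrary choice.

```python
import numpy as np

def krylov_information(matvec, b, k):
    """N_k^Kr(A, b) = [b, Ab, ..., A^k b], computed with k matrix-vector products."""
    vs = [b]
    for _ in range(k):
        vs.append(matvec(vs[-1]))
    return np.column_stack(vs)              # an n x (k+1) array

# Example with a symmetric stand-in matrix (an orthogonally invariant class).
A = np.diag(np.linspace(1.0, 2.0, 50))
b = np.ones(50) / np.sqrt(50)
N = krylov_information(lambda z: A @ z, b, k=5)
print(N.shape)                              # (50, 6)
```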

We first discuss optimal information for large linear systems, which is defined as follows. The ε-cardinality number m(ε) (see §3) now denotes the minimal cardinality k of all information $N_k$ of the form (5.1) with $r(N_k) \le \varepsilon$. Obviously, m(ε) depends on the class F and the class Λ. The information $N_k^*$ is optimal iff k = m(ε) and $r(N_k^*) \le \varepsilon$.

Remark. In §2 we define the ε-complexity optimality of a pair (N, φ). In this section optimality of information $N_k^*$ is introduced. How are these two optimality notions related? In general, they are not. However, as already indicated in §2, for many problems the cost of computing $N_k^*(A, b)$ is proportional to c m(ε) and there exists an algorithm $\varphi^*$ that uses $N_k^*$ and has error ε and combinatory cost proportional to m(ε). Then the pair $(N_k^*, \varphi^*)$ is (almost) ε-complexity optimal. In this case, the two notions of optimality coincide and the complexity analysis reduces to the problem of finding optimal information. Details may be found in TWW [88, Chapter 4, §4]. □

In TW [84] we conjecture that for the class Λ of matrix-vector multiplications and for any orthogonally invariant F, Krylov information is optimal. Chou [87], based on Nemirovsky and Yudin [83], shows that Krylov information is optimal modulo a multiplicative factor of 2. More precisely, let $m^{\mathrm{Kr}}(\varepsilon)$ denote the minimal cardinality k of Krylov information for which $r(N_k^{\mathrm{Kr}}) \le \varepsilon$. For any orthogonally invariant class F, we have
$$m(\varepsilon) \le m^{\mathrm{Kr}}(\varepsilon) \le 2\,m(\varepsilon) + 2.$$
Recently, Nemirovsky [91] shows that for a number of important orthogonally invariant classes F and for $m(\varepsilon) \le \tfrac12(n-3)$, Krylov information is optimal,
$$m(\varepsilon) = m^{\mathrm{Kr}}(\varepsilon).$$

We now discuss algorithms that use Krylov information. We recall the definition of the classical minimal residual (mr) algorithm; see, e.g., Stiefel [58]. The mr algorithm, $\varphi^{\mathrm{mr}}$, uses Krylov information $N_k^{\mathrm{Kr}}(A, b)$ and computes the vector $x_k$ such that
$$\|Ax_k - b\| = \min\{\|W_k(A)\,b\| : W_k \text{ is a polynomial of degree} \le k \text{ and } W_k(0) = 1\}.$$
Thus, by definition the mr algorithm minimizes the residual in the class of polynomial algorithms.
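For concreteness, here is a transparent (not production-quality) realization of the definition above: x_k is the element of the Krylov subspace span{b, Ab, ..., A^{k-1}b} with minimal residual, found by least squares on the monomial Krylov basis. The classical implementation uses a three-term recurrence instead, as noted below, and the monomial basis used here becomes ill-conditioned for larger k.

```python
import numpy as np

def mr_algorithm(matvec, b, k):
    """Minimal residual step k: the x_k in span{b, Ab, ..., A^{k-1} b}
    minimizing ||A x - b|| (equivalently ||W_k(A) b|| with W_k(0) = 1)."""
    vs = [b]
    for _ in range(k):                      # k matrix-vector multiplications
        vs.append(matvec(vs[-1]))
    K = np.column_stack(vs[:k])             # [b, Ab, ..., A^{k-1} b]
    AK = np.column_stack(vs[1:])            # [Ab, ..., A^k b] = A K
    y, *_ = np.linalg.lstsq(AK, b, rcond=None)
    return K @ y

A = np.diag(np.linspace(1.0, 2.0, 100))     # an SPD stand-in with small condition number
b = np.ones(100) / 10.0                     # ||b|| = 1
x5 = mr_algorithm(lambda z: A @ z, b, 5)
print(np.linalg.norm(A @ x5 - b))           # the residual after 5 steps
```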

The mr algorithm has many good properties. Let $m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}})$ denote the minimal cardinality of Krylov information needed to compute an ε-approximation by the mr algorithm. Obviously, $m^{\mathrm{Kr}}(\varepsilon)$ denotes the minimal cardinality of Krylov information needed to compute an ε-approximation in the class of all algorithms. For any orthogonally invariant class F, we have (see TW [84])
$$m^{\mathrm{Kr}}(\varepsilon) \le m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}}) \le m^{\mathrm{Kr}}(\varepsilon) + 1.$$
These bounds are sharp. That is, for some F we have $m^{\mathrm{Kr}}(\varepsilon) = m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}})$, and for other F we have $m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}}) = m^{\mathrm{Kr}}(\varepsilon) + 1$. For all practically important cases, $m^{\mathrm{Kr}}(\varepsilon)$ is large and there is no significant difference between $m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}})$ and $m^{\mathrm{Kr}}(\varepsilon)$. Therefore the mr algorithm is always recommended as long as F is orthogonally invariant.

The mr algorithm minimizes, up to an additive term of 1, the number of matrix-vector multiplications needed to compute an ε-approximation among all algorithms that use Krylov information in an orthogonally invariant class F. In this sense, the mr algorithm is Krylov-optimal, or for brevity, optimal.

We comment on the mr algorithm.

(1) The mr algorithm computes $x_k$ without using the additional properties of A, A ∈ F, given in the definition of the class F. This is desirable since the computation of $x_k$ is the same for all F. The vector $x_k$ can be computed by the well-known three-term recurrence formula using at most 10kn arithmetic operations.

(2) Although the mr algorithm competes with all algorithms, in particular with algorithms that may use the additional properties of A given in the definition of F, the mr algorithm can lose at most one insignificant step. Equivalently, one may say that for any orthogonally invariant class F, the a priori information about the class F and the fact that A ∈ F is worth at most one step.

(3) On the other hand, if F is not orthogonally invariant then the mr algorithm may lose its good properties. Example 3.5 of TW [84] provides such a class for which the worst happens; the mr algorithm takes n steps to solve the problem, whereas the optimal algorithm, which is nonpolynomial, takes only one step.

For an orthogonally invariant class F and for the class Λ of matrix-vector multiplications, these results yield that the pair Krylov information and mr algorithm is (almost) ε-complexity optimal in the sense of §2. Furthermore, we have rather tight bounds on the worst case complexity. More precisely,
$$\mathrm{comp}(\varepsilon) = c\,a\,m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}}), \tag{5.2}$$
where c is the cost of one matrix-vector multiplication and
$$a \in \left[\,0.5 - 1/m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}})\,,\ 1 + 10n/c\,\right].$$
For small ε and c ≫ n, we have roughly a ∈ [1/2, 1]. Because of (5.2), the problem of obtaining the complexity reduces to the problem of finding $m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}})$. This number is known for some classes F; see TW [84] and TWW [88, Chapter 5, §9]. We discuss two classes:
$$F_1 = \{A : A = A^T > 0\ \text{and}\ \|A\|_2\,\|A^{-1}\|_2 \le M\},$$
$$F_2 = \{A : A = A^T\ \text{and}\ \|A\|_2\,\|A^{-1}\|_2 \le M\}.$$
That is, $F_1$ is the class of symmetric positive definite matrices with condition numbers bounded uniformly by M. Here M is a given number, M ≥ 1. The class $F_2$ differs from $F_1$ by the lack of positive definiteness. For these two classes, the result of Nemirovsky [91] can be applied and for $m(\varepsilon) \le \tfrac12(n-3)$ we have better bounds on a; namely $a \in \left[\,1 - 1/m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}})\,,\ 1 + 10n/c\,\right]$. Thus, for small ε and c ≫ n, a ≃ 1.

For the class $F_1$, we have
$$m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}}) = \min\!\left(n,\ \left\lceil \frac{\ln\!\big((1 + (1 - \varepsilon^2)^{1/2})/\varepsilon\big)}{\ln\!\big((M^{1/2} + 1)/(M^{1/2} - 1)\big)} \right\rceil\right).$$
For small ε, large M, and $n > M^{1/2}\ln(2/\varepsilon)/2$, we have
$$m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}}) \simeq \frac{\sqrt{M}}{2}\,\ln\frac{2}{\varepsilon}.$$
For the class $F_2$, we have
$$m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}}) = \min\!\left(n,\ 2\left\lceil \frac{\ln\!\big((1 + (1 - \varepsilon^2)^{1/2})/\varepsilon\big)}{\ln\!\big((M + 1)/(M - 1)\big)} \right\rceil\right).$$
For small ε, large M, and $n > M\ln(2/\varepsilon)$, we have
$$m^{\mathrm{Kr}}(\varepsilon, \varphi^{\mathrm{mr}}) \simeq M\,\ln\frac{2}{\varepsilon}.$$
These formulas enable us to compare the complexities for classes $F_1$ and $F_2$. For small ε, large M, and $n > 2M\ln(2/\varepsilon) + 3$, we have
$$\frac{\mathrm{comp}(\varepsilon, F_1)}{\mathrm{comp}(\varepsilon, F_2)} \simeq \frac{1}{2\sqrt{M}}.$$
This shows how positive definiteness decreases the ε-complexity.
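The two closed-form counts above are easy to evaluate; the following sketch does so for an illustrative choice of ε, M, and n (these numbers are ours, not the paper's) and exhibits the roughly 2√M gap between the indefinite and the positive definite class.

```python
import numpy as np

def m_kr(eps, M, n, positive_definite):
    """Iteration counts m^Kr(eps, phi_mr) for F1 (SPD, cond <= M) and
    F2 (symmetric, cond <= M), as reconstructed from the formulas above."""
    num = np.log((1 + np.sqrt(1 - eps**2)) / eps)
    if positive_definite:      # class F1
        return min(n, int(np.ceil(num / np.log((np.sqrt(M) + 1) / (np.sqrt(M) - 1)))))
    return min(n, 2 * int(np.ceil(num / np.log((M + 1) / (M - 1)))))   # class F2

eps, M, n = 1e-6, 1e4, 10**7
k1 = m_kr(eps, M, n, True)
k2 = m_kr(eps, M, n, False)
print(k1, k2, k2 / k1)         # the ratio is roughly 2*sqrt(M) = 200
```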

P [92] has four "high level" criticisms of IBC research on the large linear systems problem. We also select three additional criticisms from P [92, 4]. We shall respond to these seven criticisms. P [92] contains other misunderstandings and errors regarding this topic but we will not try the reader's patience by responding to each of these. We list the seven criticisms of P [92]:

(i) IBC "is not complexity theory" since "the stubborn fact remains that restricting information to Krylov information is not part of the linear equations problem" P [92, 2.A].

(ii) "The trouble with this apparent novelty is that it is not possible to evaluate the residual norm ‖Az − b‖ for those external z because there is no known matrix A (only Krylov information). So how can an algorithm that produces z verify whether or not it has achieved its goal of making ‖Az − b‖ < ε‖b‖" P [92, 2.C].

(iii) "The ingredient of IBCT that allows it to generate irrelevant results is the problem class F [see paragraph 2 in (A)]. F did not appear in our brief description of the theory in the second paragraph of §1 because it is not a logically essential ingredient but rather a parameter within IBCT;" P [92, 2.B].

(iv) "IBCT's suggestion that it goes beyond the well-known polynomial class of algorithms is more apparent than real;" P [92, 2.C].

(v) "Here is a result of ours that shows why the nonpolynomial algorithms are of no interest in worst case complexity;" P [92, 4.3].

(vi) "With a realistic class such as SPD (sym. pos. def.) MR is optimal (strongly) as it was designed to be, and as is well known;" P [92, 4.4].

(vii) "The theory claims to compare algorithms restricted solely to information $N_j$. So how could the Cheb algorithm obtain the crucial parameter ρ?;" P [92, 4.4].

We respond to each of these seven criticisms.

(i) IBC does not restrict information to Krylov information. The optimality of Krylov information in the class of matrix-vector multiplications is a conclusion, not an assumption.

IBC does assume a class Λ of information operations. The reasons why this is both necessary and beneficial were discussed in §2. Here we confine ourselves to certain classes relevant to large linear systems.

Let Λ1 denote the class of matrix-vector multiplications. Then as described above, for an orthogonally invariant class F we may conclude that Krylov information is optimal to within a multiplicative factor of at most 2. Furthermore, we may conclude that Krylov information and the mr algorithm are almost ε-complexity optimal. Rather tight bounds have been obtained on the complexity of important classes such as $F_1$ and $F_2$, see above. Additional classes of matrices are studied in TW [84].

Let Λ2 denote the class of information operations where inner products of rows (or columns) of A and an arbitrary vector z can be computed. Rabin [72] studied the class Λ2 for the exact solution of linear systems, ε = 0, and for an arbitrary nonsingular matrix A. He proved that, roughly, $\tfrac12 n^2$ inner products are sufficient to solve the problem. No results are known for ε > 0.

Let Λ3 denote the class of information operations consisting of arbitrary linear functionals. Optimality questions for the class Λ3 are posed in TW [84]. No results are known and we believe this to be a difficult problem.

Let Λ4 denote the class of information operations consisting of continuous nonlinear functionals, and let Λ5 denote the class of nonlinear functionals. In general, complexity results in Λ4 and Λ5 can be different; see Kacewicz and Wasilkowski [86] and Mathé [90]. For linear systems, these classes are too powerful since all entries of the matrix A can be recovered by knowing the value of one continuous nonlinear functional. Thus, the ε-cardinality number is 1 even for ε = 0; see TW [80, Chapter 7, §3] for related material.

(ii) If the class Λ consists of matrix-vector multiplications then, of course, we can evaluate the residual ‖Az − b‖ for any z. If z is outside of a Krylov subspace this requires one additional matrix-vector multiplication.

On the other hand, it is sometimes possible to guarantee that ‖Az − b‖ ≤ ε without computing the residual ‖Az − b‖. This can be done by using a priori information that A ∈ F and the computed Krylov information. An example of such a situation is provided by the Chebyshev algorithm for the class $F = \{A = I - B : B = B^T,\ \|B\| \le \rho < 1\}$.

In general, if the assumptions are satisfied, IBC is predictive. The results of the theory guarantee an ε-approximation. One simply does the amount of work specified by the upper bound on the complexity. For important classes of matrices we have seen above that there are rather tight bounds on the complexity. Therefore this strategy does not require much more work than necessary.
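To illustrate how a priori class information can certify an ε-approximation without any residual evaluation, the sketch below uses the classical Chebyshev min-max bound: for A = I − B with B symmetric and ‖B‖ ≤ ρ < 1 (so the spectrum of A lies in [1 − ρ, 1 + ρ]), starting from x_0 = 0 and ‖b‖ = 1, the optimal degree-k residual polynomial gives ‖Ax_k − b‖ ≤ 1/T_k(1/ρ). The specific numbers are illustrative.

```python
import numpy as np

def guaranteed_steps(rho, eps):
    """Smallest k with 1/T_k(1/rho) <= eps, using T_k(x) = cosh(k*arccosh(x)).
    Running k Chebyshev (or mr) steps then guarantees ||A x_k - b|| <= eps for
    every A = I - B with B = B^T and ||B|| <= rho, with no residual computed."""
    return int(np.ceil(np.arccosh(1.0 / eps) / np.arccosh(1.0 / rho)))

print(guaranteed_steps(rho=0.9, eps=1e-8))   # the a priori work bound
```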

For most problems there is no residual that can be checked. There are residuals for problems related to solving linear or nonlinear equations. In the multivariate integration example of §2, there is no residual that can be computed. Yet, IBC guarantees an ε-approximation by using a priori information about the class F.

(iii) We responded in general to the criticism that F is not needed in §4; here we focus on large linear systems. On this problem P [92, 2.B] states that "IBCT seems to use F as a tuning parameter designed to keep k < n."

The domain F is not a tuning parameter; it is needed for the problem to be well defined. The domain F contains all a priori knowledge about matrices A. The more we know a priori, the smaller the domain F becomes, and as F becomes smaller, the problem becomes easier. Furthermore, a priori information is often available in practice. For example, matrices which occur in the approximation of elliptic partial differential operators are symmetric positive definite, often with known bounds on condition numbers. Fortunately, many important classes which occur in practice are orthogonally invariant and the ε-complexity optimality of Krylov information and the mr algorithm may be applied.

Of course, numerical analysts use different algorithms for different classes of matrices (symmetric, positive definite, tridiagonal, Toeplitz, etc.). It is therefore all the more surprising that P [92] objects to the concept of the class F.

(iv) P [92, 2C] claims that there is no need to go "beyond the well-known polynomial class of algorithms." It should be obvious that all algorithms must be allowed to compete if we want to establish lower bounds on complexity. For orthogonally invariant classes it turns out that the restriction to the polynomial class of algorithms does not cause any harm since the classical mr algorithm may lose at most one insignificant step.

But this had to be proven! In fact, it is not uncommon in computational complexity that the known algorithms (that use the specific information) turn out to be optimal or close to optimal. Examples include the Horner algorithm for evaluating a polynomial, the finite element method with appropriate parameters for elliptic partial differential equations, or the bisection algorithm for approximating a zero of a continuous function that changes sign at the interval endpoints.

For large linear systems, a sufficient condition for almost ε-complexity optimality of Krylov information and the mr algorithm is orthogonal invariance of the class F. As mentioned above, Example 3.5 of TW [84] shows that if F is not orthogonally invariant, the mr algorithm may lose its optimality. In this example the restriction to the polynomial class of algorithms is harmful because the optimal algorithm is nonpolynomial.

(v) P [92, 4.3] supports his claim that nonpolynomial algorithms are not interesting by the Theorem of §4.3. This theorem holds for the class SPD of all n × n symmetric positive definite matrices. In this theorem it is shown that for every nonpolynomial algorithm which computes an approximation outside the Krylov subspace for A ∈ SPD, there exists a matrix from SPD which has the identical Krylov information as A and for which the residual is arbitrarily large.

We do not understand why the Theorem of §4.3 and the one page sketch of its proof were supplied. The same statement can be found in Example 3.4 of TW [84]. In addition, Example 3.4 shows that polynomial algorithms are also not good for the class SPD; that is, n matrix-vector multiplications are needed to compute an ε-approximation. The reason neither polynomial nor nonpolynomial algorithms are good is that the class SPD is too large.

We stress that Example 3.4 and the Theorem of §4.3 hold for F = SPD. As mentioned above, for any orthogonally invariant class F the nonpolynomial algorithms are not of interest since it has been proven that the mr algorithm is optimal, possibly modulo one matrix-vector multiplication. Also, as mentioned above, if F is not orthogonally invariant, a nonpolynomial algorithm may be optimal.

(vi) P [92] claims that the mr algorithm is optimal "as it was designed to be" for the class SPD. This is simply not true. The mr algorithm is defined to be optimal in the class of polynomial algorithms. Optimality of the mr algorithm in the class of all algorithms for the class SPD requires a proof.

(vii) As already explained, the information that $A \in F = \{A = I - B : B = B^T,\ \|B\| \le \rho < 1\}$ is not used by the mr algorithm. This means that the mr algorithm does not use the parameter ρ which is assumed known a priori and may be used by competing algorithms. The parameter ρ is used by the Chebyshev algorithm and that is why the mr algorithm loses one step for the class F. P [92, 4.4] turns the positive optimality result for the mr algorithm into the irrelevant question "how could the Chebyshev algorithm obtain the crucial parameter ρ?" By the way, the parameter ρ is not so crucial if it decreases the number of steps by only one!

6. Large eigenvalue problem

P [92] has three "high level" criticisms of the IBC research on the large eigenpair problem.

He also criticizes the numerical testing. We shall respond to these four criticisms. We list the four criticisms of P [92]:

(i) Kuczyński [86] computes an unspecified eigenvalue; P [92, 2.D].

(ii) IBC "is not complexity theory." The reason given is that "the stubborn fact remains that restricting information to Krylov information is not part . . . of the eigenvalue problem;" P [92, 2.A].

(iii) "The fact that b is treated as prescribed data is quite difficult to spot;" P [92, 2.E].

(iv) "The author has worked exclusively with tridiagonal matrices and has forgotten that the goal of the Lanczos recurrence is to produce a tridiagonal matrix! Given such a matrix one has no need of either Lanczos or GMR;" P [92, 5.5].

We respond to each of these four criticisms.

(i) P [92] is certainly correct in asserting that when only one or a few eigenvalues of a symmetric matrix are sought, then one typically desires a preassigned eigenvalue or a few preassigned eigenvalues. To be specific, assume that the largest eigenvalue is to be approximated.

It would be desirable to always guarantee that the largest eigenvalue λ1(A) of a large symmetric matrix A can be computed to within error ε. Unfortunately, this cannot be done with less than n matrix-vector multiplications, that is, without recovering the matrix A; see TWW [88, Chapter 5, §10]. More precisely, let F denote the class of all n × n symmetric matrices and let Λ consist of matrix-vector multiplications. That is, $N(A) = [Az_1, \dots, Az_k]$, where $z_1$ is an arbitrary vector and $z_i$ for i ≥ 2 may depend arbitrarily on $Az_1, \dots, Az_{i-1}$. Then for k ≤ n − 1, there exists no such N and no algorithm φ which uses N such that U(A) = φ(N(A)) satisfies
$$|\lambda_1(A) - U(A)| \le \varepsilon\,\|A\|, \qquad \forall A \in F.$$
We are surprised that although TWW [88] is cited in P [92], he does not seem to be aware of this result. Thus, the goal of computing an ε-approximation to the largest eigenvalue of a large symmetric matrix cannot be achieved if less than n matrix-vector multiplications are used.

This is, of course, a worst case result. There are a number of options for coping with this negative result. One could stay with the worst case setting but settle for an unspecified eigenvalue. Or one could give up on the worst case guarantee and settle for a weaker one. We consider these options in turn.

(i.1) One option is to settle for an unspecified eigenvalue. More precisely, the problem studied by Kuczyński [86] and Chou [87] is defined as follows. For A ∈ F, compute (x, λ) with $x \in \mathbb{R}^n$, ‖x‖ = 1, and $\lambda \in \mathbb{R}$, such that
$$\|Ax - \lambda x\| \le \varepsilon\,\|A\|.$$
Chou proved, modulo a multiplicative factor of 2, optimality of Krylov information $N(A) = [Ab, \dots, A^k b]$, where b is a nonzero vector. Optimality of Krylov information holds independently of the choice of the vector b. Kuczyński proved, modulo an additive term of 2, optimality of the generalized minimal residual (gmr) algorithm that uses Krylov information. (Optimality of Krylov information and the gmr algorithm is understood as in §5. These optimality results hold for any orthogonally invariant class of matrices.)

Since the gmr algorithm has small combinatory cost, we conclude that the pair Krylov information and gmr algorithm is (almost) ε-complexity optimal. Kuczyński found good bounds on the worst case error of the gmr algorithm. Hence, for $n > \varepsilon^{-1}$, the worst case ε-complexity is given by
$$\mathrm{comp}(\varepsilon) = \frac{a\,c}{\varepsilon},$$
where a roughly belongs to [1/4, 1] and, as before, c is the cost of one matrix-vector multiplication.

(i.2) A second option is to attempt to approximate the largest eigenvalue but to settle for a weaker guarantee. KW [89]^4 study this problem in the randomized setting. (See, e.g., TWW [88, Chapter 11] for a general discussion of the randomized setting.) In particular, the Lanczos algorithm is studied. The Lanczos algorithm uses Krylov information $N(A) = [Ab, A^2 b, \dots, A^k b]$ with a random vector b which is uniformly distributed over the unit sphere of $\mathbb{R}^n$. The error is defined for a fixed matrix A while taking the average with respect to the vectors b.

To date only an upper bound on the error of the Lanczos algorithm with randomized Krylov information has been obtained. This upper bound is proportional to $((\ln n)/k)^2$. As always, to obtain complexity results both the information and the algorithm must be varied. Lower bounds are of particular interest. The complexity of approximating the largest eigenvalue in the randomized setting is open.
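The sketch below mirrors the randomized procedure just described: draw b uniformly on the unit sphere, form Krylov information, and return the largest Rayleigh-Ritz value. It is a transparent illustration, not a production Lanczos code (which would use the three-term recurrence with reorthogonalization); the test matrix is an arbitrary choice of ours.

```python
import numpy as np

def randomized_krylov_largest(matvec, n, k, rng):
    """Estimate the largest eigenvalue of a symmetric matrix from Krylov
    information [b, Ab, ..., A^k b] with a random start b (Rayleigh-Ritz)."""
    b = rng.standard_normal(n)
    b /= np.linalg.norm(b)                        # uniform on the unit sphere
    vs = [b]
    for _ in range(k):
        vs.append(matvec(vs[-1]))
    Q, _ = np.linalg.qr(np.column_stack(vs))      # orthonormal Krylov basis
    T = Q.T @ np.column_stack([matvec(Q[:, j]) for j in range(Q.shape[1])])
    return np.max(np.linalg.eigvalsh((T + T.T) / 2))

rng = np.random.default_rng(1)
A = np.diag(np.linspace(0.0, 1.0, 500))           # lambda_1(A) = 1
print(randomized_krylov_largest(lambda z: A @ z, 500, 30, rng))
```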

(ii) P [92, 2.A] states ". . . the stubborn fact remains that restricting information to Krylov information is not part . . . of the eigenvalue problem." Although we have mentioned several times in this paper that P [92] seems unaware of the results regarding optimality of Krylov information, we are particularly surprised that he appears unaware of this result in the context of the large eigenvalue problem. P [92] repeatedly cites Kuczyński [86] where Chou's result is reported.

(iii) P [92, 2.E] states "the fact that b is treated as prescribed data is quite difficult to spot." Perhaps the reason it is difficult to spot is that it is not prescribed.

^4 This paper is mistakenly referred to in P [92] as [Tr & Wo, 1990].

What is assumed known? It is known a priori that A is a symmetric n × n matrix. Furthermore, we are permitted to compute $Az_1, \dots, Az_k$, where the $z_i$ may be adaptively chosen. We are permitted to choose $z_1$, which is called b, arbitrarily. In choosing b we cannot assume that A is known, since the raison d'être of methods for solving large eigenvalue problems is just that A need not be known.

By the result quoted in (i), it is impossible to guarantee that we can find a vector b such that an ε-approximation to the largest eigenvalue can be computed for all symmetric n × n matrices with k < n. If Krylov information $Ab, A^2 b, \dots, A^k b$ is used then the situation is even worse. Even for arbitrary k, i.e., even for k ≥ n, an ε-approximation cannot be computed. Indeed, suppose we choose a vector b and a matrix A such that Ab = b. Then Krylov information is reduced just to the vector b. The largest eigenvalue cannot be recovered (unless n = 1). Thus, for any vector b there are symmetric matrices A for which Krylov information will not work. Of course, one can choose b randomly, as was discussed above.

The averagebehavior with respect to vectors b is satisfactory for all symmetric matrices. Butthen one is settling for a weaker guarantee of solving the problem.P [92, 2.E] claims that for Krylov information “satisfactory starting vectors areeasy to obtain.”This remark seems to confuse the worst case and randomizedsettings.To get a satisfactory starting vector b in the worst case setting, thevector b must be chosen using some additional information about the matrix A. Ifsuch information is not available, it is impossible to guarantee satisfactory startingvectors.

On the other hand, in the randomized setting it is indeed easy to get satisfactory starting vectors.

(iv) P [92, 5.5] complains that Kuczyński [86] tests only tridiagonal matrices. There is no loss of generality in restricting the convergence tests of the Lanczos or gmr algorithms to tridiagonal matrices. That was done in Kuczyński [86] to speed up his tests. What is claimed in Kuczyński [86] for the pairs (TRI, b), TRI a tridiagonal matrix and b = e_1 = [1, 0, . . . , 0]^T, is also true for the pairs (Q^T TRI Q, Q^T b) for any orthogonal matrix Q. Obviously, the matrix Q^T TRI Q is not, in general, tridiagonal. (A small numerical check of this invariance is sketched below.)

The confusion between the worst case and randomized settings is also apparent when P [92] discusses numerical tests performed by Kuczyński [86] and by him. For the unspecified eigenvalue problem, Kuczyński [86] compares the gmr and Lanczos algorithms in the worst case setting.
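The invariance just stated is easy to verify numerically. The sketch below is our own illustration (the helper krylov_ritz_values, the random test matrix, and the choice k = 6 are assumptions, not code from Kuczyński [86]); in exact arithmetic the Ritz values extracted from Krylov information are identical for the pair (TRI, e_1) and the rotated pair (Q^T TRI Q, Q^T e_1), and the computed difference is at rounding-error level.

    import numpy as np

    def krylov_ritz_values(A, b, k):
        # Ritz values of A on the Krylov subspace span{b, Ab, ..., A^(k-1) b},
        # computed from an orthonormalized Krylov basis (numerically naive; a sketch only).
        vecs = [b]
        for _ in range(k - 1):
            vecs.append(A @ vecs[-1])
        V, _ = np.linalg.qr(np.column_stack(vecs))
        return np.sort(np.linalg.eigvalsh(V.T @ A @ V))

    n, k = 50, 6
    rng = np.random.default_rng(1)
    TRI = np.diag(rng.standard_normal(n)) + np.diag(rng.standard_normal(n - 1), 1)
    TRI = TRI + np.triu(TRI, 1).T              # a symmetric tridiagonal test matrix
    e1 = np.zeros(n); e1[0] = 1.0              # starting vector b = e_1
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix Q
    diff = krylov_ritz_values(TRI, e1, k) - krylov_ritz_values(Q.T @ TRI @ Q, Q.T @ e1, k)
    print(np.max(np.abs(diff)))                # tiny: the two pairs give the same Ritz values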

These two algorithms cost essentially the same per step, and the gmr algorithm never requires more steps than the Lanczos algorithm. For some matrices, the gmr algorithm uses substantially fewer steps than the Lanczos algorithm.

That is why in the worst case setting the gmr algorithm is preferable.

P [92] performed his numerical tests for the Lanczos algorithm with random starting vectors b. Thus, he uses a different setting.

It is meaningless to compare numerical results in different settings.

Finally, extensive numerical testing is also reported in KW [89] for approximating the largest eigenvalue by the Lanczos algorithm with randomized starting vectors. The Lanczos algorithm worked quite well for all matrices tested. The numerical tests reported by P [92] and KW [89] show the efficiency of the Lanczos algorithm in the randomized setting.

7. Refinements of IBC

Our response to the criticism in P [92] does not mean that the current model assumptions of IBC are the only ones possible.

On the contrary, we believe that in some circumstances these assumptions should be refined to improve the modelling of computational problems. We have mentioned the desirability of such refinements in, e.g., TWW [88, Chapter 3, §2.3] and W [85, §9].

In this section we will very briefly indicate some of the possible refinements and extensions of IBC, and indicate partial progress. This is preparatory to responding to several comments in P [92]. Refinements and extensions of IBC include the following:

(1) We usually assume the real number model in a sequential model of computation where the cost of a combinatory operation is independent of the precision of the operands or of the result.

Also of interest is a model where the cost of a combinatory operation depends on the precision (bit model) and/or on the particular operation. Parallel and distributed models of computation should also be studied.

For examples of work in these directions see Bojańczyk [84] who studies the approximate solution of linear systems using a variable precision parallel model of computation, and Kacewicz [90] who studies initial value problems for both sequential and parallel models of computation.

(2) We usually assume that for every information operation L ∈ Λ and for every f ∈ F the computation of L(f) costs c, c > 0.

Also of interest is a model where the cost of an information operation depends on L, f, and precision. For an example, see Kacewicz and Plaskota [90] who study linear problems in a model where the cost of information operations varies with the computed precision.

(3) Let S be a linear operator. Then we often assume that the set F is balanced and convex; see TWW [88, Chapter 4, §5].

In particular, for function spaces, we often assume that F is a Sobolev space of smoothness r with a uniform bound on ∥f^(r)∥. It is of interest to study F which do not have such a nice structure.

P [92, 1] states “a handful of reservations about IBC have appeared in print.” These “reservations” turn out to concern refinements of IBC.

P [92] writes that Babuška [87] calls for realistic models. For example, Babuška points out that for some problems arising in practice the set F does not consist of smooth functions but rather of functions which are piecewise smooth with singularities at unknown points.

We agree that this is an important problem. A promising start has been made by Wasilkowski and Gao [89] on estimating a singularity of a piecewise smooth function in a probabilistic setting.

Babuška observes that the user may not know the class F or not know F exactly, and suggests the importance of algorithms which enjoy optimality properties for a number of classes.

We agree that this is an important concern and a good direction for future research. See W [85, §9.3] where this problem is called the “fat” F problem and where partial results are discussed. One attack on this problem is to address the domain membership problem defined in §4.

As indicated there, this can only be done with a stochastic assurance.

P [92, 1] asserts that in a review of TWW [83], Shub [87] “gives a couple of instances of unnatural measures of cost.” (These words are from P [92], not from Shub [87].) Shub, in a generally favorable review (the reader may want to verify this), suggests circumstances when the cost of an information operation should vary. We concur.

8. Summary

P [92, 2] states five high level criticisms of IBC. We responded to them in the following sections:

Criticism    Response
A            1, 3, 5, 6
B            4, 5, 6
C            5
D            6
E            6

There are additional criticisms, and in §§5 and 6 we responded to the ones which seem most important.

P [92, 1] states that “a handful of reservations about IBC have appeared in print.” He neglects mentioning the many favorable reviews.

He cites two examples of reservations. We discussed the comments of Babuška [87] and Shub [87] in §7.

P [92] is based upon the following syllogism:

(1) Major Premise: If two specific papers of IBC are misleading, then IBC is flawed.
(2) Minor Premise: Two specific papers of IBC regarding matrix computations are misleading.
(3) Conclusion: IBC is flawed.

We have shown that his reasons for believing the minor premise are mistaken.

References

I. Babuška, Information-based numerical practice, J. Complexity (1987), 331–346.
N. S. Bakhvalov, On approximate calculation of integrals, Vestnik Moskov. Gos. Univ. Ser. Mat. Mekh. Astronom. Fiz. Khim. 4 (1959), 3–18. (Russian)
N. S. Bakhvalov, Numerical Methods for the Solution of Differential and Integral Equations and Quadrature Formulas, “Nauka,” Moscow, 1964, pp. 5–63. (Russian)
N. S. Bakhvalov, On the optimality of linear methods for operator approximation in convex classes of functions, Zh. Vychisl. Mat. Mat. Fiz. 11 (1971), 1014–1018 (Russian); English transl., U.S.S.R. Comput. Math. and Math. Phys. 11 (1971), 244–249.
L. Blum, M. Shub, and S. Smale, On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines, Bull. Amer. Math. Soc. (N.S.) 21 (1989), 1–46.
A. Bojańczyk, Complexity of solving linear systems in different models of computation, SIAM J. Comput. 21 (1984), 591–603.
A. W. Chou, On the optimality of Krylov information, J. Complexity 3 (1987), 26–40.
F. Gao and G. W. Wasilkowski, On detecting regularity of functions, work in progress, 1990.
M. R. Garey and D. S. Johnson, Computers and intractability: A guide to the theory of NP-completeness, Freeman, New York, 1979.
M. Golomb and H. F. Weinberger, On Numerical Approximation (R. E. Langer, ed.), Univ. of Wisconsin Press, Madison, WI, 1959, pp. 117–190.
B. Z. Kacewicz, On sequential and parallel solution of initial value problems, J. Complexity 6 (1990), 136–148.
B. Z. Kacewicz and L. Plaskota, On the minimal cost of approximating linear problems based on information with deterministic noise, Numer. Funct. Anal. Optim. (1990) (to appear).
B. Z. Kacewicz and G. W. Wasilkowski, How powerful is continuous nonlinear information for linear problems?, J. Complexity 2 (1986), 306–316.
J. Kiefer, Sequential minimax search for a maximum, Proc. Amer. Math. Soc. 4 (1953), 502–505.
J. Kuczyński, On the optimal solution of large eigenpair problems, J. Complexity 2 (1986), 131–162.
J. Kuczyński and H. Woźniakowski, Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start, Report, Dept. of Computer Science, Columbia University, 1989 (to appear in SIMAX).
P. Mathé, s-numbers in information-based complexity, J. Complexity 6 (1990), 41–66.
C. McMullen, Families of rational maps and iterative root-finding algorithms, Ph.D. thesis, Harvard University, Cambridge, MA, 1985.
A. S. Nemirovsky, On optimality of Krylov’s information when solving linear operator equations, J. Complexity 7 (1991), 121–130.
A. S. Nemirovsky and D. B. Yudin, Problem complexity and method efficiency in optimization, Wiley-Interscience, New York, 1983.
S. M. Nikolskiĭ, On the problem of approximation estimate by quadrature formulas, Uspekhi Mat. Nauk 5 (1950), 165–177. (Russian)
E. Novak, Deterministic and stochastic error bounds in numerical analysis, Lecture Notes in Math., vol. 1349, Springer-Verlag, Berlin, 1988.
E. W. Packel and J. F. Traub, Information-based complexity, Nature 328 (1987), 29–33.
E. W. Packel and H. Woźniakowski, Recent developments in information-based complexity, Bull. Amer. Math. Soc. (N.S.) 17 (1987), 9–36.
B. N. Parlett, The symmetric eigenvalue problem, Prentice-Hall, Englewood Cliffs, NJ, 1980.
B. N. Parlett, Some basic information on information-based complexity theory, Bull. Amer. Math. Soc. (N.S.) 26 (1992), 3–27.
M. O. Rabin, Complexity of Computer Computations (R. E. Miller and J. W. Thatcher, eds.), Plenum Press, New York, 1972, pp. 11–20.
K. F. Roth, On irregularities of distribution, Mathematika 1 (1954), 73–79.
K. F. Roth, On irregularities of distribution. IV, Acta Arith. 37 (1980), 67–75.
A. Sard, Best approximate integration formulas; best approximation formulas, Amer. J. Math. 71 (1949), 80–91.
M. Shub, Review of “Information, uncertainty, complexity” by J. F. Traub, G. W. Wasilkowski, and H. Woźniakowski (Addison-Wesley, Reading, MA, 1983), SIAM Rev. 29 (1987), 495–496.
M. Shub and S. Smale, On the existence of generally convergent algorithms, J. Complexity 2 (1986), 2–11.
I. J. Schoenberg, Spline interpolation and best quadrature formulas, Bull. Amer. Math. Soc. 70 (1964), 143–148.
E. Stiefel, Kernel polynomials in linear algebra and their numerical applications, NBS Appl. Math. 43 (1958), 1–22.
J. F. Traub, G. W. Wasilkowski, and H. Woźniakowski, Information, uncertainty, complexity, Addison-Wesley, Reading, MA, 1983.
J. F. Traub, G. W. Wasilkowski, and H. Woźniakowski, Information-based complexity, Academic Press, New York, 1988.
J. F. Traub and H. Woźniakowski, A general theory of optimal algorithms, Academic Press, New York, 1980.
J. F. Traub and H. Woźniakowski, On the optimal solution of large linear systems, J. Assoc. Comput. Mach. 31 (1984), 545–559.
J. F. Traub and H. Woźniakowski, Information-based complexity: New questions for mathematicians, Math. Intelligencer 13 (1991), 34–43.
G. W. Wasilkowski and F. Gao, On the power of adaptive information for functions with singularities, Math. Comp. 58 (1992), 285–304.
A. G. Werschulz, The computational complexity of differential and integral equations, Oxford Univ. Press, Oxford, 1991.
H. Woźniakowski, A survey of information-based complexity, J. Complexity 1 (1985), 11–44.
H. Woźniakowski, Average complexity for linear operators over bounded domains, J. Complexity 3 (1987), 57–80.
H. Woźniakowski, Average case complexity of multivariate integration, Bull. Amer. Math. Soc. (N.S.) 24 (1991), 185–194.

(J. F. Traub) Department of Computer Science, Columbia University, New York, New York 10027

(H. Woźniakowski) Department of Computer Science, Columbia University and Institute of Informatics, University of Warsaw, Warsaw, Poland

