The wavelet transform, unlike the Fourier transform in frequency space, represents a signal in time (physical) space. Its two basic operations are translation (shifting) and dilation (compression).

Strang introduces the Haar basis as the basic example of a wavelet transform. The Haar basis is an orthogonal basis whose elements are piecewise constant functions, so functions are expressed as linear combinations of piecewise constants. Multiplying by the wavelet matrix directly takes O(n log n) time.

He further argues that the computation can be made faster than the FFT: through a factorization of the wavelet matrix, the wavelet transform can be computed in O(n), quicker than performing an FFT.

The paper presents the concept of the wavelet transform and its applications, emphasizing in particular its advantages in signal processing.

The important point is that the wavelet transform, unlike the Fourier transform, can effectively handle signals that are localized in time. Because the wavelet transform expresses a signal in a local basis, it is argued to be especially useful in signal processing.
arXiv:math/9304214v1 [math.NA] 1 Apr 1993
Appeared in Bulletin of the American Mathematical Society, Volume 28, Number 2, April 1993, Pages 288-305

WAVELET TRANSFORMS VERSUS FOURIER TRANSFORMS

Gilbert Strang

Abstract. This note is a very basic introduction to wavelets. It starts with an orthogonal basis of piecewise constant functions, constructed by dilation and translation.
The "wavelet transform" maps each f(x) to its coefficients with respect to this basis. The mathematics is simple and the transform is fast (faster than the Fast Fourier Transform, which we briefly explain), but approximation by piecewise constants is poor. To improve this first wavelet, we are led to dilation equations and their unusual solutions. Higher-order wavelets are constructed, and it is surprisingly quick to compute with them — always indirectly and recursively.

We comment informally on the contest between these transforms in signal processing, especially for video and image compression (including high-definition television). So far the Fourier Transform — or its 8 by 8 windowed version, the Discrete Cosine Transform — is often chosen. But wavelets are already competitive, and they are ahead for fingerprints.
We present a sample of this developing theory.

1. The Haar wavelet

To explain wavelets we start with an example.
It has every property we hope for, except one. If that one defect is accepted, the construction is simple and the computations are fast. By trying to remove the defect, we are led to dilation equations and recursively defined functions and a small world of fascinating new problems — many still unsolved. A sensible person would stop after the first wavelet, but fortunately mathematics goes on. The basic example is easier to draw than to describe:

Figure 1. Scaling function φ(x), wavelet W(x), and the next level of detail.

1991 Mathematics Subject Classification. Primary 42A06, 41A05, 65D05.
Key words and phrases. Wavelets, Fourier transform, dilation, orthogonal basis.
I am grateful to the National Science Foundation (DMS 90-06220) for their support.
Received by the editors March 20, 1992 and, in revised form, November 30, 1992.
© 1993 American Mathematical Society. 0273-0979/93 $1.00 + $.25 per page
Already you see the two essential operations: translation and dilation. The step from W(2x) to W(2x − 1) is translation. The step from W(x) to W(2x) is dilation. Starting from a single function, the graphs are shifted and compressed. The next level contains W(4x), W(4x − 1), W(4x − 2), W(4x − 3).
Each is supported on an interval of length 1/4. In the end we have Haar's infinite family of functions:

    W_jk(x) = W(2^j x − k)    (together with φ(x)).

When the range of indices is j ≥ 0 and 0 ≤ k < 2^j, these functions form a remarkable basis for L2[0, 1]. We extend it below to a basis for L2(R). The four functions in Figure 1 are piecewise constant. Every function that is constant on each quarter-interval is a combination of these four.
Moreover, the inner product ∫ φ(x) W(x) dx is zero — and so are the other inner products. This property extends to all j and k: the translations and dilations of W are mutually orthogonal. We accept this as the definition of a wavelet, although variations are definitely useful in practice. The goal looks easy enough, but the example is deceptively simple. This orthogonal Haar basis is not a recent invention [1].
It is reminiscent of the Walsh basis in [2] — but the difference is important.† For Walsh and Hadamard, the last two basis functions are changed to W(2x) ± W(2x − 1). All of their "binary sinusoids" are supported on the whole interval 0 ≤ x ≤ 1. This global support is the one drawback to sines and cosines; otherwise, Fourier is virtually unbeatable. To represent a local function, vanishing outside a short interval of space or time, a global basis requires extreme cancellation. Reasonable accuracy needs many terms of the Fourier series. Wavelets give a local basis.

You see the consequences. If the signal f(x) disappears after x = 1/4, only a quarter of the later basis functions are involved. The wavelet expansion directly reflects the properties of f in physical space, while the Fourier expansion is perfect in frequency space. Earlier attempts at a "windowed Fourier transform" were ad hoc — wavelets are a systematic construction of a local basis. The great value of orthogonality is to make expansion coefficients easy to compute.
Suppose the values of f(x), constant on four quarter-intervals, are 9, 1, 2, 0. Its Haar wavelet expansion expresses this vector y as a combination of the basis functions:

    [9, 1, 2, 0]^T = 3·[1, 1, 1, 1]^T + 2·[1, 1, −1, −1]^T + 4·[1, −1, 0, 0]^T + 1·[0, 0, 1, −1]^T.

The wavelet coefficients b_jk are 3, 2, 4, 1; they form the wavelet transform of f. The connection between the vectors y and b is the matrix W4, in whose orthogonal columns you recognize the graphs of Figure 1:

    y = W4 b   is   [9, 1, 2, 0]^T = [[1, 1, 1, 0], [1, 1, −1, 0], [1, −1, 0, 1], [1, −1, 0, −1]] [3, 2, 4, 1]^T.

† Rademacher was first to propose an orthogonal family of ±1 functions; it was not complete. After Walsh constructed a complete set, Rademacher's Part II was regrettably unpublished and seems to be lost (but Schur saw it).
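The averaging-and-differencing that produces these coefficients is easy to run; here is a minimal sketch (the function name and list layout are mine, not the paper's):

```python
def haar_coefficients(y):
    """Unnormalized Haar wavelet transform: repeatedly replace the
    signal by pairwise averages, collecting pairwise differences."""
    details = []
    while len(y) > 1:
        pairs = list(zip(y[0::2], y[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        y = [(a + b) / 2 for a, b in pairs]
    return y + details  # overall average first, finest details last

print(haar_coefficients([9, 1, 2, 0]))  # → [3.0, 2.0, 4.0, 1.0]
```

The output matches the coefficients 3, 2, 4, 1 above: the first difference pass produces the W(2x) and W(2x − 1) coefficients, the second pass the W(x) coefficient.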
This is exactly comparable to the Discrete Fourier Transform, in which f(x) = Σ a_k e^{ikx} stops after four terms. Now the vector y contains the values of f at four points:

    y = F4 a   is   [f(0π/2), f(1π/2), f(2π/2), f(3π/2)]^T = [[1, 1, 1, 1], [1, i, i^2, i^3], [1, i^2, i^4, i^6], [1, i^3, i^6, i^9]] [a0, a1, a2, a3]^T.

This Fourier matrix also has orthogonal columns. The n by n matrix Fn follows the same pattern, with ω = e^{2πi/n} in place of i = e^{2πi/4}. Multiplied by 1/√n to give orthonormal columns, it is the most important of all unitary matrices. The wavelet matrix sometimes offers modest competition.

To invert a real orthogonal matrix we transpose it. To invert a unitary matrix, transpose its complex conjugate. After accounting for the factors that enter when columns are not unit vectors, the inverse matrices are

    W4^{−1} = (1/4) [[1, 1, 1, 1], [1, 1, −1, −1], [2, −2, 0, 0], [0, 0, 2, −2]]

and

    F4^{−1} = (1/4) [[1, 1, 1, 1], [1, −i, (−i)^2, (−i)^3], [1, (−i)^2, (−i)^4, (−i)^6], [1, (−i)^3, (−i)^6, (−i)^9]].

The essential point is that the inverse matrices have the same form as the originals. If we can transform quickly, we can invert quickly — between coefficients and function values. The Fourier coefficients come from values at n points.
The Haar coefficients come from values on n subintervals.

2. Fast Fourier Transform and Fast Wavelet Transform

The Fourier matrix is full — it has no zero entries. Multiplication of Fn times a vector a, done directly, requires n^2 separate multiplications. We are evaluating an n-term Fourier series at n points.
The series is Σ_{k=0}^{n−1} a_k e^{ikx}, and the points are x = 2πj/n. The wavelet matrix is sparse — many of its entries are zero. Taken together, the third and fourth columns of W fill a single column; the fifth, sixth, seventh, and eighth columns would fill one more column. With n = 2^ℓ, we fill only ℓ + 1 columns. The total number of nonzero entries in Wn is n(ℓ + 1). This already shows the effect of a more local basis. Multiplication of Wn times a vector b, done directly, requires only n(log2 n + 1) separate multiplications.

Both of these matrix multiplications can be made faster. For Fn a, this is achieved by the Fast Fourier Transform — the most valuable numerical algorithm in our lifetime. It changes n^2 to (1/2) n log2 n by a reorganization of the steps — which is simply a factorization of the Fourier matrix. A typical calculation with n = 2^10 changes (1024)(1024) multiplications to (5)(1024). This saving by a factor greater than 200 is genuine. The result is that the FFT has revolutionized signal processing. Whole industries are changed from slow to fast by this one idea — which is pure mathematics.

The wavelet matrix Wn also allows a neat factorization into very sparse matrices. The operation count drops from O(n log n) all the way to O(n). For our piecewise constant wavelet the only operations are add and subtract; in fact, W2 is the same as F2. Both fast transforms have ℓ = log2 n steps, in the passage from n down to 1.
For the FFT, each step requires (1/2)n multiplications (as shown below). For the Fast Wavelet Transform, the cost of each successive step is cut in half. It is a beautiful "pyramid scheme" created by Burt and Adelson and Mallat and others. The total cost has a factor 1 + 1/2 + 1/4 + ··· that stays below 2. This is why the final outcome for the FWT is O(n) without the logarithm ℓ.

The matrix factorizations are so simple, especially for n = 4, that it seems worthwhile to display them. The FFT has two copies of the half-size transform F2 in the middle:

(1)    F4 = [[1, 0, 1, 0], [0, 1, 0, i], [1, 0, −1, 0], [0, 1, 0, −i]] · [[1, 1, 0, 0], [1, i^2, 0, 0], [0, 0, 1, 1], [0, 0, 1, i^2]] · [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]].

The permutation on the right puts the even a's (a0 and a2) ahead of the odd a's (a1 and a3). Then come separate half-size transforms on the evens and odds. The matrix at the left combines these two half-size outputs in a way that produces the correct full-size answer. By multiplying those three matrices we recover F4.

The factorization of W4 is a little different:

(2)    W4 = [[1, 1, 0, 0], [1, −1, 0, 0], [0, 0, 1, 1], [0, 0, 1, −1]] · [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]] · [[1, 1, 0, 0], [1, −1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]].

At the next level of detail (for W8), the same 2 by 2 matrix appears four times in the left factor. The permutation matrix puts columns 0, 2, 4, 6 of that factor ahead of 1, 3, 5, 7.
The third factor has W4 in one corner and I4 in the other corner (just as W4 above ends with W2 and I2 — this factorization is the matrix form of the pyramid algorithm). It is the identity matrices I4 and I2 that save multiplications. Altogether W2 appears 4 times at the left of W8, then 2 times at the left of W4, and then once at the right. The multiplication count from these n − 1 small matrices is O(n) — the Holy Grail of complexity theory.

Walsh would have another copy of the 2 by 2 matrix in the last corner, instead of I2. Now the product has orthogonal columns with all entries ±1 — the Walsh basis.
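Both factorizations are quick to check numerically. The sketch below (numpy; the explicit 4 by 4 factors are my reading of equations (1) and (2)) multiplies the factors and compares with F4 and W4:

```python
import numpy as np

# F4 has entries i^(jk); the columns of W4 are the graphs of Figure 1
F4 = np.array([[1j ** (j * k) for k in range(4)] for j in range(4)])
W4 = np.array([[1, 1, 1, 0], [1, 1, -1, 0],
               [1, -1, 0, 1], [1, -1, 0, -1]], dtype=float)

F2 = np.array([[1.0, 1.0], [1.0, -1.0]])    # W2 is the same matrix
I2, Z = np.eye(2), np.zeros((2, 2))
P = np.array([[1, 0, 0, 0], [0, 0, 1, 0],   # even-odd shuffle:
              [0, 1, 0, 0], [0, 0, 0, 1]])  # a0, a2 ahead of a1, a3

# Equation (1): recombine two half-size Fourier transforms
left = np.array([[1, 0, 1, 0], [0, 1, 0, 1j],
                 [1, 0, -1, 0], [0, 1, 0, -1j]])
assert np.allclose(left @ np.block([[F2, Z], [Z, F2]]) @ P, F4)

# Equation (2): W2 twice on the left, the shuffle, then W2 and I2
assert np.allclose(np.block([[F2, Z], [Z, F2]]) @ P
                   @ np.block([[F2, Z], [Z, I2]]), W4)
print("factorizations (1) and (2) verified")
```

The identity matrix in the last factor of (2) is exactly where the wavelet count beats the Fourier count.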
Allowing W2 or I2, W4 or I4, W8 or I8, ... in the third factors, the matrix products exhibit a whole family of orthogonal bases. This is a wavelet packet, with great flexibility. Then a "best basis" algorithm aims for a choice that concentrates most of f into a few basis vectors. That is the goal — to compress information.

The same principle of factorization applies for any power of 2, say n = 1024. For Fourier, the entries of F are powers of ω = e^{2πi/1024}.
The row and column indices go from 0 to 1023 instead of 1 to 1024. The zeroth row and column are filled with ω^0 = 1. The entry in row j, column k of F is ω^{jk}. This is the term e^{ikx} evaluated at x = 2πj/1024. The multiplication F1024 a computes the series Σ a_k ω^{jk} for j = 0 to 1023.

The key to the matrix factorization is just this. Squaring the 1024th root of unity gives the 512th root: (ω^2)^512 = 1. This was the reason behind the middle factor in (1), where i is the fourth root and i^2 is the square root. It is the essential link between F1024 and F512. The first stage of the FFT is the great factorization
rediscovered by Cooley and Tukey (and described in 1805 by Gauss):

(3)    F1024 = [[I512, D512], [I512, −D512]] · [[F512, 0], [0, F512]] · (even-odd shuffle).

I512 is the identity matrix. D512 is the diagonal matrix with entries (1, ω, ..., ω^511), requiring about 512 multiplications. The two copies of F512 in the middle give a matrix only half full compared to F1024 — here is the crucial saving. The shuffle separates the incoming vector a into (a0, a2, ..., a1022) with even indices and the odd part (a1, a3, ..., a1023). Equation (3) is an imitation of equation (1), eight levels higher.
Both are easily verified. Computational linear algebra has become a world of matrix factorizations, and this one is outstanding.

You have anticipated what comes next. Each F512 is reduced in the same way to two half-size transforms F = F256. The work is cut in half again, except for an additional 512 multiplications from the diagonal matrices D = D256:

(4)    [[F512, 0], [0, F512]] = [[I, D, 0, 0], [I, −D, 0, 0], [0, 0, I, D], [0, 0, I, −D]] · [[F, 0, 0, 0], [0, F, 0, 0], [0, 0, F, 0], [0, 0, 0, F]] · (even-odd shuffles: one gives 0 and 2 mod 4, the other gives 1 and 3 mod 4).

For n = 1024 there are ℓ = 10 levels, and each level has (1/2)n = 512 multiplications from the first factor — to reassemble the half-size outputs from the level below. Those D's yield the final count (1/2)nℓ.

In practice, ℓ = log2 n is controlled by splitting the signal into smaller blocks. With n = 8, the scale length of the transform is closer to the scale length of most images. This is the short time Fourier transform, which is the transform of a "windowed" function wf. The multiplier w is the characteristic function of the window. (Smoothing is necessary! Otherwise this blocking of the image can be visually unacceptable. The ridges of fingerprints are broken up very badly, and windowing was unsuccessful in tests by the FBI.) In other applications the implementation may favor the FFT — theoretical complexity is rarely the whole story.

A more gradual exposition of the Fourier matrix and the FFT is in the monographs [3, 4] and the textbooks [5, 6] — and in many other sources [see 7].
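The even-odd recursion in (3) and (4) fits in a few lines. This sketch follows the paper's convention ω = e^{2πi/n} (many FFT libraries use the conjugate) and assumes n is a power of 2; it is an illustration, not a tuned code:

```python
import cmath

def fft(a):
    """Evaluate y_j = sum_k a_k w^(jk), w = exp(2*pi*i/n), by the
    even-odd splitting of equation (3)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    # the diagonal matrix D of twiddle factors 1, w, w^2, ...
    d = [cmath.exp(2j * cmath.pi * j / n) for j in range(n // 2)]
    return ([even[j] + d[j] * odd[j] for j in range(n // 2)] +
            [even[j] - d[j] * odd[j] for j in range(n // 2)])

print(fft([9, 1, 2, 0]))  # matches F4 times (9, 1, 2, 0)
```

Each level does n/2 twiddle multiplications, and there are log2 n levels, which is the count (1/2) n log2 n above.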
(In the lower level text [8], it is intended more for reference than for teaching. On the other hand, this is just a matrix-vector multiplication!) FFT codes are freely available on netlib, and generally each machine has its own special software.

For higher-order wavelets, the FWT still involves many copies of a single small matrix. The entries of this matrix are coefficients c_k from the "dilation equation". We move from fast algorithms to a quite different part of mathematics — with the goal of constructing new orthogonal bases. The basis functions are unusual, for a good reason.

3. Wavelets by multiresolution analysis

The defect in piecewise constant wavelets is that they are very poor at approximation. Representing a smooth function requires many pieces.
For wavelets this means many levels — the number 2^j must be large for acceptable accuracy. It is similar to the rectangle rule for integration, or Euler's method for a differential equation, or forward differences ∆y/∆x as estimates of dy/dx. Each is a simple and natural first approach, but inadequate in the end. Through all of scientific computing runs this common theme: Increase the accuracy at least to second order. What this means is: Get the linear term right.

For integration, we move to the trapezoidal rule and midpoint rule. For derivatives, second-order accuracy comes with centered differences. The whole point of Newton's method for solving f(x) = 0 is to follow the tangent line. All these are exact when f is linear. For wavelets to be accurate, W(x) and φ(x) need the same improvement. Every ax + b must be a linear combination of translates.

Piecewise polynomials (splines and finite elements) are often based on the "hat" function — the integral of Haar's W(x). But this piecewise linear function does not produce orthogonal wavelets with a local basis.
The requirement of orthogonality to dilations conflicts strongly with the demand for compact support — so much so that it was originally doubted whether one function could satisfy both requirements and still produce ax + b. It was the achievement of Ingrid Daubechies [9] to construct such a function.

We now outline the construction of wavelets. The reader will understand that we only touch on parts of the theory and on selected applications. An excellent account of the history is in [10]. Meyer and Lemarié describe the earliest wavelets (including Gabor's). Then comes the beautiful pattern of multiresolution analysis uncovered by Mallat — which is hidden by the simplicity of the Haar basis. Mallat's analysis found expression in the Daubechies wavelets.

Begin on the interval [0, 1]. The space V0 spanned by φ(x) is orthogonal to the space W0 spanned by W(x). Their sum V1 = V0 ⊕ W0 consists of all piecewise constant functions on half-intervals. A different basis for V1 is φ(2x) = (1/2)(φ(x) + W(x)) and φ(2x − 1) = (1/2)(φ(x) − W(x)). Notice especially that V0 ⊂ V1. The function φ(x) is a combination of φ(2x) and φ(2x − 1). This is the dilation equation, for Haar's example.

Now extend that pattern to the spaces Vj and Wj of dimension 2^j:

    Vj = span of the translates φ(2^j x − k) for fixed j,
    Wj = span of the wavelets W(2^j x − k) for fixed j.

The next space V2 is spanned by φ(4x), φ(4x − 1), φ(4x − 2), φ(4x − 3). It contains all piecewise constant functions on quarter-intervals. That space was also spanned by the four functions φ(x), W(x), W(2x), W(2x − 1) at the start of this paper. Therefore, V2 decomposes into V1 and W1 just as V1 decomposes into V0 and W0:

(5)    V2 = V1 ⊕ W1 = V0 ⊕ W0 ⊕ W1.

At every level, the wavelet space Wj is the "difference" between Vj+1 and Vj:

(6)    Vj+1 = Vj ⊕ Wj = V0 ⊕ W0 ⊕ ··· ⊕ Wj.

The translates of wavelets on the right are also translates of scaling functions on the left. For the construction of wavelets, this offers a totally different approach. Instead of creating W(x) and the spaces Wj, we can create φ(x) and the spaces Vj.
It is a choice between the terms Wj of an infinite series or their partial sums Vj. Historically the constructions began with W(x). Today the constructions begin with φ(x). It has proved easier to work with sums than differences.

A first step is to change from [0, 1] to the whole line R. The translation index k is unrestricted. The subspaces Vj and Wj are infinite-dimensional (L2 closures of translates). One basis for L2(R) consists of φ(x − k) and W_jk(x) = W(2^j x − k) with j ≥ 0, k ∈ Z. Another basis contains all W_jk with j, k ∈ Z. Then the dilation index j is also unrestricted — for j = −1 the functions φ(2^{−1} x − k) are constant on intervals of length 2. The decomposition into Vj ⊕ Wj continues to hold! The sequence of closed subspaces Vj has the following basic properties for −∞ < j < ∞:

    Vj ⊂ Vj+1  and  ∩ Vj = {0}  and  ∪ Vj is dense in L2(R);
    f(x) is in Vj if and only if f(2x) is in Vj+1;
    V0 has an orthogonal basis of translates φ(x − k), k ∈ Z.

These properties yield a "multiresolution analysis" — the pattern that other wavelets will follow. Vj will be spanned by φ(2^j x − k). Wj will be its orthogonal complement in Vj+1. Mallat proved, under mild hypotheses, that Wj is also spanned by translates [11]; these are the wavelets.

Dilation is built into multiresolution analysis by the property that f(x) ∈ Vj ⇔ f(2x) ∈ Vj+1. This applies in particular to φ(x). It must be a combination of translates of φ(2x). That is the hidden pattern, which has become central to this subject.
We have reached the dilation equation.

4. The dilation equation

In the words of [10], "la perspective est complètement changée." The construction of wavelets now begins with the scaling function φ. The dilation equation (or refinement equation or two-scale difference equation) connects φ(x) to translates of φ(2x):

(7)    φ(x) = Σ_{k=0}^{N} c_k φ(2x − k).

The coefficients for Haar are c0 = c1 = 1. The box function φ is the sum of two half-width boxes. That is equation (7). Then W is a combination of the same translates (because W0 ⊂ V1). The coefficients for W = φ(2x) − φ(2x − 1) are 1 and −1. It is absolutely remarkable that W uses the same coefficients as φ, but in reverse order and with alternating signs:

(8)    W(x) = Σ_{k=1−N}^{1} (−1)^k c_{1−k} φ(2x − k).

This construction makes W orthogonal to φ and its translates. (For those translates to be orthogonal to each other, see below.) The key is that every vector c0, c1, c2, c3 is automatically orthogonal to c3, −c2, c1, −c0 and all even translates like 0, 0, c3, −c2. When N is odd, c_{1−k} can be replaced in (8) by c_{N−k}. This shift by N − 1 is even. Then the sum goes from 0 to N and W(x) looks especially attractive.
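That orthogonality is an identity in the c's, so it can be checked with arbitrary numbers; a small sketch (the coefficient values below are placeholders, nothing special):

```python
def alternating_flip(c):
    """Wavelet coefficients of (8), shifted form for N odd:
    d_k = (-1)^k * c_(N-k)."""
    N = len(c) - 1
    return [(-1) ** k * c[N - k] for k in range(N + 1)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

c = [0.7, 1.1, 0.3, -0.1]               # any four numbers
d = alternating_flip(c)                 # [c3, -c2, c1, -c0]
print(abs(dot(c, d)))                   # 0 up to rounding
print(abs(dot(c, [0, 0, d[0], d[1]])))  # even translate: also 0
```

The cancellation is term-by-term: c0 c3 and −c3 c0 kill each other, and likewise c1 c2 against −c2 c1, for every even shift.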
Everything hinges on the c's. They dominate all that follows. They determine (and are determined by) φ, they determine W, and they go into the matrix factorization (2). In the applications, convolution with φ is an averaging operator — it produces smooth functions (and a blurred picture). Convolution with W is a differencing operator, which picks out details.

The convolution of the box with itself is the piecewise linear hat function — equal to 1 at x = 1 and supported on the interval [0, 2]. It satisfies the dilation equation with c0 = 1/2, c1 = 1, c2 = 1/2. But there is a condition on the c's in order that the wavelet basis W(2^j x − k) shall be orthogonal. The three coefficients 1/2, 1, 1/2 do not satisfy that condition. Daubechies found the unique c0, c1, c2, c3 (four coefficients are necessary) to give orthogonality plus second-order approximation. Then the question becomes: How to solve the dilation equation?

Note added in proof.
A new construction has just appeared that uses two scaling functions φi and wavelets Wi. Their translates are still orthogonal [38]. The combination φ1(x) + φ1(x − 1) + φ2(x) is the hat function, so second-order accuracy is achieved. The remarkable property is that these are "short functions": φ1 is supported on [0, 1] and φ2 on [0, 2]. They satisfy a matrix dilation equation.

These short wavelets open new possibilities for application, since the greatest difficulties are always at boundaries. The success of the finite element method is largely based on the very local character of its basis functions. Splines have longer support (and more smoothness), wavelets have even longer support (and orthogonality). The translates of a long basis function overrun the boundary.

There are two principal methods to solve dilation equations. One is by Fourier transform, the other is by matrix products. Both give φ as a limit, not as an explicit function. We never discover the exact value φ(√2). It is amazing to compute with a function we do not know — but the applications only require the c's. When complicated functions come from a simple rule, we know from increasing experience what to do: Stay with the simple rule.

Solution of the dilation equation by Fourier transform. Without the "2" we would have an ordinary difference equation — entirely familiar.
The presence of two scales, x and 2x, is the problem. A warning comes from Weierstrass and de Rham and Takagi — their nowhere differentiable functions are all built on multiple scales like Σ a^n cos(b^n x). The Fourier transform easily handles translation by k in equation (7), but 2x in physical space becomes ξ/2 in frequency space:

(9)    φ̂(ξ) = (1/2) Σ c_k e^{ikξ/2} φ̂(ξ/2) = P(ξ/2) φ̂(ξ/2).

The "symbol" is P(ξ) = (1/2) Σ c_k e^{ikξ}. With ξ = 0 in (9) we find P(0) = 1 or Σ c_k = 2 — the first requirement on the c's. This allows us to look for a solution normalized by φ̂(0) = ∫ φ(x) dx = 1. It does not ensure that we find a φ that is continuous or even in L1. What we do find is an infinite product, by recursion from ξ/2 to ξ/4 and onward:

    φ̂(ξ) = P(ξ/2) φ̂(ξ/2) = P(ξ/2) P(ξ/4) φ̂(ξ/4) = ··· = Π_{j=1}^{∞} P(ξ/2^j).

This solution φ may be only a distribution. Its smoothness becomes clearer by matrix methods.
Solution by matrix products [12, 13]. When φ is known at the integers, the dilation equation gives φ at half-integers such as x = 3/2. Since 2x − k is an integer, we just evaluate Σ c_k φ(2x − k). Then the equation gives φ at quarter-integers as combinations of φ at half-integers. The combinations are built into the entries of two matrices A and B, and the recursion is taking their products.

To start we need φ at the integers. With N = 3, for example, set x = 1 and x = 2 in the dilation equation:

(10)    φ(1) = c1 φ(1) + c0 φ(2),    φ(2) = c3 φ(1) + c2 φ(2).

Impose the conditions c1 + c3 = 1 and c0 + c2 = 1. Then the 2 by 2 matrix in (10), formed from these c's, has λ = 1 as an eigenvalue. The eigenvector is (φ(1), φ(2)). It follows from (7) that φ will vanish outside 0 ≤ x ≤ N.

To see the step from integers to half-integers in matrix form, convert the scalar dilation equation to a first-order equation for the vector v(x):

    v(x) = [φ(x), φ(x + 1), φ(x + 2)]^T,    A = [[c0, 0, 0], [c2, c1, c0], [0, c3, c2]],    B = [[c1, c0, 0], [c3, c2, c1], [0, 0, c3]].

The equation turns out to be v(x) = A v(2x) for 0 ≤ x ≤ 1/2 and v(x) = B v(2x − 1) for 1/2 ≤ x ≤ 1. By recursion this yields v at any dyadic point — whose binary expansion is finite. Each 0 or 1 in the expansion decides between A and B. For example

(11)    v(.01001) = (ABAAB) v(0).

Important: The matrix B has entries c_{2i−j}. So does A, when the indexing starts with i = j = 0. The dilation equation itself is φ = Cφ, with an operator C of this new kind.
Without the 2 it would be a Toeplitz operator, constant along each diagonal, but now every other row is removed. Engineers call it "convolution followed by decimation". (The word downsampling is also used — possibly a euphemism for decimation.) Note that the derivative of the dilation equation is φ′ = 2Cφ′. Successive derivatives introduce powers of 2. The eigenvalues of these operators C are 1, 1/2, 1/4, ..., until φ^{(n)} is not defined in the space at hand. The sum condition Σ c_even = Σ c_odd = 1 is always imposed — it assures in Condition A1 below that we have first-order approximation at least.

When x is not a dyadic point p/2^n, the recursion in (11) does not terminate. The binary expansion x = .0100101... corresponds to an infinite product ABAABAB.... The convergence of such a product is by no means assured. It is a major problem to find a direct test on the c's that is equivalent to convergence — for matrix products in every order.
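At a dyadic point, though, the recursion does terminate, and (10)-(11) can be run directly. A sketch (numpy; it borrows the four Daubechies coefficients quoted in Section 5, and normalizes by φ(1) + φ(2) = 1, an assumption consistent with ∫ φ = 1):

```python
import numpy as np

s3 = np.sqrt(3.0)
c0, c1, c2, c3 = (1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4

# (10): (phi(1), phi(2)) is the eigenvector of [[c1, c0], [c3, c2]]
# for the eigenvalue lambda = 1
vals, vecs = np.linalg.eig(np.array([[c1, c0], [c3, c2]]))
p = vecs[:, np.argmin(abs(vals - 1))].real
p = p / p.sum()                      # normalize so phi(1) + phi(2) = 1
v0 = np.array([0.0, p[0], p[1]])     # v(0) = (phi(0), phi(1), phi(2))

A = np.array([[c0, 0, 0], [c2, c1, c0], [0, c3, c2]])
B = np.array([[c1, c0, 0], [c3, c2, c1], [0, 0, c3]])

def v(bits):
    """v(x) for x = .b1 b2 ... in binary: each 0 picks A, each 1 picks B,
    applied to v(0) innermost-first as in (11)."""
    out = v0
    for b in reversed(bits):
        out = (A if b == 0 else B) @ out
    return out

print(v([0, 1])[0])   # phi(1/4), from v(.01) = AB v(0)
```

Every dyadic value of φ comes out of such a finite product; the non-dyadic points are exactly where the convergence question below begins.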
We briefly describe what is known for arbitrary A and B.

For a single matrix A, the growth of the powers A^n is governed by the spectral radius ρ(A) = max |λi|. Any norm of A^n is roughly the nth power of this largest eigenvalue. Taking nth roots makes this precise:

    lim_{n→∞} ‖A^n‖^{1/n} = ρ(A).

The powers approach zero if and only if ρ(A) < 1. For two or more matrices, the same process produces the joint spectral radius [14]. The powers A^n are replaced by products Πn of n A's and B's. The maximum of ‖Πn‖, allowing products in all orders, is still submultiplicative. The limit of nth roots (also the infimum) is the joint spectral radius:

(12)    lim_{n→∞} (max ‖Πn‖)^{1/n} = ρ(A, B).

The difficulty is not to define ρ(A, B) but to compute it. For symmetric or normal or commuting or upper triangular matrices it is the larger of ρ(A) and ρ(B). Otherwise eigenvalues of products are not controlled by products of eigenvalues.
An example with zero eigenvalues, ρ(A) = 0 = ρ(B), is

    A = [[0, 2], [0, 0]],    B = [[0, 0], [2, 0]],    AB = [[4, 0], [0, 0]].

In this case ρ(A, B) = ‖AB‖^{1/2} = 2. The product ABABAB... diverges. In general ρ is a function of the matrix entries, bounded above by norms and below by eigenvalues.
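This example is quickly confirmed in a few lines (numpy): both spectral radii vanish, yet the norms of the alternating products hold steady at 2^n:

```python
import numpy as np

A = np.array([[0.0, 2.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [2.0, 0.0]])

# rho(A) = rho(B) = 0: both matrices are nilpotent
assert max(abs(np.linalg.eigvals(A))) < 1e-12
assert max(abs(np.linalg.eigvals(B))) < 1e-12

# but ||ABAB...||^(1/n) stays at 2, so rho(A, B) = 2
prod = np.eye(2)
for n in range(1, 9):
    prod = prod @ (A if n % 2 else B)
    print(n, np.linalg.norm(prod, 2) ** (1.0 / n))
```

No single eigenvalue sees the growth; only the products in alternating order reveal the joint spectral radius.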
Since one possible infinite product is a repetition of any particular Πn (in the example it was AB), the spectral radius of that single matrix gives a lower bound on the joint radius:

    (ρ(Πn))^{1/n} ≤ ρ(A, B).

A beautiful theorem of Berger and Wang [15] asserts that these eigenvalues of products yield the same limit (now a supremum) that was approached by norms:

(13)    lim sup_{n→∞} (max ρ(Πn))^{1/n} = ρ(A, B).

It is conjectured by Lagarias and Wang that equality is reached at a finite product Πn. Heil and the author noticed a corollary of the Berger-Wang theorem: ρ is a continuous function of A and B. It is upper-semicontinuous from (12) and lower-semicontinuous from (13).

Returning to the dilation equation, the matrices A and B share the left eigenvector (1, 1, 1). On the complementary subspace, they reduce to

    A′ = [[c0, 0], [−c3, 1 − c0 − c3]]    and    B′ = [[1 − c0 − c3, −c0], [0, c3]].

It is ρ(A′, B′) that decides the size of φ(x) − φ(y). Continuity follows from ρ < 1 [16]. Then φ and W belong to C^α for all α less than −log2 ρ. (When α > 1, derivatives of integer order [α] have Hölder exponent α − [α].) In Sobolev spaces H^s, Eirola and Villemoes [17, 18] showed how an ordinary spectral radius — computable — gives the exact regularity s.

5. Accuracy and orthogonality

For the Daubechies coefficients, the dilation equation does produce a continuous φ(x) with Hölder exponent 0.55 (it is differentiable almost everywhere). Then (8)
constructs the wavelet. Figure 2 shows φ and W with c0, c1, c2, c3 = (1/4)(1 + √3), (1/4)(3 + √3), (1/4)(3 − √3), (1/4)(1 − √3).

Figure 2. The family W4(2^j x − k) is orthogonal. Translates of D4 can reproduce any ax + b. Daubechies also found D2p with orthogonality and pth order accuracy.

What is special about the four Daubechies coefficients?
They satisfy the requirement A2 for second-order accuracy and the separate requirement O for orthogonality. We can state Condition A2 in several forms. In terms of W, the moments ∫ W(x) dx and ∫ x W(x) dx are zero. Then the Fourier transform of (8) yields P(π) = P′(π) = 0. In terms of the c's (or the symbol P(ξ) = (1/2) Σ c_k e^{ikξ}), the condition for accuracy of order p is Ap:

(14)    Σ (−1)^k k^m c_k = 0 for m < p,    or equivalently    P(ξ + π) = O(|ξ|^p).

This assures that translates of φ reproduce (locally) the powers 1, x, ..., x^{p−1} [19]. The zero moments are the orthogonality of these powers to W. Then the Taylor series of f(x) can be matched to degree p at each meshpoint. The error in wavelet approximation is of order h^p, where h = 2^{−j} is the mesh width or translation step of the local functions W(2^j x). The price for each extra order of accuracy is two extra coefficients c_k — which spreads the support of φ and W by two intervals. A reasonable compromise is p = 3. The new short wavelets may offer an alternative.

Condition Ap also produces zeros in the infinite product φ̂(ξ) = Π P(ξ/2^j). Every nonzero integer has the form n = 2^{j−1} m, m odd. Then φ̂(2πn) has the
factor P(2πn/2^j) = P(mπ) = P(π). Therefore, the pth order zero at ξ = π in Condition Ap ensures a pth order zero of φ̂ at each ξ = 2πn. This is the test for the translates of φ to reproduce 1, x, ..., x^{p−1}. That step closes the circle and means approximation to order p. Please forgive this brief recapitulation of an older theory — the novelty of wavelets is their orthogonality. This is tested by Condition O:

(15)    Σ c_k c_{k−2m} = 2 δ_{0m},    or equivalently    |P(ξ)|^2 + |P(ξ + π)|^2 ≡ 1.

The first condition follows directly from (φ(x), φ(x − m)) = δ_{0m}. The dilation equation converts this to (Σ c_k φ(2x − k), Σ c_ℓ φ(2x − 2m − ℓ)) = δ_{0m}.
It is the "perfect reconstruction condition" of digital signal processing [20-22]. It assures that the L2 norm is preserved, when the signal f(x) is separated by a low-pass filter L and a high-pass filter H. The two parts have ‖Lf‖^2 + ‖Hf‖^2 = ‖f‖^2. A filter is just a convolution. In frequency space that makes it a multiplication. Low-pass means that constants and low frequencies survive — we multiply by a symbol P(ξ) that is near 1 for small |ξ|. High-pass means the opposite, and for wavelets the multiplier is essentially P(ξ + π). The two convolutions are "mirror filters".

In the discrete case, the filters L and H (with downsampling to remove every second row) fit into an orthogonal matrix:

(16)    [[L], [H]] = (1/√2) [[c0, c1, c2, c3, ···], [···, c0, c1, c2, c3], [c3, −c2, c1, −c0, ···], [···, c3, −c2, c1, −c0]].

This matrix enters each step of the wavelet transform, from vector y to wavelet coefficients b. The pyramid algorithm executes that transform by recursion with rescaling. We display two steps for a general wavelet and then specifically for Haar on [0, 1]:

(17)    [[L], [H], [I]] · [[L], [H]]   is   (1/√2) [[1, 1, 0, 0], [1, −1, 0, 0], [0, 0, √2, 0], [0, 0, 0, √2]] · (1/√2) [[1, 1, 0, 0], [0, 0, 1, 1], [1, −1, 0, 0], [0, 0, 1, −1]].

This product is still an orthogonal matrix. When the columns of W4 in §1 are normalized to be unit vectors, this is its inverse (and its transpose). The recursion decomposes a function into wavelets, and the reverse algorithm reconstructs it. The 2 by 2 matrix has low-pass coefficients 1, 1 from φ and high-pass coefficients 1, −1 from W. Normalized by 1/2, they satisfy Condition O (note e^{iπ} = −1), and they preserve the ℓ2 norm:

    |(1 + e^{iξ})/2|^2 + |(1 + e^{i(ξ+π)})/2|^2 ≡ 1.

Figure 3 shows how those terms |P(ξ)|^2 and |P(ξ + π)|^2 are mirror functions that add to 1. It also shows how four coefficients give a flatter response — with higher accuracy at ξ = 0. Then |P|^2 has a fourth-order zero at ξ = π.
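Conditions O and A2 can be checked directly from the symbol; a sketch in plain Python, with the coefficients of Figure 2:

```python
import cmath

s3 = 3 ** 0.5
c = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]

def P(xi):
    """The symbol P(xi) = (1/2) sum_k c_k e^(ik xi)."""
    return 0.5 * sum(ck * cmath.exp(1j * k * xi) for k, ck in enumerate(c))

# Condition O: |P(xi)|^2 + |P(xi + pi)|^2 = 1 at every frequency
for xi in (0.0, 0.3, 1.0, 2.5):
    assert abs(abs(P(xi)) ** 2 + abs(P(xi + cmath.pi)) ** 2 - 1) < 1e-12

# Condition A2: second-order zero of P at pi (the zero moments of W)
assert abs(P(cmath.pi)) < 1e-12
assert abs(sum((-1) ** k * k * ck for k, ck in enumerate(c))) < 1e-12
print("Condition O and Condition A2 hold")
```

The first loop is the mirror-filter identity of Figure 3; the last two checks are the moment conditions of (14) with m = 0 and m = 1.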
Figure 3. Condition O for Haar (p = 1) and Daubechies (p = 2).

The design of filters (the choice of convolution) is a central problem of signal processing — a field of enormous size and importance.
The natural formulation is in frequency space. Its application here is to multirate filters and "subband coding", with a sequence of scales 2^j x.

Note. Orthogonality of the family φ(x − k) leads by the Poisson summation formula to $\sum_n |\hat\varphi(\xi + 2\pi n)|^2 = 1$. Applying the dilation equation (7) and separating even n from odd n shows how the second form of Condition O is connected to orthogonality:

$\sum_n |\hat\varphi(\xi + 2\pi n)|^2 = \sum_n \left| P\!\left( \tfrac{\xi}{2} + \pi n \right) \right|^2 \left| \hat\varphi\!\left( \tfrac{\xi}{2} + \pi n \right) \right|^2$

$= \left| P\!\left( \tfrac{\xi}{2} \right) \right|^2 \sum_m \left| \hat\varphi\!\left( \tfrac{\xi}{2} + 2\pi m \right) \right|^2 + \left| P\!\left( \tfrac{\xi}{2} + \pi \right) \right|^2 \sum_m \left| \hat\varphi\!\left( \tfrac{\xi}{2} + \pi(2m+1) \right) \right|^2$

$= \left| P\!\left( \tfrac{\xi}{2} \right) \right|^2 + \left| P\!\left( \tfrac{\xi}{2} + \pi \right) \right|^2 \quad (= 1 \text{ by Condition O}).$

The same ideas apply to W. For dilation by 3^j or M^j instead of 2^j, Heller has constructed [23] the two wavelets or M − 1 wavelets that yield approximation of order p. The orthogonality condition becomes $\sum_{j=0}^{M-1} |P(\xi + 2\pi j/M)|^2 = 1$. We note a technical hypothesis that must be added to Condition O.
It was found by Cohen and in a new form by Lawton (see [24, pp. 177–194]).
Without it, c₀ = c₃ = 1 passes test O. Those coefficients give a stretched box function φ = (1/3)χ[0,3] that is not orthogonal to φ(x − 1).
The matrix with L and H above will be only an isometry — it has columns of zeros. The filters satisfy LL* = HH* = I and LH* = HL* = 0, but not L*L + H*H = I.
The extra hypothesis is applied to this matrix A, or after Fourier transform to the operator A:

$A_{ij} = \sum_{k=0}^{N} c_k c_{j-2i+k} \quad\text{or}\quad Af(\xi) = \left| P\!\left( \tfrac{\xi}{2} \right) \right|^2 f\!\left( \tfrac{\xi}{2} \right) + \left| P\!\left( \tfrac{\xi}{2} + \pi \right) \right|^2 f\!\left( \tfrac{\xi}{2} + \pi \right).$

The matrix A with |i| < N and |j| < N has two eigenvectors for λ = 1. Their components are v_m = δ_{0m} and w_m = (φ(x), φ(x − m)).
Those must be the same! Then the extra condition, added to O, is that λ = 1 shall be a simple eigenvalue.
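The extra test is easy to run numerically. A sketch (our own construction following the definition of A above, with an assumed factor 1/2 so that the eigenvalue in question is 1): the D4 coefficients pass, while c₀ = c₃ = 1 fails with a double eigenvalue at 1.

```python
import numpy as np

def transfer_matrix(c):
    # A_ij = (1/2) sum_k c_k c_{j-2i+k}, with indices |i| < N and |j| < N
    N = len(c) - 1
    idx = list(range(-(N - 1), N))
    def cv(k):
        return c[k] if 0 <= k <= N else 0.0
    return 0.5 * np.array([[sum(cv(k) * cv(j - 2 * i + k) for k in range(N + 1))
                            for j in idx] for i in idx])

def mult_of_one(c):
    lam = np.linalg.eigvals(transfer_matrix(c))
    return int(np.sum(np.abs(lam - 1) < 1e-8))

s3 = np.sqrt(3)
d4  = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]  # passes both tests
box = [1.0, 0.0, 0.0, 1.0]   # c0 = c3 = 1: passes Condition O, fails the extra test

print(mult_of_one(d4))    # 1 : lambda = 1 is simple
print(mult_of_one(box))   # 2 : lambda = 1 is a double eigenvalue
```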
In summary, Daubechies used the minimum number 2p of coefficients c_k to satisfy the accuracy condition A_p together with orthogonality. These wavelets furnish unconditional bases for the key spaces of harmonic analysis (Lᵖ, Hölder, Besov, Hardy space H¹, BMO, …).
The Haar-Walsh construction fits functions with no extra smoothness [25]. Higher-order wavelets fit Sobolev spaces, where functions have derivatives in Lᵖ (see [11, pp. 24–27]). With marginal exponent p = 1 or even p < 1, the wavelet transform still maps onto the right discrete spaces.

6. The contest: Fourier vs. wavelets

This brief report is included to give some idea of the decisions now being reached about standards for video compression. The reader will understand that the practical and financial consequences are very great.
Starting from an image in which each color at each small square (pixel) is assigned a numerical shading between 0 and 255, the goal is to compress all that data to reduce the transmission cost. Since 256 = 2⁸, we have 8 bits for each of red-green-blue.
The bit-rate of transmission is set by the channel capacity, the compression rule is decided by the filters and quantizers, and the picture quality is subjective. Standard images are so familiar that experts know what to look for — like tasting wine or tea.

Think of the problem mathematically.
We are given f(x, y, t), with x-y axes on the TV screen and the image f changing with time t. For digital signals all variables are discrete, but a continuous function is close — or piecewise continuous when the image has edges. Probably f changes gradually as the camera moves. We could treat f as a sequence of still images to compress independently, which seems inefficient. But the direction of movement is unpredictable, and too much effort spent on extrapolation is also inefficient.
A compromise is to encode every fifth or tenth image and, between those, to work with the time differences ∆f — which have less information and can be compressed further.

Fourier methods generally use real transforms (cosines). The picture is broken into blocks, often 8 by 8. This improvement in the scale length is more important than the control of log n in the FFT cost. (It may well be more important than the choice of Fourier.) After twenty years of refinement, the algorithms are still being fought over and improved.
Wavelets are a recent entry, not yet among the heavyweights. The accuracy test A_p is often set aside in the goal of constructing "brick wall filters" — whose symbols P(ξ) are near to characteristic functions. An exact zero-one function in Figure 3 is of course impossible — the designers are frustrated by a small theorem in mathematics. (Compact support of f and f̂ occurs only for f ≡ 0.) In any case the Fourier transform of a step function has oscillations that can murder a pleasing signal — so a compromise is reached.

Orthogonality is not set aside.
It is the key constraint. There may be eight or more bands (8 times 8 in two dimensions) instead of two. Condition O has at least eight terms |P(ξ + kπ/8)|². After applying the convolutions, the energy or entropy in the high frequencies is usually small and the compression of that part of the signal is increased — to avoid wasting bits.
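A toy illustration (ours, with just two Haar bands rather than eight) of why the high-frequency part compresses well: for a smooth signal, almost all of the energy lands in the low band.

```python
import numpy as np

# A smooth signal: most of its energy should land in the low-pass band,
# so the high band can be quantized coarsely without visible damage.
n = 256
t = np.arange(n) / n
f = np.sin(2 * np.pi * t) + 0.3 * np.cos(4 * np.pi * t)

low  = (f[0::2] + f[1::2]) / np.sqrt(2)   # Haar low-pass with downsampling
high = (f[0::2] - f[1::2]) / np.sqrt(2)   # Haar high-pass with downsampling

e_low, e_high = np.sum(low**2), np.sum(high**2)
print(e_high / (e_low + e_high))   # a tiny fraction: the high band is nearly empty
```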
The actual encoding or "quantization" is a separate and very subtle problem, mapping the real numbers to {1, …, N}. A vector quantizer is a map from Rᵈ, and the best are not just tensor products [28].
Its construction is probably more important to a successful compression than refining the filter.

Audio signals have fewer dimensions and more bands — as many as 512. One goal of compression is a smaller CD disk. Auditory information seems to come in octaves of roughly equal energy — the energy density decays like 1/ξ. Also physically, the cochlea has several critical bands per octave.
(An active problem in audio compression is to use psychoacoustic information about the ear.) Since ∫ dξ/ξ is the same from 1 to 2 and 2 to 4 and 4 to 8 (by a theorem we teach freshmen!), subband coding stands a good chance.

That is a barely adequate description of a fascinating contest. It is applied analysis (and maybe free enterprise) at its best. For video compression, the Motion Picture Experts Group held a competition in Japan late in 1991.
About thirty companies entered algorithms. Most were based on cosine transforms, a few on wavelets. The best were all windowed Fourier. Wavelets were down the list but not unhappy. Consolation was freely offered and accepted. The choice for HDTV, with high definition, may be different from this MPEG standard to send a rougher picture at a lower bit-rate.

I must emphasize: The real contest is far from over.
There are promising wavelets (Wilson bases and coiflets) that were too recent to enter. Hardware is only beginning to come — the first wavelet chips are available. MPEG did not see the best that all transforms can do.

In principle, wavelets are better for images, and Fourier is the right choice for music. Images have sharp edges; music is sinusoidal.
The jth Fourier coefficient of a step function is of order 1/j. The wavelet coefficients (mostly zero) are multiples of 2^{−j/2}. The L² error drops exponentially, not polynomially, when N terms are kept.

To confirm this comparison, Donoho took digitized photos of his statistics class. He discarded 95% of the wavelet and the Fourier coefficients, kept the largest 5%, and reconstructed two pictures. (The wavelets were "coiflets" [24], with greater smoothness and symmetry but longer support. Fourier blocks were not tried.) Every student preferred the picture from wavelets.

The underlying rule for basis functions seems to be this: choose scale lengths that match the image and allow for spatial variability. Smoothness is visually important, and D4 is being superseded.
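This comparison is easy to reproduce in one dimension. A sketch (our own toy experiment, not Donoho's): keep the N largest coefficients of a step function in the unitary Fourier basis and in the Haar basis, and compare the ℓ² errors.

```python
import numpy as np

# Step function on [0,1): keep the N largest coefficients in the unitary
# Fourier basis and in the Haar basis, and compare the l2 errors.
n = 1024
f = np.where(np.arange(n) < n // 3, 1.0, 0.0)   # edge not at a dyadic point

def haar(y):
    low, details = y.astype(float), []
    while len(low) > 1:
        low, high = (low[0::2] + low[1::2]) / np.sqrt(2), \
                    (low[0::2] - low[1::2]) / np.sqrt(2)
        details.insert(0, high)
    return np.concatenate([low] + details)

def nterm_error(coeffs, N):
    mag = np.abs(coeffs)
    cutoff = np.sort(mag)[::-1][N - 1]          # magnitude of the N-th largest
    kept = np.where(mag >= cutoff, coeffs, 0)
    return np.linalg.norm(coeffs - kept)

F = np.fft.fft(f) / np.sqrt(n)   # unitary scaling, so Parseval holds
W = haar(f)
for N in (16, 64):
    print(N, nterm_error(F, N) > nterm_error(W, N))   # True: wavelets win on a step
```

Only about log₂ n Haar coefficients are nonzero (one per scale, at the interval straddling the jump), so a handful of terms reconstructs the step almost exactly, while the Fourier tail decays only like 1/j.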
Wavelets are not the only possible construction, but they have opened the door to new bases. In the mathematical contest (perhaps eventually in the business contest) unconditional bases are the winners.

We close by mentioning fingerprints. The FBI has more than 30 million in filing cabinets, counting only criminals. Comparing one to thousands of others is a daunting task. Every improvement leads to new matches and the solution of old crimes.
The images need to be digitized. The definitive information for matching fingerprints is in the "minutiae" of ridge endings and bifurcations [29]. At 500 pixels per inch, with 256 levels of gray, each card has 10⁷ bytes of data. Compression is essential and 20 : 1 is the goal. The standard from the Joint Photographic Experts Group (JPEG) is Fourier-based, with 8 by 8 blocks, and the ridges are broken.
The competition is now between wavelet algorithms associated with Los Alamos and Yale [30–33] — fixed basis versus "best basis", ℓ < 100 subbands or ℓ > 1000, vector or scalar quantization. There is also a choice of coding for wavelet coefficients (mostly near zero when the basis is good). The best wavelets may be biorthogonal — coming from two wavelets W₁ and W₂. This allows a left-right symmetry [24], which is absent in Figure 2. The fingerprint decision is a true contest in applying pure mathematics.
Acknowledgment

I thank Peter Heller for a long conversation about the MPEG contest and its rules.

Additional note. After completing this paper I learned, with pleasure and amazement, that a thesis which I had promised to supervise ("formally", in the most informal sense of that word) was to contain the filter design for MIT's entry in the HDTV competition. The Ph.D. candidate is Peter Monta. The competition is still ahead (in 1992). Whether won or lost, I am sure the degree will be granted! These paragraphs briefly indicate how the standards for High Definition Television aim to yield a very sharp picture.

The key is high resolution, which requires a higher bit-rate of transmission.
For the MPEG contest in Japan — to compress videos onto CD's and computers — the rate was 1 megabit/second. For the HDTV contest that number is closer to 24. Both compression ratios are about 100 to 1. (The better picture has more pixels.) The audio signal gets 1/2 megabit/sec for its four stereo channels; closed captions use less. In contrast, conventional television has no compression at all — in principle, you see everything. The color standard was set in 1953, and the black and white standard about 1941.

The FCC will judge between an AT&T/Zenith entry, two MIT/General Instruments entries, and a partly European entry from Philips and others.
These finalists are all digital, an advance which surprised the New York Times. Monta proposed a filter that uses seven coefficients or "taps" for low-pass and four for high-pass. Thus the filters are not mirror images as in wavelets, or brick walls either. Two-dimensional images come from tensor products of one-dimensional filters. Their exact coefficients will not be set until the last minute, possibly for secrecy — and cosine transforms may still be chosen in the end.

The red-green-blue components are converted by a 3 by 3 orthogonal matrix to better coordinates. Linear algebra enters, literally the spectral theorem. The luminance axis from the leading eigenvector gives the brightness.

A critical step is motion estimation, to give a quick and close prediction of successive images.
A motion vector is estimated for each region in the image [34]. The system transmits only the difference between predicted and actual images — the "motion compensated residual". When that has too much energy, the motion estimator is disabled and the most recent image is sent. This will be the case when there is a scene change. Note that coding decisions are based on the energy in different bands (the size of Fourier coefficients). The L¹ norm is probably better. Other features may be used in 2001.

It is very impressive to see an HDTV image. The final verdict has just been promised for the spring of 1993. Wavelets will not be in that standard, but they have no shortage of potential applications [24, 35–37]. A recent one is the LANDSAT 8 satellite, which will locate a grid on the earth with pixel width of 2 yards.
The compression algorithm that does win will use good mathematics.

References

1. A. Haar, Zur Theorie der orthogonalen Funktionensysteme, Math. Ann. 69 (1910), 331–371.
2. J. L. Walsh, A closed set of normal orthogonal functions, Amer. J. Math. 45 (1923), 5–24.
3. R. E. Blahut, Fast algorithms for digital signal processing, Addison-Wesley, New York, 1984.
4. C. Van Loan, Computational frameworks for the fast Fourier transform, SIAM, Philadelphia, PA, 1992.
5. G. Strang, Introduction to applied mathematics, Wellesley-Cambridge Press, Wellesley, MA, 1986.
6. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical recipes, 2nd ed., Cambridge Univ. Press, Cambridge, 1993.
7. P. Duhamel and M. Vetterli, Fast Fourier transforms: a tutorial review, Signal Processing 19 (1990), 259–299.
8. G. Strang, Introduction to linear algebra, Wellesley-Cambridge Press, Wellesley, MA, 1993.
9. I. Daubechies, Orthogonal bases of compactly supported wavelets, Comm. Pure Appl. Math. 41 (1988), 909–996.
10. P. G. Lemarié (ed.), Les ondelettes en 1989, Lecture Notes in Math., vol. 1438, Springer-Verlag, New York, 1990.
11. S. Mallat, Multiresolution approximations and wavelet orthogonal bases of L²(R), Trans. Amer. Math. Soc. 315 (1989), 69–88; A theory for multiresolution approximation: the wavelet representation, IEEE Trans. PAMI 11 (1989), 674–693.
12. I. Daubechies and J. Lagarias, Two-scale difference equations: I. Existence and global regularity of solutions, SIAM J. Math. Anal. 22 (1991), 1388–1410; II. Local regularity, infinite products of matrices and fractals, SIAM J. Math. Anal. 23 (1992), 1031–1079.
13. ———, Sets of matrices all infinite products of which converge, Linear Algebra Appl. 161 (1992), 227–263.
14. G.-C. Rota and G. Strang, A note on the joint spectral radius, Kon. Nederl. Akad. Wet. Proc. A 63 (1960), 379–381.
15. M. Berger and Y. Wang, Bounded semigroups of matrices, Linear Algebra Appl. 166 (1992), 21–28.
16. D. Colella and C. Heil, Characterizations of scaling functions, I. Continuous solutions, SIAM J. Matrix Anal. Appl. 15 (1994) (to appear).
17. T. Eirola, Sobolev characterization of solutions of dilation equations, SIAM J. Math. Anal. 23 (1992), 1015–1030.
18. L. F. Villemoes, Energy moments in time and frequency for two-scale difference equation solutions and wavelets, SIAM J. Math. Anal. 23 (1992), 1519–1543.
19. G. Strang, Wavelets and dilation equations: a brief introduction, SIAM Review 31 (1989), 614–627.
20. O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE Signal Processing Mag. 8 (1991), 14–38.
21. M. Vetterli and C. Herley, Wavelets and filter banks: theory and design, IEEE Trans. Acoust. Speech Signal Process. 40 (1992), 2207–2232.
22. P. P. Vaidyanathan, Multirate digital filters, filter banks, polyphase networks, and applications: a tutorial, Proc. IEEE 78 (1990), 56–93; Multirate systems and filter banks, Prentice-Hall, Englewood Cliffs, NJ, 1993.
23. P. Heller, Regular M-band wavelets, SIAM J. Matrix Anal. Appl. (to appear).
24. I. Daubechies, Ten lectures on wavelets, SIAM, Philadelphia, PA, 1992.
25. F. Schipp, W. R. Wade, and P. Simon, Walsh series, Akad. Kiadó and Adam Hilger, Budapest and Bristol, 1990.
26. Y. Meyer, Ondelettes et opérateurs, Hermann, Paris, 1990; Wavelets, translation to be published by Cambridge Univ. Press.
27. R. DeVore and B. J. Lucier, Wavelets, Acta Numerica 1 (1991), 1–56.
28. N. S. Jayant and P. Noll, Digital coding of waveforms, Prentice-Hall, Englewood Cliffs, NJ, 1984.
29. T. Hopper and F. Preston, Compression of grey-scale fingerprint images, Data Compression Conference, IEEE Computer Society Press, New York, 1992.
30. M. V. Wickerhauser, High-resolution still picture compression, preprint.
31. M. V. Wickerhauser and R. R. Coifman, Entropy based methods for best basis selection, IEEE Trans. Inform. Theory 38 (1992), 713–718.
32. R. DeVore, B. Jawerth, and B. J. Lucier, Image compression through wavelet transform coding, IEEE Trans. Inform. Theory 38 (1992), 719–746.
33. J. N. Bradley and C. Brislawn, Compression of fingerprint data using the wavelet vector quantization image compression algorithm, Los Alamos Report 92-1507, 1992.
34. J. Lim, Two-dimensional signal and image processing, Prentice-Hall, Englewood Cliffs, NJ, 1990.
35. G. Beylkin, R. R. Coifman, and V. Rokhlin, Fast wavelet transforms and numerical algorithms, Comm. Pure Appl. Math. 44 (1991), 141–183.
36. C. K. Chui, An introduction to wavelets, Academic Press, New York, 1992.
37. M. B. Ruskai et al., Wavelets and their applications, Jones and Bartlett, Boston, 1992.
38. J. S. Geronimo, D. P. Hardin, and P. R. Massopust, Fractal functions and wavelet expansions based on several scaling functions (to appear).

Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
E-mail address: gs@math.mit.edu