📝 Original Info
- Title: Discussion of: Treelets–An adaptive multi-scale basis for sparse unordered data
- ArXiv ID: 0807.4019
- Date: 2008-07-28
- Authors: Robert Tibshirani (Stanford University)
📝 Abstract
Discussion of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]
📄 Full Content
arXiv:0807.4019v1 [stat.AP] 25 Jul 2008
The Annals of Applied Statistics
2008, Vol. 2, No. 2, 482–483
DOI: 10.1214/07-AOAS137D
Main article DOI: 10.1214/07-AOAS137
© Institute of Mathematical Statistics, 2008
DISCUSSION OF: TREELETS—AN ADAPTIVE MULTI-SCALE
BASIS FOR SPARSE UNORDERED DATA
By Robert Tibshirani
Stanford University
This is a very interesting paper on an important topic—the problem of
extracting features in an unsupervised way from a dataset. There is growing
evidence that unsupervised feature extraction can provide an effective set of
features for supervised learning: see, for example, the interesting recent work
on learning algorithms for Boltzmann machines [Hinton, Osindero and Teh
(2006)].
The ideas in this paper are exciting—treelets are a neat construction that
combine clustering and wavelets, and are simple enough to be theoretically
tractable. The connection to the latent variable model is also interesting:
this kind of model is also the basis of supervised principal components, a
method that I co-developed recently [Bair et al. (2006)] for regression and
survival analysis in the p > N setting.
I have no practical experience with treelets, so my remaining comments
will be brief and mostly in the form of questions for the authors. A much
simpler approach to this problem would be to hierarchically cluster the
predictors, and then take the average at every internal node of the dendrogram.
Let’s call this the “simple averaging” method. As noted by the authors,
this has already been proposed in the literature, for example, in the "Tree
harvesting" procedure. In this approach we keep all of the original predictors
and all of the internal node averages, and so end up with an over-complete
basis of 2p functions.
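The "simple averaging" construction is easy to sketch. The following is a minimal illustration of my reading of it (not code from either paper), using scipy's hierarchical clustering on the predictors; the data, linkage choices, and variable names are all invented for the example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))            # N = 50 samples, p = 6 predictors

# Hierarchically cluster the predictors (columns), not the samples.
Z = linkage(X.T, method="average", metric="correlation")

p = X.shape[1]
members = {j: [j] for j in range(p)}    # leaves of the dendrogram
features = [X[:, j] for j in range(p)]  # keep all original predictors

# Walk the merges; each internal node contributes one averaged predictor.
for k, (a, b, _, _) in enumerate(Z):
    merged = members[int(a)] + members[int(b)]
    members[p + k] = merged
    features.append(X[:, merged].mean(axis=1))

basis = np.column_stack(features)       # over-complete basis of node averages
print(basis.shape)                      # (50, 11): p leaves + (p - 1) internal nodes
```

With p predictors this yields p leaves plus p − 1 internal-node averages, i.e., 2p − 1 columns, which matches the roughly 2p basis functions described above.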
How are treelets different from simple averaging? Treelets do an
orthogonalization after each node merge, but does this change the clustering in a
material way? What advantage is there to the orthogonal basis delivered by
treelets? After all, it looks like the resulting linear combinations of variables
are not uncorrelated. Does the simple averaging method perform as well as
treelets in the kind of examples of the paper? Do the authors’ theorems
apply to the simple averaging method as well, or are treelets uniquely
good in their estimation of the components of a latent variable model?

[Received December 2007; revised December 2007.]
The contrast between treelets and simple averaging is analogous to the
contrast between wavelets and basis pursuit [Chen, Donoho and Saunders
(1998)]. The former is an orthogonal basis while the latter is over-complete;
when fitting is done with an L1 (lasso) penalty, the over-complete basis can
provide a very good predictive model.
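As a toy illustration of that last point (my own sketch, not from either paper): an L1 penalty can select a sparse combination of columns from a redundant dictionary, which is the basis-pursuit idea in miniature. The dictionary, active atoms, and penalty level below are invented for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 20
D = rng.normal(size=(n, 2 * p))   # over-complete dictionary: 2p random atoms
# Response built from two atoms of the dictionary, plus a little noise.
y = 2.0 * D[:, 3] - 1.5 * D[:, 17] + 0.01 * rng.normal(size=n)

# The L1 penalty picks a sparse combination out of the redundant columns.
fit = Lasso(alpha=0.01).fit(D, y)
support = np.flatnonzero(np.abs(fit.coef_) > 0.1)
print(support)
```

With a weak penalty and low noise, the lasso fit should recover the two active atoms (indices 3 and 17) and zero out the rest; an orthogonal basis of the same size offers no such redundancy to exploit.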
One small point: hierarchical clustering is usually done with average linkage
between pairs of predictors. A variation, commonly used in genomics and
sometimes called Eisen clustering (since it's implemented in Eisen's Cluster
program), uses instead the distance (or correlation) between centroids.
The Treelet construction looks more like Eisen clustering. The point is that
one could apply Eisen clustering, and then simply average the predictors in
every internal node.
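For concreteness, here is a small sketch (mine, not from the paper) of the two linkage rules in scipy; centroid linkage, which scipy supports only with the euclidean metric, corresponds to the Eisen-style rule described above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))          # 40 samples, 5 predictors

# Average linkage: inter-cluster distance is the mean pairwise distance
# between member predictors.
Z_avg = linkage(X.T, method="average", metric="euclidean")

# Centroid linkage (the Eisen-style rule): distance between cluster
# centroids; scipy restricts this method to the euclidean metric.
Z_cen = linkage(X.T, method="centroid", metric="euclidean")

# Each output has p - 1 = 4 merge rows; the merge orders can differ.
print(Z_avg[:, :2])
print(Z_cen[:, :2])
```

Either dendrogram could then be combined with the simple node-averaging step, so the comparison to treelets does not hinge on the linkage choice.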
REFERENCES
Bair, E., Hastie, T., Paul, D. and Tibshirani, R. (2006). Prediction by supervised
principal components. J. Amer. Statist. Assoc. 101 119–137. MR2252436
Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by
basis pursuit. SIAM J. Sci. Comput. 20 33–61. MR1639094
Hinton, G., Osindero, S. and Teh, Y.-W. (2006). A fast learning algorithm for deep
belief nets. Neural Comput. 18 1527–1554. MR2224485
Departments of Health Research & Policy,
and Statistics
Stanford University
Stanford, California 94305
USA
E-mail: tibs@stanford.edu