A Simple CW-SSIM Kernel-based Nearest Neighbor Method for Handwritten Digit Classification

Jiheng Wang (a), Guangzhe Fan (a) and Zhou Wang (b)
(a) Dept. of Statistics and Actuarial Science, Univ. of Waterloo, Waterloo, ON, Canada
(b) Dept. of Electrical and Computer Engineering, Univ. of Waterloo, Waterloo, ON, Canada

Abstract

We propose a simple kernel-based nearest neighbor approach for handwritten digit classification. The "distance" here is actually a kernel defining the similarity between two images. We carefully study the effects of different numbers of neighbors and weighting schemes and report the results. With only a few nearest neighbors (or most similar images) to vote, the test set error rate on the MNIST database reaches about 1.5%-2.0%, which is very close to many advanced models.

Introduction

Due to the high-dimensional nature of digital images, image classification algorithms typically require a feature extraction process (such as corner detection) or an appearance-based dimension reduction stage (such as principal component analysis) before the application of statistical learning and classification algorithms. Meanwhile, there has been some interesting recent progress on defining similarity metrics between two images that are in their original 2D functional form. These include the structural similarity (SSIM) index [1] and its extension, the complex wavelet SSIM (CW-SSIM) index [2,3]. Conceptually, these similarity metrics have the potential to be used in image classification problems, but there has not been sufficient study on how this should be done in real-world scenarios. Image similarity indices play a crucial role in the development, assessment and optimization of a large number of image processing and pattern recognition systems. An image can be viewed as a 2-D function of intensity.
Perhaps the simplest way to compare the similarity of two images is to compute the mean squared error (MSE) between these two 2D functions. Unfortunately, such a point-wise similarity measure does not take into account the correlation between neighboring image pixels and has been shown to be problematic in many ways [4]. Recently, a substantially different approach called the SSIM index [1] was proposed, where the structural information of an image is defined as those attributes that represent the structures of the objects in the visual scene, apart from the mean intensity and contrast. Thus, the SSIM index separates the comparison of local structural patterns from local mean intensity and contrast comparisons. The SSIM index has shown somewhat surprising success in predicting perceptual image quality when compared with more sophisticated methods based on psychological models of the human visual system [4]. A common drawback of both the MSE and SSIM metrics is their high sensitivity to small geometric distortions such as translation, rotation and scaling. The CW-SSIM measure overcomes this problem by transforming SSIM to the complex wavelet transform domain [2,3]. The key idea behind CW-SSIM is that small geometric image distortions lead to consistent phase changes in local wavelet coefficients, and that a consistent phase shift of the coefficients does not change the structural content of the image. The potential of CW-SSIM has been demonstrated in a series of applications, including image quality assessment [2], digit recognition [2], line-drawing comparison [3], segmentation comparison [3], range-based face recognition [5] and palmprint recognition [6]. The well-known MNIST database of handwritten digits is composed of 60,000 training and 10,000 test examples, where the data were collected among Census Bureau employees and high school students.
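As a toy illustration of the point-wise nature of MSE discussed above (our example, not from the paper), a one-pixel translation of a structurally identical pattern already produces a large error:

```python
import numpy as np

# Toy illustration: point-wise MSE heavily penalizes a one-pixel
# translation of a structurally unchanged pattern.
img = np.zeros((8, 8))
img[:, 3] = 1.0                      # a vertical bar
shifted = np.roll(img, 1, axis=1)    # the same bar, shifted one pixel

# 16 of the 64 pixels disagree (the old and new bar columns),
# each contributing a squared error of 1.
mse = np.mean((img - shifted) ** 2)
print(mse)                           # 0.25
```

Structurally the two images are the same bar, yet MSE treats them as very different, which is exactly the drawback CW-SSIM is designed to avoid.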
The original images have a normalized size of 28×28 and contain gray levels for the purpose of anti-aliasing. In this paper we propose a series of simple and fast kernel-based classification algorithms based on the CW-SSIM index for the MNIST database, which appear to be effective and reliable tools for handwritten digit classification. Although no feature extraction or dimension reduction process is involved, we obtain quite competitive results with only a simple k-NN model. Given that the CW-SSIM index provides a powerful similarity measure between two misaligned images and there are sufficient training examples in the MNIST database, we are able to effectively classify test samples using only the most similar images.

Methodology

Complex Wavelet Structural Similarity Index (CW-SSIM)

The SSIM index was originally proposed to predict perceived image quality [1,4]. The fundamental principle is that the human visual system is highly adapted to extract structural information from the visual scene, and therefore, a measurement of structural similarity should provide a good approximation of perceptual image quality. In particular, SSIM attempts to discount those distortions that do not affect the structures (or local intensity patterns) of the image. In the spatial domain, the SSIM index between two image patches x = {x_i | i = 1, 2, ..., M} and y = {y_i | i = 1, 2, ..., M} is defined as

S(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}   (1)

where \mu, \sigma and \sigma_{xy} are the sample mean, standard deviation and covariance terms of x, y and (x, y), respectively, and C_1 and C_2 are two small positive constants that avoid instabilities. The maximum value 1 is achieved if and only if x and y are identical. The major drawback of the spatial-domain SSIM algorithm is that it is highly sensitive to translation, scaling, and rotation of images.
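Eq. (1) translates directly into code. The following sketch is our illustration; the values of C_1 and C_2 are arbitrary small constants, since the paper does not fix them here:

```python
import numpy as np

def ssim_index(x, y, C1=0.01, C2=0.03):
    """Spatial-domain SSIM between two image patches, per Eq. (1).

    C1 and C2 are small positive stabilizing constants; the defaults
    here are illustrative, not values taken from the paper.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()            # sigma_x^2, sigma_y^2
    cov_xy = np.mean((x - mu_x) * (y - mu_y))  # sigma_xy
    num = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    den = (mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2)
    return num / den
```

For identical patches the numerator and denominator coincide term by term, so the index attains its maximum value of 1.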
The CW-SSIM index is an extension of the SSIM method to the complex wavelet domain. The goal is to design a measurement that is insensitive to "non-structural" geometric distortions that are typically caused by nuisance factors, such as changes in lighting conditions and the relative movement of the image acquisition device, rather than actual changes in the structures of the objects. The CW-SSIM index is also inspired by the impressive pattern recognition capabilities of the human visual system [1]. In the last three decades, scientists have found that neurons in the primary visual cortex can be well modeled using localized multi-scale bandpass oriented filters that decompose natural image signals into multiple visual channels. Interestingly, some psychophysical evidence suggests that the same set of visual channels may also be used in image pattern recognition tasks [7]. Furthermore, phase contains more structural information than magnitude in typical natural images, and rigid translation of image structures leads to a consistent phase shift. The CW-SSIM index is defined as

\tilde{S}(c_x, c_y) = \frac{2\left|\sum_{i=1}^{N} c_{x,i}\, c_{y,i}^*\right| + K}{\sum_{i=1}^{N} |c_{x,i}|^2 + \sum_{i=1}^{N} |c_{y,i}|^2 + K}   (2)

Here c_x and c_y are the sets of local coefficients (in the neighboring spatial locations of the same wavelet subband) extracted from the complex wavelet transformation (e.g., the complex version of the steerable pyramid decomposition [8]) of the two images being compared, c^* denotes the complex conjugate of c, and K is a small positive constant. The purpose of K is mainly to improve the robustness of the CW-SSIM measure when the local signal-to-noise ratios are low.
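A minimal sketch of Eq. (2), taking the complex coefficient vectors as given (computing them via a steerable pyramid is outside this snippet; the value of K is illustrative). It also demonstrates the phase-shift insensitivity argued above: multiplying every coefficient by a common phase factor leaves the index unchanged.

```python
import numpy as np

def cw_ssim_index(cx, cy, K=0.01):
    """CW-SSIM between two vectors of complex wavelet coefficients, Eq. (2)."""
    cx = np.asarray(cx, dtype=complex)
    cy = np.asarray(cy, dtype=complex)
    num = 2 * np.abs(np.sum(cx * np.conj(cy))) + K
    den = np.sum(np.abs(cx) ** 2) + np.sum(np.abs(cy) ** 2) + K
    return num / den

# A consistent phase shift does not change the structural content:
c = np.array([1 + 2j, -0.5 + 1j, 3 - 1j])
same = cw_ssim_index(c, c)                       # identical coefficients
rotated = cw_ssim_index(c, c * np.exp(0.3j))     # global phase shift
```

Both `same` and `rotated` evaluate to 1: the magnitude in the numerator absorbs the common phase factor, which is precisely the mechanism that makes CW-SSIM robust to small translations.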
We consider CW-SSIM a useful measure of image structural similarity based on the beliefs that 1) the structural information of local image features is mainly contained in the relative phase patterns of wavelet coefficients, and 2) a constant phase shift of all coefficients does not change the structure of the local image feature.

K-Nearest Neighbors Algorithm (k-NN)

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbors algorithm is amongst the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its nearest neighbor.

The Classification Algorithm for the MNIST Database

Here we introduce a series of simple kernel-based classification algorithms for the MNIST database, which use the CW-SSIM index to define the distance between different images. A detailed and systematic description of kernel methods for machine learning is given in Schölkopf and Smola [9]. As stated before, employing only a basic statistical model, i.e. k-NN, we use only the most similar images to classify test examples. We select 10 test images randomly, one for each digit, and show their CW-SSIM scores with all training images in Figure 1. It can be observed that the value of CW-SSIM is typically higher when it measures the similarity between two images belonging to the same digit, which supports our strategy of classifying by the most similar images.
k = 1 Case

When k = 1, this is the most direct implementation: we classify every test image using the most similar training image, i.e., we choose the training image with the highest CW-SSIM score to assign the label. Given that there are 60,000 examples in the training set, this method is reasonable since the training set is large enough to provide the most similar image to every test example.

k > 1 Case

When k > 1, an image is classified by a majority vote of its neighbors. For the unweighted case, we take the most frequent digit among the k most similar images as the predicted label. For the weighted case, we use the corresponding CW-SSIM scores as the weights of these k similar images; the digit with the highest sum of weights (CW-SSIM scores) is taken as the predicted label.

Figure 1. Boxplots of CW-SSIM scores (one test sample is selected for each digit)

Result

Experiment on Simulated Data

The experiment is first carried out on simulated data. A training/testing digit image database of 2,000 images was created by shifting, scaling, rotating, and blurring ten hand-written template digit images. Figure 2 shows a random subset of examples in our image database.

Figure 2. Random samples of simulated hand-written digit images

Figure 3. Training error of CW-SSIM on simulated data using 2,000 training images as a function of the number of neighbors k

We repeated 5 trials of tests where 1,600 images randomly selected from the 2,000 images were used for training and the remaining 400 images for testing. The test results are shown in Figure 3, including the unweighted and weighted cases. The training error rate, as a function of the number of neighbors, is computed as the average percentage of misclassified images for each trial over 2,000. When k ≤ 20, the training error rates for the unweighted and the weighted cases are all zero. When k > 20, the training error rates vary from 0 to 0.6%.
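The two voting rules above can be sketched as follows. This is a hypothetical helper, assuming `similarities[i]` already holds the CW-SSIM score between the test image and training image i:

```python
import numpy as np
from collections import Counter

def knn_predict(similarities, train_labels, k=5, weighted=False):
    """Label one test image from its CW-SSIM scores against the training set."""
    order = np.argsort(similarities)[::-1][:k]   # indices of the k most similar
    if not weighted:
        # unweighted: most frequent digit among the k nearest neighbors
        return Counter(train_labels[i] for i in order).most_common(1)[0][0]
    # weighted: sum the CW-SSIM scores per digit and pick the largest total
    totals = {}
    for i in order:
        totals[train_labels[i]] = totals.get(train_labels[i], 0.0) + similarities[i]
    return max(totals, key=totals.get)
```

With k = 1 both rules reduce to copying the label of the single most similar training image; for larger k they can disagree, e.g. when one very strong match is outvoted by two weaker ones.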
This result shows that we can obtain higher predictive accuracy when k is not too large, which is in accordance with our proposal of using only the most similar images to classify.

Experiment on the MNIST Database

The MNIST database is composed of 60,000 training and 10,000 test examples. Some sample images are shown in Figure 4. In our preliminary test, we extracted a subset from the database for training and testing.

Figure 4. Random samples of hand-written digit images from the MNIST database

Figure 5. Training error rate as a function of the number of neighbors k for CW-SSIM on the MNIST database using 5,000 training images

First, we selected 5,000 training images and 2,000 test images randomly. We repeated 5 trials of training tests where 4,000 images randomly selected from the 5,000 images were used for training and the remaining 1,000 images for testing. The test results are shown in Figure 5, including the unweighted and weighted cases. The training error rate is computed as the average percentage of misclassified images for each trial over 5,000. When k = 1, the training error rate is 3.38%. When k > 1, for the unweighted case, the lowest error rate is 3.10% when using 4 neighbors and the second lowest error rate is 3.12% when using 3 or 6 neighbors. For the weighted case, the lowest error rate reaches 2.98% when using 4 neighbors, and the error rates are above 3.00% for all other choices. For both cases, the training error rate is always below 4.00% when using 2 to 12 neighbors. Roughly speaking, the weighted method performs slightly better than the unweighted one.

Figure 6. Test error rate as a function of the number of neighbors k for CW-SSIM on the MNIST database using 5,000 training images and 2,000 test images

The results in Figure 6 used all 5,000 images for training and the separate set of 2,000 images for testing.
The test error rate is calculated as the average percentage of misclassified images over 2,000. When k = 1, the test error rate is 3.10%. When k > 1, for the unweighted case, the lowest error rate is 2.50% when using 5 neighbors and the error rate is always below 3.00% when using 3 to 13 neighbors. For the weighted case, the lowest error rate reaches 2.35% when using 6 neighbors and the error rate is also below 3.00% when using 3 to 11 neighbors. Similar to the training process described above, the weighted method performs slightly better than the unweighted one.

Figure 7. Training error rate as a function of the number of neighbors k for CW-SSIM on the MNIST database using 10,000 training images

In our second test, we selected 10,000 training images and 5,000 test images randomly. Similarly, we repeated 5 trials of tests where 8,000 images randomly selected from the 10,000 images were used for training and the remaining 2,000 images from the same set were employed for testing. The test results are shown in Figure 7, including both cases. The results in Figure 7 used all 10,000 images for training and a separate set of 2,000 images for testing. When k = 1, the test error rate is 3.35%. When k > 1, for the unweighted case, the lowest error rate is 3.10% when using 4 neighbors and the second lowest rate is 3.11% when using 3 neighbors. For the weighted case, the lowest error rate reaches 2.93% when using 4 neighbors and the second lowest rate is 3.02% when using 3 neighbors. The weighted method is slightly better than the unweighted one.

The results in Figure 8 used all 10,000 images for training and a separate set of 5,000 images for testing. The test error rate is calculated as the average percentage of misclassified images over 5,000. When k = 1, the test error rate is 4.94%.
When k > 1, for the unweighted case, the lowest error rate is 4.30% when using 4 neighbors and the second lowest error rate is 4.48% when using 6 neighbors. For the weighted case, the lowest error rate is 4.42% when using 5 or 6 neighbors and the second lowest error rate is 4.50% when using 8 neighbors. There is no significant difference between the weighted and unweighted methods.

Figure 8. Test error rate as a function of the number of neighbors k for CW-SSIM on the MNIST database using 10,000 training and 5,000 test images

From the above two training tests, there are two observations. First, there is no significant difference between the weighted and unweighted methods, although some results showed that the weighted one performs slightly better. Second, it is hard to decide the exact number of neighbors; a choice between 4 and 10 neighbors seems acceptable, which is in accordance with our proposal of using only the most similar images to classify. Finally, we computed the case of 60,000 training images and 10,000 test images. This test is costly in terms of computation. However, compared with many existing machine learning techniques, the speed of our algorithms is reasonable. The results are shown in Figure 9.

Figure 9. Test error rate as a function of the number of neighbors k for CW-SSIM on the MNIST database using 60,000 training and 10,000 test images

When k = 1, the test error rate is 2.18%. When k > 1, for the unweighted case, the test error rate is below 2% when using 3 to 14 neighbors and the lowest error rate is 1.77% when using 5 neighbors. For the weighted case, the result is quite similar to the unweighted one, with the lowest error rate reaching 1.73% when using 5 or 10 neighbors. There is no significant difference between the weighted and unweighted methods. Some test images and their most similar training images are shown in Figure 10.
It can be observed that there exist some "bad" training images that could be quite misleading and severely reduce the prediction accuracy. Thus, how to eliminate these bad images from the training set could be a promising direction for future research.

Figure 10. Some test examples and their most similar images in the training set

Further Discussion

Introducing a Decaying Weighting Function

Since it is hard to decide the exact number of neighbors, we introduce a smooth decaying weighting function. In particular, we tried an exponential function and a Gaussian function given by

w(i) = e^{-(i-1)/\sigma},  \qquad w(i) = e^{-(i-1)^2/\sigma^2}   (3)

where i is the rank according to the CW-SSIM scores in descending order. Based on the weighted method, we use the product of this additional decaying weight w(i) and the CW-SSIM score as the weight of each training image, and the digit with the highest sum of weights is taken as the predicted label. Both decaying functions in Eq. (3) are tested and the results are shown in Figure 11.

Figure 11. Test error of CW-SSIM on the MNIST database as a function of the value of \sigma

When using the exponential decaying function, the lowest error rate is 1.66% with \sigma set to 4. When using the Gaussian decaying function, the lowest error rate reaches 1.62% with \sigma set to 21. From the two curves in Figure 11, the Gaussian decaying function appears to be the more reliable and stable choice, giving consistent results within a range of \sigma values (from 20 to 25). The error rate of about 1.62% within that range is the best result of all the methods we tested.

Comparison

The results of all three algorithms are shown in Table 1, where the method with the decaying weighting function achieves the best prediction accuracy.
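Reading the decay parameter in Eq. (3) as sigma, the rank-decayed vote can be sketched as follows (helper names are ours, not the paper's; the defaults echo the best-performing settings reported):

```python
import numpy as np

def exp_weight(i, sigma=4.0):
    # exponential decay over the similarity rank i (i = 1 is the most similar)
    return np.exp(-(i - 1) / sigma)

def gauss_weight(i, sigma=21.0):
    # Gaussian decay; sigma in roughly 20-25 was the stable range reported
    return np.exp(-((i - 1) ** 2) / sigma ** 2)

def decayed_vote(similarities, train_labels, weight_fn=gauss_weight):
    """Weight each training image by decay(rank) * CW-SSIM score and sum per
    digit (hypothetical helper illustrating the scheme, not the paper's code)."""
    order = np.argsort(similarities)[::-1]   # rank all training images
    totals = {}
    for rank, idx in enumerate(order, start=1):
        w = weight_fn(rank) * similarities[idx]
        totals[train_labels[idx]] = totals.get(train_labels[idx], 0.0) + w
    return max(totals, key=totals.get)
```

Because the weight decays smoothly with rank, far-away neighbors contribute negligibly and no hard cutoff k has to be chosen, which is the point of this variant.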
Table 1. Comparison between MSI, V-MSI and Exp-MSI

Method     | Best for k = 1 | Best for k > 1 | Best with decaying weighting function
Error rate | 2.18%          | 1.73%          | 1.62%

Conclusion

We have proposed a series of simple CW-SSIM kernel-based classification algorithms, which appear to be effective and reliable tools when tested on the MNIST database of handwritten digits. An interesting feature of our approach is that no feature extraction or dimension reduction process is involved. We obtain competitive results with only a fairly simple model. This result can be understood from two aspects. First, the CW-SSIM index provides a powerful similarity measure between two misaligned images. Second, the training set is large enough to almost always provide good matching images for any test sample.

References

1. Z. Wang, A. Bovik, H. Sheikh and E. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, Apr. 2004.
2. Z. Wang and E. P. Simoncelli, "Translation insensitive image similarity in complex wavelet domain," IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. II, pp. 573-576, Philadelphia, PA, March 2005.
3. M. P. Sampat, Z. Wang, S. Gupta, A. C. Bovik and M. K. Markey, "Complex wavelet structural similarity: A new image similarity index," IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2385-2401, Nov. 2009.
4. Z. Wang and A. C. Bovik, "Mean squared error: love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, Jan. 2009.
5. S. Gupta, M. P. Sampat, Z. Wang, M. K. Markey and A. C. Bovik, "Facial range image matching using the complex wavelet structural similarity metric," IEEE Workshop on Applications of Computer Vision, Austin, TX, Feb. 21-22, 2007.
6. L. Zhang, Z. Guo, Z. Wang and D.
Zhang, "Palmprint verification using complex wavelet transform," IEEE International Conference on Image Processing, San Antonio, TX, Sept. 16-19, 2007.
7. J. A. Solomon and D. G. Pelli, "The visual filter mediating letter identification," Nature, vol. 369, pp. 395-397, 1994.
8. J. Portilla and E. P. Simoncelli, "A parametric texture model based on joint statistics of complex wavelet coefficients," International Journal of Computer Vision, vol. 40, pp. 49-71, 2000.
9. B. Schölkopf and A. J. Smola, "Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond," The MIT Press, 2002.