Industrial and Medical Anomaly Detection Through Cycle-Consistent Adversarial Networks

In this study, a new Anomaly Detection (AD) approach for industrial and medical images is proposed. This method leverages the theoretical strengths of unsupervised learning and the data availability of both normal and abnormal classes. Indeed, the AD…

Authors: Arnaud Bougaham, Valentin Delchevalerie, Mohammed El Adoui

Arnaud Bougaham (a,1,∗), Valentin Delchevalerie (a,b,1), Mohammed El Adoui (a), Benoît Frénay (a)

(a) Faculty of Computer Science, NaDI institute, Rue Grandgagnage 21, Namur, 5000, Belgium
(b) naXys institute

(∗) Corresponding author. Email address: arnaud.bougaham@unamur.be (Arnaud Bougaham)
(1) These authors contributed equally to this work.

Abstract

In this study, a new Anomaly Detection (AD) approach for industrial and medical images is proposed. This method leverages the theoretical strengths of unsupervised learning and the data availability of both normal and abnormal classes. Indeed, AD is often formulated as an unsupervised task, involving only normal images during training. These normal images are meant to be reconstructed, through an autoencoder architecture for instance. However, the information contained in abnormal data, when available, is also valuable for this reconstruction. The model would be able to identify its weaknesses by better learning how to transform an abnormal (respectively normal) image into a normal (respectively abnormal) one, helping the entire model to learn better than with a single normal-to-normal reconstruction. To address this challenge, the proposed method uses Cycle-Generative Adversarial Networks (Cycle-GAN) for (ab)normal-to-normal translation. After an input image has been reconstructed by the normal generator, an anomaly score quantifies the differences between the input and its reconstruction. Based on a threshold set to satisfy a business quality constraint, the input image is then flagged as normal or not. The proposed method is evaluated on industrial and medical datasets. The results demonstrate accurate performance under a zero false negative constraint compared to state-of-the-art methods. The code is available at https://github.com/ValDelch/CycleGANS-AnomalyDetection.

Keywords: Cycle-GAN, Industry 4.0, Industrial Images, Medical Images, Anomaly Detection, Zero False Negative

1. Introduction

Image anomaly detection involves identifying anomalies within visual data, playing a crucial role in highlighting unexpected patterns in images. This work proposes a new approach based on a Generative Adversarial Network (GAN) architecture for the task of Anomaly Detection (AD), which aims to combine the advantages of both unsupervised learning and the data availability of the normal and abnormal classes. Indeed, AD is often formulated as an unsupervised task due to the frequent high imbalance between normal and abnormal data, and the need for generalization across a wide range of anomalies. A common practice is to use an autoencoder architecture to encode and decode normal images. Only the normal class is taken into account, and the reconstruction from the abnormal to the normal class is not included in this training process. Yet, this is precisely the task we expect from a reconstruction-based AD method during the inference step.

The proposed method seeks to overcome this limitation by learning how to transform an abnormal image into a normal one by exploiting samples from both classes. The objective is to generate a reconstructed image where any abnormal pixel is replaced by a normal one in a visually coherent manner. During the training step, both a "normal" and an "abnormal" generator are tied together in an adversarial framework, using Cycle-Generative Adversarial Networks (Cycle-GAN), proposed by Zhu et al. [32]. Also, reconstructing the abnormal data during the training step yields a better normal generator than the classical methods using only the normal class. Even if the abnormal datasets are small, as is usually the case in the AD context, the normal generator performs better because its performance is also constrained by the abnormal generator, resulting in a good reconstruction.
Preprint submitted to Neurocomputing, January 24, 2024

Figure 1: Example generated from a Cycle-GAN (see Section 5 for training details) that learns mappings between aerial photos $X$ and Google maps $Y$ (dataset from [1]). The initial image $x \in X$ can be mapped to $\tilde{y} \in Y$ thanks to a first generator $G$. The second one, $F$, can then go back from $\tilde{y} \in Y$ to $\tilde{x} \in X$. A cycle-consistent constraint enforces $\tilde{x}$ to be close to $x$.

To the best of our knowledge, this is the first time that Cycle-GAN has been studied for this purpose. We still consider this as an unsupervised learning task because the abnormal data used during training is not necessarily representative of all anomalies that could occur. Abnormal data are only given to help during the training phase by giving more feedback to the generators. Therefore, the generalization is guaranteed, as is the case in a classical GAN context, except that the normal reconstruction is less noisy.

Cycle-GAN is a well-known architecture proposed a few years ago. It constitutes an elegant way to learn conditional mappings between two different domains $X$ and $Y$ (for image-to-image translation) by applying a cycle-consistent constraint on the transformations. The popularity of Cycle-GAN lies in the fact that it only needs a dataset of unpaired images to learn the mappings. In other words, it does not need a one-to-one correspondence between data from $X$ and $Y$, but only two independent sets of data $\{x_i \in X\}$ and $\{y_i \in Y\}$. As an example, one can consider two unpaired datasets $\{x_i\}$ and $\{y_i\}$ made of aerial images and Google maps, respectively. A Cycle-GAN can be trained to learn meaningful mappings from $X$ to $Y$ and from $Y$ to $X$. Figure 1 presents an example generated with this Cycle-GAN. Numerous studies have demonstrated the versatility of Cycle-GAN in various image analysis applications.
Nonetheless, Cycle-GAN remains seldom used in practice to solve problems in the industrial and medical domains. This is for example the case of AD, where no prior work directly exploits Cycle-GAN for purposes other than data augmentation. Compared to state-of-the-art methods for AD in images, such as [3], [10] or [24], Cycle-GAN appears more suitable than competing methods that rely on traditional approaches. Furthermore, we show that the formalism behind Cycle-GAN makes it efficient and well-suited for AD. Specifically, it introduces an identity loss for the reconstruction of normal-to-normal (and abnormal-to-abnormal) images. This makes the generators much less noisy than those of traditional GANs, making it possible to better discriminate normal and abnormal images. To illustrate this, we focus our experiments on several industrial and medical AD problems. This is motivated by (i) the abundance of AD problems in these domains, and (ii) the positive societal impact of developing efficient AD algorithms for them.

The main contributions of our work are as follows:

• Use abnormal data in the training process by leveraging a Cycle-GAN architecture for AD, thus considering an identity loss that allows a better discrimination between normal and abnormal images.
• Characterize and discuss the performances of the method for diverse industrial and medical AD problems.
• Conduct an extensive benchmark to compare the proposed approach with state-of-the-art methods.
• Discuss why Cycle-GAN is well-suited for AD in specific image types and explore its potential in industrial and medical domains.

In the following, Section 2 presents the theoretical prerequisites to understand Cycle-GAN. After that, Section 3 presents the previous works, and highlights that most of them only use simple architectures, up to traditional GANs, trained with normal data only. The proposed AD method with Cycle-GAN is described in Section 4.
Section 5 then introduces the considered datasets, the experimental setup and the results. Finally, a discussion is presented in Section 6 before concluding with Section 7.

2. Background on Cycle-GAN

This section introduces the formalism for Cycle-Generative Adversarial Networks (Cycle-GAN); the building blocks and the loss functions are described.

2.1. Building Blocks

Cycle-GAN learns image-to-image mappings from an unpaired dataset composed of two types of images from domains $X$ and $Y$. Cycle-GAN is obtained by tying together two distinct conditional GANs with a cycle-consistent constraint. The first GAN is made of a generator $G : X \cup Y \to Y$, $G(z) = \tilde{y}$, and a discriminator $D_Y(\cdot)$, and the other is made of a generator $F : X \cup Y \to X$, $F(z) = \tilde{x}$, and a discriminator $D_X(\cdot)$. For convenience, let us already consider an AD task where $X$ are abnormal images, while $Y$ are normal ones. On the one hand, the aim of $G$ is to generate from $x \in X \cup Y$ an image such that $D_Y$ cannot distinguish it from real normal images in $Y$. On the other hand, $F$ aims to generate images such that $D_X$ is fooled and cannot distinguish them from real abnormal images in $X$. To achieve this, Cycle-GAN is trained with a combination of different losses that are described in the next section.

2.2. Objective Function

A Cycle-GAN is made of two GANs that are tied together with a cycle-consistent constraint. The loss can be broken down into three parts so that

$$G^*, F^* = \arg\min_{F, G} \max_{D_X, D_Y} \; \mathcal{L}_{adv} + \lambda_{cyc} \mathcal{L}_{cyc} + \lambda_{ide} \mathcal{L}_{ide}, \tag{1}$$

where $\lambda_{cyc}$ and $\lambda_{ide}$ are meta-parameters that weight the different parts of the loss.

The first part of (1) is made of two classical adversarial losses [12]

$$\mathcal{L}_{adv} = \mathcal{L}_{GAN}(G, D_Y) + \mathcal{L}_{GAN}(F, D_X),$$

where

$$\mathcal{L}_{GAN}(G, D) = \mathbb{E}_y\left[\log(D(y))\right] + \mathbb{E}_x\left[\log(1 - D(G(x)))\right].$$

On the one side, by enforcing $G$ (resp. $F$) to minimize $\mathcal{L}_{adv}$, the generator will try to generate images that look similar to images from $Y$ (resp. $X$). On the other side, by enforcing $D_Y$ (resp. $D_X$) to maximize $\mathcal{L}_{adv}$, the discriminator will try to distinguish between images coming from the generator $G$ (resp. $F$) and real images in $Y$ (resp. $X$).

The second part of (1) is motivated by the fact that the reconstructed images $F(G(x))$ and $G(F(y))$ should be close to $x$ and $y$, respectively. In other words, the pair of GANs should be cycle-consistent. This is achieved by the cycle-consistent loss

$$\mathcal{L}_{cyc} = \mathbb{E}_x\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_y\left[\lVert G(F(y)) - y \rVert_1\right],$$

where an L1 norm is used in the original work on Cycle-GAN [32].

In addition to $\mathcal{L}_{adv}$ and $\mathcal{L}_{cyc}$, an identity loss is added to constrain the generators to leave the images unmodified if they are already in the desired output domain, defined as

$$\mathcal{L}_{ide} = \mathbb{E}_x\left[\lVert F(x) - x \rVert_1\right] + \mathbb{E}_y\left[\lVert G(y) - y \rVert_1\right],$$

so as to enforce $F(x) = x$ and $G(y) = y$. In other words, $F$ should not add anomalies if the input image is already abnormal, and $G$ should not make any modification if it is already normal. Although the identity loss is present in the implementation of Cycle-GAN from the original paper, it is not discussed and seldom used in practice.

However, in the context of AD, the use of the identity loss is particularly relevant. Indeed, $G$ is expected to erase any abnormal pixel from the image. Nonetheless, if the image does not contain any of them, it should learn to leave it unmodified. This important property is enforced by the identity loss.

3. Related Works

AD has long been an area of great concern in a wide range of fields such as biomedicine [27], industry [6] and security [19, 2]. Furthermore, a significant number of works have been published to characterize the AD approaches in the literature.
The scope of this section is focused on previous works based mainly on GANs and Cycle-GAN, applied to the industrial and medical domains. GANs are used for many image-related tasks, specifically AD [28, 3, 30, 7], segmentation, data augmentation, etc. However, in the AD context, Cycle-GAN has mostly been used for data augmentation only [22, 17, 11]. Since our research is focused on AD in medical and industrial applications, in this section we review the most relevant research applied to these two fields. An in-depth analysis of the state-of-the-art methods associated with our research issue shows that recent AD methods mainly use plain GANs. Yet, Cycle-GAN can be highly useful for AD thanks to the combination of unsupervised learning and the data availability of the normal and abnormal classes.

Regarding the industrial studies, Bougaham et al. [6] propose to use intermediate patches (i.e., parts of an image) for the inference step after a Wasserstein GAN training process. The objective is to produce an efficient approach for AD on real industrial images of electronic Printed Circuit Board Assembly (PCBA). The technique can be used to assist current industrial image processing algorithms and to avoid tedious manual processing. Nevertheless, due to the wide variety of possible anomalies in a PCBA and the high complexity of autoencoder architectures, a real-world implementation remains a challenging task, specifically for small anomalies, even if the method evolved to overcome some limitations in the work of Bougaham et al. [7]. Akcay et al. [3] use an autoencoder with skip connections, culminating in a GAN discriminator, which proves to be an effective means of training the model for the normal class in the context of AD. Unlike GAN techniques, Roth et al. [24] recently suggested a patch-feature encoding AD method applied to industrial images, modifying the prediction threshold to ensure a 100% recall rate with no false negatives. Also, Defard et al. [10] use a pre-trained CNN for patch embedding and employ multivariate Gaussian distributions to obtain a probabilistic representation of the normal class. Rippel et al. [23] and Zhang et al. [31], for their part, suggest using Cycle-GAN to perform data augmentation by generating synthetic images for industrial inspection.

Regarding the medical field, Schlegl et al. [28] proposed a GAN-based unsupervised AD framework (f-AnoGAN) that can detect unseen anomalies in medical subjects after being trained on healthy tomography images. Among the previous studies in medical imaging, the works of Hammami et al. [13] and Sandfort et al. [26] can be cited. These authors use Cycle-GAN to perform data augmentation for Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scan images, respectively. Again, they show that using Cycle-GAN for data augmentation leads to better segmentation performances afterwards.

Despite being better than the previous approaches, the above AD approaches use unsupervised deep learning techniques, such as autoencoders, pre-trained CNNs and GANs, to characterize the normal class, without using the insights given by the anomalies. In contrast, in this work, the anomaly images are leveraged directly in the training phase, as prior knowledge that strengthens the model at recognizing anomalies. Furthermore, we assess our method on both industrial and medical images while enforcing zero false negatives (ZFN), which is the most useful setting in these fields, where missed detections have a large impact on customers or patients.

In short, a thorough analysis of the most important studies in the literature shows that, in the industrial and medical domains, Cycle-GAN has mainly been used for data augmentation.
This paper demonstrates the suitability of Cycle-GAN for AD, in particular for industrial and medical images, which has never been covered in the literature.

4. Methods

This section introduces the developed approach and shows its relevance for AD with Cycle-GAN. The training and inference steps are illustrated in Figure 2.

The basic idea behind the use of Cycle-GAN for AD is to exploit the conditional mapping learned by one of the two generators: the one that goes from abnormal to normal images. Indeed, by forward-propagating an abnormal image through this generator, one expects to obtain a new image where the anomaly is erased. Conversely, thanks to the identity loss, a normal image that is forward-propagated through the generator is expected to remain unchanged. Therefore, by comparing the output of the generator with its input, anomalies in the input images can be located. The other generator (normal-to-abnormal) is not used for AD. Like the two discriminators, it is only useful to jointly train the first generator, and plays no role in the AD inference step.

To perform AD, the normal and abnormal test images are given to the learned abnormal-to-normal generator. Then, an anomaly score is computed to measure the distance between the original test image and the reconstructed one. In this paper, two metrics are considered: a per-pixel sum of squared differences (SSE), and a Fréchet Inception Distance (FID) [15]. The FID anomaly score is more elaborate and focuses on perceptual differences thanks to the use of a pre-trained Inception V3 network [29]. Two different thresholds are considered for the anomaly detector. The first threshold is set by minimizing the number of classification errors, which yields an anomaly detector with maximum accuracy (ACC). The second one is set so that all abnormal images are detected (only false alarms can be raised, but no anomaly can be missed).
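The scoring and thresholding steps above can be illustrated with a minimal sketch. It assumes images flattened into lists of pixel intensities, that a higher score means "more anomalous", and that an image is flagged as abnormal when its score is at or above the threshold. The function names are hypothetical, the FID score is left out, and this is not the authors' implementation.

```python
def sse_score(original, reconstructed):
    """Per-pixel sum of squared differences between an image and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed))


def accuracy(normal_scores, abnormal_scores, threshold):
    """Accuracy when scores >= threshold are flagged as abnormal (assumed convention)."""
    true_negatives = sum(s < threshold for s in normal_scores)
    true_positives = sum(s >= threshold for s in abnormal_scores)
    return (true_negatives + true_positives) / (len(normal_scores) + len(abnormal_scores))


def zfn_threshold(abnormal_scores):
    """Largest threshold that still flags every abnormal image (zero false negatives)."""
    return min(abnormal_scores)


def acc_threshold(normal_scores, abnormal_scores):
    """Threshold, searched among the observed scores, that minimizes classification errors."""
    candidates = sorted(set(normal_scores) | set(abnormal_scores))
    return max(candidates, key=lambda t: accuracy(normal_scores, abnormal_scores, t))


# Toy anomaly scores (illustrative values, not taken from the paper).
normal = [0.1, 0.2, 0.3, 0.4]
abnormal = [0.15, 0.8, 0.9, 1.0]

t_zfn = zfn_threshold(abnormal)
t_acc = acc_threshold(normal, abnormal)
zfn_accuracy = accuracy(normal, abnormal, t_zfn)
max_accuracy = accuracy(normal, abnormal, t_acc)
```

In this toy example, a single low-scoring abnormal image pulls the ZFN threshold down to 0.15, so three normal images become false alarms and the accuracy is 0.625. The ACC threshold of 0.8 reaches an accuracy of 0.875, at the cost of one missed anomaly.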
This second setting yields an anomaly detector with zero false negatives (ZFN), which is the most useful in critical business applications of the industrial or medical fields, where false negatives have large consequences for customers or patients. In summary, four anomaly detectors are built, i.e., one for each pair of metric and threshold. In addition, the area under the receiver operating characteristic curve (AUCROC) [8] is also considered, in order to quantify the performance at different thresholds. Notice that the thresholds are set on the test sets, and make it possible to assess how well the two score distributions (SSE or FID, for normal and abnormal data) are separated. The use of an additional validation set would be preferable, but it would have been too costly in terms of abnormal data for several datasets (due to the scarcity of abnormal data). Therefore, the accuracy values are an overestimate of the classification performances, and should be seen as a metric to quantify how much the method highlights abnormal images compared to normal images on the test sets. Our goal is indeed to measure the discriminative power of Cycle-GAN for AD, so as to prospectively validate the practical interest of our idea in the industrial and medical domains.

Figure 2: (Color online) Architecture for the training (left side) and the inference (right side) steps (inspired from [32]). During the training step, the first generator $G$ tries to map abnormal to normal images by fooling the discriminator $D_Y$, which should not detect fake images. $F$ and $D_X$ follow the same idea but for normal images as input. During the inference step, only $G$ is used, even if the input can either be normal or abnormal.

5. Experiments

This section presents the experiments carried out to evaluate the proposed AD method. First, the datasets and the data preprocessing steps are introduced. Next, the model architecture is detailed.
Finally, qualitative and quantitative results are presented, and then discussed in Section 6.

5.1. Datasets

Seven datasets are used, covering both the industrial and medical domains, where several types of anomalies may occur. To assess the strengths and weaknesses of our method, we defined four categories of anomalies: small or large object-shaped anomalies, and small or large texture-shaped anomalies. Indeed, the anomaly can either arise on a specific object characterized by abrupt changes in pixel intensity (a picture of a screw with a broken head, for instance), or where the pixel intensity changes are more progressive and more homogeneous (an abnormal color in a wood picture, for instance).

To cover the industrial side, the public MVTEC-AD dataset [5] is used. It consists of high-resolution industrial images from 15 different categories of object-shaped and texture-shaped products, with and without anomalies. In this work, 4 datasets were selected from MVTEC-AD to cover the different natures of images: Hazelnut (large object-shaped images), Screw (small object-shaped images), Tile (large texture-shaped images) and Wood (small texture-shaped images), which are made of 501, 480, 347 and 326 images, respectively. All of these datasets are clearly imbalanced, with a minority of abnormal images.

To investigate the medical side, three datasets of object-shaped and texture-shaped images are used, coming from healthy and unhealthy subjects. The first dataset is made of 253 Brain MRI images (large object-shaped images) [9]. Second, Breast Ultrasound (large texture-shaped images) [4] is made of 789 images. One should mention that, for this dataset, many images were manually labeled by experts by highlighting the tumor on the images. Therefore, to avoid any bias during training, we removed all those annotated images from the dataset, resulting in a dataset of 654 images.
Finally, the Retina OCT dataset (small and large texture-shaped anomalies) [18] contains 83,600 images of Optical Coherence Tomography. All of these medical datasets are imbalanced, with a minority of normal images. Table 1 provides an overview of the number of normal and abnormal images for each dataset.

Table 1: Sizes of the training and test sets for each dataset, regarding the number of normal and abnormal data. A particular split that guarantees balanced test sets is always chosen, even if the initial datasets are imbalanced.

                      Training set             Test set
Dataset               # Normal   # Abnormal    # Normal   # Abnormal
Hazelnut                   396           35          35           35
Screw                      301           59          60           60
Tile                       222           43          41           41
Wood                       236           30          30           30
Brain MRI                   49          106          49           49
Breast Ultrasound           67          455          66           66
Retina OCT              13,172       44,084      13,172       13,172

Some of the aforementioned datasets come with different types of anomalies. In this case, a single abnormal class is created by aggregating all the abnormal classes together. Also, the normal and abnormal classes are sometimes imbalanced. Therefore, in order to avoid the need for an imbalance-aware metric to evaluate the results, a specific split of the dataset is applied to ensure that the test sets are fully balanced. This is obtained by keeping half of the minority class images for the test set, as well as the same number of randomly picked images from the majority class. All the remaining images are left to the training set. Given that AD may sometimes be a very imbalanced predictive problem, one of the two classes is generally overpopulated in the training sets. However, the test sets are perfectly balanced, which allows us to assess the performances with a simple accuracy metric. Furthermore, even if using imbalanced training sets may hurt the performances of most supervised machine learning algorithms, the training process of Cycle-GAN is less sensitive to this.
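The balanced split described above can be sketched as follows. This is a minimal illustration, assuming that "half of the minority class" is rounded down and that images are identified by simple string ids; the function name and the seed are hypothetical, and this is not the authors' code.

```python
import random


def balanced_split(normal, abnormal, seed=0):
    """Split two lists of images into (train, test) dicts with a balanced test set."""
    rng = random.Random(seed)
    normal, abnormal = list(normal), list(abnormal)
    rng.shuffle(normal)
    rng.shuffle(abnormal)
    # Half of the minority class goes to the test set (rounded down: an assumption),
    # together with the same number of randomly picked majority-class images.
    n_test = min(len(normal), len(abnormal)) // 2
    test = {"normal": normal[:n_test], "abnormal": abnormal[:n_test]}
    # All the remaining images are left to the training set.
    train = {"normal": normal[n_test:], "abnormal": abnormal[n_test:]}
    return train, test


# Hazelnut class counts derived from Table 1: 396 + 35 normal, 35 + 35 abnormal.
normal_ids = [f"normal_{i}" for i in range(431)]
abnormal_ids = [f"abnormal_{i}" for i in range(70)]

train, test = balanced_split(normal_ids, abnormal_ids)
sizes = (len(train["normal"]), len(train["abnormal"]),
         len(test["normal"]), len(test["abnormal"]))
```

With the Hazelnut counts, `sizes` matches the corresponding row of Table 1 (396, 35, 35, 35). The training set stays imbalanced while the test set contains as many normal as abnormal images.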
Indeed, the task of Cycle-GAN differs from a simple label prediction, and each image gives feedback to the two generators, either directly or indirectly.

The same data preprocessing steps are performed for all the datasets. Images are resized to a resolution of 256 × 256 pixels by using a bicubic interpolation method. Data augmentation is also performed so that objects and textures are rotated and flipped along both axes (except for Retina OCT, where only flipping along the horizontal axis is pertinent).

5.2. Network Architecture and Training Procedure

For convenience and practical purposes, the architectures used in this work as well as the training procedures are similar for the different applications, and follow the experimental setup presented in the initial paper on Cycle-GAN [32]. The generators are formed by three convolution layers, several residual blocks [14], two fractionally-strided convolution layers and one final convolution layer. We use 9 residual blocks for images resized to a 256 × 256 resolution, and instance normalization. For the discriminators, we use 70 × 70 PatchGANs [16, 20, 21].

All the models are trained through 200 iterations of the Adam optimizer with a learning rate of $2 \times 10^{-4}$. A linear learning-rate decay is introduced from the middle of the training. The meta-parameters $\lambda_{cyc}$ and $\lambda_{ide}$ are fixed to 10 and 5, respectively. To give an idea of the computation time, training the Cycle-GAN on 500 images (4500 images with the data augmentation) with a 256 × 256 resolution for 200 iterations takes roughly 12 hours on a single Nvidia A100 GPU. All experiments have been performed with CUDA 11.3 and PyTorch 1.10.2.

5.3. Experimental Results

This subsection showcases the qualitative and quantitative assessments.

5.3.1. Qualitative Assessments

The quality of the reconstruction, as well as the highlighting of anomalies, are presented in Figure 3.
It shows the original image, the normal (generated) version and their squared pixel difference image, for the selected industrial and medical datasets. Notice that the difference images (3rd column) of the abnormal data highlight the anomaly, unlike those of the normal data. These images have been specifically chosen to illustrate different typical cases. The following quantitative assessment evaluates the global performances on all the test sets, which are in agreement with the qualitative examples presented in Figure 3.

Figure 3: (Color online) Industrial and medical image examples. For each dataset, the left green-framed block presents normal images and the right red-framed block shows abnormal images, with the original image (1st column), the generated normal version (2nd column), and their squared pixel-wise difference image (3rd column).

5.3.2. Quantitative Assessments

Table 2 shows the quantitative results of our method (CycleGAN-AD-256). For comparison purposes, the same architecture is assessed with a lower input image resolution of 64 × 64, and only 6 residual blocks for the generator (CycleGAN-AD-64). Three different methods are also compared to ours, namely Ganomaly [3], Padim [10] and PatchCore [24].

For CycleGAN-AD-256, CycleGAN-AD-64 and Ganomaly, an anomaly score (SSE or FID) is computed to measure the distance between the original test image and the reconstructed one. As for Padim and PatchCore, only the SSE of the embeddings is considered. Notice that, since Ganomaly is a competing GAN-based method, the anomaly score used for it here differs from the one developed by its authors (who consider the difference between the embeddings of the original and reconstructed images). This is justified by the importance of the reconstructed image quality, which has to bring the pixel-wise difference information needed by the business experts. This is however not possible for the Padim and PatchCore methods, which do not generate any reconstructed image. For each method and dataset, the accuracy (for the FID, when applicable, and SSE anomaly scores) under the zero-false-negative constraint (ZFN columns) and, in a more standard way, without this constraint (ACC columns), as well as the AUCROC (AUC columns), are reported in Table 2. All the metrics are the mean ± the standard deviation (both in percent) over 5 runs for which random train and test splits were generated. The splits of each dataset are the same across the methods. The last line gives the mean of each metric over all the datasets.

Table 2: Quantitative performance metrics for all the models on all the industrial (first four) and medical (last three) datasets. For each model, the accuracy with the zero false negative constraint (ZFN), the maximum accuracy (ACC) and the AUCROC (AUC) metrics are presented, considering FID (when applicable) and SSE metrics. All the metrics are the mean ± the standard deviation (both in percent) over 5 runs for which random train and test splits were generated. The mean performance is computed for each model in the last line w.r.t. their most profitable setting (SSE or FID) when available. A "/" indicates that the setting is not applicable.

ZFN (accuracy under the zero false negative constraint)

Dataset            Score  CycleGAN-AD-256  CycleGAN-AD-64   Ganomaly [3]     Padim [10]       PatchCore [25]
                          (ours)           (ours)
Hazelnut           FID    98.00 ± 2.14     74.86 ± 12.89    51.14 ± 1.67     /                /
Hazelnut           SSE    96.29 ± 2.94     95.43 ± 2.29     80.29 ± 10.32    53.43 ± 1.71     54.29 ± 0.00
Screw              FID    52.03 ± 2.18     50.51 ± 0.68     51.19 ± 1.15     /                /
Screw              SSE    52.37 ± 3.53     52.03 ± 3.24     62.71 ± 10.39    55.59 ± 4.57     59.32 ± 0.00
Tile               FID    91.43 ± 7.88     58.81 ± 6.37     57.62 ± 3.07     /                /
Tile               SSE    78.10 ± 7.85     52.86 ± 2.21     50.48 ± 0.58     61.90 ± 0.00     61.90 ± 0.00
Wood               FID    91.33 ± 6.78     71.00 ± 14.85    53.33 ± 3.80     /                /
Wood               SSE    97.00 ± 3.71     92.33 ± 6.96     61.00 ± 4.29     94.67 ± 10.67    100.00 ± 0.00
Brain MRI          FID    78.57 ± 4.65     73.27 ± 4.05     54.90 ± 2.08     /                /
Brain MRI          SSE    84.49 ± 3.95     84.08 ± 4.81     61.02 ± 4.63     50.41 ± 0.50     50.41 ± 0.50
Breast Ultrasound  FID    83.23 ± 6.53     84.92 ± 3.29     57.69 ± 2.48     /                /
Breast Ultrasound  SSE    85.23 ± 2.98     86.46 ± 3.85     61.23 ± 5.19     50.62 ± 0.31     50.62 ± 0.31
Retina OCT         FID    50.73 ± 0.59     50.19 ± 0.08     50.09 ± 0.07     /                /
Retina OCT         SSE    50.29 ± 0.26     51.02 ± 0.75     50.37 ± 0.40     50.01 ± 0.01     50.01 ± 0.01
MEAN                      79.89            74.31            62.03            59.52            60.94

ACC (maximum accuracy)

Dataset            Score  CycleGAN-AD-256  CycleGAN-AD-64   Ganomaly [3]     Padim [10]       PatchCore [25]
                          (ours)           (ours)
Hazelnut           FID    99.14 ± 0.70     94.57 ± 2.10     66.57 ± 3.33     /                /
Hazelnut           SSE    98.29 ± 1.07     98.29 ± 0.57     87.14 ± 7.17     95.71 ± 0.00     95.71 ± 0.00
Screw              FID    57.63 ± 4.25     57.46 ± 3.36     57.63 ± 1.42     /                /
Screw              SSE    57.97 ± 5.46     57.63 ± 2.89     77.12 ± 5.22     66.27 ± 5.42     90.68 ± 0.00
Tile               FID    98.33 ± 1.21     78.81 ± 1.90     70.48 ± 2.05     /                /
Tile               SSE    89.76 ± 3.42     75.24 ± 3.64     53.57 ± 2.61     88.10 ± 0.00     88.10 ± 0.00
Wood               FID    97.00 ± 1.94     88.67 ± 1.63     56.00 ± 2.71     /                /
Wood               SSE    97.67 ± 2.91     96.33 ± 2.87     71.67 ± 7.67     95.33 ± 9.33     100.00 ± 0.00
Brain MRI          FID    87.76 ± 3.35     79.39 ± 2.99     60.20 ± 3.48     /                /
Brain MRI          SSE    86.94 ± 3.73     88.37 ± 3.63     68.37 ± 1.12     92.45 ± 13.06    98.98 ± 0.00
Breast Ultrasound  FID    91.38 ± 1.23     85.85 ± 3.46     70.15 ± 4.83     /                /
Breast Ultrasound  SSE    89.38 ± 3.31     87.85 ± 3.01     68.62 ± 2.45     96.77 ± 4.92     99.23 ± 0.00
Retina OCT         FID    97.23 ± 0.05     92.10 ± 0.22     69.92 ± 5.23     /                /
Retina OCT         SSE    96.74 ± 0.07     96.45 ± 0.04     79.33 ± 1.34     93.90 ± 3.03     99.95 ± 0.05
MEAN                      89.93            86.25            74.89            89.79            96.09

AUC (AUCROC)

Dataset            Score  CycleGAN-AD-256  CycleGAN-AD-64   Ganomaly [3]     Padim [10]       PatchCore [25]
                          (ours)           (ours)
Hazelnut           FID    99.89 ± 0.12     97.32 ± 0.91     63.31 ± 4.73     /                /
Hazelnut           SSE    99.67 ± 0.25     99.71 ± 0.14     92.16 ± 4.90     92.02 ± 0.29     92.16 ± 0.00
Screw              FID    54.34 ± 8.10     53.08 ± 5.96     54.04 ± 3.14     /                /
Screw              SSE    52.81 ± 6.29     51.31 ± 5.06     80.86 ± 3.49     54.02 ± 11.91    84.83 ± 0.00
Tile               FID    99.30 ± 0.73     83.53 ± 2.61     72.47 ± 1.56     /                /
Tile               SSE    95.40 ± 2.52     78.32 ± 2.67     42.39 ± 5.50     81.86 ± 0.00     81.86 ± 0.00
Wood               FID    99.04 ± 0.79     92.82 ± 2.09     43.62 ± 3.76     /                /
Wood               SSE    98.89 ± 1.37     98.48 ± 1.51     75.09 ± 7.24     95.02 ± 9.96     100.00 ± 0.00
Brain MRI          FID    93.19 ± 2.56     80.85 ± 3.92     58.88 ± 4.83     /                /
Brain MRI          SSE    91.50 ± 2.92     91.96 ± 3.16     69.98 ± 2.78     91.58 ± 12.78    97.98 ± 0.02
Breast Ultrasound  FID    95.45 ± 0.91     89.62 ± 3.52     74.41 ± 5.29     /                /
Breast Ultrasound  SSE    92.53 ± 2.41     91.23 ± 2.31     71.81 ± 2.29     97.27 ± 2.42     98.48 ± 0.01
Retina OCT         FID    98.81 ± 0.08     96.87 ± 0.09     76.25 ± 6.61     /                /
Retina OCT         SSE    98.49 ± 0.10     98.33 ± 0.07     86.86 ± 1.47     98.24 ± 1.51     99.97 ± 0.00
MEAN                      91.43            88.05            78.83            87.14            93.61
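One reading of the "most profitable setting" used for the MEAN line of Table 2 is to keep, for each dataset, the better of the FID and SSE results before averaging over the datasets. The short sketch below checks this assumption on the ZFN values of CycleGAN-AD-256 copied from Table 2; the function and variable names are hypothetical, and the aggregation rule is inferred from the reported numbers rather than stated by the authors.

```python
from statistics import mean

# ZFN accuracies (%) of CycleGAN-AD-256 from Table 2, as (FID, SSE) per dataset.
zfn_cyclegan_256 = {
    "Hazelnut": (98.00, 96.29),
    "Screw": (52.03, 52.37),
    "Tile": (91.43, 78.10),
    "Wood": (91.33, 97.00),
    "Brain MRI": (78.57, 84.49),
    "Breast Ultrasound": (83.23, 85.23),
    "Retina OCT": (50.73, 50.29),
}


def mean_of_best_setting(per_dataset):
    """Average, over datasets, of the better of the FID and SSE values (assumed rule)."""
    return mean(max(fid, sse) for fid, sse in per_dataset.values())


mean_zfn = round(mean_of_best_setting(zfn_cyclegan_256), 2)
```

Under this assumption, `mean_zfn` equals the 79.89 reported in the MEAN line for the ZFN column of CycleGAN-AD-256.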
The anomaly score distributions of the first run for the normal and abnormal test sets of each dataset are shown in Figure 4, including the accuracy calculated for the threshold set with the ZFN constraint, or without it.

6. Discussion

This section discusses the use of Cycle-GAN for AD with industrial and medical images.

6.1. Qualitative Discussion

We observe from the qualitative results presented in Figure 3 that the anomaly reconstruction strongly depends on the nature of the anomaly and of the image itself. Indeed, we can see in the red-framed block (abnormal data) that, for the Hazelnut, Wood and Brain datasets, small cracks, holes or spots are perfectly erased in the reconstructed images, which faithfully highlights the anomalies. These anomalies present a much higher contrast with the normal image than those of other datasets like Screw or Tile, where we still observe the anomalies (small scratches or large cracks) after reconstruction. Unlike the Tile images, where the difference is still well highlighted on the pixel-wise difference image, the Screw difference images struggle to highlight the anomaly. For this specific dataset, this can be explained, in addition to the low contrast of the anomaly, by the size of the screw in the entire image, which is relatively small compared to the objects or textures considered in the other images. The model correctly reconstructs the background and the shadow, which make up the majority of the image surface, and the scratches could be interpreted as a reflection instead of an anomaly on this very small surface. Another explanation could be that the latent space is not small enough to avoid overfitting, preventing the model from efficiently learning the subtle features of the screw object. It contains too much input information, and another architecture could have fixed this issue.
On the other hand, the Tile cracks are much less likely to be interpreted as an original feature, because they are visually very different from the normal features (colorization, continuity of the texture, etc.).

Good reconstruction is also observed for the Breast Ultrasound and Retina OCT datasets, where the object-shaped anomalies have a large abnormal and well-contrasted structure. They are not well erased (specifically for the Breast image) but sufficiently attenuated, which provides enough information in the difference image to detect and localize the anomaly.

We can also observe in the green-framed block (normal data) that the reconstructions (middle images) are more or less identical to the original input (left images), resulting in an almost zero difference image (right images). The model has extracted the features of the normal distributions, and is able to restore normal images without changing the pixel values, thanks to the identity loss. Pixel areas with high discontinuity, as shown in the Screw, Brain MRI or Breast Ultrasound images, do not fully follow this observation, resulting in slight differences in the generated image that disturb the anomaly score and make it difficult to obtain well-separating thresholds in the quantitative step.

From the qualitative results, it is noticeable that the method can reach the anomaly at the pixel level, showing its exact location in the difference images, despite the dataset offering only image-level labels.

6.2. Quantitative Discussion

For the quantitative assessment, we conclude from Table 2 that our method gives an average accuracy of 79.89% under the zero false negative constraint (ZFN column of the CycleGAN-AD-256 method), at least about 17 percentage points better than other methods in this setup. Without this constraint, it reaches the second position behind the PatchCore method, which is also the case for the AUCROC metric.
Notice that even the CycleGAN-AD-64 version is still about 12 percentage points better than the other methods in this setting. We can explain the good results under the ZFN constraint by the method's ability to reconstruct fairly small or subtle anomalies, still giving a high anomaly score for these challenging images. In other words, unlike with other methods, the lowest anomaly score of the abnormal test set is sufficiently distant from the anomaly score population of the normal test set, placing the threshold so that the false positives are well contained. It means that incorporating abnormal images into the training set, through the cycle consistency framework of Cycle-GAN, helps to better reconstruct the anomalies. This is the case compared to a GAN technique (like Ganomaly) or an embedding technique (like Padim or PatchCore). This observation is particularly interesting for critical applications in industrial or medical businesses, where this ZFN setup is much more important than the standard accuracy or the AUCROC metric.

Figure 4: (Color online) Anomaly score distributions of normal (solid-green line and bars) and abnormal (dashed-red line and bars) images for the test datasets, with the threshold value in the ZFN setting (vertical dashed line in grey) or in the ACC setting (vertical dashed line in black).

In detail, as expected from the qualitative assessment, the Hazelnut, Tile, Wood, Brain MRI and Breast Ultrasound datasets explain the good average first position of the method, which performs better than any other method on them. Despite the challenging anomalies that images in the Tile or Breast Ultrasound datasets can contain (anomalies not well erased but still sufficiently attenuated to be highlighted in the difference images), their abnormal images reach a sufficient anomaly score to be discriminated from the normal ones.
The Screw and Retina OCT datasets, for their part, reach a ZFN accuracy of nearly 50%, yielding a tool with almost no detection capability, even if the metrics are satisfactory for Retina OCT when we release this constraint (97.23%). This is typically the sign of the presence of at least one sample where the reconstruction struggles, which pushes the ZFN threshold towards a low anomaly score, yielding many false positives. However, one can notice that these poor results are also seen in the other methods, where Screw gets its best score of 62.71% with Ganomaly, and Retina OCT also reaches a near-random discrimination with almost 50% ZFN accuracy.

The comparison between our method at a resolution of 256 × 256 and at 64 × 64 shows the importance of using higher-resolution images, specifically for the industrial datasets (the higher-resolution images bring better ZFN accuracy). It seems consistent, when we consider small anomalies consisting of a few pixels or with low contrast, that the higher the resolution, the better the reconstruction and all the processing used to find the anomaly score. This statement does not hold for the medical datasets, where the method still performs well at a 64 × 64 input image resolution. In contrast to the industrial images (Tile is the best example, with a drop of about 30%), the well-contrasted anomalies shown in these datasets explain this observation. Another explanation is the presence of larger structures in the considered medical images.

For Hazelnut and Tile in the ZFN setup, we can also conclude that the FID anomaly score improves the accuracy by getting rid of the noisy pixel-by-pixel reconstruction described above. For the Wood, Brain MRI and Breast Ultrasound datasets, it appears that, under the ZFN constraint, the anomaly score based on SSE leads to better results.
This could come from the fact that the Inception V3 model is not pre-trained on many medical-like images or highly homogeneous textured images, leading to poor feature extraction. Depending on the domain, the FID or the SSE anomaly score has to be chosen adequately. However, these scores cannot avoid poor accuracy with datasets like Screw or Retina OCT, where small anomalies are not well captured by the model and the generated images still show the anomalies.

7. Conclusion

In this work, we propose and characterize for the first time an approach using Cycle-Generative Adversarial Networks (Cycle-GAN) for Anomaly Detection (AD) on industrial and medical images. This method reaches 79.89% accuracy under a zero false negative constraint, about 17 percentage points better than other state-of-the-art methods. It exploits the abnormal images at our disposal to refine its representation of normal data, by giving more insight into what is normal or abnormal. Furthermore, thanks to the use of the identity loss, we show that the formalism of Cycle-GAN is naturally well adapted to perform AD. Particular attention has been given to industrial and medical applications, due to the societal impact they may offer, and motivated by the lack of studies of this kind in these areas to date. The proposed method differs from previous work by exploiting both normal and abnormal images to learn mappings that can generate new matched data from one domain to another, under a cycle consistency constraint. The mapping of interest for our AD method is the one that can generate normal images. From this perspective, any differences between the test image and its normal (generated) version can be easily identified. Qualitatively, the pixel squared difference image is used to locate abnormal areas, and then, quantitatively, an anomaly score is created to indicate whether the image contains abnormal areas, based on a pre-selected threshold.
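The localization and scoring steps summarized above can be sketched as follows. This is an illustration only, assuming grayscale images stored as arrays and an SSE score defined as the sum of the squared pixel differences. In the method, the reconstruction comes from the normal generator; here it is replaced by a hand-made array, and all names and values are hypothetical.

```python
import numpy as np


def squared_difference_image(image, reconstruction):
    """Pixel-wise squared difference between an image and its normal reconstruction."""
    image = np.asarray(image, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    return (image - reconstruction) ** 2


def sse_score(image, reconstruction):
    """Anomaly score: sum of squared errors over all pixels (assumed definition)."""
    return float(squared_difference_image(image, reconstruction).sum())


def is_abnormal(image, reconstruction, threshold):
    """Flag the image when its score reaches the pre-selected threshold."""
    return sse_score(image, reconstruction) >= threshold


# Toy 4x4 image with a single defective pixel; the stand-in reconstruction erases it.
image = np.zeros((4, 4))
image[2, 1] = 1.0
reconstruction = np.zeros((4, 4))

difference = squared_difference_image(image, reconstruction)
location = np.unravel_index(np.argmax(difference), difference.shape)
print("most abnormal pixel:", location)
print("SSE score:", sse_score(image, reconstruction))
print("flagged:", is_abnormal(image, reconstruction, threshold=0.5))
```

The difference image gives the pixel-level localization, while the scalar score and the threshold give the image-level decision, so only image-level labels are needed to set the threshold.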
Ultimately, the method identifies anomalies at the pixel level while the labels are initially at the image level, i.e., without requiring tedious annotation at the pixel level. The achieved results demonstrate that, independent of the application, images presenting anomalies with a sufficient contrast compared to the pixels composing the object or texture considered tend to benefit from a better domain change mapping than those with a low contrast. For low-contrast anomalies, an exception is observed for images of objects with coarse defects, where the localization of anomalies always meets expectations, even with a difficult reconstruction. Issues still remain for objects filling a small part of the entire image and with small defects. We argue in this work that when both normal and abnormal data are available for training, the use of Cycle-GAN architectures should be considered as an approach by the community, mainly when the anomalies are known to be on textures or coarse objects. New applications may also be explored in future work, such as object segmentation or object counting for industrial and medical fields using the same type of cycle-consistent models. This work is a first step and a proof-of-concept for Cycle-GAN in AD for industrial and medical domains.

Acknowledgment

V.D. benefits from the support of the Walloon region with a Ph.D. grant from FRIA (F.R.S.-FNRS). M.E. benefits from the support of the Belgian Walloon region for funding the SMARTSENS project, which is part of the Win2WAL program (agreement 2110108). The present research benefited from computational resources made available on Lucia, the Tier-1 supercomputer of the Walloon Region, infrastructure funded by the Walloon Region under the grant agreement n°1910247. The authors thank Charline Dardenne, Jérôme Fink, Géraldin Nanfack and Pierre Poitier for their insightful comments and discussions on this paper.
References

[1] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. URL: https://www.tensorflow.org/. Software available from tensorflow.org.
[2] Abdallah, A., Maarof, M.A., Zainal, A., 2016. Fraud detection system: A survey. Journal of Network and Computer Applications 68, 90–113.
[3] Akcay, S., Atapour-Abarghouei, A., Breckon, T.P., 2019. GANomaly: Semi-supervised Anomaly Detection via Adversarial Training, in: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (Eds.), Computer Vision – ACCV 2018, Springer International Publishing. pp. 622–637. doi: 10.1007/978-3-030-20893-6_39.
[4] Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A., 2020. Dataset of breast ultrasound images. Data in Brief 28, 104863.
[5] Bergmann, P., Fauser, M., Sattlegger, D., Steger, C., 2019. Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection, in: CVPR, IEEE. pp. 9592–9600.
[6] Bougaham, A., Bibal, A., Linden, I., Frenay, B., 2021. Ganodip-gan anomaly detection through intermediate patches: a pcba manufacturing case, in: LIDTA, PMLR. pp. 104–117.
[7] Bougaham, A., El Adoui, M., Linden, I., Frénay, B., 2023. Composite score for anomaly detection in imbalanced real-world industrial dataset. Machine Learning, 1–26.
[8] Bradley, A.P., 1997. The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159.
[9] Chakrabarty, N. Brain MRI images for brain tumor detection, version 1. Retrieved May 9, 2021 from https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection.
[10] Defard, T., Setkov, A., Loesch, A., Audigier, R., 2021. PaDiM: A Patch Distribution Modeling Framework for Anomaly Detection and Localization, in: Del Bimbo, A., Cucchiara, R., Sclaroff, S., Farinella, G.M., Mei, T., Bertini, M., Escalante, H.J., Vezzani, R. (Eds.), Pattern Recognition. ICPR International Workshops and Challenges, Springer International Publishing. pp. 475–489. doi: 10.1007/978-3-030-68799-1_35.
[11] Dirvanauskas, D., Maskeliūnas, R., Raudonis, V., Damaševičius, R., Scherer, R., 2019. Hemigen: human embryo image generator based on generative adversarial networks. Sensors 19, 3578.
[12] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets, in: NeurIPS.
[13] Hammami, M., Friboulet, D., Kéchichian, R., 2020. Cycle gan-based data augmentation for multi-organ detection in ct images via yolo, in: ICIP, IEEE. pp. 390–393.
[14] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: CVPR, IEEE. pp. 770–778.
[15] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S., 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium, in: NeurIPS.
[16] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A., 2017. Image-to-image translation with conditional adversarial networks, in: CVPR, IEEE. pp. 5967–5976.
[17] Kerepecky, T., Liu, J., Ng, X.W., Piston, D.W., Kamilov, U.S., 2022. Dual-cycle: Self-supervised dual-view fluorescence microscopy image reconstruction using cyclegan. ArXiv:2209.11729.
[18] Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C., Liang, H., Baxter, S.L., McKeown, A., Yang, G., Wu, X., Yan, F., Dong, J., Prasadha, M.K., Pei, J., Ting, M.Y., Zhu, J., Li, C., Hewett, S., Dong, J., Ziyar, I., ..., Zhang, K., 2018. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172, 1122–1131.e9.
[19] Kiran, B.R., Thomas, D.M., Parakkal, R., 2018. An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos. Journal of Imaging 4, 36.
[20] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., Shi, W., 2017. Photo-realistic single image super-resolution using a generative adversarial network, in: CVPR, IEEE. pp. 105–114.
[21] Li, C., Wand, M., 2016. Precomputed real-time texture synthesis with markovian generative adversarial networks, in: ECCV, Springer. pp. 702–716.
[22] Pandey, S., Singh, P.R., Tian, J., 2020. An image augmentation approach using two-stage generative adversarial network for nuclei image segmentation. Biomedical Signal Processing and Control 57, 101782.
[23] Rippel, O., Müller, M., Merhof, D., 2020. Gan-based defect synthesis for anomaly detection in fabrics, in: ETFA, IEEE. pp. 534–540.
[24] Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P., 2022a. Towards total recall in industrial anomaly detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14318–14328.
[25] Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P., 2022b. Towards total recall in industrial anomaly detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14318–14328.
[26] Sandfort, V., Yan, K., Pickhardt, P.J., Summers, R.M., 2019. Data augmentation using generative adversarial networks (cyclegan) to improve generalizability in ct segmentation tasks. Scientific Reports 9, 1–9.
[27] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G., 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery, in: Information Processing in Medical Imaging: 25th International Conference, IPMI 2017, Boone, NC, USA, June 25-30, 2017, Proceedings, Springer. pp. 146–157.
[28] Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U., 2019. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis 54, 30–44. doi: 10.1016/j.media.2019.01.010.
[29] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: CVPR, IEEE. pp. 2818–2826.
[30] Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R., 2019. Efficient GAN-Based Anomaly Detection. doi: 10.48550/arXiv.1802.06222. arXiv:1802.06222 [cs, stat].
[31] Zhang, G., Cui, K., Hung, T.Y., Lu, S., 2021. Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection, in: WACV, IEEE. pp. 2523–2533.
[32] Zhu, J.Y., Park, T., Isola, P., Efros, A.A., 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks, in: ICCV, IEEE.
