Facial beauty prediction fusing transfer learning and broad learning system

FOCUS

Junying Gan 1 • Xiaoshan Xie 1 • Yikui Zhai 1 • Guohui He 1 • Chaoyun Mai 1 • Heng Luo 1

Accepted: 26 September 2022 / Published online: 29 November 2022
© The Author(s) 2022

Abstract

Facial beauty prediction (FBP) is an important and challenging problem in the fields of computer vision and machine learning. FBP models are prone to overfitting because large-scale, effective data are lacking, and robust and effective facial beauty evaluation models are difficult to build quickly because of the variability of facial appearance and the complexity of human perception. Transfer learning can reduce the dependence on large amounts of data and help avoid overfitting. The broad learning system (BLS) can build and train models quickly. For this purpose, transfer learning is fused with BLS for FBP in this paper. Firstly, a feature extractor is constructed from CNN models based on transfer learning (EfficientNets are used in this paper), and the extracted fused features of facial beauty are transferred to BLS for FBP; this method is called E-BLS. Secondly, on the basis of E-BLS, a connection layer is designed to connect the feature extractor and BLS; this method is called ER-BLS. Finally, experimental results show that, compared with previous BLS and CNN methods, E-BLS and ER-BLS improve the accuracy of FBP, demonstrating the effectiveness and superiority of the presented method, which can also be widely used in pattern recognition, object detection and image classification.
Keywords Facial beauty prediction · Transfer learning · Broad learning system

Communicated by Oscar Castillo.

Corresponding author: Junying Gan, junyinggan@163.com
Xiaoshan Xie, xiaoshanxie.xsx@gmail.com
Yikui Zhai, yikuizhai@163.com
Guohui He, ghhe126@126.com
Chaoyun Mai, maichaoyun@foxmail.com
Heng Luo, bigboloo@163.com

1 Department of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China

Soft Computing (2023) 27:13391–13404
https://doi.org/10.1007/s00500-022-07563-1

1 Introduction

Since Plato proposed the concept of aesthetics, research has been conducted in the fields of philosophy, psychology, and medicine to explore the nature of beauty and the criteria for evaluating it, but there is still no scientific definition. Facial beauty prediction (FBP) is a frontier topic in artificial intelligence concerning the nature and laws of human cognition: it studies how to give computers a human-like ability to judge or predict the attractiveness of human faces. The lack of a clear and universal definition of facial beauty makes automatic FBP more challenging. Research on FBP is therefore scientifically important for understanding the perception mechanism of the human brain and simulating human intelligence. At the same time, exploring how to better interpret, quantify and predict beauty will help people understand and describe beauty more scientifically and objectively, further promoting the rapid development of related industries, such as makeup evaluation (Wei et al. 2022), makeup transfer (Wan et al. 2022), personalized recommendation (Lin et al. 2019a, b) and cosmetic surgery planning (Xie et al. 2015).

In recent years, scholars have worked to apply deep learning to FBP. Liu et al. (2019) proposed a method for understanding facial beauty via deep facial features, in which facial features are extracted by LightCNN and facial beauty is
evaluated by random forest, revealing the importance of deep features in understanding facial beauty and laying the foundation for quantitative analysis of FBP. At the same time, an adaptive attribute-aware convolutional neural network for FBP was proposed (Lin et al. 2019a, b), which adaptively uses attributes as an additional model input by modulating the filters of the network. A pseudo attribute-aware convolutional neural network was also proposed (Lin et al. 2019a, b), which learns the pseudo attributes with a lightweight pseudo attribute distiller and effectively improves the performance of FBP. However, FBP still suffers from insufficient supervised information and is prone to overfitting because of the lack of large-scale valid data, so it is difficult to construct an effective and robust beauty assessment model.

Training convolutional neural networks often requires a large number of training samples, while in reality labeled training samples are scarce and difficult to obtain. How to train a robust network when training samples are insufficient needs to be solved. Transfer learning is a good option. In recent years, transfer learning has attracted extensive attention and research from industry and scholars. Some works (Agarwal et al. 2021; Zhuang et al. 2019) reviewed transfer learning, explaining its definition and classification in detail. Transfer learning starts by unfreezing the fully connected layers of a CNN and using the frozen part of the CNN as a fixed feature extractor for the new dataset. In addition, the model trained on the large dataset can be fine-tuned by freezing the shallow layers of the CNN and training the deep layers on the new dataset, which improves the performance of the model and prevents overfitting. Xu et al.
(2018) first proposed transferring the rich deep features of a pre-trained model to the Bayesian ridge regression algorithm for FBP. Our group combined a multiscale CNN, transfer learning and max-feature-map as the activation function to solve the FBP problem, integrating features at different scales to obtain good results (Zhai et al. 2019). Although these methods have achieved better results, they rely heavily on high-performance hardware and require long training times. In addition, to improve the generalization ability and accuracy of a model, it must be retrained on newly added data, which consumes considerable computing resources and time.

Training convolutional neural networks thus relies on high-performance equipment and takes a lot of time. The broad learning system (BLS) was proposed to address these problems (Chen and Liu 2018); it is an efficient incremental learning system without a deep architecture. In recent years, BLS has moved towards establishing more efficient and effective machine learning methods (Gong et al. 2021). Zhang et al. (2019) proposed a face recognition method based on BLS with feature blocks, demonstrating that BLS-based face recognition is not affected by the number of facial features under strong illumination and occlusion and maintains high accuracy. Our group designed a new FBP approach based on local feature fusion and BLS, in which the fused features of facial beauty are extracted by two-dimensional principal component analysis and input into BLS for FBP, greatly reducing training time (Zhai et al. 2020). Furthermore, a new method was designed for facial expression recognition in human-robot interaction based on an enhanced broad Siamese network (Li et al. 2021), which efficiently decreased the consumption of computing time and memory resources.
However, the accuracy of BLS-based FBP is much lower than that of methods based on deep convolutional neural networks, because BLS has insufficient feature extraction ability.

To solve the problems above, we propose to integrate transfer learning and BLS in this paper, which improves the training speed of the model while ensuring the accuracy of FBP. Firstly, EfficientNets (Tan and Le 2019) are used as the backbone network: all the convolutional layers are frozen and the weights are transferred from ImageNet-1k. The backbone serves as a feature extractor whose facial features are transferred to BLS for FBP; this method is called E-BLS. Secondly, based on E-BLS, a connection layer is designed to connect the feature extractor and BLS; this method is called ER-BLS. In the connection layer, the facial features undergo global average pooling, batch normalization and regularization, and are activated by a radial basis function (RBF).

We conducted extensive experiments on the SCUT-FBP5500 database (Liang et al. 2018) and the Large Scale Asian Female Beauty Dataset (LSAFBD) (Zhai et al. 2016) to study the properties of E-BLS and ER-BLS. Experimental results show that E-BLS and ER-BLS achieve better results and outperform previous BLS methods and CNNs. Our methods were also compared with state-of-the-art related methods on SCUT-FBP5500 and LSAFBD, further proving the effectiveness and superiority of the proposed methods.

The main contributions of this work are as follows:

1. We present a new idea to solve the problems of overfitting and slow training speed in FBP by integrating transfer learning and BLS.
2. We instantiate two methods to fuse transfer learning and BLS, i.e., E-BLS and ER-BLS, in which the accuracy and training speed of FBP are better balanced.
3.
Compared with BLS and the other methods for FBP, extensive experimental results demonstrate the superiority and effectiveness of the proposed methods, which can also be widely applied in pattern recognition, object detection and image classification.

The remaining content is arranged as follows: Sect. 2 outlines the related works and Sect. 3 describes the overall schemes. Section 4 analyzes the experiments and compares the performance of the proposed methods with other existing methods. Section 5 concludes this work.

2 Related works

2.1 Transfer Learning

Currently, most of the databases for FBP are small-scale. When CNNs are trained directly on them for FBP, the models are prone to overfitting and train slowly. Transfer learning improves the learning effect of the learner in the target domain by transferring prior knowledge from a relevant source domain. It not only enables direct model migration from trained CNNs, avoiding the retraining of large-scale deep networks, but also improves the stability and generalization ability of network models. To account for the correlation between tasks, our group proposed a multi-task transfer learning method for FBP (2M BeautyNet), in which gender recognition is the auxiliary task and FBP is the main task, and information is shared between the tasks to achieve good results (Gan et al. 2020a, b). Subsequently, Vahdati and Suen (2022) adopted transfer and multi-task learning for FBP, using gender recognition and ethnicity recognition as auxiliary tasks to improve the performance of FBP. Bougourzi et al. (2022) proposed a two-branch architecture (REX-INCEP) based on ResNeXt and Inception, migrating weights trained on large-scale datasets and combining robust losses and ensemble regression to achieve better results on SCUT-FBP5500.
Meanwhile, Dornaika and Moujahid (2022) presented Multi-Similarity Metric Fusion Manifold Embedding (MSMFME) for FBP, migrating weights trained on a large amount of unlabeled face data. In this way, the problem of insufficient supervision information for FBP can be addressed.

We build our feature extractor with EfficientNets (Tan and Le 2019) and transfer their weights trained on ImageNet-1k. EfficientNets are a family of models that optimize floating point operations and parameter efficiency. They optimize computational complexity and parameters by balancing the scaling multipliers $(d, w, r)$ on three dimensions: depth, width, and resolution. The scaling criterion is

$$
\begin{aligned}
\text{depth: } & d = \alpha^{k} \\
\text{width: } & w = \beta^{k} \\
\text{resolution: } & r = \gamma^{k} \\
\text{s.t. } & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
\tag{1}
$$

where $\alpha$, $\beta$, $\gamma$ are constants determined by a small grid search, which specify how to assign the extra resources to network depth, width, and resolution, respectively. Intuitively, $k$ is a user-specified coefficient controlling how many resources are available for model scaling, and the computational cost of the model is proportional to $d$, $w^{2}$ and $r^{2}$. Compared with other deep convolutional neural networks, EfficientNets offer a better tradeoff between computing speed and accuracy.

2.2 Broad learning system

Training CNNs requires a lot of time and high-performance equipment, and a number of works have shown that this problem can be addressed with BLS, a high-speed learning system without a deep architecture (Chen et al. 2019; Zhang et al. 2019; Zhai et al. 2020; Li et al. 2021; Chang and Chun 2022). Zhai et al. (2020) designed a new FBP architecture via BLS; by combining local feature fusion and 2D principal component analysis, training time was effectively reduced while maintaining good accuracy. Ranjana et al. (2022) proposed a Broad Learning and Hybrid Transfer Learning System for face mask detection, in which good results were obtained.
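Returning briefly to the compound scaling rule of Eq. (1) in Sect. 2.1, the following minimal sketch shows how the three multipliers and the relative computational cost grow with the coefficient k. The default constants 1.2, 1.1 and 1.15 are the grid-search values reported by Tan and Le (2019) for their baseline network; the function name `compound_scaling` is hypothetical and only serves this illustration.

```python
def compound_scaling(k, alpha=1.2, beta=1.1, gamma=1.15):
    """Return the depth, width and resolution multipliers of Eq. (1)
    together with the resulting relative computational cost."""
    d = alpha ** k      # depth multiplier
    w = beta ** k       # width multiplier
    r = gamma ** k      # resolution multiplier
    # Cost scales with d * w^2 * r^2, i.e. roughly 2 ** k under the constraint.
    relative_cost = d * w ** 2 * r ** 2
    return d, w, r, relative_cost


if __name__ == "__main__":
    for k in range(4):
        d, w, r, cost = compound_scaling(k)
        print(f"k={k}: d={d:.2f}, w={w:.2f}, r={r:.2f}, relative cost={cost:.2f}")
```

Because the constraint fixes the product of the constants close to 2, each unit increase of k roughly doubles the computational cost, which is why larger EfficientNet variants are more accurate but slower.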
BLS contains three essential parts: mapping feature nodes, enhancement feature nodes and the output layer. First, images are mapped to feature nodes with random weights. Secondly, feature nodes are mapped to enhancement feature nodes with random weights. Finally, the outputs of BLS are computed from the mapping feature nodes and enhancement feature nodes. The detailed process of the BLS algorithm is as follows.

Firstly, the $i$-th ($i = 1, \ldots, n$) group of feature nodes generated by the feature mapping $\phi_i$ is obtained by

$$
Z_i = \phi_i\left(X W_{e_i} + b_{e_i}\right), \quad i = 1, 2, \ldots, n \tag{2}
$$

where $W_{e_i}$ and $b_{e_i}$ are random weights and biases, respectively. All feature nodes are denoted as $Z^{n} \triangleq [Z_1, Z_2, \ldots, Z_n]$.

Secondly, the $k$-th ($k = 1, \ldots, m$) group of enhancement nodes is generated by the nonlinear activation function $\xi_k$, that is

$$
H_k = \xi_k\left(Z^{n} W_{h_k} + b_{h_k}\right), \quad k = 1, 2, \ldots, m \tag{3}
$$

Analogously, $W_{h_k}$ and $b_{h_k}$ are random samples of certain distributions. All enhancement nodes are denoted as $H^{m} \triangleq [H_1, H_2, \ldots, H_m]$.

Finally, the output layer of BLS constructs the desired result $Y$. The resulting equation is

$$
Y \triangleq [Z^{n}, H^{m}]\, W_o \tag{4}
$$

where $W_o$ is the weight matrix of the output layer. It should be noted that $W_o = [Z^{n}, H^{m}]^{+} Y$ is obtained from the pseudo-inverse of the matrix $[Z^{n}, H^{m}]$.

An important advantage of BLS is that additional feature nodes and enhancement nodes can be dynamically added to the system, so that retraining the whole system can be avoided. In incremental BLS, the $(n+1)$-th group of feature mapping nodes is added and expressed as

$$
Z_{n+1} = \phi_{n+1}\left(X W_{e_{n+1}} + b_{e_{n+1}}\right) \tag{5}
$$

where $W_{e_{n+1}}$ and $b_{e_{n+1}}$ are samples of some given distributions. The corresponding $(m+1)$-th enhancement nodes are represented as

$$
H_{m+1} = \xi_{m+1}\left(Z_{n+1} W_{h_{m+1}} + b_{h_{m+1}}\right) \tag{6}
$$

Similarly, $W_{h_{m+1}}$ and $b_{h_{m+1}}$ are random weights and biases, respectively. Let $A_n^m \triangleq [Z^{n}, H^{m}]$; the updated combined matrix is given by

$$
A_{n+1}^{m+1} = \left[A_n^m, Z_{n+1}, H_{m+1}\right] \tag{7}
$$

To update the output weights $W_{n+1}^{m+1}$ from $W_n^m$, the pseudo-inverse of the updated matrix and the dynamic solution can be calculated by

$$
\left(A_{n+1}^{m+1}\right)^{+} = \begin{bmatrix} \left(A_n^m\right)^{+} - D B^{T} \\ B^{T} \end{bmatrix} \tag{8}
$$

$$
W_{n+1}^{m+1} = \left(A_{n+1}^{m+1}\right)^{+} Y = \begin{bmatrix} W_n^m - D B^{T} Y \\ B^{T} Y \end{bmatrix} \tag{9}
$$

where the matrices $D$ and $B$ are computed by

$$
D = \left(A_n^m\right)^{+} \left[Z_{n+1}, H_{m+1}\right] \tag{10}
$$

$$
B^{T} = \begin{cases} C^{+}, & C \neq 0 \\ \left(1 + D^{T} D\right)^{-1} D^{T} \left(A_n^m\right)^{+}, & C = 0 \end{cases} \tag{11}
$$

$$
C = \left[Z_{n+1}, H_{m+1}\right] - A_n^m D \tag{12}
$$

Therefore, BLS is incrementally updated without retraining the whole system, greatly improving the learning efficiency of the system.

3 Fused strategies

3.1 E-BLS

The architecture of E-BLS is shown in Fig. 1. The network consists of a feature extractor and BLS. The feature extractor is used to extract facial features and can be implemented by various existing CNN models; we apply EfficientNets in our experiments. BLS is used for FBP model training and testing. The details of E-BLS are given in Algorithm 1.

Facial images are fed into the feature extractor, which is built on EfficientNets with transfer learning. The fused features of facial beauty are output from the last convolutional layer of EfficientNets by the following formula:

$$
M = \mathrm{Swish}\left(X W_e + b_e\right) \tag{13}
$$

where $X$ represents the input images and Swish is the activation function. $W_e$ and $b_e$ are the weights and biases of EfficientNets, respectively.

We map $M$ to the feature nodes $Z^{n}$ with random weights by formula (2), and then map $Z^{n}$ to the enhancement nodes $H^{m}$ with random weights by formula (3). Finally, we construct the desired result $Y$ by formula (4) to assess facial beauty.
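The three steps above, together with the incremental update of Eqs. (5)–(12), can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature mapping is taken as the identity, the enhancement activation as tanh, the random weights as scaled Gaussians, and the CNN features M and labels Y are random stand-ins; all function names and node counts are hypothetical. The "1" in Eq. (11) is read as an identity matrix.

```python
import numpy as np


def feature_nodes(X, n_groups, nodes_per_group, rng):
    """Eq. (2): Z_i = phi_i(X W_ei + b_ei); phi_i is assumed to be the identity."""
    groups = []
    for _ in range(n_groups):
        W = rng.standard_normal((X.shape[1], nodes_per_group)) / np.sqrt(X.shape[1])
        b = rng.standard_normal(nodes_per_group)
        groups.append(X @ W + b)
    return np.hstack(groups)  # Z^n = [Z_1, ..., Z_n]


def enhancement_nodes(Z, n_nodes, rng):
    """Eq. (3): H = xi(Z W_h + b_h); xi is assumed to be tanh."""
    W = rng.standard_normal((Z.shape[1], n_nodes)) / np.sqrt(Z.shape[1])
    b = rng.standard_normal(n_nodes)
    return np.tanh(Z @ W + b)


def add_nodes(A, A_pinv, new_cols):
    """Eqs. (7)-(12): append [Z_{n+1}, H_{m+1}] and update the pseudo-inverse."""
    D = A_pinv @ new_cols                            # Eq. (10)
    C = new_cols - A @ D                             # Eq. (12)
    if not np.allclose(C, 0.0):
        B_T = np.linalg.pinv(C)                      # Eq. (11), case C != 0
    else:
        # Eq. (11), case C = 0, with the "1" read as an identity matrix
        B_T = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ A_pinv)
    A_new = np.hstack([A, new_cols])                 # Eq. (7)
    A_new_pinv = np.vstack([A_pinv - D @ B_T, B_T])  # Eq. (8)
    return A_new, A_new_pinv


rng = np.random.default_rng(0)
M = rng.standard_normal((300, 64))                   # stand-in for CNN features M
Y = np.eye(5)[rng.integers(0, 5, size=300)]          # one-hot labels, five levels

Z = feature_nodes(M, n_groups=4, nodes_per_group=6, rng=rng)
H = enhancement_nodes(Z, n_nodes=20, rng=rng)
A = np.hstack([Z, H])                                # [Z^n, H^m]
A_pinv = np.linalg.pinv(A)
W_o = A_pinv @ Y                                     # Eq. (4)

Z_new = feature_nodes(M, n_groups=1, nodes_per_group=6, rng=rng)  # Eq. (5)
H_new = enhancement_nodes(Z_new, n_nodes=10, rng=rng)             # Eq. (6)
A2, A2_pinv = add_nodes(A, A_pinv, np.hstack([Z_new, H_new]))
W_new = A2_pinv @ Y                                  # same result as Eq. (9)
predicted_level = (A2 @ W_new).argmax(axis=1)
```

In this sketch the update reuses the stored pseudo-inverse and only inverts the small residual block C, so the existing nodes are never recomputed. The block formula reproduces the exact pseudo-inverse when the new columns are linearly independent of each other after projection (C has full column rank), which holds for generic random nodes when there are far more samples than nodes.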
If the training accuracy does not meet our expected threshold, we expand the feature nodes and enhancement nodes of BLS by formulas (5)–(12) to improve accuracy and training speed.

3.2 ER-BLS

The architecture of ER-BLS is shown in Fig. 2. It consists of a feature extractor, a connection layer and BLS. We designed the connection layer to connect the feature extractor and BLS. The facial features are processed by the connection layer and then input to BLS for FBP. In the connection layer, global average pooling, batch normalization and regularization are applied to the facial features, which are then activated with RBF. The connection layer also increases the training speed of the model and helps avoid overfitting. The details of ER-BLS are given in Algorithm 2.

Similarly, images $X$ are fed into EfficientNets for feature extraction, which output the facial features $M$ by formula (13). After processing through the connection layer, we output new facial features $M^{*}$ by formula (14), which are fed into BLS for FBP.

$$
M^{*} = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\mathrm{BN}\left(M W_r + b_r\right) \cdot \mathrm{BN}\left(M W_r + b_r\right)}{2}\right) \tag{14}
$$

where BN is batch normalization, and $W_r$ and $b_r$ are the weights and biases of the connection layer, respectively. We adopt RBF as the activation function.

Analogously, we map $M^{*}$ to the feature nodes $Z^{n}$ with random weights by formula (2). Then we map $Z^{n}$ to the enhancement nodes $H^{m}$ with random weights by formula (3). Finally, we compute the output of BLS by formula (4) to assess facial beauty. If the training accuracy does not meet our expected threshold, we extend the feature nodes and enhancement nodes of the model by formulas (5)–(12) to improve accuracy and training speed.

4 Experiments and results analysis

To study the properties of E-BLS and ER-BLS, firstly, the methods proposed in this paper are compared with deep convolutional neural network methods based on transfer learning.
Secondly, the effectiveness and superiority of E-BLS and ER-BLS are demonstrated through numerous trials. Finally, our methods are compared with the related methods. All the experiments were conducted on a Python software platform on a desktop computer with an Intel i7 3.6 GHz CPU and 64 GB RAM.

Fig. 1 Architecture of E-BLS. It consists of a feature extractor and BLS. The feature extractor is used to extract facial features as the input of BLS, and BLS predicts facial beauty

Fig. 2 Architecture of ER-BLS. It consists of a feature extractor, a connection layer and BLS. The feature extractor is used to extract facial features as the input of the connection layer. In the connection layer, the facial features undergo global average pooling, batch normalization and regularization, and are activated by RBF. BLS predicts facial beauty

4.1 Experimental object

SCUT-FBP5500 (Liang et al. 2018) SCUT-FBP5500 is a facial beauty prediction database established by South China University of Technology. It contains 5,500 frontal face images at a resolution of 350 × 350 covering different races, genders and ages. Each image is rated by 60 volunteers and is labeled with a beauty score ranging from 1 to 5; the larger the score, the more attractive the face. Using the mode of the ratings as the criterion, all the images are divided into five levels, "1", "2", "3", "4" and "5", which correspond to "extremely unattractive", "unattractive", "average", "attractive" and "extremely attractive", respectively. Among them, there are 76 images in level "1", 821 images in level "2", 3278 images in level "3", 1226 images in level "4" and 99 images in level "5". Figure 3 shows the distribution of beauty levels in SCUT-FBP5500 and Fig. 4 shows some image samples of SCUT-FBP5500.
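The mode-based labeling and the resulting class distribution can be illustrated with a short sketch. The helper names `mode_label` and `level_distribution` are hypothetical, the example ratings are made up, and breaking ties toward the lower level is an assumption, since the text does not say how ties between raters are resolved; the per-level image counts are those quoted above.

```python
from collections import Counter


def mode_label(ratings):
    """Return the most frequent rating; ties go to the lower level (assumption)."""
    counts = Counter(ratings)
    highest = max(counts.values())
    return min(level for level, n in counts.items() if n == highest)


def level_distribution(level_counts):
    """Convert per-level image counts into percentages of the whole dataset."""
    total = sum(level_counts.values())
    return {level: round(100 * n / total, 2) for level, n in level_counts.items()}


# Per-level image counts of SCUT-FBP5500 as stated in the text.
scut_counts = {1: 76, 2: 821, 3: 3278, 4: 1226, 5: 99}

if __name__ == "__main__":
    print(mode_label([3, 4, 3, 2, 3, 4]))      # illustrative ratings for one image
    print(level_distribution(scut_counts))
```

The distribution makes the class imbalance explicit: the middle level holds roughly three fifths of the images, while the two extreme levels each hold under 2%.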
LSAFBD (Zhai et al. 2016) LSAFBD is a facial beauty prediction database established by our group, which consists of 20,000 labeled images and 80,000 unlabeled images at a resolution of 144 × 144. Most facial images include variations in background, pose, and age. Each image is rated by 75 volunteers and all the images are divided into five levels, labeled "0", "1", "2", "3" and "4", in increasing order of beauty. This paper focuses on the prediction of female beauty, and only 10,000 LSAFBD female images were used to verify the effectiveness of our methods for FBP. Among them, there are 948 images in level "0", 1148 images in level "1", 3846 images in level "2", 2718 images in level "3" and 1340 images in level "4". Figure 5 shows the distribution of beauty levels in LSAFBD and Fig. 6 shows some image samples of LSAFBD.

Fig. 3 The distribution of SCUT-FBP5500 with beautiful levels (levels 1 to 5: 1.38%, 14.93%, 59.60%, 22.29%, 1.80%)

Fig. 4 Facial images with different properties of SCUT-FBP5500

Fig. 5 The distribution of LSAFBD with beautiful levels (levels 0 to 4: 9.48%, 11.48%, 38.46%, 27.18%, 13.40%)

Fig. 6 Facial images with different properties of LSAFBD

4.2 Model training and testing

In our experiments, the two databases are randomly divided into a training set and a testing set in the ratio of 8:2, and experiments are conducted with cross-validation to ensure reliability. E-BLS and ER-BLS contain only a small number of hyperparameters. For BLS, the hyperparameters are the number of feature windows (N1), the number of nodes in each feature window (N2), and the number of enhancement nodes (N3). In this section, Hyperopt (Bergstra et al. 2022) was used to search for the optimal values of N1, N2 and N3. To quantify the improvement brought by the presented methods, the plain BLS model was trained with the same hyperparameters.

In addition, some classical deep CNNs based on transfer learning, namely ResNet50 (He et al. 2016), InceptionV3 (Szegedy et al. 2016), DenseNet121 (Huang et al. 2017), InceptionResNetV2 (Szegedy et al. 2017), EfficientNetB7 (Tan and Le 2019), MobileNetV2 (Sandler et al. 2018), NASNet (Zoph et al. 2018), and Xception (Chollet 2017), are also trained, all with the same hyperparameters. The initial learning rate is 0.001. When the training accuracy does not improve for more than 3 epochs, the learning rate is multiplied by a decay factor of 0.5. The batch size and the number of epochs are 16 and 50, respectively. The regularization coefficient is 0.3 and the rectified linear unit is used as the activation function. The initial weights of these networks come from ImageNet-1k. Because the deep CNNs are trained by transfer learning, all the layers except the final layer are frozen and only the final layer is trained. When images are fed into these networks for training, the original resolution is maintained.

Training time, accuracy (AC) and Pearson correlation (PC) are used to evaluate the performance of our methods; a shorter time and a higher AC and PC denote better performance. Experimental results are the average of five tests.

4.3 Experiments on SCUT-FBP5500

In this section, we conducted extensive experiments on SCUT-FBP5500 to study the properties of E-BLS and ER-BLS. Hyperopt was used to optimize the hyperparameters, giving N1 = 12, N2 = 54 and N3 = 2296 for the training of BLS and E-BLS. Experimental results are listed in Table 1. The prediction accuracy of FBP using BLS directly is only 65.85%, which is the lowest among all the methods.
Nevertheless, the testing accuracy of E-BLS is 73.13%, which is 7.28% better than BLS, 0.43% better than EfficientNetB7, and only 0.03% lower than InceptionV3. Furthermore, the training time of the deep convolutional neural networks based on transfer learning was between 4224.42 s and 25,896.82 s, while E-BLS needs only 1399.88 s. E-BLS thus improves the training efficiency of FBP by a factor of roughly 3 to 18 while maintaining the accuracy of the model.

The training loss curves and validation loss curves of these deep CNNs are shown in Figs. 7 and 8, respectively. Each network has nearly converged after about 50 epochs. Increasing the number of epochs further brings little performance improvement while greatly increasing the training time. In addition, the training and validation loss curves of EfficientNetB7 are more stable than those of the other networks.

4.4 Experiments on LSAFBD

In this section, we continue with extensive experiments on LSAFBD to evaluate the performance and study the properties of E-BLS and ER-BLS. For E-BLS and BLS, the hyperparameters are as follows: N1 is 25, N2 is 72 and N3 is 3088. Experimental results are listed in Table 2. The testing accuracy of FBP using BLS directly is 52.96%, which is the lowest among all the methods. Nevertheless, the testing accuracy of E-BLS is 60.82%, which is 7.86% better than BLS and 2.91% better than EfficientNetB7. Its testing accuracy outperforms all the convolutional neural networks. Furthermore, the training time of the deep convolutional neural networks based on transfer learning was between 7830.47 s and 36,853.53 s, while E-BLS needs merely 2300.48 s. Therefore, the training efficiency of FBP with E-BLS is significantly improved.

4.5 Further discussion

The quantified improvements of ER-BLS are listed in Tables 3 and 4, respectively. On SCUT-FBP5500, compared with E-BLS, the testing accuracy of ER-BLS is improved by 1.56% and the training time is reduced by 108.05 s.
Furthermore, compared with the other algorithms, the accuracy of ER-BLS is improved by between 1.53% and 8.84%, and, relative to the CNN-based methods, the training time is shortened by between 2932.59 s and 24,604.99 s.

Table 1 Results of FBP on SCUT-FBP5500
Model | Training time (s) | Testing AC (%) | Training AC (%)
ER-BLS (ours) | 1291.83 | 74.69 | 76.76
E-BLS (ours) | 1399.88 | 73.13 | 75.38
BLS | 640.5 | 65.85 | 68.83
NASNet | 5493.06 | 68.11 | 73.92
MobileNetV2 | 4224.42 | 69.03 | 79.3
DenseNet121 | 9862.52 | 70.77 | 75.97
ResNet50 | 8211.02 | 71.23 | 76.63
InceptionResNetV2 | 25,896.82 | 71.69 | 74.28
EfficientNetB7 | 17,589.68 | 72.70 | 76.68
Xception | 6770.64 | 72.98 | 75.01
InceptionV3 | 11,630.7 | 73.16 | 78.02

On LSAFBD, compared with E-BLS, the testing accuracy of ER-BLS is improved by 1.31% and the training time is reduced by 13.94 s. Moreover, compared with the other algorithms, the accuracy of ER-BLS is improved by between 2.20% and 9.17%, and, relative to the CNN-based methods, the training time is shortened by between 5543.93 s and 34,566.99 s. The accuracy and training speed of FBP are thus significantly improved with ER-BLS.

ER-BLS is able to reach good accuracy with few parameters. ER-BLS is trained with 12 × 54 feature nodes and 2966 enhancement nodes. These nodes directly determine the learning effect of ER-BLS. To further explore the performance of ER-BLS, studies with different numbers of mapping feature nodes and enhancement nodes were conducted. From Table 5, we can see that the larger the total number of nodes, the longer the training time, whereas the testing accuracy of ER-BLS decreases after an initial improvement. This implies that the choice of the number of nodes is a key factor affecting the accuracy of FBP. Therefore, selecting these parameters with Hyperopt is a reliable scheme.

Both the number and the distribution of samples may affect the accuracy of ER-BLS. To further investigate this issue, we varied the number of training samples on LSAFBD. LSAFBD contains 10,000 images, of which 8000 images are used as the training set and 2000 images as the testing set.
The number of images in the testing set is not changed, and the number of images in the training set is varied from 4000 to 8000, as listed in Table 6. As can be seen from Table 6, the testing accuracy increases greatly when the number of training samples increases. Thus, we believe that the number of training samples has a large influence on the accuracy of ER-BLS.

4.6 Ablation study

To investigate the property of each component of our networks more clearly, EfficientNets are selected for our ablation studies. In the experiments, EfficientNets are set as the default backbone networks of ER-BLS. Firstly, we carry out a series of experiments on EfficientNetB7 in which the number of training epochs and the batch size are increased gradually. Secondly, we conduct a series of experiments on ER-BLS with and without transfer learning.

Different training epoch and batch size To explore the effect of different numbers of training epochs and batch sizes, we conduct experiments on EfficientNetB7 for FBP. The comparison results on SCUT-FBP5500 are shown in Figs. 9 and 10, respectively. We can draw the following conclusions:

1. When the number of training epochs is 50 and the batch size is 16, the effect is the best.
2. Increasing the number of training epochs and the batch size greatly increases training time, but accuracy is not improved accordingly.

Whether to fuse transfer learning We explore the effect of transfer learning on ER-BLS. In our experiments, EfficientNets of different scales are implemented as backbones. The experiments in this section were performed on SCUT-FBP5500. As listed in Table 7, we can see that:

1. ER-BLS with transfer learning performs better than ER-BLS without transfer learning under the same backbone.
2. Under transfer learning, ER-BLS based on EfficientNetB7 performs better than with the other backbones.
3. We infer that ER-BLS with transfer learning is superior to ER-BLS without transfer learning, which suggests that prior knowledge can improve the performance of FBP.

Fig. 7 Training loss curves for each algorithm (training loss versus epoch for NASNet, MobileNetV2, DenseNet121, ResNet50, InceptionResNetV2, EfficientNetB7, Xception and InceptionV3)

Fig. 8 Validation loss curves of each algorithm (validation loss versus epoch for the same eight networks)

Table 2 Results of FBP on LSAFBD
Model | Training time (s) | Testing AC (%) | Training AC (%)
ER-BLS (ours) | 2286.54 | 62.13 | 72.34
E-BLS (ours) | 2300.48 | 60.82 | 71.58
BLS | 1446.37 | 52.96 | 67.89
NASNet | 12,043.95 | 53.12 | 60.72
MobileNetV2 | 7830.47 | 54.74 | 66.66
Xception | 16,852.60 | 57.01 | 62.91
InceptionResNetV2 | 36,853.53 | 57.31 | 60.91
EfficientNetB7 | 21,165.61 | 57.91 | 61.32
ResNet50 | 20,014.05 | 58.37 | 65.10
InceptionV3 | 25,201.29 | 58.42 | 66.47
DenseNet121 | 19,520.28 | 59.93 | 63.54

Table 3 Performance improvement of ER-BLS on SCUT-FBP5500
Model | Decreased time (s) | Improved AC (%)
InceptionResNetV2 | 24,604.99 | 3.00
EfficientNetB7 | 16,297.85 | 1.99
InceptionV3 | 10,338.87 | 1.53
DenseNet121 | 8570.69 | 3.92
ResNet50 | 6919.19 | 3.46
Xception | 5478.81 | 1.71
NASNet | 4201.23 | 6.58
MobileNetV2 | 2932.59 | 5.66
E-BLS (ours) | 108.05 | 1.56
BLS | −651.33 | 8.84

Table 4 Performance improvement of ER-BLS on LSAFBD
Model | Decreased time (s) | Improved AC (%)
InceptionResNetV2 | 34,566.99 | 4.82
InceptionV3 | 22,914.75 | 3.71
EfficientNetB7 | 18,879.07 | 4.22
ResNet50 | 17,727.51 | 3.76
DenseNet121 | 17,233.74 | 2.20
Xception | 14,566.06 | 5.12
NASNet | 9757.41 | 9.01
MobileNetV2 | 5543.93 | 7.39
E-BLS (ours) | 13.94 | 1.31
BLS | −840.17 | 9.17

Table 5 ER-BLS with different feature nodes and enhancement nodes on SCUT-FBP5500
Feature nodes | Enhancement nodes | Training time (s) | Testing AC (%)
12 × 54 | 1000 | 1292.1 | 73.77
12 × 54 | 2000 | 1294.65 | 74.226
12 × 54 | 3000 | 1294.71 | 74.59
12 × 54 | 4000 | 1295.97 | 74.317
12 × 54 | 5000 | 1295.95 | 74.317
12 × 54 | 6000 | 1296.36 | 73.678
12 × 54 | 7000 | 1295.75 | 73.133
12 × 54 | 8000 | 1297.60 | 72.951
14 × 54 | 8000 | 1300.92 | 72.222
16 × 54 | 8000 | 1303.64 | 72.86
16 × 57 | 8000 | 1306.09 | 72.86
16 × 60 | 8000 | 1305.21 | 72.678

Table 6 ER-BLS with different number of training samples on LSAFBD
Training samples | Training time (s) | Testing AC (%) | Training AC (%)
4000 | 2280.87 | 51.30 | 93.45
5000 | 2282.06 | 51.40 | 89.10
6000 | 2283.64 | 55.11 | 81.17
7000 | 2285.04 | 58.72 | 77.19
8000 | 2286.54 | 62.13 | 72.34

4.7 Incremental learning

ER-BLS can complete model construction and training by incremental learning. It is a dynamic system whose performance can be improved by adding additional feature nodes, enhancement nodes or new data. At the same time, the structure of ER-BLS can be quickly updated and trained, which greatly improves the efficiency of ER-BLS. To verify the incremental learning effect of the presented methods, we conducted experiments on SCUT-FBP5500 and LSAFBD by adding feature nodes and enhancement nodes. For SCUT-FBP5500, ER-BLS was initialized with 548 feature nodes and 500 enhancement nodes. For LSAFBD, ER-BLS was initialized with 1700 feature nodes and 1000 enhancement nodes. Then, the incremental algorithm was adopted to dynamically add 20 feature nodes and 500 enhancement nodes each time. The testing results of incremental learning are listed in Tables 8 and 9. For SCUT-FBP5500, ER-BLS is reconstructed and trained within 12 s. For LSAFBD, ER-BLS is reconstructed and trained within 37 s. Compared with deep convolutional neural networks, which consume a lot of time for retraining, ER-BLS greatly improves the efficiency of the model.

4.8 Comparison with the other methods

To further verify the effectiveness of E-BLS and ER-BLS, the presented methods were compared with the related methods. The results are listed in Tables 10 and 11. For SCUT-FBP5500, the Pearson correlation of E-BLS is 0.9104 and that of ER-BLS is 0.9303, which is the best among the various methods. For LSAFBD, the testing accuracy of E-BLS is 60.82% and that of ER-BLS is 62.13%, which is the best among the various methods. E-BLS and ER-BLS can be extended to the fields of pattern recognition, object detection and image classification.

5 Conclusion

FBP is a challenging task because of the complexity of human perception, the subjectivity of human aesthetics, and the diversity of human appearance. In this paper, we present a novel scheme, BLS fused with transfer learning, for the FBP task to improve its real-time performance. We instantiate two fused networks, E-BLS and ER-BLS. Extensive ablation studies confirm the effectiveness and superiority of E-BLS and ER-BLS on two databases. Traditionally, deep networks depend on high-performance PCs and take hours of training time.
The proposed methods enable the establishment of a high-accuracy FBP model on an ordinary PC within 40 min.

In the future, in addition to continuing to improve the performance of our networks, we will also consider psychological findings about facial beauty and facial attributes such as gender, race and age that influence facial beauty.

Fig. 9 The accuracy of EfficientNetB7 with different training epochs and batch sizes for FBP on SCUT-FBP5500 (accuracy in % vs. epoch; curves for batch size 16 and batch size 32)

Fig. 10 The training time of EfficientNetB7 with different training epochs and batch sizes for FBP on SCUT-FBP5500 (time in s vs. epoch; curves for batch size 16 and batch size 32)

Table 7 Comparison of ER-BLS with different backbones and transfer learning on SCUT-FBP5500

Backbone         Transfer learning   Training time (s)   Testing AC (%)   PC
EfficientNetB1   No                  791.05              61.65            0.7680
EfficientNetB1   Yes                 689.09              71.59            0.9085
EfficientNetB3   No                  1033.91             63.66            0.7895
EfficientNetB3   Yes                 881.34              72.13            0.9127
EfficientNetB5   No                  1262.44             62.75            0.8361
EfficientNetB5   Yes                 1031.13             73.22            0.9209
EfficientNetB7   No                  1426.64             61.75            0.8763
EfficientNetB7   Yes                 1291.83             74.69            0.9303

Acknowledgements This work was supported in part by the National Natural Science Foundation of China under Grant 61771347, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019A1515010716 and in part by the Basic Research and Applied Basic Research Key Project in General Colleges and Universities of Guangdong Province under Grant 2018KZDXM073.

Funding The authors have not disclosed any funding.

Data availability Enquiries about data availability should be directed to the authors.

Declarations

Conflict of interest All the authors declare that they have no conflict of interest.
Ethical approval This paper does not contain any studies with human participants or animals performed by any of the authors.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Table 8 Incremental learning results of ER-BLS on SCUT-FBP5500

Feature nodes   Enhancement nodes   Training time (s)   Testing accuracy (%)
548             500                 5.18                73.32
548–568         500–1000            8.39                73.50
568–588         1000–1500           9.49                73.68
588–608         1500–2000           10.12               73.59
608–628         2000–2500           11.19               74.14
628–648         2500–3000           11.98               73.86

Table 9 Incremental learning results of ER-BLS on LSAFBD

Feature nodes   Enhancement nodes   Training time (s)   Testing AC (%)
1700            1000                19.54               60.62
1700–1720       1000–1500           26.52               61.12
1720–1740       1500–2000           28.49               60.87
1740–1760       2000–2500           30.99               61.17
1760–1780       2500–3000           32.94               61.32
1780–1800       3000–3500           36.45               61.42

Table 10 Performance comparison with the other methods on SCUT-FBP5500

Model                                      PC
P-AaNet (Lin et al. 2019a, b)              0.8965
2M BeautyNet (Gan et al. 2020a, b)         0.8996
AestheticNetG (Danner et al. 2021)         0.9011
AaNet (Lin et al. 2019a, b)                0.9055
E-BLS (ours)                               0.9104
MSMFME (Dornaika and Moujahid 2022)        0.9113
R³CNN (Lin et al. 2019a, b)                0.9142
REX-INCEP (Bougourzi et al. 2022)          0.9165
ER-BLS (ours)                              0.9303

Table 11 Performance comparison with the other methods on LSAFBD

Model                                          Testing AC (%)
Deep cascaded forest (Zhou and Feng 2017)      54.29
Multi-scale K-means (Gan et al. 2017)          55.07
NIN (Szegedy et al. 2015)                      58.30
NetA + DAL (Gan et al. 2019)                   59.90
DeepID2 (Zhai et al. 2019)                     60.25
Noise Labels (Gan et al. 2022a, b)             60.80
E-BLS (ours)                                   60.82
Cross Network (Gan et al. 2022a, b)            61.29
LDCNN (Gan et al. 2020a, b)                    62.00
ER-BLS (ours)                                  62.13

References

Agarwal N, Sondhi A, Chopra K, Singh G (2021) Transfer learning: survey and classification. Smart Innov Commun Comput Sci 2021:145–155
Bergstra J, Yamins D, Cox DD (2022) Hyperopt: distributed asynchronous hyper-parameter optimization. In: Astrophysics source code library, ascl:2205.008
Bougourzi F, Dornaika F, Taleb-Ahmed A (2022) Deep learning based face beauty prediction via dynamic robust losses and ensemble regression. Knowl-Based Syst 242:108246
Chang P, Chun D (2022) Monitoring multi-domain batch process state based on fuzzy broad learning system. Expert Syst Appl 187:115851
Chen C, Liu Z (2018) Broad learning system: an effective and efficient incremental learning system without the need for deep architecture. IEEE Trans Neural Netw Learn Syst 29:10–24
Chen C, Liu Z, Feng S (2019) Universal approximation capability of broad learning system and its structural variations. IEEE Trans Neural Netw Learn Syst 30:1191–1204
Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1251–1258
Danner M, Weber T, Peng L, Gerlach T, Su X, Rätsch M (2021) Ethically aligned deep learning: unbiased facial aesthetic prediction. arXiv preprint arXiv:2111.05149
Dornaika F, Moujahid A (2022) Multi-view graph fusion for semi-supervised learning: application to image-based face beauty prediction. Algorithms 15(6):207
Gan J, Zhai Y, Wang B (2017) Unconstrained facial beauty prediction based on multi-scale K-means. Chin J Electron 2017:548–556
Gan J, Zhai Y, Huang Y, Zeng J et al (2019) Research of facial beauty prediction based on deep convolutional features using double activation layer. Acta Electronica Sin 47:636–643
Gan J, Jiang K, Tan H, He G (2020b) Facial beauty prediction based on lighted deep convolution neural network with feature extraction strengthened. Chin J Electron 29:312–321
Gan J, Xiang L, Zhai Y, Mai C, He G, Zeng J, Bai Z, Labati R, Piuri V, Scotti F (2020a) 2M BeautyNet: facial beauty prediction based on multi-task transfer learning. In: IEEE Access, pp 20245–20256
Gan J, Wu B, Zhai Y, He G, Mai C, Bai Z (2022a) Face beauty prediction with self-correcting noise labels. Chin J Image Graph 27(8)
Gan J, Wu B, Zou Q, Zheng Z, Mai C, Zhai Y, Bai Z (2022b) Application research for fusion model of pseudolabel and cross network. In: Computational intelligence and neuroscience
Gong X, Zhang T, Chen C, Liu Z (2021) Research review for broad learning system: algorithms, theory, and applications. IEEE Trans Cybern 52:1–29
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4700–4708
Li Y, Zhang T, Chen C (2021) Enhanced broad siamese network for facial emotion recognition in human–robot interaction. IEEE Trans Artif Intell 2:413–423
Liang L, Lin L, Jin L, Xie D, Li M (2018) SCUT-FBP5500: a diverse benchmark dataset for multi-paradigm facial beauty prediction. In: Proc 24th int conf pattern recognit (ICPR), pp 1598–1603
Lin L, Liang L, Jin L (2019a) Regression guided by relative ranking using convolutional neural network (R³CNN) for facial beauty prediction. IEEE Trans Affect Comput 13:1–14
Lin L, Liang L, Jin L, Chen W (2019b) Attribute-aware convolutional neural networks for facial beauty prediction. In: Proc 28th int joint conf artif intell, pp 847–853
Liu X, Li T, Peng H, Ouyang IC, Kim T, Wang R (2019) Understanding beauty via deep facial features. In: CVPR workshops, pp 246–256
Ranjana R, Rao BNK, Nagendra P, Chakravarthy S (2022) Broad learning and hybrid transfer learning system for face mask detection. In: Telematique, pp 182–196
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4510–4520
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proc AAAI, pp 1–3
Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: Proc 36th int conf mach learn, pp 6105–6114
Vahdati E, Suen C (2020) Facial beauty prediction using transfer and multi-task learning techniques. In: International conference on pattern recognition and artificial intelligence, pp 441–452
Wan Z, Chen H, An J, Jiang W, Yao C, Luo J (2022) Facial attribute transformers for precise and robust makeup transfer. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1717–1726
Wei W, Ho ES, McCay KD, Damaševičius R, Maskeliūnas R, Esposito A (2022) Assessing facial symmetry and attractiveness using augmented reality. Pattern Anal Appl 25(3):635–651
Xie D, Liang L, Jin L, Xu J, Li M (2015) SCUT-FBP: a benchmark dataset for facial beauty perception. In: IEEE international conference on systems, man, and cybernetics, Hong Kong, China, pp 1821–1826
Xu L, Xiang J, Yuan X (2018) Transferring rich deep features for facial beauty prediction. arXiv preprint
Zhai Y, Yu C, Qin C, Zhou W, Ke Q, Gan J, Labati RD, Piuri V, Scotti F (2020) Facial beauty prediction via local feature fusion and broad learning system. IEEE Access 8:218444–218457
Zhai Y, Huang Y, Xu Y, Zeng J, Yu F, Gan J (2016) Benchmark of a large scale database for facial beauty prediction. In: Proc int conf intell inf process, pp 131–135
Zhai Y, Cao H, Deng W, Gan J, Piuri V, Zeng J (2019) BeautyNet: joint multiscale CNN and transfer learning method for unconstrained facial beauty prediction. In: Computational intelligence and neuroscience, pp 1–14
Zhang D, Yang H, Chen P, Li T (2019) A face recognition method based on broad learning of feature block. In: Proc IEEE 9th annu int conf CYBER technol automat, control, intell syst (CYBER), pp 307–310
Zhou Z, Feng J (2017) Deep forest: towards an alternative to deep neural networks. In: Proceedings of the twenty-sixth international joint conference on artificial intelligence, Melbourne, Australia, pp 3553–3559
Zhuang F, Qi Z, Duan K et al (2019) A comprehensive survey on transfer learning. In: Proceedings of the IEEE, pp 43–76
Zoph B, Vasudevan V, Shlens J, Le Q (2018) Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 8697–8710

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
