Advanced Unsupervised Learning: A Comprehensive Overview of Multi-View Clustering Techniques


Authors: Abdelmalik Moujahid, Fadi Dornaika

Abdelmalik Moujahid¹ and Fadi Dornaika*²,³
¹ Universidad Internacional de La Rioja (UNIR), ² University of the Basque Country, ³ IKERBASQUE, Basque Foundation for Science
abdelmalik.moujahid@unir.net, fadi.dornaika@ehu.eus
* Corresponding author

Abstract

Machine learning techniques face numerous challenges to achieve optimal performance. These include computational constraints, the limitations of single-view learning algorithms, and the complexity of processing large datasets from different domains, sources, or views. In this context, multi-view clustering (MVC), a class of unsupervised multi-view learning, emerges as a powerful approach to overcome these challenges. MVC compensates for the shortcomings of single-view methods and provides a richer data representation and effective solutions for a variety of unsupervised learning tasks. In contrast to traditional single-view approaches, the semantically rich nature of multi-view data increases its practical utility despite its inherent complexity. This survey makes a threefold contribution: (1) a systematic categorization of multi-view clustering methods into well-defined groups, including co-training, co-regularization, subspace, deep learning, kernel-based, anchor-based, and graph-based strategies; (2) an in-depth analysis of their respective strengths, weaknesses, and practical challenges, such as scalability and incomplete data; and (3) a forward-looking discussion of emerging trends, interdisciplinary applications, and future directions in MVC research.
This study represents an extensive workload, encompassing the review of over 140 foundational and recent publications, the development of comparative insights on integration strategies such as early fusion, late fusion, and joint learning, and the structured investigation of practical use cases in the areas of healthcare, multimedia, and social network analysis. By integrating these efforts, this work aims to fill existing gaps in MVC research and provide actionable insights for the advancement of the field.

Keywords: Machine learning, unsupervised learning, multi-view clustering, data representation, similarity graph, spectral embedding, kernel representation.

1 Introduction

Multi-view learning focuses on using information from different sets of features or representations, called views, to improve learning performance. The basic idea is that different views of the same data provide unique and complementary insights, so it is beneficial to consider them simultaneously. The field addresses challenges such as dealing with missing or noisy views, ensuring alignment between views, and selecting appropriate integration techniques. In multi-view clustering, the way in which the information from multiple views is integrated has a significant impact on the clustering results. Depending on when and how the views are combined, the methods for multi-view clustering can be broadly categorized into three primary integration strategies: early fusion, late fusion, and joint learning. These three categories serve as an overarching framework for understanding multi-view clustering (MVC) approaches, offering a conceptual foundation from which various methodologies can be explored.
While these categories provide a general overview, the subsequent sections of this paper delve into specific techniques and advancements within each category, examining how recent innovations and refinements have addressed the limitations and enhanced the performance of multi-view clustering methods across a variety of applications. This structure aims to provide readers with both a broad understanding and a detailed, in-depth perspective on the evolving landscape of multi-view clustering. The three primary categories are as follows:

• Early Fusion: Early fusion methods combine the features from multiple views at the input level by merging the feature representations from all views into a single, unified feature model [1, 2]. This approach allows the model to learn from all available data simultaneously, processing the combined information early in the learning process. One of the main advantages of early fusion is its computational efficiency, as it allows a single model to work with the combined feature set. However, early fusion assumes that all views are perfectly aligned and equally informative, which is often not the case in practice. In real-world applications, the views may be noisy or contain incomplete information, which can negatively impact the performance of early fusion methods. In addition, this approach cannot handle heterogeneity between views (e.g., different data modalities) as effectively as other strategies.

• Late Fusion: Late fusion approaches take a different strategy by training separate models for each view independently and combining their results, such as predictions or cluster assignments, at a later stage. In this way, each view can be processed individually, providing flexibility in dealing with heterogeneous data types or situations where some views might be missing [3, 4].
For example, in a clustering problem with multiple views, the individual clustering results from each view could be merged into a final consensus clustering. While late fusion is highly adaptive and robust to incomplete or varying data types, it cannot capture complex dependencies and interactions between the different views. Since each model is trained independently, the relationships between views are not always well utilized, which can lead to suboptimal integration of the data.

• Joint Learning: Joint learning techniques aim to simultaneously learn a common representation or task-specific features for all views, often by mapping each view into a common latent space. These methods explicitly capture the interactions and complementarities between views during the learning process, promoting an integrated understanding of the data [5, 6]. Joint learning approaches can be particularly powerful when the views are complementary, as they allow the model to exploit the full potential of the available information. By learning a common representation or model, these techniques improve the matching and integration of data from different views, which can lead to better performance than early or late fusion methods. However, joint learning is often more computationally intensive and requires more sophisticated optimization techniques and larger computational resources. It can also depend on the choice of model architecture, and the learning process can be more complex and prone to overfitting, especially when the data is noisy or sparse.

Each of these strategies has its own strengths and limitations, making them suitable for different application scenarios. For example, early fusion methods are effective when the views are well aligned and have similar characteristics, such as in multi-channel image processing.
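To make the first two strategies concrete, the sketch below clusters the same synthetic two-view dataset once by early fusion (concatenating the views' features before a single k-means run) and once by late fusion (clustering each view separately, then clustering a co-association consensus matrix). This is a minimal illustrative sketch using scikit-learn on fabricated data, not a method from the surveyed literature; the co-association consensus is just one of many possible late-fusion merge rules.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

rng = np.random.RandomState(0)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
view1 = X                          # view 1: the raw 2-D features
view2 = X @ rng.randn(2, 5)        # view 2: a random linear re-embedding (synthetic)

# Early fusion: concatenate features from all views, then cluster once.
early = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    np.hstack([view1, view2]))

# Late fusion: cluster each view independently, ...
per_view = [KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(v)
            for v in (view1, view2)]
# ... then merge via a co-association matrix: entry (i, j) is the fraction
# of views in which samples i and j land in the same cluster.
coassoc = sum((lab[:, None] == lab[None, :]).astype(float) for lab in per_view)
coassoc /= len(per_view)
# Cluster the consensus similarities to obtain the final late-fusion labels.
late = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coassoc)
```

Note that the early-fusion branch sees all features jointly, while the late-fusion branch only ever compares per-view partitions, which is exactly the trade-off discussed above.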
Late fusion is often used in scenarios with heterogeneous data types, such as in bioinformatics, where each view represents a different biological modality. Joint learning is particularly useful in applications such as sensor networks or multitask learning, where capturing interactions between different views is crucial.

This review systematically examines these strategies and provides a comparative analysis of the underlying principles, strengths, and limitations of early fusion, late fusion, and joint learning methods in multi-view clustering. The goal is to provide researchers with a comprehensive understanding of the multi-view clustering landscape and help them identify the most appropriate integration strategy for specific applications.

The emerging field of multi-view clustering techniques specifically aims to seamlessly integrate this heterogeneous data landscape and extract meaningful patterns from it. By considering multiple data perspectives, these techniques have the potential to outperform traditional single-view clustering approaches and lead to more accurate and robust cluster mappings, as highlighted in studies by Xu et al. (2015) and Kumar et al. (2011) [7, 8]. Furthermore, the application of multi-view clustering goes beyond improved clustering accuracy. It enables a deeper understanding of complex data by revealing hidden structures, relationships, and insights that may remain undetected when analyzing single perspectives in isolation, as Cai et al. [9] emphasize. This synergy between multi-view learning and clustering techniques opens up avenues for improved data analysis and knowledge extraction in various real-world applications.

1.1 Addressing the Importance of Multi-view Clustering

Multi-view clustering has become a central approach in the field of unsupervised learning, driven by the increasing prevalence of heterogeneous and multimodal datasets in modern applications.
In contrast to single-view learning methods that rely on a single representation or perspective of the data, multi-view clustering utilizes multiple complementary views to provide a more holistic understanding of the underlying structures. This paradigm is particularly important for several reasons:

1. Overcoming limitations in data representation: Single-view methods often fail to capture the variety and richness of data coming from different sources or modalities. Multi-view clustering addresses this limitation by integrating complementary information from multiple views, resulting in a more robust and comprehensive data representation. In biomedical research, for example, combining genomic data (one view) with proteomic data (another view) leads to a more comprehensive understanding of biological processes and improves the clustering of disease subtypes.

2. Handling heterogeneous datasets: In real-world scenarios, datasets often comprise different domains, each of which provides unique insights. In multimedia analysis, for example, a dataset may contain text descriptions, audio recordings, and visual features of videos. Multi-view clustering processes these heterogeneous modalities to effectively group similar multimedia content, outperforming single-view approaches that only consider one modality at a time.

3. Reducing bias through single-view dependency: Dependence on a single view can introduce bias and limit generalizability, as the selected view may not fully represent the variability of the data. Multi-view clustering synthesizes information across multiple views to mitigate this problem. For example, when analyzing social networks, a single-view approach may only consider user interactions (e.g., likes or shares), while multi-view clustering integrates other perspectives such as user profiles and content preferences, resulting in more accurate community detection.
By systematically addressing these constraints, multi-view clustering provides a transformative framework that meets the needs of modern machine learning tasks. Its ability to fuse complementary information, align different data modalities, and improve clustering performance highlights its crucial role in advancing unsupervised learning methods.

1.2 Challenges

Multi-view clustering addresses the challenge of heterogeneity in real-world datasets by integrating different types of information and perspectives into a unified framework [7]. By bridging the semantic gap between simple and high-level features, it enables the discovery of meaningful patterns, especially in multimedia data [9]. However, despite its advantages, multi-view clustering faces several interrelated challenges that need to be overcome for effective application.

One key challenge in clustering with multiple views is the high-dimensional feature space that results from the integration of multiple views. While additional information improves clustering, it also introduces the "curse of dimensionality", which can degrade performance and increase computational complexity. Dimensionality reduction techniques, such as principal component analysis (PCA) [10] and t-SNE [11], reduce the dimensionality of individual views. Advanced methods such as multi-view subspace learning [12] uncover common latent representations across all views and thus improve the efficiency of clustering.

To efficiently process large datasets, anchor-based methods provide a scalable approach by selecting a small set of representative data points, called anchors, to approximate the original dataset. These methods effectively reduce computational complexity while preserving the structural relationships within the data.
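The anchor idea can be made concrete with a short sketch: pick m << n representative points, compute only the n x m sample-to-anchor similarity matrix Z, and let Z stand in for the full n x n similarity graph in downstream steps. This is an illustrative sketch of the general principle, not a specific published algorithm; choosing anchors as k-means centroids and using a Gaussian kernel with a mean-distance bandwidth are simplifying assumptions made here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=4, random_state=1)  # n = 1000 samples
m = 50                                                        # m = 50 anchors

# Select anchors; k-means centroids are a common choice (random sampling also works).
anchors = KMeans(n_clusters=m, n_init=4, random_state=1).fit(X).cluster_centers_

# Sample-to-anchor similarities: an n x m matrix instead of the full n x n graph.
sq_dist = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
Z = np.exp(-sq_dist / sq_dist.mean())
Z /= Z.sum(axis=1, keepdims=True)   # row-normalize so each row is a distribution

# The implicit full similarity graph is S ~ Z diag(1/d) Z.T with d = Z.T 1, so it
# never needs to be formed explicitly; here is a single row of it for sample 0:
d = Z.sum(axis=0)
s_row0 = (Z[0] / d) @ Z.T           # similarities of sample 0 to all n samples
```

Because all downstream operations work on the n x m matrix Z rather than an n x n graph, storage and computation scale linearly in n, which is the scalability benefit described above.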
Anchor Graph Regularization (AGR) [13] uses these anchors to construct similarity graphs that enable efficient computations and preserve the integrity of the data structure. In the context of multi-view clustering, anchor-based techniques extend this principle by identifying anchors that capture common structures across multiple views [14, 15, 16, 17], which promotes the discovery of common patterns and ensures computational efficiency.

Apart from dimensionality, the diversity and heterogeneity of multi-view datasets require more sophisticated methods to uncover complex relationships. Tensor factorization models have proven useful for clustering and merging data across modalities [18, 19, 20]. Techniques such as Tucker decomposition [21] and Kolda and Bader's framework [22] reduce dimensionality while identifying common latent structures. Sequentially Truncated Higher-Order Singular Value Decomposition (ST-HOSVD) [23] provides robust data reconstruction and clustering, while hybrid methods combining Dynamic Mode Decomposition (DMD) with Graph Laplacian techniques [24] are promising for large-scale multi-view clustering. These tensor-based approaches complement anchor-based methods and provide powerful solutions to the challenges posed by high-dimensional, heterogeneous data.

In addition to the challenges associated with dimensionality and heterogeneity, several other issues need to be addressed for the effective application of multi-view clustering:

• Model selection and scalability: Choosing an appropriate multi-view learning model and fusion strategy is a non-trivial task, as it depends on the specific problem and the dataset. In addition, some multi-view learning algorithms have difficulty scaling effectively to large datasets, which limits their practical utility [25].

• Merging clustering results: A key challenge is to merge the clustering results from different views in a natural and coherent way.
Since each view provides a different perspective on the data, integrating these results requires novel clustering target functions that can capture the diverse information from multiple sources.

• Determining the importance of views: Not all views contribute equally to the clustering process. One of the challenges of clustering with multiple views is determining the relative importance of the individual views. This requires mechanisms that weight the views appropriately and ensure that the more informative views have a greater influence on the clustering result.

• Heterogeneity of data with multiple views: In real-world applications, multi-view data often exhibits significant heterogeneity in terms of scale, modality, and quality. This heterogeneity poses a major challenge in multi-view learning [26]. In particular, evaluating the correlation and redundancy between the different views is crucial, as overly correlated views can lead to suboptimal clustering results [27].

• Incomplete Multi-view Clustering (IMC): Addressing the challenge of missing data has led to the emergence of Incomplete Multi-view Clustering (IMC). However, most existing multi-view clustering methods assume complete data, which makes them less effective in real-world scenarios where data is often incomplete. Developing methods that can deal with missing data, especially with a high rate of missing views, remains an ongoing challenge [28, 29].

Table 1 provides an overview of key surveys on multi-view clustering (MVC), summarizing their main focus and contributions. Existing surveys often focus on specific subfields, applications, or disciplinary boundaries, leading to gaps in coverage and overlooking new interdisciplinary approaches.
Each of these surveys presents a distinct perspective on MVC techniques, ranging from deep matrix factorizations and generative versus discriminative approaches to the categorization of algorithms and their evaluation in real-world applications. The surveys address various challenges in MVC, such as scalability and noise management, and propose future research directions to advance the field. Consequently, there is a compelling need for a systematic and up-to-date survey that bridges these gaps, offering a comprehensive overview of multi-view clustering methods and their applications across diverse domains. This table highlights the breadth of approaches and provides a comparative look at the state-of-the-art in multi-view clustering.

• 2025 [30] A Survey on Representation Learning for Multi-view Data. Main focus: multi-view clustering, self-supervised multi-view clustering, and incomplete multi-view clustering. Key contributions: provides a novel survey on multi-view clustering by organizing existing algorithms into two distinct categories, non-self-supervised and self-supervised multi-view clustering, addressing the gap left by previous surveys that overlooked simultaneous consideration of both.

• 2024 [31] Breaking down multi-view clustering: A comprehensive review of multi-view approaches for complex data structures. Main focus: a comprehensive classification of Multi-View Clustering (MVC) methods into generative and discriminative approaches, with a focus on deep learning-based techniques. Key contributions: classifies MVC methods into generative and discriminative categories, emphasizing deep learning's role in complex data structures; provides a systematic comparison and identifies research gaps.

• 2024 [32] Multi-modal data clustering using deep learning: A systematic review. Main focus: introducing a novel taxonomy for deep learning-based multi-modal clustering. Key contributions: explores CNNs, autoencoders, RNNs, and GCNs; identifies gaps in multi-modal clustering research; suggests future research directions.

• 2023 [33] A Comprehensive Survey on Multi-View Clustering. Main focus: categorizes current MVC approaches into two main technical mechanisms: heuristic-based multi-view clustering (HMVC) and neural network-based multi-view clustering (NNMVC). Key contributions: explores key approaches within HMVC, including nonnegative matrix factorization, graph learning, and tensor learning, as well as deep representation learning and deep graph learning in NNMVC.

• 2022 [34] A Survey on Incomplete Multiview Clustering. Main focus: incomplete multi-view clustering (IMC), which is particularly relevant for practical applications such as disease diagnosis, multimedia analysis, and recommendation systems, where incomplete data is common. Key contributions: unifies IMC methods under common frameworks, conducts a comparative analysis of representative approaches, and highlights open problems to guide future research.

• 2021 [35] A survey on deep matrix factorizations. Main focus: deep matrix factorization techniques and their applications. Key contributions: explores advanced matrix factorization methods in deep learning, highlights applications in multidimensional data analysis, and identifies future challenges.

• 2021 [36] A Survey on Multiview Clustering. Main focus: review of multiview clustering methods, proposing a novel taxonomy based on generative and discriminative approaches. Key contributions: classifies multiview clustering approaches into generative and discriminative categories; links MVC with representation learning, ensemble clustering, and multi-task learning.

• 2020 [37] An overview of recent multi-view clustering. Main focus: review of recent multiview clustering algorithms with an experimental focus. Key contributions: divides algorithms into three main categories; conducts extensive experiments on seven datasets using accuracy, NMI, and purity metrics; proposes future directions.

• 2018 [38] Multi-view Clustering: A Survey. Main focus: review of multiview clustering algorithms, organized according to mechanisms and underlying principles. Key contributions: proposes a taxonomy in five categories: co-training, multi-kernel learning, graph-based clustering, subspace clustering, and multitask multiview clustering.

Table 1: Summary of existing surveys on multi-view clustering methods, highlighting their focus on specific subfields, applications, or methodologies. The table underscores the need for a recent and comprehensive survey, like ours, to address gaps in interdisciplinary approaches and provide a unified perspective on the evolving landscape of multi-view clustering techniques and applications.

1.3 Motivation

The motivation for conducting this survey lies in the recognition of multi-view learning as a central and effective approach to overcoming the limitations of single-view methods. Multi-view learning not only compensates for these limitations, but also provides a more comprehensive representation of the data and thus solutions to various machine learning challenges. The versatility of multi-view learning makes it an invaluable tool that can be used in various domains and applications.

In contrast to traditional data representations that represent objects from a single view, multi-view data is semantically rich, which increases its practical utility despite its higher inherent complexity. In light of these considerations, this survey aims to provide a systematic overview of the main clustering methods documented in the scientific literature for processing multi-view data. Moreover, it contributes to a better understanding of the landscape of multi-view learning methods, their applications, and their potential importance in addressing the challenges faced by today's machine learning techniques.
On the other hand, the lack of comprehensive survey papers in the literature addressing multi-view clustering arises from the diverse methodological landscape, the rapid evolution of techniques, the application-specific focus, and the overall growth of the field. Existing surveys [35, 36, 38, 37] often focus on specific subfields, applications, or disciplinary boundaries, leading to gaps in coverage and overlooking new interdisciplinary approaches. Consequently, there is a compelling need for a systematic and up-to-date survey that bridges these gaps, offering a comprehensive overview of multi-view clustering methods and their applications across diverse domains.

1.4 Structure of the Survey

This review is systematically organized to provide a comprehensive overview of the multi-view clustering landscape, structured as follows:

1. Introduction: The opening section highlights the growing importance of multi-view clustering in modern machine learning and data analytics. It discusses the significance, challenges, and motivations driving this area of research.

2. Fundamentals of Multi-view Clustering (Section 2): This section lays the groundwork by explaining the key principles of multi-view clustering. It presents a taxonomy of multi-view approaches to help contextualize the methods discussed later.

3. Classical Methods for Multi-view Clustering (Section 3): We categorize and explain seminal approaches in multi-view clustering, focusing on classical techniques that have shaped the field.

4. Exploring Graph-based Multi-view Clustering (Section 4): We provide a comprehensive overview of novel graph-based algorithms tailored to the challenges of multi-view clustering, such as handling missing data, noise reduction, and efficient computation.

5. Multi-view Clustering with Missing or Incomplete Data (Section 5): This section addresses the recent challenges and methods for clustering with incomplete data, a common issue in multi-view settings.

6. Formal Review of Typical Approaches (Section 6): A rigorous mathematical examination of the typical multi-view clustering approaches, focusing on the formal framework and methods used.

7. Datasets for Multi-view Clustering (Section 7): We provide an overview of key datasets used to evaluate multi-view clustering methods, detailing common feature extraction techniques and considerations for selecting the optimal number of views.

8. Conclusion: This section summarizes the main findings of the review and reinforces the importance of multi-view clustering in modern machine learning.

9. Future Directions (Section 9): The final section discusses the potential future developments and research opportunities in multi-view clustering.

2 Fundamentals of Multi-view Clustering

Multi-view learning has gained much attention due to its potential to improve the performance of models by leveraging multiple data perspectives. Integrating information from different views increases prediction accuracy and provides a more comprehensive understanding of the data structure, improving the robustness of the clustering process [12]. By capturing the underlying structures from different views, multi-view methods can often perform better than single-view approaches [8]. Furthermore, learning from multiple views is crucial for domain adaptation, i.e., the alignment of data representations from different sources or domains [39].

Despite its advantages, multi-view clustering faces several challenges. The heterogeneity of data in terms of scale, modality, and quality can make it difficult to effectively merge different views. In addition, overly correlated views can affect the performance of clustering algorithms and lead to suboptimal results. High-dimensional feature spaces resulting from the integration of multiple views require the use of dimensionality reduction techniques to avoid problems such as the "curse of dimensionality" [40].
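One standard mitigation is to reduce each view before fusing. The following sketch illustrates this with per-view PCA on synthetic data; the two-dimensional latent space, the random linear views, and the noise level are arbitrary choices made here for illustration, and t-SNE or a learned subspace could replace PCA in the same slot.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
latent, _ = make_blobs(n_samples=200, centers=3, random_state=0)  # shared 2-D structure

# Two high-dimensional views generated from the same latent factors (synthetic).
views = [latent @ rng.randn(2, 50) + 0.1 * rng.randn(200, 50),
         latent @ rng.randn(2, 80) + 0.1 * rng.randn(200, 80)]

# Reduce each view separately, then fuse: 200 x 4 features instead of 200 x 130.
reduced = [PCA(n_components=2).fit_transform(v) for v in views]
fused = np.hstack(reduced)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(fused)
```

Reducing per view before concatenation keeps the fused dimensionality proportional to the number of views rather than to the sum of their raw feature counts.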
In addition, model selection and scalability remain a challenge, as the choice of an appropriate multi-view learning model depends on the specific problem and the dataset, and some models do not scale well to large datasets [41].

In the field of multi-view clustering, established methods can be systematically categorized into different groups, each tailored to a specific strategy for merging information from different views. We emphasize that the boundaries between the categories may be blurred, as a given clustering approach may belong to more than one category.

2.1 Taxonomy of Multi-View Clustering Approaches

Given the extensive diversity of multi-view clustering methods and their varying approaches to view integration, our systematic review organizes this methodological landscape into a comprehensive taxonomy, presented in Table 2. Below, we categorize these methods into distinct groups based on their underlying principles and techniques. While these categories provide a structured overview, it is important to note that certain approaches may overlap multiple categories.

• Co-training: This method starts by clustering data from a single view and iteratively refines the results across other views. It is advantageous because it allows for the enhancement of initial clustering through additional views. However, it requires a strong initial clustering and may struggle when views are highly inconsistent or noisy [8, 42].

• Co-regularized multi-view spectral clustering: This technique integrates multi-view learning with spectral clustering by introducing a co-regularization term, harmonizing clusterings across different views. The benefit is that it can effectively combine the views' information in an unsupervised manner. However, the choice of the appropriate regularization term can be challenging and may affect the clustering performance significantly [8].
• Kernel-based multi-view clustering: Kernel methods map data into high-dimensional spaces to handle nonlinearity, making it easier to cluster data that are not linearly separable. This method is useful for addressing the diversity in shapes across views. However, learning optimal kernels and performing dimensionality reduction are computationally expensive tasks, and the risk of overfitting increases as the number of views grows [43, 18, 44].

• Subspace multi-view clustering: Subspace multi-view clustering aims to learn a unified feature representation by assuming that all views share this representation. It can be divided into subspace-based methods [45, 46] and matrix factorization approaches [47], both of which are designed to analyze low-dimensional representations embedded in multiple views. However, conventional methods often struggle to capture high-dimensional information from nonlinear subspaces, or overlook the high-level relationships between fundamental partitions obtained by clustering in a single view. These relationships could bridge the gap between heterogeneous feature spaces and improve clustering performance.

Moreover, existing multi-view ensemble clustering methods often overlook the noise that arises in the data generation phase. To overcome this challenge, recent approaches have introduced new methods that incorporate regularization techniques and denoising strategies. For example, Zheng et al. [48] proposed a multiview subspace clustering method that combines hypergraph p-Laplacian regularization with low-rank subspace learning. This method was developed to capture complex hierarchical structures in the data while effectively mitigating the effects of noise. By integrating ensemble strategies, noise reduction, and weight adaptation, these techniques improve the robustness of multi-view clustering and lead to better performance in heterogeneous and noisy environments.
Similar approaches that fuse regularization, denoising, and low-rank learning in multi-view environments can be found in recent studies such as [49], which proposes a robust low-rank graph multi-view clustering method that integrates spectral embedding, non-convex low-rank approximation, and noise handling.

• Deep learning based approaches: The integration of deep learning with multi-view clustering leverages neural networks to create joint representations, significantly improving clustering performance in datasets with multiple views. However, current models have been criticized as shallow, as they directly map multi-view data to low-dimensional space, often neglecting essential nonlinear structure information within each view [50].

To address these limitations, the authors in [51] propose DeConFCluster, an unsupervised multi-view clustering framework based on Deep Convolutional Transform Learning (CTL). By eliminating the need for an additional decoder network during training, DeConFCluster reduces overfitting in data-constrained scenarios, a common drawback of encoder-decoder-based methods. Furthermore, the model incorporates a K-Means-inspired loss function, enhancing representation learning for clustering tasks. The framework outperforms state-of-the-art multi-view deep clustering techniques on five benchmark datasets, showcasing its efficacy in capturing both joint representations and nonlinear structural information.

• Graph-based multi-view clustering: Graph-based methods integrate multiple views by constructing a unified graph from individual similarity matrices. The relationship between the data points of each view is represented as a graph, and the fusion of these graphs helps to capture complex interactions between the views [52, 53].
This approach is powerful for managing diverse relationships, but it also brings computational challenges in graph construction and matrix fusion, especially for large datasets [54]. A key problem is the selection of relevant views for graph creation, as not all views contribute equally to clustering. Some views may introduce noise or redundancy, so it is important to identify and prioritize the most informative ones. Methods such as adaptive graph learning and regularization techniques have been proposed to address these issues and optimize both computational efficiency and clustering accuracy [52, 54].

• Anchor-based methods: These methods meet the main objectives of multi-view clustering by providing a scalable approach to processing large datasets. By selecting representative samples (anchors) to approximate the entire dataset, scalability challenges are effectively addressed while essential data structures are preserved. This reduces computational complexity and promotes efficiency when processing diverse, large-scale data sources. Anchors also contribute to robustness by capturing important patterns across views, ensuring consistent performance even in the presence of noise or incomplete data. In addition, they offer flexibility, as they can be seamlessly adapted to different clustering frameworks and methods. Ultimately, anchor-based methods preserve the quality of clustering results while optimizing resource utilization.

A notable technique in this area is Anchor Graph Regularization (AGR) [13], which constructs a similarity graph from the selected anchors and enables efficient computations. In multi-view clustering, these methods are extended by selecting anchors that encapsulate common information across multiple views, improving both computational efficiency and data representation.
Recent research, including work by [14, 15, 16, 17], has further explored anchor-based approaches, highlighting their ability to identify common patterns across different views and to provide practical solutions for large-scale clustering tasks.

• Other approaches address the challenges of multi-view clustering by tackling specific limitations of traditional methods. For instance, the work in [55] focuses on two significant issues: (1) the inability of hard clustering techniques to capture the uncertainty between samples and clusters, and (2) the challenge of effective incremental learning as the number of views increases. To address these, the study introduces a three-way fuzzy spectral clustering algorithm that generates soft clustering results, effectively modeling uncertainty. Furthermore, it incorporates an incremental learning mechanism based on sequential decision-making to handle dynamically increasing views. By combining these advancements, the proposed multi-view clustering algorithm based on sequential three-way decision-making achieves enhanced clustering accuracy and efficiency, as validated through experimental evaluations.

2.2 Practical Applications of Multi-view Clustering

Multi-view clustering has found wide application in areas where the integration of multiple data views enables a more comprehensive understanding of the problem at hand. This section highlights the main areas where multi-view clustering methods have been used effectively:

• Healthcare and medical diagnostics: Multi-view clustering methods have proven their effectiveness on various types of data, including medical imaging, multi-omics data and physiological signals such as EEG. In medical imaging, different imaging modalities (e.g. MRI, CT scans, PET scans) are combined to improve diagnostic accuracy.
Similarly, when integrating multi-omics data, consistent clustering patterns (common to all omics levels) can be identified alongside differential patterns (specific to individual omics types), helping to uncover biologically meaningful correlations and insights into disease mechanisms [56]. Multi-view clustering has also been successfully applied to EEG signals, enabling the classification of brain activity into different patterns, such as seizures and non-seizures, and thus improving diagnostic precision [57]. In addition, these techniques are used to identify disease subtypes, support personalized treatment planning and predict patient outcomes by integrating multiple clinical and biological data sources [56, 58, 59].

• Social network analysis: In social networks, multi-view clustering can be used to analyze data from different sources such as social interactions, user behavior and content. By combining multiple views, e.g. interactions between users, posts and metadata, multi-view clustering enables more accurate identification of communities, influencers and trends within the network [60, 61, 62].

• Computer vision and image processing: Multi-view clustering has been widely applied in computer vision, especially in object recognition and scene analysis, tasks that benefit from integrating information from multiple views or modalities to improve model accuracy. One notable application is the tracking of pedestrians in crowded environments where targets are temporarily occluded and reappear. To solve this problem, a novel multi-view clustering method has been developed that improves tracking accuracy by utilizing the correlation of fusion features within a multi-view system. This method is particularly beneficial in cases where pedestrians are occluded or reappear after disappearing briefly [63].
In object detection and scene analysis, multi-view clustering enhances recognition accuracy by integrating multiple views of the same scene. Techniques employing multi-view learning, such as those using deep neural networks, effectively capture inter-view relationships, leading to improved performance in complex tasks like 3D object recognition and scene segmentation. These methods have proven successful in dynamic environments, offering significant advancements in scene understanding and recognition accuracy [64, 65].

• Natural language processing: In natural language processing, multi-view clustering has been successfully applied to document clustering and sentiment analysis. These applications often involve integrating different views of text data, such as syntactic and semantic information, to improve the quality and relevance of clustering results. Multi-view clustering techniques have shown effectiveness in these tasks by capturing diverse features from the text, which helps group documents with similar content or sentiment [66, 57, 67, 68, 69].

• Recommendation systems and personalized medicine: Multi-view clustering plays a crucial role in data fusion, as it enables the integration of different data views for more informed decision making. In recommendation systems, multi-view clustering methods combine different user data sources (e.g. behavior, preferences and demographics) to provide personalized recommendations. Multi-view clustering has also been used in personalized medicine to identify disease subtypes by combining different medical views (e.g. imaging data and genetic information). For example, a multi-view learning approach has been developed to identify imaging-related subtypes of mild cognitive impairment (MCI).
The approach uses techniques such as Deep Generalized Canonical Correlation Analysis (DGCCA) to learn low-dimensional correlated embeddings, which significantly advances personalized medicine and medical diagnostics [59].

These practical applications emphasize the versatility and impact of multi-view clustering techniques and demonstrate their potential in various domains. This section has provided readers with both an overview of existing approaches and practical insights into how these methods effectively address complex, real-world challenges.

In the following sections, we review different methodological approaches to multi-view clustering (MVC), each addressing distinct aspects of the problem rather than forming a step-by-step optimization process. Section 3 presents classical MVC methods that serve as the foundation for more advanced techniques. Section 4 focuses on graph-based approaches, which incorporate graph structures to enhance clustering performance by capturing relationships across views. Finally, Section 5 explores MVC techniques designed to handle missing or incomplete data, a critical challenge in real-world applications. These three sections provide complementary perspectives on MVC, highlighting methodological advancements tailored to specific challenges in multi-view learning.

Table 2: Compilation of multi-view clustering methods developed in the last five years. Each entry lists the reference, year, strategy labels, a summary of the clustering strategy, and the datasets used.

[15] (2025) Graph learning; anchor-based; bipartite graph.
Strategy: Integrates anchor graph learning and subspace graph construction into a unified optimization framework based on a bipartite graph; jointly optimizes the projection matrix, consensus anchor matrix, and similarity matrix, with connectivity constraints that form clusters directly.
Datasets: image datasets; multi-view datasets.

[70] (2025) Hypergraph; non-negative matrix factorization; tensor Schatten p-norm.
Strategy: Reconstructs missing views using a hypergraph, capturing both local structures and higher-order relationships; integrates representation learning and clustering into a one-step framework, avoiding the suboptimal results of two-step approaches.
Datasets: incomplete multi-view datasets.

[71] (2025) Multi-level graphs; deep non-negative matrix factorization.
Strategy: Addresses the challenge of balancing diversity and consistency across multiple views; integrates feature learning, multi-level topology representation, and clustering into a unified framework, using deep non-negative matrix factorization (DNMF) to learn multi-level (hierarchical) representations of objects.
Datasets: image datasets; benchmark datasets.

[72] (2024) Sparse graph learning.
Strategy: Addresses the challenge of allocating the contributions of different views by assigning view-specific weights instead of equal weights; directly obtains cluster indicators by applying low-rank constraints, eliminating the need for post-processing.
Datasets: Caltech101-07, Caltech101, UCI-digit, Mfeat, STL10.

[73] (2024) Graph learning; bipartite graph; dynamic adaptation.
Strategy: A learnable graph filter dynamically refines the original feature space, progressively filtering out noise and producing a smooth, clustering-friendly representation; a unified bipartite graph combines multi-granular structural information from different views, capturing both distinct and shared features.
Datasets: multiple benchmark datasets.

[74] (2024) Deep learning.
Strategy: Combines the flexibility of deep learning with the statistical benefits of data-driven and knowledge-driven feature selection, providing interpretable results.
It learns nonlinear relationships in multi-view data by using deep neural networks to create low-dimensional, view-independent embeddings, while imposing a regularization penalty on the reconstructed data; the normalized graph Laplacian models bilateral relationships between variables within each view, promoting the selection of related variables.
Datasets: Holm Breast Cancer Study; LGG Dataset (grades 2 and 3); Shear-Transformed MNIST Dataset.

[75] (2024) Graph-based; subspace-based; kernel-based.
Strategy: Incomplete multi-view subspace clustering based on multiple-kernel completion, low-redundancy representation learning, and a weighted tensor low-rank constraint; the unified objective function combines the intact view-specific subspaces and the hidden low-rank tensor.
Datasets: BBC-Sport, MSRCv1, 100Leaves, NGs, 3Sources, ORL.

[76] (2023) Graph-based; subspace-based; anchor-based.
Strategy: Anchor-based multi-view subspace clustering with graph learning; integrates anchor learning and the construction of coefficient matrices through a unified optimization procedure that exploits the global and local structure of the samples and the learned anchors.
Datasets: Caltech101-7, Caltech101-20, SUN-RGBD, Animal, AwA, NUSWIDEOBJ, YoutubeFace.

[77] (2023) Graph-based; anchor-based. (See Section 6.4 for the formal definition of the method.)
Strategy: High-order multi-view clustering based on graph filtering, intrinsic relationships up to infinite order, adaptive graph fusion, and anchors selected by high-order structure.
Datasets: ACM, DBLP, IMDB, Amazon Photos/Computers.

[78] (2023) Graph-based.
Strategy: Integrates adaptive weighting, Laplacian (spectral) embedding, consensus graph learning and discrete indicator matrix learning into a unified framework; outputs clustering results directly without post-processing.
Datasets: Leaves100, COIL100, NGs, BBCSport, HW, ORL, Mfeat, ALOI.

[79] (2023) Graph-based; kernel-based.
Strategy: Unified single-phase multi-view clustering with consensus graph learning and spectral representation; jointly generates the similarity graphs of the views and their joint similarity matrix using a unified global objective function. It takes a kernelized representation of the features as input and directly returns the individual graphs, the joint graph, the joint spectral representation and the cluster assignments.
Datasets: ORL, COIL20, BBCSport, MSRCv1, MNIST-25000.

[80] (2023) Graph-based; kernel-based.
Strategy: Multi-view clustering via a kernelized graph and non-negative embedding; based on a single global criterion that jointly provides the consistent similarity matrix for all views, the consistent spectral representation, the soft cluster assignments and the view weights.
Datasets: COIL20, ORL, Out-Scene, MNIST, BBCSport, MSRCv1, Caltech101-7, Extended-Yale.

[81] (2023) Low-rank tensors.
Strategy: Learns low-rank tensors that provide a consensus low-dimensional embedding matrix for incomplete multi-view clustering; learns individual low-dimensional embedding matrices from incomplete multi-view data, utilizing the self-expression property of high-dimensional data.
Datasets: Reuters, O-Scene, Handwritten, COIL-20, ProteinFold, Flower17, SUN-RGBD, 100leaves, Caltech101.

[82] (2022) Subspace dual clustering.
Strategy: Combines dual clustering and multi-view subspace learning to simultaneously discover the consensus representation and the dual-clustering structure using alternating optimization; a unified framework jointly explores clustering and subspace learning.
Datasets: real-world multi-view dual- and single-clustering datasets.

[83] (2022) Virtual-label Guided Matrix Factorization (VLMF).
Strategy: Utilizes graph regularization to capture geometric structure, and virtual-label guided matrix factorization to recover and learn consensus latent representations; integrates clustering and latent representation learning into a joint optimization process.
Datasets: incomplete multi-view datasets.

[84] (2022) Double Embedding-Transfer-based Multi-view Spectral Clustering (DETMSC).
Strategy: Incorporates two types of embeddings, consistency embedding and feature embedding; knowledge transfer between them is achieved via bipartite-graph co-clustering, which improves clustering accuracy by learning both consistency across views and diversity of features; robustness to noisy data is enhanced through sparse constraints.
Datasets: real-world benchmark datasets.

[85] (2022) Anchor-based Incomplete Multi-view Spectral Clustering (AIMSC).
Strategy: Uses anchor points to connect instances from each view and recover missing data; the similarities between data points are derived from their relationship with the anchor points; anchor-based spectral clustering then generates the final clustering results.
Datasets: multiple benchmark datasets.

[86] (2022) Graph-based; anchor-based. (See Section 6.7 for the formal definition of the method.)
Strategy: A unified framework for anchor learning, graph construction and partitioning that keeps the complexity almost linear; through mutual improvement, the model achieves a more discriminative and flexible anchor representation and cluster indicator.
Datasets: Notting-Hill, Caltech101-20, VGGFace2-50, YTF.

[87] (2022) Graph-based; kernel-based.
Strategy: A robust self-tuning multi-view clustering method.
It addresses the problems of existing multi-view clustering methods, such as sensitivity to initialization and a fixed number of clusters, and reduces the influence of outliers.
Datasets: Advertisement, 3Sources, Flowers, Caltech101, ImageS, Cornell, Texas, Washington, Wisconsin.

[17] (2022) Graph-based; subspace-based; anchor-based.
Strategy: Fast parameter-free multi-view subspace clustering with consensus anchor guidance; a subspace clustering method with linear time complexity, joint anchor selection and graph construction, and parameter-free characteristics for large-scale applications.
Datasets: Caltech101-20, Caltech101-all, CCV, SUN-RGBD, NUS-WIDE, AwA, MNIST, YoutubeFace.

[88] (2021) Graph-based; anchor-based; bipartite graph. (See Section 6.5 for the formal definition of the method.)
Strategy: A scalable graph learning framework that incorporates anchor points and the concept of a bipartite graph; in contrast to conventional approaches, a bipartite graph is constructed to represent the relationship between samples and anchor points, and a connectivity constraint represents clusters directly as connected components.
Datasets: Caltech101-7, Citeseer, NUS.

[89] (2021) Graph-based. (See Section 6.3 for the formal definition of the method.)
Strategy: Multi-view spectral clustering via constrained non-negative embedding, overcoming limitations of traditional spectral clustering by integrating constraints for a smoother non-negative embedding and orthogonal columns.
Datasets: COIL20, ORL, Out-Scene, BBCSport, Caltech101, MSRCv1, Extended-Yale, MNIST-10000.

[90] (2021) Graph-based; subspace-based.
Strategy: An approach to similarity merging in multi-view spectral clustering that addresses the challenge of assigning uniform weights to all samples within a view; deals with biased or missing elements in incomplete views, using sparse subspace clustering to form the initial similarity matrices.
Datasets: BBCSport, ORL, Still DB, MSRC, UCI, NUS, 3-Sources.

[91] (2021) Graph-based; subspace-based.
Strategy: An approach to merging similarities in multi-view spectral clustering.
It addresses the challenge of assigning uniform weights to all samples within a view, manages missing elements in incomplete views, and uses sparse subspace clustering to form the initial similarity matrices.
Datasets: BBC, NGs, WebKB, 100leaves.

[92] (2021) Graph-based; subspace-based; kernel-based.
Strategy: A one-step multi-view spectral clustering method that addresses the problem of inconsistency; it splits the non-negative embedding matrix into two matrices, a joint non-negative embedding matrix representing the joint cluster structure and a specific non-negative embedding matrix representing the view-specific cluster structure.
Datasets: COIL20, MSRC-v1, Caltech101-7, Caltech101-20, ORL, 3-Sources.

[93] (2021) Graph-based; subspace-based; graph filtering.
Strategy: A smoothed multi-view subspace clustering that preserves the geometric features of the graph through graph filtering, which simplifies the subsequent clustering process.
Datasets: Handwritten, Citeseer, Caltech101-7.

[94] (2021) Graph-based; subspace-based.
Strategy: A one-step multi-view subspace clustering method with incomplete views; uses low-rank matrix factorization to learn a consensus representation matrix, then combines it with the objective function of non-negative embedding and spectral-embedding subspace clustering.
Datasets: BBCSport, NGs, WebKB, BUAA, ORL, Yale, NUS-WIDE, Caltech101, CCV.

[95] (2020) Graph-based; subspace-based.
Strategy: A subspace-learning-based multi-view clustering method; derives a joint latent representation from the latent subspace rather than the original data space via a linear transformation; the latent representation has a low-rank structure, which reduces computational complexity.
The similarity matrix is then dynamically learned from this latent representation using manifold learning.
Datasets: MSRC-v1, UCI Digits, NUS-WIDE, Scene15.

[96] (2020) Graph-based.
Strategy: An innovative multi-view spectral clustering model capable of performing graph fusion and spectral clustering simultaneously; the fusion graph approximates the original graph of each individual view while maintaining a clear and distinct cluster structure.
Datasets: BBC, Reuters, Digits, Caltech101-20.

[97] (2020) Graph-based.
Strategy: Jointly estimates the similarity graph matrix, the unified graph matrix, and the final cluster assignment through an innovative multi-view fusion technique; imposes a rank constraint on the Laplacian matrix, ensuring the accurate derivation of clusters.
Datasets: two toy datasets (Two-Moon, Three-Ring).

[98] (2020) Graph-based.
Strategy: A simple and efficient approach to multi-view spectral clustering that learns a sparse similarity matrix consistent across all views; the model directly admits a closed-form solution without iteration, and introduces only one additional parameter, which depends on how the similarity matrix is constructed and can in practice be set to a small value.
Datasets: Caltech101, MSRCv1, NUS-WIDE, ORL, 3-Sources.

[99] (2020) Subspace-based.
Strategy: Learns joint information to leverage the underlying correlations across views while capturing view-specific details for each independent view, without being affected by redundancy or the high dimensionality of the data.
Datasets: Yale, MSRCV1, Caltech101-7, BBCSport, CMU-PIE.
[100] (2019) Graph-based; kernel-based. (See Section 6.1 for the formal definition of the MVCSK method.)
Strategy: An innovative multi-view learning model that simultaneously performs the multi-view clustering task and captures similarity relationships in kernel spaces; autonomously assigns optimal weights to each view without additional parameters.
Datasets: Texas, Cornell, Washington, Wisconsin, BBC, BBCSport, NUS-WIDE.

[101] (2019) Co-clustering.
Strategy: Addresses the problem of missing data in multi-view clustering; in contrast to existing multi-view co-clustering approaches, which struggle with incomplete data, especially when different missing-data patterns are present, it uses an indicator matrix that highlights which data elements are observed, so that clustering performance is measured solely on the observed values.
Datasets: clinical data from a heroin treatment study.

[102] (2019) Graph-based; consensus learning; perturbation risk.
Strategy: Addresses clustering of multi-view data with missing instances, using spectral perturbation theory to construct the similarity matrices and learn a consensus Laplacian matrix.
Datasets: 100Leaves, Flowers17, Mfeat, ORL, 3Sources, BBCSport.

[103] (2019) Graph-based.
Strategy: Estimates the number of clusters by enforcing cross-view consensus on view-specific similarity graphs instead of relying on view-specific data representations; to this end, a novel objective function projects the raw data into a space where the projection accounts for geometric consistency and cluster assignment consistency.
Datasets: Caltech101, LandUse-21, Scene-15, Still-DB.
[104] (2019) Subspace-based.
Strategy: A correntropy-based method for dealing with noise in multi-view clustering, using a view-specific embedding from an information-theoretic perspective; the objective function uses the Frobenius norm to efficiently estimate the dense connections between points lying in the same subspace.
Datasets: UCI Digits, 3-Sources, Movies617, BBC, Cora, Washington.

[105] (2018) Graph-based.
Strategy: A graph learning framework that optimizes the initial graphs from the different views using a low-rank constrained Laplacian matrix, leading to a globally unified graph estimate.
Datasets: UCI Digits, Caltech101, Notting-Hill, COIL-20.

3 Classical Methods for Multi-view Clustering

Classical machine learning approaches for multi-view learning encompass diverse strategies. The co-training approach, proposed by Blum et al. [106], uses multiple data views to improve model performance. Each view is trained independently, and the information between the views is iteratively exchanged and refined. Figure 1 shows a graphical illustration of this idea.

Figure 1: A graphical illustration of the co-training scheme with several views. The matrices {Z_1, ..., Z_m} refer to the data matrices of the individual views and {L_1, ..., L_m} are the co-trained models. In this scheme, the information obtained from the individual views is systematically refined and iteratively exchanged, which promotes joint learning from the different views.

Co-training is particularly beneficial in scenarios with little labeled data, as it allows the effective use of unlabeled data. However, practical considerations such as selecting features, addressing class imbalances and defining a robust matching criterion for the selection of instances to be labeled are essential. Co-training has applications in various fields, including natural language processing, computer vision and bioinformatics.
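The co-training loop can be sketched as follows. This is a minimal illustration only: the toy nearest-centroid learner, the confidence margin, and the per-round budget k are our own assumptions for the sketch, not part of the original formulation in [106].

```python
import numpy as np

def fit_centroids(X, y):
    """Class centroids of the labeled points in one view's feature space."""
    return np.array([X[y == c].mean(axis=0) for c in np.unique(y)])

def predict_with_margin(X, centroids):
    """Predicted class plus a confidence margin: the gap between the
    distances to the best and the second-best centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    srt = np.sort(d, axis=1)
    return d.argmin(axis=1), srt[:, 1] - srt[:, 0]

def co_train(X1, X2, y, labeled, n_rounds=10, k=2):
    """Each round, each view labels the k unlabeled points it is most
    confident about; those pseudo-labels become training data for both views."""
    y, labeled = y.copy(), set(labeled)
    for _ in range(n_rounds):
        for X in (X1, X2):                 # the views take turns teaching
            rest = np.array(sorted(set(range(len(y))) - labeled))
            if rest.size == 0:
                return y
            idx = sorted(labeled)
            pred, margin = predict_with_margin(X[rest], fit_centroids(X[idx], y[idx]))
            top = np.argsort(-margin)[:k]  # most confident unlabeled points
            y[rest[top]] = pred[top]
            labeled |= set(rest[top].tolist())
    return y
```

Starting from one labeled seed per class, the two views gradually transfer pseudo-labels to each other until the whole sample is labeled; in a real pipeline the nearest-centroid learner would be replaced by a per-view classifier of choice.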
The co-regularized approach for semi-supervised learning from multiple views presented by Kumar et al. [8] provides a powerful technique. This method combines the principles of co-regularization and semi-supervised learning to jointly learn representations for each view and make predictions in an unsupervised manner. Co-regularization enforces a constraint that keeps the representations of the same instance from different views close to each other, which promotes robust and discriminative representations.

Multi-kernel spectral clustering (MKSC) integrates kernel features from different views into a unified kernel matrix and uses spectral clustering for effective multi-view clustering [107]. Canonical correlation analysis (CCA) is another approach that identifies correlated subspaces between views, as described in [108, 59]. Alternative strategies include joint non-negative matrix factorization, where the data matrices of the different views are factorized into non-negative matrices to capture common and view-specific information. Li et al. [109] demonstrate spectral clustering on the learned representations for multi-view clustering.

Multi-View Spectral Embedding (MVSE) constructs individual similarity graphs for each view and merges them into a unified graph, which is subjected to spectral embedding for low-dimensional representations and subsequent clustering [110]. Chen et al. [111] propose an improved spectral multi-view clustering method that incorporates tissue-like P systems and optimizes the similarity matrices through a weighted iterative process. The approach combines the k-nearest neighbor algorithm, a weighted fusion operation, and an iterative update to obtain non-negative embedding matrices and the clustering results. Figure 2 shows the flow chart of this approach.
The method first determines the similarity matrices of each view (Z) with the k-nearest neighbor algorithm and then fuses all views into a unified matrix P by a weighted fusion operation. The unified matrix P in turn updates the similarity matrix of each view. Through an iterative updating process, the algorithm then takes the updated similarity matrices obtained in the previous step as input and combines the spectral clustering algorithm with the symmetric non-negative matrix factorization algorithm to obtain the non-negative embedding matrix M, which directly yields the clustering results.

Recent advances include joint frameworks that incorporate an orthonormality constraint and co-regularization [112]. This enhances class separability and feature scale consistency, which contributes to improved multi-view clustering performance. In addition, Hajjar et al. [89] introduce constraints that maintain consistent smoothing and enforce orthogonality, improving the robustness and stability of the clustering results. This method overcomes limitations of conventional spectral clustering and provides more reliable and accurate results.

Figure 2: The flow chart of the multi-view spectral clustering method of [111]. First, the similarity matrices of the individual views (Z) are created using the k-nearest neighbor algorithm. Then, all views are combined into a unified matrix P using a weighted fusion operation. This unified matrix P is used to update the similarity matrix of each view. The algorithm iteratively refines the similarity matrices obtained in the previous step. It integrates the spectral clustering algorithm and the symmetric non-negative matrix factorization algorithm to generate the non-negative embedding matrix M, which leads directly to the clustering results.
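The forward pass of this scheme, per-view k-NN similarity, weighted fusion into P, and symmetric NMF of P, can be sketched as follows. This is a simplified one-pass illustration under our own assumptions (fixed view weights and a Gaussian-weighted k-NN graph); the full method additionally feeds P back to refine the per-view similarity matrices, which is omitted here.

```python
import numpy as np

def knn_similarity(Z, k=3):
    """Gaussian-weighted, symmetrized k-nearest-neighbor graph of one view."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude self-similarity
    S = np.zeros_like(d)
    nn = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(Z)), k)
    S[rows, nn.ravel()] = np.exp(-d[rows, nn.ravel()] ** 2)
    return (S + S.T) / 2

def fuse_views(views, weights, k=3):
    """Weighted fusion of the per-view similarity matrices into P."""
    return sum(w * knn_similarity(Z, k) for w, Z in zip(weights, views))

def symnmf(P, n_clusters, n_iter=300, seed=0):
    """Symmetric NMF P ~ M M^T via the standard multiplicative update;
    the rows of M act as soft cluster indicators."""
    M = np.random.default_rng(seed).random((P.shape[0], n_clusters))
    for _ in range(n_iter):
        M *= 0.5 + 0.5 * (P @ M) / (M @ (M.T @ M) + 1e-12)
    return M
```

Hard assignments are then read off as `M.argmax(axis=1)`, which is what "directly output the clustering results" amounts to in this sketch.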
3.1 Kernel-Based Approaches

Multi-view spectral clustering algorithms are designed to handle data with multiple views effectively, exploiting the complementary information present in each view to improve the quality of the clustering results. In pursuit of optimal performance, many multi-kernel methods have been developed whose main goal is to obtain a unified kernel by specifying a predefined kernel for each view and then combining all kernels linearly or nonlinearly.

Kernel functions are an essential part of these methods. They define a measure of similarity or affinity between data points in the transformed feature space. Commonly used kernel functions include the linear kernel, which computes the dot product between data points in the original feature space and thus measures their linear similarity, and the radial basis function (RBF) kernel, which measures similarity based on the Euclidean distance between data points in the transformed space and thus captures nonlinear relationships. Figure 3 gives a graphical illustration of the flow chart of kernel-based approaches for multi-view clustering.

Figure 3: The flowchart of kernel-based multi-view spectral clustering. For multi-view data, we denote (Z_1, Z_2, ..., Z_m) as the data matrices of the m views. An individual kernel is constructed for each view, yielding (K_1, K_2, ..., K_m). A specific clustering method is then selected to cluster the data based on the unified kernel K.

For example, in [100], the authors discuss two kernel-based graph learning multi-view clustering methods with automatic weights. Both map the data into a space where they are linearly separable. The first method uses a single kernel per view, while the second uses a combination of multiple kernel matrices to improve the utilization of the input kernel matrix.
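The generic pipeline of Figure 3 can be sketched as follows. The RBF kernels with a fixed bandwidth, the uniform view weights, and the plain eigenvector-plus-k-means clustering step are simplifying assumptions on our part; the surveyed methods typically learn the view weights or the unified kernel itself.

```python
import numpy as np

def rbf_kernel(Z, gamma=1.0):
    """RBF kernel K_v for one view's data matrix (rows are samples)."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def unified_kernel(views, weights=None):
    """Linear combination K = sum_v w_v K_v of the per-view kernels."""
    if weights is None:
        weights = [1.0 / len(views)] * len(views)
    return sum(w * rbf_kernel(Z) for w, Z in zip(weights, views))

def spectral_cluster(K, n_clusters):
    """Embed with the leading eigenvectors of the unified kernel (used as an
    affinity matrix), then run Lloyd's k-means with farthest-point seeding."""
    _, vecs = np.linalg.eigh(K)            # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]              # leading eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    centers = [U[0]]                       # deterministic farthest-point init
    for _ in range(1, n_clusters):
        gap = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(gap)])
    centers = np.array(centers)
    for _ in range(50):                    # plain Lloyd iterations
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([U[labels == c].mean(axis=0) if (labels == c).any()
                            else centers[c] for c in range(n_clusters)])
    return labels
```

The key design point is that the views interact only through the unified kernel K; swapping the fixed average for learned weights w_v is precisely where the methods discussed in this section differ.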
Moreover, these methods simultaneously estimate the unified similarity matrix, the consistent spectral projection matrix, and the weight of each view without additional parameters. Many of these approaches follow a two-step process to achieve clustering results. The initial step involves learning the joint affinity matrix, while the second step utilizes a hard clustering method such as k-means to obtain the final result. To address the inconsistency arising from the fact that the primary objective of the first step is not optimal clustering performance, a novel method has been introduced. In [113], the authors present One-step Multi-view Spectral Clustering (OMSC), which integrates, in one framework, the learning of each view's affinity matrix, the joint affinity matrix learned from the low-dimensional space of the data, and the k-means clustering step. The joint affinity matrix is considered the final clustering assignment. Moreover, the weighting of each view is learned automatically to reduce the impact of noisy views. In [114], the authors jointly estimate an optimal graph and an appropriate consensus kernel for clustering by forcing the global kernel matrix to be a convex combination of a set of basis kernels. Their proposed model enforces a regularization of the unified graph and the final kernel matrix. However, the performance of multi-kernel-based methods strongly depends on the predefined kernel, including the kernel type (e.g., Gaussian, linear, and polynomial kernels) and the corresponding parameters. In addition, most of these methods do not exploit the label space of the data (i.e., the soft cluster assignments) and only extract information from the data space. New approaches have been developed to utilize the information in the label space.
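The conventional two-step pipeline criticized above (first learn an affinity matrix, then apply a hard clustering method) can be sketched as follows; the affinity used here is a simple Gaussian similarity on synthetic data, not a learned joint affinity:

```python
import numpy as np

def spectral_embedding(A, c):
    """Step 1: embed the data with the c eigenvectors of the normalized
    Laplacian corresponding to its smallest eigenvalues."""
    d = A.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_is[:, None] * A * d_is[None, :]
    _, vecs = np.linalg.eigh(L)          # eigh returns ascending eigenvalues
    return vecs[:, :c]

def kmeans(F, c, iters=50):
    """Step 2: hard-cluster the embedding with plain k-means
    (farthest-point initialization keeps the sketch deterministic)."""
    centers = [F[0]]
    for _ in range(c - 1):
        dist = np.min([((F - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(c):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels

# two well-separated blobs give a nearly block-diagonal affinity
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
A = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
labels = kmeans(spectral_embedding(A, 2), 2)
```

One-step methods such as OMSC fold both stages into a single optimization so that the affinity is learned with clustering quality as the explicit objective.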
For instance, in [115], the authors develop a new method that incorporates a non-negative embedding matrix, which can be used as a cluster indicator matrix to perform the final cluster assignment without further post-processing steps such as k-means or spectral rotation. Unlike other multi-view clustering methods, this method can create a new graph based on the soft cluster labels. Moreover, it smoothes the cluster-label indices over the data graphs and the label graph, which improves performance.

3.2 Subspace-Based Approaches

Subspace clustering has been extensively researched over the last ten years due to its promising performance. Basically, it assumes that the data points come from multiple low-dimensional subspaces, with each cluster fitting into one of these low-dimensional subspaces. Considerable progress has been made in uncovering these underlying subspaces. In this context, several innovative methods have been proposed to overcome the challenges associated with multi-view data. In this section, four notable works are presented, each of which presents a different approach for clustering subspaces from multiple perspectives. The work of Zhang et al. [116] provides a method called Latent Multi-view Subspace Clustering (LMSC). LMSC introduces a unique perspective by clustering data points with latent representations while exploring complementary information from multiple views. Unlike traditional single-view subspace clustering, LMSC searches for an underlying latent representation, resulting in more accurate and robust subspace representations. The method is intuitive and is efficiently optimized by the Augmented Lagrangian Multiplier with Alternating Direction Minimization (ALM-ADM) algorithm, which has been confirmed by extensive experiments on benchmark datasets. Khan et al.
[117], however, address a limitation in existing multi-view subspace clustering approaches by considering the structure of self-representation through consistent and specific representations. Based on low-rank sparse representations, the method uncovers global common representation structures among views and preserves geometric structures based on consistent and specific representations. The work in [118] focuses on integrative clustering for high-dimensional data with multiple views based on manifold optimization. To identify consensus clusters, the algorithm constructs a joint graph Laplacian and optimizes a joint clustering target while minimizing the disagreements between individual and joint views. This optimization is performed alternately over k-means and Stiefel manifolds [119], modeling nonlinearities and differential clusters within individual views. The convergence of the algorithm is demonstrated over the manifold, and experimental results on benchmark and multi-omics cancer datasets show its superiority over existing multi-view clustering approaches. The work of [17] addresses the efficiency challenges in multi-view subspace clustering by proposing Fast Parameter-free Multi-view Subspace Clustering with Consensus Anchor Guidance (FPMVS-CAG). This method performs anchor selection and subspace graph construction in a unified optimization formulation, thus promoting clustering quality. FPMVS-CAG has linear time complexity with respect to the number of samples and automatically learns an optimal anchor subspace graph without additional hyperparameters.

3.3 Integration of Deep Learning with Multi-view Clustering

Integrating deep learning with multi-view clustering is a modern approach that uses neural networks to create joint representations for different views of data. This strategy improves clustering performance by learning feature representations that capture both intra-view and inter-view relationships.
However, despite the growing interest and proliferation of algorithms based on various theories, many existing models are superficial and map multi-view data directly to low-dimensional spaces. To address the limitation of shallow models, deep multi-view clustering algorithms have emerged as a promising solution. Du et al. [120] propose a methodology utilizing multiple autoencoders with a layer-wise approach to capture nonlinear structural information within each view. This model incorporates local invariance within a view and integrates consistent, complementary information between views. The algorithm surpasses shallow models by unifying representation learning and clustering in a cohesive framework, enabling joint optimization of both tasks. In a similar context, [74] combines the flexibility of deep learning with the statistical benefits of data-driven and knowledge-driven feature selection, providing interpretable results. It learns nonlinear relationships in multi-view data by using deep neural networks to create low-dimensional, view-independent embeddings, while imposing a regularization penalty on the reconstructed data. The method uses the normalized Laplacian of a graph to model bilateral relationships between variables within each view, promoting the selection of related variables. Other approaches [121] attempt to overcome the limitations of traditional deep multi-view subspace clustering approaches by using a decomposed optimization strategy that involves three stages: pre-training with autoencoders to extract multiscale features, fine-tuning to learn consensus self-expressions and generate high-quality pseudo-labels, and re-training with self-label supervision for robust clustering. This approach improves clustering performance by addressing issues such as skew in pipelined methods and complex parameter optimization in end-to-end methods.
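As a shallow stand-in for the per-view autoencoders discussed above (a linear autoencoder at its optimum recovers PCA), the following sketch encodes each view separately and concatenates the codes into a joint representation; the deep methods replace the SVD with nonlinear encoders trained jointly with the clustering objective:

```python
import numpy as np

def linear_view_embedding(X, dim):
    """Linear-autoencoder optimum = PCA: project a centered view onto its
    top `dim` principal directions to obtain a view-specific code."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

rng = np.random.default_rng(1)
views = [rng.normal(size=(30, 10)), rng.normal(size=(30, 20))]
codes = [linear_view_embedding(X, 3) for X in views]
joint = np.hstack(codes)   # fused representation fed to a clustering algorithm
```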
• Deep subspace approaches: Multi-view subspace clustering (MVSC) uses complementary information from multiple views to achieve better clustering compared to single-view methods. However, it often struggles with high-dimensional and noisy raw data. To overcome this, the self-guided deep multi-view subspace clustering (SDMSC) model [122] combines deep feature embedding with subspace analysis to uncover a reliable consensus data affinity relationship across views and embedding spaces. By using raw feature affinities as supervision signals, SDMSC guides the embedding process itself, reduces the risk of poor local minima, and improves clustering performance. Similarly, Zhu et al. [123] address the major limitations of traditional MVSC methods, such as insufficient multi-view integration and lack of end-to-end learning, with the Multi-view Deep Subspace Clustering Network (MvDSCN). This approach uses two sub-networks: Dnet for view-specific representations and Unet for joint representations. Deep convolutional autoencoders are used to construct a multi-view self-representation matrix in an end-to-end manner.

• Advances in graph neural networks: Recent advancements in graph neural networks (GNNs) have provided new methods for multi-view learning. GNNs have garnered significant attention in recent years [124, 125, 126], with their primary idea being the embedding of node representations by capturing and aggregating information from local neighborhoods. This technique has shown great promise, especially in tasks such as social network analysis, recommendation systems, anomaly detection, and the analysis of physiological data [127, 128, 129, 130]. GNNs integrate information from multiple views, either through graph structures or self-supervised learning, leading to improved predictions at both the node and graph levels, and efficiently utilizing unlabeled data.
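The neighborhood-aggregation idea behind GNNs can be illustrated with a single mean-aggregation layer (a simplified variant of standard graph convolution, not a specific model from the cited works):

```python
import numpy as np

def gnn_layer(A, H, W):
    """One message-passing step: each node averages the features of its
    neighborhood (including itself) and applies a linear map plus ReLU."""
    A_hat = A + np.eye(len(A))           # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1)      # inverse degrees for mean aggregation
    return np.maximum(D_inv[:, None] * (A_hat @ H) @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
H = np.eye(3)                            # one-hot initial node features
W = np.random.default_rng(2).normal(size=(3, 2))
H1 = gnn_layer(A, H, W)                  # updated node embeddings
```

Stacking such layers (one per view, or on a fused graph) is one way multi-view information enters a GNN.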
• Generative models for multi-view learning: In parallel, generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have been adapted to handle multi-view data [131, 132, 133]. These models enable the generation of data samples from multiple views, offering advantages such as data augmentation and enhanced model robustness. Adversarial multi-view learning techniques, in particular, align representations across different views while preserving view-specific characteristics, thus addressing the challenges posed by view heterogeneity. For instance, Tang et al. [134] propose Consistent and Diverse Deep Latent Representations (CDDLR), which integrates K-means with spectral clustering to enforce consistency and diversity in the latent representations. This approach employs Laplacian-regularized deep neural networks to maintain diversity and consistency, providing a robust solution for multi-view subspace clustering.

• Solutions for deep multi-view subspace clustering: To address challenges in deep multi-view subspace clustering, Shi and Zhao [135] introduce a novel solution known as Deep Multi-View Clustering based on Reconstructed Self-Expressive Matrix (DCRSM). This approach utilizes a reconstruction module to approximate self-expressive coefficients and resolves scalability issues by employing a minimal number of training samples. It effectively combines common and specific layers, facilitating the fusion of consistent and view-specific information, leading to improved clustering outcomes. Additionally, Wang et al. [121] propose Decomposed Deep Multi-View Subspace Clustering with Self-Labeling Supervision (D2MVSC), which addresses traditional challenges in deep multi-view subspace clustering through a decomposed optimization strategy. This method includes a three-stage training process, adaptive fusion, and structure supervision, improving clustering accuracy and robustness.
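The self-expressive coefficient matrix that these deep subspace methods approximate has a simple shallow analogue: with a least-squares (Frobenius) regularizer it admits a closed form, sketched below as a linear illustration only, not as any of the deep models above:

```python
import numpy as np

def self_expressive_affinity(X, lam=0.1):
    """Closed-form least-squares self-representation (columns are samples):
    min_C ||X - X C||_F^2 + lam ||C||_F^2, followed by the usual symmetric
    affinity (|C| + |C|^T) / 2 used for spectral clustering."""
    n = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(n), G)   # (G + lam I)^{-1} G
    np.fill_diagonal(C, 0.0)                      # forbid trivial self-representation
    return (np.abs(C) + np.abs(C).T) / 2

rng = np.random.default_rng(3)
# two 1-D subspaces (lines) in R^3; columns of X are the data points
b1, b2 = rng.normal(size=(3, 1)), rng.normal(size=(3, 1))
X = np.hstack([b1 @ rng.normal(size=(1, 8)), b2 @ rng.normal(size=(1, 8))])
W = self_expressive_affinity(X)
```

Deep variants learn C (or a reconstruction of it) on autoencoder features instead of on the raw columns of X.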
• Further contributions to multi-view learning: Wang et al. [136] contribute to subspace-based multi-view learning by proposing Multi-view Orthonormalized Partial Least Squares (MvOPLS). This framework incorporates regularization techniques for model parameters, decision values, and latent projected points. By extending the framework with nonlinear transformations using deep networks, MvOPLS achieves superior performance across diverse multi-view datasets compared to existing methods.

These innovations mark significant progress in deep multi-view subspace clustering and contribute to the ongoing evolution of multi-view learning techniques. Recent advancements in multi-view learning underscore the growing potential of multi-view data to enhance model performance, robustness, and generalization. The choice of algorithm often depends on the specific characteristics of the data and the nature of the problem. Approaches such as deep learning and generative techniques provide comprehensive solutions, enabling the effective handling of complex multi-view data and improving clustering outcomes across a variety of applications.

4 Exploring Graph-Based Multi-View Clustering

Graph-based multi-view clustering, an essential aspect of data analysis, has made significant progress with the introduction of various methods. This section provides an overview of the latest approaches developed over the last five years, summarizing key contributions, including novel algorithms and methods that address the main challenges. These contributions provide valuable insights into the evolving landscape of multi-view clustering and offer significant contributions to the broader field of machine learning and data analytics. Researchers working on multi-view clustering will find this compilation valuable for understanding and navigating these developments.

Figure 4: The flowchart of graph-based multi-view clustering, adapted from [97].
In this scheme, the data matrix of each view is converted into a graph matrix, followed by applying a fusion method across all views to create a unified graph.

• Multi-view graph clustering: The goal is to find a fusion graph across all views and apply graph-cut algorithms (e.g., spectral clustering) on the fusion graph to produce the final clustering result. We focus on graph-based clustering, where each view's data points are represented by a graph. These graphs are combined to facilitate clustering of the entire dataset, utilizing complementary information from different views to improve cluster mapping and provide a more comprehensive understanding of complex data structures.

• Schematic representation of graph-based methods: Figure 4 illustrates the process. Here, the data matrix from each view is converted into a graph matrix, followed by a fusion method applied to create a unified graph. The fusion process derives the weights for each view automatically using a novel multi-view fusion technique. The unified graph similarity matrix is learned jointly with the individual graph similarity matrices, representing pairwise similarities between data samples.

• Limitations and challenges in graph-based multi-view clustering: Traditional graph-based methods face several challenges that impact their performance. Issues such as sample selection bias, additional clustering steps, and variations in similarity metrics complicate the clustering process [97]. Furthermore, specific challenges in graph-based methods include:

– Post-processing necessity: Final clustering results often require methods like K-means or spectral rotation to achieve consistent spectral embeddings, introducing uncertainties due to initialization.

– Parameter selection: Many graph-based models introduce additional parameters, complicating a task that is inherently unsupervised.
– High computational costs: Spectral multi-view clustering methods require eigenvalue decomposition with a computational complexity of O(n^3), and multi-view subspace clustering requires matrix inversion with similar computational complexity, making these methods particularly challenging for large-scale data [137]. Approaches like ASMV (Adaptive Similarity Metric Fusion for Multi-view Clustering) [138] and GMC (Graph-based Multi-view Clustering) [97] have been proposed to mitigate these limitations and improve the effectiveness of graph-based multi-view clustering.

• A novel solution - MCGLSR: In response to these challenges, a new method called Single Phase Multi-view Clustering using Unified Graph Learning and Spectral Representation (MCGLSR) [79] has been proposed. Unlike conventional methods that directly integrate similarity matrices (which may introduce noise), MCGLSR generates similarity graphs and a joint similarity matrix using a unified global objective function. This ensures that similarity matrices from different views align effectively, mitigating noise and promoting a more coherent data structure. The method outputs individual graphs, a joint graph, a joint spectral representation, and cluster mappings, eliminating the need for an external clustering algorithm.

• Incomplete data handling: The next section addresses the challenges of multi-view clustering with incomplete data. [75] presents a breakthrough solution to the limitations of subspace clustering in the context of incomplete data. This approach uses a multiple kernel completion scheme to detect intact kernels and ensures the learning of complete, low-redundancy representations. By integrating multi-view subspaces with a weighted tensor low-rank constraint, the method explores higher-order relationships between views and assigns appropriate weights to each view.
The result is a unified model that learns low-redundancy representations, view-specific subspaces, and their low-rank tensor structure, significantly improving subspace clustering in incomplete data scenarios.

5 Multi-View Clustering With Missing or Incomplete Data

The rapid development of technology has led to a massive growth in data, with multi-view data offering complementary information from different sources. This makes multi-view data more informative than single-view data. However, in real-world applications, multi-view data is often incomplete. For example, a video without sound or data points with missing features represent incomplete views. Traditional multi-view clustering techniques cannot handle such incomplete data [34]. Dealing with incomplete multi-view data has become an important research topic that has attracted much attention in recent years [139, 20, 140]. Several methods have been proposed to solve this problem:

• Filling missing views: A common approach is to fill missing views with zeros or the average values of the available instances for that view [141, 142]. In addition, weighted non-negative matrix factorization with ℓ2,1 regularization has been used to improve clustering performance on incomplete views. However, this method often results in clustered data that lacks meaningful information, as filled values may cluster unrelated samples together.

• Consensus-based anchor guidance: A more reasonable approach, proposed in [17], emphasizes the use of the available information in incomplete multi-view data. This method uses consensus-based anchor guidance to achieve faster and more robust clustering, especially in image processing applications.

• Non-negative matrix factorization (NMF): In [143], NMF was used to construct a latent subspace from the available multi-view data, ensuring correct alignment between the corresponding instances across the different views.
In [144], graph embedding techniques were applied to capture global structure in heterogeneous data with missing values and improve clustering of multimodal visual data.

• Hypergraph-based approach: To address the challenges of processing incomplete multi-view data, the authors in [70] propose a hypergraph-based approach. In contrast to traditional methods that focus on pairwise relations, their approach captures the higher-order relations in the data using a hypergraph. This enables a more comprehensive exploration of both local and global data structures. Furthermore, the method combines non-negative matrix factorization with orthogonality constraints and K-means clustering, eliminating the need for post-processing.

Apart from these approaches, additional techniques such as sparsity constraints, weighted learning, graph learning, local embedding structures, and global embedding structures have been integrated into Incomplete Multi-View Clustering (IMVC) models to further improve clustering performance [145, 146, 147]. Despite the success of these techniques, there are still challenges to be addressed. Many models treat the contributions of all views equally, ignoring the varying discriminative information each view may provide. Additionally, IMVC methods are prone to misclassification, often resulting in fewer clusters than actually exist. Furthermore, there is a risk of imbalanced clustering, where some clusters are over-partitioned while others are under-represented, even when the data is balanced. The challenge of clustering multi-view data with missing information remains an active area of research. Future work should refine existing methods, explore new techniques, and address specific application areas to enhance the handling of incomplete multi-view data.
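As a concrete baseline for the filling strategy discussed in this section, mean imputation of a view's missing instances can be sketched as follows (as noted above, such filled values carry no real information and may group unrelated samples):

```python
import numpy as np

def fill_missing_view(X, present):
    """Baseline completion: replace the missing instances of one view by the
    mean of the available ones. `present` is a boolean mask over samples."""
    X = X.copy()
    X[~present] = X[present].mean(axis=0)   # zeros would be the cruder variant
    return X

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 4))                # one view, 10 samples
present = np.array([True] * 7 + [False] * 3)  # last 3 instances are missing
X_filled = fill_missing_view(X, present)
```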
6 Formal Review of Typical Approaches

Graph-based multi-view clustering (MVC) methods have made significant progress in recent years and have overcome important challenges in this area. The main ones are constructing reliable similarity graphs, mastering the computational complexity of large datasets, ensuring consistency between different views, and optimizing the integration of information from multiple views. A critical problem in MVC is the construction of the similarity graph for each view, which often requires large amounts of memory and computational resources. Many methods have developed more efficient techniques for constructing graphs, either by integrating multiple views into a common framework or by using scalable algorithms that reduce computational costs without sacrificing accuracy. Another major challenge is effectively assigning appropriate weights to views, which may have different importance in different datasets. Several methods have addressed this problem by proposing automatic or adaptive weight assignment mechanisms that eliminate the need for manual tuning and improve the robustness of the clustering process. Consistency across multiple views is another key concern in MVC. Since different views may represent different perspectives or aspects of the same data, it is critical that the clustering results remain consistent across views. Newer approaches address this issue by explicitly modeling both consistent and inconsistent information, improving the overall accuracy and reliability of the clustering process. Finally, the integration of the various components involved in clustering, such as similarity matrices, spectral representations, and soft cluster assignments, has traditionally been handled sequentially. However, newer methods take a more integrated approach by optimizing these components together in a single framework, thus improving the efficiency and consistency of clustering results.
By overcoming these challenges, the methods described in the following sections contribute to scalable, robust, and effective multi-view clustering and push the boundaries of what is possible with large, complex datasets. To begin, we introduce the primary notation used in this section, where matrices are shown in bold capital letters and vectors in bold lowercase letters. Let X^(v) = (x_1^(v), x_2^(v), ..., x_n^(v)) ∈ R^(n × d_v) be the data matrix of view v, where n is the number of data instances and d_v the number of features in view v, with v = 1, ..., V. Given a matrix A, its trace is denoted by Tr(A) and its transpose by A^T. A_ij denotes an element of the matrix A.

6.1 Auto-weighted Multi-view Clustering via Kernelized Graph Learning (MVCSK)

Graph-based approaches have been widely adopted for multi-view clustering due to their ability to reveal hidden structures within data. However, a major challenge in these methods is the construction of accurate similarity graphs, which can be influenced by several factors, including the scale of the data, neighborhood size, choice of similarity metric, and the presence of noise and outliers. To overcome these limitations, [100] propose a novel approach to multi-view clustering that simultaneously performs the clustering task and learns similarity relationships in kernel spaces. The key feature of this method is its ability to construct a similarity graph that can be directly partitioned into c connected components, where c is the number of clusters. This ensures that the clustering process is more accurate and aligned with the underlying data structure. One of the most significant innovations of this approach is its automatic weight assignment for each view. Unlike traditional methods, which either require manual assignment of weights or introduce additional parameters, the model in [100] learns the optimal weight for each view as part of the clustering process.
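A simple heuristic illustration of automatic view weighting (not the exact scheme learned in [100]) assigns each view a weight inversely proportional to its disagreement with the other views, so that a noisy view contributes less to the fused graph:

```python
import numpy as np

def auto_view_weights(graphs):
    """Heuristic automatic weighting: weight each view inversely to its
    total Frobenius disagreement with the other views."""
    V = len(graphs)
    disagree = np.array([sum(np.linalg.norm(graphs[v] - graphs[u])
                             for u in range(V) if u != v) for v in range(V)])
    w = 1.0 / (disagree + 1e-8)
    return w / w.sum()                    # weights sum to 1

rng = np.random.default_rng(5)
base = rng.random((12, 12))
base = (base + base.T) / 2                # shared graph structure
graphs = [base + 0.01 * rng.random((12, 12)),   # two views close to the
          base + 0.01 * rng.random((12, 12)),   # shared structure ...
          base + 0.50 * rng.random((12, 12))]   # ... and one noisy view
w = auto_view_weights(graphs)
fused = sum(wi * G for wi, G in zip(w, graphs)) # weighted unified graph
```

Methods such as MVCSK derive the weights from the optimization itself rather than from a fixed heuristic, but the intended effect is the same: down-weighting unreliable views.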
This automatic weighting improves the robustness and adaptability of the clustering method, as it allows the model to adjust to the unique characteristics of the data without requiring prior knowledge of the view importance. Additionally, the proposed model incorporates multiple kernel learning to address the sensitivity to the input kernel matrix. This extension enables the model to better handle nonlinear relationships in the data and ensures that it can capture complex patterns that would otherwise be missed by traditional graph-based clustering methods. The model operates within a joint learning framework, which simultaneously solves three subtasks:

• constructing the most accurate similarity graph,
• automatically assigning optimal weights to each view,
• finding the cluster indicator matrix.

By solving these subtasks jointly, each task enhances the others, leading to a more robust and effective clustering outcome. Experimental results on benchmark datasets show that the method proposed by [100] outperforms several state-of-the-art multi-view clustering algorithms, demonstrating its effectiveness in handling complex, incomplete, and noisy multi-view data. The objective function of MVCSK is:

min_{S, P}  Σ_{v=1}^{V} sqrt( Tr( K_v − 2 K_v S + S^T K_v S ) ) + μ ||S||^2 + λ Tr( P^T L P )
s.t.  S ≥ 0,  P^T P = I,   (1)

where S is the unified similarity matrix, K_v is the kernel matrix of each view, and P is the unified spectral projection matrix from which the final clustering is estimated in an extra step. The objective function is designed to achieve two main goals: (1) to construct an accurate and smooth similarity matrix S that reflects the true relationships between data points, and (2) to learn the optimal projection matrix P that enables the effective clustering of the data.
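To make the roles of the terms concrete, Eq. (1) can be evaluated numerically for feasible S and P. The helper below is hypothetical, and L is taken here as the Laplacian of the symmetrized learned similarity S (an assumption about how L is built):

```python
import numpy as np

def mvcsk_objective(S, P, kernels, mu, lam):
    """Evaluate the MVCSK objective of Eq. (1) for given feasible S and P."""
    Sym = (S + S.T) / 2
    L = np.diag(Sym.sum(axis=1)) - Sym                 # graph Laplacian of S
    fit = sum(np.sqrt(max(np.trace(K - 2 * K @ S + S.T @ K @ S), 0.0))
              for K in kernels)                        # per-view kernel fit
    return fit + mu * np.linalg.norm(S) ** 2 + lam * np.trace(P.T @ L @ P)

rng = np.random.default_rng(6)
n, c = 8, 2
X1, X2 = rng.normal(size=(n, 3)), rng.normal(size=(n, 4))
kernels = [X1 @ X1.T, X2 @ X2.T]                       # linear kernels per view
S = np.full((n, n), 1.0 / n)                           # feasible: S >= 0
P, _ = np.linalg.qr(rng.normal(size=(n, c)))           # feasible: P^T P = I
val = mvcsk_objective(S, P, kernels, mu=1.0, lam=0.1)
```

An actual solver would alternately minimize this value over S and P under the stated constraints.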
The first term in the objective function minimizes the distance between the kernel matrices of each view and the unified similarity matrix, promoting consistency between the different views. The second term regularizes S by enforcing its smoothness, while the third term regularizes the projection matrix P by encouraging it to preserve the underlying graph structure as encoded in the Laplacian matrix L. The constraints S ≥ 0 and P^T P = I ensure that the similarity matrix is non-negative and that the projection matrix is orthonormal, respectively. Together, these terms and constraints guide the model toward a robust and effective clustering solution.

6.2 Consistency-Aware and Inconsistency-Aware Graph-Based Multi-View Clustering (CI-GMVC)

In [91], the authors propose a novel graph-based multi-view clustering method called CI-GMVC, which addresses a major limitation of existing approaches, including GMVC. Although conventional methods use a unified graph matrix for clustering multi-view data, they do not consider the inconsistent parts of the input graph matrices. This omission can lead to suboptimal clustering performance. The CI-GMVC method explicitly separates the consistent and inconsistent parts of the graph matrices and thus improves the robustness and accuracy of the clustering process when analyzing multi-view data. The proposed objective function aims to jointly estimate the consensus graph, the spectral representation of the data, and the consistent graphs in each view. It is given by:

min_{A_v (v=1,...,V), U, F, α}  Σ_{v=1}^{V} α_v ||U − A_v||^2 + 2λ Tr( F^T L_U F ) + Σ_{v,w=1}^{V} b_vw Tr( (S_v − A_v)(S_w − A_w)^T )
s.t.  U_ij ≥ 0,  Σ_j U_ij = 1,  F^T F = I,  S_v ≥ A_v ≥ 0,   (2)

where S_v is the graph matrix of the v-th view (a known input), U is the consensus graph, F is the common spectral representation of the data, and A_v is the consistent graph matrix of view v.
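Equation (2) can likewise be evaluated numerically for feasible inputs, which makes the role of each term easy to inspect; `ci_gmvc_objective` is a hypothetical helper, not code from [91]:

```python
import numpy as np

def ci_gmvc_objective(U, F, A_list, S_list, alpha, b, lam):
    """Evaluate Eq. (2): consensus fit, spectral smoothness on the consensus
    graph, and the cross-view term on the inconsistent parts S_v - A_v."""
    V = len(A_list)
    L_U = np.diag(U.sum(axis=1)) - U                  # Laplacian of consensus graph
    term1 = sum(alpha[v] * np.linalg.norm(U - A_list[v]) ** 2 for v in range(V))
    term2 = 2 * lam * np.trace(F.T @ L_U @ F)
    term3 = sum(b[v, w] * np.trace((S_list[v] - A_list[v])
                                   @ (S_list[w] - A_list[w]).T)
                for v in range(V) for w in range(V))
    return term1 + term2 + term3

rng = np.random.default_rng(7)
n, c, V = 6, 2, 2
S_list = [np.abs(rng.normal(size=(n, n))) for _ in range(V)]  # known view graphs
A_list = [0.8 * S for S in S_list]                # feasible: 0 <= A_v <= S_v
U = sum(A_list) / V
U = U / U.sum(axis=1, keepdims=True)              # feasible: rows sum to 1
F, _ = np.linalg.qr(rng.normal(size=(n, c)))      # feasible: F^T F = I
val = ci_gmvc_objective(U, F, A_list, S_list,
                        alpha=np.ones(V), b=np.ones((V, V)), lam=0.5)
```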
The objective function consists of several key terms, each with its own role in the clustering process:

• α_v ||U − A_v||^2: This term measures the difference between the consensus graph U and the consistent graph matrix A_v for each view v. The parameter α_v controls the importance of this term for each view. Minimizing it encourages the consensus graph to be close to the consistent graph of each individual view, ensuring that the consensus graph captures the consistent parts of the multi-view data. This helps to identify the parts of the data that are reliably shared across all views.

• 2λ Tr(F^T L_U F): This term regularizes the common spectral representation F by enforcing smoothness on the graph defined by L_U, the graph Laplacian associated with the consensus graph. The parameter λ controls the strength of the regularization, encouraging the spectral representation to preserve the structure of the consensus graph. This helps to capture the global structure of the data and ensures that the learned representation aligns with the consensus information across all views.

• Σ_{v,w=1}^{V} b_vw Tr((S_v − A_v)(S_w − A_w)^T): This term captures the pairwise relationships between the inconsistent parts of the graphs of different views, where S_v and S_w are the graph matrices of views v and w and (S_v − A_v) is the inconsistent part of the graph of view v. Minimizing this term discourages the inconsistent parts of different views from overlapping, so that structure shared across views is absorbed into the consistent graphs A_v. The parameter b_vw controls the weight of the relationship between views v and w. This term is critical for handling the inconsistencies in the multi-view data and for separating the parts of the data that disagree across views.

This approach includes several constraints to ensure the validity of the graph-based multi-view clustering process.
Specifically, the consensus graph matrix U must be non-negative, with each row summing to 1. The common spectral representation matrix F is required to be orthonormal, meaning F^T F = I, where I is the identity matrix. Additionally, the consistent graph matrix A_v of each view must be non-negative and element-wise less than or equal to the graph matrix S_v of the respective view. Together, these terms and constraints aim to learn a consensus graph that captures the most consistent structure across the views, while also accounting for the individual inconsistencies in each view. The model balances capturing the global structure (via the consensus graph) against maintaining consistency within each view (via the consistent graphs). By jointly learning the consensus graph, the spectral representation, and the consistent parts of the graphs, this approach enables more accurate and robust multi-view clustering.

6.3 Constrained Multi-view Spectral Clustering via Integrating Nonnegative Embedding and Spectral Embedding (CNESE)

In [89], the authors address a limitation of spectral clustering-based methods, which usually require an additional clustering step after performing a non-linear projection of the data; this can lead to poorer clustering results due to factors such as initialization or outliers. To overcome these challenges, they propose Constrained Multi-view Spectral Clustering via Integrating Nonnegative Embedding and Spectral Embedding (CNESE). The model retains the advantages of the original method while incorporating two important constraints:

• ensuring consistent smoothing of the nonnegative embedding across all views, and
• imposing an orthogonality constraint on the columns of the nonnegative embedding.
This approach provides the clustering result directly and avoids post-processing and additional parameters by finding the nonnegative embedding and the spectral embedding matrices simultaneously. The proposed objective function jointly estimates the soft cluster assignment matrix $H$ and the individual spectral projection matrices $P_v$:

$$\min_{P_v, H} \; \sum_{v=1}^{V} \|S_v - H P_v^T\|^2 + \lambda \sum_{v=1}^{V} \sqrt{\mathrm{Tr}\big(H^T L_v H\big)} + \alpha\, \mathrm{Tr}\big((H^T H - I)^T (H^T H - I)\big) \quad \text{s.t.}\; H \ge 0,\; P_v^T P_v = I,$$

where $\lambda$ is a regularization parameter and $\alpha$ is a large positive value enforcing the orthogonality of the matrix $H$ (last term). The objective function consists of three main terms:

• Factorization term, $\sum_{v=1}^{V} \|S_v - H P_v^T\|^2$, factorizes the input graph $S_v$ of each view into the product of the cluster indicator matrix $H$ and the view-specific spectral projection matrix $P_v$. This term ensures that the clustering information contained in each view is well represented by the cluster assignment and projection matrices.

• Smoothing term, $\sum_{v=1}^{V} \sqrt{\mathrm{Tr}(H^T L_v H)}$, introduces a smoothing effect on the cluster indicator matrix $H$. It promotes consistency of cluster assignments across views by encouraging similarity within clusters, where $L_v$ is a graph Laplacian matrix representing the local structure of the data in view $v$.

• Orthogonality term, $\alpha\, \mathrm{Tr}\big((H^T H - I)^T (H^T H - I)\big)$, drives $H^T H$ towards the identity, i.e. towards orthonormal columns of $H$. It is controlled by the parameter $\alpha$, a large positive value. This helps maintain the distinctness of the clusters, since each cluster is then represented by a (near-)unique direction in the embedding space.

Finally, the optimization problem is subject to the following constraints:

1. $H \ge 0$, ensuring that the cluster assignment matrix has non-negative entries.
2. $P_v^T P_v = I$, ensuring that the projection matrices of the individual views are orthonormal.

This objective function jointly optimizes the cluster indicator matrix $H$ and the projection matrices $P_v$ across all views, taking into account both the consistency within the individual views and the orthogonality of the cluster representations.

6.4 High-Order Multi-View Clustering (HMvC)

In [77], the authors propose a novel approach to multi-view clustering, called High-Order Multi-view Clustering (HMvC), which addresses several challenges in graph-based clustering methods. The key points of this approach can be summarized as follows:

• Graph-based clustering with high-order information: HMvC incorporates higher-order neighborhood information to capture complex interactions within the data that are often overlooked by conventional lower-order methods.

• Graph filtering for structure encoding: the approach uses graph filtering to encode structural information, enabling unified processing of attributed graph data and non-graph data in a single framework.

• Exploring long-distance intrinsic relationships: by utilizing intrinsic relationships up to infinite order, HMvC enriches the learned graph, captures distant connections between data points, and improves the representation of underlying structures.

• Adaptive graph fusion mechanism: to integrate the consistent and complementary information from different views, the authors propose an adaptive graph fusion mechanism that generates a consensus graph effectively combining the relevant information from all views.

• Superior performance: experimental results show that HMvC outperforms several state-of-the-art multi-view clustering methods, including some deep learning-based techniques, on both non-graph and attributed graph datasets.
This approach highlights the importance of high-order relationships and adaptive fusion mechanisms in achieving robust and effective multi-view clustering. The objective function is given by:

$$\min_{S_v, S, \gamma} \; \sum_{v=1}^{V} \gamma_v \Big( \|X_v^T - X_v^T S_v\|_2^2 + \alpha \|S_v - f_v(W_v)\|_2^2 \Big) + \beta \Big\|S - \sum_{v=1}^{V} \gamma_v S_v\Big\|_2^2 + \mu \|S\|^2 \quad \text{s.t.}\; \sum_{v=1}^{V} \gamma_v = 1,\; \gamma_v \ge 0 \tag{3}$$

Here the data matrix $X_v$ corresponds to a filtered version obtained with $k$-order filtering based on the adjacency matrix of view $v$, $W_v$ is a normalized similarity matrix based on the cosine similarity, and $f_v(W_v) = W_v + W_v^2 + \dots + W_v^n$. The objective function consists of four main terms:

• Self-representation term, $\sum_{v=1}^{V} \gamma_v \|X_v^T - X_v^T S_v\|_2^2$, ensures that the graph $S_v$ of each view can accurately reconstruct the filtered data matrix $X_v^T$. The data matrix $X_v$ is obtained by $k$-order filtering based on the adjacency matrix of view $v$, which captures the structural information within the view.

• Higher-order consistency term, $\sum_{v=1}^{V} \gamma_v \alpha \|S_v - f_v(W_v)\|_2^2$, incorporates the higher-order neighborhood information into the graph construction. Here, $f_v(W_v) = W_v + W_v^2 + \dots + W_v^n$ aggregates multi-order relationships based on the normalized similarity matrix $W_v$. This term ensures that the learned graph $S_v$ matches the higher-order structural information within each view.

• Consensus graph alignment term, $\beta \|S - \sum_{v=1}^{V} \gamma_v S_v\|_2^2$, encourages the consensus graph $S$ to be a weighted combination of the view-specific graphs $S_v$. The weights $\gamma_v$ are adaptively learned to reflect the importance of each view and to ensure consistency between views.

• Regularization term, $\mu \|S\|^2$, regularizes the consensus graph $S$ to promote smoothness and avoid overfitting.
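The multi-order aggregation $f(W) = W + W^2 + \dots + W^n$ at the heart of HMvC is straightforward to compute. The sketch below (illustrative only, with a hypothetical cosine-similarity graph rather than data from [77]) shows a direct implementation:

```python
import numpy as np

def high_order_similarity(W: np.ndarray, order: int) -> np.ndarray:
    """Aggregate neighborhood information up to the given order:
    f(W) = W + W^2 + ... + W^order."""
    acc = np.zeros_like(W)
    power = np.eye(W.shape[0])
    for _ in range(order):
        power = power @ W   # W^k at iteration k
        acc += power
    return acc

# Hypothetical view: cosine similarities between 5 samples.
rng = np.random.default_rng(1)
X = rng.random((5, 4))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
W = Xn @ Xn.T                          # cosine-similarity matrix
W = W / W.sum(axis=1, keepdims=True)   # simple row normalization

F2 = high_order_similarity(W, order=2)  # equals W + W @ W by definition
```

As a side note, when the spectral radius of $W$ is below one, the geometric series gives the closed form $\sum_{k \ge 1} W^k = (I - W)^{-1} W$, which is one way relationships "up to infinite order" can be made tractable; how HMvC handles this in practice is detailed in [77].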
The constraints $\sum_{v=1}^{V} \gamma_v = 1$ and $\gamma_v \ge 0$ ensure that the weights $\gamma_v$ form a valid convex combination, maintaining the balance between the contributions of the different views. This formulation effectively integrates self-representation, higher-order relations, and adaptive graph fusion, making it robust for multi-view clustering tasks across different datasets.

6.5 Multi-view Structured Graph Learning (MSGL)

In [88], the authors propose a scalable graph learning framework tailored for subspace clustering that addresses three key challenges of existing graph-based methods: high computational cost, the inability to explicitly detect clusters, and the lack of generalizability to unseen data. The main contributions of their work are summarized as follows:

• Bipartite graph construction: instead of constructing a full graph over $n$ samples, the proposed framework builds a bipartite graph modeling the relationships between data samples and anchor points. This significantly reduces computational complexity and improves scalability, making the method suitable for large datasets.

• Cluster interpretability: the method contains a connectivity constraint ensuring that the connected components of the bipartite graph correspond directly to the clusters. This eliminates the need for an additional clustering step, and the cluster memberships are obtained explicitly.

• Link to K-means clustering: the authors provide a theoretical link between their bipartite graph learning approach and the K-means clustering algorithm, which gives insight into its clustering mechanism and further justifies its effectiveness.

• Extension to multi-view data: the framework is extended to handle multi-view datasets. The multi-view model achieves linear scalability with respect to $n$, maintaining computational efficiency while effectively integrating information from multiple views.
The work in [88] represents a significant advancement in graph-based clustering, particularly for large-scale and multi-view datasets, by combining efficiency, explicit cluster discovery, and the ability to generalize to new data points. Its objective function is designed to tackle the challenges of scalability, explicit cluster identification, and multi-view data integration in subspace clustering:

$$\min_{\alpha, Z, P} \; \sum_{v=1}^{V} \alpha_v \|X_v - A_v Z^T\|_2^2 + \lambda_1 \|Z\|_2^2 + \lambda_2\, \mathrm{Tr}(P^T L P) + \sum_{v=1}^{V} \alpha_v^{\gamma} \quad \text{s.t.}\; P^T P = I,\; Z \ge 0,\; Z^T \mathbf{1} = \mathbf{1},\; \gamma < 0 \tag{4}$$

where $A_v$ is the matrix of anchors in view $v$ and $Z$ is the consensus graph matrix between the data and their anchors. The terms and constraints of this objective function are described below:

• Reconstruction term, $\sum_{v=1}^{V} \alpha_v \|X_v - A_v Z^T\|_2^2$, models the reconstruction error between the data matrix $X_v$ of view $v$ and the product of the anchor matrix $A_v$ and the consensus graph matrix $Z^T$. This term ensures that the learned bipartite graph accurately represents the relationship between the data points and their anchors, weighted by the view-specific coefficient $\alpha_v$.

• Consensus graph regularization, $\lambda_1 \|Z\|_2^2$, imposes an $\ell_2$-norm penalty on the consensus graph matrix $Z$, penalizing large entries and regularizing the learned graph.

• Smoothness term, $\lambda_2\, \mathrm{Tr}(P^T L P)$, promotes the smoothness of the clustering solution by incorporating the Laplacian matrix $L$. Here, $P$ is the projection matrix encoding the cluster memberships, and this term ensures that points connected in the graph are assigned similar cluster memberships.

• Weight regularization term, $\sum_{v=1}^{V} \alpha_v^{\gamma}$, regularizes the view-specific weights $\alpha_v$, controlling their influence on the clustering process. The hyperparameter $\gamma < 0$ promotes a balanced contribution from the different views.
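The scalability argument is visible directly in the reconstruction term: the bipartite graph $Z$ has only $n \times l$ entries (with $l \ll n$ anchors) instead of $n \times n$. A minimal numeric sketch of this term, using made-up data and anchors rather than learned ones:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, l, V = 8, 5, 3, 2  # samples, feature dim, anchors (l << n), views

X = [rng.random((d, n)) for _ in range(V)]  # per-view data (features x samples)
A = [rng.random((d, l)) for _ in range(V)]  # per-view anchor matrices

# Non-negative bipartite graph Z (n x l) with rows summing to one,
# i.e. each sample's anchor affinities form a probability distribution.
Z = rng.random((n, l))
Z /= Z.sum(axis=1, keepdims=True)

alpha = np.array([0.6, 0.4])  # placeholder view weights alpha_v

# Reconstruction term: sum_v alpha_v ||X_v - A_v Z^T||^2.
recon = sum(
    a * np.linalg.norm(Xv - Av @ Z.T, "fro") ** 2
    for a, Xv, Av in zip(alpha, X, A)
)
```

Storing $Z$ costs $O(nl)$ instead of $O(n^2)$ for a full graph, which is where the linear scalability in $n$ comes from when $l$ is fixed.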
The constraints in the objective function ensure the validity and interpretability of the learned matrices. The orthogonality constraint $P^T P = I$ guarantees that the projection matrix $P$ encodes unique cluster memberships, with the clusters represented as distinct, non-overlapping components. The non-negativity condition $Z \ge 0$ enforces that the consensus graph matrix $Z$ contains only non-negative entries, which is crucial for interpretability, since $Z$ represents the relationships between data points and anchor points in a meaningful way. Finally, the normalization constraint $Z^T \mathbf{1} = \mathbf{1}$ ensures that each row of $Z$ sums to one, effectively treating it as a valid probability distribution. This combination of constraints yields a robust and meaningful clustering solution. The objective function integrates graph learning and clustering in a unified framework and ensures scalability, interpretability and robustness in both single-view and multi-view subspace clustering scenarios.

6.6 Fast Parameter-Free Multi-view Subspace Clustering with Consensus Anchor Guidance (FPMVS-CAG)

In [17], the authors propose a new method called Fast Parameter-free Multi-view Subspace Clustering with Consensus Anchor Guidance (FPMVS-CAG). This approach addresses the main limitations of existing multi-view subspace clustering techniques, especially their cubic time complexity and their dependence on heuristic anchor sampling strategies. The main contributions of this method can be summarized as follows:

• Unified optimization framework: FPMVS-CAG integrates anchor selection and subspace graph construction into a single optimization problem. This joint formulation allows the two processes to interact and reinforce each other, improving the quality of the clustering.
• Parameter-free learning: unlike conventional methods that require manual tuning of hyperparameters, FPMVS-CAG automatically learns an optimal anchor subspace graph without introducing additional parameters. This eliminates time-consuming parameter selection and improves the applicability of the method.

• Scalability: the method achieves linear time complexity with respect to the number of samples, making it particularly suitable for large-scale applications and eliminating the inefficiency of previous methods with cubic complexity.

The combination of these properties makes FPMVS-CAG well suited for large-scale multi-view clustering tasks while maintaining high accuracy and robustness. To achieve this, the authors propose to estimate a graph matrix that relates the data to their anchors by optimizing the following objective function:

$$\min_{\alpha, W_1, \dots, W_V, A, Z} \; \sum_{v=1}^{V} \alpha_v^2 \|X_v - W_v A Z\|_2^2 \quad \text{s.t.}\; W_v^T W_v = I,\; A^T A = I,\; Z \ge 0,\; Z^T \mathbf{1} = \mathbf{1},\; \alpha^T \mathbf{1} = 1 \tag{5}$$

where $X_v \in \mathbb{R}^{d_v \times n}$ is the data matrix of view $v$, the projection matrices $W_v$, $v = 1, \dots, V$, target the consensus anchor guidance, and $A \in \mathbb{R}^{d \times l}$ is the latent consensus anchor matrix (a learnable dictionary matrix). The anchor dimension $d$ and the number of anchors $l$ are specified beforehand. The objective function minimizes the discrepancy between the data matrices and their reconstruction across all views, enforcing that the data of each view is well represented by the anchor and projection matrices and ensuring consistency between the multiple views.

The constraints ensure meaningful representations: the projection matrices $W_v$ and the anchor matrix $A$ are orthonormal; the consensus graph matrix $Z$ must be non-negative ($Z \ge 0$) and normalized such that $Z^T \mathbf{1} = \mathbf{1}$ (each sample's anchor coefficients sum to one), where $\mathbf{1}$ is a vector of ones; and the weights $\alpha_v$ are normalized, with $\alpha^T \mathbf{1} = 1$.
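The pieces of problem (5) can be instantiated in a few lines. The following sketch builds feasible (not optimized) variables with hypothetical dimensions and evaluates the objective:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, l = 10, 4, 3   # samples, latent anchor dimension, number of anchors
dims = [6, 5]        # hypothetical per-view feature dimensions d_v
V = len(dims)

X = [rng.random((dv, n)) for dv in dims]  # per-view data matrices X_v

# Feasible variables: orthonormal W_v (d_v x d) and A (d x l) via reduced QR.
W = [np.linalg.qr(rng.random((dv, d)))[0] for dv in dims]
A = np.linalg.qr(rng.random((d, l)))[0]

# Non-negative anchor graph Z (l x n); each column (one sample) sums to one.
Z = rng.random((l, n))
Z /= Z.sum(axis=0, keepdims=True)

alpha = np.full(V, 1.0 / V)  # view weights on the simplex (alpha^T 1 = 1)

# Objective of (5): sum_v alpha_v^2 ||X_v - W_v A Z||^2.
objective = sum(
    (alpha[v] ** 2) * np.linalg.norm(X[v] - W[v] @ A @ Z, "fro") ** 2
    for v in range(V)
)
```

In the full method, $Z$ is optimized under these constraints and, as described next, its singular vectors provide the consensus spectral embedding that is clustered with k-means.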
Once the consensus graph matrix $Z$ is obtained, its singular vectors are used as the consensus spectral embedding. The final clustering is obtained by applying the k-means algorithm to these embeddings. This framework integrates the reconstruction of data from multiple views, anchor learning and spectral clustering into a unified model.

6.7 Efficient Orthogonal Multi-view Subspace Clustering (OMSC)

In [86], the authors propose a method titled Efficient Orthogonal Multi-view Subspace Clustering (OMSC), which addresses the challenges of clustering multi-view data in large-scale scenarios. The main contributions can be outlined as follows:

• Integrated approach: OMSC combines anchor selection, graph construction and clustering in a cohesive framework. This integration ensures that these components reinforce each other, resulting in more robust and adaptive anchor representations and cluster assignments.

• Learning orthogonal bases: in contrast to traditional approaches that predefine anchors using k-means or uniform sampling, OMSC introduces a mechanism to learn high-quality orthogonal bases within the unified model. This strengthens the algebraic structure and improves the accuracy of clustering.

• High scalability: the model achieves near-linear complexity with respect to the size of the dataset, making it suitable for clustering tasks with extremely large datasets. This efficiency results from the joint modeling process and a carefully developed alternating optimization strategy.

By addressing these critical aspects, OMSC provides a scalable and effective solution for clustering multi-view data in real-world, large-scale applications [86]. The objective function is given by:

$$\min_{\alpha, W_v, A, Z, G, F} \; \sum_{v=1}^{V} \alpha_v^2 \|X_v - W_v A Z\|_2^2 + \lambda \|Z - G F\|_2^2 \quad \text{s.t.}$$
$$W_v^T W_v = I,\; A^T A = I,\; Z \ge 0,\; Z^T \mathbf{1} = \mathbf{1},\; \alpha^T \mathbf{1} = 1,\; G^T G = I,\; F_{ij} \in \{0, 1\},\; \sum_{i=1}^{k} F_{ij} = 1 \tag{6}$$

where $A \in \mathbb{R}^{d \times l}$ is the latent consensus anchor matrix, $Z$ is the consensus graph matrix between the data and their anchors, $G \in \mathbb{R}^{l \times k}$ is the centroid matrix ($k$ being the number of clusters), and $F \in \mathbb{R}^{k \times n}$ is an indicator matrix providing the data partition, with $F_{ij} = 1$ if the $j$-th instance is assigned to the $i$-th cluster and 0 otherwise.

• The first term, $\sum_{v=1}^{V} \alpha_v^2 \|X_v - W_v A Z\|_2^2$, minimizes the reconstruction error for each view $v$, where $X_v$ is the data matrix of view $v$, $W_v$ is the projection matrix, $A$ is the latent consensus anchor matrix and $Z$ is the consensus graph matrix. The weights $\alpha_v^2$ balance the contributions of the individual views.

• The second term, $\lambda \|Z - G F\|_2^2$, ensures that the consensus graph matrix $Z$ matches the cluster representation formed by the centroid matrix $G$ and the cluster indicator matrix $F$. The parameter $\lambda$ controls the strength of this regularization.

6.8 Anchor-Based Multi-View Subspace Clustering with Graph Learning (AMVSCGL)

The article [76] introduces a novel method to address the challenges of multi-view subspace clustering (MVSC), a key problem in pattern recognition and data mining. The method generates a joint coefficient matrix rather than constructing large, view-specific graphs, which enhances scalability and reduces memory consumption. Additionally, it incorporates a graph learning term that combines both global and local information from multiple views, making it more efficient and adaptable for large datasets. In contrast, [86] relies on predefined orthogonal anchors and constructs affinity graphs for each view, which can be computationally expensive and less scalable for large datasets. Furthermore, [76] is parameter-free, making it easier to implement, while [86] requires manual tuning of hyperparameters, such as the number of anchors.
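The distinctive piece of OMSC's objective (6) is the coupling term $\lambda \|Z - GF\|_2^2$, which ties the anchor graph to a hard partition. The sketch below (hypothetical sizes and random feasible matrices, not the OMSC solver) shows how the constrained factors $G$ and $F$ are formed and the term evaluated:

```python
import numpy as np

rng = np.random.default_rng(4)
l, k, n = 4, 3, 9  # anchors, clusters, samples (hypothetical sizes)

# Consensus anchor graph Z (l x n), non-negative with unit column sums.
Z = rng.random((l, n))
Z /= Z.sum(axis=0, keepdims=True)

# Orthonormal centroid matrix G (l x k), G^T G = I, built here via QR.
G = np.linalg.qr(rng.random((l, k)))[0]

# One-hot indicator F (k x n): F_ij = 1 iff sample j belongs to cluster i.
labels = rng.integers(0, k, size=n)
F = np.zeros((k, n))
F[labels, np.arange(n)] = 1.0

lam = 0.1  # placeholder regularization weight
coupling = lam * np.linalg.norm(Z - G @ F, "fro") ** 2
```

Because every column of $F$ contains exactly one 1, minimizing this term drives each column of $Z$ towards one centroid (column of $G$), so the partition can be read off from $F$ directly, with no separate k-means step.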
The objective function is as follows:

$$\min_{\alpha, A_v, Z} \; \sum_{v=1}^{V} \Bigg( \alpha_v \|X_v - A_v Z^T\|_2^2 + \lambda \sum_{i=1}^{n} \sum_{j=1}^{k} \|X_v(i,:) - A_v(:,j)\|_2^2 + \alpha_v^{\gamma} \Bigg) \quad \text{s.t.}\; A_v^T A_v = I_k,\; Z \ge 0,\; Z^T \mathbf{1} = \mathbf{1},\; \gamma < 0 \tag{7}$$

where $k$ is the number of clusters, $A_v$ is the (learnable) matrix of anchors in view $v$, and $Z$ is the consensus graph matrix between the data and their anchors.

• First term: $\alpha_v \|X_v - A_v Z^T\|_2^2$. This term measures the reconstruction error between the data of view $v$ ($X_v$) and their approximation by the anchor matrix $A_v$ and the consensus graph matrix $Z$. The weight $\alpha_v$ controls the contribution of each view to the overall objective.

• Second term: $\lambda \sum_{i=1}^{n} \sum_{j=1}^{k} \|X_v(i,:) - A_v(:,j)\|_2^2$. This term ensures that the data points $X_v(i,:)$ are well approximated by the anchors $A_v(:,j)$ of the corresponding view. The regularization parameter $\lambda$ controls the strength of this term and helps maintain the consistency of the anchors across the different views.

• Third term: $\alpha_v^{\gamma}$. This term introduces a penalty based on the value of the weight $\alpha_v$ of each view. Since the exponent $\gamma$ is a negative constant, the penalty grows as $\alpha_v$ approaches zero, which discourages any view weight from collapsing and lets the method handle the differing contributions of the views more flexibly.

6.9 Self-Taught Multi-View Spectral Clustering (SMSC)

The work [78] focuses on the relax-and-discretize strategy, which uses predefined similarity graphs and learns a consensual Laplacian embedding for clustering. The main contributions of this work are as follows:

• Problem addressed: the work tackles the information loss that arises in MVC methods from processing the similarity graphs and the subsequent graph partitioning steps independently.
This independent processing is often inefficient and reduces the quality of the clustering.

• Proposed framework: the authors present the Self-taught Multi-view Spectral Clustering (SMSC) framework. This method considers both the manifold structure induced by the Laplacian embeddings and the cluster information embedded in the discrete indicator matrix, which enables learning an optimal consensus similarity graph.

• Graph fusion schemes: two graph fusion schemes are presented:
– Convex combination scheme: this approach combines the similarity graphs of the different views through a convex combination.
– Centroid graph fusion scheme: in this approach, the similarity graphs are merged by taking into account the centroid of the data representations of the individual views.

• Self-taught mechanism: the self-taught mechanism integrates manifold structure and clustering information to learn an optimal consensus similarity graph for graph partitioning, thereby improving the clustering results.

These contributions make SMSC a powerful and efficient approach that overcomes the challenge of information loss and optimizes the learning process for clustering from multiple views. The objective function of this approach is given by:

$$\min_{\alpha, S, P, Y} \; \Big\|\sum_{v=1}^{V} \alpha_v S_v - S\Big\|_2^2 + \lambda_1\, \mathrm{Tr}(P^T L P) + \lambda_2 \|P - Y(Y^T Y)^{-1}\|_2^2 \quad \text{s.t.}\; P^T P = I,\; S \ge 0,\; S^T \mathbf{1} = \mathbf{1},\; \alpha^T \mathbf{1} = 1,\; Y_{ij} \in \{0, 1\},\; \sum_{i=1}^{k} Y_{ij} = 1 \tag{8}$$

where $S$ denotes the consensus graph matrix, $P$ the consensus spectral representation matrix, and $Y$ the indicator matrix. The terms of the objective function in Eq. (8) are described as follows:

• The first term, $\|\sum_{v=1}^{V} \alpha_v S_v - S\|_2^2$, penalizes the difference between a weighted sum of the view-specific matrices $S_v$ and the consensus matrix $S$. The weights of the individual views are denoted $\alpha_v$, and the term encourages the consensus matrix $S$ to stay close to the weighted sum of the view-specific matrices.
• The second term, $\lambda_1\, \mathrm{Tr}(P^T L P)$, regularizes the projection matrix $P$ by encouraging it to align with the graph Laplacian $L$, where $\lambda_1$ is a regularization parameter. This ensures that the data points are mapped such that the structure defined by the Laplacian matrix is preserved.

• The third term, $\lambda_2 \|P - Y(Y^T Y)^{-1}\|_2^2$, enforces consistency between the projection matrix $P$ and a matrix derived from $Y$, where $Y$ is an indicator matrix providing the cluster assignments. The regularization parameter $\lambda_2$ controls the weight of this term and promotes alignment between $P$ and the cluster structure represented by $Y$.

The authors also propose a second learning model that dispenses with the estimation of the blending weights $\alpha_v$. Based on the concept of automatic view weighting, it minimizes the following objective function:

$$\min_{S, P, Y} \; \sum_{v=1}^{V} \|S_v - S\|_2^2 + \lambda_1\, \mathrm{Tr}(P^T L P) + \lambda_2 \|P - Y(Y^T Y)^{-1}\|_2^2 \quad \text{s.t.}\; P^T P = I,\; S \ge 0,\; S^T \mathbf{1} = \mathbf{1},\; Y_{ij} \in \{0, 1\},\; \sum_{i=1}^{k} Y_{ij} = 1 \tag{9}$$

6.10 Multi-view Clustering Using Unified Graph Learning and Spectral Representation (MCGLSR)

Most existing multi-view clustering methods follow a sequential three-step process: estimating individual or consistent similarity matrices, performing spectral embedding, and then partitioning into clusters. These methods often reach their limits when integrating all the necessary components. To address this issue, the authors in [79] present a novel approach to multi-view clustering that integrates multiple components into a single framework, overcoming the weaknesses of previous methods.
It jointly solves for the consistent similarity matrix of all views, the spectral representation, the soft cluster assignments and the view weights within a single criterion. The key innovations are as follows:

• The method eliminates the need for an additional clustering step, as it resolves the cluster assignments directly.

• The soft cluster assignments are directly linked to the representations of the views, which improves consistency across all views.

This approach is characterized by its ability to jointly optimize multiple components of the clustering process without an additional partitioning step, providing a more efficient and integrated solution than conventional methods. The proposed objective function is given by:

$$\min_{S_v, S^*, P^*, H} \; \sum_{v=1}^{V} \Big\{ \mathrm{Tr}\big(K_v - 2 K_v S_v + S_v^T K_v S_v\big) + \|S_v\|_2^2 + \lambda_1 \|S^* - S_v\|_2^2 \Big\} + \lambda_2 \|S^* - H P^{*T}\|_2^2 + \lambda_3\, \mathrm{Tr}(H^T L^* H) + \lambda_4\, \mathrm{Tr}(P^{*T} L^* P^*), \tag{10}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are regularization parameters, $K_v \in \mathbb{R}^{n \times n}$ is the kernel data matrix of view $v$, $S_v \in \mathbb{R}^{n \times n}$ the graph matrix of view $v$, $S^* \in \mathbb{R}^{n \times n}$ the consensus graph matrix, $H \in \mathbb{R}^{n \times k}$ the matrix of soft cluster assignments, and $P^* \in \mathbb{R}^{n \times n}$ the consensus spectral representation. The terms of the objective function in Eq. (10) are described as follows:

• First term (for each view $v$): $\mathrm{Tr}(K_v - 2 K_v S_v + S_v^T K_v S_v)$. This term measures, in the kernel space, how well the graph matrix $S_v$ reconstructs the data of view $v$, i.e. the alignment between the data and the graph of each view.

• Second term (graph matrix regularization): $\|S_v\|_2^2$. This regularization penalizes large entries of the graph matrix $S_v$, reducing overfitting.
• Third term (consistency between graphs): $\lambda_1 \|S^* - S_v\|_2^2$. This term keeps the consensus graph matrix $S^*$ similar to each individual graph matrix $S_v$, enforcing consistency between the different views.

• Fourth term (regularization of the consensus representation): $\lambda_2 \|S^* - H P^{*T}\|_2^2$. This term forces the consensus graph matrix $S^*$ to be close to the product of the soft cluster assignments $H$ and the consensus spectral representation $P^{*T}$, which strengthens the relationship between the graph and the clustering.

• Fifth term (smoothness of the cluster assignments): $\lambda_3\, \mathrm{Tr}(H^T L^* H)$. This term regularizes the cluster assignment matrix $H$ by smoothing the cluster labels via the Laplacian matrix $L^*$.

• Sixth term (regularization of the spectral representation): $\lambda_4\, \mathrm{Tr}(P^{*T} L^* P^*)$. This term regularizes the consensus spectral representation $P^*$ by smoothing over the Laplacian matrix $L^*$, preserving the structure of the data across the views.

7 Dataset Exploration

In this section, we provide a comprehensive overview of the key datasets that are important for the evaluation of multi-view clustering methods, reporting relevant information such as the number of samples, views and clusters. Table 3 gives an overview of the common datasets, including image-based and feature-based datasets. Features are usually extracted either by deep neural networks or are handcrafted, and the choice of feature extraction method plays a crucial role in the effectiveness of the clustering process. Deep neural networks have proven to be powerful tools for feature extraction in various domains due to their ability to automatically learn hierarchical representations from raw data.
Notable architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been extensively used to extract discriminative features from images in multi-view clustering scenarios ([148, 149]). These networks can capture complicated patterns and relationships within the data, improving the quality of the extracted features for subsequent clustering tasks.

In contrast, handcrafted features [150, 151, 152, 153] involve a careful design process that relies on domain expertise to develop features carrying information relevant for clustering. Handcrafted feature selection requires a sophisticated understanding of the characteristics of the dataset, and the chosen features should encapsulate the inherent structures of the data to enable effective clustering.

It is important to note that the number of views in a given dataset varies across studies and applications. Determining the optimal number of views is a crucial consideration that affects the overall performance of a multi-view clustering algorithm. Researchers have applied various strategies to this problem, ranging from empirical selection based on expert knowledge to automated methods guided by statistical criteria [154, 36].

Table 3: Exploring Variability in Multi-View Clustering Datasets.

| Dataset | Description | Samples | Views | Clusters |
|---|---|---|---|---|
| Caltech101-7 [155] | Subset of the Caltech101 dataset (images of objects from 101 categories) with 7 object categories | 1474 | 6 | 7 |
| Caltech101-20 [155] | Subset of Caltech101 with 20 object categories | 2386 | 6 | 20 |
| SUN-RGBD [156] | Large-scale indoor scene understanding dataset with RGB-D images | 10335 | 45 | 2 |
| Animal [157] | Subset of the Animals with Attributes (AwA) dataset, containing images of animals with attribute annotations | 11673 | 20 | 4 |
| AWA [157] | The Animals with Attributes (AwA) dataset, containing images of animals with attribute annotations | 30475 | 50 | 6 |
| NUS-WIDE [158] | Dataset with web images and associated object labels | 30000 | 31 | 5 |
| YoutubeFace | Face recognition dataset with video frames and face labels | 101499 | 31 | 5 |
| BBC-Sport | Articles from the sports section of the BBC website in five areas (athletics, cricket, football, rugby, tennis); each article has two views [75] of 3183 and 3203 features | 544 | 2 | 5 |
| COIL-20 | 1,440 grayscale images in 20 classes of 72 images each; three views [79]: intensity (1024), LBP (3304), Gabor (6750) | 1440 | 3 | 20 |
| UCI-digits | 2,000 handwritten digits in 10 classes of 200 each; views include [159] profile correlations (216), Fourier coefficients (76), Karhunen-Loève coefficients (64) | 2000 | 6 | 10 |
| ORL | 40 individuals, each represented by 10 different images; three views [75]: intensity (4096), LBP (3304), Gabor (6750) | 400 | 3 | 40 |
| Outdoor Scene | 2688 images in 8 groups; four views [160]: GIST (512), color moments (432), HOG (256), LBP (48) | 2688 | 4 | 8 |
| MSRCv1 | 210 instances from Microsoft Research in Cambridge in 7 groups; five views [161]: GIST (512), color moments (24), CENTRIST (254), SIFT (512), LBP (256) | 210 | 5 | 7 |
| COVIDx | 13892 samples (5458 pneumonia, 468 COVID-19, 7966 normal); three views: ResNet50 (2048), ResNet101 (2048), DenseNet169 (1664) | 13892 | 3 | 2 |

Dataset links: Caltech101: https://data.caltech.edu/records/mzrjq-6wc02 · YoutubeFace: https://www.cs.tau.ac.il/wolf/ytfaces/ · BBC-Sport: http://mlg.ucd.ie/datasets/bbc.html · COIL-20: https://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php · UCI-digits: http://archive.ics.uci.edu/ml/datasets/Multiple+Features · ORL: http://cam-orl.co.uk/facedatabase.html · COVIDx: https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md

8 Conclusion

In this comprehensive survey we systematically explored the landscape of multi-view clustering, providing a structured and insightful journey through the various aspects of the field. We started with an introduction highlighting the increasing importance of multi-view clustering in modern machine learning and data analytics and explaining both the challenges and the motivations. The following sections covered the basics of multi-view clustering, classical methods, kernel-based approaches, subspace-based methods and deep learning-based techniques, providing a differentiated understanding of each method. We also examined graph-based multi-view clustering and introduced novel algorithms that can overcome specific challenges of multi-view clustering such as missing data, noise suppression and computational efficiency. The section on multi-view clustering with missing or incomplete data reviewed recent developments in data analysis, focusing on the challenges posed by data incompleteness in clustering scenarios.

A careful examination of typical approaches was conducted in a formal mathematical sense, shedding light on the mathematical framework underlying these methods. This rigorous analysis contributes to a deeper understanding of the methods used in multi-view clustering and focuses on the formal aspects of their mathematical foundations. In addition, a comprehensive overview of the key datasets critical to the evaluation of multi-view clustering methods is provided, including a concise summary of essential details covering common feature extraction methods and a critical examination of how to determine the optimal number of views.

To conclude, this review not only summarizes existing knowledge but also lays the foundation for future advances in multi-view clustering research. By systematically organizing and presenting the various facets of the field, it serves as a valuable resource for researchers seeking a deeper understanding of the complexity and potential of the multi-view clustering landscape. By the end of this overview, it is clear that multi-view clustering is a dynamic and evolving field with a variety of perspectives.

9 Future Directions

Multi-view clustering has achieved significant progress in overcoming various challenges related to the integration of different data views. These advances have laid the foundation for robust methods and diverse applications, but much remains to be done to tackle unsolved problems and exploit new opportunities.
The following key areas represent both milestones achieved and exciting prospects for future research:

• Integration with deep learning: Multi-view clustering has already benefited significantly from the integration of deep learning techniques. Methods such as autoencoders and graph neural networks have enabled more flexible and accurate learning of representations [123]. However, challenges such as the need for effective unsupervised architectures and scalable models remain. Future research should focus on developing architectures that exploit correlations between multiple views, taking into account advances in unsupervised deep learning and self-supervised paradigms.

• Adaptive and self-learning mechanisms: Recent developments have introduced adaptive models that are able to dynamically weight views based on their contributions, improving robustness to noisy or incomplete views [17]. However, these methods often rely on hand-crafted assumptions. Future work could focus on self-learning frameworks that are able to automatically discover optimal representations and weighting strategies, and leverage advances in meta-learning and reinforcement learning to refine their adaptability.

• Scalability to large datasets: Progress has already been made in scaling multi-view clustering algorithms using techniques such as anchor-based methods and distributed computing systems [17, 86]. However, the exponential growth in data volume and complexity requires further innovation. Research needs to focus on lightweight algorithms with linear or sublinear complexity that are capable of processing streaming or real-time multi-view data in distributed environments.

• Heterogeneity of views: Existing multi-view clustering methods have addressed heterogeneity by using techniques such as kernel-based approaches and joint latent space learning [122].
Despite these efforts, dealing with different distributions, dimensions, and noise levels between views remains a challenge. Future research should explore robust frameworks that can dynamically adapt to heterogeneous data features while ensuring interpretability and effectiveness.

• Interpretability and explainability: Multi-view clustering models increasingly use latent representation learning to improve performance, but this often reduces transparency. To improve interpretability, frameworks such as Self-taught Multi-view Spectral Clustering (SMSC) [78] integrate manifold structure with clustering information to create a consensus similarity graph that matches the cluster representations with the input graphs. Similarly, Multi-view Clustering using Unified Graph Learning and Spectral Representation (MCGLSR) [79] ensures consistency between views while resolving cluster mappings directly. Despite these advances, it remains a challenge to develop fully interpretable multi-view clustering models. Future directions should focus on developing models that not only provide accurate clustering results, but also provide actionable insights into the relationships between the views and the clustered data.

Significant progress has been made in overcoming the main challenges of multi-view clustering, such as data integration and scalability. However, further progress in the areas of interpretability, heterogeneity, and adaptive learning is crucial for further development. The integration of emerging trends such as deep learning, explainable AI, and domain-specific adaptation will play a crucial role in improving the relevance and robustness of multi-view clustering methods for tackling complex real-world problems. In recent years, domain-specific knowledge, especially in areas such as bioinformatics and medical diagnostics, has been successfully applied to improve clustering performance [162, 59].
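The adaptive view-weighting idea raised above can be made concrete with a small numpy sketch. It uses a common auto-weighting heuristic in the spirit of auto-weighted graph fusion (weights w_v proportional to 1/||W_v - W*||_F, alternated with the consensus graph W*); the toy data and the specific update rule are illustrative assumptions, not a particular published algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
labels = np.repeat([0, 1], n // 2)

def rbf_similarity(X):
    """Gaussian similarity graph with a median-distance bandwidth."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / np.median(sq[sq > 0]))

# Two informative views plus one pure-noise view (hypothetical toy data).
views = []
for _ in range(2):
    centers = rng.normal(size=(2, 6)) * 4.0
    views.append(centers[labels] + rng.normal(size=(n, 6)))
views.append(rng.normal(size=(n, 4)))  # uninformative view

Ws = [rbf_similarity(X) for X in views]

# Alternate between the consensus graph and the view weights:
# W* = sum_v w_v W_v,   then   w_v proportional to 1 / ||W_v - W*||_F.
w = np.full(len(Ws), 1.0 / len(Ws))
for _ in range(10):
    W_star = sum(wi * Wi for wi, Wi in zip(w, Ws))
    residual = np.array([np.linalg.norm(Wi - W_star) for Wi in Ws])
    w = (1.0 / residual) / (1.0 / residual).sum()

print("learned view weights:", np.round(w, 3))
```

On this toy data the uninformative view should receive the smallest weight, illustrating how adaptive weighting makes the consensus graph robust to noisy views without hand-tuned per-view parameters.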
Nevertheless, frameworks for the seamless integration of domain-specific priors into clustering algorithms have not yet been sufficiently explored. Future research could focus on developing automated approaches to encode domain knowledge using tools such as knowledge graphs and ontology-based constraints to improve the accuracy and applicability of clustering.

References

[1] Weixuan Liang, Sihang Zhou, Jian Xiong, Xinwang Liu, Siwei Wang, En Zhu, Zhiping Cai, and Xin Xu. Multi-view spectral clustering with high-order optimal neighborhood Laplacian matrix. IEEE Transactions on Knowledge and Data Engineering, 34(7):3418–3430, 2022.

[2] Youwei Liang, Dong Huang, and Chang-Dong Wang. Consistency meets inconsistency: A unified graph learning framework for multi-view clustering. In 2019 IEEE International Conference on Data Mining (ICDM), pages 1204–1209, 2019.

[3] Miaomiao Li, Xinwang Liu, Yi Zhang, and Weixuan Liang. Late fusion multiview clustering via min-max optimization. IEEE Transactions on Neural Networks and Learning Systems, 35(7):9417–9427, 2024.

[4] Eric Bruno and Stéphane Marchand-Maillet. Multiview clustering: A late fusion approach using latent models. pages 736–737, 2009.

[5] Lin Li, Xiaojun Zhou, Zhiqiang Lu, Dongxiao Li, Qinxu Xu, and Li Song. Joint learning of latent representation and global similarity for multi-view image clustering. In Proc. SPIE 12083, Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021), page 120830H, 2022.

[6] Aiping Huang, Weiling Chen, Tiesong Zhao, and Chang Wen Chen. Joint learning of latent similarity and local embedding for multi-view clustering. IEEE Transactions on Image Processing, 30:6772–6784, 2021.

[7] Xin Xu and Yuan Tian. A survey of multi-view machine learning. Neural Computing and Applications, 26(3):665–681, 2015.

[8] Mahesh P. Kumar and Raghavendra Udupa. A co-regularized approach to multi-view spectral clustering.
In Proceedings of the 28th International Conference on Machine Learning (ICML '11), volume 1, page 33, 2011.

[9] Xiaojun Cai, Feiping Nie, and Heng Huang. Heterogeneous image feature integration via multi-modal spectral clustering. In Proceedings of the 24th International Conference on Neural Information Processing Systems (NeurIPS), pages 350–358, 2011.

[10] M. Greenacre, P.J.F. Groenen, T. Hastie, et al. Principal component analysis. Nature Reviews Methods Primers, 2:100, 2022.

[11] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

[12] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning, 2013.

[13] Ben Yang, Xuetao Zhang, Zhiping Lin, Feiping Nie, BD Chen, and Fei Wang. Efficient and robust multi-view clustering with anchor graph regularization. IEEE Transactions on Circuits and Systems for Video Technology, 32:6200–6213, 2022.

[14] Qiyuan Ou, Siwei Wang, Sihang Zhou, Miaomiao Li, Xifeng Guo, and En Zhu. Anchor-based multiview subspace clustering with diversity regularization. IEEE MultiMedia, 27(4):91–101, 2020.

[15] Shibing Zhou, Xi Wang, Mingrui Yang, and Wei Song. Multi-view clustering with adaptive anchor and bipartite graph learning. Neurocomputing, 611:128627, 2025.

[16] Jing Li, Qianqian Wang, Ming Yang, Quanxue Gao, and Xinbo Gao. Efficient anchor graph factorization for multi-view clustering. IEEE Transactions on Multimedia, 26:5834–5845, 2024.

[17] Siwei Wang, Xinwang Liu, Xinzhong Zhu, Pei Zhang, Yi Zhang, Feng Gao, and En Zhu. Fast parameter-free multi-view subspace clustering with consensus anchor guidance. IEEE Transactions on Image Processing, 31:556–568, 2022.

[18] Yongyong Chen, Xiaolin Xiao, and Yicong Zhou. Jointly learning kernel representation tensor and affinity matrix for multi-view clustering. IEEE Transactions on Multimedia, 22(8):1985–1997, 2019.
[19] Yongyong Chen, Xiaolin Xiao, Chong Peng, Guangming Lu, and Yicong Zhou. Low-rank tensor graph learning for multi-view subspace clustering. IEEE Transactions on Circuits and Systems for Video Technology, 2021.

[20] Wei Xia, Quanxue Gao, Qianqian Wang, and Xinbo Gao. Tensor completion-based incomplete multiview clustering. IEEE Transactions on Cybernetics, 52(12):13635–13644, 2022.

[21] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.

[22] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[23] Z. Fang, X. Yang, L. Han, and X. Liu. A sequentially truncated higher order singular value decomposition-based algorithm for tensor completion. IEEE Transactions on Cybernetics, 49(5):1956–1967, 2019.

[24] H. Zhu, S. Klus, and T. Sahai. A dynamic mode decomposition approach for decentralized spectral clustering of graphs. In Proceedings of the IEEE Conference on Control Technology and Applications (CCTA), pages 1202–1207, 2022.

[25] Jun Liu, Shuiwang Ji, and Jie Ye. Sleec: Sparse learning with efficient Euclidean constraints for low-resource multiview learning. In Proceedings of the 30th International Conference on Machine Learning (ICML '13), volume 28, pages 1491–1499, 2013.

[26] Yan Yan, Zhen Zhang, and Jie Yang. Multi-view learning: Theoretical foundations and empirical study. IEEE Transactions on Cybernetics, 46(12):2725–2738, 2016.

[27] Xianhua Cai, Feiping Nie, and Heng Huang. Heterogeneous image feature integration via multi-modal multi-kernel learning. In Proceedings of the 30th International Conference on Machine Learning (ICML '13), volume 28, pages 1309–1317, 2013.

[28] X. Liu et al. Efficient and effective regularized incomplete multi-view clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(8):2634–2646, 2020.
[29] Z. Lv, Q. Gao, X. Zhang, Q. Li, and M. Yang. View-consistency learning for incomplete multiview clustering. IEEE Transactions on Image Processing, 31:4790–4802, 2022.

[30] Yalan Qin, Xinpeng Zhang, Shui Yu, and Guorui Feng. A survey on representation learning for multi-view data. Neural Networks, 181:106842, 2025.

[31] Muhammad Haris, Yusliza Yusoff, Azlan Mohd Zain, Abid Saeed Khattak, and Syed Fawad Hussain. Breaking down multi-view clustering: A comprehensive review of multi-view approaches for complex data structures. Engineering Applications of Artificial Intelligence, 132:107857, 2024.

[32] Sura Raya, Mariam Orabi, Imad Afyouni, and Zaher Al Aghbari. Multi-modal data clustering using deep learning: A systematic review. Neurocomputing, 607:128348, 2024.

[33] Uno Fang, Man Li, Jianxin Li, Longxiang Gao, Tao Jia, and Yanchun Zhang. A comprehensive survey on multi-view clustering. IEEE Transactions on Knowledge and Data Engineering, 35(12):12350–12368, 2023.

[34] Jie Wen, Zheng Zhang, Lunke Fei, et al. A survey on incomplete multiview clustering. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(2):1136–1149, 2022.

[35] Pierre De Handschutter, Nicolas Gillis, and Xavier Siebert. A survey on deep matrix factorizations. Computer Science Review, 42, 2021.

[36] Gao Chao, Shiliang Sun, and Jinbo Bi. A survey on multi-view clustering. IEEE Transactions on Artificial Intelligence, 2(2):146–168, 2021.

[37] Lele Fu, Pengfei Lin, Athanasios V. Vasilakos, and Shiping Wang. An overview of recent multi-view clustering. Neurocomputing, 402:148–161, 2020.

[38] Yan Yang and Hao Wang. Multi-view clustering: A survey. Big Data Mining and Analytics, 1(2):83–107, 2018.

[39] Baochen Sun, Jiashi Feng, and Kate Saenko. Return of frustratingly easy domain adaptation. In Proceedings of the 30th International Conference on Machine Learning (ICML '13), volume 28, pages 897–904, 2013.
[40] Vikas Sindhwani, Abhishek Bhattacharya, and Soumen Rakshit. Information theoretic co-clustering. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI '08), pages 504–511, 2008.

[41] Jun Liu, Shuiwang Ji, and Jie Ye. Sleec: Sparse learning with efficient Euclidean constraints for low-resource multiview learning. In Proceedings of the 30th International Conference on Machine Learning (ICML '13), volume 28, pages 1491–1499, 2013.

[42] Junpeng Tan, Yukai Shi, Zhijing Yang, Caizhen Wen, and Liang Lin. Unsupervised multi-view clustering by squeezing hybrid knowledge from cross view and each view. IEEE Transactions on Multimedia, 23:2943–2956, 2020.

[43] Miaomiao Li, Xinwang Liu, Lei Wang, Yong Dou, Jianping Yin, and En Zhu. Multiple kernel clustering with local kernel alignment maximization. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16). IJCAI/AAAI, 2016.

[44] Siwei Wang, Xinwang Liu, Li Liu, Sihang Zhou, and En Zhu. Late fusion multiple kernel clustering with proxy graph refinement. IEEE Transactions on Neural Networks and Learning Systems, 2021.

[45] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Transactions on Image Processing, 24(11):3939–3949, 2015.

[46] Man-Sheng Chen, Ling Huang, Chang-Dong Wang, and Dong Huang. Multi-view clustering in latent embedding space. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):3513–3520, 2020.

[47] Ghufran Ahmad Khan, Jie Hu, Tianrui Li, Bassoma Diallo, and Hongjun Wang. Multi-view data clustering via non-negative matrix factorization with manifold regularization. International Journal of Machine Learning and Cybernetics, pages 1–13, 2022.
[48] Dacheng Zheng, Zhiwen Yu, Wuxing Chen, Weiwen Zhang, Qiying Feng, Yifan Shi, and Kaixiang Yang. Multiview ensemble clustering of hypergraph p-Laplacian regularization with weighting and denoising. Information Sciences, 681:121187, 2024.

[49] Xinyu Pu, Baicheng Pan, and Hangjun Che. Robust low-rank graph multi-view clustering via Cauchy norm minimization. Mathematics, 11(13), 2023.

[50] Bing Liu, Anzhu Yu, Xuchu Yu, Ruirui Wang, Kuiliang Gao, and Wenyue Guo. Deep multiview learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 59(9):7758–7772, 2021.

[51] Pooja Gupta, Anurag Goel, Angshul Majumdar, Emilie Chouzenoux, and Giovanni Chierchia. Deconfcluster: Deep convolutional transform learning based multiview clustering fusion framework. Signal Processing, 224:109597, 2024.

[52] Xingchao Wang, Weidi Zhao, and Jiashi Wu. Multi-view generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV '18), 2018.

[53] Chang Tang, Xinzhong Zhu, Xinwang Liu, Miaomiao Li, Pichao Wang, Changqing Zhang, and Lizhe Wang. Learning a joint affinity graph for multiview subspace clustering. IEEE Transactions on Multimedia, 21(7):1724–1736, 2018.

[54] Han Zhang, Danyang Wu, Feiping Nie, Rong Wang, and Xuelong Li. Multilevel projections with adaptive neighbor graph for unsupervised multi-view feature selection. Information Fusion, 70:129–140, 2021.

[55] Yi Xu and Guoqing Niu. Research on multi-view clustering algorithm based on sequential three-way decision. Applied Soft Computing, 158:111590, 2024.

[56] H. A. Al-kuhali, M. Shan, M. A. Hael, et al. Multiview clustering of multi-omics data integration by using a penalty model. BMC Bioinformatics, 23(1):288, 2022.

[57] Qianyi Zhan and Wei Hu. An epilepsy detection method using multi-view clustering algorithm and deep features. Computational and Mathematical Methods in Medicine, 2020, 2020.
[58] Bastian Pfeifer, Marcus D. Bloice, and Michael G. Schimek. Parea: Multi-view ensemble clustering for cancer subtype discovery. Journal of Biomedical Informatics, 143:104406, 2023.

[59] Yixue Feng, Mansu Kim, Xiaohui Yao, Kefei Liu, Qi Long, and Li Shen. Deep multiview learning to identify imaging-driven subtypes in mild cognitive impairment. BMC Bioinformatics, 23, 2022.

[60] So Yeon Kim and Kyung-Ah Sohn. Multi-view network-based social-tagged landmark image clustering. In 2017 IEEE International Conference on Image Processing (ICIP), pages 3680–3684, 2017.

[61] Chao Lan, Yuhao Yang, Xiaoli Li, Bo Luo, and Jun Huan. Learning social circles in ego-networks based on multi-view network structure. IEEE Transactions on Knowledge and Data Engineering, 29(8):1784–1797, 2017.

[62] Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, and Nikolai Nefedov. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. In 2013 IEEE Global Conference on Signal and Information Processing, pages 993–996, 2013.

[63] Kai Chen, Yujie Huang, and Ziyuan Wang. A multiview approach to tracking people in crowded scenes using fusion feature correlation. Communications in Computer and Information Science, 1911 CCIS:204–217, 2024.

[64] Torben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, and Gerhard Rigoll. Lifting multi-view detection and tracking to the bird's eye view. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 667–676, Los Alamitos, CA, USA, 2024. IEEE Computer Society.

[65] Jingwen Li, Wei Wu, Dan Zhang, Dayong Fan, Jianwu Jiang, Yanling Lu, Ertao Gao, and Tao Yue. Multi-pedestrian tracking based on KC-YOLO detection and identity validity discrimination module. Applied Sciences, 13(22), 2023.

[66] Zhi-Hua Zhou and Min-Ling Wang. Multi-instance multi-label learning with application to scene classification.
In Advances in Neural Information Processing Systems (NeurIPS), pages 1609–1616, 2007.

[67] Angela Serra, Paola Galdi, and Roberto Tagliaferri. Chapter 13 - Multi-view learning in biomedical applications. In Robert Kozma, Cesare Alippi, Yoonsuck Choe, and Francesco Carlo Morabito, editors, Artificial Intelligence in the Age of Neural Networks and Brain Computing, pages 265–280. Academic Press, 2019.

[68] Yu-Ling Hsueh, Wen-Nung Lie, and Guan-You Guo. Human behavior recognition from multiview videos. Information Sciences, 517:275–296, 2020.

[69] Jinhong Yu, Kun Sun, Kunqian Li, Chuan Tang, and Ruyi Feng. Towards accurate image matching by exploring redundancy between multiple descriptors. IEEE Geoscience and Remote Sensing Letters, 20, 2023.

[70] Jin Chen, Huafu Xu, Jingjing Xue, Quanxue Gao, Cheng Deng, and Ziyu Lv. Incomplete multi-view clustering based on hypergraph. Information Fusion, 117:102804, 2025.

[71] Zengfa Dou, Nian Peng, Weiming Hou, Xianghua Xie, and Xiaoke Ma. Learning multi-level topology representation for multi-view clustering with deep non-negative matrix factorization. Neural Networks, 182:106856, 2025.

[72] Jie Zhou and Runxin Zhang. A weighted multi-view clustering via sparse graph learning. Cluster Computing, 27(10):13517–13530, 2024.

[73] Xingwang Zhao, Shujun Wang, Xiaolin Liu, and Jiye Liang. Multi-view clustering via dynamic unified bipartite graph learning. Pattern Recognition, 156:110715, 2024.

[74] Hengkang Wang, Han Lu, Ju Sun, and Sandra E. Safo. Interpretable deep learning methods for multiview learning. BMC Bioinformatics, 25(1):69, 2024.

[75] Ao Li, Cong Feng, Yuan Cheng, Yingtao Zhang, and Hailu Yang. Incomplete multiview subspace clustering based on multiple kernel low-redundant representation learning. Information Fusion, 103, 2024.

[76] Chao Su, Haoliang Yuan, Loi Lei Lai, and Qiang Yang. Anchor-based multi-view subspace clustering with graph learning.
Neurocomputing, 547:126320, 2023.

[77] Erlin Pan and Zhao Kang. High-order multi-view clustering for generic data. Information Fusion, 100:101947, 2023.

[78] Guo Zhong and Chi-Man Pun. Self-taught multi-view spectral clustering. Pattern Recognition, 138:109349, 2023.

[79] F. Dornaika and S. El Hajjar. Single phase multi-view clustering using unified graph learning and spectral representation. Information Sciences, 645, 2023.

[80] F. Dornaika and S. El Hajjar. Direct multi-view spectral clustering with consistent kernelized graph and convolved nonnegative representation. Artificial Intelligence Review, 56(10):10987–11015, 2023.

[81] Jie Chen, Zhu Wang, Hua Mao, and Xi Peng. Low-rank tensor learning for incomplete multiview clustering. IEEE Transactions on Knowledge and Data Engineering, 35(11):11556–11569, 2023.

[82] Shirui Luo and Xiaochun Cao. Multiview subspace dual clustering. IEEE Transactions on Neural Networks and Learning Systems, 33(12):7425–7437, 2022.

[83] Xiangyu Liu and Peng Song. Incomplete multi-view clustering via virtual-label guided matrix factorization. Expert Systems with Applications, 210, 2022.

[84] Lijuan Wang, Lin Zhang, Ming Yin, Zhifeng Hao, Ruichu Cai, and Wen Wen. Double embedding-transfer-based multi-view spectral clustering. Expert Systems with Applications, 210, 2022.

[85] Jun Yin, Runcheng Cai, and Shiliang Sun. Anchor-based incomplete multi-view spectral clustering. Neurocomputing, 514:526–538, 2022.

[86] Man-Sheng Chen, Chang-Dong Wang, Dong Huang, Jian-Huang Lai, and Philip S. Yu. Efficient orthogonal multi-view subspace clustering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '22, pages 127–135. Association for Computing Machinery, 2022.

[87] Changan Yuan, Yonghua Zhu, Zhi Zhong, Wei Zheng, and Xiaofeng Zhu. Robust self-tuning multi-view clustering. World Wide Web, 25(2):489–512, 2022.
[88] Zhao Kang, Zhiping Lin, Xiaofeng Zhu, and Wenbo Xu. Structured graph learning for scalable subspace clustering: From single view to multi view. IEEE Transactions on Cybernetics, 52:8976–8986, 2021.

[89] S. El Hajjar, F. Dornaika, and F. Abdallah. Multi-view spectral clustering via constrained nonnegative embedding. Information Fusion, 2021.

[90] Xiao Yu, Hui Liu, Yan Wu, and Caiming Zhang. Fine-grained similarity fusion for multi-view spectral clustering. Information Sciences, 568:350–368, 2021.

[91] Mitsuhiko Horie and Hiroyuki Kasai. Consistency-aware and inconsistency-aware graph-based multi-view clustering. In 2020 28th European Signal Processing Conference (EUSIPCO), pages 1472–1476. IEEE, 2021.

[92] Hongwei Yin, Wenjun Hu, Fanzhang Li, and Jungang Lou. One-step multi-view spectral clustering by learning common and specific nonnegative embeddings. International Journal of Machine Learning and Cybernetics, 12(7):2121–2134, 2021.

[93] Peng Chen, Liang Liu, Zhengrui Ma, and Zhao Kang. Smoothed multi-view subspace clustering. In International Conference on Neural Computing for Advanced Applications, pages 128–140. Springer, 2021.

[94] Guoli Niu, Youlong Yang, and Liqin Sun. One-step multi-view subspace clustering with incomplete views. Neurocomputing, 438:290–301, 2021.

[95] Deyan Xie, Xiangdong Zhang, Quanxue Gao, Jiale Han, Song Xiao, and Xinbo Gao. Multiview clustering by joint latent representation and similarity learning. IEEE Transactions on Cybernetics, 50(11):4848–4854, 2020.

[96] Zhao Kang, Guoxin Shi, Shudong Huang, Wenyu Chen, Xiaorong Pu, Joey Tianyi Zhou, and Zenglin Xu. Multi-graph fusion for multi-view spectral clustering. Knowledge-Based Systems, 189:105102, 2020.

[97] Hao Wang, Yan Yang, and Bing Liu. GMC: Graph-based multi-view clustering. IEEE Transactions on Knowledge and Data Engineering, 32(6):1116–1129, 2020.
[98] Zhanxuan Hu, Feiping Nie, Wei Chang, Shuzheng Hao, Rong Wang, and Xuelong Li. Multi-view spectral clustering via sparse graph learning. Neurocomputing, 384:1–10, 2020.

[99] T. Zhou, Changqing Zhang, Xi Peng, H. Bhaskar, and J. Yang. Dual shared-specific multiview subspace clustering. IEEE Transactions on Cybernetics, 50:3517–3530, 2020.

[100] Shudong Huang, Zhao Kang, Ivor W. Tsang, and Zenglin Xu. Auto-weighted multi-view clustering via kernelized graph learning. Pattern Recognition, 88:174–184, 2019.

[101] Guoqing Chao, Jiangwen Sun, Jin Lu, An-Li Wang, Daniel D. Langleben, Chiang-Shan Li, and Jinbo Bi. Multi-view cluster analysis with incomplete data to understand treatment effects. Information Sciences, 494:278–293, 2019.

[102] Hao Wang, Linlin Zong, Bing Liu, Yan Yang, and Wei Zhou. Spectral perturbation meets incomplete multi-view data. In International Joint Conference on Artificial Intelligence, pages 3677–3683, 2019.

[103] Xi Peng, Zhenyu Huang, Jiancheng Lv, Hongyuan Zhu, and Joey Tianyi Zhou. COMIC: Multi-view clustering without parameter selection. In International Conference on Machine Learning, 2019.

[104] L. Xing, B. Chen, S. Du, Y. Gu, and N. Zheng. Correntropy-based multiview subspace clustering. IEEE Transactions on Cybernetics, pages 1–14, 2019.

[105] K. Zhan, C. Zhang, J. Guan, and J. Wang. Graph learning for multiview clustering. IEEE Transactions on Cybernetics, 48(10):2887–2895, 2018.

[106] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT '98), pages 92–100. ACM, 1998.

[107] Rong Xia, Yun Pan, Hanjiang Lai, and Congcong Liu. Multi-view spectral clustering with high-order graphs. IEEE Transactions on Cybernetics, 44(9):1619–1631, 2014.

[108] Kamalika Chaudhuri, Sham M. Kakade, Karen Livescu, and Karthik Sridharan. Multi-view clustering via canonical correlation analysis.
In Proceedings of the 26th Annual International Conference on Machine Learning, pages 129–136, 2009.

[109] Jiayu Li, Chris H. Q. Ding, and Licheng Tang. Multi-view clustering via joint non-negative matrix factorization. Knowledge-Based Systems, 144:159–171, 2018.

[110] Xingyu Xu, Jie Zhao, Jianqiu Wang, and Yebin Wang. Multi-view spectral embedding. Pattern Recognition, 94:88–98, 2019.

[111] H. Chen and X. Liu. An improved multi-view spectral clustering based on tissue-like P systems. Scientific Reports, 12:18616, 2022.

[112] Hao Cai, Bo Liu, Yanshan Xiao, and LuYue Lin. Semi-supervised multi-view clustering based on orthonormality-constrained nonnegative matrix factorization. Information Sciences, 536:171–184, 2020.

[113] Xiaofeng Zhu, Shichao Zhang, Wei He, Rongyao Hu, Cong Lei, and Pengfei Zhu. One-step multi-view spectral clustering. IEEE Transactions on Knowledge and Data Engineering, 31(10):2022–2034, 2018.

[114] Zhenwen Ren, Haoran Li, Chao Yang, and Quansen Sun. Multiple kernel subspace clustering with local structural graph and low-rank consensus kernel learning. Knowledge-Based Systems, 188:105040, 2020.

[115] Sally El Hajjar, Fadi Dornaika, Fahed Abdallah, and Hichem Omrani. Multi-view spectral clustering via integrating label and data graph learning. In International Conference on Image Analysis and Processing, pages 109–120. Springer, 2022.

[116] Changqing Zhang, Qinghua Hu, Huazhu Fu, Pengfei Zhu, and Xiaochun Cao. Latent multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4279–4287, 2017.

[117] Ghufran Ahmad Khan, Jie Hu, Tianrui Li, Bassoma Diallo, and Shengdong Du. Multi-view subspace clustering for learning joint representation via low-rank sparse representation. Applied Intelligence, 2023.

[118] Aparajita Khan and Pradipta Maji. Multi-manifold optimization for multi-view subspace clustering.
IEEE Transactions on Neural Networks and Learning Systems, 2021.

[119] I. M. James. The Topology of Stiefel Manifolds. Cambridge University Press, 1976.

[120] Guojun Du, Lei Zhou, Yuchao Yang, et al. Deep multiple auto-encoder-based multi-view clustering. Data Science and Engineering, 6:323–338, 2021.

[121] Jiao Wang, Bin Wu, Zhenwen Ren, and Yunhui Zhou. Decomposed deep multi-view subspace clustering with self-labeling supervision. Information Sciences, 653, 2024.

[122] Kai Li, Hongfu Liu, Yulun Zhang, Kunpeng Li, and Yun Fu. Self-guided deep multiview subspace clustering via consensus affinity regularization. IEEE Transactions on Cybernetics, 52(12):12734–12744, 2022.

[123] Pengfei Zhu, Xinjie Yao, Yu Wang, Binyuan Hui, Dawei Du, and Qinghua Hu. Multiview deep subspace clustering networks. IEEE Transactions on Cybernetics, 54(7):4280–4293, 2024.

[124] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2021.

[125] Meng Jiang. Transfer learning across graph convolutional networks: Methods, theory, and applications. ACM Transactions on Knowledge Discovery from Data, 18(1):1–23, 2023.

[126] Tabea Rebafka. Model-based clustering of multiple networks with a hierarchical algorithm. Statistics and Computing, 34(1):32, 2024.

[127] Z. Liu, C. Chen, X. Yang, J. Zhou, X. Li, and L. Song. Heterogeneous graph neural networks for malicious account detection. In CIKM, pages 2077–2085, 2018.

[128] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan. Session-based recommendation with graph neural networks. In AAAI, volume 33, pages 346–353, 2019.

[129] A. Pucci, M. Gori, M. Hagenbuchner, F. Scarselli, and A. Tsoi. Investigation into the application of graph neural networks to large-scale recommender systems. Syst. Sci., 32(4):17–26, 2006.
[130] Manuel Graña and Igone Morais-Quilez. A review of graph neural networks for electroencephalography data analysis. Neurocomputing, 562:126901, 2023.

[131] Xiaowei Zhang, Zhiming Zheng, Dongming Gao, et al. Multi-view consistent generative adversarial networks for compositional 3D-aware image synthesis. International Journal of Computer Vision, 131(11):2219–2242, 2023.

[132] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.

[133] Yang Wang, Lin Wu, Xuemin Lin, and Junbin Gao. Multiview spectral clustering via structured low-rank matrix factorization. IEEE Transactions on Neural Networks and Learning Systems, 29(10):4833–4843, 2018.

[134] Kewei Tang, Kaiqiang Xu, Zhixun Su, and Nan Zhang. Multi-view subspace clustering via consistent and diverse deep latent representations. Information Sciences, 651, 2023.

[135] Zonghan Shi and Haitao Zhao. Deep multi-view clustering based on reconstructed self-expressive matrix. Applied Sciences (Switzerland), 13(15), 2023.

[136] Li Wang, Ren-Cang Li, and Wen-Wei Lin. Multiview orthonormalized partial least squares: Regularizations and deep extensions. IEEE Transactions on Neural Networks and Learning Systems, 34(8):4371–4385, 2023.

[137] Zhanxuan Hu, Feiping Nie, Rong Wang, and Xuelong Li. Multi-view spectral clustering via integrating nonnegative embedding and spectral embedding. Information Fusion, 55:251–259, 2020.

[138] Xiang Zhang, Shuhui Wang, and Dacheng Tao. Adaptive similarity metric fusion for multi-view clustering. IEEE Transactions on Cybernetics, 47(7):1746–1759, 2017.

[139] Jie Wen, Ke Yan, Zheng Zhang, Yong Xu, Junqian Wang, Lunke Fei, and Bob Zhang. Adaptive graph completion based incomplete multi-view clustering. IEEE Transactions on Multimedia, 23:2493–2504, 2020.
[140] Bassoma Diallo, Jie Hu, Tianrui Li, Ghufran Ahmad Khan, Xinyan Liang, and Hongjun Wang. Auto-attention mechanism for multi-view deep embedding clustering. Pattern Recognition, page 109764, 2023.

[141] Weixiang Shao, Lifang He, Chun-ta Lu, and Philip S. Yu. Online multi-view clustering with incomplete views. In 2016 IEEE International Conference on Big Data (Big Data), pages 1012–1017. IEEE, 2016.

[142] Weixiang Shao, Lifang He, and Philip S. Yu. Multiple incomplete views clustering via weighted nonnegative matrix factorization with regularization. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 318–334. Springer, 2015.

[143] Shao-Yuan Li, Yuan Jiang, and Zhi-Hua Zhou. Partial multi-view clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1), 2014.

[144] Handong Zhao, Hongfu Liu, and Yun Fu. Incomplete multi-modal visual data grouping. In IJCAI, pages 2392–2398, 2016.

[145] Peng Zhou, Xinwang Liu, Liang Du, and Xuejun Li. Self-paced adaptive bipartite graph learning for consensus clustering. ACM Transactions on Knowledge Discovery from Data, 17(5):1–35, 2023.

[146] Changan Yuan, Zhi Zhong, Cong Lei, Xiaofeng Zhu, and Rongyao Hu. Adaptive reverse graph learning for robust subspace learning. Information Processing & Management, 58(6):102733, 2021.

[147] Hao Wang, Yan Yang, Bing Liu, and Hamido Fujita. A study of graph-based system for multi-view clustering. Knowledge-Based Systems, 163:1009–1019, 2019.

[148] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[149] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning (Vol. 1). MIT Press, Cambridge, 2016.

[150] Abderrahim Moujahid and Fadi Dornaika. Multi-scale multi-block covariance descriptor with feature selection. Neural Computing & Applications, 32:6283–6294, 2020.

[151] Abdelmalik Moujahid and Fadi Dornaika.
A pyramid multi-lev el face descriptor: application to kinship verifica- tion. Multimedia T ools and Applications , 78:9335–9354, 2019. [152] Rushi Lan, Y icong Zhou, and Y uan Y an T ang. Quaternionic local ranking binary pattern: a local descriptor of color images. IEEE T ransactions on Image Pr ocessing , 25(2):566–579, 2015. [153] Xiaoyang T an and Bill T riggs. Enhanced local texture feature sets for face recognition under di ffi cult lighting conditions. IEEE transactions on imag e pr ocessing , 19(6):1635–1650, 2010. [154] Renjie Lin, Shide Du, Shiping W ang, and W enzhong Guo. Multi-view clustering via optimal transport algorithm. Knowledge-Based Systems , 279:110954, 2023. 44 [155] Li Fei-Fei, R. Fergus, and P . Perona. Learning generative visual models from few training examples: An incre- mental bayesian approach tested on 101 object categories. In 2004 Conference on Computer V ision and P attern Recognition W orkshop , pages 178–178, 2004. [156] Shuran Song, Samuel P . Lichtenberg, and Jianxiong Xiao. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE Confer ence on Computer V ision and P attern Recognition (CVPR) , pages 567– 576, 2015. [157] Christoph H Lampert, Hannes Nickisch, and Stefan Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE transactions on pattern analysis and machine intelligence , 36(3):453–465, 2013. [158] T at-Seng Chua, Jinhui T ang, Richang Hong, Haojie Li, Zhiping Luo, and Y antao Zheng. Nus-wide: a real-world web image database from national university of singapore. In Proceedings of the A CM international confer ence on image and video r etrieval , pages 1–9, 2009. [159] Canyi Lu, Shuicheng Y an, and Zhouchen Lin. Conv ex sparse spectral clustering: Single-view to multi-view . CoRR , abs / 1511.06860, 2015. [160] Amir Monadjemi, BT Thomas, and Majid Mirmehdi. Experiments on high resolution images towards outdoor scene classification. T echnical report. 
[161] John Winn and Nebojsa Jojic. LOCUS: Learning object classes with unsupervised segmentation. In Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, volume 1, pages 756–763. IEEE, 2005.
[162] Adrian Benton, Huda Khayrallah, Biman Gujral, Dee Ann Reisinger, Sheng Zhang, and Raman Arora. Deep generalized canonical correlation analysis. In Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, and Marek Rei, editors, Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 1–6, Florence, Italy, August 2019. Association for Computational Linguistics.
