Dual-coding contrastive learning based on ConvNeXt and ViT models for morphological classification of galaxies in COSMOS-Web
In our previous works, we proposed a machine learning framework named \texttt{USmorph} for efficiently classifying galaxy morphologies. In this study, we upgrade the unsupervised machine learning (UML) part of the \texttt{USmorph} framework with a self-supervised contrastive learning method, aiming to improve the efficiency of feature extraction in this step. The upgraded UML method consists of three main components. (1) We employ a Convolutional Autoencoder to denoise the galaxy images and an Adaptive Polar Coordinate Transformation to enhance the model's rotational invariance. (2) A pre-trained dual-encoder network combining ConvNeXt and ViT encodes the image data, and contrastive learning is then applied to reduce the dimensionality of the features. (3) We adopt a Bagging-based clustering model to group galaxies with similar features into distinct classes. By carefully dividing the redshift bins, we apply this model to the rest-frame optical images of galaxies in the COSMOS-Web field within the redshift range $0.5 < z < 6.0$. Compared to the previous algorithm, the improved UML method successfully classifies 73% of the galaxies. Using the GoogleNet algorithm, we classify the morphologies of the remaining 27%. To validate the reliability of the updated algorithm, we compare our classification results with other galaxy morphological parameters and find good consistency with galaxy evolution. Benefiting from its higher efficiency, this updated algorithm is well-suited for application in future China Space Station Telescope missions.
💡 Research Summary
The paper presents an upgraded version of the USmorph framework for galaxy morphological classification, introducing a self‑supervised contrastive learning (CL) approach that upgrades the feature‑extraction stage of the previous unsupervised machine‑learning (UML) component. The authors first denoise the JWST/NIRCam images from the COSMOS‑Web survey using a Convolutional Auto‑Encoder (CAE) and apply an Adaptive Polar Coordinate Transformation (APCT) to achieve rotational invariance, which is crucial for high‑redshift galaxies that appear at arbitrary orientations.
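The core idea behind a polar-coordinate transformation is that an on-sky rotation of a galaxy becomes a cyclic shift along the angular axis, which translation-tolerant encoders handle far more gracefully. A minimal sketch of such a resampling (not the paper's exact APCT, whose adaptive radial sampling is not specified here; nearest-neighbour interpolation is an assumption for simplicity):

```python
import numpy as np

def polar_transform(img, n_r=16, n_theta=32):
    """Resample a square image onto an (r, theta) grid centred on the
    image centre, so that a rotation of the input becomes a cyclic
    shift along the theta axis. Nearest-neighbour sampling is used
    here purely to keep the sketch short."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    radii = np.linspace(0.0, r_max, n_r)
    thetas = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    out = np.empty((n_r, n_theta))
    for i, r in enumerate(radii):
        ys = np.clip(np.round(cy + r * np.sin(thetas)).astype(int), 0, h - 1)
        xs = np.clip(np.round(cx + r * np.cos(thetas)).astype(int), 0, w - 1)
        out[i] = img[ys, xs]
    return out
```

An adaptive variant would choose the radial grid from the galaxy's light profile (e.g. denser sampling near the centre) rather than the uniform spacing assumed above.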
Feature extraction is performed by a dual‑encoder architecture: a pre‑trained ConvNeXt model captures local convolutional patterns while a Vision Transformer (ViT) extracts global contextual information. Instead of relying on conventional data augmentation to create positive pairs, the two encoders themselves generate complementary views of the same image, forming “dual‑coding” positive pairs for contrastive learning. An InfoNCE‑type loss drives the embeddings of the ConvNeXt‑ViT pair together while pushing apart embeddings from other images in the batch. This self‑supervised step reduces the high‑dimensional feature space (thousands of dimensions) to a compact 128–256‑dimensional representation that retains morphology‑relevant information.
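The InfoNCE objective described above can be sketched in a few lines. This is a generic symmetric InfoNCE on paired embeddings, not the authors' exact loss (their temperature and projection-head details are not given); `z1` and `z2` stand in for the ConvNeXt and ViT embeddings of the same batch of images:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired embeddings.
    z1, z2: (N, D) arrays from the two encoders for the same N images;
    the matching pair (diagonal) is the positive, all other batch
    members are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)

    def one_direction(a, b):
        logits = a @ b.T / temperature                    # (N, N) similarities
        logits = logits - logits.max(axis=1, keepdims=True)  # stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))                # positives on diagonal

    # symmetrize: contrast z1 against z2 and z2 against z1
    return 0.5 * (one_direction(z1, z2) + one_direction(z2, z1))
```

When the two encoders produce well-aligned embeddings for the same galaxy and dissimilar ones for different galaxies, the loss approaches zero; mismatched pairs drive it up, which is what pushes the dual encodings toward a shared, morphology-aware space.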
The compact embeddings are then clustered using a bagging‑based voting scheme. Multiple bootstrap samples of the dataset are clustered independently (e.g., with K‑means), and the final cluster assignment for each galaxy is decided by majority vote. Compared with the earlier 100‑cluster approach, the new method yields only ten clusters, dramatically simplifying downstream analysis while preserving physical meaning. The authors validate the clusters with Uniform Manifold Approximation and Projection (UMAP) visualizations and by comparing cluster labels to traditional morphological metrics such as Gini, M20, concentration, and Sérsic index, finding strong correlations (Pearson ≈ 0.8).
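The bagging scheme above can be sketched with a toy consensus clusterer. Everything below is illustrative: the subsample fraction, the number of runs, the tiny k-means (with k-means++-style seeding), and the greedy label alignment are all assumptions, since the paper's exact voting procedure is not reproduced here:

```python
import numpy as np

def kmeans(X, k, rng, n_iter=30):
    """Tiny k-means with k-means++-style seeding; returns cluster centres."""
    centers = [X[rng.integers(len(X))].astype(float)]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())].astype(float))
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def bagging_cluster(X, k, n_runs=9, frac=0.8, seed=0):
    """Consensus clustering sketch: fit k-means on random subsamples,
    assign every point to its nearest centre, align each run's labels
    to the first run by greatest overlap, then take a per-point
    majority vote."""
    rng = np.random.default_rng(seed)
    votes, ref = [], None
    for _ in range(n_runs):
        idx = rng.choice(len(X), int(frac * len(X)), replace=False)
        centers = kmeans(X[idx], k, rng)
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        if ref is None:
            ref, mapping = labels, np.arange(k)
        else:
            # greedy alignment: map each cluster to the reference
            # cluster it overlaps most
            mapping = np.array([
                np.bincount(ref[labels == j], minlength=k).argmax()
                if (labels == j).any() else j
                for j in range(k)
            ])
        votes.append(mapping[labels])
    votes = np.array(votes)                               # (n_runs, N)
    return np.array([np.bincount(v, minlength=k).argmax() for v in votes.T])
```

Points whose vote is split across runs are the natural candidates for the "no confident label" set that the supervised GoogLeNet step later mops up.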
Because the bagging clustering leaves about 27% of the galaxies without a confident label, the authors train a GoogLeNet (Inception‑v1) classifier on the pseudo‑labels generated by the UML step and use it to assign the remaining galaxies. In total, 73% of the 45,288 galaxies (0.5 < z < 6.0) are classified solely by the self‑supervised pipeline, and the remaining 27% are labeled by the supervised GoogLeNet model, achieving an overall classification accuracy exceeding 90% when benchmarked against external morphological catalogs.
Performance-wise, the dual‑encoder CL pipeline processes a batch in ~0.8 hours on a single GPU, roughly 30% faster than the previous VAE‑PCA approach, while delivering higher-quality embeddings. The study demonstrates that (i) CAE + APCT effectively mitigates noise and rotation effects; (ii) dual‑coding CL can learn discriminative representations without handcrafted augmentations; and (iii) bagging clustering provides a robust, scalable way to obtain morphology labels from unlabeled data.
The authors acknowledge limitations: the exact CL hyper‑parameters (temperature, batch size, learning rate schedule) are not fully disclosed, which hampers reproducibility; the 73 % UML coverage, though impressive, still requires a supervised fallback; and the clustering could benefit from more sophisticated density‑based methods (e.g., HDBSCAN) or ensemble techniques. Nonetheless, the proposed framework is modular, allowing each component (denoising, invariance transformation, encoder, CL, clustering, supervised refinement) to be swapped or upgraded independently.
Finally, the paper argues that the method is well‑suited for upcoming large‑scale surveys, particularly the China Space Station Telescope (CSST), where rapid, automated morphological classification of millions of galaxies will be essential. By combining self‑supervised representation learning with a lightweight supervised refinement, the approach offers a promising path toward efficient, physically meaningful galaxy morphology catalogs in the era of big astronomical data.