Exploring Cross-model Neuronal Correlations in the Context of Predicting Model Performance and Generalizability
As Artificial Intelligence (AI) models are increasingly integrated into critical systems, the need for a robust framework to establish their trustworthiness becomes paramount. While collaborative efforts have established conceptual foundations for such a framework, a significant gap remains in developing concrete, technically robust methods for assessing AI model quality and performance. This paper introduces a novel approach for assessing a newly trained model's performance against another known model by calculating the correlation between the two neural networks. The proposed method evaluates correlation by determining whether, for each neuron in one network, there exists a neuron in the other network that produces similar output. This approach also has implications for memory efficiency, allowing smaller networks to be used when high correlation exists between networks of different sizes. Experiments on five fully connected networks and a two-layer CNN trained on MNIST-family datasets show that higher alignment with the CNN tracks stronger performance and smaller degradation under black-box transfer-based attacks. On ImageNet-pretrained ResNets and DenseNets, partial layer comparisons recover intuitive architectural affinities, indicating that the procedure scales with reasonable approximations. These results support representational alignment as a lightweight compatibility check that complements standard accuracy, calibration, and robustness evaluations and enables early external validation of new models. Code is available at https://github.com/aheldis/Cross-model-Correlation.git.
💡 Research Summary
The paper addresses a pressing need in trustworthy AI: a lightweight, data‑independent metric that can be computed by an external auditor without access to a model’s training data or proprietary evaluation pipelines. The authors propose “cross‑model neuronal correlation,” a simple yet principled method that quantifies how similarly two neural networks encode a set of inputs at the level of individual neurons.
Methodology
- Probe Dataset – A small, unlabeled set of inputs (e.g., ten ImageNet validation images) is fed to both networks. For each neuron u in network F, an activation vector α_u is recorded across the probe samples; similarly for each neuron v in network G.
- Best‑Match per Neuron – For each u, the neuron v* in G that maximizes the absolute Pearson correlation |ρ(α_u,α_v)| is identified. The absolute value removes sign ambiguities caused by linear transformations or batch‑norm scaling.
- Depth‑Aware Penalty – To discourage matches across widely separated layers, the correlation is divided by 1 + |layer(u) − layer(v*)|, where layer(·) returns the integer depth index. This yields a per‑neuron score S(u;F→G).
- Symmetric Aggregation – The same procedure is performed from G to F, and the two directional averages are combined:
Corr(F,G) = ½ [ S̄(F→G) + S̄(G→F) ], where S̄(F→G) is the mean of S(u; F→G) over all neurons u in F (and analogously for S̄(G→F)), making the final score symmetric in F and G.
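The four steps above can be sketched compactly with NumPy. This is a hedged illustration, not the authors' released code: it assumes activations are pre-collected as one row per neuron over a shared probe set, and that the best match v* is chosen by maximum |ρ| before the depth penalty is applied, as the description above suggests. All function and variable names are hypothetical.

```python
import numpy as np

def directional_scores(acts_F, layers_F, acts_G, layers_G):
    """Per-neuron scores S(u; F->G).

    acts_F: (n_F, p) array, activation vector alpha_u of each neuron in F
            over p probe inputs; acts_G likewise for G.
    layers_F / layers_G: integer depth index of each neuron.
    """
    def standardize(a):
        # Center and normalize each row so that a dot product of two rows
        # equals their Pearson correlation.
        a = a - a.mean(axis=1, keepdims=True)
        norm = np.linalg.norm(a, axis=1, keepdims=True)
        return a / np.where(norm == 0, 1.0, norm)

    zF, zG = standardize(acts_F), standardize(acts_G)
    corr = np.abs(zF @ zG.T)           # |rho(alpha_u, alpha_v)| for all pairs
    best = corr.argmax(axis=1)         # best-match neuron v* in G for each u
    depth_gap = np.abs(layers_F - layers_G[best])
    return corr[np.arange(len(best)), best] / (1.0 + depth_gap)

def cross_model_correlation(acts_F, layers_F, acts_G, layers_G):
    # Symmetric aggregation: average the two directional means.
    s_fg = directional_scores(acts_F, layers_F, acts_G, layers_G).mean()
    s_gf = directional_scores(acts_G, layers_G, acts_F, layers_F).mean()
    return 0.5 * (s_fg + s_gf)
```

By construction the score lies in [0, 1], equals 1 when a network is compared with itself, and is symmetric in F and G.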