An Explainable Multi-Task Similarity Measure: Integrating Accumulated Local Effects and Weighted Fréchet Distance


In many machine learning contexts, tasks are treated as interconnected components with the goal of leveraging knowledge transfer between them, which is the central aim of Multi-Task Learning (MTL). This multi-task scenario raises critical questions: which tasks are similar, and how and why do they exhibit similarity? In this work, we propose a multi-task similarity measure based on Explainable Artificial Intelligence (XAI) techniques, specifically Accumulated Local Effects (ALE) curves. ALE curves are compared using the Fréchet distance, weighted by the data distribution, and the resulting similarity measure incorporates the importance of each feature. The measure is applicable in both single-task learning scenarios, where each task is trained separately, and multi-task learning scenarios, where all tasks are learned simultaneously. The measure is model-agnostic, allowing the use of different machine learning models across tasks. A scaling factor is introduced to account for differences in predictive performance across tasks, and several recommendations are provided for applying the measure in complex scenarios. We validate this measure on four datasets: one synthetic dataset and three real-world datasets. The real-world datasets include a well-known Parkinson’s dataset and a bike-sharing usage dataset – both structured in tabular format – as well as the CelebA dataset, which is used to evaluate the application of concept bottleneck encoders in a multi-task learning setting. The results demonstrate that the measure aligns with intuitive expectations of task similarity across both tabular and non-tabular data, making it a valuable tool for exploring relationships between tasks and supporting informed decision-making.


💡 Research Summary

The paper addresses a fundamental gap in multi‑task learning (MTL): while many MTL methods assume that tasks are related, they rarely provide an interpretable measure of how and why tasks are similar. To fill this gap, the authors propose a model‑agnostic, explainable similarity metric that combines Accumulated Local Effects (ALE) plots—an XAI tool that captures the average marginal influence of each feature on a model’s predictions—with a weighted version of the Fréchet distance, a curve‑matching metric that respects both spatial proximity and the ordering of points along the curves.

Core methodology

  1. Task representation – After training each task (either in a single‑task or a joint MTL setting), the authors compute ALE curves for every feature of every task. ALE provides a global, unbiased view of feature impact, avoiding the bias of partial dependence plots.
  2. Weighting scheme – Two complementary weights are introduced: (a) a data‑distribution weight that reflects how many observations fall into each ALE interval, ensuring that densely populated regions dominate the distance; (b) a feature‑importance weight (FI) derived from any importance estimator (e.g., permutation importance, SHAP, model coefficients). The FI weights are normalized to sum to one across features.
  3. Fréchet distance adaptation – The discrete Fréchet distance is employed because it can be computed in O(pq) time for polygonal curves. The authors modify it by summing distances over all matched segments (instead of taking a maximum) and by incorporating the above weights, yielding a weighted Fréchet distance for each feature.
  4. Performance scaling – To prevent a poorly performing task from being deemed artificially similar to others, a scaling factor α based on a task‑level performance metric (e.g., MAE, R²) is multiplied into the final similarity score.
  5. Overall similarity – The final similarity between tasks i and j is defined as
    S(i,j) = α · Σ_k w_k · FI_k · d_Frechet(ALE_i^k , ALE_j^k)
    where k indexes features, w_k is the data‑distribution weight for feature k, FI_k is the normalized importance, and d_Frechet is the weighted Fréchet distance between the two ALE curves for that feature.
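The steps above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes each ALE curve is a polyline of (feature value, effect) points, that the data-distribution weights enter the Fréchet computation pointwise per interval of the first curve, and that the helper names (`weighted_discrete_frechet`, `task_similarity`) are hypothetical. The dynamic program follows the standard discrete Fréchet recurrence, but sums weighted matched-point distances instead of taking their maximum, as described in step 3.

```python
import numpy as np

def weighted_discrete_frechet(P, Q, w=None):
    """Sum-based variant of the discrete Fréchet distance between polygonal
    curves P (shape (p, 2)) and Q (shape (q, 2)). Each point of P carries a
    weight w[i] (e.g. the fraction of observations in that ALE interval);
    defaults to uniform weights. Runs in O(pq) time."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    p, q = len(P), len(Q)
    if w is None:
        w = np.full(p, 1.0 / p)
    cost = lambda i, j: w[i] * np.linalg.norm(P[i] - Q[j])
    dp = np.full((p, q), np.inf)
    dp[0, 0] = cost(0, 0)
    for i in range(1, p):                      # first column: only advance on P
        dp[i, 0] = dp[i - 1, 0] + cost(i, 0)
    for j in range(1, q):                      # first row: only advance on Q
        dp[0, j] = dp[0, j - 1] + cost(0, j)
    for i in range(1, p):
        for j in range(1, q):
            # Sum over the coupling instead of taking the max (paper's adaptation).
            dp[i, j] = cost(i, j) + min(dp[i - 1, j], dp[i, j - 1], dp[i - 1, j - 1])
    return dp[p - 1, q - 1]

def task_similarity(ale_i, ale_j, fi, dist_w, alpha=1.0):
    """S(i, j) = alpha * sum_k FI_k * d_Frechet(ALE_i^k, ALE_j^k).
    `fi` holds raw feature importances (normalized here to sum to one, step 2);
    `dist_w[k]` holds the per-interval data-distribution weights for feature k,
    which in this sketch are absorbed into the weighted Fréchet distance;
    `alpha` is the performance-based scaling factor from step 4."""
    fi = np.asarray(fi, float)
    fi = fi / fi.sum()
    return alpha * sum(
        fi[k] * weighted_discrete_frechet(ale_i[k], ale_j[k], dist_w[k])
        for k in range(len(fi))
    )
```

A lower score means the two tasks' feature-effect curves are closer, i.e. the tasks are more similar; identical ALE curves yield a distance of zero regardless of the weights.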

Experimental validation
Four datasets are used: (1) a synthetic dataset with known task clusters, (2) a Parkinson’s disease dataset, (3) a bike‑sharing usage dataset, and (4) the CelebA image dataset (evaluated via concept‑bottleneck encoders). In the synthetic case, the metric perfectly recovers the ground‑truth clusters. In the Parkinson’s data, tasks that share clinical symptoms (e.g., tremor, gait speed) exhibit low distances, matching medical intuition. In the bike‑sharing scenario, tasks grouped by similar temporal and weather patterns show high similarity, suggesting the metric can guide demand‑forecasting strategies. For CelebA, attributes such as “wearing glasses” and “mustache” produce ALE curves that are close, reflecting known visual correlations.

Insights and contributions

  • Provides a transparent similarity measure that explains why tasks are close (through feature‑level ALE comparisons).
  • Is model‑agnostic, applicable to tree‑based models, linear models, deep neural networks, and even concept‑bottleneck architectures.
  • Incorporates feature importance and data distribution, making the metric robust to noisy or sparsely populated regions.
  • Introduces a performance scaling factor to balance tasks of differing predictive quality.
  • Demonstrates applicability across tabular and non‑tabular data, showing the method’s versatility.

Limitations and future work
The approach assumes a common feature space or a pre‑defined mapping between features of different tasks; handling heterogeneous feature sets automatically remains an open problem. Computing ALE for high‑dimensional data can be costly, and the choice of FI estimator influences results. The authors suggest future research on automated feature alignment, dimensionality‑reduction techniques for ALE, and Bayesian integration of FI and distance measures.

Overall, the paper delivers a novel, explainable tool for quantifying task similarity in MTL, bridging the gap between performance‑driven multi‑task models and the need for human‑interpretable insights into task relationships.

