Training-free Graph-based Imputation of Missing Modalities in Multimodal Recommendation
Multimodal recommender systems (RSs) represent items in the catalog through multimodal data (e.g., product images and descriptions) that, in some cases, might be noisy or (even worse) missing. In those scenarios, the common practice is to drop items with missing modalities and train the multimodal RSs on a subsample of the original dataset. To date, the problem of missing modalities in multimodal recommendation has received limited attention in the literature, lacking the precise formalisation that missing information has received in traditional machine learning. In this work, we first provide a problem formalisation for missing modalities in multimodal recommendation. Second, by leveraging the user-item graph structure, we re-cast the problem of missing multimodal information as a problem of graph feature interpolation on the item-item co-purchase graph. On this basis, we propose four training-free approaches that propagate the available multimodal features throughout the item-item graph to impute the missing features. Extensive experiments on popular multimodal recommendation datasets demonstrate that our solutions can be seamlessly plugged into any existing multimodal RS and benchmarking framework while still preserving (or even widening) the performance gap between multimodal and traditional RSs. Moreover, we show that our graph-based techniques can outperform traditional machine-learning imputation methods under different missing-modality settings. Finally, we analyse (for the first time in multimodal RSs) how feature homophily calculated on the item-item graph influences our graph-based imputations.
💡 Research Summary
The paper addresses the pervasive issue of missing modalities in multimodal recommender systems, where items may lack visual, textual, or other side‑information. Traditional practice discards such items, worsening data sparsity and limiting the benefits of multimodal signals. The authors first formalize the problem, distinguishing it from generic missing‑feature scenarios in machine learning. They then reinterpret missing modality imputation as a graph‑based feature interpolation task on an item‑item co‑purchase graph derived from user‑item interactions.
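The item-item co-purchase graph underlying this reinterpretation can be derived directly from the user-item interaction matrix. The sketch below illustrates one common construction (the toy matrix `R` and variable names are illustrative assumptions, not taken from the paper or its released code):

```python
import numpy as np

# Hypothetical toy user-item interaction matrix R (3 users x 4 items);
# R[u, i] = 1 if user u interacted with item i.
R = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Item-item co-purchase counts: A[i, j] = number of users who interacted
# with both item i and item j; the diagonal is zeroed to drop self-loops.
A = R.T @ R
np.fill_diagonal(A, 0.0)
print(A)
```

Here items 1 and 2 end up with edge weight 2 because two users interacted with both, which is the kind of edge the imputation methods later propagate features along.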
Four training‑free imputation methods are proposed: (1) simple mean propagation, (2) weighted mean propagation using edge weights reflecting co‑purchase frequency, (3) Laplacian smoothing, and (4) homophily‑based propagation that leverages a pre‑computed feature homophily score to give higher weight to similar items. These methods operate as a pre‑processing step, require no additional model training, and can be plugged into any existing multimodal recommender.
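The first of these variants can be sketched in a few lines. The following is a minimal NumPy illustration of neighbour-mean propagation, assuming a dense item-item adjacency matrix; the function name, signature, and multi-hop loop are illustrative assumptions, not the authors' released implementation:

```python
import numpy as np

def impute_by_neighbor_mean(A, X, missing, n_hops=2):
    """Illustrative mean-propagation imputation (not the paper's exact code).

    A       : (n, n) item-item co-purchase adjacency matrix.
    X       : (n, d) modality feature matrix (rows of missing items ignored).
    missing : boolean (n,) mask of items whose features are absent.

    Each hop replaces a missing item's features with the average of its
    neighbours' currently known features; newly imputed items then count
    as known, so features spread further on the next hop.
    """
    X, missing = X.copy(), missing.copy()
    known = ~missing
    for _ in range(n_hops):
        W = A * known[None, :]                 # aggregate only from known items
        deg = W.sum(axis=1, keepdims=True)
        avg = np.divide(W @ X, deg, out=np.zeros_like(X), where=deg > 0)
        fill = missing & (deg[:, 0] > 0)       # missing items with known neighbours
        X[fill] = avg[fill]
        known |= fill
        missing &= ~fill
    return X
```

The weighted-mean variant would follow the same shape with co-purchase counts as edge weights (as above), while Laplacian smoothing would iterate a normalised propagation over all items rather than only the missing ones.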
Extensive experiments were conducted on six Amazon sub‑datasets (Music, Beauty, Clothing, etc.) and a short‑video dataset, covering two‑modal and three‑modal settings. Fourteen state‑of‑the‑art multimodal recommenders—including VBPR, NGCF‑M, LightGCN‑M, FREEDOM, BM3, MGCN, MMSSL, LGMRec, DiffMM, and MENTOR—were evaluated. Missing‑modality rates were varied from 20 % to 80 %. Across all settings, the graph‑based imputations consistently outperformed traditional baselines such as zero‑fill, global mean, and random imputation, achieving average gains of 3–7 percentage points in Recall@20 and NDCG@20. Notably, when the missing rate was high (≥60 %), the performance gap between multimodal models and classic collaborative‑filtering baselines widened dramatically, sometimes more than doubling, demonstrating that effective imputation restores the advantage of multimodal signals.
The homophily‑based method yielded the strongest improvements, highlighting the importance of feature similarity among neighboring items. The authors quantified feature homophily by combining cosine similarity of modality embeddings with co‑purchase frequencies, showing a positive correlation between homophily levels and imputation effectiveness.
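A homophily score of the kind described above can be computed as an edge-weighted average of feature similarities. The sketch below is one plausible formulation under that description; the exact formula, function name, and normalisation are assumptions, not the paper's definition:

```python
import numpy as np

def feature_homophily(A, X):
    """Illustrative graph feature-homophily score (an assumption, not the
    paper's exact formula): average cosine similarity between the features
    of connected items, weighted by co-purchase edge weights in A."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)   # row-normalised features
    S = Xn @ Xn.T                          # pairwise cosine similarities
    mask = A > 0                           # restrict to existing edges
    return float((A * S)[mask].sum() / A[mask].sum())
```

Under this reading, a score near 1 means connected items have near-identical modality embeddings, which is exactly the regime where neighbour-based imputation should work best, matching the positive correlation the authors report.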
Implementation-wise, the authors released code compatible with popular frameworks such as Elliot and MMRec, confirming that the proposed pipeline can be seamlessly integrated into existing research and production pipelines without retraining.
In summary, the work introduces a novel, model‑agnostic, and computationally cheap solution to missing modalities in multimodal recommendation. By exploiting the inherent structure of the item‑item graph, it restores and even amplifies the benefits of multimodal features, offering a practical tool for both researchers and industry practitioners. Future directions suggested include extending the approach to dynamic or heterogeneous graphs and exploring meta‑learning strategies for adaptive imputation.