Benchmarking Deep Neural Networks for Modern Recommendation Systems

February 20, 2026

Reading time: 5 minutes

...

📝 Original Info

Title: Benchmarking Deep Neural Networks for Modern Recommendation Systems
ArXiv ID: 2512.07000
Date: 2025-12-07
Authors: Abderaouf Bahi* (Computer Science and Applied Mathematics Laboratory (LIMA), University of El Tarf, Algeria); Inoussa Mouiche (School of Computer Science, University of Windsor, Canada); Ibtissem Gasmi (Computer Science and Applied Mathematics Laboratory (LIMA), University of El Tarf, Algeria). *Corresponding author: a.bahi@univ-eltarf.dz

📝 Abstract

This paper presents a requirement-oriented benchmark of seven deep neural architectures, CNN, RNN, GNN, Autoencoder, Transformer, Neural Collaborative Filtering, and Siamese Networks, across three real-world datasets: Retail E-commerce, Amazon Products, and Netflix Prize. To ensure a fair and comprehensive comparison aligned with the evolving demands of modern recommendation systems, we adopt a Requirement-Oriented Benchmarking (ROB) framework that structures evaluation around predictive accuracy, recommendation diversity, relational awareness, temporal dynamics, and computational efficiency. Under a unified evaluation protocol, models are assessed using standard accuracy-oriented metrics alongside diversity and efficiency indicators. Experimental results show that different architectures exhibit complementary strengths across requirements, motivating the use of hybrid and ensemble designs. The findings provide practical guidance for selecting and combining neural architectures to better satisfy multi-objective recommendation system requirements.
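
To make the pairing of accuracy and diversity indicators concrete, here is a minimal sketch of one metric of each kind. It assumes precision@k as the accuracy metric and intra-list diversity (mean pairwise cosine distance) as the diversity indicator; both are common choices, but this excerpt does not state which metrics the authors actually use. The function names, item identifiers, and feature vectors are hypothetical toy values, not data from the paper.

```python
from itertools import combinations
from math import sqrt


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def intra_list_diversity(recommended, item_vectors, k):
    """Mean pairwise cosine distance among the top-k recommended items."""
    top_k = recommended[:k]
    pairs = list(combinations(top_k, 2))
    if not pairs:
        return 0.0
    total = sum(
        1.0 - cosine_similarity(item_vectors[a], item_vectors[b])
        for a, b in pairs
    )
    return total / len(pairs)


# Toy example (assumed values): items "a" and "b" are identical in feature
# space, while "c" is orthogonal to both.
item_vectors = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
recommended = ["a", "b", "c"]
relevant = {"a", "c"}

print("precision@2:", precision_at_k(recommended, relevant, 2))
print("ILD@3:", intra_list_diversity(recommended, item_vectors, 3))
```

A list can score well on one metric and poorly on the other: recommending only near-duplicates of a relevant item keeps precision high while driving intra-list diversity toward zero, which is why the two are reported side by side.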

📄 Full Content

Benchmarking Deep Neural Networks for Modern Recommendation Systems

Abderaouf Bahi (a,*), Inoussa Mouiche (b) and Ibtissem Gasmi (a)

(a) Computer Science and Applied Mathematics Laboratory (LIMA), Faculty of Science and Technology, Chadli Bendjedid University, P.O. Box 73, El Tarf 36000, Algeria
(b) School of Computer Science, University of Windsor, ON, Canada
*Corresponding author: Abderaouf Bahi (a.bahi@univ-eltarf.dz)

Abstract: This paper presents a requirement-oriented benchmark of seven deep neural architectures, CNN, RNN, GNN, Autoencoder, Transformer, Neural Collaborative Filtering, and Siamese Networks, across three real-world datasets: Retail E-commerce, Amazon Products, and Netflix Prize. To ensure a fair and comprehensive comparison aligned with the evolving demands of modern recommendation systems, we adopt a Requirement-Oriented Benchmarking (ROB) framework that structures evaluation around predictive accuracy, recommendation diversity, relational awareness, temporal dynamics, and computational efficiency. Under a unified evaluation protocol, models are assessed using standard accuracy-oriented metrics alongside diversity and efficiency indicators. Experimental results show that different architectures exhibit complementary strengths across requirements, motivating the use of hybrid and ensemble designs. The findings provide practical guidance for selecting and combining neural architectures to better satisfy multi-objective recommendation system requirements.

Index Terms: Recommender Systems; Requirement-Oriented Benchmarking; Deep Learning; Neural Networks; Accuracy; Diversity.

I. INTRODUCTION

Technological advancements and evolving consumer behavior have driven an unprecedented expansion of the digital marketplace in recent years. In the first quarter of 2023 alone, online transactions increased by more than 8% compared to the previous year, reaching over 540 million transactions and generating revenues exceeding 41 billion euros [1]–[3].
This rapid growth not only underscores the scale of modern e-commerce platforms but also raises a critical question: how can digital systems effectively sustain user engagement and conversion in increasingly competitive and data-intensive environments?

Recommendation systems play a central role in addressing this challenge. By leveraging large volumes of user interaction data, such as preferences, purchase histories, and behavioral patterns, these systems aim to deliver personalized content that enhances user experience and drives sales [4], [5]. Beyond predictive accuracy, modern recommendation systems are increasingly expected to satisfy additional requirements, including recommendation diversity, relational awareness, temporal adaptability, and scalability. In particular, diversity has emerged as a key factor in mitigating informational lock-in, where users are repeatedly exposed to similar or overly popular items, thereby limiting discovery and long-term engagement [6]–[8]. Encouraging exploration through diverse recommendations has been shown to improve user satisfaction and retention [9]–[11].

Despite these advances, achieving an effective balance between accuracy and diversity remains a significant challenge [12], [13]. Moreover, different neural network architectures exhibit varying strengths in addressing these requirements, depending on how they model relationships, sequential behavior, or latent representations. While prior studies have explored individual neural architectures for recommendation tasks, evaluation practices often remain fragmented, focusing on isolated performance metrics without explicitly accounting for the multi-objective nature of modern recommendation systems.
To address this limitation, we adopt a Requirement-Oriented Benchmarking (ROB) perspective, which frames recommendation evaluation around a set of core system requirements, including predictive accuracy, diversity, relational modeling capability, temporal adaptability, and computational efficiency. Rather than proposing new models or metrics, ROB provides a structured lens for systematically comparing existing architectures under a unified and application-relevant evaluation setting.

Under ROB, this study presents a comprehensive benchmark of seven neural network architectures, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), Autoencoders, Transformers, Neural Collaborative Filtering (NCF), and Siamese Networks, for item–item recommendation tasks. Using three real-world datasets from retail e-commerce, online product platforms, and media consumption, the models are evaluated under a unified experimental protocol with respect to both predictive accuracy and recommendation diversity. The objective is to identify which architectures are best suited to specific system requirements and to provide practical guidance for designing recommendation systems

arXiv:2512.07000v2 [cs.IR] 17 Jan 2026

📄 Read Full PDF on ArXiv