Title: Benchmarking Deep Neural Networks for Modern Recommendation Systems
ArXiv ID: 2512.07000
Date: 2025-12-07
Authors: Abderaouf Bahi* (Computer Science and Applied Mathematics Laboratory (LIMA), University of El Tarf, Algeria); Inoussa Mouiche (School of Computer Science, University of Windsor, Canada); Ibtissem Gasmi (Computer Science and Applied Mathematics Laboratory (LIMA), University of El Tarf, Algeria). *Corresponding author: a.bahi@univ-eltarf.dz
📝 Abstract
This paper presents a requirement-oriented benchmark of seven deep neural architectures, CNN, RNN, GNN, Autoencoder, Transformer, Neural Collaborative Filtering, and Siamese Networks, across three real-world datasets: Retail E-commerce, Amazon Products, and Netflix Prize. To ensure a fair and comprehensive comparison aligned with the evolving demands of modern recommendation systems, we adopt a Requirement-Oriented Benchmarking (ROB) framework that structures evaluation around predictive accuracy, recommendation diversity, relational awareness, temporal dynamics, and computational efficiency. Under a unified evaluation protocol, models are assessed using standard accuracy-oriented metrics alongside diversity and efficiency indicators. Experimental results show that different architectures exhibit complementary strengths across requirements, motivating the use of hybrid and ensemble designs. The findings provide practical guidance for selecting and combining neural architectures to better satisfy multi-objective recommendation system requirements.
📄 Full Content
Benchmarking Deep Neural Networks for Modern Recommendation Systems
Abderaouf Bahi (a,*), Inoussa Mouiche (b), and Ibtissem Gasmi (a)
(a) Computer Science and Applied Mathematics Laboratory (LIMA), Faculty of Science and Technology, Chadli Bendjedid University, P.O. Box 73, El Tarf 36000, Algeria
(b) School of Computer Science, University of Windsor, ON, Canada
*Corresponding author: Abderaouf Bahi (a.bahi@univ-eltarf.dz)
Index Terms: Recommender Systems; Requirement-Oriented Benchmarking; Deep Learning; Neural Networks; Accuracy; Diversity.
I. INTRODUCTION
Technological advancements and evolving consumer behavior have driven an unprecedented expansion of the digital marketplace in recent years. In the first quarter of 2023 alone, online transactions increased by more than 8% compared to the previous year, reaching over 540 million transactions and generating revenues exceeding 41 billion euros [1]–[3]. This rapid growth not only underscores the scale of modern e-commerce platforms but also raises a critical question: how can digital systems effectively sustain user engagement and conversion in increasingly competitive and data-intensive environments?
Recommendation systems play a central role in addressing this challenge. By leveraging large volumes of user interaction data, such as preferences, purchase histories, and behavioral patterns, these systems aim to deliver personalized content that enhances user experience and drives sales [4], [5]. Beyond predictive accuracy, modern recommendation systems are increasingly expected to satisfy additional requirements, including recommendation diversity, relational awareness, temporal adaptability, and scalability. In particular, diversity has emerged as a key factor in mitigating informational lock-in, where users are repeatedly exposed to similar or overly popular items, thereby limiting discovery and long-term engagement [6]–[8]. Encouraging exploration through diverse recommendations has been shown to improve user satisfaction and retention [9]–[11].
Despite these advances, achieving an effective balance between accuracy and diversity remains a significant challenge [12], [13]. Moreover, different neural network architectures exhibit varying strengths in addressing these requirements, depending on how they model relationships, sequential behavior, or latent representations. While prior studies have explored individual neural architectures for recommendation tasks, evaluation practices often remain fragmented, focusing on isolated performance metrics without explicitly accounting for the multi-objective nature of modern recommendation systems. To address this limitation, we adopt a Requirement-Oriented Benchmarking (ROB) perspective, which frames recommendation evaluation around a set of core system requirements, including predictive accuracy, diversity, relational modeling capability, temporal adaptability, and computational efficiency. Rather than proposing new models or metrics, ROB provides a structured lens for systematically comparing existing architectures under a unified and application-relevant evaluation setting.
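To make the accuracy and diversity requirements concrete, the following minimal sketch scores two recommendation lists on both axes. The metric definitions are not given in this section, so precision@k and a category-based intra-list diversity are assumptions chosen for illustration. The function names, item IDs, and categories are hypothetical and are not taken from the benchmark.

```python
from itertools import combinations


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k slots filled with relevant items (assumed accuracy metric)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def intra_list_diversity(recommended, categories, k):
    """Share of item pairs in the top-k list whose categories differ (assumed diversity metric)."""
    pairs = list(combinations(recommended[:k], 2))
    if not pairs:
        return 0.0
    differing = sum(1 for a, b in pairs if categories[a] != categories[b])
    return differing / len(pairs)


# Hypothetical toy data: item -> category, and the items one user actually liked.
categories = {"i1": "shoes", "i2": "shoes", "i3": "books", "i4": "music"}
relevant = {"i1", "i2"}

# Two candidate lists for the same user.
narrow_list = ["i1", "i2"]   # accurate but homogeneous
varied_list = ["i1", "i3"]   # less accurate but more varied

report = {
    name: {
        "precision@2": precision_at_k(recs, relevant, 2),
        "diversity@2": intra_list_diversity(recs, categories, 2),
    }
    for name, recs in {"narrow": narrow_list, "varied": varied_list}.items()
}
print(report)
```

In this toy case the narrow list scores higher on precision and lower on diversity, while the varied list does the reverse. This is the kind of trade-off that a requirement-by-requirement report is meant to surface.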
arXiv:2512.07000v2 [cs.IR] 17 Jan 2026
Under ROB, this study presents a comprehensive benchmark of seven neural network architectures, namely Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), Autoencoders, Transformers, Neural Collaborative Filtering (NCF), and Siamese Networks, for item–item recommendation tasks. Using three real-world datasets from retail e-commerce, online product platforms, and media consumption, the models are evaluated under a unified experimental protocol with respect to both predictive accuracy and recommendation diversity. The objective is to identify which architectures are best suited to specific system requirements and to provide practical guidance for designing recommendation systems