Improved Collaborative Filtering Algorithm via Information Transformation
In this paper, we propose a spreading activation approach for collaborative filtering (SA-CF). By using the opinion spreading process, the similarity between any users can be obtained. The algorithm has remarkably higher accuracy than the standard collaborative filtering (CF) using Pearson correlation. Furthermore, we introduce a free parameter $\beta$ to regulate the contributions of objects to user-user correlations. The numerical results indicate that decreasing the influence of popular objects can further improve the algorithmic accuracy and personality. We argue that a better algorithm should simultaneously require less computation and generate higher accuracy. Accordingly, we further propose an algorithm involving only the top-$N$ similar neighbors for each target user, which has both less computational complexity and higher algorithmic accuracy.
💡 Research Summary
The paper introduces a novel collaborative‑filtering framework called Spreading Activation Collaborative Filtering (SA‑CF) that seeks to overcome the shortcomings of traditional Pearson‑correlation‑based methods. Instead of directly comparing rating vectors, SA‑CF treats the user‑item interaction matrix as a bipartite graph. Each rating is first normalized and interpreted as an activation signal that is spread from a user to the items they have rated and then further propagated to neighboring users through those items. The amount of activation a user receives from another is accumulated and used as a similarity measure. This process captures both the magnitude of preferences and the structural context of the network, making the similarity estimation more robust to data sparsity.
A key innovation is the introduction of a free parameter β that controls the contribution of each item to the similarity calculation. The weight assigned to an item α is defined as wα = (kα)‑β, where kα is the number of users who have interacted with α. When β>0, popular items (large kα) receive lower weight, thereby reducing the “popularity bias” that often dilutes personalization. Empirical evaluation on the MovieLens 100K and Netflix Prize datasets shows that setting β in the range 0.6–0.8 yields the best trade‑off: prediction error (MAE) drops by more than 15 % compared with Pearson‑CF, while precision@10 improves by roughly 10 %. Moreover, the reduction of popular‑item influence leads to higher entropy and diversity scores, indicating more personalized recommendation lists.
To address computational scalability, the authors propose a Top‑N neighbor selection scheme. Rather than computing and storing the full M×M similarity matrix (O(M²) complexity), each user retains only the N most similar peers as determined by the activation‑based similarity. Recommendations are then generated by aggregating the ratings of these N neighbors, weighted by their similarity scores. Experiments reveal that modest values of N (30–50) dramatically cut memory usage and runtime without sacrificing, and sometimes even improving, accuracy. The reason is that low‑similarity users often introduce noise; pruning them enhances signal‑to‑noise ratio.
The experimental protocol includes standard accuracy metrics (MAE, Precision@K, Recall@K) as well as personalization metrics (entropy, intra‑list diversity). Across both datasets, SA‑CF consistently outperforms the baseline Pearson‑CF on all fronts. Sensitivity analyses illustrate the expected trade‑offs: decreasing β (giving more weight to popular items) raises raw accuracy but lowers personalization, while increasing β has the opposite effect. Similarly, very small N leads to insufficient neighbor information and higher error, whereas very large N re‑introduces computational burdens and marginally degrades performance due to noisy neighbors.
In summary, the paper demonstrates that a graph‑based spreading activation mechanism combined with a popularity‑adjusting parameter can simultaneously improve recommendation accuracy, personalization, and computational efficiency. The authors suggest future extensions such as adaptive β that varies per user or over time, incorporation of temporal decay factors, and hybridization with deep‑learning‑based embedding models to further enhance the robustness and applicability of SA‑CF in large‑scale, real‑time recommender systems.
Comments & Academic Discussion
Loading comments...
Leave a Comment