A Channel Coding Perspective of Collaborative Filtering

Reading time: 5 minutes
...

📝 Original Info

  • Title: A Channel Coding Perspective of Collaborative Filtering
  • ArXiv ID: 0908.2494
  • Date: 2016-11-17
  • Authors: Not specified in the source metadata.

📝 Abstract

We consider the problem of collaborative filtering from a channel coding perspective. We model the underlying rating matrix as a finite alphabet matrix with block constant structure. The observations are obtained from this underlying matrix through a discrete memoryless channel with a noisy part representing noisy user behavior and an erasure part representing missing data. Moreover, the clusters over which the underlying matrix is constant are *unknown*. We establish a sharp threshold result for this model: if the largest cluster size is smaller than $C_1 \log(mn)$ (where the rating matrix is of size $m \times n$), then the underlying matrix cannot be recovered with any estimator, but if the smallest cluster size is larger than $C_2 \log(mn)$, then we show a polynomial-time estimator with diminishing probability of error. In the case of uniform cluster size, not only the order of the threshold, but also the constant is identified.

💡 Deep Analysis

📄 Full Content

size and rank of the matrix. In [6], a lower bound is established on the number of samples needed by any algorithm. The order of this lower bound is shown to be achievable in [12]. In [14], the problem of matrix recovery from linear measurements (of which sampling is a special case) is considered and a new algorithm is proposed. In [4], the problem of matrix completion under bounded noise is considered. A semi-definite programming based algorithm is proposed and shown to have recovery error proportional to the noise magnitude.

In this paper, we take an alternative channel coding viewpoint of the problem. Our results differ from the above works in several aspects outlined below.

• We consider a finite alphabet for the ratings and a different model for the rating matrix, based on row and column clusters.

• We consider noisy user behavior, and our goal is not to complete the missing entries, but to estimate an underlying “block constant” matrix (in the limit as the matrix size grows).

• Since we consider a finite alphabet, error-free recovery is asymptotically feasible even in the presence of noise. Hence, unlike [4], which considers real-valued matrices, we do not allow any distortion.

We next outline our model and results.

We consider a finite alphabet for the ratings. In this section, we briefly outline our model and results without any mathematical details; the details can be found in subsequent sections.

To motivate our model, consider an ideal situation where every user rates every item without any noise.

In this ideal scenario, it is reasonable to expect that similar users rate similar items by the same value.

We therefore assume that the users (items) are clustered into groups of similar users (items, respectively).

The rating matrix in this ideal situation (say X with size m × n) is then a block constant matrix (where the blocks correspond to Cartesian products of row and column clusters). The observations are obtained from X by passing its entries through a discrete memoryless channel (DMC) consisting of an erasure channel modeling missing data and a noisy DMC representing noisy user behavior. Moreover, the row and column clusters are unknown. The goal is to make recommendations by estimating X based on the observations. The performance metric we use is the probability of block error: we make an error if any of the entries in the estimate is erroneous. Our goal is to identify conditions under which error-free recovery is possible in the limit as the matrix size grows large. Thus we view the recommendation system problem as a channel coding problem.
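To make this observation model concrete, here is a minimal simulation sketch. It assumes uniform cluster size k, binary ratings, a binary symmetric channel with crossover probability delta for the noisy part, and erasure probability eps for the missing data; all parameter names and values are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the observation model: a block-constant matrix X is
# observed through a noisy channel (binary symmetric, crossover delta)
# followed by an erasure channel (erasure probability eps).
# All parameter names and values below are illustrative assumptions.

rng = np.random.default_rng(0)

m, n = 60, 80     # size of the rating matrix
k = 10            # cluster size, assumed uniform for rows and columns
delta = 0.1       # crossover probability (noisy user behavior)
eps = 0.3         # erasure probability (missing ratings)

# Block-constant matrix X: one value per (row cluster, column cluster) block.
row_cluster = np.repeat(np.arange(m // k), k)   # cluster label of each row
col_cluster = np.repeat(np.arange(n // k), k)   # cluster label of each column
block_values = rng.integers(0, 2, size=(m // k, n // k))
X = block_values[np.ix_(row_cluster, col_cluster)]

# Noisy part: flip each binary entry independently with probability delta.
flips = rng.random((m, n)) < delta
Y = np.where(flips, 1 - X, X)

# Erasure part: erase each entry independently with probability eps.
# Erased entries are marked with -1.
erased = rng.random((m, n)) < eps
Y = np.where(erased, -1, Y)

print("observed fraction:", (Y != -1).mean())
```

The goal stated above is then to recover X exactly (with probability approaching one as the matrix grows) from the noisy, partially observed matrix Y.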

The cluster sizes in our model represent the resolution: the larger the cluster, the smaller the degrees of freedom (or the rate of the channel code). If the channel is noisier and the erasures are more frequent, then we can only support a small number of codewords. The challenge is to find the exact order. For our model, we show that if the largest cluster size (defined precisely in Section III) is smaller than $C_1 \log(mn)$, where $C_1$ is a constant dependent on the channel parameters, then for any estimator the probability of error approaches one. On the other hand, if the smallest cluster size (defined precisely in Section III) is larger than $C_2 \log(mn)$, where $C_2$ is a constant dependent on the channel parameters, then we give a polynomial-time algorithm that has diminishing probability of error. Thus we identify the order of the threshold exactly. In the case of uniform cluster size, the constants $C_1$ and $C_2$ are identical, and thus in this special case even the constant is identified precisely. Moreover, for the special case of binary ratings and uniform cluster size, the algorithm used to show the achievability part does not depend on the cluster size or the erasure parameter, and needs only knowledge of a worst-case parameter for the noisy part of the channel. These results are obtained by averaging over X (as per the probability law specified in Section II).
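In symbols, writing $K_{\max}$ and $K_{\min}$ for the largest and smallest cluster sizes (this shorthand is ours; the precise cluster-size quantities are defined in Section III), the threshold result reads

$$ K_{\max} < C_1 \log(mn) \;\Longrightarrow\; \Pr[\hat{X} \neq X] \to 1 \ \text{for every estimator } \hat{X}, $$

$$ K_{\min} > C_2 \log(mn) \;\Longrightarrow\; \Pr[\hat{X} \neq X] \to 0 \ \text{for the proposed polynomial-time estimator}, $$

with $C_1 = C_2$ in the uniform-cluster-size case, so that the constant in the threshold is also sharp there.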

The achievability part of our result is shown by first clustering the rows and columns, and then estimating the matrix entries assuming that the clustering is correct. The clustering is done by computing a normalized Hamming metric for every pair of rows and comparing it with a threshold to determine whether the two rows are in the same cluster. The converse is proved by considering the case when the clusters are known exactly. Our results for the average case show that the threshold is determined by the problem of estimating the entries, and that clustering is, by comparison, the easier task (see Figure 1 for an illustration).
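The sketch below illustrates the row-clustering step, continuing the assumptions of the earlier simulation (erasures marked by -1). The normalized Hamming distance over commonly observed entries follows the description above; the threshold value tau and the greedy grouping into clusters are simplifications of ours, not the paper's exact procedure.

```python
import numpy as np

def normalized_hamming(y_i, y_j, erased=-1):
    """Fraction of disagreements among entries observed in both rows."""
    common = (y_i != erased) & (y_j != erased)
    if not common.any():
        return 1.0  # no common observations: treat the rows as maximally dissimilar
    return float((y_i[common] != y_j[common]).mean())

def cluster_rows(Y, tau, erased=-1):
    """Greedy grouping: assign each row to the first cluster whose representative
    row is within normalized Hamming distance tau; otherwise open a new cluster."""
    labels = np.full(Y.shape[0], -1, dtype=int)
    representatives = []  # one representative row index per cluster
    for i in range(Y.shape[0]):
        for c, r in enumerate(representatives):
            if normalized_hamming(Y[i], Y[r], erased) < tau:
                labels[i] = c
                break
        else:
            labels[i] = len(representatives)
            representatives.append(i)
    return labels

# Usage with the matrix Y from the earlier sketch (tau = 0.25 is an illustrative choice):
# row_labels = cluster_rows(Y, tau=0.25)
```

Column clustering proceeds the same way on the transpose, and each block entry can then be estimated from the observed values within the block (for instance, by a majority vote), assuming the recovered clustering is correct.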

The precise model for X and the observations is stated in Section II. The case of uniform cluster size and binary ratings leads to sharper bounds and results. Hence results for this case are given in Section III.

The case of general alphabets and non-uniform cluster sizes is considered in Section IV. The conclusion is given in Section V, while all the proofs are collected together in Section VI.


Reference

This content is AI-processed based on open access ArXiv data.
