In this paper we show, via a reduction, that a distance oracle on sparse graphs is at least as hard as set intersection. Given a collection of total size $n$ consisting of $m$ sets drawn from a universe $U$, the set intersection problem is to build a data structure that can answer whether two given sets intersect. A distance oracle is a data structure that answers distance queries on a given graph. We show that if one can build a distance oracle for a sparse graph $G=(V,E)$ that requires $s(|V|,|E|)$ space and answers a $(2-\epsilon,c)$-approximate distance query in time $t(|V|,|E|)$, where $(2-\epsilon)$ is a multiplicative error and $c$ is a constant additive error, then set intersection can be solved in $t(m+|U|,n)$ time using $s(m+|U|,n)$ space.
Let $G = (V, E)$ be a graph. The all-pairs shortest paths problem (APSP) asks for a data structure, built for a given graph $G$, from which the exact distance between any two vertices can be retrieved efficiently. It is one of the most fundamental graph problems in computer science. Despite its importance, no efficient solution is known that uses less than $O(|V|^2)$ space. When the graph is dense, i.e., when $|E| = \Theta(|V|^2)$, this space is comparable to the size of the input. For sparse graphs, where $|E| = O(|V|)$, it is far larger than the input.
Thorup and Zwick [1] explored an alternative to the APSP problem. They introduced the approximate distance oracle, a data structure that answers approximate distance queries in a graph. They showed that, for any integer $k \geq 1$, an undirected weighted graph with $n$ vertices and $m$ edges can be preprocessed in expected $O(kmn^{1/k})$ time to construct a data structure of size $O(kn^{1+1/k})$ that answers any $(2k-1)$-approximate distance query in $O(k)$ time. That is, the oracle answers distance queries with a multiplicative error of at most $2k-1$.
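To make the trade-off concrete, the bounds above can be instantiated for small values of $k$. This is only a worked substitution into the stated formulas, with $n$ the number of vertices as in the statement above:

```latex
\begin{align*}
k = 1 &: \quad \text{size } O(n^{2}),   && \text{stretch } 2k-1 = 1 \text{ (exact distances)}, \\
k = 2 &: \quad \text{size } O(n^{3/2}), && \text{stretch } 2k-1 = 3, \\
k = 3 &: \quad \text{size } O(n^{4/3}), && \text{stretch } 2k-1 = 5.
\end{align*}
```

Since $k$ is an integer, no choice of $k$ in this trade-off yields a stretch strictly between 1 and 3.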
In this paper we show, by a reduction from the set intersection problem, that it is hard to build a $(2-\epsilon, c)$-approximate distance oracle, where $2-\epsilon$ is a multiplicative error and $c$ is a constant additive error.
In the set intersection problem, we are given a collection of sets that we can preprocess. Then, given two of these sets, we must answer quickly whether they intersect. This problem arises in many fields, especially in retrieval algorithms and search engines. The formal definition of the problem is as follows:
Definition 1. Let $D$ be a database consisting of a collection of $m$ sets drawn from a universe $U$, $S_1, \ldots, S_m \subseteq U$. Denote by $n$ the input size, i.e., $n = \sum_{i=1}^{m} |S_i|$. The set intersection problem is to build a data structure that, given a query consisting of two indices $i, j \leq m$, answers whether the sets $S_i$ and $S_j$ intersect.
Cohen and Porat [2] showed how the set intersection problem can be solved in $O(\sqrt{n})$ query time using $O(n)$ space. Their solution divides the sets in the database $D$ into large and non-large sets, where a large set is one with more than $\sqrt{n}$ elements. They construct a set intersection matrix for the large sets in $D$, which records, for each pair of large sets, whether the two sets intersect. They showed that the number of large sets is at most $\sqrt{n}$; thus, this matrix occupies $\sqrt{n} \times \sqrt{n} = O(n)$ bits. Moreover, for each set in $D$ they store a static hash table that determines in $O(1)$ time whether an element belongs to that set.
Given a query consisting of two indices $i, j$, if both $S_i$ and $S_j$ are large sets, the answer can be retrieved from the set intersection matrix in $O(1)$ time. Otherwise, at least one of the sets is non-large, i.e., it has at most $\sqrt{n}$ elements. In this case, the answer can be retrieved by going over all the elements of the smaller set and checking, for each of them, whether it belongs to the other set in $O(1)$ time. Because a non-large set has at most $\sqrt{n}$ elements, this takes $O(\sqrt{n})$ time.
This solution extends easily to a tunable one. If we define a large set to be a set with more than $t$ elements, there can be at most $n/t$ large sets. Thus, the set intersection matrix costs $O(n^2/t^2)$ space. Hence, a query can be answered in $O(t)$ time using $O(n^2/t^2)$ space.
In this paper we show a reduction from the set intersection problem to distance oracles on sparse graphs. In Sect. 2 we show that if one can build a distance oracle using $s(|V|, |E|)$ space with $t(|V|, |E|)$ query time that answers $(2-\epsilon)$-approximate distance queries, then the set intersection problem can be solved in $t(m + |U|, n)$ query time using $s(m + |U|, n)$ space. In Sect. 3 we extend the reduction to a $(2-\epsilon)$-approximate distance oracle with constant additive error.
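As an illustration of the large/non-large scheme of Cohen and Porat [2] described above, the following Python sketch implements the tunable variant with threshold $t$. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `SetIntersectionIndex` and the method `intersects` are hypothetical, Python `set` objects stand in for the static hash tables, indices are 0-based, and the quadratic preprocessing loop is chosen for brevity only.

```python
from itertools import combinations


class SetIntersectionIndex:
    """Hypothetical sketch of the large/non-large scheme with threshold t."""

    def __init__(self, sets, t):
        # Python sets stand in for the static hash tables (O(1) expected lookup).
        self.tables = [set(s) for s in sets]
        # A set is "large" if it has more than t elements; at most n/t sets qualify.
        large_ids = [i for i, s in enumerate(self.tables) if len(s) > t]
        self.rank = {i: r for r, i in enumerate(large_ids)}
        k = len(large_ids)
        # Intersection matrix over large sets only; a large set meets itself.
        self.matrix = [[a == b for b in range(k)] for a in range(k)]
        for a, b in combinations(range(k), 2):
            hit = not self.tables[large_ids[a]].isdisjoint(self.tables[large_ids[b]])
            self.matrix[a][b] = self.matrix[b][a] = hit

    def intersects(self, i, j):
        # Both sets large: a single matrix lookup.
        if i in self.rank and j in self.rank:
            return self.matrix[self.rank[i]][self.rank[j]]
        # Otherwise the smaller set has at most t elements; probe each in the other.
        small, other = sorted((i, j), key=lambda x: len(self.tables[x]))
        return any(e in self.tables[other] for e in self.tables[small])


# Toy database with n = 13 and threshold t = 2 (sets 0, 1 and 4 are large).
index = SetIntersectionIndex([{1, 2, 3, 4}, {4, 5, 6, 7}, {8}, {1}, {9, 10, 11}], t=2)
```

Choosing $t = \sqrt{n}$ in this sketch corresponds to the $O(\sqrt{n})$ query time and $O(n)$-bit matrix of the original scheme.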
In the next theorem we claim that if one can build a distance oracle that answers $(2-\epsilon)$-approximate distance queries, the set intersection problem can be solved.
Proof. For the set intersection problem we are given a database $D$ consisting of $m$ sets drawn from a universe $U$, $S_1, \ldots, S_m \subseteq U$. We denote by $n$ the input size, i.e., $n = \sum_{i=1}^{m} |S_i|$. We construct a bipartite graph with two disjoint sets of vertices: $V_1$, with one vertex for each set in $D$, and $V_2$, with one vertex for each element in $U$. Hence, $|V| = |V_1| + |V_2| = m + |U|$.
The edges between $V_1$ and $V_2$ are defined simply: if an element $e \in U$ belongs to a set $S_i$, then there is an edge between the corresponding vertices in the bipartite graph. Because the graph is bipartite, the distance between any two vertices of $V_1$ must be even. If two sets intersect, the distance between the corresponding vertices is 2; otherwise, it is at least 4. The number of edges in this graph is bounded by $n$. We construct a distance oracle for this graph which ans
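The bipartite construction above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function names are hypothetical, indices are 0-based, and an exact breadth-first search stands in for the distance oracle. The threshold test assumes the usual guarantee that a $(2-\epsilon)$-approximate oracle returns an estimate $d'$ with $d \leq d' \leq (2-\epsilon)d$, so a true distance of 2 gives $d' < 4$, while a true distance of at least 4 gives $d' \geq 4$.

```python
from collections import deque


def build_bipartite_graph(sets, universe):
    """Vertex ('S', i) for set i (side V_1), vertex ('U', e) for element e (side V_2)."""
    adj = {("S", i): [] for i in range(len(sets))}
    for e in universe:
        adj[("U", e)] = []
    for i, s in enumerate(sets):
        for e in s:
            # One edge per (set, element) membership, so at most n edges in total.
            adj[("S", i)].append(("U", e))
            adj[("U", e)].append(("S", i))
    return adj


def bfs_distance(adj, src, dst):
    """Exact hop distance, or None if dst is unreachable (stand-in for the oracle)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def sets_intersect(oracle, i, j):
    """Decide intersection of distinct sets i and j from a distance estimate below 4."""
    d = oracle(("S", i), ("S", j))
    return d is not None and d < 4


sets = [{1, 2}, {2, 3}, {4}, {3, 5}]
adj = build_bipartite_graph(sets, universe={1, 2, 3, 4, 5, 6})


def oracle(u, v):
    # Exact distances trivially satisfy the assumed approximation guarantee.
    return bfs_distance(adj, u, v)
```

In this toy instance the graph has $m + |U| = 4 + 6 = 10$ vertices and $n = 7$ edges; sets 0 and 1 share the element 2, whereas sets 0 and 3 are connected only through set 1.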