
MapReduce for Integer Factorization

February 23, 2026

Reading time: 5 minutes

...

#Distributed Computing #Computer Science

📝 Original Info

Title: MapReduce for Integer Factorization
ArXiv ID: 1001.0421
Date: 2010-01-05
Authors: Not provided (no author names are listed in the source).

📝 Abstract

Integer factorization is a very hard computational problem. Currently no efficient algorithm for integer factorization is publicly known. However, it is an important problem on which the security of many real-world cryptographic systems relies. I present an implementation of a fast factorization algorithm on MapReduce. MapReduce is a programming model for high-performance applications originally developed at Google. The quadratic sieve algorithm is split into the different MapReduce phases and compared against a standard implementation.

📄 Full Content

The security of many cryptographic algorithms relies on the fact that factoring large integers is a very computationally intensive task. In particular, RSA [1] would be vulnerable if there were an efficient algorithm to factor semiprimes (products of two primes). This could have severe consequences, as RSA is one of the most widely used algorithms in electronic commerce applications [2].

There are many algorithms for integer factorization [3], ranging from trivial trial division, through the classical Fermat's factorization method [4] and Euler's factoring method [5], to the modern quadratic sieve [6] and number field sieve [7]. In particular, the number field sieve was used in 1996 to factor a 512-bit integer [8], the shortest key length used in commercial RSA implementations. Several other large integers have been factored over the course of the last decade. In those cases the feat required tremendous effort in developing the software and a very considerable investment in hardware [9], [10].

In what follows I will describe how MapReduce, a distributed computational framework, can be used for integer factorization. As an example I will show an implementation of the quadratic sieve algorithm. I will also compare a conventional implementation with the MapReduce implementation in terms of performance and cost.

I claim no participation in the development of the MapReduce framework. This section is essentially a short extract of the original MapReduce paper by Jeff Dean and Sanjay Ghemawat [11]. MapReduce is a programming model inspired by functional programming. Users specify two functions, map and reduce. The map function processes a series of (key, value) pairs and outputs intermediate (key, value) pairs. The system automatically sorts and groups all (key, value) pairs for a particular key and passes them to the reduce function. The reduce function receives a series of values for a single key and produces its output, which is typically a synthesis or aggregation of the intermediate values.

The canonical example of a MapReduce computation is the construction of an inverted index. Let's take a collection of documents $D = \{D_0, D_1, \ldots, D_N\}$, each composed of words: $D_0 = (d_{0,0}, d_{0,1}, \ldots, d_{0,L_0})$, $D_1 = (d_{1,0}, d_{1,1}, \ldots, d_{1,L_1})$ and so on. We define a map function the following way:

$$\mathrm{map}(D_i) = \{\, (d_{i,j},\ (i, j)) : 0 \le j \le L_i \,\}$$

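that is, for a given document it processes each word in the document and outputs an intermediate pair. The key is the word itself, and the value is the location in the corpus, indicated as (document, position). The reduce function is defined as:

$$\mathrm{reduce}\big(\{(w, (i_1, j_1)), \ldots, (w, (i_k, j_k))\}\big) = \big(w,\ \{(i_1, j_1), \ldots, (i_k, j_k)\}\big)$$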

For a collection of pairs with the same key (the same word), it outputs a new pair in which the key is the same and the value is the aggregation of the intermediate values. In this case, the value is the set of locations (document and position within the document) at which the word appears in the corpus.
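The inverted index described above can be sketched in a single process. This is only an illustration of the map and reduce roles: the function names (`map_document`, `reduce_word`, `run_inverted_index`) and the in-memory grouping step are assumptions made for this example, not the API of any real MapReduce framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Location = Tuple[int, int]  # (document index, position within the document)


def map_document(doc_id: int, words: List[str]) -> Iterator[Tuple[str, Location]]:
    """Map: emit (word, (document, position)) for every word of one document."""
    for position, word in enumerate(words):
        yield word, (doc_id, position)


def reduce_word(word: str, locations: Iterable[Location]) -> Tuple[str, List[Location]]:
    """Reduce: aggregate all locations of one word into a sorted list."""
    return word, sorted(locations)


def run_inverted_index(corpus: List[List[str]]) -> Dict[str, List[Location]]:
    """Toy driver that stands in for the framework's grouping (shuffle) phase."""
    grouped: Dict[str, List[Location]] = defaultdict(list)
    for doc_id, words in enumerate(corpus):
        for word, location in map_document(doc_id, words):
            grouped[word].append(location)  # group intermediate pairs by key
    return dict(reduce_word(w, locs) for w, locs in sorted(grouped.items()))


corpus = [["to", "be", "or", "not", "to", "be"], ["to", "factor", "or", "not"]]
index = run_inverted_index(corpus)
```

In a real deployment the grouping loop would be replaced by the framework, which distributes the map calls and routes every intermediate key to one reducer.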

The MapReduce implementation automatically takes care of parallel execution in a distributed system, data transmission, fault tolerance, load balancing and many other aspects of a high-performance parallel computation. The MapReduce model scales seamlessly to thousands of machines. It is used continuously for a multitude of real-world applications, from machine learning to graph computations. Most importantly, the effort required to develop a high-performance parallel application with MapReduce is much lower than with other models, such as MPI [12].

The Quadratic Sieve algorithm was conceived by Carl Pomerance in 1981. A detailed explanation of the algorithm can be found in [13]. Here we will just review the basic steps. Let $N$ be the integer that we are trying to factor. We will attempt to find $a$, $b$ such that:

$$a^2 \equiv b^2 \pmod{N}, \qquad a \not\equiv \pm b \pmod{N}$$

In that case $\gcd(a - b, N)$ is a nontrivial factor of $N$.
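As a small illustration of why such a pair is useful, the sketch below recovers a factor from a congruence of squares, $a^2 \equiv b^2 \pmod N$ with $a \not\equiv \pm b \pmod N$. The helper name `factor_from_squares` and the toy value $N = 8051$ are assumptions chosen for this example and do not come from the paper.

```python
from math import gcd, isqrt
from typing import Tuple


def factor_from_squares(n: int, a: int, b: int) -> Tuple[int, int]:
    """Given a^2 = b^2 (mod n) with a != +-b (mod n), return two nontrivial factors."""
    if (a * a - b * b) % n != 0:
        raise ValueError("a^2 and b^2 are not congruent modulo n")
    p = gcd(a - b, n)
    if p in (1, n):
        raise ValueError("trivial congruence, no factor found")
    return p, n // p


n = 8051                 # toy modulus (assumption for illustration)
a = isqrt(n) + 1         # 90, the first integer above sqrt(n)
b = isqrt(a * a - n)     # 90^2 - 8051 = 49, so b = 7
factors = factor_from_squares(n, a, b)
```

Here the congruence is found immediately because $90^2 - 8051$ happens to be a perfect square; the quadratic sieve exists precisely because this rarely happens for large $N$.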

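Let's define:

$$Q(x) = x^2 - N$$

If we find $x_1, x_2, \ldots, x_L$ such that the product $Q(x_1) Q(x_2) \cdots Q(x_L)$ is a perfect square, then:

$$(x_1 x_2 \cdots x_L)^2 \equiv Q(x_1) Q(x_2) \cdots Q(x_L) \pmod{N}$$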
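A worked instance may help here. Taking $Q(x) = x^2 - N$ as in the sieving step, the sketch below uses the toy modulus $N = 15347$ and the values $x = 124, 127, 195$; both are assumptions picked for illustration because the product of their $Q(x)$ values is a perfect square.

```python
from math import gcd, isqrt, prod

N = 15347                 # toy modulus (assumption for illustration)
xs = [124, 127, 195]      # chosen so that the product of Q(x) is a perfect square


def Q(x: int) -> int:
    """Q(x) = x^2 - N."""
    return x * x - N


product = prod(Q(x) for x in xs)   # 29 * 782 * 22678
b = isqrt(product)
is_square = b * b == product       # True for this choice of xs

a = prod(xs)                       # a^2 is congruent to the product modulo N
factor = gcd(a - b, N)
```

With these values $a = 3070860$ and $b = 22678$, and the greatest common divisor yields the factor 103 of $15347 = 103 \cdot 149$.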

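A number is $B$-smooth if all of its prime factors are at most $B$. Let $p_1, \ldots, p_k$ be the primes not exceeding $B$, and associate to each $B$-smooth number $x_i$ a vector $v_i$ of exponents. That is, each component $j$ of $v_i$ is the exponent of $p_j$ in the factorization of $x_i$, modulo 2. For example, for $B = 4$: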
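The exponent vectors can be computed by trial division over the primes up to $B$. The sketch below is a minimal illustration; the function names and the sample numbers are assumptions made for this example, not values taken from the paper.

```python
from typing import List, Optional


def primes_up_to(bound: int) -> List[int]:
    """Return all primes <= bound by trial division (adequate for tiny bounds)."""
    return [p for p in range(2, bound + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]


def exponent_vector_mod2(x: int, primes: List[int]) -> Optional[List[int]]:
    """Exponents of each prime in x reduced modulo 2, or None if x is not smooth."""
    vector = []
    for p in primes:
        exponent = 0
        while x % p == 0:
            x //= p
            exponent += 1
        vector.append(exponent % 2)
    return vector if x == 1 else None


primes = primes_up_to(4)                      # the primes not exceeding B = 4
v12 = exponent_vector_mod2(12, primes)        # 12 = 2^2 * 3
v18 = exponent_vector_mod2(18, primes)        # 18 = 2 * 3^2
v10 = exponent_vector_mod2(10, primes)        # 10 has the prime factor 5 > 4
```

For $B = 4$ the primes are 2 and 3, so 12 maps to $(0, 1)$, 18 maps to $(1, 0)$, and 10 is rejected because it is not 4-smooth.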

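In conclusion, in order to find a subset of $x_1, \ldots, x_L$ whose product is a perfect square, we just need to solve the linear system:

$$e_1 v_1 + e_2 v_2 + \cdots + e_L v_L \equiv 0 \pmod{2}, \qquad e_i \in \{0, 1\}$$

where $e_i = 1$ exactly when $x_i$ is included in the subset.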
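One way to solve such a system is Gaussian elimination over GF(2), tracking which original vectors were combined. The sketch below is an assumption about how this step could be done on a single machine, with rows packed into Python integers; it is not the linear algebra routine used in the paper.

```python
from typing import Dict, List, Optional, Tuple


def find_dependency(vectors: List[List[int]]) -> Optional[List[int]]:
    """Return indices of a non-empty subset of vectors summing to 0 mod 2, or None."""
    basis: Dict[int, Tuple[int, int]] = {}  # pivot bit -> (reduced row, combination mask)
    for i, vec in enumerate(vectors):
        row = sum((bit & 1) << j for j, bit in enumerate(vec))
        combo = 1 << i                      # records which input vectors were XORed
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (row, combo)  # new independent row, keep it
                break
            basis_row, basis_combo = basis[pivot]
            row ^= basis_row
            combo ^= basis_combo
        else:
            # The row reduced to zero: combo marks a subset that cancels modulo 2.
            return [k for k in range(len(vectors)) if (combo >> k) & 1]
    return None


dependency = find_dependency([[0, 1], [1, 0], [1, 1]])
independent = find_dependency([[1, 0], [0, 1]])
```

With more vectors than primes in the factor base, a dependency is guaranteed to exist, which is why the sieve collects slightly more smooth values than the size of the base.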

3.2. Sieving for smooth numbers

Back to the original problem, we just need to find a convenient set $\{x_1, x_2, \ldots, x_L\}$ such that $\{Q(x_1), Q(x_2), \ldots, Q(x_L)\}$ are $B$-smooth numbers for a particular $B$. First of all, let's note that we don't need to consider every prime number $\le B$. If a prime $p$ satisfies $p \mid Q(x)$ for some $x$, then:

$$x^2 \equiv N \pmod{p} \;\Rightarrow\; \left(\frac{N}{p}\right) = 1$$

because $N$ is a quadratic residue modulo $p$ if and only if the Legendre symbol of $N$ over $p$ is 1. We will take the set of primes that satisfy this property and call it the factor base.
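The factor base can be selected with Euler's criterion, which evaluates the Legendre symbol as $N^{(p-1)/2} \bmod p$ for an odd prime $p$. In the sketch below the function names, the special handling of $p = 2$ (kept unconditionally, assuming $N$ is odd), and the toy values $N = 15347$, $B = 29$ are assumptions made for illustration.

```python
from math import isqrt
from typing import List


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p, computed with Euler's criterion."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def factor_base(n: int, bound: int) -> List[int]:
    """Primes p <= bound for which n is a quadratic residue modulo p."""
    primes = [p for p in range(2, bound + 1)
              if all(p % q for q in range(2, isqrt(p) + 1))]
    # The Legendre symbol is only defined for odd primes; 2 is kept as a special case.
    return [p for p in primes if p == 2 or legendre(n, p) == 1]


base = factor_base(15347, 29)
```

Primes for which the symbol is $-1$ can never divide any $Q(x)$, so dropping them shrinks the exponent vectors without losing smooth values.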

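In order to consider smaller values of $Q(x)$ we will take values of $x$ close to $\sqrt{N}$, in an interval of half-width $M$:

$$x \in \left[\lfloor \sqrt{N} \rfloor - M,\ \lfloor \sqrt{N} \rfloor + M\right]$$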

Both B above and M here are chosen as indicated in [13].

In order to factor all the $Q(x_i)$ we will use a method called sieving, which is what gives the quadratic sieve its name. Notice that $p \mid Q(x) \Rightarrow p \mid Q(x + kp)$ for every integer $k$.
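The periodicity that sieving relies on follows directly from expanding the square, taking $Q(x) = x^2 - N$ and any integer $k$:

$$Q(x + kp) = (x + kp)^2 - N = (x^2 - N) + 2kpx + k^2 p^2 = Q(x) + p\,(2kx + k^2 p) \equiv Q(x) \pmod{p}$$

So once one $x$ with $p \mid Q(x)$ is known, every $p$-th position after it in the interval is also divisible by $p$, and no trial division is needed to locate them.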

We can solve the equation $Q(x) \equiv 0 \pmod{p} \Leftrightarrow x^2 - N \equiv 0 \pmod{p}$ efficiently and obtain two solutions $s_1$, …
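The sieving step can be sketched as follows. This is a simplified single-machine illustration under several assumptions that are not from the paper: the roots modulo $p$ are found by brute force rather than an efficient algorithm, only values $x \ge \lceil \sqrt{N} \rceil$ are considered so that $Q(x)$ stays positive, and the toy inputs $N = 15347$ with factor base $\{2, 17, 23, 29\}$ are chosen for illustration.

```python
from math import isqrt
from typing import Dict, List


def sieve_smooth(n: int, base: List[int], width: int) -> Dict[int, int]:
    """Return {x: Q(x)} for the x in the interval whose Q(x) factors over base."""
    start = isqrt(n - 1) + 1                      # ceil(sqrt(n)) for n > 1
    if start * start == n:
        raise ValueError("n is a perfect square")
    q_values = [x * x - n for x in range(start, start + width)]
    remaining = list(q_values)
    for p in base:
        # Toy root finding: brute-force the solutions of s^2 = n (mod p).
        roots = [s for s in range(p) if (s * s - n) % p == 0]
        for s in roots:
            first = (s - start) % p               # first offset with x = s (mod p)
            for offset in range(first, width, p):
                while remaining[offset] % p == 0:  # strip every power of p
                    remaining[offset] //= p
    return {start + i: q
            for i, (q, r) in enumerate(zip(q_values, remaining)) if r == 1}


smooth = sieve_smooth(15347, [2, 17, 23, 29], 100)
```

Entries that have been reduced to 1 are exactly the smooth values, and their exponent vectors feed the linear system over GF(2). Each prime only touches the positions congruent to one of its roots, which is what makes the procedure a sieve.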

This content is AI-processed based on open access ArXiv data.
