An innovative platform to improve the performance of exact string matching algorithms

📝 Abstract

Exact String Matching is an essential problem in many computer science applications. Unfortunately, the performance of Exact String Matching algorithms, specifically their execution time, does not meet the needs of these applications. This paper proposes a general platform, called PXSMAlg, for improving the execution time of existing Exact String Matching algorithms. The platform parallelizes Exact String Matching algorithms using the MPI model over the Master/Slave paradigm. Parallelization is done by dividing the Text into several parts and working on these parts simultaneously, which reduces the execution time of the algorithms. We simulated the PXSMAlg platform by applying the Quick Search algorithm to it in order to demonstrate its effectiveness. The simulation results showed a significant improvement in Quick Search execution time, and therefore the effectiveness of the PXSMAlg platform.

📄 Content

Computer science applications play a significant role in many fields, such as DNA analysis, artificial intelligence, and information retrieval. String matching is an important problem in many of these applications. It is the process of finding occurrences of a Pattern P in a Text T, where T is longer than P. An occurrence matches the Pattern either exactly or partially. Accordingly, string matching algorithms are divided into two main categories: Exact-String-Matching algorithms and approximate string matching algorithms. Exact-String-Matching algorithms are concerned with the number of occurrences of the pattern in a given text, while approximate string matching algorithms are concerned with the similarity percentage between the pattern and the text or any part of the text [1] [2]. This paper concentrates on Exact-String-Matching algorithms, such as the Boyer-Moore, Horspool, and Quick Search algorithms [3].

Currently, the world is witnessing a revolution in hardware efficiency, to the point where an ordinary laptop can have a multi-core processor. To take advantage of this, many applications now use parallel computing, in which a problem is divided into smaller problems that are then processed simultaneously. Many parallel paradigms and models have been developed and proposed. The Master/Slave paradigm is widely used in parallel computing. It is a multi-processor paradigm containing several nodes: one node is the master and the other nodes are the slaves. The master node is responsible for maintaining global data structures and partitioning the overall computational problem into smaller sub-problems, which are handed to the slaves for processing. The Message Passing Interface (MPI), in turn, is one of the well-known parallel models and operates above the hardware and memory architectures. In this paper, we use the MPI model along with the Master/Slave paradigm to develop a general parallel platform and improve the performance of Exact-String-Matching algorithms [4] [5] [6].
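The master/worker division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PXSMAlg implementation: a thread pool stands in for MPI processes (so it shows the partitioning logic, not a real speedup), a naive matcher stands in for any exact matcher such as Quick Search, and the function names and the overlap of `pattern_len - 1` characters between neighboring parts are hypothetical choices made here so that matches straddling a boundary are not lost.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_with_overlap(text: str, pattern_len: int, parts: int) -> List[Tuple[int, str]]:
    """Master step: cut the text into about `parts` chunks.

    Each chunk is extended by pattern_len - 1 characters (an assumption of
    this sketch) so that a match crossing a chunk boundary is still seen,
    and seen by exactly one worker.
    """
    n = len(text)
    size = max(1, -(-n // parts))  # ceiling division
    chunks = []
    for start in range(0, n, size):
        end = min(n, start + size + pattern_len - 1)
        chunks.append((start, text[start:end]))
    return chunks


def find_in_chunk(job: Tuple[int, str, str]) -> List[int]:
    """Worker step: naive exact matching inside one chunk.

    Returns match positions relative to the whole text.
    """
    offset, chunk, pattern = job
    m = len(pattern)
    return [offset + i
            for i in range(len(chunk) - m + 1)
            if chunk.startswith(pattern, i)]


def parallel_search(text: str, pattern: str, parts: int = 4) -> List[int]:
    """Master: partition the text, hand chunks to workers, merge the results."""
    if not pattern:
        return []
    jobs = [(offset, chunk, pattern)
            for offset, chunk in split_with_overlap(text, len(pattern), parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = pool.map(find_in_chunk, jobs)
    return sorted(pos for part in results for pos in part)


if __name__ == "__main__":
    print(parallel_search("abababa", "aba", parts=3))
```

Because each chunk reports only matches that start inside its own non-overlapping range, the merged list contains every occurrence once, regardless of how many parts are used.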

Sunday [7] proposed and designed the Quick Search algorithm for string matching, which is faster than the Boyer-Moore algorithm and is considered one of the fastest algorithms in the string matching field. Its time and space complexity are O(m + n) and O(n), respectively. In how it detects matches between two strings, the Quick Search algorithm resembles the Boyer-Moore algorithm. The difference is that Quick Search uses only the bad-character shift table, while Boyer-Moore uses both the bad-character shift table and the good-suffix shift table. Moreover, Quick Search compares characters from the left-most character to the right [7].
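A minimal Python sketch of Quick Search follows, to make the bad-character shift concrete. It follows the commonly described form of the algorithm, in which the shift is decided by the text character immediately to the right of the current window; the function names are chosen here for illustration and do not come from the paper.

```python
from typing import Dict, List


def build_shift_table(pattern: str) -> Dict[str, int]:
    """Bad-character shift table.

    For each character of the pattern, store the distance from its last
    occurrence to the position just past the end of the pattern.
    Later occurrences overwrite earlier ones, which keeps the smallest shift.
    """
    m = len(pattern)
    return {ch: m - i for i, ch in enumerate(pattern)}


def quick_search(text: str, pattern: str) -> List[int]:
    """Return the start positions of all exact occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the window with the pattern, left to right.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        if pos + m >= n:
            break  # no character to the right of the window
        # Shift by the table entry of the character just after the window;
        # a character absent from the pattern allows a full jump of m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return matches


if __name__ == "__main__":
    print(quick_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))  # prints [5]
```

Only one table is built during preprocessing, which is the simplification relative to Boyer-Moore that the paragraph above describes.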

The rest of this paper is arranged as follows. Section 2 discusses some of the related works. Section 3 discusses the proposed platform, highlights the border problem, and shows the proposed platform performance. Finally, the conclusion is stated in Section 4.

There have been several research works on parallel Exact-String-Matching algorithms. For example, Raju and Babu [8] proposed a parallel technique for string matching. They considered the linear array with a reconfigurable pipelined bus system (LARPBS) and the 2D LARPBS for string matching, a problem with many applications such as cellular automata, computational biology, and string databases. Their method increases the speedup of the string matching process using LARPBS. They obtained a time complexity of O(1) for string matching on the 2D LARPBS, with no preprocessing of the text or the pattern [8].

Park and George [9] presented dataflow schemes for parallelizing string matching algorithms. Their work covered the exact matching and k-mismatch problems, which they consider subproblems of the string matching field. The time complexity of the proposed parallel algorithm was O((n/d) + α), 0 ≤ α ≤ m, where n and m are the lengths of the text and the pattern (with n ≫ m) and d is the number of streams used. The degree of parallelism can be controlled by changing d, the number of input streams. Because the dataflow algorithms are one-pass, this scheme required no preprocessing and no additional memory space [9].
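A rough reading of this bound shows why d acts as the degree of parallelism. The comparison below assumes a sequential baseline that scans the text in O(n) time, which is an assumption made here for illustration and is not stated in the source; constant factors are ignored.

```latex
\[
T_{\text{par}}(n, m, d) = O\!\left(\frac{n}{d} + \alpha\right), \quad 0 \le \alpha \le m,
\qquad
\frac{n}{\,n/d + \alpha\,} = \frac{d}{1 + \alpha d / n} \approx d
\quad \text{when } n \gg m d .
\]
```

In other words, as long as the text is much longer than the pattern times the number of streams, the additive term α is negligible and the running time shrinks roughly in proportion to d.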

Exact-String-Matching is one of the main problems in many computer applications. One difficulty with Exact-String-Matching is the slow matching process between the Pattern and the Text. Parallel computing is a key technique for reducing the time of the Exact-String-Matching process. In this paper, we have exploited one of the parallel computing models, namely the MPI model, to provide a general platform for parallelizing Exact-String-Matching algorithms. The proposed platform, called Parallel-Exact-Strings-Matching algorithm (PXSMAlg), can be applied to all Exact-String-Matching algorithms, such as Quick Search. The P

This content is AI-processed based on ArXiv data.
