Parallel versions of the symbolic manipulation system FORM

📝 Original Info

  • Title: Parallel versions of the symbolic manipulation system FORM
  • ArXiv ID: 1006.2099
  • Date: 2010-06-10
  • Authors: M. Tentyukov, J. A. M. Vermaseren, J. Vollinga

📝 Abstract

The symbolic manipulation program FORM is specialized to handle very large algebraic expressions. Some specific features of its internal structure make FORM very well suited for parallelization. We now have two parallel versions of FORM: one is based on POSIX threads and is optimal for modern multicore computers, while the other uses MPI and can be used to parallelize FORM on clusters and Massively Parallel Processing systems. Most existing FORM programs will be able to take advantage of the parallel execution without the need for modifications.

📄 Full Content

The symbolic manipulation system FORM [1], which has been available for more than 20 years, is specialized to handle very large algebraic expressions of billions of terms efficiently and reliably. It is widely used, in particular in perturbative Quantum Field Theory, where sometimes hundreds of thousands of Feynman diagrams have to be computed; most of the spectacular calculations of refs. [2,3] would hardly have been possible with other available systems. However, the capabilities of FORM are also useful in other fields of science where huge expressions must be manipulated.

Parallelization is one of the most efficient ways to increase performance. Some features of its internal structure [4] make FORM very well suited for parallelization, so the idea of parallelizing FORM is quite natural.

The general concept of FORM parallelization is as follows [4,5,6]: on startup, the program launches a master and several workers. FORM treats each expression individually, which allows the master to split incoming expressions into independent chunks. The chunks are processed by the workers in parallel, and the master then collects the results.
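The master/worker scheme described above can be illustrated with a minimal Python sketch. This is not FORM code and does not reflect FORM's internals: the term representation (a `(coefficient, exponent)` pair), the toy differentiation rule, and the names `split_into_chunks`, `process_chunk` and `run_master` are all assumptions made for illustration. Python threads are used only to keep the example self-contained; they do not give real CPU parallelism for this kind of work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(terms, n_chunks):
    """Master side: deal the terms out round-robin into n_chunks lists."""
    return [terms[i::n_chunks] for i in range(n_chunks)]


def process_chunk(chunk):
    """Worker side: apply the toy rule c*x^n -> (c*n)*x^(n-1) to each term
    and combine like terms within the chunk."""
    result = Counter()
    for coeff, exp in chunk:
        if exp > 0:  # constants differentiate to zero and are dropped
            result[exp - 1] += coeff * exp
    return result


def run_master(terms, n_workers=4):
    """Split the expression, let workers process chunks, merge the results."""
    chunks = split_into_chunks(terms, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    merged = Counter()
    for partial in partials:
        for exp, coeff in partial.items():
            merged[exp] += coeff
    # Terms whose coefficients cancelled across chunks vanish here.
    return {exp: coeff for exp, coeff in merged.items() if coeff != 0}


# Hypothetical input: 3x^2 + x^2 + 5x - 5x + 2 + 4x^3
expression = [(3, 2), (1, 2), (5, 1), (-5, 1), (2, 0), (4, 3)]
derivative = run_master(expression, n_workers=3)
print(derivative)
```

Because each term is transformed independently, the result does not depend on how the master distributes the terms; only the final merge has to see all partial results.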

At present, we have two different models [5,6]: in ParFORM [4] the master and workers are independent processes communicating via MPI, and in TFORM [6] the master and workers are separate threads of a multithreaded process.

Both models require almost no special effort for parallel programming: all FORM programs may be executed in parallel without any changes. The user may give FORM hints on how to parallelize certain things better; these hints are simply ignored by the sequential version of FORM.

Since TFORM uses a common address space, it can run only on SMP computers. On the other hand, it sometimes permits more efficient parallelization, and it does not depend on MPI, which makes it much easier to deploy. ParFORM can be used not only on SMP computers but also on clusters and Massively Parallel Processors (MPP).

Both ParFORM and TFORM demonstrate approximately the same speedup [5,6]. Here we discuss TFORM running the Multiple Zeta Value program [7] on the computer “qftquad5” at DESY. The computer has 96 GB of main memory and 8 independent CPU cores; the effective number of CPU cores is 16 due to hyperthreading. The results are given in Fig. 1.

For reference, the run with FORM (the sequential version) took 57078 sec.

We see three regions: first, the speedup is almost linear up to 8 workers; second, the speedup is also almost linear in the range of 8-16 workers but with a much smaller slope; and third, beyond 16 workers we observe saturation. When we look at the total amount of CPU time used (Fig. 2), we see that it is more or less constant up to 8 workers and above 16 workers. In the range of 8-16 workers, however, it increases steadily. This is responsible for the slower decline in real time in the first graph, because the pseudo efficiency (total CPU time divided by real time and divided by the number of workers) remains more or less the same in this range. This behaviour is typical for hyperthreading. The total amount of work that can be obtained from this computer is about 9.5 times the amount that can be obtained from a single core.

The analysis of the data also reveals that TFORM needs about 20% overhead for the Multiple Zeta Value program. This is more than for programs like Mincer. It may be due to the use of brackets from the master expression, which may involve copious use of locks, although this is still not completely clear. The result is that for 8 workers the pseudo speedup (total CPU time divided by real time) is 7.63 while the real speedup (compared to the FORM run) is 6.22. Of course, this is still very good. The maximum improvement we obtained was 7.45 for a run with 17 workers.
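The relation between the quoted figures can be checked with a short calculation. The sketch below uses only the three numbers given in the text (the sequential run time and the two speedups for 8 workers); the real time and total CPU time it computes are derived from those numbers, not measured values, and the variable names are chosen here for illustration.

```python
SEQUENTIAL_SECONDS = 57078.0  # sequential FORM run, as quoted in the text
WORKERS = 8
REAL_SPEEDUP = 6.22           # sequential time / TFORM real time
PSEUDO_SPEEDUP = 7.63         # TFORM total CPU time / TFORM real time

# Derived quantities (not measurements).
real_seconds = SEQUENTIAL_SECONDS / REAL_SPEEDUP
total_cpu_seconds = PSEUDO_SPEEDUP * real_seconds

# Pseudo efficiency: total CPU time / real time / number of workers.
pseudo_efficiency = PSEUDO_SPEEDUP / WORKERS

# Overhead: extra CPU time spent by TFORM relative to the sequential run.
overhead = total_cpu_seconds / SEQUENTIAL_SECONDS - 1.0

print(f"real time      ~ {real_seconds:.0f} s")
print(f"total CPU time ~ {total_cpu_seconds:.0f} s")
print(f"pseudo efficiency = {pseudo_efficiency:.3f}")
print(f"overhead          = {overhead:.1%}")
```

The overhead is simply the ratio of the pseudo speedup to the real speedup minus one, which comes out at roughly 23% and is consistent with the "about 20%" stated above.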

Over the past few years, the parallel FORM versions have picked up a number of new features:

• Dollar variables. By default, both ParFORM and TFORM switch to sequential mode for each module that assigns a value to a dollar variable during execution. But there are common cases in which dollar variables obtained from each term in each chunk can be processed in parallel in order to get a minimum value, a maximum, or a sum of results. Also, sometimes the value of the dollar variable no longer matters once the processing of a term has finished. Hence new module options have been implemented to help FORM process these variables in parallel: minimum, maximum, sum and local.

• Right-hand side (RHS) expressions. These pose no problem for TFORM, since all threads work with the same file system, but they are a big problem for ParFORM: the expression may reside in a scratch file, and different nodes may have independent scratch file systems. For a long time ParFORM forced modules with RHS expressions to be evaluated in sequential mode. Now ParFORM is able to process RHS expressions in genuinely parallel mode.

• InParallel statement. A new statement was implemented, inp

…(Full text truncated)…
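The dollar-variable module options listed above (minimum, maximum, sum and local) correspond to a familiar reduction pattern: each worker keeps its own copy of the variable, and the master combines the per-worker values at the end of the module. The following Python sketch illustrates that pattern only; it is not FORM syntax and makes no claim about how FORM implements it. The names `reduce_chunk` and `parallel_dollar`, the `measure` callback, and the sample data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# How per-worker values are combined for each mode.
REDUCERS = {"minimum": min, "maximum": max, "sum": sum}


def reduce_chunk(chunk, measure, mode):
    """Worker side: evaluate the per-term value and reduce within the chunk."""
    values = [measure(term) for term in chunk]
    if mode == "local":
        return None  # the value is only needed while a term is processed
    return REDUCERS[mode](values)


def parallel_dollar(terms, measure, mode, n_workers=4):
    """Master side: combine the per-worker results (terms must be non-empty)."""
    chunks = [terms[i::n_workers] for i in range(n_workers)]
    chunks = [chunk for chunk in chunks if chunk]  # skip idle workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(
            pool.map(lambda chunk: reduce_chunk(chunk, measure, mode), chunks)
        )
    if mode == "local":
        return None
    return REDUCERS[mode](partials)


# Hypothetical per-term data, e.g. the power of some symbol in each term.
powers = [3, 7, 1, 4, 9, 2]
lowest = parallel_dollar(powers, lambda p: p, "minimum")
highest = parallel_dollar(powers, lambda p: p, "maximum")
total = parallel_dollar(powers, lambda p: p, "sum")
print(lowest, highest, total)
```

Minimum, maximum and sum are associative and commutative, so the combined value is the same whatever the chunking; that property is what makes it safe to keep such modules in parallel mode.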
