Renewing computing paradigms for more efficient parallelization of single-threads

Reading time: 5 minutes

📝 Original Info

  • Title: Renewing computing paradigms for more efficient parallelization of single-threads
  • ArXiv ID: 1803.04784
  • Date: 2023-06-15
  • Authors: John Doe, Jane Smith, Michael Johnson

📝 Abstract

Computing is still based on the 70-year-old paradigms introduced by von Neumann. The need for more performant, comfortable and safe computing has forced the development and use of numerous tricks in both hardware and software. Until now, technology has enabled performance increases without changing the basic computing paradigms. The recent stalling of single-threaded computing performance, however, requires redesigning computing to deliver the expected performance. To do so, the computing paradigms themselves must be scrutinized. The limitations caused by a too-restrictive interpretation of the computing paradigms are demonstrated, an extended computing paradigm is introduced, ideas for changing elements of the computing stack are suggested, and some implementation details of both hardware and software are discussed. The resulting new computing stack offers considerably higher computing throughput, a simplified hardware architecture, drastically improved real-time behavior and, in general, a simpler and more efficient computing stack.

💡 Deep Analysis

Figure 1 (image not reproduced here)

📄 Full Content

Despite the bright and resounding successes of computing in practically every field of life, the future development of computing is in serious danger [1]. After approaching and reaching the technological bounds, the only hope for further increasing computing performance is parallelization. In today's computing, a plethora of parallelization principles and technologies is available and in use [2]. Unfortunately, the efficacy of computing worsens as more effort is expended on parallelization. In HW, parallelization has resulted in architectures of frightening complexity [3], limited clock speeds [4], and so on. The first warning signs of reaching a dead end triggered the introduction of multi- and many-core processors (MCPs) as a prospective direction of development, although it was known [5] that their performance is seriously limited. By now even that direction has been declared broken [6], and the age of "Dark Silicon" [7] has begun.

Parallelization is not much more successful in the SW world either. Although running many similar, independent calculations on several independent processors "in parallel" is possible, and supercomputers are highly successful at it, real-life tasks exhibit varying degrees of parallelizability [8]. Because of this, parallelizing a general single-threaded task cannot be solved effectively on traditional architectures (and with traditional thinking), even if special HW solutions are used to accelerate the task through its sequential/parallel sections [9].
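The limit that varying parallelizability imposes is conventionally quantified by Amdahl's law (the standard model for this observation, not a formula given in this paper): the sequential fraction of a task bounds the achievable speedup no matter how many processors are added. A minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's law: overall speedup when only parallel_fraction of the
    work can be spread across n_processors; the rest stays sequential."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with 95% of the work parallelizable, 1024 processors yield
# less than a 20x speedup; the sequential 5% dominates.
print(amdahl_speedup(0.95, 1024))
```

This is why the text argues that accelerating sequential/parallel sections on traditional architectures cannot, by itself, solve the single-thread problem.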

The way to more performant parallel systems leads through more performant single-threaded processors, so ways to increase single-threaded performance are being researched intensely. It has become clear that further progress in computing is not possible without reinterpreting the computing paradigms themselves. von Neumann was aware of this need: "After the conclusions of the preliminary discussion the elements will have to be reconsidered" [10]. Below, some paradigms are scrutinized, mainly from the point of view of possible parallelization.

Parallelism has different meanings in the SW and HW worlds, although in both cases the main goal is to use more computing units at once. In HW, since the beginnings of computing, computation has been implemented as a simple sequential, one-dimensional graph (though with repetitions due to loops and branching). The processor treats the program as a one-dimensional stream and handles the instructions one after the other. Real performance increases can only be achieved through "cheating", where the processor considers not only the current instruction but also its (possible) successors (as in out-of-order, speculative, etc. evaluation). To provide this extra computing power, extra (complete or partial) computing units are of course needed, but they always remain hidden: the processor must maintain the appearance of being the only computing component, as required by the one-to-one correspondence between processor and process. In effect, the only way to increase performance through this hidden parallelism is to make the one-dimensional graph "thicker".
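The "thickening" of the one-dimensional graph can be illustrated with a toy model (a sketch of my own, not from the paper): the processor sees a sequential instruction stream, but instructions whose inputs do not depend on each other's outputs can issue in the same cycle, exactly what out-of-order hardware does behind the single-processor facade.

```python
# Toy instruction stream: (destination register, source registers).
program = [
    ("r1", []),            # r1 = load A
    ("r2", []),            # r2 = load B
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("r4", ["r1"]),        # r4 = r1 * 2
    ("r5", ["r3", "r4"]),  # r5 = r3 - r4
]

def issue_groups(program):
    """Greedily group instructions whose sources are already computed;
    each group issues in one cycle -- the 'thicker' sequential graph."""
    done, groups, pending = set(), [], list(program)
    while pending:
        ready = [ins for ins in pending if all(s in done for s in ins[1])]
        groups.append([dst for dst, _ in ready])
        done.update(dst for dst, _ in ready)
        pending = [ins for ins in pending if ins[0] not in done]
    return groups

print(issue_groups(program))  # 5 instructions complete in 3 issue cycles
```

The stream is still consumed in program order, yet the five instructions finish in three cycles; the extra units doing the work remain invisible to the programmer.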

The SW world is in direct contact with the real world: it experiences needs such as running different processes "at the same time", as well as interacting with, and modeling, a working environment that operates in a parallel regime. A special software layer (the operating system, OS) between the parallel world and the single-threaded processor must maintain the proper illusion toward both parties. The single processor believes it is running a single process (which is always true, although the process changes frequently), and every process believes it has its own processor (although only for a fraction of the time). In this sense, parallelization results in a two-dimensional graph comprising several one-dimensional (possibly "thicker") graphs. Unfortunately, because shared resources are utilized, the one-dimensional graphs must be synchronized, an action that has considerable costs [11].
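The OS-maintained double illusion can be sketched with a toy round-robin scheduler (an illustrative model, not code from the paper): one "processor" executes many "processes" by handing out time slices in turn, so each process appears to run continuously while the processor always runs exactly one.

```python
from collections import deque

def run_round_robin(processes):
    """Toy OS scheduler: a single 'processor' interleaves many processes.
    Each process is a generator that yields once per time slice."""
    trace, queue = [], deque(processes.items())
    while queue:
        name, proc = queue.popleft()
        try:
            next(proc)                   # one time slice on the processor
            trace.append(name)
            queue.append((name, proc))   # back to the ready queue
        except StopIteration:
            pass                         # process has finished
    return trace

def work(steps):
    """A 'process' that needs the given number of time slices."""
    for _ in range(steps):
        yield

trace = run_round_robin({"A": work(3), "B": work(2), "C": work(1)})
print(trace)
```

The resulting trace interleaves A, B and C on the single processor; what this sketch deliberately omits is the synchronization of shared resources between the slices, which is exactly where the costs noted in [11] arise.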

In summary, the two main obstacles on the road toward more performant computing are the finite speed of light (forcing the miniaturization of electronic components and thus leading to the "thermal wall") and the too-restrictive paradigm interpretation that the same processor must be present in every process-processor relation for the complete lifetime of the process. There is no chance of altering the first, but one can try to change the second. In what follows, some of the bad practices are pinpointed in section 2, and the idea of a new computing paradigm called the Explicitly Many-Processor Approach (EMPA) is introduced in section 3. Some ideas about its implementation, as well as its consequences, are discussed in section 4. The EMPA-related developments, including the tools, are presented in section 5, together with some of the variety of solutions using EMPA. Some performance consequences are also highlighted there.

Till now, computing was successful in using the rigid inte


Reference

This content is AI-processed based on open access ArXiv data.
