Accelerating Scientific Computations with Mixed Precision Algorithms
Marc Baboulin1,2, Alfredo Buttari2, Jack Dongarra2,3,4, Jakub Kurzak2, Julie Langou2, Julien Langou5, Piotr Luszczek2, and Stanimire Tomov2
1Department of Mathematics, University of Coimbra, Coimbra,
Portugal
2Department of Electrical Engineering and Computer Science,
University of Tennessee, Knoxville, Tennessee
3Oak Ridge National Laboratory, Oak Ridge, Tennessee
4University of Manchester, Manchester, UK
5Department of Mathematical and Statistical Sciences, University
of Colorado Denver, Denver, Colorado
May 29, 2018
Abstract
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here applies not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
arXiv:0808.2794v1 [cs.MS] 20 Aug 2008
1 Introduction
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. There are two reasons for this. First, 32-bit floating point arithmetic is usually twice as fast as 64-bit floating point arithmetic on most modern processors. Second, the number of bytes moved through the memory system is halved. In Table 1, we provide some hardware numbers that support these claims. On the AMD Opteron 246, IBM PowerPC 970, and Intel Xeon 5100, the single precision peak is twice the double precision peak. On the STI Cell BE, the single precision peak is fourteen times the double precision peak. Single precision is faster than double precision not only on conventional processors but also on less mainstream technologies such as Field Programmable Gate Arrays (FPGA) and Graphical Processing Units (GPU). These speedups are compelling, and we would like to be able to benefit from them.
For several physics applications, results with 32-bit accuracy are not an option, and one really needs 64-bit accuracy maintained throughout the computations. The obvious reason is for the application to give an accurate answer. Also, 64-bit accuracy enables most modern computational methods to be more stable; therefore, in critical conditions, one must use 64-bit accuracy to obtain an answer. In this manuscript, we present a methodology for performing the bulk of the operations in 32-bit arithmetic and then postprocessing the 32-bit solution by refining it into a solution that is 64-bit accurate. We present this methodology in the context of solving a system of linear equations, be it sparse or dense, symmetric positive definite or nonsymmetric, using either direct or iterative methods. We believe that the approach outlined below is quite general and should be considered by application developers for their practical problems.
2 The Idea Behind Mixed Precision Algorithms
Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. The refinement can be accomplished, for instance, by means of Newton's method [47], which computes the zero of a function f(x) according to the iterative formula

x_{n+1} = x_n - f(x_n) / f'(x_n).    (1)
In general, we would compute a starting point and f'(x) in single precision arithmetic, and the refinement process would be carried out in double precision arithmetic.
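As an illustration of formula (1), the following sketch (function names and iteration counts are our own, not from the paper) runs Newton's method first entirely in 32-bit arithmetic to obtain a cheap starting point, then applies a few 64-bit refinement steps:

```python
import numpy as np

def mixed_precision_newton(f, fprime, x0, single_iters=10, double_iters=3):
    # Phase 1: iterate x_{n+1} = x_n - f(x_n)/f'(x_n) entirely in float32.
    # This produces a starting point accurate to roughly single precision.
    x = np.float32(x0)
    for _ in range(single_iters):
        x = np.float32(x - f(x) / fprime(x))
    # Phase 2: a few refinement steps in float64 push the accuracy of the
    # single precision result down to double precision roundoff.
    x = np.float64(x)
    for _ in range(double_iters):
        x = x - f(x) / fprime(x)
    return x

# Example: the zero of f(x) = x^2 - 2 is sqrt(2).
root = mixed_precision_newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

Because Newton's method converges quadratically near a simple zero, a handful of double precision steps suffice once the single precision phase has done the bulk of the work.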
If the refinement process is cheaper than the initial computation of the solution, then double precision accuracy can be achieved at nearly the same speed as single precision accuracy. Sections 2.1 and 2.2 describe how this concept can
be applied to solvers of linear systems based on direct and iterative methods,
respectively.
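Applied to a linear system Ax = b, this idea becomes iterative refinement: solve in single precision, then alternate double precision residuals with single precision corrections. A minimal NumPy sketch (names are illustrative; here `np.linalg.solve` stands in for a single precision factorization, whereas a production code would factor the matrix once and reuse the factors):

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=30):
    # The expensive O(n^3) work is done on a single precision copy of A.
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                 # residual in double precision, O(n^2)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction solved in single precision, accumulated in double.
        c = np.linalg.solve(A32, r.astype(np.float32))
        x = x + c.astype(np.float64)
    return x

# A well conditioned test system.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100.0 * np.eye(100)
b = rng.standard_normal(100)
x = mixed_precision_solve(A, b)
```

For a reasonably conditioned matrix, each refinement step reduces the error by a factor proportional to the single precision roundoff, so only a few cheap iterations are needed after the single precision solve.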
2.1 Direct Methods
A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination
…(Full text truncated)…