Rotary Positional Embeddings (RoPE) have demonstrated exceptional performance as a positional encoding method, consistently outperforming their baselines. While recent work has sought to extend RoPE to higher-dimensional inputs, many such extensions are non-commutative, thereby forfeiting RoPE's shift-equivariance property. Spherical RoPE is one such non-commutative variant, motivated by the idea of rotating embedding vectors on spheres rather than circles. However, because spherical rotations do not commute, the order in which they are applied is ambiguous. In this work, we explore a quaternion-based approach, Quaternion Rotary Embeddings (QuatRo), in place of Euler angles, leveraging quaternions' ability to represent 3D rotations to parameterize the axes of rotation. We show that Mixed RoPE and Spherical RoPE are special cases of QuatRo. Further, we generalize QuatRo to Clifford Algebraic Rotary Embeddings (CARE) using geometric algebra. Viewing quaternions as the even subalgebra of Cl(3,0,0), we extend rotary embeddings from quaternions to Clifford rotors acting on multivectors. This formulation enables two key generalizations: (1) extending rotary embeddings to arbitrary dimensions, and (2) encoding positional information in multivectors of multiple grades, not just vectors. We present preliminary experiments comparing spherical, quaternion, and Clifford-based rotary embeddings.
Rotary positional embeddings (RoPE) have proven remarkably effective in language modeling, prompting interest in carrying their success over to higher-dimensional domains such as vision and multimodal learning [Siméoni et al., 2025]. Extending RoPE beyond 1D sequences requires care to preserve properties such as relative positional dependence (shift-equivariance) and reversibility [Liu and Zhou, 2025, Su, 2021]. While early extensions sought to preserve strict equivariance [Yu et al., 2025, Schenck et al., 2025], recent findings suggest this property may not be essential for strong performance, opening the door to non-commutative generalizations [van de Geijn et al., 2025].
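To make the relative-position (shift-equivariance) property concrete, here is a minimal NumPy sketch of standard 1D RoPE. It assumes the common interleaved-pair layout and the frequency base of 10000 often used for RoPE; the function name `rope_1d` is illustrative. Because each pair is rotated by a 2D rotation and $R(m)^\top R(n) = R(n - m)$, the query-key score depends only on the offset $n - m$.

```python
import numpy as np

def rope_1d(x: np.ndarray, pos: float, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by the angle pos * theta_i."""
    d = x.shape[-1]
    assert d % 2 == 0, "embedding dimension must be even"
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per 2D pair
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Same offset (4) at different absolute positions gives the same score.
s1 = rope_1d(q, 3.0) @ rope_1d(k, 7.0)
s2 = rope_1d(q, 10.0) @ rope_1d(k, 14.0)
print(np.isclose(s1, s2))  # True
```

This equality relies on all 2D rotations commuting, which is exactly what the non-commutative extensions discussed below give up.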
Spherical RoPE [van de Geijn et al., 2025] exemplifies this approach: the rotations are performed on a sphere, a space where rotations fail to commute, thus breaking strict equivariance. Its implementation uses Euler angles (yaw and roll matrices), which explicitly constrain the rotations to be about the principal axes. Rather than using Euler angles, one could parameterize rotations with quaternions, an idea mentioned in Su [2021] but discarded because of their non-commutativity. Quaternions allow the axes of rotation themselves to be parameterized directly.
In this work, we revisit Quaternion Rotary Embeddings (QuatRo). Quaternions offer a compact, stable representation of 3D rotations, allowing us to parameterize the axes of rotation instead of fixing them to the principal axes assumed by Spherical RoPE. We further generalize QuatRo to Clifford Algebraic Rotary Embeddings (CARE), leveraging the geometric algebra framework. By interpreting quaternions as rotors of Cl(3, 0, 0), i.e., elements of its even subalgebra (a scalar plus grade-2 bivectors) acting on grade-1 vectors, we derive a principled method for:
1. Generalizing rotary embeddings to arbitrary dimensions.
2. Allowing embeddings to inhabit multivector spaces, enabling richer positional transformations across grades.
This generalization not only subsumes quaternion-based methods but also creates new possibilities for encoding positional structure in higher-dimensional and multimodal settings. While CARE generalizes QuatRo to higher-dimensional data such as video or point clouds, this work is still in progress, and experiments are currently restricted to 2D images.
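To illustrate both generalizations, the sketch below implements the geometric product of Cl(n, 0, 0) on multivectors stored as length-$2^n$ arrays indexed by basis-blade bitmasks (a common encoding, assumed here), and applies a simple rotor $R = \cos\frac{\theta}{2} - \sin\frac{\theta}{2}\,e_i e_j$ through the sandwich product $x \mapsto R\,x\,\tilde{R}$. It is not the CARE implementation: how CARE assigns positions to rotor planes and angles is not specified here, and all helper names are hypothetical. In 4D, beyond the reach of quaternions, the same sandwich product rotates both a vector and a bivector while preserving their grades.

```python
import numpy as np

def reorder_sign(a: int, b: int) -> int:
    """Sign from reordering the product of basis blades a*b into canonical order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps & 1 else 1

def grade(blade: int) -> int:
    """Grade of a basis blade = number of basis vectors in its bitmask."""
    return bin(blade).count("1")

def gp(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Geometric product in Cl(n,0,0), where every basis vector squares to +1."""
    out = np.zeros_like(x)
    for a in map(int, np.flatnonzero(x)):
        for b in map(int, np.flatnonzero(y)):
            out[a ^ b] += reorder_sign(a, b) * x[a] * y[b]
    return out

def reverse(x: np.ndarray) -> np.ndarray:
    """Reversion: a grade-k blade is multiplied by (-1)^(k(k-1)/2)."""
    signs = np.array([(-1) ** (grade(b) * (grade(b) - 1) // 2) for b in range(len(x))])
    return x * signs

def blade(n: int, *idx: int) -> np.ndarray:
    """Unit basis blade e_{idx...} of Cl(n,0,0); 1-based indices in increasing order."""
    x = np.zeros(2 ** n)
    mask = 0
    for i in idx:
        mask |= 1 << (i - 1)
    x[mask] = 1.0
    return x

def rotor(n: int, i: int, j: int, theta: float) -> np.ndarray:
    """Simple rotor exp(-theta/2 e_i e_j): rotates by theta in the (e_i, e_j) plane."""
    return np.cos(theta / 2) * blade(n) - np.sin(theta / 2) * blade(n, i, j)

def sandwich(R: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Apply the rotor to a multivector of any grade: x -> R x R~."""
    return gp(gp(R, x), reverse(R))

n, theta = 4, 0.7
R = rotor(n, 1, 2, theta)
v = sandwich(R, blade(n, 1))     # grade 1: cos(theta) e1 + sin(theta) e2
B = sandwich(R, blade(n, 1, 3))  # grade 2: cos(theta) e13 + sin(theta) e23
```

The bitmask encoding keeps the product generic in `n`, so the same code covers any dimension; a practical implementation would likely precompute the multiplication table instead of looping.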
Several recent efforts have sought to extend Rotary Positional Embeddings (RoPE) to higher-dimensional data [Su et al., 2024]. The most general formulation to date is LieRE [Ostmeier et al., 2024], which models RoPE as a rotation of D-dimensional query sub-vectors via the exponential of a linear combination of skew-symmetric matrices. Under the standard proof of RoPE's relative positional property, these generators must commute: the proof relies on $\exp(A)^\top \exp(B) = \exp(B - A)$ for skew-symmetric $A$ and $B$, which the Baker-Campbell-Hausdorff formula guarantees when $A$ and $B$ commute but not in general. This constraint has led prior work to impose commutativity requirements on the rotation generators to preserve strict shift-equivariance [Yu et al., 2025, Schenck et al., 2025, Liu and Zhou, 2025].
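The commutativity requirement can be checked numerically. The sketch below is a hypothetical illustration rather than LieRE's implementation: it builds $R(\mathbf{p}) = \exp(\sum_k p_k A_k)$ with `scipy.linalg.expm` and tests whether $R(\mathbf{p})^\top R(\mathbf{q})$ is unchanged when both positions are shifted by the same offset. Generators acting on disjoint planes commute and pass the test; generators acting on overlapping planes of $\mathbb{R}^3$ do not.

```python
import numpy as np
from scipy.linalg import expm

def skew(n: int, i: int, j: int) -> np.ndarray:
    """Elementary skew-symmetric generator rotating in the (i, j) plane."""
    a = np.zeros((n, n))
    a[i, j], a[j, i] = -1.0, 1.0
    return a

def rotation(pos, gens) -> np.ndarray:
    """LieRE-style rotation R(p) = exp(sum_k p_k A_k)."""
    return expm(sum(p * A for p, A in zip(pos, gens)))

def shift_invariant(gens, p, q, s) -> bool:
    """True if R(p)^T R(q) is unchanged when both positions are shifted by s."""
    lhs = rotation(p, gens).T @ rotation(q, gens)
    rhs = rotation(p + s, gens).T @ rotation(q + s, gens)
    return bool(np.allclose(lhs, rhs))

p, q, s = np.array([0.3, 0.5]), np.array([1.1, -0.4]), np.array([0.7, 0.9])

# Commuting generators: rotations in disjoint planes (axial, RoPE-style).
commuting = [skew(4, 0, 1), skew(4, 2, 3)]
# Non-commuting generators: rotations in overlapping planes of R^3.
non_commuting = [skew(3, 0, 1), skew(3, 1, 2)]

print(shift_invariant(commuting, p, q, s))      # True
print(shift_invariant(non_commuting, p, q, s))  # False
```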
However, recent studies have questioned the necessity of these constraints. In particular, van de Geijn et al. [2025] propose Spherical RoPE, which applies rotary encodings on the sphere, a setting where rotations do not commute, and show that performance can remain competitive despite breaking equivariance. Their approach parameterizes rotations using Euler angles, which introduces potential issues such as gimbal lock and unintuitive, order-dependent composition behavior.
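The composition issue with Euler angles is easy to see directly. The sketch below assumes yaw is a rotation about the z-axis and roll a rotation about the x-axis; the axis conventions actually used in Spherical RoPE may differ. Swapping the order of the two matrices changes the result, and either composite is a single rotation about a non-principal axis, which an axis-angle (quaternion) parameterization can express directly.

```python
import numpy as np

def yaw(a: float) -> np.ndarray:
    """Rotation by angle a about the z-axis (assumed yaw convention)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def roll(a: float) -> np.ndarray:
    """Rotation by angle a about the x-axis (assumed roll convention)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def axis_angle(R: np.ndarray):
    """Recover the single unit axis and angle of a rotation matrix (valid for 0 < angle < pi)."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2.0 * np.sin(angle)), angle

yr = yaw(0.4) @ roll(1.1)  # roll applied first, then yaw
ry = roll(1.1) @ yaw(0.4)  # yaw applied first, then roll
print(np.allclose(yr, ry))  # False: the composition order changes the result

axis, angle = axis_angle(yr)  # a generic axis, not one of the principal axes
```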
Our method, Quaternion Rotary Embeddings (QuatRo), builds on this line of work by replacing Euler angles with quaternion rotations. Quaternions can be viewed as a compact and numerically stable representation of 3D rotations, corresponding to the even subalgebra of Cl(3, 0, 0). While QuatRo can be interpreted as a special case of LieRE with fixed 3 × 3 skew-symmetric generators, our generalization, Clifford Algebraic Rotary Embeddings (CARE), extends beyond LieRE's scope. CARE treats rotary embeddings as Clifford rotors acting on multivectors, enabling both higher-dimensional generalization and graded (multi-grade) positional encoding.
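The claim that quaternion rotations are a LieRE special case with fixed 3 × 3 generators can be checked for a single rotation: the unit quaternion $(\cos\frac{\theta}{2}, \sin\frac{\theta}{2}\,\mathbf{n})$ and $\exp(\theta \sum_k n_k G_k)$, with $G_k$ the standard so(3) basis, give the same matrix. This sketch uses SciPy's `Rotation.from_quat` (scalar-last by default) and `scipy.linalg.expm`; the helper names are hypothetical, the axis and angle are arbitrary, and how QuatRo maps token positions to axes and angles is not shown.

```python
import numpy as np
from scipy.linalg import expm
from scipy.spatial.transform import Rotation

# Fixed so(3) basis: G[k] generates rotations about the k-th coordinate axis.
G = np.array([
    [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]],  # about x
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],  # about y
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # about z
])

def unit(axis) -> np.ndarray:
    n = np.asarray(axis, dtype=float)
    return n / np.linalg.norm(n)

def quaternion_matrix(axis, theta: float) -> np.ndarray:
    """Matrix of the unit quaternion (sin(theta/2) n, cos(theta/2)) in SciPy's scalar-last layout."""
    n = unit(axis)
    q = np.concatenate([np.sin(theta / 2) * n, [np.cos(theta / 2)]])
    return Rotation.from_quat(q).as_matrix()

def lie_matrix(axis, theta: float) -> np.ndarray:
    """exp(theta * sum_k n_k G_k): a LieRE-style exponential with fixed 3x3 generators."""
    n = unit(axis)
    return expm(theta * np.einsum("k,kij->ij", n, G))

axis, theta = [1.0, -2.0, 0.5], 0.9
print(np.allclose(quaternion_matrix(axis, theta), lie_matrix(axis, theta)))  # True
```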
The relationship between LieRE and CARE is nuanced: in one view, LieRE can be seen as a restricted subclass of CARE where the generators are limited to certain skew-symmetric matrices; in another view, certain CARE configurations reduce to LieRE.
Due to space constraints, we focus on the algebraic tools and notation relevant to our method, and refer the reader to Roelfs and Keninck [2021] for a comprehensive treatment of geometric algebra and rotors, and to van de Geijn et al. [2025] for N-D positional encodings.
Quaternions form a four-dimensional non-commutative algebra over the real numbers, with basis {1, i, j, k} and multiplication rules

$$i^2 = j^2 = k^2 = ijk = -1, \qquad ij = k, \quad jk = i, \quad ki = j,$$

with anti-commutativity for distinct basis elements, e.g., $ji = -ij$. A quaternion $q = a_0 + a_i i + a_j j + a_k k$ can be split into a scalar part $a_0$ and a vector part $\mathbf{v} = a_i i + a_j j + a_k k$.
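Quaternion multiplication (the Hamilton product) follows from the basis rules $i^2 = j^2 = k^2 = ijk = -1$. A minimal sketch, storing a quaternion as a NumPy array `[a_0, a_i, a_j, a_k]` (scalar first, an assumed layout; `hamilton` and `split` are illustrative names), that checks the basis relations and the scalar/vector split:

```python
import numpy as np

def hamilton(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Hamilton product of quaternions stored as [a0, ai, aj, ak] (scalar first)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,  # scalar part
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,  # i component
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,  # j component
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,  # k component
    ])

def split(q: np.ndarray):
    """Return the scalar part a0 and the vector part (ai, aj, ak)."""
    return q[0], q[1:]

one, i, j, k = np.eye(4)  # basis quaternions 1, i, j, k

# Basis relations: i^2 = -1, ij = k, ji = -k, ijk = -1.
assert np.allclose(hamilton(i, i), -one)
assert np.allclose(hamilton(i, j), k)
assert np.allclose(hamilton(j, i), -k)
assert np.allclose(hamilton(hamilton(i, j), k), -one)

scalar, vector = split(np.array([0.5, 1.0, -2.0, 3.0]))  # 0.5 and [1, -2, 3]
```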
Pure quaternions (zero scalar part) can represent 3D vectors, while unit quaternions represent 3D rotations.
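As a sketch of how a unit quaternion acts on a 3D vector, the vector is embedded as a pure quaternion and rotated by the sandwich product $q\,v\,q^{*}$, where $q = \cos\frac{\theta}{2} + \sin\frac{\theta}{2}(n_i i + n_j j + n_k k)$ for a unit axis $\mathbf{n}$. The helper names are illustrative, and the scalar-first layout is an assumption.

```python
import numpy as np

def hamilton(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Hamilton product of quaternions stored as [a0, ai, aj, ak] (scalar first)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def conjugate(q: np.ndarray) -> np.ndarray:
    """Quaternion conjugate: negate the vector part."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rotate(v, axis, theta: float) -> np.ndarray:
    """Rotate the 3D vector v by theta about axis via the sandwich product q v q*."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    q = np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * n])  # unit quaternion
    v_pure = np.concatenate([[0.0], np.asarray(v, dtype=float)])       # pure quaternion
    out = hamilton(hamilton(q, v_pure), conjugate(q))
    return out[1:]  # the scalar part is zero up to rounding

# A quarter turn about the z-axis maps the x-axis onto the y-axis.
result = rotate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], np.pi / 2)  # approximately [0, 1, 0]
```

Note that $q$ and $-q$ produce the same rotation, since the sandwich product is quadratic in $q$.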