QUICK REVIEW

[Paper Review] Versor: A Geometric Sequence Architecture

Truong Minh Huy, Edward Hirst | arXiv (Cornell University) | Feb 10, 2026

Algebraic and Geometric Analysis | 0 citations

TL;DR

Versor introduces a sequence architecture based on Conformal Geometric Algebra (CGA) that combines Geometric Product Attention (GPA) with a Recursive Rotor Accumulator (RRA) to achieve scale-generalizable, interpretable, and hardware-efficient sequence modeling, outperforming Transformers on several tasks.

ABSTRACT

A novel sequence architecture, Versor, is introduced, which uses Conformal Geometric Algebra (CGA) in place of traditional linear operations to achieve structural generalization and significant performance improvements on a variety of tasks, while offering improved interpretability and efficiency. By embedding states in the $Cl_{4,1}$ manifold and evolving them via geometric transformations (rotors), Versor natively represents $SE(3)$-equivariant relationships without requiring explicit structural encoding. Versor is validated on chaotic N-body dynamics, topological reasoning, and standard multimodal benchmarks (CIFAR-10, WikiText-103), consistently outperforming Transformers, Graph Networks, and geometric baselines (GATr, EGNN). Key results include: orders-of-magnitude fewer parameters ($200\times$ fewer than Transformers); interpretable attention that decomposes into proximity and orientational components; and zero-shot scale generalization (0.993 vs. 0.070 MCC for ViT). The architecture features a Recursive Rotor Accumulator (RRA) with $O(L)$ linear temporal complexity for dynamical systems and a Geometric Product Attention (GPA) mechanism with $O(L^{2})$ complexity for global relational modeling, allowing task-specific architectural pruning or hybridization depending on the required scale. In out-of-distribution tests, Versor maintains stable predictions while Transformers fail catastrophically. Custom Clifford kernels achieve a cumulative speedup of over $100\times$ via bit-masked contraction and specialized Matrix Isomorphism kernels, reducing per-step latency to 1.05 ms and outperforming highly optimized Transformer baselines.
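
The abstract credits part of the speedup to bit-masked contraction. The sketch below illustrates that general technique only; it is not the authors' gacore kernel. Each of the 32 basis blades of $Cl_{4,1}$ is indexed by a 5-bit mask, so the product of two blades is the XOR of their masks times a sign that comes from reordering the basis vectors and from the metric. It assumes e1 to e4 square to +1 and e5 squares to -1; all function and table names are hypothetical.

```python
# Minimal sketch of a bit-masked geometric product in Cl(4,1).
# Assumption (illustrative): bit i of a mask stands for basis vector e_{i+1};
# e1..e4 square to +1 and e5 squares to -1.
METRIC = (1.0, 1.0, 1.0, 1.0, -1.0)
N = len(METRIC)
DIM = 1 << N  # 32 basis blades


def reorder_sign(a: int, b: int) -> int:
    """Sign picked up when sorting the vectors of blade a*b into canonical order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps & 1 else 1


def blade_product(a: int, b: int) -> tuple[int, float]:
    """Product of two basis blades given as bitmasks: (result blade, coefficient)."""
    coeff = float(reorder_sign(a, b))
    common = a & b
    for i in range(N):
        if (common >> i) & 1:
            coeff *= METRIC[i]  # a repeated e_i contracts to its square
    return a ^ b, coeff


# Precompute the 32x32 Cayley table once; each blade product becomes a lookup.
TABLE = [[blade_product(a, b) for b in range(DIM)] for a in range(DIM)]


def geometric_product(x: list[float], y: list[float]) -> list[float]:
    """Full multivector product, skipping zero coefficients."""
    out = [0.0] * DIM
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, yj in enumerate(y):
            if yj == 0.0:
                continue
            k, c = TABLE[i][j]
            out[k] += c * xi * yj
    return out


def basis(mask: int) -> list[float]:
    """Multivector with a single unit coefficient on the blade `mask`."""
    v = [0.0] * DIM
    v[mask] = 1.0
    return v
```

For example, `geometric_product(basis(1), basis(2))` places +1 on blade mask 3 (e1e2), while swapping the arguments gives -1 there, reflecting anticommutation of orthogonal vectors.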

Motivation & Objective

Motivate embedding symmetry priors directly into sequence models to overcome the “Euclidean Bottleneck.”
Propose a CGA-based sequence architecture operating in $Cl_{4,1}$ to model SE(3)-equivariant relationships.
Demonstrate advantages in scale generalization, interpretability, and efficiency over standard Transformers and geometric baselines.
Showcase multimodal capabilities across chaotic dynamics, topology, vision, and language tasks.

Proposed method

Introduce Geometric Product Attention (GPA) that decomposes attention into scalar (proximity) and bivector (orientation) components.
Develop a Recursive Rotor Accumulator (RRA) that achieves O(L) temporal complexity, with the state evolving on the Spin(4,1) manifold.
Enforce manifold constraints via Manifold Normalization to prevent drift and enable stable long-horizon dynamics.
Utilize hardware-optimized Clifford kernels (bit-masked and matrix isomorphism) for accelerated Clifford product computations.
Provide a software layout (gacore) that can accommodate dimensionally adapted Clifford algebras, and outline proposals for future GAPU hardware.
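
The review does not give the RRA update rule, and Spin(4,1) rotors live in the 16-dimensional even subalgebra of $Cl_{4,1}$. To show only the shape of a recursive rotor accumulator, the sketch below swaps in unit quaternions (the rotors of Spin(3)): each step composes one incremental rotor with the running state and renormalizes the result back onto the unit sphere, a stand-in for Manifold Normalization. Only the current rotor is stored, so time is O(L) in sequence length and state is O(1). The helpers `rotor_from_axis_angle`, `accumulate`, and `rotate` are hypothetical names.

```python
import math

# Quaternions as (w, x, y, z); unit quaternions stand in for Spin(4,1) rotors.


def qmul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def normalize(q):
    """Project back onto the unit sphere (analogue of Manifold Normalization)."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def rotor_from_axis_angle(axis, angle):
    """Incremental rotor for a rotation by `angle` about `axis`."""
    ax, ay, az = axis
    s = math.sin(angle / 2) / math.sqrt(ax * ax + ay * ay + az * az)
    return (math.cos(angle / 2), ax * s, ay * s, az * s)


def accumulate(increments):
    """R_t = normalize(r_t * R_{t-1}): one pass, constant-size state."""
    state = (1.0, 0.0, 0.0, 0.0)  # identity rotor
    for r in increments:
        state = normalize(qmul(r, state))
    return state


def rotate(q, v):
    """Sandwich product q v q~ applied to a 3D vector."""
    w, x, y, z = q
    _, rx, ry, rz = qmul(qmul(q, (0.0, *v)), (w, -x, -y, -z))
    return (rx, ry, rz)
```

Without the per-step renormalization, floating-point error would slowly push the state off the unit sphere over long rollouts; re-projecting every step is the cheap fix this sketch illustrates.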

Figure 1: The Versor Architecture. (Left) Geometric Product Attention (GPA). (Right) The Recursive Rotor Accumulator (RRA).

Experimental results

Research questions

RQ1: Can Conformal Geometric Algebra enable SE(3)-equivariant sequence modeling without explicit structural encodings?
RQ2: Does a CGA-based architecture generalize across scales and densities, preserving performance in long-horizon or out-of-distribution settings?
RQ3: How do GPA's scalar and bivector components relate to learned proximity and orientation interactions in dynamic tasks?
RQ4: Can the Recursive Rotor Accumulator achieve linear-time recurrence while maintaining numerical stability in chaotic systems?
RQ5: What hardware and software optimizations are necessary to achieve practical latency and parameter efficiency for Clifford-based sequence models?

Key findings

Versor uses orders-of-magnitude fewer parameters (≈200× fewer than Transformers) and achieves competitive to superior performance across tasks.
Geometric Product Attention decomposes into proximity (scalar) and orientation (bivector) components, enabling interpretable interaction laws.
Versor attains zero-shot scale generalization, e.g., MCC of 0.993 on topological connectivity tasks versus 0.070 for ViT.
The Recursive Rotor Accumulator provides O(L) inference with O(1) memory, enabling long-horizon rollouts of thousands of steps.
Custom Clifford kernels yield a cumulative speedup of over 100× and reduce per-step latency to about 1.05 ms, outperforming optimized Transformer baselines.
In out-of-distribution tests, Versor remains stable while Transformer baselines can fail catastrophically.
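
The proximity/orientation reading of GPA follows from two standard CGA identities. A minimal sketch, assuming the common conformal embedding $X = x + \tfrac{1}{2}|x|^{2} e_{\infty} + e_{o}$ with $e_{o} \cdot e_{\infty} = -1$ (this review gives neither the paper's exact embedding nor how GPA weights the two parts): the inner product of two embedded points equals $-\tfrac{1}{2}|x - y|^{2}$, so a scalar score acts as a proximity measure, while the bivector $a \wedge b$ of two direction vectors vanishes when they are parallel and has magnitude $|a||b|\sin\theta$, so it encodes relative orientation. All function names are illustrative.

```python
# Conformal vectors stored in the null basis (e1, e2, e3, e_o, e_inf),
# assuming e_o.e_inf = -1 and e_o.e_o = e_inf.e_inf = 0.


def up(x):
    """Embed a 3D point as the null vector x + 0.5|x|^2 e_inf + e_o."""
    sq = sum(c * c for c in x)
    return (x[0], x[1], x[2], 1.0, 0.5 * sq)


def conformal_inner(a, b):
    """Inner product of two conformal vectors; for points it equals -0.5|x - y|^2."""
    euclid = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return euclid - a[3] * b[4] - a[4] * b[3]


def wedge3(a, b):
    """Bivector a^b of two 3D vectors as (e12, e13, e23) coefficients."""
    return (a[0] * b[1] - a[1] * b[0],
            a[0] * b[2] - a[2] * b[0],
            a[1] * b[2] - a[2] * b[1])


def geometric_product_3d(a, b):
    """ab = a.b + a^b: symmetric scalar part plus antisymmetric bivector part."""
    scalar = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return scalar, wedge3(a, b)
```

For the points (1, 2, 3) and (4, 6, 3), which are 5 units apart, `conformal_inner(up(x), up(y))` gives -12.5, and every embedded point has zero inner product with itself.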

Figure 2: Geometric Attention Decomposition: Separating Force from Torque. Points labeled B0–B4 represent the 5 gravitationally interacting bodies; B0 is the focal body for this visualization. The axes ($x_{1}$, $x_{2}$) are the 2D physical coordinates of the simulation. Line weights are proport


This review was created by AI and reviewed by human editors.