QUICK REVIEW

[Paper Review] A more accurate rational non-commutative algorithm for multiplying 4x4 matrices using 48 multiplications

Jean‐Guillaume Dumas, Clément Pernet|arXiv (Cornell University)|Mar 19, 2026

Tensor decomposition and applications | Cited by 0

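One-line summary

For the recent 4x4x4:48 algorithm, the paper proposes a more accurate variant over any ring containing an inverse of 2. It brings the max-norm error bound exponent down to about 2.386 and improves accuracy in practice.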

ABSTRACT

We propose a more accurate variant of an algorithm for multiplying 4x4 matrices using 48 multiplications over any ring containing an inverse of 2. This algorithm has an error bound exponent of only $\log_4 \gamma_{\infty,2} \approx 2.386$. It also reaches a better accuracy w.r.t. max-norm in practice, when compared to previously known such fast algorithms. Furthermore, we propose a straight line program of this algorithm, giving a leading constant in its complexity bound of $\frac{387}{32} n^{2+\log_4 3} + o\left(n^{2+\log_4 3}\right)$ operations over any ring containing an inverse of 2.

Introduction: An algorithm to multiply two 4x4 complex-valued matrices requiring only 48 non-commutative multiplications was introduced in [16] using a pipeline of large language models orchestrated by an evolutionary coding agent. A matrix multiplication algorithm with that many non-commutative multiplications is denoted by $\langle 4 \times 4 \times 4 : 48 \rangle$ in the sequel. An equivalent variant of the associated tensor decomposition defining this algorithm, but over the rationals (more precisely over any ring containing an inverse of 2), was then given in [8]. Most error analyses of sub-cubic time matrix multiplication algorithms [3, 4, 2, 1, 17] are given in the max-norm setting: bounding the largest output error as a function of the max-norm product of the vectors of input matrix coefficients. In this setting, Strassen's algorithm has shown the best accuracy bound (proven minimal under some assumptions in [2]). In [6, 8], the authors relaxed this setting by shifting the focus to the 2-norm for input and/or output; that allowed them to propose a $\langle 2 \times 2 \times 2 : 7 \rangle$ variant with an improved accuracy bound. Experiments show that this variant performs best even when measuring the max-norm of the error bound. We present in this note a variant of the recent $\langle 4 \times 4 \times 4 : 48 \rangle$ algorithm over the rationals (again in the same orbit under De Groot isotropies [10]) that is more numerically accurate w.r.t. max-norm in practice.

In particular, our new variant improves on the error bound exponent, from $\log_2 \gamma_{\infty,2} \approx 2.577$ [...]

Consider the product of an $M \times K$ matrix $A$ by a $K \times N$ matrix $B$. It is computed by a $\langle m, k, n \rangle$ algorithm represented by the matrices $L$, $R$, $P$ applied recursively on $\ell$ recursive levels, and the resulting $m_0 \times k_0$ by $k_0 \times n_0$ products are performed using an algorithm $\beta$. Here $M = m_0 m^{\ell}$, $K = k_0 k^{\ell}$ and $N = n_0 n^{\ell}$. The accuracy bound below uses any (possibly different) p-norms and q-norms for its left-hand side, $\|\bullet\|_p$, and right-hand side, $\|\bullet\|_q$. The associated dual norms are denoted by $\|\bullet\|_{p^\star}$ and $\|\bullet\|_{q^\star}$ respectively. Note that these are vector norms, hence $\|A\|_p$ for a matrix $A$ in $\mathbb{R}^{m \times n}$ denotes $\|\mathrm{Vect}(A)\|_p$ and is the p-norm of the $mn$-dimensional vector of its coefficients, and not a matrix norm.
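The last remark of the abstract, that the norms are taken on the flattened vector of coefficients and are not matrix norms, can be checked on a small example. The sketch below uses only the Python standard library. The helper names `vect`, `vec_norm` and `induced_inf_norm` are hypothetical and chosen for illustration; they do not come from the paper.

```python
import math

def vect(matrix):
    """Flatten a matrix (list of rows) into the vector of its coefficients."""
    return [x for row in matrix for x in row]

def vec_norm(matrix, p):
    """p-norm of Vect(matrix); pass math.inf for the max-norm."""
    v = [abs(x) for x in vect(matrix)]
    if p == math.inf:
        return max(v)
    return sum(x ** p for x in v) ** (1.0 / p)

def induced_inf_norm(matrix):
    """Operator (induced) infinity-norm: largest absolute row sum.
    Shown only for contrast; it is NOT the norm used in the abstract."""
    return max(sum(abs(x) for x in row) for row in matrix)

A = [[1.0, 2.0], [3.0, 4.0]]
norm_2 = vec_norm(A, 2)            # sqrt(1 + 4 + 9 + 16), the Frobenius norm
norm_max = vec_norm(A, math.inf)   # largest absolute coefficient
norm_op = induced_inf_norm(A)      # largest absolute row sum
print(norm_2, norm_max, norm_op)
```

For this matrix the max-norm of the coefficient vector is 4 while the induced infinity-norm is 7, which is why the distinction matters when comparing error bounds.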

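Motivation and objectives

Motivate sub-cubic matrix multiplication with improved numerical stability in the max-norm setting.
Develop a more accurate variant of the 4x4x4:48 algorithm over rings containing 1/2.
Provide a concrete LRP representation and a straight-line program for implementation.
Compare the theoretical error bounds with those of existing algorithms and demonstrate accuracy in practice.
Present complexity bounds and discuss an alternative-basis variant.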

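Proposed method

Represent 4x4 matrix multiplication as a bilinear map given by a triple of matrices (L, R, P).
Introduce a new variant of the 4x4x4:48 algorithm whose specific L, R, P reduce the growth factor γ_{p,q} for (p,q) = (2,2) and (∞,2).
Derive explicit straight-line programs for L, R and P (Table 1) and the corresponding optimized arithmetic, including the Hadamard product.
Provide a forward error analysis based on the growth factor γ_{p,q}, together with norm-wise bounds f_{p,q}.
Give a detailed complexity bound (not 45 here) and present an alternative-basis variant in Appendix A.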
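The LRP representation listed above can be illustrated on a smaller, well-known case. The sketch below encodes Strassen's 2x2x2:7 algorithm as three integer matrices. It is not the paper's 4x4x4:48 variant, whose matrices are not reproduced in this review. The product is evaluated as P applied to the Hadamard product of L·Vect(A) and R·Vect(B). The row-major flattening, the orientation of P, and the helper names `matvec` and `lrp_multiply` are assumptions made for illustration; the paper may use different conventions.

```python
# Strassen's 7 products, one row per product.
# Columns of L act on (a11, a12, a21, a22); columns of R on (b11, b12, b21, b22).
L = [
    [1, 0, 0, 1],    # a11 + a22
    [0, 0, 1, 1],    # a21 + a22
    [1, 0, 0, 0],    # a11
    [0, 0, 0, 1],    # a22
    [1, 1, 0, 0],    # a11 + a12
    [-1, 0, 1, 0],   # a21 - a11
    [0, 1, 0, -1],   # a12 - a22
]
R = [
    [1, 0, 0, 1],    # b11 + b22
    [1, 0, 0, 0],    # b11
    [0, 1, 0, -1],   # b12 - b22
    [-1, 0, 1, 0],   # b21 - b11
    [0, 0, 0, 1],    # b22
    [1, 1, 0, 0],    # b11 + b12
    [0, 0, 1, 1],    # b21 + b22
]
# Rows of P rebuild (c11, c12, c21, c22) from the 7 products.
P = [
    [1, 0, 0, 1, -1, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [1, -1, 1, 0, 0, 1, 0],
]

def matvec(M, v):
    """Plain matrix-vector product on lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lrp_multiply(L, R, P, a, b):
    """Bilinear algorithm: linear pre-additions, Hadamard product, post-additions."""
    left = matvec(L, a)                              # linear forms in A
    right = matvec(R, b)                             # linear forms in B
    products = [x * y for x, y in zip(left, right)]  # the only multiplications
    return matvec(P, products)

a = [1, 2, 3, 4]   # Vect([[1, 2], [3, 4]])
b = [5, 6, 7, 8]   # Vect([[5, 6], [7, 8]])
c = lrp_multiply(L, R, P, a, b)
print(c)           # Vect of the 2x2 product, row-major
```

The number of rows of L and R is the multiplication count (7 here, 48 in the paper), so the same `lrp_multiply` would apply unchanged to a 48-row L and R with 16 columns and a 16x48 P.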

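Experimental results

Research questions

RQ1: When this algorithm is realized with 48 multiplications over a ring containing 1/2, how large an accuracy improvement can be obtained?
RQ2: How does the new variant affect the forward error bounds in (p,q)-norms, in particular for (2,2) and (∞,2)?
RQ3: What are the explicit L, R, P representations for implementation, and what is the corresponding straight-line program?
RQ4: How does the new variant compare with existing 4x4x4:48 schemes, both in theoretical growth factor and in practical max-norm accuracy?
RQ5: What is the effect on the operation count of adopting an alternative basis for this algorithm?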

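Key findings

The new variant achieves a forward error bound exponent of log_4 γ_{∞,2} ≈ 2.386.
It reduces the growth factor γ_{2,2} to (1+sqrt(2))·64, improving the order of accuracy over earlier 4x4x4:48 schemes.
The leading constant of the complexity bound is 387/32; with the refined lower-order term, this gives about 12.09375 n^{2+log_4 3} - 11.09375 n^2 operations over any ring containing 1/2.
An implementable straight-line program (SLP) for L, R and P is provided, enabling practical implementation and benchmarking.
Appendix A presents an alternative basis with similar accuracy that lowers the leading constant of the bound (8 n^{2+log_4 3} + o(...)).
Experiments show that the new 4x4x4:48 variant is more accurate, in terms of max-norm output error, than the earlier 2x2x2:7 and 4x4x4:48 schemes.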
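To get a feel for the operation count quoted in the findings above, the formula can be evaluated next to the textbook count for the classical algorithm. The sketch below takes the constants 387/32 and 11.09375 = 355/32 from this review. The classical count 2n^3 - n^2 (n^3 multiplications plus n^3 - n^2 additions) is a standard figure and not a number from the paper. The function names are hypothetical.

```python
import math
from fractions import Fraction

LEADING = Fraction(387, 32)      # 12.09375, leading constant quoted above
QUADRATIC = Fraction(355, 32)    # 11.09375, coefficient of the n^2 term quoted above
EXPONENT = 2 + math.log(3, 4)    # 2 + log_4 3, which equals log_4 48

def fast_ops(n):
    """Operation count from the review's formula for an n x n product."""
    return float(LEADING) * n ** EXPONENT - float(QUADRATIC) * n ** 2

def classical_ops(n):
    """Textbook count: n^3 multiplications and n^3 - n^2 additions."""
    return 2 * n ** 3 - n ** 2

# Compare both counts on powers of 4, the natural sizes for a 4x4 recursion.
for level in range(1, 9):
    n = 4 ** level
    print(n, round(fast_ops(n)), classical_ops(n))
```

At n = 1 the formula gives exactly one operation, as expected for a scalar product. The count only drops below the classical one for fairly large n, which is consistent with fast algorithms being applied recursively down to a threshold and then handing over to a classical base case.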


This review was generated by AI and checked by a human editor.