QUICK REVIEW

[Paper Review] A GPU-accelerated Nonlinear Branch-and-Bound Framework for Sparse Linear Models

Xiang Meng, Ryan Lucas|arXiv (Cornell University)|Feb 4, 2026

Stochastic Gradient Optimization Techniques | Cited by 0

One-Sentence Summary

This paper introduces a GPU-accelerated branch-and-bound framework for exact sparse linear regression with an ℓ0–ℓ2 penalty, using ADMM-based node relaxations and batched GPU parallelism to solve many subproblems simultaneously.
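
For reference, the ℓ0–ℓ2 problem and its perspective form are commonly written as below. This is the standard formulation from the sparse-regression literature, not a transcription of the paper: the symbols $\lambda_0$, $\lambda_2$, $M$, $z$, $s$ and the $\tfrac{1}{2}$ scaling of the loss are assumptions made here, and the paper's exact notation may differ.

$$
\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\|y - X\beta\|_2^2 \;+\; \lambda_0 \|\beta\|_0 \;+\; \lambda_2 \|\beta\|_2^2
$$

$$
\begin{aligned}
\min_{\beta,\, z,\, s} \quad & \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda_0 \sum_{i=1}^{p} z_i + \lambda_2 \sum_{i=1}^{p} s_i \\
\text{s.t.} \quad & \beta_i^2 \le s_i z_i, \quad -M z_i \le \beta_i \le M z_i, \quad z_i \in \{0, 1\}, \quad s_i \ge 0, \quad i = 1, \dots, p.
\end{aligned}
$$

Relaxing $z_i \in \{0,1\}$ to $z_i \in [0,1]$ for the coordinates not yet fixed by branching gives the kind of continuous node relaxation that a branch-and-bound method solves at each node.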

ABSTRACT

We study exact sparse linear regression with an $\ell_0-\ell_2$ penalty and develop a branch-and-bound (BnB) algorithm explicitly designed for GPU execution. Starting from a perspective reformulation, we derive an interval relaxation that can be solved by ADMM with closed-form, coordinate-wise updates. We structure these updates so that the main work at each BnB node reduces to batched matrix-vector operations with a shared data matrix, enabling fine-grained parallelism across coordinates and coarse-grained parallelism across many BnB nodes on a single GPU. Feasible solutions (upper bounds) are generated by a projected gradient method on the active support, implemented in a batched fashion so that many candidate supports are updated in parallel on the GPU. We discuss practical design choices such as memory layout, batching strategies, and load balancing across nodes that are crucial for obtaining good utilization on modern GPUs. On synthetic and real high-dimensional datasets, our GPU-based approach achieves clear runtime improvements over a CPU implementation of our method, an existing specialized BnB method, and commercial MIP solvers.
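
The batched upper-bound step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: NumPy on the CPU stands in for GPU arrays, and the function name `batched_projected_gradient`, the parameters `lam0`, `lam2`, `big_m`, the fixed step size, and the box constraint $|\beta_i| \le M$ are all assumptions made for illustration. Each row of `masks` is one candidate support, and all rows are updated together through matrix products with the shared data matrix `X`.

```python
import numpy as np


def batched_projected_gradient(X, y, masks, lam0, lam2, big_m, n_iters=500):
    """Projected gradient on many candidate supports at once (illustrative sketch).

    For each row of `masks` (1 = coordinate allowed to be nonzero), approximately
    minimize 0.5*||y - X b||^2 + lam2*||b||^2 subject to b = 0 off the support
    and |b_i| <= big_m, then add lam0 * (support size) to get an upper bound.
    """
    masks = masks.astype(X.dtype)                      # (B, p)
    # Step 1/L, with L a Lipschitz constant of the smooth part's gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 2.0 * lam2)
    beta = np.zeros_like(masks)                        # (B, p), one row per support
    for _ in range(n_iters):
        resid = beta @ X.T - y                         # (B, n): all residuals at once
        grad = resid @ X + 2.0 * lam2 * beta           # (B, p): all gradients at once
        # Projection: clip to the box, then zero out off-support coordinates.
        beta = np.clip(beta - step * grad, -big_m, big_m) * masks
    resid = beta @ X.T - y
    obj = (0.5 * np.sum(resid ** 2, axis=1)
           + lam2 * np.sum(beta ** 2, axis=1)
           + lam0 * masks.sum(axis=1))
    return beta, obj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 8))
    y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(50)
    masks = np.array([[1, 0, 0, 1, 0, 0, 0, 0],        # two hypothetical supports
                      [0, 1, 1, 0, 0, 0, 0, 0]])
    _, upper_bounds = batched_projected_gradient(X, y, masks, 0.5, 0.1, 10.0)
    print(upper_bounds)
```

Because every candidate support shares `X`, the per-iteration work is two dense matrix products regardless of how many supports are in the batch, which is the property that maps well to a GPU.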

Motivation and Objectives

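Advance exact sparse regression with an ℓ0–ℓ2 penalty through a MIP-style formulation.
Design a GPU-friendly nonlinear BnB framework that exploits parallelism both across BnB nodes and within each node.
Develop fast, parallel node-relaxation and upper-bound solvers suited to GPUs.
Implement warm starts and batching to improve scalability to large n and p.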

Proposed Method

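Reformulate the problem as a mixed-integer second-order cone program (MISOCP) using a perspective formulation with Big-M constraints.
Develop an ADMM-based node relaxation that decouples the problem into coordinate-wise updates with closed-form solutions, which parallelize well.
Compute dual bounds from a tractable dual problem derived alongside the ADMM scheme; strong duality holds for the relaxed problem.
Implement batched, GPU-oriented parallelization across multiple BnB nodes and across coordinates, batching work over subproblems.
Use a batched projected (proximal) gradient method on the active support to generate feasible solutions, and hence upper bounds, quickly.
Warm-start ADMM iterations between parent and child nodes to speed up each node solve, and reuse precomputed matrices to reduce cost.
Provide a dedicated heuristic that initializes the BnB tree with a high-quality support set, improving the initial upper bound.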
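
The batching, warm-start, and matrix-reuse ideas in the list above can be sketched in NumPy under strong simplifying assumptions. This is not the paper's relaxation or its ADMM updates. It swaps in a simpler subproblem, box-constrained ridge regression, where each node is described only by per-coordinate bounds (`lo == hi == 0` for a coordinate fixed to zero). The names `batched_admm_box_ridge`, `lam2`, `rho`, `v0`, `u0` are hypothetical, and the explicit matrix inverse is only sensible for small p. The returned objective is an approximate subproblem value, not a certified lower bound; the paper obtains rigorous bounds from a dual problem.

```python
import numpy as np


def batched_admm_box_ridge(X, y, lo, hi, lam2, rho=1.0, n_iters=200, v0=None, u0=None):
    """Scaled ADMM for many box-constrained ridge subproblems at once (sketch).

    Node b solves: min 0.5*||y - X b||^2 + lam2*||b||^2  s.t.  lo[b] <= b <= hi[b].
    Splitting beta = v puts the smooth terms on beta and the box on v.
    """
    lo = np.asarray(lo, dtype=float)                   # (B, p) lower bounds per node
    hi = np.asarray(hi, dtype=float)                   # (B, p) upper bounds per node
    p = X.shape[1]
    # Shared by every node and every iteration: computed once, then reused.
    K = np.linalg.inv(X.T @ X + (2.0 * lam2 + rho) * np.eye(p))
    Xty = X.T @ y
    # Warm start from a parent's (v, u) if given, otherwise start at zero.
    v = np.zeros_like(lo) if v0 is None else np.array(v0, dtype=float)
    u = np.zeros_like(lo) if u0 is None else np.array(u0, dtype=float)
    for _ in range(n_iters):
        beta = (Xty + rho * (v - u)) @ K               # batched linear solve (K symmetric)
        v = np.clip(beta + u, lo, hi)                  # closed-form, coordinate-wise projection
        u = u + beta - v                               # scaled dual update
    resid = v @ X.T - y
    obj = 0.5 * np.sum(resid ** 2, axis=1) + lam2 * np.sum(v ** 2, axis=1)
    return v, u, obj


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((40, 5))
    y = rng.standard_normal(40)
    big_m = 2.0
    lo = np.full((2, 5), -big_m)
    hi = np.full((2, 5), big_m)
    lo[1, 0] = hi[1, 0] = 0.0                          # child node: coordinate 0 fixed to zero
    v, u, obj = batched_admm_box_ridge(X, y, lo, hi, lam2=0.1)
    print(obj)
```

Passing a parent's `v` and `u` as `v0` and `u0` when solving its children is one way to realize the warm start: the child differs from the parent in a single coordinate's bounds, so the parent's iterates are usually already close to the child's solution.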

Experimental Results

Research Questions

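RQ1: Can a GPU-accelerated BnB framework certify optimality for high-dimensional sparse regression problems more efficiently than CPU methods?
RQ2: How should node relaxations and upper-bound computations be structured to make the most of GPU parallelism within and across nodes?
RQ3: Which practical design considerations (memory layout, batching, load balancing) affect GPU utilization and performance in this BnB framework?
RQ4: How does the proposed GPUBnB compare with existing BnB methods and commercial MIP solvers in runtime and scalability?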

Key Findings

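GPUBnB shows clear runtime improvements over a CPU implementation, an existing specialized BnB method, and commercial MIP solvers on both synthetic and real high-dimensional data.
On large instances (n = 10^4, p = 10^5), GPUBnB achieves about a 35× speedup over the CPU-based implementation (CPUBnB), and about a 6× improvement over L0BnB even without node parallelism.
With node parallelism enabled, the number of nodes solved per unit time increases by up to 25×, substantially shortening overall solve time.
Commercial solvers can exceed memory limits on very large instances, which underscores the practical advantage of the GPU-accelerated approach.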

This review was generated by AI and reviewed by human editors.