QUICK REVIEW

[论文解读] Open Problems in Applied Deep Learning

Maziar Raissi|arXiv (Cornell University)|Jan 26, 2023

Model Reduction and Neural Networks被引用 7

一句话总结

论文将 ML 开发循环框架化为双层优化问题，调查其在不同学习范式中的应用，并概述大量未解决的问题与自动化迭代步骤以降低计算量和碳足迹的方向。

ABSTRACT

This work formulates the machine learning mechanism as a bi-level optimization problem. The inner level optimization loop entails minimizing a properly chosen loss function evaluated on the training data. This is nothing but the well-studied training process in pursuit of optimal model parameters. The outer level optimization loop is less well-studied and involves maximizing a properly chosen performance metric evaluated on the validation data. This is what we call the "iteration process", pursuing optimal model hyper-parameters. Among many other degrees of freedom, this process entails model engineering (e.g., neural network architecture design) and management, experiment tracking, dataset versioning and augmentation. The iteration process could be automated via Automatic Machine Learning (AutoML) or left to the intuitions of machine learning students, engineers, and researchers. Regardless of the route we take, there is a need to reduce the computational cost of the iteration step and as a direct consequence reduce the carbon footprint of developing artificial intelligence algorithms. Despite the clean and unified mathematical formulation of the iteration step as a bi-level optimization problem, its solutions are case specific and complex. This work will consider such cases while increasing the level of complexity from supervised learning to semi-supervised, self-supervised, unsupervised, few-shot, federated, reinforcement, and physics-informed learning. As a consequence of this exercise, this proposal surfaces a plethora of open problems in the field, many of which can be addressed in parallel.

研究动机与目标

将机器学习管道表述为将内层模型训练与外层超参数优化分离的双层优化问题。
突出迭代步骤作为关键瓶颈并探索自动化或加速方法（AutoML、MLOps、参数共享）。
评估跨越多种学习范式（监督、半监督、无监督、少样本、联邦、强化、物理信息等）与架构（CNN、Transformer、视觉任务）的开放问题。
讨论多目标权衡、数据/增强策略以及模型开发的碳足迹。

提出的方法

提出一个统一的双层优化框架：min_alpha M_val(w*(alpha), alpha) subject to w*(alpha)=argmin_w L_train(w, alpha).
在 M_val 为离散、不可微或定义不清时，调查解法策略，包括网格/随机/贝叶斯搜索、强化学习、进化算法与参数共享。
通过参数共享实现记忆-计算权衡以加速内层优化；考虑 M_val 的可微代理以实现对 w 和 alpha 的联动优化。
讨论平衡性能与效率及部署约束的多目标外层优化和超超参数。
将数据增强与策略搜索考虑作为超参数空间的一部分。

Figure 1: The general mechanism for building machine learning solutions.

实验结果

研究问题

RQ1如何在不同学习范式中实现外层迭代步骤的自动化或加速？
RQ2可微代理的验证指标是否能实现高效的双层优化以进行超参数搜索？
RQ3在多目标双层设置中有哪些有效策略可在准确性与效率（FLOPs、延迟、内存）之间取得平衡？
RQ4数据增强策略与架构选择如何影响监督与非监督场景下的迭代过程？
RQ5将双层优化扩展到半监督、联邦、强化学习和物理信息学习时，会出现哪些尚未解决的问题？

主要发现

内层训练循环是在双层公式中的主要计算瓶颈。
记忆密集型的参数共享和热启动相比从头开始训练可以显著加速迭代。
可微代理的验证指标可以在不将内层问题求解完成的情况下实现基于梯度的双层优化。
需要为不可微分的指标（如准确率、mAP）开发可微近似以便实现类 AutoML 的过程。
从监督到物理信息学习的多个开放问题被识别，包括鲁棒性、可解释性、迁移性和多模态设置。

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。