QUICK REVIEW

[Paper Walkthrough] Unsolved Problems in ML Safety

Dan Hendrycks, Nicholas Carlini|arXiv (Cornell University)|Sep 28, 2021

Adversarial Robustness in Machine Learning | References: 179 | Cited by: 88

One-Sentence Summary

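This paper outlines four core ML safety problems (robustness, monitoring, alignment, and systemic safety) and provides concrete research directions for each.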

ABSTRACT

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.

Motivation and Objectives

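Motivate the need for proactive ML safety research to prevent costly failures.
Identify four key problem areas in ML safety: robustness, monitoring, alignment, and systemic safety.
Clarify the motivation for each area and provide concrete directions for starting or continuing research in it.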

Proposed Approach

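Define the four ML safety problem areas and explain the motivation for each.
Survey existing challenges and propose broad research directions for each area.
Propose benchmarks, architectures, and evaluation methods to advance safety.
Discuss the social, regulatory, and emergent-capability factors that affect alignment.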

Results

Research Questions

RQ1: What are the four main unsolved problems in ML safety, and why are they critical now?
RQ2: What concrete research directions can advance robustness?

Key Findings

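Proposes a four-problem safety roadmap: robustness, monitoring, alignment, and systemic safety.
Outlines actionable directions for each problem, including benchmarks, detectors, and evaluation methods.
Highlights emergent capabilities and hidden backdoors as central concerns for alignment and monitoring.
Discusses the role of proactive safety research in shaping regulation and reducing deployment risk.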

This walkthrough was generated by AI and reviewed by human editors.