QUICK REVIEW

[Paper Review] The Fourth International Verification of Neural Networks Competition (VNN-COMP 2023): Summary and Results

Christopher Brix, Stanley Bak|arXiv (Cornell University)|Dec 28, 2023

Adversarial Robustness in Machine Learning | Citations: 9

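One-Sentence Summary

This report summarizes VNN-COMP 2023, detailing the rules, benchmarks, participating tools, evaluation on AWS hardware, and key lessons learned from the competition.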

ABSTRACT

This report summarizes the 4th International Verification of Neural Networks Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with the 35th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2023 iteration, 7 teams participated on a diverse set of 10 scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.

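Motivation and Objectives

Promote fair and objective comparison of neural network verification tools.
Standardize the network and specification formats (ONNX and VNN-LIB) so that all tools receive the same inputs, and automate the evaluation.
Provide uniform hardware and pipeline settings so that results are reproducible.
Collect and analyze results to identify the strengths and limitations of current tools.
Share lessons learned for future iterations of VNN-COMP.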

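Proposed Method

Adopt ONNX for networks and VNN-LIB for specifications to standardize inputs.
Use AWS instances to provide equal-cost hardware for a fair comparison.
Implement an automated submission and testing pipeline for benchmarks and tools.
Define a timeout for each instance and aggregate benchmarks through normalized scoring.
Apply penalties to discourage incorrect results, and define how counterexamples are handled.
Present the results in a public repository and at FoMLAS/CAV.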
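
To make the standardized specification format more concrete, the sketch below builds the text of a VNN-LIB file for a local robustness query. It is an illustration only, not code from the competition: the helper name `linf_robustness_vnnlib` and its parameters are hypothetical, and it assumes the common convention in which inputs are named `X_i`, outputs are named `Y_j`, and the file encodes the negation of the desired property, so that a satisfying assignment is a counterexample.

```python
def linf_robustness_vnnlib(center, epsilon, num_outputs, target):
    """Return VNN-LIB text for an L-infinity robustness query (hypothetical helper)."""
    lines = []
    # Declare one real-valued variable per network input and output.
    for i in range(len(center)):
        lines.append(f"(declare-const X_{i} Real)")
    for j in range(num_outputs):
        lines.append(f"(declare-const Y_{j} Real)")
    # Input constraints: each input lies within epsilon of the center point.
    for i, c in enumerate(center):
        lines.append(f"(assert (>= X_{i} {c - epsilon:.6f}))")
        lines.append(f"(assert (<= X_{i} {c + epsilon:.6f}))")
    # Output constraints (negated property): some other class scores
    # at least as high as the target class.
    disjuncts = [
        f"(and (>= Y_{j} Y_{target}))"
        for j in range(num_outputs)
        if j != target
    ]
    lines.append("(assert (or " + " ".join(disjuncts) + "))")
    return "\n".join(lines)


# Two inputs, three output classes, target class 0.
spec = linf_robustness_vnnlib([0.5, 0.25], 0.125, 3, 0)
print(spec)
```

Under this convention, a verifier that proves the constraints unsatisfiable has shown that the network is robust on the given input region, while a satisfying assignment serves as a concrete counterexample.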

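Experimental Results

Research Questions

RQ1: How do state-of-the-art neural network verification tools compare under standardized formats and hardware constraints?
RQ2: What are the practical strengths and limitations of current tools across diverse verification benchmarks?
RQ3: What lessons and improvements regarding rules, benchmarks, and tools emerge for future iterations of VNN-COMP?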

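Key Findings

Seven teams participated on 10 scored and 4 unscored benchmarks.
The evaluation was run on equal-cost AWS hardware using an automated pipeline.
The rules enforced standard formats and a uniform evaluation interface.
Incorrect results were penalized to discourage unsound answers.
Results were aggregated with every benchmark weighted equally, regardless of its number of instances.
The report discusses lessons learned and possible future improvements.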
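
The penalty and equal-weighting findings above can be sketched as a small scoring routine. This is a minimal illustration under stated assumptions, not the competition's actual scoring script: the point values (`CORRECT_POINTS`, `INCORRECT_PENALTY`), the clamping of negative benchmark totals to zero, and all function names are assumptions introduced here, since the review does not state them.

```python
CORRECT_POINTS = 10       # assumed reward for a correct verdict
INCORRECT_PENALTY = -150  # assumed penalty for an incorrect verdict


def raw_points(outcomes):
    """Sum points for one tool on one benchmark.

    outcomes: list of 'correct', 'incorrect', or 'unknown' (timeout or error).
    """
    points = {
        "correct": CORRECT_POINTS,
        "incorrect": INCORRECT_PENALTY,
        "unknown": 0,
    }
    return sum(points[o] for o in outcomes)


def total_scores(results):
    """results: {benchmark: {tool: [outcome, ...]}} -> {tool: total score}."""
    totals = {}
    for per_tool in results.values():
        raw = {tool: raw_points(outs) for tool, outs in per_tool.items()}
        best = max(raw.values())
        for tool, pts in raw.items():
            # Normalize to a percentage of the best tool on this benchmark,
            # so each benchmark contributes at most 100 regardless of size.
            # Clamping negative totals to zero is an assumption of this sketch.
            share = 100.0 * max(pts, 0) / best if best > 0 else 0.0
            totals[tool] = totals.get(tool, 0.0) + share
    return totals


results = {
    "small": {"A": ["correct", "correct"], "B": ["correct", "unknown"]},
    "large": {"A": ["correct"] * 5 + ["unknown"] * 5, "B": ["correct"] * 10},
}
scores = total_scores(results)
print(scores)  # {'A': 150.0, 'B': 150.0}
```

In this example, tool A wins the 2-instance benchmark and tool B wins the 10-instance benchmark, and both finish with the same total. This is the effect of weighting benchmarks equally rather than by instance count.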

Figure 1: Accuracy Efficient Architecture for GTSRB and Belgium dataset

This review was generated by AI and checked by a human editor.