QUICK REVIEW

[Paper Review] The Fourth International Verification of Neural Networks Competition (VNN-COMP 2023): Summary and Results

Christopher Brix, Stanley Bak | arXiv (Cornell University) | 2023-12-28

Adversarial Robustness in Machine Learning | Citations: 9

One-Line Summary

This report summarizes VNN-COMP 2023, covering its rules, benchmarks, participating tools, evaluation on AWS hardware, and the main lessons learned from the competition.

ABSTRACT

This report summarizes the 4th International Verification of Neural Networks Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with the 35th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2023 iteration, 7 teams participated on a diverse set of 10 scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.

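Research Motivation and Goals

Facilitate the fair and objective comparison of neural network verification tools.
Standardize the network and specification formats (ONNX and VNN-LIB) and automate the evaluation.
Provide a consistent hardware and pipeline configuration that makes results reproducible.
Collect and analyze results to identify the strengths and limitations of current tools.
Share lessons learned for future iterations of VNN-COMP.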

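Proposed Method

Adopt ONNX for networks and VNN-LIB for specifications to standardize inputs.
Use AWS instances to provide equal-cost hardware for a fair comparison.
Implement an automated submission and testing pipeline for benchmarks and tools.
Define per-instance timeouts and aggregate benchmarks with a standardized score.
Penalize incorrect results to discourage wrong answers, and clarify how counterexamples are handled.
Present the results through a public repository and the FoMLAS/CAV venues.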
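
To make the role of the VNN-LIB specification format more concrete, the following is a minimal sketch of how a local robustness property could be written out as VNN-LIB-style text. The function name `make_vnnlib_spec` and its parameters are hypothetical and not taken from the report. The sketch assumes the convention commonly seen in competition benchmarks: inputs are named `X_i`, outputs `Y_j`, and the file encodes the violation condition, so a satisfying assignment is a counterexample. Negative bounds are written as plain literals here, which is also an assumption.

```python
from typing import List, Tuple


def make_vnnlib_spec(
    input_bounds: List[Tuple[float, float]],
    num_outputs: int,
    target_class: int,
) -> str:
    """Build a VNN-LIB-style property as text (illustrative helper).

    The property describes a violation of robustness: it is satisfiable
    when some input inside the box makes another output at least as
    large as the output of the target class.
    """
    if num_outputs < 2:
        raise ValueError("need at least two outputs to compare classes")
    if not 0 <= target_class < num_outputs:
        raise ValueError("target_class out of range")

    lines = []
    # Declare one real-valued constant per network input and output.
    for i in range(len(input_bounds)):
        lines.append(f"(declare-const X_{i} Real)")
    for j in range(num_outputs):
        lines.append(f"(declare-const Y_{j} Real)")

    # Box constraints on the inputs.
    for i, (lo, hi) in enumerate(input_bounds):
        lines.append(f"(assert (>= X_{i} {lo}))")
        lines.append(f"(assert (<= X_{i} {hi}))")

    # Violation condition: some other class scores at least as high.
    others = [j for j in range(num_outputs) if j != target_class]
    clauses = " ".join(f"(and (>= Y_{j} Y_{target_class}))" for j in others)
    lines.append(f"(assert (or {clauses}))")
    return "\n".join(lines)


spec = make_vnnlib_spec([(0.0, 1.0), (0.25, 0.75)], num_outputs=3, target_class=1)
print(spec)
```

A verifier would read such a file together with the ONNX network and report whether the violation condition can be satisfied.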

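Experimental Results

Research Questions

RQ1: How do state-of-the-art neural network verification tools compare under standardized formats and hardware constraints?
RQ2: What are the practical strengths and limitations of current tools across diverse verification benchmarks?
RQ3: What lessons and improvements emerge for future iterations of VNN-COMP regarding rules, benchmarks, and tools?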

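Key Results

Seven teams participated across 10 scored and 4 unscored benchmarks.
Tools were evaluated on equal-cost AWS hardware using an automated pipeline.
The rules enforced standardized formats and a consistent evaluation interface.
Incorrect results were penalized to discourage wrong answers.
Result aggregation weighted each benchmark equally, regardless of its number of instances.
The report discusses lessons learned and possible future improvements.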
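
The penalty and equal-weighting rules above can be illustrated with a small scoring sketch. This is not the competition's official scoring script. The point values `POINTS_CORRECT` and `POINTS_INCORRECT`, the outcome labels, and the normalization (each tool's points as a percentage of the best tool's points on that benchmark) are assumptions chosen to show one way in which benchmarks can count equally regardless of instance count.

```python
from typing import Dict, List

# Hypothetical point values, used only for illustration.
POINTS_CORRECT = 10
POINTS_INCORRECT = -150


def benchmark_points(outcomes: List[str]) -> int:
    """Sum points over instances; 'unknown' (e.g. a timeout) earns nothing."""
    total = 0
    for outcome in outcomes:
        if outcome == "correct":
            total += POINTS_CORRECT
        elif outcome == "incorrect":
            total += POINTS_INCORRECT
    return total


def aggregate(results: Dict[str, Dict[str, List[str]]]) -> Dict[str, float]:
    """results[benchmark][tool] is a list of per-instance outcomes.

    Each benchmark contributes at most 100 to a tool's total, so a
    benchmark with many instances does not outweigh a small one.
    """
    totals: Dict[str, float] = {}
    for per_tool in results.values():
        points = {tool: benchmark_points(o) for tool, o in per_tool.items()}
        best = max(points.values())
        for tool, p in points.items():
            # Assumed normalization: percentage of the best tool's points.
            share = 100.0 * p / best if best > 0 else 0.0
            totals[tool] = totals.get(tool, 0.0) + share
    return totals


results = {
    "small": {"A": ["correct", "correct"], "B": ["correct", "unknown"]},
    "large": {"A": ["correct"] * 5 + ["unknown"] * 5, "B": ["correct"] * 10},
}
print(aggregate(results))  # {'A': 150.0, 'B': 150.0}
```

In this example tool A wins the 2-instance benchmark and tool B wins the 10-instance one, and both end with the same total, which is the effect of weighting benchmarks equally.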

Figure 1: Accuracy Efficient Architecture for GTSRB and Belgium dataset

This review was generated by AI and reviewed by a human editor.