QUICK REVIEW

[Paper Review] Blockchain Large Language Models

Yu Gai, Liyi Zhou | arXiv (Cornell University) | 2023-04-25

Blockchain Technology Applications and Security | Citations: 15

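One-Line Summary

BlockGPT trains a large language model on transaction execution traces, without predefined rules, to detect anomalous Ethereum transactions in real time. It reaches a batched throughput of about 2,284 transactions per second and places 49 of 124 known attacks among the top-3 most abnormal transactions for their victim contracts.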

ABSTRACT

This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of BlockGPT through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that, BlockGPT identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work makes contributions to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.

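Motivation and Objectives

Motivate the need for dynamic, scalable anomaly detection in blockchain and DeFi transactions.
Propose a self-supervised approach that models transaction execution traces without predefined vulnerability patterns.
Develop a domain-specific encoding and tokenization pipeline that is compatible with the transformer architecture.
Demonstrate BlockGPT as an anomaly-ranking tool on a large-scale Ethereum dataset and evaluate its real-time performance.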

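Proposed Method

Build a new intermediate trace representation (ITR) as a tree that combines call, state, and log traces.
Tokenize ITR nodes into domain-specific tokens drawn from a fixed vocabulary.
Compute each local token embedding as the sum of a token embedding, a tree-position embedding, and a context embedding.
Apply a transformer encoder with tree-aware positional encoding to learn trace embeddings.
Train BlockGPT in an unsupervised (self-supervised) manner with a causal language modeling loss.
Rank transactions by the log-likelihood of their traces and raise an alert for the most abnormal ones.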
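
The final ranking step can be illustrated with a minimal Python sketch. It is not the authors' implementation: a smoothed bigram model stands in for the transformer, and the token names, the `<bos>` marker, and all function names are hypothetical. The only idea taken from the method is that a trace is scored by the sum of the log-probabilities of its tokens given the preceding context, and that the lowest-scoring traces are treated as the most abnormal.

```python
import math
from collections import Counter

BOS = "<bos>"  # hypothetical start-of-trace marker

def train_bigram(traces):
    """Count token bigrams over training traces (stand-in for LM training)."""
    pair_counts, prev_counts, vocab = Counter(), Counter(), {BOS}
    for trace in traces:
        prev = BOS
        for tok in trace:
            pair_counts[(prev, tok)] += 1
            prev_counts[prev] += 1
            vocab.add(tok)
            prev = tok
    return pair_counts, prev_counts, vocab

def log_likelihood(trace, model):
    """Sum of log p(token | previous token) with add-one smoothing."""
    pair_counts, prev_counts, vocab = model
    v = len(vocab) + 1  # one extra slot reserves mass for unseen tokens
    total, prev = 0.0, BOS
    for tok in trace:
        p = (pair_counts[(prev, tok)] + 1) / (prev_counts[prev] + v)
        total += math.log(p)
        prev = tok
    return total

def rank_most_abnormal(traces_by_tx, model):
    """Return transaction ids sorted from least likely to most likely."""
    scores = {tx: log_likelihood(t, model) for tx, t in traces_by_tx.items()}
    return sorted(scores, key=scores.get)

# Toy data: hypothetical tokenized traces, not real EVM output.
normal = ["CALL", "transfer", "SLOAD", "SSTORE", "LOG", "Transfer"]
model = train_bigram([normal] * 50)
candidates = {
    "tx_normal": normal,
    "tx_odd": ["CALL", "flashLoan", "CALL", "swap", "SSTORE", "SSTORE"],
}
ranking = rank_most_abnormal(candidates, model)
print(ranking[0])  # the id with the lowest log-likelihood comes first
```

The sketch sums log-probabilities without length normalization, so longer traces tend to score lower; whether and how to normalize is a design choice this review does not cover.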

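Experimental Results

Research Questions

RQ1: Can unsupervised (self-supervised) learning on Ethereum transaction traces detect anomalous or malicious behavior without predefined vulnerability patterns?
RQ2: How effective is a transformer-based model trained on the ITR representation at ranking abnormal transactions in real time?
RQ3: What throughput and false positive characteristics can BlockGPT achieve in high-volume DeFi settings?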

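Key Results

BlockGPT achieved an average batched throughput of 2,284 ± 289 transactions per second.
BlockGPT ranked 49 of 124 attacks (about 39.5%) among the top-3 most abnormal transactions interacting with their victim contracts.
Across the attacks, it ranked 20 attack transactions as the most abnormal, 20 as the second most abnormal, and 7 as the third most abnormal.
In high-volume DeFi settings, BlockGPT kept an absolute false positive rate of 0.097% at a 0.01% alert threshold. At a 0.1% false positive rate, a workload of roughly 100 transactions per day would raise about one false alarm every 10 days (100 × 0.001 = 0.1 alarms per day).
BlockGPT takes 0.16 ± 0.3 seconds on average to rank a single transaction, which supports real-time anomaly detection.
The work introduces a custom data encoding, a domain-specific tokenization, and a tree encoding method tailored to EVM traces, and reports strong unsupervised anomaly detection performance with them.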
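
The alerting figures above come down to simple arithmetic, sketched here in Python. The function names and the toy scores are hypothetical, and reading the false-alarm figure as a workload of about 100 transactions per day is an assumption. The sketch ranks a transaction among all transactions that interact with the same contract, alerts on a top-k or percentage rank, and estimates how often false alarms would occur at a given false positive rate.

```python
def rank_in_contract(scores, tx_id):
    """1-based rank of tx_id, where the lowest score is the most abnormal."""
    ordered = sorted(scores, key=scores.get)
    return ordered.index(tx_id) + 1

def should_alert(scores, tx_id, top_k=3, pct_threshold=None):
    """Alert on a top-k rank, or on a percentage rank if a threshold is given."""
    rank = rank_in_contract(scores, tx_id)
    if pct_threshold is not None:
        return 100.0 * rank / len(scores) <= pct_threshold
    return rank <= top_k

def days_between_false_alarms(fpr, tx_per_day):
    """Expected days between false alarms for a given false positive rate."""
    return 1.0 / (fpr * tx_per_day)

# Hypothetical log-likelihood scores for one contract's transactions.
scores = {"a": -1.0, "b": -42.0, "c": -3.5, "d": -0.8}
alert_b = should_alert(scores, "b")  # "b" has the lowest score
alert_d = should_alert(scores, "d")  # "d" has the highest score

# Share of the 124 attacks that landed in the top 3 (figure from the review).
top3_share = 49 / 124

# Assumed workload: about 100 transactions per day at a 0.1% FPR.
interval_days = days_between_false_alarms(fpr=0.001, tx_per_day=100)
print(alert_b, alert_d, round(top3_share, 3), round(interval_days, 1))
```

A fixed top-k rule suits contracts with few transactions, while a percentage threshold scales the alert budget with transaction volume; the sketch supports both through `top_k` and `pct_threshold`.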

This review was generated by AI and reviewed by a human editor.