QUICK REVIEW

[Paper Review] Neuro-Symbolic Generation and Validation of Memory-Aware Formal Function Specifications

Liao Zhang, Tong Chen | arXiv (Cornell University) | Mar 12, 2026

Formal Methods in Verification | Citations: 0

One-Line Summary

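This paper proposes a neuro-symbolic workflow that generates memory-aware formal specifications for C functions from natural-language descriptions and function signatures, refines them iteratively using symbolic provers and a Coq-based counterexample-guided refutation framework, and evaluates the approach on a LeetCode-derived benchmark.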

ABSTRACT

Formal verification of memory-manipulating programs critically depends on precise function specifications that capture memory states written by experts. This requirement has become a major bottleneck as large language models (LLMs) increasingly generate low-level systems code whose correctness cannot be assumed. To enable scalable formal verification, we focus exclusively on function specification generation, deliberately avoiding the synthesis of complex loop invariants that are central to traditional verification pipelines. We propose a neuro-symbolic framework for automatically generating memory-aware formal function specifications for C programs from natural language problem descriptions and function signatures. The pipeline first produces candidate specifications via in-context learning, and then iteratively refines them using compiler diagnostics from symbolic provers and the verification toolchain. In particular, we validate candidate specifications by constructing a proof for the negation of the specification with concrete examples, enabling machine-checked rejection of plausible-but-incorrect specifications. To support systematic evaluation, we introduce LeetCode-C-Spec, a new benchmark of 200 C programming problems for generating memory-aware formal function specifications. Experiments show that iterative refinement substantially improves syntactic validity, while symbolic prover-based refutation significantly enhances correctness assessment by filtering false positives that LLM-only judges frequently accept. Our results demonstrate that combining neural generation with symbolic feedback provides an effective approach to formal specification synthesis for memory-safe systems software.

Motivation and Objectives

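Relieve the formal-verification bottleneck caused by having to write memory-aware function specifications for C programs.
Develop a two-phase neuro-symbolic workflow that generates and refines memory-aware function specifications from natural language and function signatures.
Enable automatic validation of specifications through counterexample-guided refutation using Coq and symbolic provers.
Build and release LeetCode-C-Spec-200, a 200-problem benchmark for memory-aware specification generation.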

Proposed Method

Two-phase pipeline: Phase 1 uses in-context learning with curated prompts to generate QCP-compatible memory-aware C function specifications.
Phase 2 iteratively refines specifications using compiler diagnostics from Coq and QCP, and counterexample-guided refutation to fix syntactic and semantic errors.
Validation combines LLM semantic checks with formal methods: generate negated postconditions, instantiate concrete Coq test cases, and attempt to discharge them via CoqHammer with automated theorem provers.
The counterexample generation workflow converts natural-language examples into typed Coq representations and then into refutation test cases that formally refute incorrect specifications.
Dataset and transformation: LeetCode-C-Spec-200 (200 LeetCode problems) is constructed for memory-aware specification generation, with mappings to QCP predicates and Coq imports.
The approach emphasizes function-level specifications over loop invariants to enable test-based validation and layered verification.
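
The Phase 2 loop can be pictured as a generate, check, retry cycle. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: `refine`, `CheckResult`, `toy_generate`, and `toy_check` are hypothetical names, the toy functions stand in for the LLM and for the Coq/QCP toolchain, and the candidate strings are placeholder text rather than QCP syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    diagnostics: str  # error text reported by the (hypothetical) checker


def refine(
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], CheckResult],
    task: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first candidate that passes `check`, or None if none does."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        result = check(candidate)
        if result.ok:
            return candidate
        # Diagnostics from the failed check are fed into the next attempt.
        feedback = result.diagnostics
    return None


# Toy stand-ins so the sketch runs without an LLM or a prover (assumptions).
def toy_generate(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "post: result == a + b"  # first draft lacks a precondition
    return "pre: true; post: result == a + b"


def toy_check(candidate: str) -> CheckResult:
    if "pre:" not in candidate:
        return CheckResult(False, "missing precondition clause")
    return CheckResult(True, "")


final_spec = refine(toy_generate, toy_check, "add two integers")
```

In this toy run the first draft is rejected, its diagnostic is passed back, and the second draft is accepted. Bounding the loop with `max_rounds` is a design choice of the sketch; the review does not state the paper's iteration limit.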

Experimental Results

Research Questions

RQ1: Can memory-aware functional specifications for C be automatically generated from natural language problem descriptions and function signatures?
RQ2: Does iterative refinement with symbolic feedback improve syntactic validity of specifications generated by LLMs?
RQ3: Does Coq-based counterexample-guided refutation effectively filter false positives that LLM-only judgments accept?
RQ4: Is LeetCode-C-Spec-200 a suitable benchmark for memory-aware formal specification generation in C?
RQ5: What is the impact of combining neural generation with symbolic validation on correctness and verification readiness?

Key Findings

Category	Count	Percentage (%)
Easy	62	31.0
Medium	97	48.5
Hard	41	20.5
Integer/Number	147	73.5
Array	75	37.5
String	53	26.5
Tree	41	20.5
Linked List	14	7.0
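
The percentages in the table can be reproduced from the counts and the benchmark size of 200. The short Python check below is a sketch; the grouping names `difficulty` and `data_type` are labels chosen here, not taken from the source.

```python
TOTAL = 200  # benchmark size stated in the review

# Counts copied from the table; the two group names are assumptions.
difficulty = {"Easy": 62, "Medium": 97, "Hard": 41}
data_type = {
    "Integer/Number": 147,
    "Array": 75,
    "String": 53,
    "Tree": 41,
    "Linked List": 14,
}


def pct(count: int, total: int = TOTAL) -> float:
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


difficulty_pct = {name: pct(n) for name, n in difficulty.items()}
data_type_pct = {name: pct(n) for name, n in data_type.items()}

difficulty_total = sum(difficulty.values())
data_type_total = sum(data_type.values())
```

The difficulty counts add up to exactly 200, while the data-type counts add up to 330. This suggests that a single problem can carry more than one data-type tag, so those rows should not be read as a partition of the benchmark.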

Iterative refinement substantially improves syntactic validity of LLM-generated specifications.
Symbolic prover-based refutation via Coq significantly enhances correctness assessment by filtering false positives.
The integrated neuro-symbolic workflow improves memory-aware function specification generation for C programs.
LeetCode-C-Spec-200 provides a challenging benchmark with diverse data structures and algorithmic complexity for evaluation.
CoqHammer and automated theorem provers help validate postconditions and detect specification errors.
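
To illustrate how refutation with concrete examples can reject a plausible-but-incorrect specification, here is a small Python sketch. It is an analogy only: the paper proves the negated specification in Coq, whereas this sketch simply evaluates a Boolean predicate. The task (maximum of two integers), the examples, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[Tuple[int, int], int]        # (inputs, expected output)
Post = Callable[[Tuple[int, int], int], bool]  # candidate postcondition


def refute(post: Post, examples: List[Example]) -> Optional[Example]:
    """Return the first known-good example that violates `post`, or None."""
    for inputs, expected in examples:
        if not post(inputs, expected):
            return (inputs, expected)
    return None


# Known-good input/output pairs, as a problem statement might list them.
examples: List[Example] = [((3, 5), 5), ((7, 2), 7), ((4, 4), 4)]


def wrong_post(inputs: Tuple[int, int], result: int) -> bool:
    # Plausible but incorrect: assumes the second argument is the maximum.
    _, b = inputs
    return result == b


def right_post(inputs: Tuple[int, int], result: int) -> bool:
    # The result bounds both inputs and is one of them.
    a, b = inputs
    return result >= a and result >= b and result in (a, b)


witness = refute(wrong_post, examples)
```

Here `wrong_post` agrees with the first example but is rejected by `((7, 2), 7)`. Passing every example does not establish that a specification is correct; it only means these examples did not refute it.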

This review was generated by AI and checked by a human editor.