[Paper Review] Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision
SOSecure uses retrieval of Stack Overflow security discussions after code generation to guide an LLM in revising potentially unsafe code, improving security without retraining the model.
Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development, yet their limited transparency in security reasoning and brittleness to evolving vulnerability patterns raise critical trustworthiness concerns. Models trained on static datasets cannot readily adapt to newly discovered vulnerabilities or changing security standards without retraining, leading to the repeated generation of unsafe code. We present a principled approach to trustworthy code generation by design that operates as an inference-time safety mechanism. Our approach employs retrieval-augmented generation to surface relevant security risks in generated code and retrieve related security discussions from a curated Stack Overflow knowledge base, which are then used to guide an LLM during code revision. This design emphasizes three aspects relevant to trustworthiness: (1) interpretability, through transparent safety interventions grounded in expert community explanations; (2) robustness, by allowing adaptation to evolving security practices without model retraining; and (3) safety alignment, through real-time intervention before unsafe code reaches deployment. Across real-world and benchmark datasets, our approach improves the security of LLM-generated code compared to prompting alone, while introducing no new vulnerabilities as measured by static analysis. These results suggest that principled, retrieval-augmented inference-time interventions can serve as a complementary mechanism for improving the safety of LLM-based code generation, and highlight the ongoing value of community knowledge in supporting trustworthy AI deployment.
Research Motivation and Goals
- Enable trustworthy code generation through an inference-time safety mechanism that requires no retraining.
- Leverage community knowledge from Stack Overflow to surface security concerns related to code patterns.
- Evaluate whether post-generation retrieval improves security outcomes while avoiding new vulnerabilities.
- Demonstrate interpretability, robustness, and safety alignment through a retrieval-based revision mechanism.
Proposed Method
- Construct a security-focused Stack Overflow knowledge base filtered for explicit security discussions.
- Use BM25 to retrieve top-k (k=5) relevant security discussions based on lexical similarity to the generated code.
- Provide retrieved discussions as advisory context in a revision prompt without injecting code or enforcing changes.
- Prompt the LLM to decide if and how to revise the code, allowing no-change as a valid option.
- Evaluate using static analysis tools (CodeQL and Bandit) across multiple datasets to measure fix and introduction rates.
- Demonstrate that retrieval-based revision is model-agnostic and does not require retraining.
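
The retrieve-then-revise steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tokenizer, the BM25 parameters `k1` and `b`, the prompt wording, and the three knowledge-base entries are all assumptions invented for the example. Only the use of BM25, `k=5`, and the advisory framing (no change is a valid outcome) come from the summary above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_top_k(query: str, docs: list[str], k: int = 5, k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return indices of up to k documents with the highest positive BM25 score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))  # each distinct query term counted once

    scored = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((score, idx))

    # Highest score first; ties broken by original position.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in ranked[:k] if score > 0]


def build_revision_prompt(code: str, discussions: list[str]) -> str:
    """Assemble an advisory revision prompt (hypothetical wording)."""
    context = "\n\n".join(f"[Discussion {i + 1}]\n{d}" for i, d in enumerate(discussions))
    return (
        "The community discussions below may be relevant to the code that follows.\n"
        "Treat them as advisory context only.\n\n"
        f"{context}\n\n"
        f"Code:\n{code}\n\n"
        "Revise the code only if you identify a security issue; "
        "otherwise return it unchanged."
    )


# Made-up placeholder entries standing in for the curated knowledge base.
knowledge_base = [
    "Using yaml.load on untrusted input can execute arbitrary code; prefer yaml.safe_load.",
    "subprocess with shell=True and user input allows command injection.",
    "How do I center a div with CSS flexbox?",
]
generated_code = "import yaml\nconfig = yaml.load(open('cfg.yml'))"

hits = bm25_top_k(generated_code, knowledge_base, k=5)
prompt = build_revision_prompt(generated_code, [knowledge_base[i] for i in hits])
print(prompt)
```

The resulting prompt would then be sent to whichever LLM performs the revision; because the retrieved text is only context, the model remains free to return the code unchanged.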

Experimental Results
Research Questions
- RQ1: Does retrieval-augmented revision using Stack Overflow discussions improve the security of LLM-generated code compared to prompting alone?
- RQ2: Can inference-time guidance from community explanations reduce vulnerabilities without introducing new ones across multiple datasets and languages?
- RQ3: Is the approach robust to evolving security practices without retraining the model?
Key Results
| Dataset | Prompt-only | GPT-4+CWE | SOSecure | Δ Fix Rate | Intro Rate (SOSecure) |
|---|---|---|---|---|---|
| SALLM | 49.1% | 58.5% | 71.7% | +22.6 pp | 0.0% |
| LLMSecEval | 56.5% | 69.6% | 91.3% | +34.8 pp | 0.0% |
| LMSys | 37.5% | 45.8% | 96.7% | +59.2 pp | 0.0% |
- SOSecure improves fix rates substantially over prompt-only generation across all datasets (e.g., SALLM: +22.6 percentage points; LMSys: +59.2 points).
- SOSecure generally outperforms the vulnerability-label baseline GPT-4+CWE, indicating that explanations add value beyond labels.
- Across datasets, SOSecure introduces no new vulnerabilities as measured by static analysis.
- An ablation shows that retrieving community discussions, rather than self-revision alone, drives the large security gains.
- On C code, SOSecure achieves a Fix Rate of 73.3% vs 53.3% (Prompt-only) and 60.0% (GPT-4+CWE), with no new vulnerabilities.
- The approach works across languages, and the results suggest robustness to evolving security practices without retraining.
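
As a rough illustration of how the reported metrics relate to static-analysis output, the sketch below computes a fix rate and an introduction rate from per-sample findings and re-derives the Δ Fix Rate column from the table. The metric definitions are assumptions, since the summary does not spell them out: a sample counts as fixed when none of its original findings remain after revision, and as an introduction when the revision has a finding the original lacked. The finding sets in the usage example are made up.

```python
def fix_and_intro_rates(before: list[set], after: list[set]) -> tuple[float, float]:
    """Compute (fix_rate, intro_rate) from per-sample sets of finding IDs.

    Assumed definitions (not taken from the paper):
      fix_rate   = fixed samples / samples that were flagged before revision
      intro_rate = samples with any new finding / all samples
    """
    pairs = list(zip(before, after))
    flagged = [(b, a) for b, a in pairs if b]
    fixed = sum(1 for b, a in flagged if not (b & a))   # no original finding survives
    introduced = sum(1 for b, a in pairs if a - b)      # a finding absent before revision
    fix_rate = fixed / len(flagged) if flagged else 0.0
    intro_rate = introduced / len(pairs) if pairs else 0.0
    return fix_rate, intro_rate


# Made-up findings for four samples, before and after revision.
before = [{"CWE-78"}, {"CWE-502"}, set(), {"CWE-89"}]
after = [set(), {"CWE-502"}, set(), set()]
fix_rate, intro_rate = fix_and_intro_rates(before, after)

# Fix rates (%) from the table: (Prompt-only, SOSecure).
reported = {"SALLM": (49.1, 71.7), "LLMSecEval": (56.5, 91.3), "LMSys": (37.5, 96.7)}
# Δ Fix Rate is a difference in percentage points, not a relative change.
deltas = {name: round(sosecure - prompt_only, 1) for name, (prompt_only, sosecure) in reported.items()}
print(deltas)
```

The re-derived differences match the Δ Fix Rate column, which confirms that the column measures SOSecure against the Prompt-only baseline in percentage points.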

This review was generated by AI and reviewed by a human editor.