QUICK REVIEW

[Paper Review] Automatically Mitigating Vulnerabilities in Binary Programs via Partially Recompilable Decompilation

Pemma Reiter, Hui Jun Tay | arXiv (Cornell University) | Feb 24, 2022

Advanced Malware Detection Techniques | 2 citations

TL;DR

This paper proposes Partially Recompilable Decompilation (PRD), which decompiles only the suspect functions of a binary program into recompilable source code so that source-level repair tools can patch them automatically. PRD decompiles and recompiles 70–89% of individual functions, versus 1.7% for full-binary decompilation, and when decompilation succeeds the resulting binaries are test-equivalent 92.9% of the time. With PRD, APR tools patch CGC binaries about as well as they do when given full source.

ABSTRACT

Decompilation is the process of translating compiled code into high-level code. Control flow recovery is a challenging part of the process. "Misdecompilations" can occur, whereby the decompiled code does not accurately represent the semantics of the compiled code, despite it being syntactically valid. This is problematic because it can mislead users who are trying to reason about the program. We present CFG-based program generation: a novel approach to randomised testing that aims to improve the control flow recovery of decompilers. CFG-based program generation involves randomly generating control flow graphs (CFGs) and paths through each graph. Inspired by prior work in the domain of GPU computing, (CFG, path) pairs are "fleshed" into test programs. Each program is decompiled and recompiled. The test oracle verifies whether the actual runtime path through the graph matches the expected path. Any difference in the execution paths after recompilation indicates a possible misdecompilation. A key benefit of this approach is that it is largely independent of the source and target languages in question because it is focused on control flow. The approach is therefore applicable to numerous decompilation settings. The trade-off resulting from the focus on control flow is that misdecompilation bugs that do not relate to control flow (e.g. bugs that involve specific arithmetic operations) are out of scope. We have implemented this approach in FuzzFlesh, an open-source randomised testing tool. FuzzFlesh can be easily configured to target a variety of low-level languages and decompiler toolchains because most of the CFG and path generation process is language-independent. At present, FuzzFlesh supports testing decompilation of Java bytecode, .NET assembly and x86 machine code. 
In addition to program generation, FuzzFlesh also includes an automated test-case reducer that operates on the CFG rather than the low-level program, which means that it can be applied to any of the target languages. We present a large experimental campaign applying FuzzFlesh to a variety of decompilers, leading to the discovery of 12 previously-unknown bugs across two language formats, six of which have been fixed. We present experiments comparing our generic FuzzFlesh tool to two state-of-the-art decompiler testing tools targeted at specific languages. As expected, the coverage our generic FuzzFlesh tool achieves on a given decompiler is lower than the coverage achieved by a tool specifically designed for the input format of that decompiler. However, due to its focus on control flow, FuzzFlesh is able to cover sections of control flow recovery code that the targeted tools cannot reach, and identify control flow related bugs that the targeted tools miss.

Motivation & Objective

To address the challenge of patching software vulnerabilities when source code is unavailable, especially in post-deployment binaries.
To overcome the limitations of full-binary decompilation, which scales poorly and rarely produces source that recompiles.
To enable the use of high-fidelity, source-level Automated Program Repair (APR) tools on binary programs through partial, recompilable decompilation.
To validate that decompiled source from only a few functions can support effective, test-equivalent binary patching.
To demonstrate that PRD enables APR tools to achieve performance comparable to full-source tools on binary-only inputs.

Proposed method

Use coarse-grained fault localization (CGFL) on the binary to identify a small set of functions likely to contain the vulnerability.
Apply a decompiler to lift only the suspect functions into high-level, recompilable C/C++ source code, focusing on type recovery and function boundaries.
Construct binary-source interfaces to enable integration between decompiled source and the original binary, preserving execution semantics.
Apply source-level APR tools (e.g., Prophet, GenProg) to generate patches on the decompiled source code.
Use binary rewriting and recompilation techniques to integrate the patched source back into the original binary, ensuring test-equivalence.
Rely on minimal type inference: only offsets and referenced types are recovered, which reduces reliance on complete, sound type inference.
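
The first step, narrowing a binary down to a few suspect functions, can be illustrated with a spectrum-based ranking. The sketch below is not the paper's CGFL implementation, whose details this review does not give. It swaps in the well-known Tarantula suspiciousness formula as a stand-in, and the function name `rank_suspect_functions`, the input layout, and the sample function names are all hypothetical.

```python
from collections import Counter


def rank_suspect_functions(coverage, outcomes, top_k=3):
    """Rank functions by a Tarantula-style suspiciousness score.

    coverage: dict mapping test name -> set of function names it executed
    outcomes: dict mapping test name -> True (passed) or False (failed)
    Returns the top_k (function, score) pairs, most suspicious first.
    NOTE: illustrative stand-in, not the paper's CGFL algorithm.
    """
    total_pass = sum(1 for ok in outcomes.values() if ok)
    total_fail = len(outcomes) - total_pass

    # Count, per function, how many passing and failing tests executed it.
    passed_hits, failed_hits = Counter(), Counter()
    for test, funcs in coverage.items():
        target = passed_hits if outcomes[test] else failed_hits
        target.update(funcs)

    scores = {}
    for func in set(passed_hits) | set(failed_hits):
        fail_ratio = failed_hits[func] / total_fail if total_fail else 0.0
        pass_ratio = passed_hits[func] / total_pass if total_pass else 0.0
        denom = fail_ratio + pass_ratio
        scores[func] = fail_ratio / denom if denom else 0.0

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical function-level coverage for three tests.
    coverage = {
        "t1": {"main", "parse_header"},
        "t2": {"main", "parse_header", "copy_payload"},
        "t3": {"main", "copy_payload"},
    }
    outcomes = {"t1": True, "t2": False, "t3": False}
    # copy_payload ranks first because only failing tests execute it.
    for name, score in rank_suspect_functions(coverage, outcomes):
        print(f"{name}: {score:.3f}")
```

Only the top-ranked functions would then be handed to the decompiler, which is what keeps the decompilation task small.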

Experimental results

Research questions

RQ1: Can decompilation of individual functions from binaries produce recompilable source code without grammar or compilation restrictions?
RQ2: To what extent can source-level APR tools be effectively applied to binary programs via partial decompilation?
RQ3: How well does PRD preserve behavioral equivalence between original and patched binaries?
RQ4: Can PRD enable APR tools to achieve performance on binaries comparable to their performance when operating on full source code?
RQ5: How generalizable is PRD to real-world binaries, different programming languages (C/C++), and diverse vulnerability types?

Key findings

PRD successfully decompiled and recompiled 70–89% of individual functions when sufficient type recovery was achieved, compared with a 1.7% success rate for decompiling full C binaries.
When decompilation succeeded, PRD produced test-equivalent binaries 92.9% of the time, confirming behavioral fidelity.
APR tools integrated with PRD mitigated 85 out of 148 vulnerabilities in DARPA CGC binaries, matching or exceeding the performance of full-source APR tools.
PRD-enabled APR tools sometimes produced higher-quality patches than those generated by top CGC teams, demonstrating competitive repair quality.
The approach generalizes across datasets, including CGC, Rode0Day, and MITRE CVE, and supports C++ and stripped binaries with decompiler support.
The method reduces reliance on complete type inference by recovering only offsets and referenced types, making it more scalable and practical.
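
The test-equivalence figure above can be read as "the recompiled binary behaves like the original on every available test". A minimal sketch of such a check follows, assuming equivalence means identical exit code and standard output for each test input. The paper's actual harness is not described in this review, and the names `run_once` and `find_divergences` are hypothetical.

```python
import subprocess


def run_once(cmd, stdin_bytes, timeout=5.0):
    """Run one command on one input and return (exit code, stdout bytes).

    Raises subprocess.TimeoutExpired if the program exceeds the timeout.
    """
    proc = subprocess.run(
        cmd,
        input=stdin_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def find_divergences(original_cmd, patched_cmd, test_inputs, timeout=5.0):
    """Return the indices of test inputs on which the two programs differ.

    An empty list means the programs are test-equivalent on these inputs
    under the assumption stated above (same exit code, same stdout).
    """
    divergent = []
    for index, data in enumerate(test_inputs):
        before = run_once(original_cmd, data, timeout)
        after = run_once(patched_cmd, data, timeout)
        if before != after:
            divergent.append(index)
    return divergent
```

A call such as `find_divergences(["./original"], ["./recompiled"], inputs)` (placeholder paths) would list the failing test indices. Test-equivalence is only as strong as the test suite: behavior on inputs outside the suite is not checked.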


This review was created by AI and reviewed by human editors.