QUICK REVIEW

[Paper Review] LogFold: Compressing Logs with Structured Tokens and Hybrid Encoding

Shiwen Shan, Yintong Huo|arXiv (Cornell University)|Mar 21, 2026
Software System Performance and Reliability | 0 citations
TL;DR

LogFold introduces a skeleton-aware structured-token analysis and a type-aware hybrid encoding pipeline to compress logs, outperforming state-of-the-art baselines across 16 public datasets.

ABSTRACT

Logs are essential for diagnosing failures and conducting retrospective studies, leading many software organizations to retain log messages for a long time. Nevertheless, the volume of generated log data grows rapidly as software systems scale, necessitating an effective compression method. Beyond general-purpose compressors (e.g., Gzip, Bzip2), many recent studies have developed log-specific compression algorithms, but these offer suboptimal performance because they (1) overlook redundancies within certain complex tokens and (2) lack a fine-grained encoding strategy for diverse token types. This work uncovers a new redundancy pattern in structured tokens and proposes a new type-aware encoding strategy to improve log compression. Building on this insight, we introduce LogFold, a novel log compression method consisting of four components: a token analyzer that classifies tokens as structured, unstructured, or static; a processor that mines recurring patterns within structured tokens based on their delimiter skeletons; a hybrid encoder that tailors data representation according to token types; and a packer that compresses the output into an archive file. Extensive experiments on 16 public log datasets demonstrate that LogFold surpasses state-of-the-art baselines, achieving an average compression ratio improvement of 11.11% with a compression speed of 9.842 MB/s. Ablation studies further indicate the importance of each component. We also conduct sensitivity analyses to verify LogFold's robustness and stability across various internal settings.
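The token analyzer's three-way split can be illustrated with a small sketch. It assumes log lines have already been grouped by template so that token positions line up, and it applies a hypothetical rule: a position whose value never changes is static, a varying position whose values all mix delimiters with alphanumerics is structured, and anything else is unstructured. The delimiter set and the names `is_structured` and `classify_columns` are illustrative assumptions, not LogFold's published algorithm.

```python
# Hypothetical delimiter set; LogFold's actual definition may differ.
DELIMITERS = set(".:-_/=,@")


def is_structured(token: str) -> bool:
    """Treat a token as structured if it mixes delimiters with alphanumerics,
    e.g. an IP:port pair or an HDFS-style block ID."""
    has_delim = any(ch in DELIMITERS for ch in token)
    has_alnum = any(ch.isalnum() for ch in token)
    return has_delim and has_alnum


def classify_columns(lines: list[str]) -> list[str]:
    """Label each token position of a template-aligned group of log lines as
    'static', 'structured', or 'unstructured' (illustrative heuristic)."""
    rows = [line.split() for line in lines]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("lines in a group must have the same token count")
    labels = []
    for col in range(width):
        values = [row[col] for row in rows]
        if len(set(values)) == 1:
            labels.append("static")          # constant across the group
        elif all(is_structured(v) for v in values):
            labels.append("structured")      # delimiter-bearing variable
        else:
            labels.append("unstructured")    # plain variable, e.g. a number
    return labels


if __name__ == "__main__":
    group = [
        "Received block blk_1073 from /10.0.0.1:50010 size 512",
        "Received block blk_2291 from /10.0.0.7:50010 size 4096",
    ]
    print(classify_columns(group))
    # ['static', 'static', 'structured', 'static', 'structured', 'static', 'unstructured']
```

A real analyzer also has to discover the templates first (the sketch takes them as given) and would need a rule for columns that mix structured and plain values.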

Motivation & Objective

  • Identify redundancies in structured tokens within logs to improve compression.
  • Propose a four-component pipeline (token analyzer, structured token processor, hybrid encoder, packer) for efficient log compression.
  • Develop a type-aware encoding strategy that tailors encoding to numeric, string, and mixed-type tokens.
  • Evaluate LogFold on diverse public log datasets and compare with state-of-the-art log compressors and general-purpose compressors.
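The type-aware encoding objective can be illustrated with two classic column encodings: delta plus zigzag-varint encoding for numeric columns, and dictionary encoding for repetitive strings. These are standard techniques used here as stand-ins; the sketch does not reproduce LogFold's optimized numeric or mixed-type encoders, and every function name in it is hypothetical.

```python
def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3.
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def varint(u: int) -> bytes:
    # LEB128-style varint: 7 payload bits per byte, high bit set on all but the last.
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_numeric(values: list[int]) -> bytes:
    # Delta-encode a numeric column, then write each delta as a zigzag varint.
    out = bytearray()
    prev = 0
    for v in values:
        out += varint(zigzag(v - prev))
        prev = v
    return bytes(out)


def decode_numeric(data: bytes) -> list[int]:
    # Inverse of encode_numeric: read varints, undo zigzag, then prefix-sum deltas.
    values, prev, shift, acc = [], 0, 0, 0
    for byte in data:
        acc |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
            continue
        delta = (acc >> 1) if acc % 2 == 0 else -((acc + 1) >> 1)
        prev += delta
        values.append(prev)
        shift, acc = 0, 0
    return values


def encode_strings(values: list[str]) -> tuple[list[str], list[int]]:
    # Dictionary-encode a string column: each distinct value once, then small ids.
    dictionary: dict[str, int] = {}
    ids = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return list(dictionary), ids


if __name__ == "__main__":
    print(encode_numeric([1000, 1001, 1003, 1002]))  # b'\xd0\x0f\x02\x04\x01'
    print(encode_strings(["INFO", "WARN", "INFO", "INFO"]))  # (['INFO', 'WARN'], [0, 1, 0, 0])
```

Slowly changing counters shrink to one byte per value after delta encoding, and repeated strings collapse to small integers. Both outputs are also friendlier input for the general-purpose compressor in the final packing step.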

Proposed method

  • Token Analyzer classifies tokens as structured, unstructured, or static for each log entry.
  • Structured Token Processor performs Delimiter Skeleton-aware Grouping and Pattern Mining to extract intra-token redundancies.
  • Hybrid Encoder applies optimized numeric encoding, dictionary encoding, and mixed-type encoding tailored to token types.
  • Packer aggregates intermediate outputs and applies a general-purpose compressor to produce the final archive.
  • Decompressor reverses the pipeline to ensure lossless recovery.
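The Delimiter Skeleton-aware Grouping and Pattern Mining step can be sketched as follows. Each structured token is reduced to its ordered sequence of delimiters (its skeleton). Tokens that share a skeleton are grouped and split into fields, and fields that are identical across the whole group are folded into a shared pattern. The delimiter regex, the constant-column rule, and the names `skeleton`, `fields`, `group_by_skeleton`, `mine_pattern`, and `rebuild` are assumptions for illustration; the paper's pattern miner is likely more elaborate. The `rebuild` function demonstrates the lossless-recovery property that the decompressor relies on.

```python
import re
from collections import defaultdict

# Hypothetical delimiter class; the paper's exact delimiter set may differ.
DELIM_RE = re.compile(r"([.:\-_/=,@])")


def skeleton(token: str) -> str:
    # Keep only the delimiters, in order: "10.0.0.1:50010" -> "...:"
    return "".join(DELIM_RE.findall(token))


def fields(token: str) -> list[str]:
    # Split on delimiters and drop them: "10.0.0.1:50010" -> ["10", "0", "0", "1", "50010"]
    return DELIM_RE.split(token)[::2]


def group_by_skeleton(tokens: list[str]) -> dict[str, list[list[str]]]:
    groups = defaultdict(list)
    for token in tokens:
        groups[skeleton(token)].append(fields(token))
    return dict(groups)


def mine_pattern(rows):
    """Fold field positions that are constant across the group into the pattern;
    return (pattern with None placeholders, list of variable columns)."""
    pattern, variables = [], []
    for col in zip(*rows):
        if len(set(col)) == 1:
            pattern.append(col[0])        # constant field, stored once
        else:
            pattern.append(None)          # variable field, stored per token
            variables.append(list(col))
    return pattern, variables


def rebuild(skel: str, pattern, variables, n_rows: int) -> list[str]:
    # Reverse the split: fill placeholders, then interleave fields with delimiters.
    tokens = []
    for i in range(n_rows):
        var_iter = iter(variables)
        parts = [p if p is not None else next(var_iter)[i] for p in pattern]
        tokens.append(parts[0] + "".join(d + f for d, f in zip(skel, parts[1:])))
    return tokens


if __name__ == "__main__":
    tokens = ["10.0.0.1:50010", "10.0.0.7:50010", "10.0.0.12:50010", "blk_1073", "blk_2291"]
    for skel, rows in group_by_skeleton(tokens).items():
        pattern, variables = mine_pattern(rows)
        assert rebuild(skel, pattern, variables, len(rows)) == [t for t in tokens if skeleton(t) == skel]
        print(repr(skel), pattern, variables)
```

In this toy input, the three IP:port tokens share the skeleton `"...:"`. Only the last octet varies, so the pattern stores `10`, `0`, `0`, and `50010` once and keeps a single variable column.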
Figure 1. The general log compressor paradigm.

Experimental results

Research questions

  • RQ1: How well does LogFold improve log compression?
  • RQ2: How do different components contribute to LogFold’s effectiveness?
  • RQ3: How sensitive is LogFold to its internal parameter settings?
  • RQ4: How generalizable is LogFold across different zip tools with different compression levels?
  • RQ5: How does LogFold perform in log decompression?
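RQ4 varies the general-purpose compressor used in the final packing step and its compression level, and RQ5 concerns decompression. A minimal harness for that kind of sweep is sketched below. It uses Python's standard-library `zlib`, `bz2`, and `lzma` modules as stand-ins, because the exact zip tools and levels from the paper are not listed in this review. It also compresses raw bytes rather than LogFold's intermediate output, so it only illustrates the experimental setup, not the paper's results. `BACKENDS` and `sweep` are hypothetical names.

```python
import bz2
import lzma
import zlib

# Stand-in backends for the packing step: (compress(data, level), decompress, levels).
# The tool list and levels are illustrative, not the paper's experimental grid.
BACKENDS = {
    "zlib": (lambda d, lvl: zlib.compress(d, lvl), zlib.decompress, (1, 6, 9)),
    "bz2": (lambda d, lvl: bz2.compress(d, compresslevel=lvl), bz2.decompress, (1, 6, 9)),
    "lzma": (lambda d, lvl: lzma.compress(d, preset=lvl), lzma.decompress, (0, 6, 9)),
}


def sweep(payload: bytes) -> dict[tuple[str, int], float]:
    """Compress one payload with every backend/level pair, verify the round
    trip is lossless, and return the compression ratio for each pair."""
    ratios = {}
    for name, (compress, decompress, levels) in BACKENDS.items():
        for level in levels:
            packed = compress(payload, level)
            if decompress(packed) != payload:
                raise RuntimeError(f"{name} at level {level} is not lossless")
            ratios[(name, level)] = len(payload) / len(packed)
    return ratios


if __name__ == "__main__":
    sample = "".join(
        f"Received block blk_{1000 + i} size {i % 7}\n" for i in range(2000)
    ).encode()
    for (name, level), ratio in sorted(sweep(sample).items()):
        print(f"{name:5s} level {level}: ratio {ratio:.2f}")
```

Checking the round trip inside the loop ties the sweep to the lossless requirement: a backend/level pair that fails to reproduce its input is rejected rather than scored.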

Key findings

  • LogFold achieves an average compression ratio improvement of 11.11% over state-of-the-art baselines on 16 public datasets.
  • LogFold achieves a compression speed of 9.842 MB/s.
  • LogFold outperforms nine baseline compressors on average and attains the best compression ratio on 12 of the 16 datasets.
  • Ablation studies show the contribution of each component (token analyzer, structured token processor, hybrid encoder, packer).
  • Sensitivity analyses confirm LogFold’s robustness and stability across internal settings.
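The headline numbers rest on three simple quantities: compression ratio (original size divided by compressed size), relative improvement over a baseline's ratio, and throughput. The sketch below computes them. This review does not say whether the 11.11% figure is a mean of per-dataset gains or the gain of a mean ratio, or whether MB means 10^6 or 2^20 bytes. The example therefore assumes per-dataset gains and 10^6-byte megabytes, and its input ratios are hypothetical values, not figures from the paper.

```python
from statistics import mean


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    # Higher is better: how many times smaller the archive is than the raw log.
    return original_bytes / compressed_bytes


def improvement_pct(cr_new: float, cr_baseline: float) -> float:
    # Relative gain of one compressor's ratio over a baseline's ratio, in percent.
    return (cr_new - cr_baseline) / cr_baseline * 100


def throughput_mb_s(original_bytes: int, seconds: float) -> float:
    # Assumes 1 MB = 10**6 bytes; the paper's unit convention is not stated here.
    return original_bytes / 1e6 / seconds


if __name__ == "__main__":
    # Hypothetical per-dataset ratios as (new, baseline); not figures from the paper.
    pairs = [(22.0, 20.0), (11.0, 10.0), (45.0, 40.0)]
    gains = [improvement_pct(new, base) for new, base in pairs]
    print(f"average improvement: {mean(gains):.2f}%")
```

Averaging per-dataset percentages weights every dataset equally, whereas comparing average ratios lets highly compressible datasets dominate. The two conventions can give noticeably different headline numbers.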
Figure 2. Examples of structured tokens.

This review was created by AI and reviewed by human editors.