QUICK REVIEW

[Paper Review] Introns and Templates Matter: Rethinking Linkage in GP-GOMEA

Johannes Koch, Tanja Alderliesten | arXiv (Cornell University) | Feb 2, 2026
Bioinformatics and Genomic Networks | Citations: 0
One-line summary

The paper introduces two new linkage measures for GP-GOMEA, one that accounts for introns in mutual information estimates and one derived directly from the template structure, and reports improved symbolic regression performance on five standard problems. It also analyzes how closely the learned linkage matches the template linkage and compares static with randomized linkage trees.

ABSTRACT

GP-GOMEA is among the state-of-the-art for symbolic regression, especially when it comes to finding small and potentially interpretable solutions. A key mechanism employed in any GOMEA variant is the exploitation of linkage, the dependencies between variables, to ensure efficient evolution. In GP-GOMEA, mutual information between node positions in GP trees has so far been used to learn linkage. For this, a fixed expression template is used. This however leads to introns for expressions smaller than the full template. As introns have no impact on fitness, their occurrences are not directly linked to selection. Consequently, introns can adversely affect the extent to which mutual information captures dependencies between tree nodes. To overcome this, we propose two new measures for linkage learning, one that explicitly considers introns in mutual information estimates, and one that revisits linkage learning in GP-GOMEA from a grey-box perspective, yielding a measure that needs not to be learned from the population but is derived directly from the template. Across five standard symbolic regression problems, GP-GOMEA achieves substantial improvements using both measures. We also find that the newly learned linkage structure closely reflects the template linkage structure, and that explicitly using the template structure yields the best performance overall.

Research motivation and objectives

  • Motivate the need for better linkage learning in GP-GOMEA to produce smaller, interpretable symbolic expressions.
  • Investigate how inactive variables (introns) distort traditional mutual information-based linkage estimates.
  • Propose two new measures: one that masks introns in MI and one that uses template-driven grey-box linkage.
  • Evaluate the proposed measures on standard symbolic regression datasets to assess performance gains and learned linkage structure.
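To make the intron effect concrete, here is a minimal sketch, not the paper's implementation. It assumes each node position stores a symbol plus a boolean flag marking whether the node is active (part of the expressed tree). The function names and the masking rule (keep only individuals in which both positions are active) are illustrative assumptions, since this review does not give the exact definition of the masked measure.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in nats) between two symbol columns."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def masked_mutual_information(xs, ys, active_x, active_y):
    """Assumed masking rule: use only individuals where both nodes are active."""
    kept = [(x, y) for x, y, ax, ay in zip(xs, ys, active_x, active_y) if ax and ay]
    if not kept:
        return 0.0
    kx, ky = zip(*kept)
    return mutual_information(kx, ky)


# Hypothetical population of six individuals, two node positions.
# In the first four individuals both nodes are active and perfectly dependent;
# in the last two, position j is an intron holding an arbitrary symbol.
pos_i = ["+", "+", "-", "-", "+", "-"]
pos_j = ["x0", "x0", "x1", "x1", "x1", "x0"]
active_i = [True] * 6
active_j = [True, True, True, True, False, False]

plain = mutual_information(pos_i, pos_j)
masked = masked_mutual_information(pos_i, pos_j, active_i, active_j)
```

In this toy population the plain estimate is roughly 0.057 nats, while the masked estimate equals ln 2 (about 0.693 nats): the two intron symbols, which are not under selection, dilute an otherwise perfect dependency.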

Proposed method

  • Describe GP-GOMEA with fixed template mapping decision variables to tree nodes.
  • Introduce two new linkage learning approaches: masking introns in MI and template-based grey-box linkage using a node proximity measure.
  • Construct a Family of Subsets (FOS) via hierarchical clustering of a similarity matrix to form a Linkage Tree.
  • Compare multiple linkage measures against baselines through experiments with varying template heights and linear scaling.
  • Compare runs using the Interleaved Multistart Scheme (IMS) with runs using a fixed population size to isolate the effect of linkage learning on performance.
  • Report training R^2 and bootstrapped confidence intervals to assess statistical significance.
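The linkage-tree step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the template is taken to be a full binary tree whose nodes are indexed in breadth-first order, and the proximity `1 / (1 + path length)` is a hypothetical stand-in for the paper's node proximity measure, which this review does not define. The clustering is plain average-linkage (UPGMA) agglomeration, with ties broken by iteration order.

```python
def tree_distance(i, j):
    """Number of edges between nodes i and j of a binary tree in breadth-first order."""
    d = 0
    while i != j:
        # The larger index is never shallower, so move it up to its parent.
        if i > j:
            i = (i - 1) // 2
        else:
            j = (j - 1) // 2
        d += 1
    return d


def proximity_matrix(n_nodes):
    """Hypothetical template-derived similarity: closer nodes are more similar."""
    return [[1.0 / (1 + tree_distance(i, j)) for j in range(n_nodes)]
            for i in range(n_nodes)]


def linkage_tree(sim):
    """Build a Family of Subsets by repeatedly merging the two most similar clusters."""
    clusters = [(i,) for i in range(len(sim))]
    fos = list(clusters)  # start with all singletons

    def avg_sim(a, b):
        # Average pairwise similarity between members of two clusters (UPGMA).
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = max(
            ((a, b) for k, a in enumerate(clusters) for b in clusters[k + 1:]),
            key=lambda pair: avg_sim(*pair),
        )
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        fos.append(merged)
    return fos


# Template with 7 nodes (root, two children, four grandchildren).
fos = linkage_tree(proximity_matrix(7))
```

For n node positions this produces 2n - 1 subsets, the last of which contains every position. Because the similarity matrix here depends only on the template, it can be computed once instead of being re-estimated from the population; the same `linkage_tree` function would accept a learned matrix such as pairwise mutual information.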

Experimental results

Research questions

  • RQ1: How do introns affect mutual information-based linkage learning in GP-GOMEA?
  • RQ2: Can masking introns in MI improve the accuracy of learned linkage structures?
  • RQ3: Can a template-informed, grey-box linkage measure yield better performance than data-driven MI-based measures?
  • RQ4: How do different linkage measures compare across standard symbolic regression benchmarks?
  • RQ5: What is the relationship between learned linkage structures and the fixed template structure?

Key findings

  • Masking introns in MI (MI_masked) and a template-informed node proximity measure outperform traditional MI and random linkage in training accuracy.
  • The node proximity-based (Node) measure achieves the best performance across problems and settings, often faster than other methods.
  • Static linkage trees (reusing the same LT) perform worse than randomized LT-building, suggesting benefits from linkage variability across generations.
  • Performance generally improves with larger templates and with linear scaling enabled.
  • The newly learned linkage structure closely reflects the template linkage structure, and explicitly using the template structure yields the best performance overall.
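For readers unfamiliar with the evaluation terms used above, the sketch below shows linear scaling, R^2, and a percentile bootstrap interval. It is a generic illustration, not the paper's evaluation code: linear scaling is implemented as the usual least-squares fit of an intercept and slope to an expression's raw outputs, and the bootstrap settings (2000 resamples, 95% interval over the mean, fixed seed) as well as all sample values are assumptions chosen for the example.

```python
import random
from statistics import mean


def linear_scaling(preds, targets):
    """Least-squares intercept and slope so that intercept + slope * pred fits the targets."""
    mp, mt = mean(preds), mean(targets)
    var = sum((p - mp) ** 2 for p in preds)
    if var == 0:
        return mt, 0.0  # constant output: the best fit is the target mean
    slope = sum((p - mp) * (t - mt) for p, t in zip(preds, targets)) / var
    return mt - slope * mp, slope


def r_squared(preds, targets):
    """Coefficient of determination; assumes the targets are not all identical."""
    mt = mean(targets)
    sse = sum((t - p) ** 2 for p, t in zip(preds, targets))
    sst = sum((t - mt) ** 2 for t in targets)
    return 1.0 - sse / sst


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-run scores."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(values, k=len(values)))
                   for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Toy data: the raw expression has the right shape but the wrong scale and offset.
raw = [1.0, 2.0, 3.0, 4.0]
target = [3.0, 5.0, 7.0, 9.0]  # equals 2 * raw + 1

intercept, slope = linear_scaling(raw, target)
scaled = [intercept + slope * p for p in raw]
r2_raw = r_squared(raw, target)
r2_scaled = r_squared(scaled, target)

# Hypothetical per-run R^2 scores, used only to exercise the bootstrap.
run_scores = [0.80, 0.85, 0.90, 0.95]
ci_low, ci_high = bootstrap_ci(run_scores)
```

On the toy data the raw expression has a negative R^2, while the scaled one fits exactly, which illustrates why enabling linear scaling can let the search focus on the shape of an expression rather than its constants.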

This review was generated by AI and checked by a human editor.