QUICK REVIEW

[Paper Review] MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data

Tianyu Liu, Yuge Wang|arXiv (Cornell University)|Sep 29, 2023

Bioinformatics and Genomic Networks | Citations: 12

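One-line summary

MuSe-GNN learns unified gene embeddings that capture gene functional similarity across datasets by sharing weights across multimodal gene graphs and training with reconstruction, weighted similarity, and contrastive learning objectives. It outperforms baselines across tissues and modalities.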

ABSTRACT

Discovering genes with similar functions across diverse biomedical contexts poses a significant challenge in gene representation learning due to data heterogeneity. In this study, we resolve this problem by introducing a novel model called Multimodal Similarity Learning Graph Neural Network, which combines Multimodal Machine Learning and Deep Graph Neural Networks to learn gene representations from single-cell sequencing and spatial transcriptomic data. Leveraging 82 training datasets from 10 tissues, three sequencing techniques, and three species, we create informative graph structures for model training and gene representations generation, while incorporating regularization with weighted similarity learning and contrastive learning to learn cross-data gene-gene relationships. This novel design ensures that we can offer gene representations containing functional similarity across different contexts in a joint space. Comprehensive benchmarking analysis shows our model's capacity to effectively capture gene function similarity across multiple modalities, outperforming state-of-the-art methods in gene representation learning by up to 97.5%. Moreover, we employ bioinformatics tools in conjunction with gene representations to uncover pathway enrichment, regulation causal networks, and functions of disease-associated or dosage-sensitive genes. Therefore, our model efficiently produces unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.

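Motivation and objectives

Address the data heterogeneity of multimodal biological data in order to learn unified gene representations across tissues, sequencing techniques, and species.
Develop a weight-sharing GNN framework that integrates multiple gene co-expression graphs into a joint embedding space.
Regularize the embeddings with weighted similarity learning and contrastive learning to reveal gene function relationships across datasets.
Show that the embeddings improve downstream tasks such as pathway analysis, disease research, and gene function prediction.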

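Proposed method

For each dataset, build a graph over highly variable genes (HVGs), selecting HVGs and applying noise-robust processing (scTransform, CS-CORE, SPARK-X).
Enforce weight sharing across graphs with a cross-graph Transformer that combines dataset-specific and shared graph Transformer (GT) layers.
Decode node embeddings with dataset-specific MLP decoders to reconstruct the co-expression network (in the style of a graph auto-encoder).
Train with a multi-component loss: a binary cross-entropy (BCE) loss for graph reconstruction, a weighted cosine similarity loss over HVGs shared between datasets, and a self-supervised InfoNCE contrastive loss for cross-graph consistency.
Adopt a final objective that combines the BCE, weighted cosine similarity, and InfoNCE losses using a tunable weight λc.
Evaluate per tissue with six performance metrics and multiple biological benchmarks.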
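
The three loss terms listed above can be illustrated with a small standard-library sketch. This is not the authors' implementation. The function names, the per-gene `weights`, the temperature value, and in particular the way `lambda_c` enters the sum (reconstruction plus `lambda_c` times the two cross-dataset terms) are assumptions made for illustration; the review only states that the three losses are combined with a tunable weight λc.

```python
import math

def bce_reconstruction(adj_true, adj_prob, eps=1e-12):
    """Mean binary cross-entropy between a 0/1 adjacency matrix
    and the decoder's predicted edge probabilities."""
    total, n = 0.0, 0
    for row_t, row_p in zip(adj_true, adj_prob):
        for t, p in zip(row_t, row_p):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
            n += 1
    return total / n

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def weighted_similarity_loss(emb_a, emb_b, weights):
    """Weighted mean of (1 - cosine) over genes shared by two datasets.
    emb_a[i] and emb_b[i] embed the same gene; weights[i] is a
    hypothetical per-gene weight."""
    num = sum(w * (1.0 - cosine(a, b))
              for a, b, w in zip(emb_a, emb_b, weights))
    return num / sum(weights)

def info_nce(emb_a, emb_b, temperature=0.5):
    """InfoNCE: for gene i in dataset A the positive is gene i in
    dataset B; every other gene in B acts as a negative."""
    loss = 0.0
    for i, a in enumerate(emb_a):
        logits = [cosine(a, b) / temperature for b in emb_b]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        loss -= logits[i] - log_denom
    return loss / len(emb_a)

def total_loss(adj_true, adj_prob, emb_a, emb_b, weights, lambda_c=1.0):
    # Assumed combination: reconstruction + lambda_c * cross-dataset terms.
    cross = weighted_similarity_loss(emb_a, emb_b, weights) + info_nce(emb_a, emb_b)
    return bce_reconstruction(adj_true, adj_prob) + lambda_c * cross

# Toy usage: two shared genes embedded identically in both datasets.
adj = [[0, 1], [1, 0]]
prob = [[0.1, 0.9], [0.9, 0.1]]
emb = [[1.0, 0.0], [0.0, 1.0]]
print(total_loss(adj, prob, emb, emb, weights=[1.0, 1.0], lambda_c=0.5))
```

In this toy setup the similarity term is zero when a shared gene has the same embedding in both datasets, and the InfoNCE term grows when a gene's embedding is closer to a different gene in the other dataset than to itself, which is the cross-graph consistency the method description refers to.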

Figure 1: The workflow of MuSe-GNN, the visualization of gene embeddings, and the problems of two existing methods, GIANT and Gene2vec. (a) The process of learning gene embeddings by MuSe-GNN. Here we highlight the difference between single-cell data and spatial data, and the major applications of g

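Experimental results

Research questions

RQ1: Can a unified gene embedding space be learned from heterogeneous multimodal biological graphs while preserving functional similarity across datasets?
RQ2: Does weight sharing across dataset-specific graph Transformers improve cross-dataset gene clustering compared with non-shared baselines?
RQ3: Do the weighted similarity and contrastive losses improve recovery of known gene function groups and pathways across datasets?
RQ4: Do the learned embeddings improve downstream tasks such as gene function prediction and pathway analysis across tissues and diseases?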

Key findings

Method	Heart	Lung	Liver	Kidney	Thymus	Spleen	Pancreas	Cerebrum	Cerebellum	PBMC
PCA	0.52	0.48	0.56	0.47	0.56	0.60	0.51	0.62	0.53	0.51
Gene2vec	0.40	0.37	0.33	0.29	0.21	0.31	0.24	0.27	0.31	0.19
GIANT	0.50	0.40	0.33	0.38	0.58	0.33	0.56	0.29	0.28	0.28
WSMAE	0.50	0.47	0.54	0.46	0.57	0.53	0.52	0.55	0.59	0.50
GAE	0.61	0.45	0.58	0.40	0.56	0.58	0.52	0.56	0.60	0.54
VGAE	0.64	0.32	0.33	0.38	0.56	0.31	0.33	0.41	0.33	0.47
MAE	0.36	0.47	0.50	0.45	0.41	0.52	0.39	0.50	0.49	0.50
scBERT	0.41	0.49	0.55	0.62	0.17	0.58	0.46	0.60	0.61	0.58
MuSe-GNN	0.77	0.96	0.92	0.89	0.89	0.94	0.80	0.95	0.90	0.92

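MuSe-GNN outperforms the baselines on six evaluation metrics across tissues, with an improvement of up to 97.5% in the overall score.
In heart and lung, MuSe-GNN scores 20.1% and 97.5% higher, respectively, than the second-best method.
MuSe-GNN embeddings integrate multimodal data into a joint embedding space and reveal functional gene clusters shared across modalities.
GOEA and IPA analyses of the embeddings identify biologically meaningful pathways, causal networks, and disease associations.
Gene function prediction with MuSe-GNN embeddings achieves higher accuracy than baselines such as Geneformer and raw data.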
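
As a sanity check on the heart and lung figures, the relative gain over the second-best method can be recomputed from the Heart and Lung columns of the table above. The sketch below assumes the gain is defined as (MuSe-GNN score minus best baseline score) divided by the best baseline score; the review does not spell out the formula, and `relative_gain` is a name introduced here for illustration.

```python
# Heart and Lung columns copied from the results table (two-decimal values).
heart = {"PCA": 0.52, "Gene2vec": 0.40, "GIANT": 0.50, "WSMAE": 0.50,
         "GAE": 0.61, "VGAE": 0.64, "MAE": 0.36, "scBERT": 0.41,
         "MuSe-GNN": 0.77}
lung = {"PCA": 0.48, "Gene2vec": 0.37, "GIANT": 0.40, "WSMAE": 0.47,
        "GAE": 0.45, "VGAE": 0.32, "MAE": 0.47, "scBERT": 0.49,
        "MuSe-GNN": 0.96}

def relative_gain(scores, method="MuSe-GNN"):
    """Relative improvement of `method` over the best other method."""
    best_baseline = max(v for k, v in scores.items() if k != method)
    return (scores[method] - best_baseline) / best_baseline

print(f"Heart: {relative_gain(heart):.1%}")
print(f"Lung:  {relative_gain(lung):.1%}")
```

With the rounded table entries this gives roughly 20.3% for heart and 95.9% for lung. Both reported figures (20.1% and 97.5%) fall inside the range that two-decimal rounding of the table values allows, so the small gap is consistent with the percentages having been computed from unrounded scores.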

Figure 2: The overall model architecture and the design of loss functions for MuSe-GNN. The color of nodes in the green block represents common/different genes across two datasets. The brown block represents the network architecture of MuSe-GNN, and the blue block represents different loss function


This review was generated by AI and checked by a human editor.