QUICK REVIEW

[Paper Review] FairT2V: Training-Free Debiasing for Text-to-Video Diffusion Models

Haonan Zhong, Wei Song|arXiv (Cornell University)|Jan 28, 2026

Generative Adversarial Networks and Image Synthesis | Citations: 0

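One-Line Summary

Summary: FairT2V is a training-free framework for debiasing the outputs of text-to-video diffusion models. It neutralizes encoder-induced gender bias in prompt embeddings through anchor-based spherical geodesic transformations, and it preserves temporal coherence with a dynamic denoising schedule.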

ABSTRACT

Text-to-video (T2V) diffusion models have achieved rapid progress, yet their demographic biases, particularly gender bias, remain largely unexplored. We present FairT2V, a training-free debiasing framework for text-to-video generation that mitigates encoder-induced bias without finetuning. We first analyze demographic bias in T2V models and show that it primarily originates from pretrained text encoders, which encode implicit gender associations even for neutral prompts. We quantify this effect with a gender-leaning score that correlates with bias in generated videos. Based on this insight, FairT2V mitigates demographic bias by neutralizing prompt embeddings via anchor-based spherical geodesic transformations while preserving semantics. To maintain temporal coherence, we apply debiasing only during early identity-forming steps through a dynamic denoising schedule. We further propose a video-level fairness evaluation protocol combining VideoLLM-based reasoning with human verification. Experiments on the modern T2V model Open-Sora show that FairT2V substantially reduces demographic bias across occupations with minimal impact on video quality.

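Motivation and Objectives

Identify the source of demographic bias in text-to-video diffusion models, with a focus on gender bias arising from prompts.
Develop a training-free debiasing method that preserves prompt semantics and the temporal coherence of the generated video.
Quantify the reduction in demographic bias and assess the impact on video quality using a video-centric fairness evaluation protocol.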

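Proposed Method

Analyze gender bias in the text-conditioning pathway and define a gender-leaning score that measures how strongly a neutral prompt leans toward one gender.
Introduce anchor-based spherical geodesic transformations to obtain neutral, debiased prompt embeddings on the unit hypersphere.
Compute an adaptive debiasing strength lambda* from the embedding's angular proximity to the majority and minority anchors, and apply debiasing along the demographic axis.
Use a dynamic denoising schedule that applies the debiased embedding only during the early identity-forming steps, which preserves temporal coherence.
Evaluate video-level fairness with a VideoLLM-based protocol supplemented by human verification.
Use a CLIP-based text encoder for conditioning and examine robustness across encoders (CLIP vs. T5).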
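The anchor-based geodesic step and the adaptive strength listed above can be illustrated with a small NumPy sketch. This is not the paper's formulation: the choice of the neutral point (the geodesic midpoint of the two anchors), the formula for the strength `lam`, and all function names are assumptions made here for illustration, and the vectors are toy 3-D values rather than real text-encoder embeddings.

```python
import numpy as np

def unit(v):
    """Project a vector onto the unit hypersphere."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def angle(a, b):
    """Geodesic (angular) distance between two vectors, in radians."""
    return float(np.arccos(np.clip(np.dot(unit(a), unit(b)), -1.0, 1.0)))

def slerp(a, b, t):
    """Spherical linear interpolation from a (t=0) to b (t=1).

    Antipodal inputs are not handled in this sketch.
    """
    a, b = unit(a), unit(b)
    omega = angle(a, b)
    if omega < 1e-8:  # nearly identical points: nothing to interpolate
        return a
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def debias_embedding(prompt_emb, majority_anchor, minority_anchor):
    """Move a prompt embedding along a geodesic toward a neutral point.

    ASSUMPTIONS (not taken from the paper): the neutral point is the
    geodesic midpoint of the two anchors, and the strength lam is the
    normalized angular gap, so an embedding that is equidistant from
    both anchors is left unchanged.
    """
    p = unit(prompt_emb)
    theta_maj = angle(p, majority_anchor)
    theta_min = angle(p, minority_anchor)
    neutral = slerp(majority_anchor, minority_anchor, 0.5)
    total = theta_maj + theta_min
    if total < 1e-8:
        lam = 0.0
    else:
        lam = float(np.clip((theta_min - theta_maj) / total, 0.0, 1.0))
    return slerp(p, neutral, lam), lam

# Toy 3-D vectors standing in for encoder embeddings (illustrative only).
majority = np.array([1.0, 0.0, 0.0])
minority = np.array([0.0, 1.0, 0.0])
prompt = np.array([0.9, 0.3, 0.3])  # leans toward the majority anchor

debiased, lam = debias_embedding(prompt, majority, minority)
print("strength:", lam)
print("angular gap before:", abs(angle(prompt, majority) - angle(prompt, minority)))
print("angular gap after: ", abs(angle(debiased, majority) - angle(debiased, minority)))
```

Interpolating on the sphere rather than linearly keeps the result at unit norm, which matters when the downstream model expects embeddings of a fixed scale.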

Figure 1: Bias source analysis in text-to-video generation. Neutral prompts are encoded by the text encoder (e.g., CLIP) into embeddings aligned with gender-associated directions, revealing implicit demographic bias in the text-conditioning space.

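Experimental Results

Research Questions

RQ1: Where does demographic bias in text-to-video diffusion models originate?
RQ2: Can training-free, embedding-level debiasing reduce gender bias without degrading video quality?
RQ3: How does dynamic scheduling affect bias mitigation and temporal coherence in T2V outputs?
RQ4: What makes an effective video-level fairness evaluation protocol for T2V systems?
RQ5: Which text encoder supports robust debiasing without sacrificing semantic fidelity?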

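Key Findings

Demographic bias in T2V models originates primarily in pretrained text encoders, which embed implicit gender associations even for neutral prompts.
FairT2V reduces encoder-induced bias by using anchor-based spherical geodesic transformations to steer prompt embeddings toward a neutral point along an occupation-specific gender axis.
The dynamic denoising schedule restricts debiasing to the early identity-forming steps, which preserves temporal coherence and reduces frame-level artifacts.
Compared with training-free baselines, FairT2V achieves a better balance between bias reduction and preserved video quality, particularly on temporal-consistency metrics.
CLIP-based embeddings offer a more stable trade-off between debiasing effectiveness and video quality than alternatives such as T5.
Video-level fairness evaluation that combines a VideoLLM with human verification gives a more reliable bias assessment than frame-level methods.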
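The idea of restricting debiasing to the early identity-forming steps can be sketched as a simple per-step switch over the conditioning embedding. This is a simplification under stated assumptions: the paper describes its schedule as dynamic, whereas this sketch uses a fixed cutoff, and the function name and the `early_fraction` parameter are hypothetical.

```python
def embedding_for_step(step, num_steps, original_emb, debiased_emb, early_fraction=0.3):
    """Pick the conditioning embedding for one denoising step.

    step counts from 0 (noisiest) to num_steps - 1. The debiased
    embedding is used for the first early_fraction of the steps and the
    original embedding afterwards. ASSUMPTION: a fixed cutoff stands in
    for the paper's dynamic schedule, which is not reproduced here.
    """
    if not 0.0 <= early_fraction <= 1.0:
        raise ValueError("early_fraction must lie in [0, 1]")
    cutoff = int(round(early_fraction * num_steps))
    return debiased_emb if step < cutoff else original_emb

# Usage with placeholder labels instead of real embeddings.
plan = [embedding_for_step(s, 10, "original", "debiased") for s in range(10)]
print(plan)
```

In a real sampler, the returned embedding would be passed as the text condition to the denoiser at that step; later steps then refine detail under the original prompt semantics.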

Figure 2: Gender-leaning scores (Equation 5) from the CLIP text encoder for 16 occupations, using the prompt sets in Equation 3.
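Equations 3 and 5 are not reproduced in this review, so the exact definition of the gender-leaning score is not available here. As a rough illustration of what such a score could look like, the sketch below takes the difference between a neutral prompt's cosine similarity to a male-anchor embedding and to a female-anchor embedding. The formula, the function names, and the toy vectors are all assumptions; real use would substitute embeddings from a text encoder such as CLIP.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_leaning_score(neutral_emb, male_emb, female_emb):
    """Hypothetical score: positive leans male, negative leans female.

    ASSUMPTION: a similarity difference is used for illustration; it is
    not the paper's Equation 5.
    """
    return cosine(neutral_emb, male_emb) - cosine(neutral_emb, female_emb)

# Toy vectors standing in for encoder outputs (not data from the paper).
male_anchor = np.array([1.0, 0.2, 0.0])
female_anchor = np.array([0.2, 1.0, 0.0])
neutral_prompt = np.array([0.8, 0.4, 0.3])

score = gender_leaning_score(neutral_prompt, male_anchor, female_anchor)
print("gender-leaning score:", score)
```

A score near zero would indicate that the neutral prompt sits roughly equally close to both anchors under this toy definition.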

This review was generated by AI and checked by a human editor.