QUICK REVIEW

[Paper Review] Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems

Xingyu Shen, Tommy Duong|arXiv (Cornell University)|Feb 23, 2026

Face recognition and analysis | Citations: 0

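One-Sentence Summary

This paper tests whether simple cosmetic modifications (a beard, grey hair, makeup, wrinkles) can mislead AI age estimators into classifying minors as adults, using a scalable VLM-based attack simulation evaluated across eight models.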

ABSTRACT

Age estimation systems are increasingly deployed as gatekeepers for age-restricted online content, yet their robustness to cosmetic modifications has not been systematically evaluated. We investigate whether simple, household-accessible cosmetic changes, including beards, grey hair, makeup, and simulated wrinkles, can cause AI age estimators to classify minors as adults. To study this threat at scale without ethical concerns, we simulate these physical attacks on 329 facial images of individuals aged 10 to 21 using a VLM image editor (Gemini 2.5 Flash Image). We then evaluate eight models from our prior benchmark: five specialized architectures (MiVOLO, Custom-Best, Herosan, MiViaLab, DEX) and three vision-language models (Gemini 3 Flash, Gemini 2.5 Flash, GPT-5-Nano). We introduce the Attack Conversion Rate (ACR), defined as the fraction of images predicted as minor at baseline that flip to adult after attack, a population-agnostic metric that does not depend on the ratio of minors to adults in the test set. Our results reveal that a synthetic beard alone achieves 28 to 69 percent ACR across all eight models; combining all four attacks shifts predicted age by +7.7 years on average across all 329 subjects and reaches up to 83 percent ACR; and vision-language models exhibit lower ACR (59 to 71 percent) than specialized models (63 to 83 percent) under the full attack, although the ACR ranges overlap and the difference is not statistically tested. These findings highlight a critical vulnerability in deployed age-verification pipelines and call for adversarial robustness evaluation as a mandatory criterion for model selection.

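Motivation and Objectives

Evaluate the robustness of AI age estimation systems to low-cost, physical cosmetic modifications.
Quantify evasion risk with a population-agnostic metric, the Attack Conversion Rate (ACR).
Compare the resilience of specialized CV age estimators and zero-shot vision-language models under cosmetic attacks.
Provide practical guidance for model selection and safety evaluation when deploying age verification.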

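Proposed Method

Build a test set of 329 facial images of individuals aged 10–21, drawn from eight datasets.
Simulate four cosmetic attacks (beard, grey hair, makeup, wrinkles) with the Gemini 2.5 Flash Image editor and generate every non-empty combination of them (15 in total).
Generate combined attacks with a priority-weighted pixel-delta blending strategy that minimizes artifacts.
Evaluate eight models: five CV architectures (MiVOLO, Custom-Best, Herosan, MiViaLab, DEX) and three VLMs (Gemini 3 Flash, Gemini 2.5 Flash, GPT-5-Nano).
Define the Attack Conversion Rate (ACR) as the fraction of images predicted as minor at baseline that are reclassified as adult after the attack; the metric does not depend on the ground-truth age distribution.
Report the mean age shift Δȳ and the post-attack MAE alongside ACR for each model and attack type.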
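The attack enumeration and the three reported metrics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the attack labels, and the sample ages are hypothetical, and the adult cutoff of 18 years with a `>=` comparison is an assumption based on the 18-year boundary mentioned in this review.

```python
from itertools import combinations
from statistics import mean

ATTACKS = ("beard", "grey_hair", "makeup", "wrinkles")  # illustrative labels
ADULT_THRESHOLD = 18  # assumed adult cutoff in years


def attack_subsets(attacks=ATTACKS):
    """Return every non-empty combination of the given attacks."""
    return [
        combo
        for size in range(1, len(attacks) + 1)
        for combo in combinations(attacks, size)
    ]


def attack_conversion_rate(baseline_preds, attacked_preds, threshold=ADULT_THRESHOLD):
    """Fraction of baseline-minor predictions that become adult after the attack."""
    flipped = [
        after >= threshold
        for before, after in zip(baseline_preds, attacked_preds)
        if before < threshold  # only images predicted minor at baseline count
    ]
    return sum(flipped) / len(flipped) if flipped else float("nan")


def mean_age_shift(baseline_preds, attacked_preds):
    """Average change in predicted age over all subjects."""
    return mean(after - before for before, after in zip(baseline_preds, attacked_preds))


def mean_absolute_error(true_ages, preds):
    """Average absolute gap between predicted and ground-truth age."""
    return mean(abs(pred - true) for true, pred in zip(true_ages, preds))


# Hypothetical data for four subjects (not taken from the paper).
true_ages = [14, 16, 17, 20]
baseline = [15.0, 16.5, 19.0, 21.0]
attacked = [17.0, 19.5, 24.0, 26.0]

print(len(attack_subsets()))                       # number of attack combinations
print(attack_conversion_rate(baseline, attacked))  # ACR
print(mean_age_shift(baseline, attacked))          # mean age shift
print(mean_absolute_error(true_ages, attacked))    # post-attack MAE
```

Because the ACR denominator counts only images predicted as minor at baseline, adding or removing true adults from the test set does not change it, which is the sense in which the metric is population-agnostic.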

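Experimental Results

Research Questions

RQ1: How effective are low-cost cosmetic modifications at causing AI age estimators to classify minors as adults?
RQ2: Do vision-language models (VLMs) differ in robustness from specialized CV age estimators under cosmetic attacks?
RQ3: What is the additive effect of combining multiple cosmetic attacks on estimated age and evasion rate?
RQ4: Which model characteristics (baseline minor rate, architecture) correlate with vulnerability to cosmetic attacks?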

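Key Findings

A synthetic beard alone yields an ACR of 28–69% across all eight models and shifts estimated age by +1.6 to +5.1 years.
Combining all four attacks shifts predicted age by +7.7 years on average and reaches an ACR of up to 83%, flipping most baseline minor predictions to adult.
Under the full attack, VLMs reach 59–71% ACR and specialized models 63–83%; the ranges overlap, and the difference was not statistically tested.
Under the full attack, the mean ACR across models is 68.9% (range 59.4–82.9%, from GPT-5-Nano at 59.4% to DEX at 82.9%).
Wrinkles and makeup behave differently: wrinkles yield 19–34% ACR, while makeup yields a higher ACR (29–49%) with a mean age shift near zero, which points to predictions crossing the threshold rather than an overall increase in estimated age.
Evasion is harder for younger subjects: ages 10–12 are the hardest to push past the threshold, and vulnerability peaks at ages 15–17, whose baseline predictions already sit close to the 18-year boundary.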
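The combination of a high ACR with a near-zero mean age shift can look contradictory, so a toy calculation may help. The numbers below are entirely hypothetical (they are not results from the paper), and the 18-year cutoff with a `>=` comparison is an assumption. Predictions that start just under the threshold flip with a small upward nudge, while predictions far below it can move downward and cancel the average.

```python
ADULT_THRESHOLD = 18  # assumed adult cutoff in years

# Hypothetical predicted ages for six images, all predicted minor at baseline.
baseline = [17.5, 17.75, 17.5, 14.0, 13.0, 16.0]
attacked = [18.25, 18.0, 18.5, 12.5, 12.5, 16.0]

# Mean shift over all images: small increases and decreases cancel out.
shifts = [after - before for before, after in zip(baseline, attacked)]
mean_shift = sum(shifts) / len(shifts)

# ACR: share of baseline-minor predictions that end up at or above the cutoff.
baseline_minor = [(b, a) for b, a in zip(baseline, attacked) if b < ADULT_THRESHOLD]
acr = sum(a >= ADULT_THRESHOLD for _, a in baseline_minor) / len(baseline_minor)

print(f"mean shift = {mean_shift:+.2f} years, ACR = {acr:.0%}")
```

In this toy case half of the baseline-minor predictions flip to adult even though the average predicted age does not move. The same mechanism is consistent with the reported vulnerability peak at ages 15–17, where baseline predictions sit closest to the boundary.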

This review was generated by AI and checked by a human editor.