QUICK REVIEW

[Paper Review] Sola-Visibility-ISPM: Benchmarking Agentic AI for Identity Security Posture Management Visibility

Gal Engelberg, Konstantin Koutsyi | arXiv (Cornell University) | Jan 11, 2026

Software System Performance and Reliability | Cited by 0

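One-sentence summary

The paper introduces the Sola Visibility ISPM Benchmark, which evaluates agentic AI on real, production-grade ISPM tasks across AWS, Okta, and Google Workspace; the accompanying Sola AI Agent runs data-grounded queries and returns verifiable, evidence-backed answers.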

ABSTRACT

Identity Security Posture Management (ISPM) is a core challenge for modern enterprises operating across cloud and SaaS environments. Answering basic ISPM visibility questions, such as understanding identity inventory and configuration hygiene, requires interpreting complex identity data, motivating growing interest in agentic AI systems. Despite this interest, there is currently no standardized way to evaluate how well such systems perform ISPM visibility tasks on real enterprise data. We introduce the Sola Visibility ISPM Benchmark, the first benchmark designed to evaluate agentic AI systems on foundational ISPM visibility tasks using a live, production-grade identity environment spanning AWS, Okta, and Google Workspace. The benchmark focuses on identity inventory and hygiene questions and is accompanied by the Sola AI Agent, a tool-using agent that translates natural-language queries into executable data exploration steps and produces verifiable, evidence-backed answers. Across 77 benchmark questions, the agent achieves strong overall performance, with an expert accuracy of 0.84 and a strict success rate of 0.77. Performance is highest on AWS hygiene tasks, where expert accuracy reaches 0.94, while results on Google Workspace and Okta hygiene tasks are more moderate, yet competitive. Overall, this work provides a practical and reproducible benchmark for evaluating agentic AI systems in identity security and establishes a foundation for future ISPM benchmarks covering more advanced identity analysis and governance tasks.

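Research motivation and objectives

Define and implement an ISPM visibility benchmark over real enterprise IAM, IdP, and SaaS data sources.
Introduce the Sola AI Agent as a tool-using, data-grounded assistant for ISPM queries.
Provide a reproducible evaluation framework built on live data and structured evidence artifacts.
Quantify agentic AI performance on 77 data-grounded ISPM questions spanning multiple platforms.
Lay the groundwork for broader ISPM benchmarks, including governance and advanced identity analysis.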

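Proposed method

Anchor the benchmark in a realistic enterprise triad: AWS, Okta, and Google Workspace.
Translate best-practice identity security rules into data-grounded ISPM questions.
Answer the questions using the Sola AI Agent's fast-path and full-path execution modes.
Evaluate with end-to-end traces, evidence bundles, and both expert and LLM judging.
Report results with multi-dimensional metrics, including accuracy, success rate, and reasoning quality.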
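
The per-platform reporting described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each benchmark question yields an expert-assigned score in [0, 1] and a boolean strict-success flag; the `QuestionResult` record, its field names, and the `summarize` helper are hypothetical and are not taken from the paper, whose exact metric definitions are not given in this summary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Hypothetical per-question record; field names are assumptions."""
    platform: str         # e.g. "AWS", "Okta", "Google Workspace"
    task: str             # e.g. "inventory" or "hygiene"
    expert_score: float   # assumed expert-assigned score in [0, 1]
    strict_success: bool  # assumed True only when the answer fully passes


def summarize(results):
    """Group results by (platform, task) and average both metrics."""
    groups = defaultdict(list)
    for r in results:
        groups[(r.platform, r.task)].append(r)

    summary = {}
    for key, rows in groups.items():
        n = len(rows)
        summary[key] = {
            "n": n,
            "expert_accuracy": sum(r.expert_score for r in rows) / n,
            "strict_success_rate": sum(r.strict_success for r in rows) / n,
        }
    return summary


# Toy data for illustration only; these are not benchmark results.
toy = [
    QuestionResult("AWS", "hygiene", 1.0, True),
    QuestionResult("AWS", "hygiene", 0.5, False),
    QuestionResult("Okta", "hygiene", 1.0, True),
]
summary = summarize(toy)
for (platform, task), stats in summary.items():
    print(platform, task, stats)
```

Keeping the two metrics side by side per group makes it easy to spot domains where answers are judged mostly correct yet still miss a stricter pass criterion.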

Figure 1: SOLA AI Agent Conceptual Architecture

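Experimental results

Research questions

RQ1: How do agentic AI systems interpret and answer ISPM visibility questions from real enterprise identity data?
RQ2: What are the strengths and limitations of fast-path versus full-path reasoning on ISPM tasks?
RQ3: How does performance differ across AWS, Google Workspace, and Okta on hygiene and inventory questions?
RQ4: How much do example adaptation and schema grounding affect correctness and grounding?
RQ5: Can the benchmark provide repeatable, evidence-based evaluation for future ISPM tasks?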

Key findings

Across the 77 questions, the agent reaches an overall expert accuracy of 0.84 and a strict success rate of 0.77.
AWS hygiene tasks score highest, with an expert accuracy of 0.94 (the figure stated in the abstract) and a success rate of 0.90.
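Okta hygiene performance is moderate, with an expert accuracy of 0.65 and a success rate of 0.50.
Google Workspace hygiene reaches an accuracy of 0.75 and a success rate of 0.71.
Fast-path reasoning performs strongly overall, with more consistent results on AWS and the inventory domains, but it varies more widely from one domain to another.
Full-path reasoning delivers more stable correctness across domains, especially on complex hygiene tasks.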
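
One way to read the hygiene figures is to compare expert accuracy with strict success rate for each platform. The short Python sketch below does this arithmetic on the reported numbers, using the AWS accuracy of 0.94 as stated in the abstract. The "gap" computed here is an illustrative quantity and not a metric defined by the paper; the function name `accuracy_success_gap` is likewise hypothetical.

```python
# Reported hygiene figures as (expert accuracy, strict success rate).
# The AWS accuracy follows the abstract (0.94).
reported = {
    "AWS": (0.94, 0.90),
    "Google Workspace": (0.75, 0.71),
    "Okta": (0.65, 0.50),
}


def accuracy_success_gap(scores):
    """Return accuracy minus success rate per platform, rounded to 2 places."""
    return {p: round(acc - succ, 2) for p, (acc, succ) in scores.items()}


gaps = accuracy_success_gap(reported)
widest = max(gaps, key=gaps.get)  # platform with the largest gap

for platform, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{platform}: gap {gap:.2f}")
```

Under this reading, Okta hygiene shows the widest difference between the two metrics (0.15), while AWS and Google Workspace each differ by 0.04.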

Figure 2: Sola ISPM Visibility Evaluation Framework


This review was generated by AI and reviewed by human editors.