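[Paper Digest] Leveraging Large Language Models (LLMs) for Process Mining (Technical Report)
The paper investigates using large language models (GPT-4 and Bard) to analyze process mining artifacts by converting event logs and process models into textual abstractions and applying several prompting strategies.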
This technical report describes the intersection of process mining and large language models (LLMs), specifically focusing on the abstraction of traditional and object-centric process mining artifacts into textual format. We introduce and explore various prompting strategies: direct answering, where the large language model directly addresses user queries; multi-prompt answering, which allows the model to incrementally build on the knowledge obtained through a series of prompts; and the generation of database queries, facilitating the validation of hypotheses against the original event log. Our assessment considers two large language models, GPT-4 and Google's Bard, under various contextual scenarios across all prompting strategies. Results indicate that these models exhibit a robust understanding of key process mining abstractions, with notable proficiency in interpreting both declarative and procedural process models. In addition, we find that both models demonstrate strong performance in the object-centric setting, which could significantly propel the advancement of the object-centric process mining discipline. Additionally, these models display a noteworthy capacity to evaluate various concepts of fairness in process mining. This opens the door to more rapid and efficient assessments of the fairness of process mining event logs, which has significant implications for the field. The integration of these large language models into process mining applications may open new avenues for exploration, innovation, and insight generation in the field.
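Research Motivation and Goals
- Advance the combination of large language models with traditional and object-centric process mining artifacts.
- Develop textual abstractions of process mining artifacts that LLMs can interpret.
- Evaluate how well the prompting strategies (direct answering, multi-prompt answering, database query generation) work with LLMs.
- Evaluate LLM performance on procedural and declarative process models and on fairness concepts in process mining.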
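Proposed Method
- Create textual encodings of process mining artifacts (directly-follows graphs (DFG), Petri nets, object-centric DFGs (OC-DFG), DECLARE models, temporal features).
- Describe and implement the prompting strategies: direct answering, multi-prompt answering, and generation of database queries that validate hypotheses against the event log.
- Experiment with two LLMs (GPT-4 and Google Bard) across different contexts and artifact types.
- Discuss preprocessing and abbreviation strategies for fitting complex models (such as DECLARE models) within context-window limits.
- Outline feature extraction methods that turn event logs into numerical inputs for machine learning tasks (one-hot encoding, aggregation, sequences, n-grams, embeddings).
- Discuss fairness concepts in process mining, including individual, group, procedural, and counterfactual fairness, as well as the availability of public data for fairness assessment.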
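To make the idea of a textual encoding concrete, the following is a minimal sketch of how a directly-follows graph could be derived from an event log and rendered as text for a prompt. It is an illustration under stated assumptions, not the report's implementation: the function names, the `A -> B (frequency = n)` line format, the toy log, and the prompt wording are all hypothetical.

```python
from collections import Counter


def directly_follows_graph(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def dfg_to_text(dfg, max_edges=None):
    """Render the DFG as plain text, most frequent edges first.

    max_edges is a hypothetical knob for truncating the abstraction so
    that it fits a limited context window.
    """
    edges = sorted(dfg.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_edges is not None:
        edges = edges[:max_edges]
    return "\n".join(f"{a} -> {b} (frequency = {n})" for (a, b), n in edges)


# Toy event log (assumed data): each trace lists the activities of one case in order.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]

abstraction = dfg_to_text(directly_follows_graph(toy_log))

# Hypothetical prompt wording; the report's actual prompts may differ.
prompt = (
    "Given this directly-follows graph:\n"
    + abstraction
    + "\nWhich steps look like potential bottlenecks?"
)
print(prompt)
```

Sorting edges by frequency and exposing a cut-off is one simple way to keep the most informative part of the graph when the full abstraction would exceed the model's context window.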
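Experimental Results
Research Questions
- RQ1: Once the artifacts are textually encoded, can LLMs effectively understand and reason about traditional and object-centric process mining artifacts?
- RQ2: Which prompting strategies provide robust analysis and hypothesis validation for process mining tasks that use LLMs?
- RQ3: To what extent can LLMs support process models (procedural, declarative, and object-centric) and assess fairness in event logs?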
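Key Findings
- Both LLMs (GPT-4 and Bard) show a robust understanding of key process mining abstractions and can interpret declarative and procedural models.
- LLMs perform well in the object-centric setting, which supports progress in object-centric process mining.
- Textual abstractions allow LLMs to reason about directly-follows graphs, Petri nets, and DECLARE/temporal models.
- Prompting strategies, including multi-prompt answering and database query generation, help validate hypotheses against the original event log.
- LLMs show the ability to evaluate fairness concepts in process mining, which suggests potential for rapid fairness assessment.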
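The query-generation strategy can be pictured with a small sketch: a hypothesis is expressed as a SQL query and run against the event log to confirm or reject it. Everything here is assumed for illustration: the `events` table and its columns, the toy rows, and the query itself, which is hard-coded to stand in for text a model might return (no LLM is called).

```python
import sqlite3

# In-memory event log with an assumed schema: one row per event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (case_id TEXT, activity TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("c1", "register", "2023-01-01T09:00:00"),
        ("c1", "check", "2023-01-01T10:00:00"),
        ("c1", "check", "2023-01-02T10:00:00"),
        ("c1", "approve", "2023-01-02T11:00:00"),
        ("c2", "register", "2023-01-03T09:00:00"),
        ("c2", "check", "2023-01-03T10:00:00"),
        ("c2", "approve", "2023-01-03T11:00:00"),
    ],
)

# Hypothesis to validate: "some cases repeat the 'check' activity (rework)".
# Placeholder for a model-generated query; hard-coded in this sketch.
llm_generated_sql = """
SELECT case_id, COUNT(*) AS n_checks
FROM events
WHERE activity = 'check'
GROUP BY case_id
HAVING COUNT(*) > 1
ORDER BY case_id
"""

rework_cases = conn.execute(llm_generated_sql).fetchall()
print(rework_cases)
```

Running the query on the log, rather than trusting the model's answer directly, is what lets the hypothesis be checked against the actual data.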
This digest was generated by AI and reviewed by human editors.