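[Paper Digest] SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses
SafeLoad is an admission control framework that identifies memory-overloading (MO) queries in cloud data warehouses by combining a global model with cluster-level models. It includes a self-tuning quota mechanism and is released alongside SafeBench, a new MO-labeled benchmark.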
Memory overload is a common form of resource exhaustion in cloud data warehouses. When database queries fail due to memory overload, the failure not only wastes critical resources such as CPU time but also disrupts the execution of core business processes, as memory-overloading (MO) queries are typically part of complex workflows. Identifying such queries in advance and scheduling them to memory-rich serverless clusters can prevent both resource waste and query execution failure. Cloud data warehouses therefore need an admission control framework with high prediction precision, interpretability, efficiency, and adaptability to effectively identify MO queries. However, existing admission control frameworks primarily focus on scenarios like SLA satisfaction and resource isolation, with limited precision in identifying MO queries. Moreover, there is a lack of publicly available MO-labeled datasets with workloads for training and benchmarking. To tackle these challenges, we propose SafeLoad, the first query admission control framework specifically designed to identify MO queries. Alongside, we release SafeBench, an open-source, industrial-scale benchmark for this task, which includes 150 million real queries. SafeLoad first filters out memory-safe queries using an interpretable discriminative rule. It then applies a hybrid architecture that integrates both a global model and cluster-level models, supplemented by a misprediction correction module to identify MO queries. Additionally, a self-tuning quota management mechanism dynamically adjusts prediction quotas per cluster to improve precision. Experimental results show that SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead. Specifically, SafeLoad improves precision by up to 66% over the best baseline and reduces wasted CPU time by up to 8.09x compared to scenarios without SafeLoad.
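Research Motivation and Goals
- Treat memory overload as a key resource-exhaustion problem in cloud data warehouses.
- Provide an admission control framework that detects memory-overloading (MO) queries with high precision, interpretability, efficiency, and adaptability.
- Release SafeBench, an open-source, industrial-scale benchmark built from real workload data, for training and benchmarking MO query detection.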
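Proposed Method
- Filter out memory-safe queries with an interpretable discriminative rule.
- Use a hybrid architecture that combines a global model with cluster-level models.
- Add a misprediction correction module to refine MO query identification.
- Apply a self-tuning quota management mechanism that dynamically adjusts the prediction quota of each cluster.
- Keep online and offline overhead low while maintaining high prediction precision.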
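The steps above can be sketched as a small admission pipeline. This is a minimal illustration under stated assumptions, not SafeLoad's actual implementation: the class and method names, the `scan_gb` feature, the score-averaging rule for combining the global and cluster-level models, the reading of a "quota" as a cap on MO flags per cluster, and the increase-or-halve tuning rule are all hypothetical. The misprediction correction module is left out because this digest does not describe how it works.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Query:
    cluster_id: str
    features: Dict[str, float]


# A scorer maps a query to an MO score in [0, 1] (hypothetical interface).
Scorer = Callable[[Query], float]


@dataclass
class AdmissionController:
    is_memory_safe: Callable[[Query], bool]   # interpretable rule filter
    global_model: Scorer                      # trained on all clusters
    cluster_models: Dict[str, Scorer] = field(default_factory=dict)
    quotas: Dict[str, int] = field(default_factory=dict)  # max MO flags per window
    used: Dict[str, int] = field(default_factory=dict)
    threshold: float = 0.5
    default_quota: int = 10

    def predict_mo(self, q: Query) -> bool:
        # Step 1: the rule removes queries that are clearly memory-safe.
        if self.is_memory_safe(q):
            return False
        # Step 2: hybrid score. Assumption: average the global and
        # cluster-level scores when a cluster model exists.
        score = self.global_model(q)
        cluster_model = self.cluster_models.get(q.cluster_id)
        if cluster_model is not None:
            score = (score + cluster_model(q)) / 2
        if score < self.threshold:
            return False
        # Step 3: respect the cluster's remaining prediction quota.
        quota = self.quotas.get(q.cluster_id, self.default_quota)
        used = self.used.get(q.cluster_id, 0)
        if used >= quota:
            return False
        self.used[q.cluster_id] = used + 1
        return True

    def tune_quota(self, cluster_id: str, observed_precision: float,
                   target: float = 0.9) -> None:
        # Assumed tuning rule: grow the quota slowly while precision meets
        # the target, halve it when precision drops below the target.
        quota = self.quotas.get(cluster_id, self.default_quota)
        quota = quota + 1 if observed_precision >= target else max(1, quota // 2)
        self.quotas[cluster_id] = quota
        self.used[cluster_id] = 0  # start a new accounting window


ctrl = AdmissionController(
    is_memory_safe=lambda q: q.features["scan_gb"] < 1.0,
    global_model=lambda q: 0.8,
    cluster_models={"c1": lambda q: 0.6},
    quotas={"c1": 1},
)
big = Query("c1", {"scan_gb": 50.0})
print(ctrl.predict_mo(big))  # flagged as MO while quota remains
```

Queries flagged by `predict_mo` would be the ones routed to a memory-rich cluster. Putting the rule filter first keeps the cheap, interpretable check on the hot path and reserves model scoring for queries that are not obviously safe.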
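Experimental Results
Research Questions
- RQ1: To what extent can MO queries in cloud data warehouses be identified from query features and workload context?
- RQ2: Does the hybrid global-plus-cluster modeling approach improve MO query detection over baseline approaches?
- RQ3: Does the self-tuning quota mechanism improve precision and resource utilization across clusters?
- RQ4: How does SafeLoad affect wasted CPU time and overall system efficiency?
- RQ5: How does SafeLoad perform on a large-scale, MO-labeled query benchmark such as SafeBench?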
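Key Findings
- SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead.
- Precision improves by up to 66% over the best baseline.
- Wasted CPU time is reduced by up to 8.09x compared to scenarios without SafeLoad.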
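To make the two headline metrics concrete, the short sketch below shows how precision is computed from prediction counts and what an 8.09x reduction in wasted CPU time means as a fraction. The function names and the counts passed to `precision` are hypothetical and are not taken from the paper; only the 8.09 factor comes from the reported results.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of queries flagged as MO that really were MO."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def remaining_fraction(reduction_factor: float) -> float:
    """Fraction of the original waste left after an N-fold reduction."""
    return 1.0 / reduction_factor


# Hypothetical counts: 90 correct MO flags, 10 false alarms.
print(precision(90, 10))                  # 0.9
# An 8.09x reduction leaves about 12.4% of the original wasted CPU time.
print(f"{remaining_fraction(8.09):.1%}")  # 12.4%
```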
This digest was generated by AI and reviewed by human editors.