QUICK REVIEW

[Paper Review] Password Cracking: The Effect of Bias on the Average Guesswork of Hash Functions.

Yair Yona, Suhas Diggavi|arXiv (Cornell University)|Aug 6, 2016

Advanced Malware Detection Techniques | References: 26 | Cited by: 1

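One-Sentence Summary

This paper analyzes how bias and hash function design affect the average guesswork of password cracking, and shows that biased hash mappings combined with adaptively modified hash functions can increase the guesswork significantly. It derives upper and lower bounds on the growth rate of the guesswork under offline and online attacks, shows that the number of users has a far larger effect than bias, and presents a backdoor mechanism that modifies the hash function efficiently without compromising the average guesswork.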

ABSTRACT

In this work we analyze the average guesswork for the problem of hashed password cracking (i.e., finding a password that has the same hash value as the actual password). We focus on the following two cases: averaging over all strategies of guessing passwords one by one for any hash function whose effective distribution (i.e., the fraction of mappings to any bin) is i.i.d. Bernoulli(p), and averaging over all hash functions whose effective distribution is i.i.d. Bernoulli(p) for any strategy of guessing passwords one by one.

For the case where the hash function is adaptively modified based on the passwords of the users, we first find the average guesswork across users when the number of bins is 2^m and the number of users equals ⌊2^(H(s)·m−1)⌋, where 1/2 ≤ s ≤ 1 and m ≫ 1. It turns out that the average guesswork increases (as a function of m) at a rate equal to H(s) + D(s||p) when (1−p) ≤ s ≤ 1, and 2·H(p) + D(1−p||p) − H(s) when 1/2 ≤ s ≤ (1−p). We then show that the average guesswork of guessing a password that is mapped to any of the assigned bins (an offline attack) grows like 2^(D(s||p)·m). We also analyze the effect of choosing biased passwords on the average guesswork, and characterize the region in which the average guesswork is dominated by the guesswork of a password as well as the region in which it is dominated by the above results. Moreover, we provide a concentration result showing that the probability mass function of the guesswork is concentrated around its mean value.

We also analyze the more prevalent case in which hash functions cannot be modified based on the passwords of the users (i.e., users are mapped to bins randomly). We derive lower and upper bounds for the average guesswork under both offline and online attacks. Under offline attacks, the rate at which it increases is upper bounded by D(s||p), and lower bounded by D(1−s||p) when 1−p ≤ s ≤ 1 as well as 0 for 1/2 ≤ s ≤ 1−p. Under an online attack, the rate is upper bounded by H(s) + D(s||p) when (1−p) ≤ s ≤ 1, and 2·H(p) + D(1−p||p) − H(s) when 1/2 ≤ s ≤ (1−p), and lower bounded by H(s) + D(1−s||p). In addition, we show that the most likely average guesswork when passwords are drawn uniformly increases at rate H(p) − H(s) under an offline attack and at rate H(p) when cracking the password of any user. These results give quantifiable bounds for the effect of bias as well as the number of users on the average guesswork of a hash function, and show that increasing the number of users has a far worse effect than bias in terms of the average guesswork. Furthermore, we show that under online attacks the average guesswork is upper bounded by H(s) + D(s||p) when (1−p) ≤ s ≤ 1, and 2·H(p) + D(1−p||p) − H(s) when 1/2 ≤ s ≤ (1−p), and lower bounded by H(s) + D(1−s||p).

For keyed hash functions (i.e., strongly universal sets of hash functions) we show that when the number of users is ⌊2^(m−1)⌋ and the hash function is adaptively modified based on the passwords of the users, the size of a uniform key required to achieve an average guesswork of 2^(α·m), α > 1, is α times larger than the size of a key drawn Bernoulli(p0) that achieves the same average guesswork, where p0 satisfies the equality 1 + D(1/2||p0) = α. Finally, we present a “backdoor” procedure that makes it possible to modify a hash function efficiently without compromising the average guesswork. This work relies on the observation that when the mappings (or the key) of a hash function are biased, and the passwords of the users are mapped to the least likely bins, the average guesswork increases significantly.
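
The abstract does not spell out H(·) and D(·||·). Assuming they are the standard binary entropy and the binary relative entropy (Kullback-Leibler divergence) measured in bits, which is consistent with guesswork growing as a power of 2, the quantities and the first rate can be written as follows. This is a reader's sketch under that assumption, not a formula quoted from the paper.

```latex
\begin{align*}
H(s) &= -s\log_2 s - (1-s)\log_2(1-s), \\
D(s\,\|\,p) &= s\log_2\frac{s}{p} + (1-s)\log_2\frac{1-s}{1-p}, \\
H(s) + D(s\,\|\,p) &= -s\log_2 p - (1-s)\log_2(1-p).
\end{align*}
```

Under the same assumption, the two branches of the rate agree at the boundary s = 1−p: since H(1−p) = H(p), the expression 2·H(p) + D(1−p||p) − H(s) evaluated at s = 1−p equals H(1−p) + D(1−p||p).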

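Motivation and Objectives

Quantify how password bias and hash function design affect the average guesswork of password cracking.
Model how adaptive hash functions (modified based on the users' passwords) affect the guesswork for different numbers of users.
Establish theoretical upper and lower bounds on the average guesswork under offline and online attacks, distinguishing uniform from biased password distributions.
Examine the role of keyed hash functions (strongly universal sets) in controlling the guesswork, and derive the key size required to reach a target security level.
Design a backdoor mechanism that raises the guesswork by mapping passwords to the least likely bins, without increasing the key size or compromising security.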

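Proposed Approach

Model the effective distribution of the hash function as i.i.d. Bernoulli(p), and analyze the average guesswork across guessing strategies.
Apply information-theoretic tools, namely the relative entropy D(·||·) and the entropy H(·), to derive upper and lower bounds on the growth rate of the guesswork.
Analyze two attack models: offline attacks (guessing a password that is mapped to any of the assigned bins) and online attacks (cracking the password of a particular user).
Derive a concentration result showing that the probability mass function of the guesswork is concentrated around its mean.
Introduce a “backdoor” procedure that modifies the hash function efficiently by mapping passwords to the least likely bins, raising the guesswork without changing the key size.
Compare the key sizes needed to reach a target guesswork level with uniform and biased key distributions, showing that the required sizes differ by a multiplicative factor α.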
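
The key-size comparison in the last item can be made concrete with a short calculation. The sketch below assumes D(·||·) is the binary relative entropy in bits and that the bias satisfies p0 ≤ 1/2; the function names are illustrative and do not come from the paper. Under those assumptions, 1 + D(1/2||p0) = α simplifies to p0·(1−p0) = 4^(−α), which gives a closed form for p0.

```python
import math


def binary_kl(a: float, b: float) -> float:
    """D(a||b) in bits between Bernoulli(a) and Bernoulli(b), for 0 < b < 1."""
    total = 0.0
    for q, r in ((a, b), (1.0 - a, 1.0 - b)):
        if q > 0.0:  # the 0 * log(0) term is taken as 0
            total += q * math.log2(q / r)
    return total


def key_size_factor(p0: float) -> float:
    """Factor alpha = 1 + D(1/2 || p0) from the summary's key-size statement."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    return 1.0 + binary_kl(0.5, p0)


def bias_for_factor(alpha: float) -> float:
    """Bias p0 <= 1/2 (assumed branch) that solves 1 + D(1/2 || p0) = alpha.

    Uses p0 * (1 - p0) = 4 ** (-alpha), which follows from expanding D(1/2 || p0).
    """
    if alpha < 1.0:
        raise ValueError("alpha must be at least 1")
    return (1.0 - math.sqrt(1.0 - 4.0 ** (1.0 - alpha))) / 2.0


if __name__ == "__main__":
    for p0 in (0.5, 0.25, 0.1):
        print(f"p0 = {p0:.2f} -> alpha = {key_size_factor(p0):.4f}")
```

An unbiased key (p0 = 1/2) gives α = 1, so the factor only exceeds 1 once the key bits are biased.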

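Results

Research Questions

RQ1: How does password bias affect the average guesswork of password cracking under the offline and online attack models?
RQ2: At what rate does the average guesswork grow as the number of users grows, and how does that compare with the effect of password bias?
RQ3: How do hash functions that are adaptively modified based on the users' passwords affect the average guesswork, in particular when users are mapped to the least likely bins?
RQ4: What are the upper and lower bounds on the average guesswork of keyed hash functions under the different attack models?
RQ5: Can a backdoor mechanism be designed that raises the guesswork without increasing the key size or compromising security?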

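Key Findings

Under offline attacks, for 1−p ≤ s ≤ 1 the growth rate of the average guesswork is upper bounded by D(s||p) and lower bounded by D(1−s||p); for 1/2 ≤ s ≤ 1−p the lower bound is 0.
Under online attacks, the growth rate of the average guesswork is upper bounded by H(s) + D(s||p) for 1−p ≤ s ≤ 1, and by 2·H(p) + D(1−p||p) − H(s) for 1/2 ≤ s ≤ 1−p; it is lower bounded by H(s) + D(1−s||p).
For uniformly drawn passwords, the most likely average guesswork grows at rate H(p) − H(s) under an offline attack and at rate H(p) when cracking the password of any user.
When the hash function is adaptively modified and the number of users is ⌊2^(H(s)·m−1)⌋, the average guesswork grows at rate H(s) + D(s||p) for 1−p ≤ s ≤ 1, and at rate 2·H(p) + D(1−p||p) − H(s) for 1/2 ≤ s ≤ 1−p.
For an adaptively modified keyed hash function with ⌊2^(m−1)⌋ users, a key drawn Bernoulli(p₀) with 1 + D(1/2||p₀) = α achieves the same average guesswork as a uniform key that is α times larger.
The backdoor procedure modifies the hash function efficiently by mapping passwords to the least likely bins, exploiting the bias in the hash mappings without changing the key or compromising the average guesswork.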
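
The rates listed above can be evaluated numerically for concrete values of s and p. The following is a minimal sketch, assuming H and D are the binary entropy and binary relative entropy in bits and that 0 < p ≤ 1/2 (so that the interval 1/2 ≤ s ≤ 1−p is non-empty); the function names are hypothetical. Because the summary states the offline upper bound D(s||p) together with the range 1−p ≤ s ≤ 1, the sketch returns no upper bound outside that range.

```python
import math


def H(x: float) -> float:
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def D(a: float, b: float) -> float:
    """Binary relative entropy D(a||b) in bits, for 0 < b < 1."""
    total = 0.0
    for q, r in ((a, b), (1.0 - a, 1.0 - b)):
        if q > 0.0:  # the 0 * log(0) term is taken as 0
            total += q * math.log2(q / r)
    return total


def _check(s: float, p: float) -> None:
    # Assumed parameter ranges; p <= 1/2 is this sketch's assumption.
    if not (0.0 < p <= 0.5 and 0.5 <= s <= 1.0):
        raise ValueError("expected 0 < p <= 1/2 and 1/2 <= s <= 1")


def adaptive_rate(s: float, p: float) -> float:
    """Growth rate with an adaptively modified hash; also the online upper bound."""
    _check(s, p)
    if s >= 1.0 - p:
        return H(s) + D(s, p)
    return 2.0 * H(p) + D(1.0 - p, p) - H(s)


def offline_bounds(s: float, p: float):
    """(lower, upper) bounds on the offline rate; upper is None where not stated."""
    _check(s, p)
    if s >= 1.0 - p:
        return D(1.0 - s, p), D(s, p)
    return 0.0, None


def online_bounds(s: float, p: float):
    """(lower, upper) bounds on the online rate."""
    _check(s, p)
    return H(s) + D(1.0 - s, p), adaptive_rate(s, p)


if __name__ == "__main__":
    p = 0.25
    for s in (0.6, 0.75, 0.9):
        print(s, offline_bounds(s, p), online_bounds(s, p))
```

A rate r here means the average guesswork grows roughly like 2^(r·m), so comparing rates for different s (which sets the number of users) and p (which sets the bias) shows which of the two moves the exponent more.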

This review was generated by AI and reviewed by a human editor.