[Paper Review] Log-normal statistics in e-mail communication patterns
This study reanalyzes e-mail communication patterns in a university population of 3,188 users, demonstrating that interevent times follow a single log-normal distribution and waiting times are best described by a superposition of two log-normals. Using Bayesian model selection, the authors reject truncated power-law fits with exponent ≈1 and critique Barabási's priority queuing model as theoretically implausible, concluding that a robust theoretical description of human e-mail dynamics remains an open problem.
Following up on Barabási's recent letter to Nature (vol. 435, pp. 207–211, 2005), we systematically investigate the time series of e-mail usage for 3,188 users at a university. We focus on two quantities for each user: the time interval between consecutively sent e-mails (interevent time), and the time interval between when a user sends an e-mail and when a recipient sends an e-mail back to the original sender (waiting time). We perform a standard Bayesian model selection analysis that demonstrates that the interevent times are well described by a single log-normal, while the waiting times are better described by the superposition of two log-normals. Our analysis rejects the possibility that either measure could be described by truncated power-law distributions with exponent α ≈ 1. We also critically evaluate the priority queuing model proposed by Barabási to describe the distribution of the waiting times. We show that neither the assumptions nor the predictions of the model are plausible, and conclude that a theoretical description of human e-mail communication patterns remains an open problem.
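The two quantities defined above can be made concrete with a short sketch. The record format `(timestamp_s, sender, recipients)` and the function names are hypothetical, and the waiting-time rule used here (each e-mail from A to B is matched to B's next e-mail to A) is a simplifying assumption, not the authors' exact procedure.

```python
from collections import defaultdict


def interevent_times(log):
    """Gaps between consecutive e-mails sent by the same user.

    log: list of (timestamp_s, sender, recipients), sorted by timestamp
    (hypothetical format, assumed for illustration).
    """
    last_sent = {}
    gaps = defaultdict(list)
    for t, sender, _recipients in log:
        if sender in last_sent:
            gaps[sender].append(t - last_sent[sender])
        last_sent[sender] = t
    return dict(gaps)


def waiting_times(log):
    """Time from an A->B e-mail until B's next e-mail to A (simplified matching rule)."""
    pending = defaultdict(list)  # (A, B) -> send times of A->B e-mails not yet answered by B
    waits = []
    for t, sender, recipients in log:
        for r in recipients:
            # This e-mail from `sender` to `r` answers everything r previously sent to `sender`.
            for t0 in pending.pop((r, sender), []):
                waits.append(t - t0)
        for r in recipients:
            pending[(sender, r)].append(t)
    return waits


log = [(0, "a", ["b"]), (60, "a", ["c"]), (300, "b", ["a"]), (400, "a", ["b"])]
print(interevent_times(log))  # {'a': [60, 340]}
print(waiting_times(log))     # [300, 100]
```

Other matching rules (for example, pairing a reply only with the most recent unanswered e-mail) are equally plausible and would change the waiting-time sample.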
Motivation & Objective
- To reanalyze e-mail communication patterns using rigorous statistical methods.
- To test whether interevent times and waiting times follow truncated power-law distributions with exponent ≈1, as claimed by Barabási.
- To evaluate the plausibility of Barabási's priority queuing model in explaining waiting time distributions.
- To determine whether log-normal distributions provide a better fit to empirical e-mail data than power-law models.
- To identify the fundamental limitations in current theoretical models of human communication dynamics.
Proposed method
- Applied Bayesian model selection to compare the fit of log-normal and truncated power-law distributions to interevent and waiting time data.
- Preprocessed the e-mail data by merging e-mails sent within 5 seconds of each other with identical sizes, eliminating client- and server-side artifacts.
- Used maximum likelihood estimation to fit single and double log-normal distributions to the waiting time data.
- Evaluated the priority queuing model by simulating its transient and steady-state behavior and comparing predictions to empirical data.
- Performed statistical hypothesis testing to reject the power-law model with exponent ≈1 in favor of the double log-normal model.
- Analyzed the model's assumptions and predictions, particularly the convergence of queue priorities to a uniform distribution, to assess theoretical plausibility.
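The preprocessing step above could look roughly like the following sketch. It assumes a hypothetical record `(timestamp_s, sender, size_bytes, recipients)` and compares each e-mail only with the same sender's last kept e-mail. Both choices are assumptions, not details taken from the paper.

```python
def merge_duplicates(log, window=5.0):
    """Merge e-mails from the same sender with identical size sent within `window` seconds.

    log: list of (timestamp_s, sender, size_bytes, recipients), sorted by timestamp
    (hypothetical format). Merged copies keep the earliest timestamp.
    """
    merged = []
    last_index = {}  # sender -> index in `merged` of that sender's last kept e-mail
    for t, sender, size, recipients in log:
        i = last_index.get(sender)
        if i is not None:
            t0, _, size0, rcpts0 = merged[i]
            if t - t0 <= window and size == size0:
                # Treat as a split copy of the same message: union the recipients.
                extra = [r for r in recipients if r not in rcpts0]
                merged[i] = (t0, sender, size0, rcpts0 + extra)
                continue
        last_index[sender] = len(merged)
        merged.append((t, sender, size, list(recipients)))
    return merged


log = [(0, "a", 1200, ["b"]), (3, "a", 1200, ["c"]), (4, "a", 900, ["d"]), (20, "a", 1200, ["b"])]
for row in merge_duplicates(log):
    print(row)
```

Comparing only against the last kept e-mail keeps the pass linear in the length of the log; a rule that chains merges or checks all recent e-mails of the sender would also be defensible.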
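A minimal illustration of the model comparison: a power law with exponent exactly 1, truncated to [a, b], has density p(x) = 1 / (x ln(b/a)), which is equivalent to ln x being uniform, whereas a log-normal means ln x is Gaussian. The sketch below uses BIC as a large-sample approximation to the Bayesian evidence. This is a swapped-in shortcut rather than the full Bayesian model selection the authors performed, and the sharp truncation at a = min(x), b = max(x) (instead of an exponential cutoff) is an assumption.

```python
import math
import random
import statistics


def lognormal_loglik(xs):
    """Maximized log-likelihood of a log-normal fit (closed-form MLE on ln x)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = statistics.fmean(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # MLE divides by n, not n - 1
    # sum over points of: -ln x - ln(sigma) - 0.5 ln(2 pi) - (ln x - mu)^2 / (2 sigma^2)
    return -sum(logs) - n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - 0.5 * n


def powerlaw_alpha1_loglik(xs):
    """Maximized log-likelihood of p(x) = 1 / (x ln(b/a)) on [a, b], with a = min, b = max."""
    a, b = min(xs), max(xs)
    return -sum(math.log(x) for x in xs) - len(xs) * math.log(math.log(b / a))


def bic(loglik, n_params, n):
    return n_params * math.log(n) - 2 * loglik


def compare(xs):
    """Lower BIC is preferred; both models have 2 fitted parameters here."""
    n = len(xs)
    return {
        "lognormal": bic(lognormal_loglik(xs), 2, n),      # mu, sigma
        "powerlaw_a1": bic(powerlaw_alpha1_loglik(xs), 2, n),  # a, b
    }


random.seed(1)
synthetic = [random.lognormvariate(5.0, 1.5) for _ in range(5000)]
print(compare(synthetic))
```

Because both models here have two parameters, the BIC difference reduces to twice the log-likelihood difference; with a free exponent the power law would gain a parameter and need a numerical fit.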
Experimental results
Research questions
- RQ1: Do interevent times in e-mail communication follow a truncated power-law distribution with exponent ≈1, as previously claimed?
- RQ2: Can the distribution of waiting times between e-mail replies be better described by a single log-normal or a superposition of two log-normals?
- RQ3: Is Barabási's priority queuing model capable of reproducing the empirical distribution of waiting times in e-mail communication?
- RQ4: Are the assumptions and long-term behavior of the priority queuing model consistent with observed human e-mail behavior?
- RQ5: What statistical model provides the best fit to empirical e-mail communication patterns, and what does this imply for theoretical modeling of human dynamics?
Key findings
- The interevent times for e-mail users are best described by a single log-normal distribution, not a truncated power-law with exponent ≈1.
- The waiting time distribution is significantly better fit by a superposition of two log-normal distributions than by any power-law model.
- The empirical data reject the hypothesis of a power-law tail with exponent ≈1 beyond any reasonable doubt, based on Bayesian model comparison.
- The priority queuing model proposed by Barabási fails to reproduce the empirical data, as its steady-state behavior predicts a peak at τw = 1 that dominates the distribution, contradicting real e-mail behavior.
- The model's assumptions lead to implausible dynamics, such as queues converging to uniform priority distributions and immediate task execution with high probability, which do not reflect actual human communication patterns.
- The findings indicate that a theoretical description of human e-mail communication dynamics remains an open problem, as current models fail to capture the empirical data accurately.
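A superposition of two log-normals in x is the same as a two-component Gaussian mixture in ln x, so one way to obtain the maximum-likelihood fit is expectation-maximization (EM) on the log-values. The sketch below assumes EM with a crude median-split initialization; the paper does not state that this is how its fit was computed. It also reports BIC for the single (2 parameters) and double (5 parameters) fits on synthetic data.

```python
import math
import random


def _gauss_logpdf(y, mu, sigma):
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def _logsumexp2(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def single_lognormal_loglik(xs):
    ys = [math.log(x) for x in xs]
    n = len(ys)
    mu = sum(ys) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / n)
    # Gaussian log-likelihood of ln x, minus ln x for the Jacobian dy/dx = 1/x.
    return sum(_gauss_logpdf(y, mu, sigma) - y for y in ys)


def fit_double_lognormal(xs, iters=100):
    """EM for w*LN(mu1, s1) + (1 - w)*LN(mu2, s2), run on y = ln x."""
    ys = sorted(math.log(x) for x in xs)
    n = len(ys)
    lo, hi = ys[: n // 2], ys[n // 2 :]  # crude initialization: median split
    w, mu1, mu2 = 0.5, sum(lo) / len(lo), sum(hi) / len(hi)
    s1 = s2 = max(1e-3, (ys[-1] - ys[0]) / 4)
    for _ in range(iters):
        # E-step: responsibility of component 1, computed in log space to avoid underflow.
        r = []
        for y in ys:
            d = (math.log(w) + _gauss_logpdf(y, mu1, s1)) - (math.log(1 - w) + _gauss_logpdf(y, mu2, s2))
            r.append(1 / (1 + math.exp(-d)) if d >= 0 else math.exp(d) / (1 + math.exp(d)))
        # M-step: weighted mixture weight, means and standard deviations.
        n1 = min(max(sum(r), 1e-9), n - 1e-9)
        n2 = n - n1
        w = n1 / n
        mu1 = sum(ri * y for ri, y in zip(r, ys)) / n1
        mu2 = sum((1 - ri) * y for ri, y in zip(r, ys)) / n2
        s1 = max(1e-3, math.sqrt(sum(ri * (y - mu1) ** 2 for ri, y in zip(r, ys)) / n1))
        s2 = max(1e-3, math.sqrt(sum((1 - ri) * (y - mu2) ** 2 for ri, y in zip(r, ys)) / n2))
    # Log-likelihood in x-space: the 1/x Jacobian is shared by both components.
    loglik = sum(
        _logsumexp2(math.log(w) + _gauss_logpdf(y, mu1, s1), math.log(1 - w) + _gauss_logpdf(y, mu2, s2)) - y
        for y in ys
    )
    return (w, mu1, s1, mu2, s2), loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


random.seed(2)
xs = [random.lognormvariate(2.0, 0.5) for _ in range(1200)] + [
    random.lognormvariate(8.0, 1.0) for _ in range(800)
]
params, ll2 = fit_double_lognormal(xs)
ll1 = single_lognormal_loglik(xs)
print("BIC single:", bic(ll1, 2, len(xs)), "BIC double:", bic(ll2, 5, len(xs)))
```

EM can converge to a local optimum, so on real waiting-time data it would be prudent to try several initializations and keep the fit with the highest log-likelihood.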
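The steady-state behavior criticized above can be checked with a small simulation of the priority queuing model as it is usually stated: a list of L tasks with priorities drawn uniformly from [0, 1), where at each step the highest-priority task is executed with probability p and a uniformly random task otherwise, and the executed task is replaced by a new one. The queue length L = 2, the uniform priority distribution, the burn-in, and the value of p are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_priority_queue(p, steps, L=2, burn_in=0, seed=None):
    """Return a Counter mapping waiting time (in steps) to number of executed tasks."""
    rng = random.Random(seed)
    queue = [(rng.random(), 0) for _ in range(L)]  # (priority, arrival step)
    waits = Counter()
    for t in range(1, steps + 1):
        if rng.random() < p:
            i = max(range(L), key=lambda k: queue[k][0])  # highest-priority task
        else:
            i = rng.randrange(L)  # uniformly random task
        _priority, arrived = queue[i]
        if t > burn_in:  # discard the transient before recording
            waits[t - arrived] += 1
        queue[i] = (rng.random(), t)  # replace the executed task with a new one
    return waits


waits = simulate_priority_queue(p=0.999, steps=200_000, burn_in=10_000, seed=0)
total = sum(waits.values())
print("fraction executed at tau_w = 1:", waits[1] / total)
```

With L = 2 and p close to 1, the task left behind is always the lower-priority one, so its priority keeps drifting down and almost every new arrival outranks it and is executed on the next step. The count at τw = 1 therefore dwarfs the tail; the exact fraction depends on p and is not claimed here.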
This review was created by AI and reviewed by human editors.