QUICK REVIEW

[Paper Review] Learning Influence Functions from Incomplete Observations

Xinran He, Ke Xu | arXiv (Cornell University) | Nov 7, 2016
Bayesian Modeling and Causal Inference | 19 citations
TL;DR

This paper proposes a framework for learning influence functions in social networks from incomplete cascade observations, in which node activations are randomly missing. It establishes proper PAC learnability for the DIC and DLT diffusion models by recasting incomplete observations as complete observations in a transformed graph, and improper PAC learnability for the DIC, DLT, and CIC models using a reachability-based feature parametrization with a modified loss function. On real-world data, the method reduces estimation error by nearly 20% relative to the best baseline despite significant missingness.

ABSTRACT

We study the problem of learning influence functions under incomplete observations of node activations. Incomplete observations are a major concern as most (online and real-world) social networks are not fully observable. We establish both proper and improper PAC learnability of influence functions under randomly missing observations. Proper PAC learnability under the Discrete-Time Linear Threshold (DLT) and Discrete-Time Independent Cascade (DIC) models is established by reducing incomplete observations to complete observations in a modified graph. Our improper PAC learnability result applies for the DLT and DIC models as well as the Continuous-Time Independent Cascade (CIC) model. It is based on a parametrization in terms of reachability features, and also gives rise to an efficient and practical heuristic. Experiments on synthetic and real-world datasets demonstrate the ability of our method to compensate even for a fairly large fraction of missing observations.
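The observation model in the abstract can be illustrated with a minimal Python sketch: simulate one DIC cascade, then keep each activation independently with probability r (the retention rate). The function names `simulate_dic` and `observe`, the dictionary graph format, and the choice to always keep seed nodes visible are assumptions made for this illustration, not details taken from the paper.

```python
import random

def simulate_dic(graph, seeds, rng):
    """Run one Discrete-Time Independent Cascade.

    graph maps each node to {neighbor: activation probability}.
    Every newly activated node gets one chance to activate each
    inactive out-neighbor.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def observe(active, seeds, r, rng):
    """Hide activations at random: each non-seed activation is kept
    independently with probability r. Seeds are always kept, which is
    a simplifying assumption of this sketch."""
    seeds = set(seeds)
    kept = {v for v in active if v not in seeds and rng.random() < r}
    return seeds | kept
```

With r = 1 the learner sees the full cascade; as r shrinks, more truly active nodes are recorded as inactive, which is the one-sided missingness the paper sets out to correct.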

Motivation & Objective

  • Address the challenge of learning influence functions when node activations in cascades are incompletely observed, a common issue in real-world social networks.
  • Establish theoretical PAC learnability of influence functions under incomplete observations for widely used diffusion models like DIC and DLT.
  • Design an efficient, practical learning algorithm that compensates for missing activation data without requiring complete observations.
  • Extend the theoretical guarantees to continuous-time models (CIC) and to settings where the retention rate is known only approximately.
  • Demonstrate empirical effectiveness on synthetic and real-world datasets, showing significant improvement over baseline methods.

Proposed method

  • Model incomplete observations as complete observations in a modified graph, where edge weights are adjusted based on the retention rate r to preserve expected influence propagation.
  • Use a reachability feature-based parametrization of influence functions, inspired by Du et al. [3], to represent influence as a function of reachable nodes from seed sets.
  • Optimize a modified loss function, based on Natarajan et al. [17], that treats a hidden activation as a one-sided label error and uses the retention rate r to correct for it.
  • Prove proper PAC learnability for DIC and DLT models by reducing incomplete observation learning to complete observation learning in a transformed graph.
  • Establish improper PAC learnability for DIC, DLT, and CIC models via the reachability feature approach, even when marginalizing over hidden variables is computationally infeasible.
  • Provide sample complexity bounds that depend on the inverse of the retention rate r, so that missing data increases the required sample size only moderately.
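The modified-loss idea in the list above can be sketched in Python under a stated assumption: that the correction takes the standard unbiased-estimator form used for learning with noisy labels, specialized to the case where a truly active node is recorded as inactive with probability 1 − r and inactive nodes are never mislabeled. The names `log_loss`, `modified_loss`, and `expected_modified_loss` are hypothetical, and the exact loss used in the paper may differ.

```python
import math

def log_loss(label, p):
    """Plain log loss for a predicted activation probability p."""
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def modified_loss(observed, p, r):
    """Loss evaluated on the *observed* label, corrected for missingness.

    An observed activation is reweighted by 1/r and offset by the
    inactive loss; an observed non-activation keeps the plain
    inactive loss.
    """
    if observed == 1:
        return (log_loss(1, p) - (1.0 - r) * log_loss(0, p)) / r
    return log_loss(0, p)

def expected_modified_loss(true_label, p, r):
    """Average of modified_loss over the random hiding of an activation."""
    if true_label == 1:
        # Seen as active with probability r, as inactive otherwise.
        return r * modified_loss(1, p, r) + (1.0 - r) * modified_loss(0, p, r)
    # Inactive nodes are always observed as inactive.
    return modified_loss(0, p, r)
```

In expectation over the hiding process, the corrected loss equals the log loss on the true label, which is why minimizing it on incomplete data can stand in for minimizing the ordinary loss on complete data. The 1/r factor also hints at why smaller retention rates call for more samples.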

Experimental results

Research questions

  • RQ1: Can influence functions be properly PAC-learned under incomplete observations where node activations are randomly missing?
  • RQ2: Does the sample complexity of learning influence functions increase significantly when observations are incomplete?
  • RQ3: Can an efficient, practical learning algorithm be designed for influence function learning under incomplete observations, especially for continuous-time models like CIC?
  • RQ4: How robust is the method to uncertainty in the retention rate r, especially when r is not known exactly?
  • RQ5: To what extent can the method compensate for large fractions of missing activations in real-world cascades?

Key findings

  • The paper establishes proper PAC learnability of influence functions under the DIC and DLT models with sample complexity Õ(r̄²n³m/ε²), showing that incomplete observations only moderately increase the required sample size.
  • For the CIC model, improper PAC learnability is achieved via reachability feature parametrization and a modified loss function, extending theoretical guarantees beyond discrete-time models.
  • The method reduces estimation error by nearly 20% compared to the best baseline on the MemeTracker real-world dataset, even with substantial missing data.
  • The approach remains robust to misestimation of the retention rate r, with performance stable under moderate relative uncertainty (e.g., η ≤ 0.2, where η bounds the relative deviation from the estimated rate r̄).
  • Theoretical results extend to cases where the true retention rate lies within a known interval I = [r̄(1−η), r̄(1+η)], with an additive error term that depends on η and remains small when the uncertainty is small.
  • Empirical results confirm that performance is not significantly degraded when the true retention rate per node is independently perturbed around the estimated mean rate.
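As a small worked example of the uncertainty interval I = [r̄(1−η), r̄(1+η)] from the findings above, the Python sketch below computes the interval and draws an independent retention rate per node from it. Capping the upper end at 1 and sampling uniformly are assumptions of this sketch; the review does not state how the per-node perturbation was drawn. The names `retention_interval` and `perturbed_rates` are hypothetical.

```python
import random

def retention_interval(r_bar, eta):
    """Return I = [r_bar*(1-eta), r_bar*(1+eta)], upper end capped at 1."""
    if not (0.0 < r_bar <= 1.0 and 0.0 <= eta < 1.0):
        raise ValueError("expected 0 < r_bar <= 1 and 0 <= eta < 1")
    return r_bar * (1.0 - eta), min(1.0, r_bar * (1.0 + eta))

def perturbed_rates(nodes, r_bar, eta, rng):
    """Draw one retention rate per node, independently and uniformly from I."""
    lo, hi = retention_interval(r_bar, eta)
    return {v: rng.uniform(lo, hi) for v in nodes}
```

For r̄ = 0.8 and η = 0.2, the interval is approximately [0.64, 0.96], so each node's true retention rate may differ from the estimate by up to 20% in relative terms.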


This review was created by AI and reviewed by human editors.