QUICK REVIEW

[Paper Review] Oracle inequalities for the Lasso in the high-dimensional multiplicative Aalen intensity model

Sarah Lemler|arXiv (Cornell University)|Jun 25, 2012

Statistical Methods and Inference | 1 reference | 5 citations

TL;DR

This paper proposes a data-driven weighted Lasso procedure to estimate the conditional intensity in a high-dimensional multiplicative Aalen model, using two dictionaries for baseline hazard and relative risk approximation. It establishes non-asymptotic oracle inequalities in terms of empirical Kullback divergence, leveraging martingale empirical Bernstein inequalities and modified self-concordant functions.

ABSTRACT

In a general counting process setting, we consider the problem of obtaining a prognostic on the survival time adjusted on covariates in high-dimension. Towards this end, we construct an estimator of the whole conditional intensity. We estimate it by the best Cox proportional hazards model given two dictionaries of functions. The first dictionary is used to construct an approximation of the logarithm of the baseline hazard function and the second to approximate the relative risk. We introduce a new data-driven weighted Lasso procedure to estimate the unknown parameters of the best Cox model approximating the intensity. We provide non-asymptotic oracle inequalities for our procedure in terms of an appropriate empirical Kullback divergence. Our results rely on an empirical Bernstein's inequality for martingales with jumps and properties of modified self-concordant functions.

Motivation & Objective

To construct an estimator of the whole conditional intensity for high-dimensional survival data in a general counting process setting.
To address the challenge of variable selection and estimation in high-dimensional covariate-adjusted survival models.
To construct a data-driven Lasso procedure that adapts to the underlying sparsity of the intensity model.
To establish non-asymptotic theoretical guarantees for the proposed estimator using empirical divergence measures.
To extend existing results in survival analysis by incorporating modified self-concordant functions and martingale concentration inequalities.

Proposed method

The method constructs an estimator of the conditional intensity by selecting the best Cox proportional hazards model from two dictionaries: one for the log-baseline hazard and one for the relative risk.
A novel data-driven weighted Lasso procedure is introduced to estimate the unknown parameters of the best approximating Cox model.
The procedure is analyzed using an empirical Bernstein inequality for martingales with jumps, which supplies the concentration bounds behind the results.
The analysis also relies on properties of modified self-concordant functions to handle the non-quadratic likelihood criterion.
The estimation error is bounded in terms of an empirical Kullback divergence, providing a non-asymptotic performance guarantee.
The approach combines dictionary-based functional approximation with L1 regularization, so that the estimator targets a sparse approximation of the intensity.
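
For orientation, the estimator described above can be written out as follows. This is a sketch under assumed notation, not a transcription of the paper: the symbols \alpha_0, f_0, \theta_k, f_j, \gamma, \beta, \omega_j, \delta_k, C_n, Y_i, N_i and \tau are introduced here for illustration, and the exact normalization and the definition of the data-driven weights should be taken from the paper itself.

```latex
% Multiplicative Aalen intensity of subject i, with at-risk process Y_i (assumed notation)
\lambda_0(t, Z_i)\, Y_i(t), \qquad \lambda_0(t, z) = \alpha_0(t)\, e^{f_0(z)}

% Cox-type candidate built from two dictionaries:
% \theta_1, ..., \theta_N for the log-baseline hazard, f_1, ..., f_M for the relative risk
\log \lambda_{\beta,\gamma}(t, z) = \sum_{k=1}^{N} \gamma_k\, \theta_k(t) + \sum_{j=1}^{M} \beta_j\, f_j(z)

% Empirical negative log-likelihood of the counting processes N_i on [0, \tau]
C_n(\lambda) = -\frac{1}{n} \sum_{i=1}^{n} \left\{ \int_0^{\tau} \log \lambda(t, Z_i)\, dN_i(t)
               - \int_0^{\tau} \lambda(t, Z_i)\, Y_i(t)\, dt \right\}

% Weighted Lasso: one weighted L1 penalty per dictionary
(\hat\beta, \hat\gamma) \in \arg\min_{\beta, \gamma}
  \left\{ C_n(\lambda_{\beta,\gamma}) + \sum_{j=1}^{M} \omega_j |\beta_j| + \sum_{k=1}^{N} \delta_k |\gamma_k| \right\}
```

Under this reading, the weights \omega_j and \delta_k play the role of coordinate-specific tuning parameters, which is where a data-driven calibration would enter.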

Experimental results

Research questions

RQ1: How can we consistently estimate the conditional intensity in a high-dimensional multiplicative Aalen model with covariates?
RQ2: What is the optimal rate of convergence for a Lasso-based estimator in this nonparametric survival model?
RQ3: Can we derive non-asymptotic oracle inequalities for a data-driven Lasso procedure in the context of counting processes?
RQ4: How do martingale concentration inequalities with jumps contribute to the theoretical analysis of survival models?
RQ5: What role do modified self-concordant functions play in controlling the estimation error in high-dimensional intensity models?

Key findings

The proposed weighted Lasso procedure satisfies non-asymptotic oracle inequalities stated in terms of an empirical Kullback divergence.
The theoretical guarantees are obtained through martingale concentration rather than through sub-Gaussian error assumptions.
Properties of modified self-concordant functions are used to control the non-quadratic likelihood criterion in high-dimensional settings.
The empirical Bernstein inequality for jump-martingales is instrumental in deriving sharp deviation bounds for the estimation error.
The data-driven weighting scheme enhances adaptivity, allowing the procedure to perform well under unknown sparsity levels.
The results are non-asymptotic, so they apply at finite sample sizes.
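
To make the role of coordinate-specific weights concrete, here is a minimal Python sketch of a weighted Lasso solved by proximal gradient descent. It is not the paper's procedure: a quadratic loss stands in for the Cox-type likelihood, the weights are fixed constants rather than data-driven quantities, and the names `weighted_soft_threshold` and `weighted_lasso_ista` are hypothetical. Only the weighted soft-thresholding step is meant to carry over.

```python
import numpy as np


def weighted_soft_threshold(v, thresholds):
    """Shrink each coordinate of v toward zero by its own threshold."""
    return np.sign(v) * np.maximum(np.abs(v) - thresholds, 0.0)


def weighted_lasso_ista(grad, x0, weights, step, n_iter=500):
    """Proximal gradient descent for: smooth_loss(x) + sum_j weights[j] * |x[j]|.

    grad    -- callable returning the gradient of the smooth loss
    weights -- one non-negative penalty weight per coordinate
    step    -- step size, at most 1 / (Lipschitz constant of grad)
    """
    x = np.asarray(x0, dtype=float).copy()
    w = np.asarray(weights, dtype=float)
    for _ in range(n_iter):
        # Gradient step on the smooth part, then the weighted L1 proximal map.
        x = weighted_soft_threshold(x - step * grad(x), step * w)
    return x


# Stand-in problem (assumption): sparse least squares instead of a Cox likelihood.
rng = np.random.default_rng(0)
n, p = 200, 10
A = rng.normal(size=(n, p))
x_true = np.zeros(p)
x_true[:2] = [2.0, -1.5]
b = A @ x_true + 0.1 * rng.normal(size=n)


def grad(x):
    # Gradient of the quadratic loss (1 / (2n)) * ||A x - b||^2.
    return A.T @ (A @ x - b) / n


# The gradient is Lipschitz with constant ||A||_2^2 / n.
step = n / np.linalg.norm(A, 2) ** 2

# Illustrative constant weights; in the paper they are data-driven.
weights = np.full(p, 0.1)
x_hat = weighted_lasso_ista(grad, np.zeros(p), weights, step)

# A very large weight on every coordinate forces the all-zero solution.
x_null = weighted_lasso_ista(grad, np.zeros(p), np.full(p, 100.0), step)
```

Giving each coordinate its own threshold is what lets a weighting scheme penalize noisier coordinates more heavily than well-estimated ones, instead of tuning a single global penalty level.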

This review was created by AI and reviewed by human editors.