QUICK REVIEW

[Paper Review] Agnostic Lane Detection

Yuenan Hou|arXiv (Cornell University)|May 2, 2019

Autonomous Vehicle Technology and Safety|13 references|21 citations

TL;DR

This paper proposes an agnostic lane detection framework that treats lane detection as an instance segmentation task, enabling robust performance across variable lane counts and lane-changing scenarios. By combining multi-task learning (drivable area and lane point regression), feature pyramids, and a lightweight ENet backbone, the method achieves real-time inference with state-of-the-art efficiency and competitive accuracy on TuSimple and CULane benchmarks.

ABSTRACT

Lane detection is an important yet challenging task in autonomous driving, which is affected by many factors, e.g., light conditions, occlusions caused by other vehicles, irrelevant markings on the road and the inherent long and thin property of lanes. Conventional methods typically treat lane detection as a semantic segmentation task, which assigns a class label to each pixel of the image. This formulation heavily depends on the assumption that the number of lanes is pre-defined and fixed and no lane changing occurs, which does not always hold. To make the lane detection model applicable to an arbitrary number of lanes and lane changing scenarios, we adopt an instance segmentation approach, which first differentiates lanes and background and then classifies each lane pixel into a lane instance. Besides, a multi-task learning paradigm is utilized to better exploit the structural information and the feature pyramid architecture is used to detect extremely thin lanes. Three popular lane detection benchmarks, i.e., TuSimple, CULane and BDD100K, are used to validate the effectiveness of our proposed algorithm.

Motivation & Objective

To address the limitations of conventional lane detection methods that assume a fixed number of lanes and fail during lane changes.
To improve generalization and robustness under challenging conditions such as occlusions, poor lighting, and ambiguous markings.
To achieve real-time inference by leveraging a lightweight network backbone (ENet) and efficient architecture design.
To exploit structural and contextual information from drivable areas and vanishing points through multi-task learning.
To enable detection of extremely thin lanes using a feature pyramid architecture.

Proposed method

The method decomposes lane detection into two sub-tasks: binary segmentation (lane vs. background) and instance classification (assigning each lane pixel to a unique lane instance).
A multi-task learning paradigm integrates three heads: binary segmentation, drivable area detection, and lane point regression to enhance structural awareness.
The feature pyramid network (FPN) is used to detect thin lanes by fusing multi-scale features from the backbone.
The ENet backbone is adopted for real-time performance, minimizing model parameters and inference time.
Instance-level predictions are generated via pixel embeddings followed by clustering to group pixels into distinct lane instances.
Loss functions combine binary cross-entropy for segmentation, focal loss for instance classification, and smooth L1 loss for point regression.
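The pixel-embedding and clustering step described above can be illustrated with a minimal sketch. It assumes the network has already produced a binary lane mask and a per-pixel embedding map; the greedy radius-based grouping, the function name cluster_lane_pixels, and the radius value are illustrative assumptions, not the clustering procedure or settings confirmed by the paper.

```python
import numpy as np

def cluster_lane_pixels(embeddings, lane_mask, radius=0.5):
    """Group lane pixels into instances by greedy clustering in embedding space.

    embeddings: (H, W, D) float array of per-pixel embedding vectors.
    lane_mask:  (H, W) bool array, e.g. from a binary segmentation head.
    radius:     hypothetical distance threshold (not a value from the paper).
    Returns an (H, W) int array: 0 = background, 1..K = lane instance ids.
    """
    labels = np.zeros(lane_mask.shape, dtype=np.int64)
    coords = np.argwhere(lane_mask)      # (N, 2) row/col of lane pixels
    feats = embeddings[lane_mask]        # (N, D) embeddings, same pixel order
    unassigned = np.ones(len(feats), dtype=bool)
    next_id = 1
    while unassigned.any():
        # Seed a new cluster at the first pixel that has no instance yet.
        seed = feats[np.argmax(unassigned)]
        near = unassigned & (np.linalg.norm(feats - seed, axis=1) < radius)
        # One refinement step: re-centre on the mean of the seed's neighbours.
        center = feats[near].mean(axis=0)
        member = unassigned & (np.linalg.norm(feats - center, axis=1) < radius)
        labels[coords[member, 0], coords[member, 1]] = next_id
        unassigned &= ~member
        next_id += 1
    return labels

# Toy example: two vertical lanes whose embeddings are far apart.
H, W = 4, 6
mask = np.zeros((H, W), dtype=bool)
mask[:, 1] = True
mask[:, 4] = True
emb = np.zeros((H, W, 2))
emb[:, 4] = [3.0, 3.0]
labels = cluster_lane_pixels(emb, mask)
print(labels)
```

Because instance ids are assigned by grouping embeddings rather than by a fixed set of output classes, the number of lanes does not have to be known in advance, which is the property the instance segmentation formulation is meant to provide.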

Experimental results

Research questions

RQ1: Can lane detection be made robust to variable numbers of lanes and lane-changing scenarios by moving beyond fixed-class semantic segmentation?
RQ2: How can structural and contextual cues from drivable areas and vanishing points improve lane detection performance?
RQ3: Can a lightweight network like ENet achieve real-time inference while maintaining high accuracy on complex urban road scenarios?
RQ4: To what extent does feature pyramid architecture improve detection of thin or fragmented lane markings?
RQ5: Does multi-task learning with joint supervision from drivable areas and lane points enhance generalization and robustness?

Key findings

On the TuSimple benchmark, the proposed ENet-based model achieved 96.29% accuracy, comparable to more complex models like SCNN (96.53%) despite having only 0.98M parameters.
On CULane, the model achieved an overall F1-measure of 68.8%, below ResNet-101 (70.8%) and SCNN (71.6%) in accuracy, but it outperforms both in parameter efficiency and inference speed.
The model demonstrated superior inference efficiency, with a running time of 13.4ms on CULane, significantly faster than SCNN (133.5ms) and ResNet-101 (171.2ms).
The model achieved strong performance in challenging categories such as night (61.4%), shadows (63.4%), and no-line conditions (42.9%), indicating robustness to visual degradation.
The ablation study confirmed that multi-task learning with drivable area and point regression improved performance across all CULane categories.
The feature pyramid architecture contributed to better detection of thin lanes, particularly in crowded and complex scenes.
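To put the reported running times in perspective, the short calculation below converts them into frames per second and relative speedups. The numbers are the ones quoted in the findings above; the helper names are hypothetical, and the results only hold for whatever hardware those timings were measured on, which this review does not state.

```python
# Running times in milliseconds, as quoted in the key findings above.
runtime_ms = {
    "ENet-based model": 13.4,
    "SCNN": 133.5,
    "ResNet-101": 171.2,
}

def frames_per_second(ms_per_frame):
    """Convert a per-frame running time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame

def speedup(baseline_ms, model_ms):
    """How many times faster the model is than the baseline."""
    return baseline_ms / model_ms

ours = runtime_ms["ENet-based model"]
for name, ms in runtime_ms.items():
    print(f"{name}: {frames_per_second(ms):.1f} FPS, "
          f"{speedup(ms, ours):.1f}x the ENet-based model's running time")
```

Under these figures the ENet-based model runs at roughly 75 frames per second, about 10 times faster than SCNN and about 12.8 times faster than ResNet-101.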


This review was created by AI and reviewed by human editors.