QUICK REVIEW

[Paper Review] Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN

Yajie Miao | arXiv (Cornell University) | Jan 27, 2014

Speech Recognition and Synthesis | 12 references | 64 citations

TL;DR

This paper presents open-source recipes for building complete deep neural network (DNN)-based automatic speech recognition (ASR) systems using the Kaldi toolkit and PDNN, a lightweight deep learning toolkit built on Theano. The recipes cover DNN hybrid, convolutional neural network (CNN), and bottleneck feature systems. They are based directly on the Kaldi Switchboard 110-hour setup and are designed to be adapted to new datasets.

ABSTRACT

The Kaldi toolkit is becoming popular for constructing automated speech recognition (ASR) systems. Meanwhile, in recent years, deep neural networks (DNNs) have shown state-of-the-art performance on various ASR tasks. This document describes our open-source recipes to implement fully-fledged DNN acoustic modeling using Kaldi and PDNN. PDNN is a lightweight deep learning toolkit developed under the Theano environment. Using these recipes, we can build up multiple systems including DNN hybrid systems, convolutional neural network (CNN) systems and bottleneck feature systems. These recipes are directly based on the Kaldi Switchboard 110-hour setup. However, adapting them to new datasets is easy to achieve.

Motivation & Objective

To streamline the development of DNN-based ASR systems by combining Kaldi’s robust ASR pipeline with PDNN’s deep learning capabilities.
To provide reusable, open-source recipes for training DNN acoustic models using Kaldi and PDNN on standard benchmarks.
To enable researchers and practitioners to easily adapt the system to new datasets beyond the Switchboard 110-hour setup.
To demonstrate the effectiveness of multiple DNN architectures—including hybrid, CNN, and bottleneck feature systems—within a unified framework.

Proposed method

Leverages the Kaldi ASR toolkit as the core pipeline for feature extraction, decoding, and system training.
Integrates PDNN, a lightweight deep learning library built on Theano, to implement DNN acoustic models.
Uses the Switchboard 110-hour dataset as the base training setup for all recipes.
Supports multiple model types: DNN hybrid systems, convolutional neural networks (CNNs), and bottleneck feature-based systems.
Employs standard deep learning components such as rectified linear units (ReLUs), dropout regularization, and mini-batch stochastic gradient descent.
Provides modular, script-based recipes that allow easy adaptation to new datasets through parameterized configuration files.
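
The training components named in the list above (ReLUs, dropout, mini-batch SGD) can be illustrated with a minimal pure-Python sketch. It is not PDNN code: the function names are hypothetical, and the "inverted" dropout variant shown here (rescaling the surviving units during training) is an assumption chosen for illustration, not a statement about how PDNN implements dropout.

```python
import random


def relu(values):
    """Rectified linear unit: negative activations are clamped to zero."""
    return [max(0.0, v) for v in values]


def dropout(values, drop_prob, rng):
    """Inverted dropout (assumed variant): zero each unit with probability
    drop_prob and rescale the survivors by 1 / (1 - drop_prob)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def minibatches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def sgd_step(weights, gradients, learning_rate):
    """One plain SGD update: w <- w - learning_rate * g."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = relu([-1.0, 0.5, 2.0])
    print(hidden)
    print(dropout(hidden, 0.5, rng))
    print([len(batch) for batch in minibatches(list(range(10)), 4)])
    print(sgd_step([1.0, 2.0], [0.5, -1.0], 0.1))
```

In a real system these operations act on matrices of frame-level features rather than Python lists; the sketch only shows what each component does.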

Experimental results

Research questions

RQ1: Can a unified framework combining Kaldi and PDNN effectively support diverse DNN architectures in ASR?
RQ2: How well do DNN-based systems built with Kaldi+PDNN perform on standard benchmarks like Switchboard 110 hours?
RQ3: To what extent can the provided recipes be generalized and adapted to new datasets beyond the original setup?
RQ4: What is the performance gain of using CNNs or bottleneck features compared to standard DNN hybrids in this framework?

Key findings

The Kaldi+PDNN framework implements multiple DNN-based ASR systems, including DNN hybrids, CNNs, and bottleneck feature systems, within a single set of recipes.
The recipes are directly based on the Kaldi Switchboard 110-hour setup, enabling reproducible and comparable results across different model types.
The system demonstrates that PDNN can be effectively integrated into Kaldi for training complex DNN models with minimal overhead.
Adapting the recipes to new datasets is straightforward due to the modular and parameterized design of the training scripts.
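
As background for the hybrid systems mentioned in the findings above: in the standard DNN-HMM hybrid formulation, the network's state posteriors are divided by state priors to give scaled likelihoods for the decoder. The sketch below shows that arithmetic in the log domain. It describes the general hybrid convention, not the paper's specific scripts, and the function names and toy numbers are hypothetical.

```python
import math


def log_priors_from_counts(state_counts):
    """Estimate log state priors from per-state frame counts
    (for example, counts taken from a training alignment)."""
    if any(count <= 0 for count in state_counts):
        raise ValueError("every state needs a positive count")
    total = sum(state_counts)
    return [math.log(count / total) for count in state_counts]


def scaled_log_likelihoods(log_posteriors, log_priors):
    """log p(x|s) + const = log p(s|x) - log p(s), computed per state."""
    if len(log_posteriors) != len(log_priors):
        raise ValueError("posterior and prior vectors must have equal length")
    return [post - prior for post, prior in zip(log_posteriors, log_priors)]


if __name__ == "__main__":
    # Toy example with three HMM states (illustrative numbers only).
    priors = log_priors_from_counts([1, 1, 2])
    posteriors = [math.log(p) for p in (0.5, 0.25, 0.25)]
    print(scaled_log_likelihoods(posteriors, priors))
```

Working in the log domain turns the division into a subtraction and avoids underflow for very small probabilities.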

This review was created by AI and reviewed by human editors.