    Linear Dynamic Model for Continuous Speech Recognition

    Author
    Ma, Tao
    Item Type
    Dissertation
    Advisor
    Prasad, Saurabh
    Committee
    Fowler, James E.
    Baca, Julie
    Zhang, Haimeng
    Abstract
    Over the past several decades, statistics-based hidden Markov models (HMMs) have become the predominant approach to speech recognition. Under this framework, the speech signal is modeled as a piecewise stationary signal (typically over an interval of 10 milliseconds), and speech features are assumed to be temporally uncorrelated. While these simplifications have enabled tremendous advances in speech processing systems, progress on the core statistical models has stagnated for the past several years. Since machine performance still significantly lags human performance, especially in noisy environments, researchers have been looking beyond the traditional HMM approach. Recent theoretical and experimental studies suggest that exploiting frame-to-frame correlations in a speech signal further improves the performance of automatic speech recognition (ASR) systems. This is typically accomplished by developing an acoustic model that includes higher order statistics or trajectories. Linear Dynamic Models (LDMs) have generated significant interest in recent years due to their ability to model higher order statistics. LDMs use a state space-like formulation that explicitly models the evolution of hidden states using an autoregressive process. This smoothed trajectory model allows the system to better track the speech dynamics in noisy environments. In this dissertation, we develop a hybrid HMM/LDM speech recognizer that effectively integrates these two powerful technologies. This hybrid system is capable of handling large recognition tasks, is robust to noise-corrupted speech data, and mitigates the ill effects of mismatched training and evaluation conditions. This two-pass system leverages the temporal modeling and N-best list generation capabilities of the traditional HMM architecture in a first-pass analysis. In the second pass, candidate sentence hypotheses are re-ranked using a phone-based LDM.
    The Aurora-4 large vocabulary corpus, derived from the Wall Street Journal (WSJ0) corpus, was chosen as the training and evaluation dataset. This corpus is a well-established large vocabulary continuous speech recognition (LVCSR) benchmark with six different noisy conditions. The implementation and evaluation of the proposed hybrid HMM/LDM speech recognizer are the major contributions of this dissertation.
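    The two-pass idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes a one-dimensional LDM (real acoustic features are vectors), scores each phone segment with a standard Kalman filter log-likelihood, and combines the first-pass HMM score with the LDM score as a weighted sum. The names ScalarLDM, rerank, and ldm_weight, the parameter values, and the score combination are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class ScalarLDM:
    """Hypothetical one-dimensional linear dynamic model (illustration only).

    State evolution (autoregressive):  x[t] = f * x[t-1] + w,  w ~ N(0, q)
    Observation:                       y[t] = h * x[t]   + v,  v ~ N(0, r)
    """
    f: float          # state transition coefficient
    h: float          # observation coefficient
    q: float          # state noise variance
    r: float          # observation noise variance
    x0: float = 0.0   # prior mean of the first state
    p0: float = 1.0   # prior variance of the first state

    def log_likelihood(self, ys: Sequence[float]) -> float:
        """Log-likelihood of a frame sequence via the Kalman filter innovations."""
        x, p = self.x0, self.p0
        total = 0.0
        for t, y in enumerate(ys):
            if t > 0:
                # Predict the next hidden state from the previous estimate.
                x = self.f * x
                p = self.f * p * self.f + self.q
            # Innovation (prediction error) and its variance.
            s = self.h * p * self.h + self.r
            e = y - self.h * x
            total += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
            # Correct the state estimate with the observed frame.
            k = p * self.h / s
            x = x + k * e
            p = (1.0 - k * self.h) * p
        return total


# A segment pairs a phone label with the feature frames aligned to it.
Segment = Tuple[str, List[float]]
# A hypothesis is (sentence text, first-pass HMM score, phone segments).
Hypothesis = Tuple[str, float, List[Segment]]


def rerank(nbest: List[Hypothesis],
           models: Dict[str, ScalarLDM],
           ldm_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Re-rank N-best hypotheses; the weighted sum is an assumed combination rule."""
    scored = []
    for text, hmm_score, segments in nbest:
        ldm_score = sum(models[ph].log_likelihood(fr) for ph, fr in segments)
        scored.append((text, hmm_score + ldm_weight * ldm_score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy phone models and frames, invented for this example.
    models = {"a": ScalarLDM(f=0.9, h=1.0, q=0.01, r=0.01),
              "b": ScalarLDM(f=-0.9, h=1.0, q=0.01, r=0.01)}
    frames = [1.0, 0.9, 0.81, 0.73]
    nbest = [("hyp b", -10.0, [("b", frames)]),
             ("hyp a", -11.0, [("a", frames)])]
    for text, score in rerank(nbest, models):
        print(text, round(score, 2))
```

    In this toy setup the smoothly decaying frames fit the dynamics of phone "a" far better than those of phone "b", so the second pass can promote a hypothesis that the first pass ranked lower. A full system would use vector-valued states and observations and parameters trained per phone.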
    Degree
    Doctor of Philosophy
    Major
    Computer Engineering
    College
    Bagley College of Engineering
    Department
    Department of Electrical and Computer Engineering
    URI
    https://hdl.handle.net/11668/19306
    Mississippi State University Libraries
    395 Hardy Rd
    P.O. Box 5408, Mississippi State, MS 39762-5408
    (662) 325-7668
    (662) 325-0011
    (662) 325-8183
    Contact repository admin Report a problem Terms of use Privacy policy Accessibility MSU Legal
     

     
