IJCNN 2000 - Techniques for Combining Hidden Markov Models and Neural Networks for Speech Recognition: A Tutorial
Speaker: Edmondo Trentin (trentin@fbk.eu)
Abstract
Hidden Markov models (HMMs) represent the state-of-the-art approach
to Automatic Speech Recognition (ASR). HMMs are effective in laboratory
tests, but their applicability in real world environments is often constrained
by intrinsic limitations of the models, e.g. non-discriminative training,
a-priori assumptions on their underlying statistical properties, requirement
of a pre-defined feature space, etc. In this respect, Artificial Neural
Networks (ANNs) are a promising alternative. Applied to ASR for over a
decade, ANNs yielded interesting results on reduced-scale tasks, but
they largely failed to deal with long time sequences of speech
signals, due to the difficulty of modeling long-term time dependencies
with "conventional" ANNs. To overcome such problems, hybrid architectures
were proposed, combining HMMs and ANNs within unifying frameworks, exploiting
the advantages of both. Radically different techniques were introduced,
according to the specific role that the ANNs had to play within the hybrid
architecture. This tutorial reviews some basic concepts of ASR, HMMs and
conventional ANNs for ASR. Major HMM/ANN models for ASR are then surveyed,
discussing several architectures, training algorithms and experimental
results from the literature and from our experience. The main classes of combined
HMM/ANN systems include: (1) connectionist emulation of HMMs; (2) connectionist
probability estimation for HMMs; (3) ANNs as acoustic front-ends for HMMs;
(4) connectionist feature extraction with joint HMM/ANN optimization; (5)
vector quantization for discrete HMMs via ANNs; (6) ANNs for "rescoring"
the N-best HMM hypotheses.
Outline of the Tutorial Technical Content
1. Introduction and overview
2. The ASR problem:
2.1 Qualitative definition of the problem
2.2 Application-oriented examples and open questions (review of basic
concepts like speaker (in)dependence, continuous speech vs. isolated
words, vocabulary size, noise tolerance, etc.)
2.3 Formal definition as a classification problem in terms of Bayes'
decision theory
2.4 Feature extraction
2.4.1 Example: Mel Frequency Scaled Cepstral Coefficients
3. Acoustic modeling via HMM:
3.1 Informal introduction to HMMs
3.2 Formal definition of HMM (states, transitions, emission
and initial probabilities, etc.)
3.3 Discrete vs. continuous-density HMMs
3.4 HMMs: the "training" and "decoding" problems (solutions to these
problems, based on the Baum-Welch and Viterbi algorithms, are summarized,
relying on the "trellis" structure)
3.5 Intrinsic limitations of HMMs (non-discriminative training/decoding,
maximum likelihood criterion, fixed form of the emission probability
densities, stochastic independence among acoustic frames, Markovian
assumption on the stochastic process involved, requirement of a
pre-defined feature space, etc.)
4. Brief review of "conventional" ANNs for ASR
4.1 ANNs as labeled graphs
4.2 Learning as optimization of a criterion; generalization
4.3 Summary of major connectionist architectures for ASR
4.4 The problem of dealing with long-term time dependencies
in conventional ANNs
5. Combining HMMs and ANNs
5.1 Motivations and basic ideas
5.2 Classes of HMM/ANN hybrid systems for ASR:
5.2.1 ANNs that emulate HMMs (in a historical perspective, we
start by reviewing the Viterbi Net and the Alpha Net, two
recurrent architectures that attempted to emulate simple
left-to-right HMMs for isolated word recognition).
5.2.2 Connectionist probability estimation for HMMs (basically
Bourlard and Morgan's approach, where MLPs are used to
estimate the posterior probability of HMM states instead
of the usual Gaussian emission probabilities; variants on
this approach are also discussed).
5.2.3 ANNs as acoustic front-ends for HMMs (in speaker
normalization and channel compensation, ANNs are trained
to perform a transformation of the feature vectors to be
fed into the HMM; particular attention is paid to a spectral
mapping approach based on a mixture of recurrent ANNs).
5.2.4 Connectionist feature extraction with joint HMM/ANN
optimization (basically Y. Bengio's approach, where the
ANN is used as a feature extractor for an HMM, but both
models are jointly trained on a global optimization criterion;
a possible extension of this important, novel algorithm to
Bourlard's model is also introduced).
5.2.5 Vector quantization for discrete HMMs via ANNs
(unsupervised, e.g. competitive, ANNs are used to discretize
the acoustic space in order to obtain a finite codebook
of prototypes for discrete HMMs).
5.2.6 Other approaches (other, non-homogeneous architectures
are briefly reviewed, in particular ANNs for "rescoring"
the N-best hypotheses yielded by a standard HMM).
6. Conclusions
6.1 Summary of the tutorial
6.2 Emphasis on major topics
6.3 Some guidelines for future research
6.4 Conclusions
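As a rough illustration of outline items 3.4 and 5.2.2, the sketch below
shows how per-frame state posteriors (such as an MLP might produce) could be
plugged into Viterbi decoding. It is not taken from the tutorial material.
It assumes the commonly used hybrid formulation in which each posterior is
divided by the state prior to obtain a scaled likelihood. The function names
(safe_log, scaled_log_likelihoods, viterbi) and all toy numbers are
hypothetical and chosen only for this example.

```python
import math


def safe_log(x):
    """Natural log that maps 0 to -inf instead of raising ValueError."""
    return math.log(x) if x > 0.0 else -math.inf


def scaled_log_likelihoods(posteriors, priors):
    """Turn posteriors P(q | x_t) into log(P(q | x_t) / P(q)).

    Assumption: every prior is strictly positive. The result is
    proportional to the emission likelihood p(x_t | q) for each frame.
    """
    return [[safe_log(p) - safe_log(pr) for p, pr in zip(frame, priors)]
            for frame in posteriors]


def viterbi(log_init, log_trans, log_emit):
    """Return the best state path and its log score.

    log_init[j]     : log initial probability of state j
    log_trans[i][j] : log transition probability from state i to state j
    log_emit[t][j]  : log emission score of state j at frame t
    """
    n_states = len(log_init)
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []  # backptr[t - 1][j] = best predecessor of state j at frame t
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + log_emit[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Trace the best path backwards through the trellis.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy two-state left-to-right HMM (all values are made up).
init = [1.0, 0.0]
trans = [[0.6, 0.4],
         [0.0, 1.0]]
priors = [0.5, 0.5]
# Hypothetical network outputs: one posterior vector per acoustic frame.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]

log_init = [safe_log(p) for p in init]
log_trans = [[safe_log(p) for p in row] for row in trans]
log_emit = scaled_log_likelihoods(posteriors, priors)

path, score = viterbi(log_init, log_trans, log_emit)
print(path)  # [0, 0, 1, 1]
```

Working in the log domain avoids numerical underflow on long frame sequences,
and forbidden transitions are simply represented as -inf.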
Schedule
This Tutorial was selected as part of the Preliminary Technical
Program of IJCNN 2000, to be held in Como (Italy) from 24 to 27
July 2000. It is scheduled as Tutorial #5 on Saturday afternoon, 22 July
2000. It will take about four hours, including breaks and questions. Please
refer to the IJCNN 2000 Official Site (click "Technical program" on the
left side of that page) for up-to-date details and news.
Registration
To attend the Tutorial you first need a regular registration for
the Conference. In addition, a specific registration for each Tutorial
is necessary. You can register for any tutorial you like by using the conference
registration form. The Tutorial will be held only if a minimum number of
registered attendees is reached by 1 July 2000. Please refer to the
IJCNN 2000 Registration page for detailed information, deadlines, fees and
student grants (on the same page you will also find tourist information and
hotel reservation forms), as well as for your actual registration.
I am organizing a Special Session on Hybrid Systems for Automatic Speech
Recognition at IJCNN 2000.