Speech recognition is the translation of human speech into text by computers. In this research review we examine three different methods used in the speech recognition field and investigate the accuracy they achieve on different data sets. We analyze state-of-the-art deep neural networks, which have evolved into complex architectures and achieve significant results in many cases. Afterward, we explain convolutional neural networks and explore their potential in this field. Finally, we present recent research on highway deep neural networks, which appear to be more flexible for resource-constrained platforms. Overall, we critically compare these methods and show their strengths and limitations. We conclude that each method has its advantages as well as its weaknesses, and that they are used for different purposes.
I. Introduction
Machine Learning (ML) is a field of computer science that gives computers the ability to learn through different algorithms and techniques without being explicitly programmed. Automatic speech recognition (ASR) is closely related to ML because it uses ML methodologies and procedures [1, 2, 3]. ASR has been around for decades, but only recently has it seen tremendous development, thanks to advances in both machine learning methods and computer hardware. New ML techniques made speech recognition accurate enough to be useful outside of carefully controlled environments, so it can now easily be deployed in many electronic devices (e.g. computers, smart-phones).
Speech is the most important mode of communication between human beings, so from the early part of the previous century efforts have been made to make computers do what only humans could perceive. Research has been conducted over the past five decades, driven mainly by the desire to automate tasks using machines [2]. Many influences, from the field of machine learning and the perspective of probabilistic modeling and reasoning to neural networks, affected researchers and helped to advance ASR.
The first major advance in the history of ASR was the introduction of the expectation-maximization (EM) algorithm for training Hidden Markov Models (HMMs). The EM technique made it possible to develop the first speech recognition systems using Gaussian Mixture Models (GMMs). Despite all their advantages, GMMs are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space. This problem can be addressed by artificial neural networks.
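To make the GMM/EM step concrete, the following is a minimal sketch of fitting a GMM to acoustic feature frames with EM, as done in classic GMM-based acoustic models. It assumes scikit-learn and NumPy are available; the array `mfcc_frames`, the mixture size, and the feature dimension are illustrative placeholders, not values from any system discussed here.

```python
# Minimal sketch: fitting a GMM to acoustic feature frames with EM,
# as in a classic GMM-based acoustic model. Assumes scikit-learn;
# `mfcc_frames` is a hypothetical (n_frames, n_features) array of
# MFCC features for one phone class.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
mfcc_frames = rng.normal(size=(1000, 13))  # placeholder for real MFCC frames

# EM estimates the mixture weights, means and (diagonal) covariances.
gmm = GaussianMixture(n_components=8, covariance_type="diag", max_iter=100)
gmm.fit(mfcc_frames)

# Per-frame log-likelihoods, used as acoustic scores in a GMM/HMM system.
log_likelihoods = gmm.score_samples(mfcc_frames)
print(log_likelihoods.shape)  # (1000,)
```

The diagonal covariance choice mirrors common practice in GMM acoustic modeling, where full covariances are often too expensive to estimate reliably.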
Most speech recognition systems use a neural network / hidden Markov model (NN/HMM) hybrid architecture, first investigated in the early 1990s [4]. However, computer hardware did not allow training more complex networks, such as deep neural networks (DNNs), until the early 2000s. Over the last years, improvements in computer hardware and the invention of new machine learning algorithms have made training DNNs feasible. DNNs with many hidden layers have been shown to outperform GMMs on a variety of speech recognition benchmarks [5]. Other, more complex neural architectures, such as recurrent neural networks with long short-term memory units (LSTM-RNNs) and convolutional neural networks (CNNs), have their own benefits and applications.
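As a rough illustration of the hybrid NN/HMM idea, the sketch below shows a small feed-forward DNN that maps a context window of acoustic frames to posteriors over HMM states. It assumes PyTorch; the layer sizes, context width, and senone count are illustrative assumptions rather than the configuration of any particular system cited above.

```python
# Minimal sketch of a hybrid NN/HMM acoustic model: a feed-forward DNN
# mapping a context window of acoustic frames to HMM-state (senone)
# posteriors. Assumes PyTorch; all sizes are illustrative.
import torch
import torch.nn as nn

n_context_frames = 11   # e.g. current frame plus 5 frames of context on each side
n_features = 40         # e.g. log-mel filterbank coefficients per frame
n_senones = 2000        # number of tied HMM states (illustrative)

dnn = nn.Sequential(
    nn.Linear(n_context_frames * n_features, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, n_senones),   # logits over HMM states
)

# A batch of 32 stacked frame windows. In a hybrid system the softmax
# posteriors are divided by state priors and passed to the HMM decoder
# as scaled acoustic likelihoods.
x = torch.randn(32, n_context_frames * n_features)
posteriors = torch.softmax(dnn(x), dim=-1)
print(posteriors.shape)  # torch.Size([32, 2000])
```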
