
Recurrent neural networks (RNN) for speech detection

A recurrent neural network (RNN) is a class of artificial neural networks whose connections follow a temporal sequence: through its hidden state, the network can use its previous outputs as inputs at the next time step. Here, we will discuss how recurrent neural networks are used for speech activity detection (SAD).
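The recurrence described above can be sketched with a plain (Elman-style) recurrent layer that outputs one speech probability per frame. This is only a generic illustration and not the CG-LSTM model discussed later; the function name, the weight layout, and the toy weights are all assumptions made for the example.

```python
import math

def rnn_speech_probabilities(frames, w_in, w_rec, b_h, w_out, b_out):
    """Run a tiny recurrent layer over a sequence of feature frames.

    The hidden state computed at one frame is fed back as an input
    at the next frame, which is what gives the network temporal context.
    """
    hidden = [0.0] * len(b_h)
    probabilities = []
    for x in frames:
        new_hidden = []
        for k in range(len(b_h)):
            total = b_h[k]
            total += sum(w_in[k][i] * x[i] for i in range(len(x)))
            total += sum(w_rec[k][j] * hidden[j] for j in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
        logit = b_out + sum(w_out[k] * hidden[k] for k in range(len(hidden)))
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
    return probabilities

# Toy weights (hypothetical): one input feature, one hidden unit.
weights = dict(w_in=[[1.0]], w_rec=[[0.5]], b_h=[0.0], w_out=[1.0], b_out=0.0)

after_activity = rnn_speech_probabilities([[1.0], [0.0], [0.0]], **weights)
after_silence = rnn_speech_probabilities([[0.0], [0.0], [0.0]], **weights)
```

The second frame is identical in both sequences, yet the output differs because the hidden state still carries the effect of the first frame.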

Minimization of FER

You need to compare the performance of the different techniques for extracting the SAD criterion on a frame error rate (FER) minimization task. Check the quality of the optimization method by comparing the results obtained with the performance of a baseline SAD system. You can optimize the different SAD systems using the cost functions corresponding to the FER, i.e. (C1, C′1) with a coefficient α = 0.5 (cf. p. 60). The results obtained with and without a smoothing function (that is, with or without the addition of buffers and the removal of short segments or short pauses) are detailed in Table 1.1 and in Figure 1.1, where you can analyze the distribution of the errors calculated per test file for each system.

Table 1.1 - Performance measured by FER with and without the smoothing function (that is, without the addition of buffers or the removal of short segments or short pauses). RNN models are less dependent on the smoothing function than other models, and the CG-LSTM model provides the best performance.

Table 1.1
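As a minimal sketch of the two notions used in this section, the code below computes a frame error rate and applies a simple smoothing function (removal of short speech segments, filling of short pauses, addition of buffers). It assumes per-frame binary labels (1 = speech, 0 = non-speech); the function names and the duration settings are illustrative assumptions, not the ones used in the study.

```python
from itertools import groupby

def frame_error_rate(reference, hypothesis):
    """Fraction of frames whose label differs from the reference."""
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)

def flip_short_runs(labels, value, min_len):
    """Relabel every run of `value` that is shorter than `min_len` frames."""
    out = []
    for run_value, group in groupby(labels):
        length = len(list(group))
        if run_value == value and length < min_len:
            run_value = 1 - run_value
        out.extend([run_value] * length)
    return out

def add_buffers(labels, pad):
    """Extend every speech segment by `pad` frames on each side."""
    out = list(labels)
    for i, label in enumerate(labels):
        if label == 1:
            for j in range(max(0, i - pad), min(len(labels), i + pad + 1)):
                out[j] = 1
    return out

def smooth(labels, min_speech=2, min_pause=2, pad=0):
    """Hypothetical smoothing chain: drop short segments, fill short pauses, pad."""
    labels = flip_short_runs(labels, 1, min_speech)
    labels = flip_short_runs(labels, 0, min_pause)
    return add_buffers(labels, pad)

# Toy frame labels (made up for illustration).
reference = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
raw_decision = [0, 1, 0, 1, 1, 0, 1, 1, 0, 0]

fer_raw = frame_error_rate(reference, raw_decision)
fer_smoothed = frame_error_rate(reference, smooth(raw_decision))
```

On this toy input the smoothing removes the isolated one-frame detection and fills the one-frame pause, which lowers the FER. A model that already produces temporally consistent decisions has less to gain from this step.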

You can thus see that running the optimization process on targeted data yields much better performance than a generalist system such as gmmSAD. It can also be noted that ANN-based systems perform significantly better than systems based on more traditional SAD criteria (a gain of at least 17%). Among neural techniques, RNNs prove particularly suitable: because they can use the temporal context freely, these models are also able to learn a large part of the smoothing function needed to minimize the FER, which is not the case with the MLP. Finally, the best-performing recurrent network is the CG-LSTM model, which improves the average FER by 2% compared to the standard LSTM model.

You can also compare the performance of CG-LSTM-based SAD systems that take as input the time signal, the complete log-spectrogram, or the MFCCs, with an identical number of neurons. The performance differences are relatively small. These small gaps demonstrate the great versatility of CG-LSTMs, which can be trained successfully directly on the raw signal despite the very high information redundancy of this signal and the need to exploit a broad temporal context, since the audio signal in this program is sampled at 8 kHz, i.e. 8000 samples per second.

Figure 1.1 - Distributions of the FER calculated per file for the different SAD systems with and without smoothing. RNNs are able to largely learn the smoothing function necessary to minimize the FER. The SAD system producing the lowest error rates (leftmost distribution on the graph) is obtained using the CG-LSTM model.

However, it is important to note the difference in processing time depending on the choice of input representation. Considering both processing time and SAD performance, the MFCCs appear to be the best choice of representation.


Table 1.2
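One reason the input representation affects processing time is the number of time steps the recurrent network has to process. The arithmetic below is a rough sketch: the 8 kHz sampling rate comes from the text, while the 10 ms frame hop for frame-based features such as MFCCs is an assumption made for the example.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated in the text

def steps_for_context(context_ms, hop_ms=None, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of time steps needed to cover `context_ms` of audio."""
    if hop_ms is None:
        # Raw waveform input: one time step per audio sample.
        return sample_rate_hz * context_ms // 1000
    # Frame-based features: one time step per frame hop.
    return context_ms // hop_ms

raw_steps = steps_for_context(1000)                 # one second of raw signal
frame_steps = steps_for_context(1000, hop_ms=10)    # assumed 10 ms hop
ratio = raw_steps // frame_steps
```

Under these assumptions, covering the same one-second context takes 80 times more recurrent steps on the raw signal than on frame-based features.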

You can also study the value of each phase of the optimization process to validate the strategy of alternating QPSO and SMORMS3 when optimizing a SAD system that uses a neural network. Table 1.3 presents the results obtained with different optimization processes using QPSO alone, SMORMS3 alone, or a combination of both. Recall that SMORMS3 is only used to optimize the weights of the neural network, and that QPSO, when used after SMORMS3, is only used to optimize the parameters of the decision and smoothing functions. For this test, the SAD system uses the CG-LSTM model to estimate the SAD criterion.

Table 1.3
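For readers unfamiliar with SMORMS3, the gradient-based optimizer mentioned above, here is a minimal sketch of its per-parameter update as it is commonly described. The class name, default learning rate, and toy objective are assumptions for illustration, and the details may differ from the implementation used in the study.

```python
import math

class Smorms3:
    """Minimal per-parameter SMORMS3 update (illustrative sketch)."""

    def __init__(self, n_params, lrate=0.1, eps=1e-16):
        self.lrate = lrate
        self.eps = eps
        self.mem = [1.0] * n_params   # effective memory length
        self.g = [0.0] * n_params     # running mean of the gradient
        self.g2 = [0.0] * n_params    # running mean of the squared gradient

    def step(self, params, grads):
        for i, grad in enumerate(grads):
            r = 1.0 / (self.mem[i] + 1.0)
            self.g[i] = (1.0 - r) * self.g[i] + r * grad
            self.g2[i] = (1.0 - r) * self.g2[i] + r * grad * grad
            # Ratio close to 1 when gradients agree, close to 0 when they are noisy.
            x = self.g[i] ** 2 / (self.g2[i] + self.eps)
            scale = min(self.lrate, x) / (math.sqrt(self.g2[i]) + self.eps)
            params[i] -= grad * scale
            self.mem[i] = 1.0 + self.mem[i] * (1.0 - x)
        return params

# Toy objective (hypothetical): minimize (p - 3)^2, whose gradient is 2 * (p - 3).
params = [0.0]
optimizer = Smorms3(n_params=1)
for _ in range(200):
    params = optimizer.step(params, [2.0 * (params[0] - 3.0)])
```

Because it needs gradients, an optimizer of this kind suits the network weights, whereas parameters of non-differentiable stages are left to a gradient-free method such as QPSO.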

You can see that using either algorithm alone already gives very good results, but alternating the two methods can significantly improve performance. In particular, using QPSO to optimize the parameters of the MFCC extraction chain gives the gradient descent optimization a better starting point. Likewise, you can gain a little more by using QPSO after the gradient descent to readjust the parameters of the decision and smoothing functions.
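The following sketch shows a basic quantum-behaved particle swarm optimization (QPSO) loop applied to a non-differentiable cost, here the FER of a decision threshold on toy scores. It is a textbook-style variant written for illustration; the function names, swarm settings, and data are assumptions, and the study's actual QPSO configuration is not given in this text.

```python
import math
import random

def qpso_minimize(cost, bounds, n_particles=20, n_iter=100, beta=0.75, seed=0):
    """Gradient-free minimization of `cost` inside box `bounds` with basic QPSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_cost = [cost(p) for p in positions]
    best_index = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[best_index][:], pbest_cost[best_index]

    for _ in range(n_iter):
        # Mean of all personal bests, used to scale the random jump.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                phi = rng.random()
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()  # in (0, 1], keeps log(1 / u) finite
                jump = beta * abs(mbest[d] - positions[i][d]) * math.log(1.0 / u)
                x = attractor + jump if rng.random() < 0.5 else attractor - jump
                positions[i][d] = min(hi, max(lo, x))
            c = cost(positions[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = positions[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = positions[i][:], c
    return gbest, gbest_cost

# Toy data (made up): per-frame speech scores and reference labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.2, 0.6]
reference = [0, 0, 0, 1, 1, 1, 0, 1]

def threshold_fer(params):
    """FER obtained when frames with score >= threshold are labeled as speech."""
    threshold = params[0]
    decisions = [1 if s >= threshold else 0 for s in scores]
    return sum(r != d for r, d in zip(reference, decisions)) / len(reference)

best_params, best_cost = qpso_minimize(threshold_fer, [(0.0, 1.0)])
```

The cost here is piecewise constant in the threshold, so gradient descent has nothing to follow, which is the kind of situation where a swarm method is a natural fit.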

Minimization of WER

Since the primary goal is to minimize the word error rate (WER) of an automatic speech recognition (ASR) system, you can also determine the values of the coefficient α that minimize the WER when using the cost functions C1 and C2. Table 3.4 presents the actual WER obtained by the ASR system when the development set files are segmented with a SAD system (CG-LSTM) optimized using the cost function C1 (respectively C2) for different values of the coefficient α.

Table 3.4 - Impact of the coefficient α on the actual WER obtained by the ASR system when the development set files are segmented with a SAD system optimized using the cost function C1 or C2. The actual WER obtained using the cost function C3, which has no hyperparameter, is shown for comparison.

Table 3.4 

You can see that the two cost functions do not behave in the same way at all. For C1, the value of the coefficient α that gives the best error rate is 0.2. This is mainly explained by the fact that the ASR system inserts a lot of words, so it is better to reduce the risk of false alarms even if it means losing a little speech signal. In the case of the cost function C2, on the contrary, the segmentation references already treat the signal that causes insertions as noise, so it is much more important to minimize the risk of losing speech signal, even at the price of some false alarms. This is exactly what we observe, since the coefficient α that gives the best WER is 0.95. Finally, the cost function C3 gives a better WER on the development set than the other two cost functions, regardless of the value of the coefficient α chosen. This shows the advantage of using a cost function such as C3, which takes into account the behavior of the ASR system whose performance you want to improve and which is as close as possible to the target metric. You can also verify that the cost function C3 corresponds well to the actual WER obtained on the development set: for the systems listed in Table 3.4, a linear regression gives a coefficient of determination of 0.98, which shows that the imitation-WER cost function C3 predicts the WER well (see also Figure 1.2).
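The role of α can be illustrated with a weighted cost that trades missed speech against false alarms. The exact definitions of C1 and C2 are not given in this text, so the form below (α times the miss rate plus 1 − α times the false alarm rate) is only an assumption chosen to match the behavior described above: a small α penalizes false alarms more, a large α penalizes lost speech more.

```python
def weighted_sad_cost(reference, hypothesis, alpha):
    """Assumed cost form: alpha * miss rate + (1 - alpha) * false alarm rate."""
    speech = sum(1 for r in reference if r == 1)
    non_speech = len(reference) - speech
    missed = sum(1 for r, h in zip(reference, hypothesis) if r == 1 and h == 0)
    false_alarms = sum(1 for r, h in zip(reference, hypothesis) if r == 0 and h == 1)
    miss_rate = missed / speech if speech else 0.0
    false_alarm_rate = false_alarms / non_speech if non_speech else 0.0
    return alpha * miss_rate + (1.0 - alpha) * false_alarm_rate

# Toy frame labels (made up): 1 = speech, 0 = non-speech.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
cautious = [1, 1, 1, 0, 0, 0, 0, 0]     # loses one speech frame, no false alarm
permissive = [1, 1, 1, 1, 1, 0, 0, 0]   # keeps all speech, one false alarm

low_alpha = (weighted_sad_cost(reference, cautious, 0.2),
             weighted_sad_cost(reference, permissive, 0.2))
high_alpha = (weighted_sad_cost(reference, cautious, 0.95),
              weighted_sad_cost(reference, permissive, 0.95))
```

With α = 0.2 the cautious segmentation has the lower cost, while with α = 0.95 the permissive one does, mirroring the two regimes reported for C1 and C2.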

Figure 1.2 - Correspondence between the imitation-WER and the actual WER. The coefficient of determination of the linear regression is 0.98, which shows that the cost function C3 predicts the WER well.

Once the optimal value of the coefficient α has been chosen, you can optimize the different SAD systems with the three cost functions and then measure the performance obtained on the test set. Table 3.5 compares the WER obtained on the test set for all SAD systems. They are also compared to the reference SAD system gmmSAD, which was also used to extract the speech segments used to train the ASR system.
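A coefficient of determination such as the one reported for Figure 1.2 comes from an ordinary least-squares fit between the predicted and the measured values. The sketch below shows the computation; the numbers are placeholders chosen for the example and are not values from the study.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its coefficient of determination."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Placeholder values (not results from the study).
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.1, 3.9, 6.2, 7.8]

slope, intercept, r_squared = linear_fit_r2(predicted, observed)
```

A value close to 1 means that the cost function ranks and scales the systems almost exactly as the measured metric does, which is what makes it usable as an optimization target.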

You will see that optimizing the SAD systems on representative data, with cost functions that come as close as possible to the metric of interest (through the choice of coefficient α for C1 and C2, and by construction for C3), always reduces the WER compared to the WER obtained with a generalist segmentation. You will also note that the closer the cost function is to the target metric, the greater the gain, regardless of the SAD system considered. The cost function C3, which has no hyperparameter to choose or adjust, therefore emerges as the cost function of choice for optimizing a SAD system so as to minimize the WER.

In addition, as observed in the first experiments, the CG-LSTM model was found to be the best performing of all the SAD systems tested. Indeed, whatever cost function is used, this is the model that gives the lowest WER. By coupling C3 with the CG-LSTM model, you finally obtain a relative gain of 4.5% compared to the WER obtained with the gmmSAD system that was used during the training of the ASR system.

Table 3.5 - Impact of the cost function on the WER obtained with the different SAD systems. The cost function C3 achieves the best performance regardless of the SAD system used, and the CG-LSTM model achieves the best performance regardless of the cost function used. By coupling the two, you obtain a relative gain of 4.5% on the WER compared to the generalist system used to train the ASR system.

Table 3.5
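For reference, a relative gain on the WER is conventionally computed as below, where the subscripts ref and opt are notation introduced here for the reference system (gmmSAD) and the optimized system. It is different from an absolute gain expressed in WER points.

```latex
\text{relative gain} =
  \frac{\mathrm{WER}_{\text{ref}} - \mathrm{WER}_{\text{opt}}}{\mathrm{WER}_{\text{ref}}},
\qquad
\text{absolute gain} = \mathrm{WER}_{\text{ref}} - \mathrm{WER}_{\text{opt}}
```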

As shown in Table 3.6, the reduction of the error rate essentially comes from a sharp reduction in the number of insertions (from 6.4% to 3.6%) between the gmmSAD reference system and the CG-LSTM-based system optimized with the cost function C3. A detailed look at the signal segments that generated insertions but were rejected as non-speech by the CG-LSTM-based system shows that this model is capable of learning to distinguish speech that generates errors from speech that does not.

Table 3.6 - Detailed results of the best tuning for each of the SAD systems. The gains on the WER come mainly from a reduction in the number of insertions.

Table 3.6
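To make the role of insertions concrete, the sketch below computes a WER and its breakdown into substitutions, deletions, and insertions with a standard edit-distance alignment. The function name and the example sentences are made up for illustration; real scoring tools apply additional text normalization.

```python
def wer_breakdown(reference, hypothesis):
    """Word error rate and its components, each relative to the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # table[i][j] = (cost, substitutions, deletions, insertions) for ref[:i] vs hyp[:j]
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, subs, dels, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, subs, dels, ins)
            else:
                diagonal = (cost + 1, subs + 1, dels, ins)
            cost, subs, dels, ins = table[i - 1][j]
            deletion = (cost + 1, subs, dels + 1, ins)
            cost, subs, dels, ins = table[i][j - 1]
            insertion = (cost + 1, subs, dels, ins + 1)
            # Tuples compare on total cost first, so this keeps a cheapest alignment.
            table[i][j] = min(diagonal, deletion, insertion)
    cost, subs, dels, ins = table[-1][-1]
    n = len(ref)
    return {"wer": cost / n, "sub": subs / n, "del": dels / n, "ins": ins / n}

# Hypothetical case: a noise segment passed to the recognizer yields two extra words.
with_noise = wer_breakdown("the cat sat", "the cat sat uh down")
noise_rejected = wer_breakdown("the cat sat", "the cat sat")
```

In this toy case all errors are insertions, and they disappear once the noisy segment is rejected before recognition, which is the mechanism the table attributes the WER gains to.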

As in the case of the FER minimization, you can study the impact of each phase of the optimization process on the final metric to verify the value of a strategy that alternates QPSO and SMORMS3. Table 3.7 presents the WER obtained with different optimization processes using QPSO alone, SMORMS3 alone, or a combination of both. As for the minimization of the FER, you can observe that it is worthwhile to use QPSO to optimize the parameters of the MFCC extraction chain before optimizing the RNN parameters using gradient descent. Likewise, you can further decrease the WER by using QPSO after the gradient descent to readjust the parameters of the decision and smoothing functions.

Table 3.7 - Segmentation performance (WER) when the optimization process is changed. For this test, the SAD system uses the CG-LSTM model to estimate the SAD criterion.

Table 3.7


Table 3.8

You will observe that, whatever SAD system is used, you manage to reduce the WER of an ASR system compared to the WER obtained with a generalist SAD system, even if that system was used to build the ASR system. Among the different SAD techniques tested, the hierarchy of systems is the same as that observed for speech detection, and the CG-LSTM model remains the best performing, enabling a gain of more than 1 WER point on each of the languages processed.

Author: Vicki Lezama
