Site icon Shahad's Blogs

Language Model Integration in Encoder-Decoder Speech Recognition

photo of a motherboard

Photo by Mikhail Nilov on Pexels.com

Currently, Attention-based recurrent Encoder-Decoder models provide an elegant way of building end-to-end models for different tasks, like automatic speech recognition (ASR), machine translation (MT), etc. An end-to-end ASR model folds traditional acoustic model, pronunciation model, and language model (LM) into a single network. An encoder maps the input speech to a sequence of higher-level learned features. On the other hand, the decoder maps these higher-level features to the output text or labels. This also provides an alignment between the speech and text with the help of an attention mechanism. This type of model can learn end-to-end (E2E) and just requires paired speech and text data.

As E2E models require paired speech and text data, it restricts the models to only these texts. Conventional ASR models leverage a separate LM trained on all available text. As a result, this can be of larger magnitude order. To leverage the power of an external language model, several approaches have been taken.

Language Model integration appraoches

We can divide the LM integration approaches into three broader categories. Let’s discuss these briefly.

Integration scenarios

Now the question arises about – when to integrate the LM model? We can consider the following criteria or scenarios:

Other LM integration approaches

We can integrate a language model in other approaches also. Let’s have a look at the following approaches:

We can use these approaches for integrating the language model with an encoder-decoder attention-based model. This article is based on a research work titled – “A comparison of techniques for language model integration in encoder-decoder speech recognition“. You can have a look for more mathematical and implementation details.


That’s all for this post. I hope this post will help you to enrich your understanding of language model integration in an encoder-decoder model. I am a learner who is learning new things and trying to share with others. Let me know your thoughts on this post. Any suggestions or opinions will be highly appreciated. You can reach me through LinkedIn, Facebook, Email, or find me on GitHub. Get more machine learning-related posts here.

Exit mobile version