Speech Recognition combines knowledge from linguistics, acoustics, computer science, and signal processing (a subfield of electrical engineering) to develop methodologies and technologies that allow computer systems to recognize, and act upon, spoken language.
Also commonly referred to as Speech To Text, this interdisciplinary field is typically studied at the post-graduate level in most universities.
A common mistake is to confuse speech recognition with voice recognition. The two are different, though somewhat overlapping, technological areas. In voice recognition, the emphasis is to identify the speaker based on the properties of their speech, whereas in speech recognition, the objective is to understand what the user is saying, and interpret that as content or commands.
In the case of SpeechToDoc Pro, for example, the statement “Select last sentence” is interpreted as a command that applies highlighting to the last sentence. However, the statement “Let’s go to the beach” is transcribed literally as text.
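The command-versus-dictation logic described above can be sketched as a simple dispatcher. This is a minimal illustration only; the command phrases and action names are hypothetical and not SpeechToDoc Pro’s actual API:

```python
# Minimal sketch of a command-vs-dictation dispatcher.
# The command set and action names below are hypothetical examples,
# not SpeechToDoc Pro's real interface.

COMMANDS = {
    "select last sentence": "highlight_last_sentence",
    "new paragraph": "insert_paragraph_break",
}

def interpret(utterance: str) -> tuple[str, str]:
    """Return ("command", action) for a recognized command phrase,
    or ("dictation", text) when the words should be typed literally."""
    normalized = utterance.lower().strip().rstrip(".!?")
    if normalized in COMMANDS:
        return ("command", COMMANDS[normalized])
    return ("dictation", utterance)
```

With this sketch, “Select last sentence” maps to a highlighting action, while “Let’s go to the beach” falls through to literal dictation.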
Cortana, Siri, and Amazon’s Alexa are well-known examples of Speech Recognition assistants. A user can use chatty, conversational language to instruct a computer, tablet, or smartphone to execute a series of tasks that, without Speech Recognition, might have involved an inconvenient set of steps.
For example, with Speech Recognition, a user can speak into their Android phone, “Hey Google, please find me the best Chinese restaurant within 4 miles”. This is more convenient than using the screen as an interface, where the request would have involved several finger gestures and typing.
Many Speech Recognition systems improve with training. Speech Recognition training is also called “enrollment”. As speakers read text into the system, the system analyzes pronunciation styles and syntactical patterns, and becomes more accurate as it gains experience. Systems that use training are called “speaker dependent” systems. Systems that do not use training are called “speaker independent” systems.
From a history perspective, Speech Recognition has benefited enormously from successive waves of major innovations. Most recently, it has benefited from Deep Learning Networks, and Big Data.
The biggest players in the Speech Recognition industry include Google, Microsoft, IBM, Apple, Amazon, Baidu, Nuance, iFlytek, and others. Many of these companies have stated that the core technology in their speech recognition systems is based on deep learning.
Frequently Asked Questions:
What is the difference between speech recognition and voice recognition?
Speech recognition converts spoken words into their text equivalent; if the words are voice commands, it executes them. Voice recognition is a biometric technology used to identify a particular individual by their voice, that is, for speaker identification.
What is speech recognition used for?
Speech recognition is a technology that allows a user to interact with a computer system in a natural, conversational voice mode, in other words, by speaking to the machine just as you would speak to another person. Applications include data entry, menu navigation, document creation, and telephone dialing.
Which algorithms are used in speech recognition?
The algorithms used in speech recognition include PLP features, Viterbi search, deep neural networks, discriminative training, the WFST framework, and others. You can keep track of Google’s latest developments in speech recognition technology by checking their recent publications on speech.
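Of these, Viterbi search is the easiest to illustrate in a few lines: given a hidden Markov model, it finds the most probable sequence of hidden states (e.g., phonemes) for a sequence of acoustic observations. The sketch below uses a toy two-state model whose states and probabilities are invented purely for illustration:

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path through an HMM (log-space Viterbi search)."""
    # V[t][s] = best log-probability of any path ending in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
          for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            best_prev, best_lp = max(
                ((p, V[-2][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            V[-1][s] = best_lp + math.log(emit_p[s][obs])
            new_path[s] = path[best_prev] + [s]
        path = new_path
    best_state = max(states, key=lambda s: V[-1][s])
    return path[best_state]

# Toy model: two hypothetical phoneme states emitting acoustic labels "a"/"b".
states = ("ph1", "ph2")
start_p = {"ph1": 0.6, "ph2": 0.4}
trans_p = {"ph1": {"ph1": 0.7, "ph2": 0.3},
           "ph2": {"ph1": 0.4, "ph2": 0.6}}
emit_p = {"ph1": {"a": 0.5, "b": 0.5},
          "ph2": {"a": 0.1, "b": 0.9}}
```

Real recognizers run this same search over far larger state spaces, typically compiled into weighted finite-state transducers (the WFST framework mentioned above), but the dynamic-programming core is the same.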