You don’t say! Chrome has had support since around 2013, when Chrome 25 was released. Come to think of it, that makes it even stranger that so few websites seem to make use of the native APIs that are supported in Chrome, Firefox and Edge. Disregarding the part of the user base that still uses Internet Explorer, we continued on our journey.
We based our implementation on a repository that accompanies the MDN page (available here). After trimming it down a little, our implementation looked something like the following:
And it worked! This piece of code will ask for your permission to use the microphone after 3 seconds. If you click ‘allow’, it will then listen to how well you pronounce Bahasa Indonesia. Finally, it logs the transcribed results in your console. If your Bahasa is rusty, you can of course change the language code on line 6 to any of the supported languages shown in this demo. Note that these are the languages available in Chrome; other browsers might have a different list.
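For readers who want something concrete to experiment with, here is a minimal sketch of the behaviour described above. It is not the original listing: the helper name `createRecognizer` and the `onTranscript` callback are made up for illustration, and the language tag is passed in as an argument rather than sitting on a particular line. It assumes a browser that exposes `SpeechRecognition` or the prefixed `webkitSpeechRecognition`.

```javascript
// Sketch only: `createRecognizer` and `onTranscript` are hypothetical names,
// not part of the Web Speech API or of the original implementation.
function createRecognizer(SpeechRecognitionCtor, lang, onTranscript) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = lang;             // BCP 47 tag, e.g. "id-ID" for Bahasa Indonesia
  recognition.interimResults = false;  // only report final results
  recognition.maxAlternatives = 1;     // one alternative per result is enough here

  recognition.onresult = (event) => {
    // The last entry in event.results is the most recent result;
    // its first alternative is the most likely transcription.
    const latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript, latest[0].confidence);
  };
  recognition.onerror = (event) => {
    console.error("Speech recognition error:", event.error);
  };
  return recognition;
}

// Browser-only wiring; skipped where there is no window (for example in Node).
if (typeof window !== "undefined") {
  const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (Ctor) {
    const recognition = createRecognizer(Ctor, "id-ID", (text, confidence) => {
      console.log(text, confidence);
    });
    // Start listening after 3 seconds; the browser then prompts for microphone access.
    window.setTimeout(() => recognition.start(), 3000);
  }
}
```

Swapping `"id-ID"` for another tag such as `"en-US"` or `"nl-NL"` changes the language that is recognised, provided the browser supports it.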
The Web Speech API seems to do a pretty good job of transcribing user commands in various languages. By default, the recording will stop as soon as the user stops speaking. If you want to transcribe longer pieces of audio, you can also set recognition.continuous to true. In production environments, you would first want to check support for the SpeechRecognition interface and handle its absence. Also, it might be wise to only start recording after the user clicks a button instead of triggering it with a window.setTimeout(). But I’ll leave that as an exercise for you.
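If you would like a starting point for that exercise, the sketch below shows one possible shape. The function names `getSpeechRecognition` and `attachToButton` and the `onUnsupported` callback are hypothetical; only the properties and methods on the recognition object come from the Web Speech API. The `scope` parameter stands in for `window` so the logic can be exercised outside a browser.

```javascript
// Sketch only: `getSpeechRecognition`, `attachToButton` and `onUnsupported`
// are hypothetical names chosen for this example.
function getSpeechRecognition(scope) {
  // Chrome exposes the constructor with a webkit prefix.
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

function attachToButton(button, scope, onUnsupported) {
  const Ctor = getSpeechRecognition(scope);
  if (!Ctor) {
    onUnsupported(); // e.g. hide the button or fall back to a text input
    return null;
  }

  const recognition = new Ctor();
  recognition.continuous = true; // keep listening after the user pauses

  // Calling start() on a recognizer that is already running throws,
  // so track the state and ignore clicks while listening.
  let listening = false;
  recognition.onend = () => { listening = false; };

  // Start on an explicit user action instead of a timer.
  button.addEventListener("click", () => {
    if (listening) return;
    listening = true;
    recognition.start();
  });
  return recognition;
}
```

In a page, this could be wired up with something like `attachToButton(document.querySelector("#speak"), window, () => console.warn("Speech recognition is not supported"))`, where `#speak` is an assumed button id.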
Speech recognition is a breeze to implement using the Web Speech API. Speech synthesis is even easier to implement, as you can see here. Adding speech recognition to your website can really improve the user experience when it comes to searching or shopping. Inexperienced customers might like to express their intent in natural language, describing what they want to do or find. In traditional UIs, this is often not possible. Most search inputs force you to type in a condensed kind of language that does not even remotely resemble a grammatically correct sentence or action, which is something speech recognition might solve. Speech synthesis can in turn help in the realm of chatbots, notifications or messaging, amongst others. In short, smart assistant technologies can help us not only in our homes or on our phones; we can also use them anywhere on the web where natural language is a good way to communicate with your users. So, how are you going to use it in your app?