How to use Mycroft in a different language

25 Nov, 2019

Mycroft is a voice assistant that can be configured for any language, but it requires language files and, most importantly, a parser for that language. I was listening to some music the other day and one line in a song really struck me: “Praat Nederlands met me” (“Speak Dutch to me”). I have been working on a voice assistant built on Mycroft for quite a while now, but it was always in English. So I thought: “How hard can it be?” Well, it is not that hard, but I learned a lot along the way. I’ll show you the steps you need to take, followed by some insights into how language support is evolving within the Mycroft ecosystem.

Steps to take

Change the language of your “device”

Mycroft is available on a variety of platforms: the Mark 1 physical device (no longer available for purchase), but also a Raspberry Pi or, for that matter, a desktop. I typically experiment on my desktop for ease of development before I promote my experiment to my Raspberry Pi. I followed this section of the Mycroft README, which describes the steps necessary to get Mycroft running on a local machine.

Mycroft gets its configuration from the online environment and accompanying configuration files. To change the language, you need to modify the file mycroft.conf, which is typically located in the .mycroft/ folder in your home directory (see this document for more details). On a fresh installation of Mycroft, this file looks as follows:

{
  "max_allowed_core_version": 19.8
}

To turn your device into a Dutch device, change it to:

{
  "max_allowed_core_version": 19.8,
  "lang": "nl-nl"
}

Restart Mycroft to load the new settings. Other configuration changes can simply be applied by saying “update configuration” to your assistant, but for some reason this change is not picked up that way.
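If you switch languages on more than one machine, you may prefer to script this edit. The following is a minimal sketch using only the Python standard library. It assumes mycroft.conf is plain JSON without comments and lives at the default ~/.mycroft/mycroft.conf; the function name set_language is hypothetical and not part of Mycroft.

```python
import json
from pathlib import Path

# Assumed default location of the user-level config; adjust if yours differs.
DEFAULT_CONF = Path.home() / ".mycroft" / "mycroft.conf"


def set_language(conf_path: Path, lang: str) -> dict:
    """Set the top-level "lang" key in mycroft.conf, keeping all other keys.

    Hypothetical helper: assumes the file is plain JSON (no comments).
    """
    config = {}
    if conf_path.exists():
        config = json.loads(conf_path.read_text(encoding="utf-8"))
    config["lang"] = lang
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    # Write back with two-space indentation, like the examples in this post.
    conf_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling set_language(DEFAULT_CONF, "nl-nl") on the fresh file shown earlier writes the Dutch configuration above. You still need to restart Mycroft afterwards.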

This setting switches the default speech-to-text (STT) engine to recognize Dutch, which works out of the box. The built-in text-to-speech (TTS) engine, however, does not support Dutch (at the time of writing) and will try to synthesize the Dutch phrases with an English voice. I needed to take another step to fix that.

Change the text-to-speech engine to one supporting the new language

Mycroft has a very modular architecture, and one of the pluggable components in that architecture is the TTS engine. It supports various implementations for generating speech; see this up-to-date list for the current options. In my setup I went with the Google version, as it currently requires the least effort to set up.

To enable Google’s TTS engine, change your mycroft.conf file to the following:

{
  "max_allowed_core_version": 19.8,
  "lang": "nl-nl",
  "tts": {
    "module": "google",
    "google": {
      "lang": "nl",
      "slow": false
    }
  }
}

Restart Mycroft again to make sure this configuration change gets processed. Now that Mycroft is able to “understand” and “speak” Dutch, I can start developing skills in Dutch.
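Note that the main "lang" value (nl-nl) and the one in the Google TTS section (nl) take different forms. A small sketch can derive one from the other so the two stay in sync. It assumes, as in the configuration above, that the Google engine only needs the language part of the code; the helper names are hypothetical and not part of Mycroft.

```python
import json


def google_tts_settings(lang: str) -> dict:
    """Build a "tts" section for the Google engine from a code like "nl-nl".

    Assumption: Google TTS only needs the primary language ("nl").
    """
    primary = lang.split("-")[0].lower()
    return {"module": "google", "google": {"lang": primary, "slow": False}}


def with_google_tts(config: dict) -> dict:
    """Return a copy of a mycroft.conf dict with a matching "tts" section."""
    updated = dict(config)
    updated["tts"] = google_tts_settings(config.get("lang", "en-us"))
    return updated


if __name__ == "__main__":
    base = {"max_allowed_core_version": 19.8, "lang": "nl-nl"}
    print(json.dumps(with_google_tts(base), indent=2))
```

Running it prints the same configuration as shown above, so changing only the main "lang" value is enough to regenerate both settings.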

Change your skill to support the new language

Skills in Mycroft are designed to support multiple languages by separating the logic of the skill from the words and phrases used in that skill. A typical skill consists of a Python file defining the logic of the skill, combined with a vocab folder and a dialog folder, each containing one subfolder per language:

hello-world-skill
├── dialog/
│   ├── en-us
│   │   ├── how.are.you.dialog
│   │   └── welcome.dialog
│   └── nl-nl
│       ├── how.are.you.dialog
│       └── welcome.dialog
├── __init__.py
└── vocab
    ├── en-us
    │   ├── HowAreYou.intent
    │   └── ThankYouKeyword.voc
    └── nl-nl
        ├── HowAreYou.intent
        └── ThankYouKeyword.voc
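Because the Python logic only refers to a dialog by name (for example how.are.you), switching languages comes down to reading from a different folder. The sketch below is a simplified, standard-library illustration of that lookup, not Mycroft's actual loader. It assumes each non-empty line in a .dialog file is one alternative phrasing and picks one at random; load_dialog and render_dialog are hypothetical names.

```python
import random
from pathlib import Path
from typing import List


def load_dialog(skill_dir: Path, lang: str, name: str) -> List[str]:
    """Read the phrasings for one dialog from dialog/<lang>/<name>.dialog.

    Each non-empty line is treated as one alternative phrasing.
    """
    path = skill_dir / "dialog" / lang / f"{name}.dialog"
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def render_dialog(skill_dir: Path, lang: str, name: str) -> str:
    """Pick one phrasing at random so the assistant does not always repeat itself."""
    return random.choice(load_dialog(skill_dir, lang, name))
```

The same call, render_dialog(skill, "nl-nl", "welcome"), returns a Dutch sentence instead of an English one purely because of the lang argument; the skill logic itself does not change.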

What I typically do is copy the en-us folders (as they are usually the most complete) to the locale I’m creating (nl-nl in this case). I then go through the files and translate the words and phrases one by one.
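The copying step can be scripted. This minimal sketch assumes the folder layout from the tree above (dialog/&lt;locale&gt; and vocab/&lt;locale&gt;) and deliberately skips any target folder that already exists, so it never overwrites translations you have already made. copy_locale is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import List


def copy_locale(skill_dir: Path, source: str = "en-us", target: str = "nl-nl") -> List[Path]:
    """Copy dialog/<source> and vocab/<source> to <target> as a translation starting point.

    Existing target folders are left untouched so no translations are overwritten.
    Returns the folders that were created.
    """
    created = []
    for kind in ("dialog", "vocab"):
        src = skill_dir / kind / source
        dst = skill_dir / kind / target
        if src.is_dir() and not dst.exists():
            shutil.copytree(src, dst)
            created.append(dst)
    return created
```

After running copy_locale(Path("hello-world-skill")), the nl-nl folders contain English text that still has to be translated by hand.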

Consequences of using a language other than English in Mycroft

Certain skills stop working, even though their folders contain Dutch translations. One of the main reasons is that translations into other languages lag behind English. This is especially problematic for intents that need to be handled by the Padatious intent parser. This intent parser works on sample phrases specified in an .intent file, and if this file is missing for the active language, the skill fails to load altogether.
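Because a missing file for the active language can keep a skill from loading, it helps to check for gaps before switching. The sketch below, again assuming the dialog/ and vocab/ layout shown earlier, lists the files that exist for a reference locale (en-us) but not for the target locale. It only compares file names, so it cannot tell whether a file that exists has actually been translated. missing_translations is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List


def missing_translations(skill_dir: Path, reference: str = "en-us",
                         target: str = "nl-nl") -> Dict[str, List[str]]:
    """Map "dialog"/"vocab" to the file names present for reference but absent for target.

    A missing .intent file is the most serious gap: a skill whose Padatious
    intent file is missing for the active language fails to load.
    """
    missing = {}
    for kind in ("dialog", "vocab"):
        ref_dir = skill_dir / kind / reference
        if not ref_dir.is_dir():
            continue
        target_dir = skill_dir / kind / target
        gaps = sorted(
            p.name for p in ref_dir.iterdir()
            if p.is_file() and not (target_dir / p.name).is_file()
        )
        if gaps:
            missing[kind] = gaps
    return missing
```

An empty result means every reference file has a counterpart in the target locale.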

Another reason is a missing implementation in the language parsing framework. The skills that rely on this framework are listed in this pull request. They use it to extract numbers or dates and times from utterances, but this parsing has only been implemented for a handful of languages. If you try to parse a sentence in an unsupported language, you’ll get an error.
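To illustrate why an unsupported language produces an error, here is a toy sketch of per-language dispatch: a parser is looked up by the primary language code and, if none is registered, an exception is raised. This is not the API of Mycroft's parsing framework; toy_extract_number, PARSERS and the tiny Dutch word list are hypothetical, and a real parser has to handle far more.

```python
from typing import Callable, Dict, Optional

# Toy Dutch number words. Unaccented "een" is left out on purpose: it is the
# indefinite article ("a"), while "één" is the number one.
_NL_NUMBERS = {"één": 1, "twee": 2, "drie": 3, "vier": 4, "vijf": 5,
               "zes": 6, "zeven": 7, "acht": 8, "negen": 9, "tien": 10}


def _extract_number_nl(text: str) -> Optional[int]:
    """Return the first digit string or known Dutch number word, else None."""
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word.isdigit():
            return int(word)
        if word in _NL_NUMBERS:
            return _NL_NUMBERS[word]
    return None


# Registry of per-language parsers; only languages listed here are supported.
PARSERS: Dict[str, Callable[[str], Optional[int]]] = {"nl": _extract_number_nl}


def toy_extract_number(text: str, lang: str) -> Optional[int]:
    """Dispatch on the primary language code, failing loudly if unsupported."""
    primary = lang.split("-")[0].lower()
    try:
        parser = PARSERS[primary]
    except KeyError:
        raise NotImplementedError(f"No number parser for language {lang!r}") from None
    return parser(text)
```

The word een hints at the intricacies involved: a naive lookup that mapped it to 1 would misread “zet een timer voor vijf minuten” (“set a timer for five minutes”).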

Wake word recognition will also stop working if you are using one of the provided wake words (Hey Mycroft, Hey Ezra, Christopher or Hey Jarvis). This is because the default wake word recognition engine (called precise) doesn’t have any models available for Dutch, so Mycroft falls back to the Pocketsphinx wake word recognition engine. This engine doesn’t have a Dutch language model either, leaving Hey Mycroft as the only available wake word. You can overcome this issue by creating your own wake word definition for the precise engine, but it is quite an involved process (as described here).

Final thoughts

Getting Mycroft to fully support a language is currently still a daunting task. It requires changes throughout the entire system and needs a lot of community effort to get right. Mycroft hosts a translation initiative to translate all the words and phrases used in skills; you can contribute by signing up here: https://translate.mycroft.ai/. I also created an initial version of a Dutch language parser, which was merged into the development branch quite recently. Have a look at this GitHub issue for all the details. I’m working on extending this rudimentary version, and once that is done I will write another blog post about the intricacies of processing natural language.
