I spent some time over the weekend experimenting with voice2json and Rhasspy, trying to set up a fully offline voice assistant. The setup uses Mozilla DeepSpeech for speech recognition, a template file that lists all known phrases and maps them to intents, an intent recognizer, and a local shell script that parses the recognized intent and invokes commands (think opening a website or folder when an intent is recognized). Rhasspy was easy to use and really fun. It’s amazing how far we’ve come in terms of open-source tools in the TTS/speech recognition space.
Along the way, I discovered Larynx, a TTS system for Linux with high-quality voices from Glow-TTS and others, with intonations that sound human. I’ve kept an eye on the Linux TTS space for years and have been disappointed by how few options there are for everyday consumer use. Often, the pre-trained TTS voices sound too robotic to listen to for long. I suppose that’s understandable given the dearth of open-source voice datasets (which is why projects like Mozilla Common Voice are so exciting!). It’s nice to have a pleasant pre-trained TTS model natively available on Linux.
My use case: select or copy text in a browser or a Thunderbird RSS article, hit a shortcut, and have the TTS system read it aloud so I can look away from the screen and just listen.
I followed the Debian installation instructions, and downloaded and installed the Harvard Glow TTS files.
    # cd Downloads # (or /path/to/downloaded/deb/files)
    sudo apt install ./larynx*.deb
To create a shortcut that invokes Larynx on selected text, I added aliases in my ~/.bash_aliases file. They use xclip to access clipboard and selection data. On Debian-based systems, you should be able to install it with sudo apt install xclip.
    # Speak text passed as argument
    # Usage: speak "This is a test"
    alias speak="larynx --voice harvard-glow_tts --interactive"

    # Speak clipboard text
    # Usage: speak-clipboard
    alias speak-clipboard="xclip -out -selection clipboard | speak"

    # Speak currently selected text
    # Usage: speak-selection
    alias speak-selection="xclip -out -selection primary | speak"
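If you prefer functions to aliases, here is a rough sketch of the same idea. It reuses the exact xclip and larynx invocations from the aliases; the function names (speak_from, speak_selection, speak_clipboard) are made up for illustration. It also skips empty selections and joins hard line breaks into one line, on the assumption that Larynx reads standard input one line at a time, which I haven't verified for every version.

```shell
# Sketch only: a function-based variant of the aliases, for ~/.bash_aliases.
# The function names are hypothetical; the xclip and larynx flags are the
# same ones used in the aliases.
speak_from() {
    # $1 is the X selection to read: "primary" (highlighted text) or "clipboard".
    local selection text oneline
    selection="${1:-primary}"
    text="$(xclip -out -selection "$selection")" || return 1

    # Bail out if the selection is empty or contains only whitespace.
    case "$text" in
        *[![:space:]]*) ;;
        *) echo "speak_from: nothing to speak" >&2; return 1 ;;
    esac

    # Assumption: larynx handles stdin line by line, so collapse hard line
    # breaks into spaces to keep wrapped sentences in one piece.
    oneline="$(printf '%s' "$text" | tr '\n' ' ')"
    printf '%s\n' "$oneline" | larynx --voice harvard-glow_tts --interactive
}

# Convenience wrappers mirroring the two aliases.
speak_selection() { speak_from primary; }
speak_clipboard() { speak_from clipboard; }
```

Usage is the same as before: highlight some text and run speak_selection, or copy it and run speak_clipboard. If nothing is selected, the function prints a message to stderr and returns a non-zero status instead of starting Larynx.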
Under Settings –> Keyboard on GNOME, I added a custom keybinding for Super+S to invoke bash -i -c "speak-selection". This lets me select any text and hit Super+S to invoke larynx