Tech

Jun 2021

Larynx - A viable Linux TTS

I spent some time over the weekend experimenting with voice2json and Rhasspy, trying to set up a fully offline voice assistant system: Mozilla DeepSpeech for speech recognition, a template file mapping all known phrases to intents, an intent recognizer, and a local shell script that parses the recognized intent and invokes commands (think opening a website or folder when an intent is recognized). Rhasspy was easy to use and really fun. It’s amazing how far we’ve come in terms of open-source tools in the TTS/speech recognition space.

Along the way, I discovered Larynx, a TTS system for Linux with high-quality voices from Glow-TTS and others, with intonations that sound human. I’ve kept an eye on the Linux TTS space for years and have been disappointed by the limited consumer-use options. Often, the pre-trained TTS voices sound all too robotic for everyday use. I suppose that’s understandable given the dearth of open-source voice datasets (which is why projects like Mozilla CommonVoice are so exciting!). It’s nice to have a pleasant pre-trained TTS model natively available on Linux.

My use case is to copy text in a browser/Thunderbird RSS article, hit a shortcut and have the TTS system read the selected text aloud so I can look away from the screen and just listen.

Setup

I followed the Debian installation instructions and downloaded and installed the tts, lang-en_us and Harvard Glow TTS .deb files.

cd ~/Downloads # (or /path/to/downloaded/deb/files)
sudo apt install ./larynx*.deb

TTS Shortcut

To create a shortcut that invokes Larynx on selected text, I added aliases in my ~/.bash_aliases file. They use xclip to access clipboard and selection data. On Debian-based systems, you should be able to install it with sudo apt install xclip.

# Speak text passed as argument
# Usage: speak "This is a test"
alias speak="larynx --voice harvard-glow_tts --interactive"

# Speak clipboard text
# Usage: speak-clipboard
alias speak-clipboard="xclip -out -selection clipboard | speak"

# Speak currently selected text
# Usage: speak-selection
alias speak-selection="xclip -out -selection primary | speak"

Under Settings -> Keyboard on GNOME, I added a custom keybinding for Super+S to invoke bash -i -c "speak-selection" (the -i flag starts an interactive shell, which loads the aliases). This lets me select any text and hit Super+S to have Larynx read it aloud.

Apr 2021

Jodi Sudoku

Happy to share that I’m working on Jodi Sudoku, an open-source (AGPLv3) Sudoku progressive web app, with the goal of implementing multiplayer support using WebRTC!

I’ve been following the adoption of WebRTC (now supported on all major platforms except Safari, infamous for dragging its feet) and the decentralized web for years, but haven’t had the opportunity to work with it, so I decided to use it in a hobby project.

Frontend

The frontend code uses the standard React + Typescript setup, with React Router and Redux. So far, the web app features:

  • Starting a new game with a level of difficulty of your choice
  • Keyboard and mouse/touch input
    • Changing the mode of entering values into cells - Choose a digit first, and then click a cell to enter the digit, or vice versa
  • Responsive cross-platform layout, tested on Firefox and Chrome (desktop and mobile)
  • Undo and redo using redux store history (see the sketch below)
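
The sketch below shows the general shape of that history wrapper - the History type and the undoable name are illustrative, not the actual code in the repo:

// Minimal sketch of undo/redo via a history wrapper around a reducer.
// "History" and "undoable" are hypothetical names for illustration.
interface History<T> {
  past: T[]
  present: T
  future: T[]
}

function undoable<T>(reducer: (state: T | undefined, action: { type: string }) => T) {
  const initial: History<T> = {
    past: [],
    present: reducer(undefined, { type: "@@INIT" }),
    future: [],
  }
  return (state: History<T> = initial, action: { type: string }): History<T> => {
    const { past, present, future } = state
    switch (action.type) {
      case "UNDO":
        if (past.length === 0) return state
        // Pop the most recent past state into present; push present onto future
        return { past: past.slice(0, -1), present: past[past.length - 1], future: [present, ...future] }
      case "REDO":
        if (future.length === 0) return state
        return { past: [...past, present], present: future[0], future: future.slice(1) }
      default: {
        const next = reducer(present, action)
        if (next === present) return state
        // Any other action creates a new present and clears the redo stack
        return { past: [...past, present], present: next, future: [] }
      }
    }
  }
}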

I’m a big fan of internationalization and supporting regional languages, so the basic one-player implementation currently supports Kannada, English and Polish using react-i18n.

Backend

The backend is more esoteric - I’m interested in using Rails + WebSockets. While I primarily wear the Android hat at work, our backend is built with Rails (a framework I had not worked with before), and I’m interested in improving my Rails fu through this project and contributing more than just the occasional pull request. Rails has a thin wrapper over WebSockets called Action Cable, and I’m curious to see how it fares compared to Node (+ socket.io), which seems to be the internet’s go-to recommendation for scalable WebSockets. WebSockets are a great way to achieve peer discovery (your browser tab discovering peers and finding a way to establish a direct connection across NATs), which is required for WebRTC. If there’s a lot of traffic and Action Cable/Rails struggles under the load, I’m aware I might need to swap out Action Cable for Node someday, but I think the Rails experimentation will be worth it. Besides, Node feels like home turf and it’ll be fun to try something different :)
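
For a feel of what the browser side of this signaling setup might look like, here’s a minimal sketch using Action Cable’s JavaScript client - the RoomChannel name, the "signal" action and the message shapes are assumptions for illustration, not the actual implementation:

// Sketch of signaling over Action Cable from the browser.
// "RoomChannel", the "signal" action and the message fields are hypothetical.
import { createConsumer } from "@rails/actioncable"

// "/cable" is the default Action Cable mount point in Rails
const consumer = createConsumer("wss://jodisudoku.app/cable")

const subscription = consumer.subscriptions.create(
  { channel: "RoomChannel", room: "some-room" },
  {
    received(data: { type: string; users?: string[]; payload?: unknown }) {
      if (data.type === "users") updateUserList(data.users ?? [])
      // SDP offers/answers and ICE candidates are handed to the WebRTC layer
      if (data.type === "signal") handleSignal(data.payload)
    },
  }
)

// Relay a WebRTC signaling message to a peer through the server
function sendSignal(payload: object) {
  subscription.perform("signal", { payload })
}

declare function updateUserList(users: string[]): void
declare function handleSignal(payload: unknown): void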

Multiplayer

What’s with the idea of multiplayer Sudoku, you ask? Is there any interest in multiplayer Sudoku? I’m not sure - I found a few multiplayer games on different app stores, but I don’t think there’s a big community. This project is honestly just an excuse to find a way to play Sudoku with my mum - Playing Sudoku together is a ritual whenever I’m in Bangalore. With mugs of basil and ginger tea after dinner, and armed with pens at the kitchen table, we pair up (Jodi (ಜೋಡಿ) is the Kannada word for “pair”) on solving the Sudoku puzzle in the daily newspaper (remember those?), trying to get as many digits as possible but also explaining to each other how we “unlocked” the right digits. I miss that, and I see no reason why that should stop when I’m not in Bangalore :) I’d be happy if others find it useful as well.

The idea is to implement URL-based discovery of peers. As with Jitsi calls, users who open the same link will be able to connect to the server using WebSockets, receive information from the server on the users in the “room”, and then establish browser-to-browser WebRTC peer connections, making the server connection theoretically unnecessary after that point.

I’m thinking of building different multiplayer modes:

  • Cooperative: all players in a room collaboratively solve the same puzzle together
  • Challenge (same puzzle): all players in a room individually solve the same puzzle, only aware of the number of empty/filled cells of other players.
  • Challenge (puzzle of same difficulty): all players in a room solve different puzzles of the same difficulty level, only aware of the number of empty/filled cells of other players. Could be fun to also add a “Peek player’s board” feature.
  • Time challenge? Turn-based games? Some dastardly variant of Sudoku?

Privacy and fault tolerance

I’ve encountered a lot of flaky internet connections, and I’d like Jodi Sudoku to tolerate them and handle the related edge cases: users having to refresh their tabs, abrupt drop-offs, etc. Given that exploring WebRTC and peer-to-peer systems is one of the goals, it’s imperative that the server know as little as possible about users, rooms or the type and state of the game in a room.

If the server knows nothing about the state of the game in a room, who is the source of truth in case of failure? Should all players in a room store the game state locally (in the browser’s local/session storage) and send the most recent state (via WebRTC) to any player who re/joins a room? What happens if two players disagree on the most recent state of the game? Should rooms have a leader who is responsible for storing the game state and sharing it with users who enter the room? What happens if the leader’s connection goes down? I have struggled to square the two seemingly contradictory goals of privacy/decentralization and maintaining a source of truth for fault recovery, and have decided to move forward with making the server aware of users, rooms and game states. Please reach out if you have any suggestions that would keep the server out of the loop after peer discovery.

My current approach is:

Peer discovery

  1. Abbi enters a room (a room is identified by the link, i.e., jodisudoku.app/some-room)
  2. Abbi establishes a WebSocket connection with the server.
  3. Abbi receives the list of users in the room via WebSocket. The list does not have anyone except Abbi. The WebSocket connection with the server is kept open.
  4. Ilana enters the room, connects to the server and receives the list of users. The WebSocket connection with the server is kept open.
  5. Abbi and Ilana use a STUN server, bypass their NATs, and establish a peer-to-peer connection (sketched below).
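
Step 5 in the browser might look something like this - the STUN server URL and the sendSignal helper (which relays messages over the WebSocket connection from steps 2-4) are assumptions:

// Sketch of establishing the peer-to-peer connection (step 5).
// The STUN URL and the sendSignal/acceptOffer wiring are assumptions.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
})

// Creating a data channel up front ensures the offer negotiates one;
// it is used for gameplay messages later
const channel = pc.createDataChannel("game")

// Relay each ICE candidate to the peer via the WebSocket signaling channel
pc.onicecandidate = (event) => {
  if (event.candidate) sendSignal({ type: "ice", candidate: event.candidate.toJSON() })
}

// The initiating peer (Abbi) creates and sends an offer...
async function makeOffer() {
  const offer = await pc.createOffer()
  await pc.setLocalDescription(offer)
  sendSignal({ type: "offer", sdp: pc.localDescription })
}

// ...and the receiving peer (Ilana) responds with an answer
async function acceptOffer(offer: RTCSessionDescriptionInit) {
  await pc.setRemoteDescription(offer)
  const answer = await pc.createAnswer()
  await pc.setLocalDescription(answer)
  sendSignal({ type: "answer", sdp: pc.localDescription })
}

declare function sendSignal(message: object): void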

Gameplay

  1. Abbi and Ilana begin a cooperative game, notifying each other via WebRTC and notifying the server via WebSocket.
  2. Abbi and Ilana send messages directly to each other via WebRTC when they enter values into cells. They would also need to passively notify the server of the state changes via WebSocket (see the sketch after this list).
  3. The server passively stores the game state, doing nothing with it.
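
Step 2 could look roughly like this over the data channel from the previous sketch - the CellUpdate shape and message types are illustrative, not the actual protocol:

// Sketch of gameplay messages over the WebRTC data channel (step 2).
// The CellUpdate shape and message types are illustrative assumptions.
interface CellUpdate {
  row: number
  col: number
  value: number | null // null clears a cell
}

// Send a move directly to the peer over WebRTC...
function sendMove(channel: RTCDataChannel, update: CellUpdate) {
  channel.send(JSON.stringify({ type: "cell", update }))
  // ...and passively mirror it to the server over the WebSocket,
  // so the server can hand the latest state to (re)joining players
  sendSignal({ type: "state", update })
}

// Apply moves received from the peer
function listenForMoves(channel: RTCDataChannel) {
  channel.onmessage = (event) => {
    const message = JSON.parse(event.data)
    if (message.type === "cell") applyMove(message.update as CellUpdate)
  }
}

declare function sendSignal(message: object): void
declare function applyMove(update: CellUpdate): void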

New player/rejoin

  1. Jaime joins the game late, and is announced to the other users by the server (the updated list of users is shared) via WebSocket.
  2. The server shares the latest state of the game with Jaime via WebSocket.
  3. Once connected to the other users via WebRTC p2p connections, Jaime follows gameplay step 2 above to participate in the game.

Using the server as the single source of truth allows users who join a game midway or experience connection issues and rejoin the game to continue the (collaborative) game from the latest state (if other users have made changes). Other multiplayer modes may need to handle failure/rejoin scenarios differently.
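
In code, the rejoin flow could be as simple as replacing local state with the server’s snapshot - the GameSnapshot type and the action name are assumptions:

// Sketch of the rejoin flow: the server's stored snapshot wins on (re)join.
// GameSnapshot and the "game/restore" action are hypothetical.
interface GameSnapshot {
  board: (number | null)[][]
  updatedAt: string
}

// Called with the state received over the WebSocket right after joining a room
function onServerSnapshot(snapshot: GameSnapshot) {
  // Replace local state wholesale - with the server as the single source of
  // truth, there is no need to reconcile conflicting peer histories
  store.dispatch({ type: "game/restore", payload: snapshot.board })
}

declare const store: { dispatch(action: { type: string; payload: unknown }): void }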

Next steps

I realise I’ve defined a lot of lofty technical ambitions for a simple Sudoku webapp, and I’d like to take this one step at a time. I’m planning to make small releases to keep up my motivation and get feedback from my target demographic of one. When it comes to hobby projects, I tend to be very excited during the design phase and in the early days of implementation, then lose interest and move on to the next shiny puzzle once the problem is more or less “solved” in my head :)

The next release will have peer discovery and cooperative game mode.

Jun 2019

Hey Big Tech, Support Regional Languages!

A few months ago, I bumped into someone who was using Kannada as their system language on their iPhone. Curious about Android’s support for languages other than English, I switched my system language to Kannada as well and was pleasantly surprised to find nearly everything on stock Android available in Kannada. The app ecosystem, on the other hand, is a different story.

Uber and Ola, India’s leading taxi apps, don’t seem to care about regional language support - While Ola makes no attempt to support use in Kannada, Uber’s attempt at Kannada internationalization seems to be an afterthought, with entirely broken screens.

Why support regional languages?

  • Barrier to entry - As a person in tech, with friends who work in or take an interest in tech, it’s easy to forget that using technology is hard for a lot of people. Adding a language barrier into the mix makes it worse. English is intimidating to a lot of people, especially in India. Anecdotally, I’ve had relatives tell me that they’d really like to be able to use taxi apps to move around, but they’re afraid of selecting the wrong location or ordering the wrong class of taxi (resulting in memorising the flow of taps 🤦). There’s an argument to be made about better UI/UX, but I think interfaces that are exclusively in English become a barrier to entry.

  • English is not inclusive - The history of English in India is steeped in colonialism (well, duh!), casteism, and elitism. Walk around in any “second-tier” city in India, and you’ll see advertisements for a multitude of companies, schools and websites offering to teach you how to speak “fluent English”. The clamour to learn English is of course driven by its employment potential, but I’d argue that it has to do with caste and elitism as well. Indian society places a high value on English fluency, and apps are silently reinforcing the discrimination that comes with it.

  • Content availability is a problem - There’s a classic chicken-and-egg (could not find an equivalent vegan expression!) issue at play - Platforms don’t have support for regional languages, and there’s not enough regional language content to make it worthwhile for platforms to support them. It looks like Amazon is yet to support Kannada content on Kindles! For now, you have a choice of as many as TWO Kindle eBooks if you search by language. TWO!

  • $$$$ - Okay, fine! I’ll stoop to appealing to capitalism - Companies are leaving money on the table by making their products and services inaccessible to a lot of people! Think about the growth potential, the happy investors, those beautiful graphs in upswing!

  • Because squiggly letters are awesome! ದಯವಿಟ್ಟು ನಿಮ್ಮ ಫೋನ್ ಹಾಗು ಕಂಪ್ಯೂಟರ್-ಗಳಲ್ಲಿ ನಿಮ್ಮ ಭಾಷೆಯನ್ನು ಉಪಯೋಗಿಸಿ

What can you do?

  • Strength in numbers - Don’t be an elitist, prop up regional language numbers by switching your system language on your device(s)
  • Contribute translation strings to open-source projects! I started contributing Kannada translations to Signal Messenger’s Android app, and it’s surprisingly fun and very satisfying to see the number of untranslated strings go down on every submit :)
    • Find open-source projects that could use your help at Weblate
    • Join (or create) a GNOME internationalization team here
    • Contribute to Mozilla’s apps and websites
  • Lobby governments to mandate the availability of regional languages (perhaps in addition to other languages?) on government websites. In 2019, governments paying third parties taxpayer money to build websites that lack regional language support is like building footpaths that are not wheelchair-friendly - egregious and just plain criminally negligent!

Apr 2018

Commit to Master Branch on GitHub Using Travis CI

I’m trying to turn my pet project to find visa requirements for couples, Nomad Couple, into a full-fledged progressive web app (PWA) that has regularly updated visa requirements data (from Wikipedia) through a Travis CI cron job.

As part of that, I’ve set up the wiki scraper repo to be rebuilt every month and any changes in the visa requirements for citizens of different countries (available in dist/output) to be automatically committed back to the master branch.

Attempt #1 - Travis’ Github Pages deployment

I noticed that Travis CI offers automated deployment to GitHub Pages via configuration in .travis.yml. By default, it commits code to the gh-pages branch, but the configuration has a target-branch property to customize it. This is the configuration I tried:

language: node_js
node_js:
  - "lts/*"

# ...
# Unrelated configuration for node.js/electron project
# ...

script:
  - npm run scrape
deploy:
  provider: pages
  skip-cleanup: true
  target-branch: master # Commit to master instead of gh-pages
  github-token: $GH_TOKEN
  keep-history: true # By default, Travis uses push --force and wipes out commit history
  verbose: true
  on:
    branch: master

The GH_TOKEN mentioned in the config refers to a token generated (with public_repo permission) on the GitHub personal access tokens page that I’ve saved on Travis CI as an environment variable.

While this was super easy to set up, it has a few issues.

Drawbacks

  1. This approach is dependent on Travis CI’s GitHub Pages support. You’re out of luck if you’re using a different Git hosting provider or your own Git server.
  2. Travis listens for commits on the master branch and triggers a build. Since the commit at build time is pushed back to master, this triggers an infinite build loop[1]!
  3. It’s not possible (as of Apr 2018) to customize the commit message. This rules out adding "[skip ci]" to the commit message to avoid the infinite loop.

Attempt #2 - Good ol’ shell script

Travis supports after_success, a hook that is called when the build succeeds. I replaced the deploy section in .travis.yml above with:

after_success:
- sh .travis-push.sh

.travis-push.sh

#!/bin/sh
# Credit: https://gist.github.com/willprice/e07efd73fb7f13f917ea

setup_git() {
  git config --global user.email "travis@travis-ci.org"
  git config --global user.name "Travis CI"
}

commit_country_json_files() {
  git checkout master
  # Current month and year, e.g: Apr 2018
  dateAndMonth=$(date "+%b %Y")
  # Stage the modified files in dist/output
  git add -f dist/output/*.json
  # Create a new commit with a custom build message
  # with "[skip ci]" to avoid a build loop
  # and Travis build number for reference
  git commit -m "Travis update: $dateAndMonth (Build $TRAVIS_BUILD_NUMBER)" -m "[skip ci]"
}

upload_files() {
  # Remove existing "origin"
  git remote rm origin
  # Add new "origin" with access token in the git URL for authentication
  git remote add origin https://vinaygopinath:${GH_TOKEN}@github.com/vinaygopinath/visa-req-wiki-scraper.git > /dev/null 2>&1
  git push origin master --quiet
}

setup_git

commit_country_json_files

# Push to GitHub only if "git commit" succeeded (i.e., there were changes to commit)
if [ $? -eq 0 ]; then
  echo "A new commit with changed country JSON files exists. Uploading to GitHub"
  upload_files
else
  echo "No changes in country JSON files. Nothing to do"
fi

Note: If you’re using Travis on public GitHub repositories, your build log is publicly visible. If there are any Git-related errors, it is possible that the origin URL (with your GitHub personal access token, which has access to ALL your public repositories) may be logged, which is a huge security risk. It is strongly recommended to redirect the output of all git commands to /dev/null (e.g., git push origin master --quiet > /dev/null 2>&1) once you’ve verified that the script works for your repo.

References:

Nov 2016

Social media rich snippets with AngularJS and ngMeta

ngMeta, my AngularJS SEO meta tags library, has frequently received issues related to the preview snippets (rich snippets) generated by Facebook, Twitter, Skype, WhatsApp and others when the URL of an Angular site using ngMeta is shared. I’ve addressed this on GitHub several times, so I thought I would explain the issue in greater detail here.

Sites like Facebook and Twitter use crawlers to fetch URLs and extract the Open Graph meta values. When Open Graph data is not available, they fall back on the basic meta tags (title, description and others). Social media crawlers generally do not execute Javascript, meaning that they pick up the values provided in meta tags as-is. For sites that use ngMeta, meta tags might look like this:

<title ng-bind="ngMeta.title"></title>
<meta name="description" content="{{ngMeta.description}}" />
<meta property="og:title" content="{{ngMeta.title}}" />
<meta property="og:description" content="{{ngMeta.description}}" />

As a result, the rich snippet generated by social media sites is not pleasant (Remember, no Javascript!):

[Screenshot: Facebook rich snippet showing raw, uninterpolated Angular expressions]

However, Google search’s crawlers do execute Javascript and the search result snippet uses the title, description and other tags set through ngMeta, as expected.

It is technically impossible for a front-end framework library like ngMeta to force crawlers to execute Javascript and pick up the meta content values set through Javascript. Instead, solutions include serving pre-rendered pages, or redirecting requests by crawlers to a specific service that serves a static page with meta tags relevant to the requested URL. For more on the latter, check out angular-social-demo on GitHub.
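
To sketch the crawler-redirect idea: a server can sniff crawler user agents and serve a static, pre-rendered page instead of the Angular app. The Express middleware below is a hypothetical illustration (the user-agent list and file paths are assumptions; angular-social-demo implements the same idea with a different stack):

// Hypothetical sketch: serve static meta-tag pages to social media crawlers.
import express from "express"
import path from "path"

const app = express()
const CRAWLERS = /facebookexternalhit|twitterbot|skypeuripreview|whatsapp/i

app.use((req, res, next) => {
  const userAgent = req.headers["user-agent"] ?? ""
  if (CRAWLERS.test(userAgent)) {
    // Crawlers get a pre-rendered page with real meta tag values
    // (a real setup would pick the file based on req.path)
    res.sendFile(path.join(__dirname, "snippets", "index.html"))
  } else {
    // Regular browsers get the Javascript-enabled Angular app
    next()
  }
})

app.use(express.static(path.join(__dirname, "app")))
app.listen(3000)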

Another option: If you’re willing to forsake the customization of meta tags for different pages to get rid of uninterpolated Angular expressions like ngMeta.title in your social media snippets, consider adding fallback meta tags that provide a title, description and image for your site. These fallback meta tags must have ng-if="false" so that they are removed in a Javascript-enabled environment:

<title ng-bind="ngMeta.title"></title>
<meta name="description" content="{{ngMeta.description}}" />
<meta property="og:title" content="Site name or other general title" ng-if="false" />
<meta property="og:description" content="Site description" ng-if="false" />
<meta property="og:image" content="https://your-site.com/fallback-snippet-image.jpg" ng-if="false"/>

Feel free to check out the source code of the ngMeta demo site. I’ve just updated it with fallback meta tags.