Free Real-Time Captioning Service using Google Chrome’s Web Speech API, Node.js, and Amazon’s Elastic Compute Cloud (EC2)

For a while now, I’ve wanted to be able to offer live captions for people attending services at my church who may be deaf or hard of hearing, to allow them to follow along with the sermon as it is spoken aloud. I didn’t want them to have to install a particular app, since people have a wide variety of phone models and OSes, and that just sounded like a pain to support long-term. I also wanted to develop something low-cost, so that more churches and ministries could benefit from it.

I decided to take concepts I learned from my PresentationBridge project at last year’s downtown worship night and apply them to this project. The idea was essentially the same: I wanted to be able to relay, in real time, text data from a local computer to all connected clients using Node.js. Instead of the text data coming from something like ProPresenter, it would be the results of the Web Speech API’s processing of my audio source.

If you’re a Google Chrome user, Chrome has implemented the W3C Web Speech API, which allows you to access the microphone, capture the incoming audio, and receive a speech-to-text result, all within the browser using JavaScript. It’s fast and, important to me, it’s free!
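A minimal sketch of what that looks like in the browser: the `webkitSpeechRecognition` constructor and the `results`/`isFinal`/`transcript` properties are the API Chrome exposes, while the `collectTranscript` helper is just my own factoring for clarity.

```javascript
// Helper that splits a recognition result list into finalized text and
// in-progress (interim) text. Works on any array-like of results.
function collectTranscript(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText, interimText };
}

// Browser-only portion, guarded so the helper above stays reusable.
if (typeof window !== 'undefined' && 'webkitSpeechRecognition' in window) {
  const recognition = new webkitSpeechRecognition();
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // stream partial results as they arrive
  recognition.lang = 'en-US';
  recognition.onresult = (event) => {
    const { finalText, interimText } = collectTranscript(event.results);
    console.log('final:', finalText, 'interim:', interimText);
  };
  recognition.start(); // Chrome will prompt for microphone access
}
```

Interim results are what make the captions feel live: partial words appear almost immediately and are corrected as the recognizer finalizes each phrase.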

Here is how it works: the computer that does the actual transcribing of the audio source must use Google Chrome and connect to a Bridge room, similar to how my PresentationBridge project works. Multiple bridge rooms (think “venues” or “locations”) can be configured on the server; if more than one room is available, end users will be given an option to choose the room they want to join and receive text from. The browser requirement applies only to the computer doing the transcribing; everyone else can use any browser on any computer or device they choose.
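The server-side bookkeeping for rooms can be sketched roughly as follows. In the real project a WebSocket library handles the actual delivery; here a plain `Map` of room name to client callbacks stands in for it, and all the names are mine, not the project’s:

```javascript
// Rough sketch of per-room ("venue") bookkeeping on the Bridge server.
class BridgeRooms {
  constructor(roomNames) {
    // room name -> set of client "send" callbacks
    this.rooms = new Map(roomNames.map((name) => [name, new Set()]));
  }
  listRooms() {
    // Shown to end users when more than one room is configured.
    return [...this.rooms.keys()];
  }
  join(roomName, clientCallback) {
    this.rooms.get(roomName).add(clientCallback);
  }
  broadcast(roomName, text) {
    // Relay a caption line to every client in one venue only.
    for (const send of this.rooms.get(roomName)) send(text);
  }
}

const rooms = new BridgeRooms(['Main Auditorium', 'Chapel']);
```

The key property is that a transcript line is only ever relayed to clients who joined that room, so two venues can run captions simultaneously without crosstalk.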

This is the primary Bridge interface that does the transcribing work.

From the Bridge interface, you can choose which “Bridge” (venue) you want to control. If the Bridge is configured with a control password, you will have to enter it. Once connected, you can toggle whether text data is sent to the connected clients (helpful when you want to test transcription without broadcasting it to everyone) or send users to Logo Mode (helpful when you’re not broadcasting). You can also redirect all users to a new webpage at any time, send a text announcement, or reload their page entirely. To start transcribing, just click “Start Listening”; the first time, you’ll have to allow Chrome to access the microphone/audio source. There is also a simple word dictionary that can be used to replace commonly misidentified words with their proper transcriptions.
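The word dictionary boils down to a find-and-replace pass over the transcript before it is relayed. A minimal sketch (the dictionary entries here are made up for illustration):

```javascript
// Replace commonly misheard words/phrases before relaying the caption.
function applyDictionary(text, dictionary) {
  let corrected = text;
  for (const [wrong, right] of Object.entries(dictionary)) {
    // Whole-word, case-insensitive replacement.
    corrected = corrected.replace(new RegExp(`\\b${wrong}\\b`, 'gi'), right);
  }
  return corrected;
}

// Example entries: proper nouns are the usual offenders.
const dictionary = { 'brian': 'Bryan', 'fill up ians': 'Philippians' };
applyDictionary('turn to fill up ians with brian', dictionary);
// -> 'turn to Philippians with Bryan'
```

Names of people and books of the Bible are exactly the kind of vocabulary a general-purpose recognizer gets wrong, so even a short dictionary noticeably improves the output.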

A note about secure-origin and accessing the microphone: If you’re running this server and try to access the page via localhost, Google Chrome will allow you to access the microphone without a security warning. However, if you are trying to access the page from another computer/location, the microphone will be blocked due to Chrome’s secure-origin policy.

If you’re not using a secure connection, you can also modify the Chrome security flag to bypass this (not recommended for long-term use because you’ll have to do this every time Chrome restarts, but it’s helpful in testing):

  • Navigate to chrome://flags/#unsafely-treat-insecure-origin-as-secure in the address bar.
  • Find and enable the Insecure origins treated as secure section.
  • Add any addresses you want to ignore the secure origin policy for. Remember to include the port number (the default port for this project is 3000).
  • Save and restart Chrome.

Here is a walkthrough video of the captioning service in action:

I chose to host this project on an Amazon EC2 instance, because my usage fits within the free tier. We set up a subdomain DNS entry to point to the Elastic IP so it’s easy for people in the church to find and use the service. The EC2 instance uses Ubuntu Linux to run the Node.js code. I also used nginx as a proxy server. This allowed me to run the service on my custom port but forward the necessary HTTPS (port 443) traffic to it, which helps with load balancing and keeps my Node.js server from having to handle all of that secure traffic itself. I configured it to use our domain’s SSL certificate.
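A minimal nginx server block for this kind of setup might look like the following; the domain, certificate paths, and port are placeholders, not my actual configuration:

```nginx
server {
    listen 443 ssl;
    server_name captions.example.com;               # placeholder subdomain

    ssl_certificate     /etc/ssl/certs/example.crt; # placeholder cert paths
    ssl_certificate_key /etc/ssl/private/example.key;

    location / {
        proxy_pass http://127.0.0.1:3000;           # the Node.js app's port
        # WebSocket upgrade headers so the real-time connection
        # survives the trip through the proxy.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

The `Upgrade`/`Connection` headers matter here: without them nginx speaks plain HTTP to the backend and the real-time connection falls back to polling or fails outright.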

I also created a simple API for the service so that certain commands like “start listening”, “send data”, “go to logo” etc. can be done remotely without user interaction. This will make it easier to automate down the road, which I plan to do soon, so that the captioning service is only listening to the live audio source when we are at certain points in the service like the sermon. Because it’s just a simple REST API, you can use just about anything to control it, including a Stream Deck!
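As an illustration of the idea (the route names below are made up, not the project’s actual endpoints; check the project’s README for the real API), the server side is essentially a small dispatcher mapping REST paths to Bridge commands:

```javascript
// Illustrative dispatcher mapping REST paths to Bridge commands.
// Route and command names here are hypothetical.
const COMMANDS = {
  '/api/startListening': 'startListening',
  '/api/stopListening': 'stopListening',
  '/api/sendData/on': 'sendDataOn',
  '/api/sendData/off': 'sendDataOff',
  '/api/logo': 'goToLogo',
};

function handleApiRequest(path) {
  const command = COMMANDS[path];
  if (!command) return { status: 404, error: 'unknown command' };
  return { status: 200, command };
}

// Wired into Node's built-in http module it would look something like:
//   http.createServer((req, res) => {
//     const result = handleApiRequest(req.url);
//     res.writeHead(result.status).end(JSON.stringify(result));
//   }).listen(3000);
```

Because it’s plain HTTP, anything that can send a GET request, a cron job, a service-automation script, or a Stream Deck button, can trigger these commands.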

We deployed the captioning stations in our two auditoriums using Chromebooks, an inexpensive solution that runs the Chrome browser!

In order to give the devices a direct feed from our audio consoles, I needed an audio interface. I bought this inexpensive one off Amazon that’s just a simple XLR-to-USB cable. It works great on Mac, PC, and even Chromebooks.

XLR to USB audio interface so we can send a direct feed from the audio console instead of using an internal microphone on the computer running the Bridge.

If you’d like to download LiveCaption and set it up for yourself, you can get it from my GitHub here:

I designed it to support global and individual logos/branding, so it can be customized for your church or organization to use.


  1. Hello,
    I am interested in using this for my church. However, much of what you say in the article is “Greek” to me. Do you have a more in-depth description of setting this up? For example, where do I go for EC2? There are so many options on AWS, and I do not see one on the free tier that mentions EC2.
    Thanks in advance!


  2. Hi Joseph! We have many immigrants who attend church services. Would it also be possible to display the text in another language?


    1. The captioning is performed almost word-by-word, so I don’t think it would translate very well if run through a translation service; those work best on full sentences, and the recognizer doesn’t know when a sentence has ended. For real-time translation, consider having a person who speaks both the language being spoken and the language you need do the translation. Immigrants could wear earpieces and listen to that person translate, for example.

