For awhile now, I’ve wanted to be able to offer live captions for people attending services at my church who may be deaf or hard of hearing, to allow them to follow along with the sermon as it is spoken aloud. I didn’t want them to have to install a particular app, since people have a wide variety of phone models and OS’s, and that just sounded like a pain to support long-term. I also wanted to develop something low-cost, so that more churches and ministries could benefit from it.
I decided to take concepts learned my PresentationBridge project from last year’s downtown worship night and use it for this project. The idea was essentially the same, I wanted to be able to relay, in real-time, text data from a local computer to all connected clients using the Node.js socket.io library. Instead of the text data coming from something like ProPresenter, the text data would be the results of the Web Speech API’s processing of my audio source.
Here is how it works: The computer that is doing the actual transcribing of the audio source to text must use Google Chrome and connect to a Bridge room, similar to how my PresentationBridge project works. Multiple bridge rooms (think “venues” or “locations”) can be configured on the server, and if multiple rooms are available, when end users connect, they will be given an option to choose the room they want to be in and receive text. The only requirement for browser choice is the computer doing the transcribing; all others can use any browser on any computer or device they choose.
From the Bridge interface, you can choose which “Bridge” (venue) you want to control. If the Bridge is configured with a control password, you will have to enter it. Once connected, you can choose whether to send text data to the connected clients, go to logo, etc. You can redirect all users to a new webpage at any time, send a text/announcement, or reload their page entirely. To start transcribing, just click “Start Listening”. You’ll have to allow Chrome to have access to the microphone/audio source (only the first time). When you are connected to the Bridge, you can also choose to send the users to Logo Mode (helpful when you’re not broadcasting), or you can choose to send data or turn it off (helpful when you want to test transcribe but not send it out to everyone). There is also a simple word dictionary that can be used to replace commonly misidentified words with their proper transcription.
A note about secure-origin and accessing the microphone: If you’re running this server and try to access the page via localhost, Google Chrome will allow you to access the microphone without a security warning. However, if you are trying to access the page from another computer/location, the microphone will be blocked due to Chrome’s secure-origin policy.
If you’re not using a secure connection, you can also modify the Chrome security flag to bypass this (not recommended for long-term use because you’ll have to do this every time Chrome restarts, but it’s helpful in testing):
- Navigate to chrome://flags/#unsafely-treat-insecure-origin-as-secure in the address bar.
- Find and enable the Insecure origins treated as secure section.
- Add any addresses you want to ignore the secure origin policy for. Remember to include the port number (the default port for this project is 3000).
- Save and restart Chrome.
Here is a walkthrough video of the captioning service in action:
I chose to host this project on an Amazon EC2 instance, because my usage fits within the free tier. We set up a subdomain DNS entry to point to the Elastic IP so it’s easy for people in the church to find and use the service. The EC2 instance uses Ubuntu Linux to run the Node.js code. I also used ngninx as a proxy server. This allowed me to run the service on my custom port, but forward the necessary HTTPS (port 443) traffic to it, which helps with load balancing and keeps my server from having to handle all of that secure traffic. I configured it to use our domain’s SSL certificate.
I also created a simple API for the service so that certain commands like “start listening”, “send data”, “go to logo” etc. can be done remotely without user interaction. This will make it easier to automate down the road, which I plan to do soon, so that the captioning service is only listening to the live audio source when we are at certain points in the service like the sermon. Because it’s just a simple REST API, you can use just about anything to control it, including a Stream Deck!
In order to give the devices a direct feed from our audio consoles, I needed an audio interface. I bought this inexpensive one off Amazon that’s just a simple XLR to USB cable. It works great on Mac, PC, and even ChromeBooks.
If you’d like to download LiveCaption and set it up for yourself, you can get it from my Github here: https://github.com/josephdadams/LiveCaption
I designed it to support global and individual logos/branding, so it can be customized for your church or organization to use.