Free Real-Time Captioning Service using Google Chrome’s Web Speech API, Node.js, and Amazon’s Elastic Compute Cloud (EC2)

For a while now, I’ve wanted to offer live captions for people attending services at my church who may be deaf or hard of hearing, so they can follow along with the sermon as it is spoken aloud. I didn’t want them to have to install a particular app, since people have a wide variety of phone models and operating systems, and that just sounded like a pain to support long-term. I also wanted to develop something low-cost, so that more churches and ministries could benefit from it.

I decided to take concepts I learned from my PresentationBridge project at last year’s downtown worship night and use them for this project. The idea was essentially the same: I wanted to relay text data, in real-time, from a local computer to all connected clients using the Node.js socket.io library. Instead of the text data coming from something like ProPresenter, the text data would be the results of the Web Speech API’s processing of my audio source.

If you’re a Google Chrome user, Chrome has implemented the W3C Web Speech API, which allows you to access the microphone, capture the incoming audio, and receive a speech-to-text result, all within the browser using JavaScript. It’s fast and, importantly for me, it’s free!
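
To give a feel for the API, here is a minimal sketch of continuous recognition in Chrome; the console.log stands in for wherever the text actually goes:

[code language="JavaScript"]
// Minimal sketch of continuous speech recognition in Chrome.
// Chrome exposes the Web Speech API under the webkit prefix.
var recognition = new webkitSpeechRecognition();
recognition.continuous = true;     // keep listening instead of stopping after one phrase
recognition.interimResults = true; // deliver partial results while the speaker is mid-sentence

recognition.onresult = function(event) {
  var transcript = '';
  for (var i = event.resultIndex; i < event.results.length; i++) {
    transcript += event.results[i][0].transcript;
  }
  // This is the point where the text would be relayed to connected clients
  console.log(transcript);
};

// Chrome ends recognition sessions periodically; restart to stay live
recognition.onend = function() {
  recognition.start();
};

recognition.start();
[/code]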

Here is how it works: the computer doing the actual transcribing of the audio source to text must use Google Chrome and connect to a Bridge room, similar to how my PresentationBridge project works. Multiple Bridge rooms (think “venues” or “locations”) can be configured on the server, and if more than one is available, end users are given the option to choose the room they want to join and receive text from. The browser requirement only applies to the computer doing the transcribing; everyone else can use any browser on any computer or device they choose.
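
Under the hood, the relay pattern is socket.io rooms. Here is a rough sketch of the server side; the event and room names are made up for illustration and are not LiveCaption’s actual protocol:

[code language="JavaScript"]
// Rough sketch of the relay pattern with socket.io rooms.
const io = require('socket.io')(3000);

io.on('connection', (socket) => {
  // Viewers (and the transcribing computer) join a particular Bridge room
  socket.on('join_room', (roomId) => {
    socket.join(roomId);
  });

  // The transcribing computer sends new caption text for its room;
  // broadcast it to everyone else in that room
  socket.on('caption_text', (roomId, text) => {
    socket.to(roomId).emit('caption_text', text);
  });
});
[/code]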

[Screenshot] This is the primary Bridge interface that does the transcribing work.

From the Bridge interface, you choose which “Bridge” (venue) you want to control. If the Bridge is configured with a control password, you will have to enter it. Once connected, you can choose whether to send text data to the connected clients or send them to Logo Mode (helpful when you’re not broadcasting), and you can turn sending on or off (helpful when you want to test transcription without sending it out to everyone). You can also redirect all users to a new webpage at any time, send a text announcement, or reload their page entirely. To start transcribing, just click “Start Listening”; you’ll have to allow Chrome to access the microphone/audio source (the first time only). There is also a simple word dictionary that replaces commonly misidentified words with their proper transcription, sketched below.
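
The dictionary amounts to a find-and-replace pass over each result before it goes out. Here is a rough sketch of the idea; the entries and function name are my own illustration, not the actual LiveCaption code:

[code language="JavaScript"]
// Illustrative word dictionary: maps commonly misheard phrases to the
// intended transcription. Entries here are examples only.
var dictionary = {
  'all to call': 'altar call',
  'pasture': 'pastor'
};

// Run each recognition result through the dictionary before relaying it.
function applyDictionary(transcript) {
  for (var phrase in dictionary) {
    // whole-word, case-insensitive replacement
    var pattern = new RegExp('\\b' + phrase + '\\b', 'gi');
    transcript = transcript.replace(pattern, dictionary[phrase]);
  }
  return transcript;
}

// applyDictionary('time for the all to call') => 'time for the altar call'
[/code]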

A note about secure-origin and accessing the microphone: if you’re running this server and access the page via localhost, Google Chrome will allow you to access the microphone without a security warning. However, if you try to access the page from another computer or location over plain HTTP, the microphone will be blocked by Chrome’s secure-origin policy; serving the page over HTTPS resolves this.

If you’re not using a secure connection, you can also modify a Chrome security flag to bypass this (not recommended for long-term use, since you’ll have to do this every time Chrome restarts, but it’s helpful in testing):

  • Navigate to chrome://flags/#unsafely-treat-insecure-origin-as-secure in the address bar.
  • Find the “Insecure origins treated as secure” setting and enable it.
  • Add any addresses you want to ignore the secure origin policy for. Remember to include the port number (the default port for this project is 3000).
  • Save and restart Chrome.

Here is a walkthrough video of the captioning service in action:

[wpvideo r6P0iWGj]

I chose to host this project on an Amazon EC2 instance, because my usage fits within the free tier. We set up a subdomain DNS entry pointing to the Elastic IP so it’s easy for people in the church to find and use the service. The EC2 instance uses Ubuntu Linux to run the Node.js code. I also used nginx as a reverse proxy server. This allowed me to run the service on my custom port while nginx accepts the HTTPS (port 443) traffic and forwards it along, which helps with load balancing and keeps my Node.js process from having to handle all of that secure traffic itself. I configured it to use our domain’s SSL certificate.
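
For reference, the nginx piece looks roughly like this; the domain, certificate paths, and upstream port are placeholders:

[code]
# Sketch of the nginx reverse-proxy setup. The domain, certificate
# paths, and upstream port below are placeholders.
server {
    listen 443 ssl;
    server_name captions.example.com;

    ssl_certificate     /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;

    location / {
        proxy_pass http://localhost:3000;
        # allow socket.io's websocket upgrade to pass through
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
[/code]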

I also created a simple API for the service so that certain commands like “start listening”, “send data”, and “go to logo” can be issued remotely without user interaction. This will make it easier to automate down the road, which I plan to do soon, so that the captioning service only listens to the live audio source at certain points in the service, like the sermon. Because it’s just a simple REST API, you can use just about anything to control it, including a Stream Deck!
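
Because it’s plain HTTP, commands can come from a shell, a Stream Deck plugin, or anything else that can make a request. The endpoint names below are illustrative only; check the GitHub repo for the actual routes:

[code]
# Endpoint names are illustrative only -- see the GitHub repo for the actual routes.
curl "https://captions.example.com/api/startListening?room=aud1"
curl "https://captions.example.com/api/sendData?room=aud1&value=true"
curl "https://captions.example.com/api/goToLogo?room=aud1"
[/code]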

[Photo] We deployed the service in our two auditoriums using Chromebooks, an inexpensive solution that runs the Chrome browser!

In order to give the devices a direct feed from our audio consoles, I needed an audio interface. I bought an inexpensive one off Amazon that’s just a simple XLR-to-USB cable. It works great on Mac, PC, and even Chromebooks.

[Photo] XLR-to-USB audio interface, so we can send a direct feed from the audio console instead of using an internal microphone on the computer running the Bridge.

If you’d like to download LiveCaption and set it up for yourself, you can get it from my GitHub here: https://github.com/josephdadams/LiveCaption

I designed it to support global and individual logos/branding, so it can be customized for your church or organization to use.

Using Google Apps Script with user input to automate repetitive tasks in Google Docs

Do you ever find yourself doing repetitive tasks over and over in Google Docs (or any of the G Suite apps)? I sure do. At my church, we create a Google Doc every week for all of the “talking points”: the parts of the service that aren’t song or sermon, where we script out what someone needs to say or communicate during that portion.

[Screenshot] Here is a sample document that we use each week.

A couple of years ago, I started creating template files to help my team do this every week, because having the template already there with some common headers, the service date, etc. removed the barrier to getting down to writing the actual words. Creating the files wasn’t too complicated, and after a while, I started making them “in bulk”: I would sit down and make 3-4 months’ worth of documents at a time, copying my master template, then editing each new file to update the date. Then we added a second auditorium, which doubled the number of documents I needed to create.

With the new year, it was time to create more documents, so I decided this time around that I would create a script to help automate this task using the framework within Google Apps Script.

If you’ve not heard of or used Google Apps Script (GAS), it’s a scripting language based on JavaScript for lightweight application development. All of the code runs on Google’s servers to interact with your documents. If you’ve ever used an “add-on” in Google Apps, it’s built on this scripting framework.

It’s pretty easy to use if you know JavaScript, and it’s easy to get started: from any document, just go to Tools > Script Editor. This opens a new tab where you can start writing Apps Script.

Here is my script:

[code language="JavaScript"]

function myFunction()
{
  var ui = DocumentApp.getUi();

  var templateDocId = '[templateid]'; // put the document ID of the master template file here

  var prompt_numberOfDocs = ui.prompt('How many Talking Point Documents do you want to create?');
  var prompt_startingDate = ui.prompt('What is the starting date? Please enter in MM/dd/yyyy.');

  var numberOfDocs = parseInt(prompt_numberOfDocs.getResponseText());
  var startingDate = prompt_startingDate.getResponseText();

  var prompt_venueResponse = ui.prompt('Venue', 'Create Documents for both Auditoriums? If no, please type in the Venue Title and click "No".', ui.ButtonSet.YES_NO);

  var venueTitle = '';

  var bothAuditoriums = true;

  if (prompt_venueResponse.getSelectedButton() == ui.Button.NO)
  {
    venueTitle = prompt_venueResponse.getResponseText();
    bothAuditoriums = false;
  }

  var date = new Date(startingDate);

  //Show a "please stand by" dialog while the documents are created
  var htmlOutput = HtmlService
    .createHtmlOutput('Creating ' + numberOfDocs + ' documents. Please stand by…')
    .setWidth(300)
    .setHeight(100);

  ui.showModalDialog(htmlOutput, 'Talking Points – Task Running');

  for (var i = 0; i < numberOfDocs; i++)
  {
    var loopDate = new Date(date.getTime() + ((i * 7) * 3600000 * 24)); // starting date plus (i * 7) days: 3,600,000 ms per hour, times 24 hours per day
    var documentName = 'Talking Points – ' + Utilities.formatDate(loopDate, Session.getScriptTimeZone(), 'MMMM dd, yyyy');
    var documentDate = Utilities.formatDate(loopDate, Session.getScriptTimeZone(), 'MM/dd/yyyy');

    if (bothAuditoriums)
    {
      createNewTalkingPointDocument(templateDocId, documentName + ' (Aud 1)', 'Aud 1', documentDate);
      createNewTalkingPointDocument(templateDocId, documentName + ' (Aud 2)', 'Aud 2', documentDate);
    }
    else
    {
      documentName += ' (' + venueTitle + ')';
      createNewTalkingPointDocument(templateDocId, documentName, venueTitle, documentDate);
    }
  }

  //Close the dialog by loading a tiny page that calls google.script.host.close()
  htmlOutput = HtmlService
    .createHtmlOutput('<script>google.script.host.close();</script>')
    .setWidth(300)
    .setHeight(100);
  ui.showModalDialog(htmlOutput, 'Talking Points – Task Running');
}

function createNewTalkingPointDocument(templateDocumentId, documentName, venueTitle, documentDate)
{
  //Make a copy of the template file
  var documentId = DriveApp.getFileById(templateDocumentId).makeCopy().getId();

  //Rename the copied file
  DriveApp.getFileById(documentId).setName(documentName);

  //Get the document body as a variable
  var body = DocumentApp.openById(documentId).getBody();

  //Insert the entries into the document
  body.replaceText('##Venue##', venueTitle);
  body.replaceText('##Date##', documentDate);
}

[/code]

Once you have a script in place, you can set up triggers for when it should run: when the document is opened, on a schedule, and so on.
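
For example, a simple onOpen trigger can add a custom menu so anyone can run the script right from the document. A minimal sketch (the menu and item names are just examples):

[code language="JavaScript"]
// Simple trigger: runs automatically whenever the document is opened
// and adds a custom menu so the script can be run without the editor.
function onOpen()
{
  DocumentApp.getUi()
    .createMenu('Talking Points')
    .addItem('Create Documents…', 'myFunction')
    .addToUi();
}
[/code]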

Here is the new template with the script in action:

[Screenshot: prompt asking how many documents to create]

First, I ask how many documents should be created. 1, 5, 500, whatever I need.

[Screenshot: prompt asking for the starting date]

Next, I ask for the starting date. We specifically use these for Sunday services, so I’ve programmed the script to take this starting date and then calculate every 7 days when creating multiple documents.

[Screenshot: prompt asking which venues to create documents for]

Then, I ask the user whether they want to create documents for both auditoriums, or just one venue for a special service, off-site service, etc. Typically we want them for both auditoriums, but the one-off option makes those types of services easy too.

[Screenshot: the “task running” dialog box]

As the script runs, it displays this dialog box. Creating that many documents can take a while, and I wanted the user to be aware of this. The box goes away automatically when the process is completed.

Now that we have this, I can pass the task on to anyone on our team, anytime they need these documents! And it saves a good bit of time: I definitely spent less time creating this script than I would have spent creating 3-4 months’ worth of documents manually, and now I never have to do that again!

How can you use Google Apps Script to automate some of your more repetitive tasks?