
Booking a Meeting Room with Alexa – Part Two – Coding the Skill

Hey there! In my previous post, Booking a Meeting Room with Alexa – Part One, I talked about how to build up the Interaction Model for your Skill using the Alexa Developer Console. Now I'll talk about how to write the code that handles the requests.

Setting Up

I chose to use JavaScript to write the skill, as I wanted to try something a little different to Java, which is what I normally use. Alexa has an SDK that lets you develop Skills in a number of languages, including Java and JavaScript, but also C#, Python, Go and more. I chose JavaScript for its quick load time and conciseness. I've written a previous Skill in both JavaScript and Java: the former took under a second to execute, the latter around 2.5 seconds. They both did the same thing, but Java apps can become bloated quickly and unknowingly if you pick certain frameworks, so be wary when choosing your weapon of choice and make sure it will let you write quick-responding skills. Waiting for Alexa to respond is like waiting for a spinning wheel on a UI, or for your elderly relative to acknowledge they've heard you… I'm sure you know what I mean.

To develop in JavaScript, I used npm to manage my dependencies, and placed my production code under "src" and test code under "test" (sorry, Java idioms kicking in here!). I used npm init to create my package.json, which contains information about my package (such as name, author, git URL etc.) and the dependencies my JavaScript code has. I later discovered that you can use ask new to create a bootstrapped skill, which you can then fill in with your business logic.

Regarding dependencies, there are a couple of key ones you need for Alexa development: ask-sdk-core and ask-sdk-model. I also used the ssml-builder library, as it provides a nice Builder DSL for crafting your responses.
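To give you an idea, a minimal package.json for a skill like this might look something like the following (the package name, entry point and version numbers are illustrative rather than my exact ones):

{
  "name": "room-booker-skill",
  "version": "1.0.0",
  "description": "Alexa Skill for booking meeting rooms",
  "main": "src/index.js",
  "dependencies": {
    "ask-sdk-core": "^2.0.0",
    "ask-sdk-model": "^1.0.0",
    "ssml-builder": "^0.4.0"
  }
}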

Skill Structure

Skills have an entry point for receiving a request, which then delegates to a specific handler capable of servicing it. The skeleton of that entry point looks like this:

const Alexa = require('ask-sdk-core');
const Speech = require('ssml-builder');

let skill;

exports.handler = async function (event, context) {
    if (!skill) {
        skill = Alexa.SkillBuilders.custom()
            .addRequestHandlers(
                <Your Handlers Here>
            )
            .addErrorHandlers(ErrorHandler)
            .create();
    }
    const response = await skill.invoke(event, context);
    return response;
};

So in your top-level handler, you specify one or more RequestHandlers and one or more ErrorHandlers. Calling the create() function returns a Skill object, which you can then invoke with the received request.

The skill object is lazily initialised as a singleton because your Lambda can stay active for a period of time after it completes a request, and may handle subsequent requests. Initialising it only once speeds up those later requests.
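The skeleton above also registers an ErrorHandler, which I haven't shown in this post. A minimal one, following the standard ASK SDK pattern (the speech text is my own placeholder), could look like this:

const ErrorHandler = {
    canHandle() {
        // Return true so this handler catches errors thrown by any request handler.
        return true;
    },
    handle(handlerInput, error) {
        console.log(`Error handled: ${error.message}`);

        return handlerInput.responseBuilder
            .speak("Sorry, I didn't catch that. Please try again.")
            .reprompt("Please try again.")
            .getResponse();
    }
};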

Building a RequestHandler

In the middle of the Alexa.SkillBuilders code block, you can see my <Your Handlers Here> placeholder. This is where you pass in your RequestHandlers, which let you break the logic of your Skill into manageable chunks. I had one RequestHandler per Intent in my Skill, but it's quite flexible. The SDK uses something similar to the chain of responsibility pattern, passing your request to each RequestHandler in turn until it finds one that can handle it. Each RequestHandler has a canHandle function, which returns a boolean stating whether it can handle the request or not:

const HelpIntentHandler = {
    canHandle(handlerInput) {
        return handlerInput.requestEnvelope.request.type === 'IntentRequest'
            && handlerInput.requestEnvelope.request.intent.name === 'AMAZON.HelpIntent';
    },
    handle(handlerInput) {
        const speechText = 'Ask me a question about Infinity Works!';

        return handlerInput.responseBuilder
            .speak(speechText)
            .reprompt(speechText)
            .withSimpleCard('Help', speechText)
            .getResponse();
    }
};

As you can see above, the canHandle function decides whether it can handle the request based on properties in the request. Amazon has a number of built-in Intents, such as AMAZON.HelpIntent and AMAZON.CancelIntent, that are available to your Skill by default, so it's best to have RequestHandlers that do something sensible with these, such as listing the things your Skill can do.
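My handlers for those aren't shown here, but a handler for the built-in cancel and stop intents typically looks something like this (a sketch of the common pattern rather than my exact code):

const CancelAndStopIntentHandler = {
    canHandle(handlerInput) {
        return handlerInput.requestEnvelope.request.type === 'IntentRequest'
            && (handlerInput.requestEnvelope.request.intent.name === 'AMAZON.CancelIntent'
                || handlerInput.requestEnvelope.request.intent.name === 'AMAZON.StopIntent');
    },
    handle(handlerInput) {
        return handlerInput.responseBuilder
            .speak('Goodbye!')
            .getResponse();
    }
};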

Beneath canHandle you have your handle function, which takes the request and performs some action with it. For example, that could be adding two numbers spoken by the user, or in my case calling an external API to check availability and book a room. Below is a shortened version of my Room Booker Skill's handler, hopefully to give you a flavour of how this looks:

async handle(handlerInput) {
    // Shortened for the post: the slot parsing that produces requestedStartDate, requestedEndDate
    // and requestedDuration is omitted, as is the setup of oauth2Client and the DynamoDB client (ddb).

    // The access token from account linking is available here; how it's used is also omitted in this shortened version.
    const accessToken = handlerInput.requestEnvelope.context.System.user.accessToken;
    const deviceId = handlerInput.requestEnvelope.context.System.device.deviceId;

    // Each Echo device is linked to a room; the mapping is stored in DynamoDB.
    const deviceLookupResult = await lookupDeviceToRoom(deviceId);
    if (!deviceLookupResult)
        return handlerInput.responseBuilder.speak("This device doesn't have an associated room, please link it to a room.").getResponse();

    const calendar = google.calendar({version: 'v3', auth: oauth2Client});
    const calendarId = deviceLookupResult.CalendarId.S;
    const event = await listCurrentOrNextEvent(calendar, calendarId, requestedStartDate, requestedEndDate);

    if (roomAlreadyBooked(requestedStartDate, requestedEndDate, event)) {
        // Look for other rooms' availability
        const roomsData = await getRooms(ddb);
        const availableRooms = await returnAvailableRooms(roomsData, requestedStartDate, requestedEndDate, calendar);
        return handlerInput.responseBuilder.speak(buildRoomBookedResponse(requestedStartDate, requestedEndDate, event, availableRooms))
            .getResponse();
    }

    // If we've got this far, there's no existing event that'd conflict. Let's book!
    await createNewEvent(calendar, calendarId, requestedStartDate, requestedEndDate);
    const speechOutput = new Speech()
        .say(`Ok, room is booked at`)
        .sayAs({
            word: moment(requestedStartDate).format("H:mm"),
            interpret: "time"
        })
        .say(`for ${requestedDuration.humanize()}`);
    return handlerInput.responseBuilder.speak(speechOutput.ssml(true)).getResponse();
}

JavaScript Gotchas

I'll be the first to admit that JavaScript is not my forte, and this is certainly not what I'd call production quality! But for anyone like me, there are a couple of key things I'd like to mention. To handle date and time processing I used Moment.js, a really nice library IMO for handling datetimes, but also for outputting them in a human-readable format, which is really useful when Alexa is going to say them.
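To show why that's handy: Alexa passes an AMAZON.DURATION slot as an ISO 8601 duration string and an AMAZON.TIME slot as an "HH:mm" string, and Moment.js turns both into something speakable. The sample values below are purely illustrative:

const moment = require('moment');

// AMAZON.DURATION slot values arrive as ISO 8601 duration strings (e.g. "PT48M" for 48 minutes),
// and AMAZON.TIME values as "HH:mm" strings. These sample values are just for illustration.
const durationSlotValue = 'PT48M';
const timeSlotValue = '09:30';

const requestedDuration = moment.duration(durationSlotValue);
const requestedStartDate = moment(timeSlotValue, 'HH:mm');
const requestedEndDate = moment(requestedStartDate).add(requestedDuration);

console.log(requestedDuration.humanize());      // "an hour" – moment rounds 48 minutes up under its default thresholds
console.log(requestedStartDate.format('H:mm')); // "9:30"
console.log(requestedEndDate.format('H:mm'));   // "10:18"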

Secondly… callbacks are fun… especially when they don't trigger! I smashed my head against a wall for a while wondering why, when I was using the Google SDK with its callback style, none of my callbacks were being invoked. It took me longer than I'd like to admit to figure out that the Lambda was exiting before the callbacks fired. This is because JavaScript runs in an event loop and callbacks are invoked asynchronously: the main block of my code was calling the third-party APIs and passing callbacks to execute later, but it returned long before they had a chance to run. As I was building the text response inside those callbacks, no text was being returned for Alexa to say from the main block, so she didn't give me any clues as to what was going wrong!
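To illustrate the shape of the problem, here's a simplified reconstruction (not my original code):

handle(handlerInput) {
    // The Google Calendar client happily accepts a Node-style callback...
    calendar.events.list({calendarId: calendarId}, (err, res) => {
        // ...but by the time this callback runs, the handler below has already
        // returned and the Lambda has finished, so this speech never reaches Alexa.
        handlerInput.responseBuilder.speak('Here is what I found...');
    });

    // Execution carries straight on and returns an empty response immediately.
    return handlerInput.responseBuilder.getResponse();
}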

To get around this, I first tried using Promises. The Alexa SDK supports returning a Promise instead of a response: you return a Promise that will eventually resolve, and the SDK finalises the response processing once it does. After a bit of Googling, I found it's fairly straightforward to wrap callbacks in Promises using something like:

return new Promise(function (resolve, reject) {
    dynamoDb.getItem(params, function (err, data) {
        if (err) reject(err);
        else resolve(data.Item);
    });
});

Now that I'd translated the callbacks to Promises, I could return something like the following from the Skill, which the SDK would then eventually resolve:

return createNewEvent(calendar, requestedStartDate, requestedEndDate)
    .then(result => handlerInput.responseBuilder.speak("Room Booked").getResponse());

Unfortunately, I couldn't quite get this to work, and it's been a couple of months since I did this so I can't remember exactly why! But the things to be wary of, for me, are the asynchronous nature of JavaScript and closures – make sure the objects you're trying to interact with are in scope of the Promises you write. Secondly, using Promises ended up producing long Promise chains, which made the code difficult to interpret and follow. Eventually I ended up using the async/await keywords introduced in ES8 (ES2017). They act as a lightweight wrapper around Promises, but let you treat the code as if it were synchronous. This was perfect for my use case, because booking a room is a fairly sequential process – you need to know what room you're in first, then check its availability, then book the room if it's free. It allowed me to write code like this:

let deviceLookupResult = await lookupDeviceToRoom(deviceId, ddb);
let clashingEvent = await listCurrentOrNextEvent(calendar, calendarId, requestedStartDate, requestedEndDate);
if (!clashingEvent) {
    await createNewEvent(calendar, calendarId, requestedStartDate, requestedEndDate);

    let speechOutput = new Speech()
        .say(`Ok, room is booked at`)
        .sayAs({
            word: moment(requestedStartDate).format("H:mm"),
            interpret: "time"
        })
        .say(`for ${requestedDuration.humanize()}`);
    return handlerInput.responseBuilder.speak(speechOutput.ssml(true)).getResponse();
}

That, to me, just reads a lot nicer for this particular workflow. async/await won't always be the right tool, but I'd definitely recommend looking into it.

Speech Synthesis Markup Language (SSML)

The last thing I want to discuss in this post is Speech Synthesis Markup Language (SSML). It's an XML-based syntax for constructing phrases that a text-to-speech engine can say, and it's a standard used by many platforms, not just Alexa. In the code snippet above, I used a library called ssml-builder, which provides a nice DSL for constructing responses and converts your input into SSML. The code above actually returns:

<speak>Ok, room is booked at <say-as interpret-as='time'>9:30</say-as> for an hour</speak>

Alexa supports the majority of the features defined by the SSML standard, but not all of them. I used https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html as a reference for what you can get Alexa to do, and it's still quite a lot! The main thing I had trouble with was getting SSML to output times in a human-readable way – even using the time hints in the say-as attributes resulted in some pretty funky ways of saying the time. That's when Moment.js came to the rescue: it could output human-readable forms of the times, so I could avoid having SSML process them entirely.

If you want to play about with SSML, the Alexa Developer Console provides a sandbox under the "Test" tab, which allows you to write SSML and have Alexa say it. This way you can work out the best way to phrase what you want Alexa to say, and experiment with tones, speeds and emphasis on certain words to make her feel more human.
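For example, pasting something like this into the simulator (my own illustrative snippet) lets you hear how pauses, pitch and emphasis change the delivery:

<speak>
    I can book this room
    <break time="300ms"/>
    <prosody rate="slow" pitch="low">right now</prosody>,
    <emphasis level="strong">if it's free</emphasis>.
</speak>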

Wrapping Up

And that's it for this post. Hopefully it gives you an idea of where to start if you've not done Alexa or JavaScript development before (like me!). In the next post I'll touch on how to unit test Skills using JavaScript frameworks.

Whilst writing this post, Amazon have been sending me step-by-step guides on Alexa development which I think are worth sharing too, so if you get a chance take a look at these as well. You don't even need to be a coder to get started with them! Until next time…

Design your Voice Experience
Identify your Customers
Write your Script
Define your Voice Interaction

Build your Front End, Your Way
Developer Console
Command-Line Interface
Third Party Tools – no Coding Required!

Build the Back-End
Start with AWS Lambda
More Tools – No Back-End Setup Required

Booking a Meeting Room with Alexa – Part One

Hey there! This is part one of my adventures in developing an Alexa Skill. I was inspired recently on a client site, where I saw they'd installed a shiny new room booking system. Each meeting room had a touch screen set up outside it, and from it you could see who'd booked the room, and also book the room out yourself if it was available.

It had the right idea, but from talking to people I learnt that it wasn’t the most user-friendly, and that it had cost a pretty penny too! I’d been looking for an excuse to dabble with Alexa and Voice UIs, so I decided to see if I could build something similar with commodity hardware.

“Alexa, book this meeting room out for 48 minutes”

Because I like nothing more than diving in at the deep end, I chose a tech stack that was completely unfamiliar to me. My comfort zone of late is Java and ECS, so instead I used AWS Lambda to host the Skill and JavaScript as the development language, with the Serverless framework to manage deployments. Developing a Lambda-backed Skill splits into two parts: creating and hosting the voice interface, and then the application code that handles your requests.

In this blog post I'll be focusing on developing the Interaction Model using the Alexa Developer Console. To get started, sign in to the console with your Amazon.com account; if you need to create an account, you can do that there too.

With Alexa, what you write are Skills – code functions that carry out the action you want to happen. They’re triggered by Invocations – user requests in the form of phrases that Alexa uses to figure out what you’re trying to do. In my case, an Invocation was “Alexa, book out this meeting room for 48 minutes”.

Once you're set up with an account, you'll land on a page listing your current Skills. Towards the right-hand side there's a button called "Create Skill" – go ahead and click that to create your Skill.

Amazon gives you a number of template models to choose from, to speed up development and show examples of what you can do with Alexa. You can also "Provision your own" backend resources, pointing your Skill either at an HTTP endpoint or an AWS Lambda. Alternatively, you can choose "Alexa-Hosted", which uses AWS Lambda but integrates the code development into the Alexa Console, so you can develop the code alongside the model in the same UI.

An Alexa Skill can have one or more Intents – actions or requests that your Skill can handle. An Intent can be something like "what's the weather today", "what's on my agenda today", or "book me a meeting room" (see where I'm going with this? 😉). Intents are invoked by Utterances – the phrases you'll use to request your Intent. You can link one or more Utterances to an Intent, which is useful for capturing all the variations someone might use to request it.

As part of designing the UX, I found it useful to test how I'd interact with my Skill on an Echo device, but with the microphone turned off. It was interesting to see how many variations I could come up with for requesting a room booking; I noted all of these variations down and configured them as Utterances.

Within these Utterances you can also have Slots – parameter placeholders that let you pass variables into the request, making it more dynamic. In my case, this meant allowing the user to specify the duration of the booking and optionally a start time, but it could equally have been movie actors, days of the week, a telephone number and so on. Amazon has various built-in Slot Types, such as animals, dates and countries, which Alexa uses to try to match what the user said with a value of that type. Slots can be optional too, so your requests can include one or more parameters – you achieve this by configuring multiple Utterances that use different combinations of your Slots.
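To make that concrete, here's roughly how an Intent with a duration Slot and an optional start time Slot ends up looking in the Interaction Model JSON (the intent, slot and invocation names are illustrative rather than my exact ones):

{
  "interactionModel": {
    "languageModel": {
      "invocationName": "room booker",
      "intents": [
        {
          "name": "BookRoomIntent",
          "slots": [
            { "name": "duration", "type": "AMAZON.DURATION" },
            { "name": "startTime", "type": "AMAZON.TIME" }
          ],
          "samples": [
            "book this meeting room out for {duration}",
            "book this room for {duration} at {startTime}",
            "book this room out"
          ]
        }
      ]
    }
  }
}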

If you don't want to use one of the preconfigured Slot Types, you can create your own list of values to match the parameter against, or use the AMAZON.SearchQuery Slot Type, although I've had varying success with its speech recognition.

This isn't related to my Meeting Room Booker Skill, but it's worth mentioning: Alexa doesn't always quite catch what I say (or interprets it differently to how I intended), making it difficult to do exact matches or lookups. For example, I tried building a "Skills Matrix" Skill, where I could name a technology and Alexa would tell me who knows about it. I didn't realise you could have so many variations on interpreting the words "Node JS"! The only way I could think of getting around it at the time was to have a custom "Technology" Slot Type and, for the harder-to-pronounce technologies, list all the expected variations in there. You can also employ a "Dialog Delegation Strategy", which defers all dialog management to your Lambda and opens up far more possibilities for interacting with your user (e.g. you could use fuzzy matching or ML to figure out what they meant), but it's a bit more involved to set up.
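A custom Slot Type with those expected variations sits alongside the intents in the same Interaction Model JSON, something like the following (the synonyms here are just examples):

{
  "interactionModel": {
    "languageModel": {
      "types": [
        {
          "name": "TECHNOLOGY",
          "values": [
            {
              "name": {
                "value": "Node JS",
                "synonyms": ["node", "nodejs", "node jay ess"]
              }
            },
            {
              "name": { "value": "Java" }
            }
          ]
        }
      ]
    }
  }
}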

It's worth noting at this point that you can have a different Interaction Model per locale, which makes sense as it allows you to account for language and dialect differences. The key thing is to make sure that when you're developing and testing your Skill (which I'll cover in following posts), you're always using the same locale – otherwise you just get a rather unhelpful "I don't know what you mean"-esque response, or an even less helpful but more confusing "Uber can help with that", which completely threw me off for much longer than I'd like to admit!

Eventually, I had an Interaction Model for the Skill created through the UI. Once you're past the point of trying things out and want to productionise it, you'll probably be wondering how to create and modify these Skills programmatically. Thankfully, the Alexa Skills Kit (ASK) provides a CLI that allows you to do just that.

Here's a link to the installation instructions and quick start for the CLI – https://developer.amazon.com/docs/smapi/quick-start-alexa-skills-kit-command-line-interface.html

You can use the ASK CLI to create, update and delete Skills. It's fairly simple to use, as long as all your JSON config is correct – the errors it returns don't give you much insight if you've missed a required parameter or specified an invalid value, for example.

As I'd already created a Skill at this point using the UI, I used the CLI to pull down the metadata generated by the UI, so I could store it in Git. The commands I used in particular were:

ask api list-skills to get the skillId for the newly created Skill
ask api get-skill -s {skillId} to get the Skill metadata as a json payload
ask api get-model -s {skillId} -l {locale} to get the Interaction Model metadata as a json payload
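The Skill metadata returned by get-skill is the Skill manifest. Trimmed right down, it looks roughly like this (the values are illustrative rather than my real ones):

{
  "manifest": {
    "manifestVersion": "1.0",
    "publishingInformation": {
      "locales": {
        "en-GB": {
          "name": "Room Booker",
          "summary": "Book out the meeting room you're standing in",
          "examplePhrases": ["Alexa, book this meeting room out for 48 minutes"]
        }
      }
    },
    "apis": {
      "custom": {
        "endpoint": {
          "uri": "arn:aws:lambda:eu-west-2:123456789012:function:room-booker"
        }
      }
    }
  }
}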

At this point, everything I'd done in the UI was now available to me as code, and I was able to check it all into Git. I found that very useful, just as with any code, because once you start tweaking and trying things out it can be difficult to revert to a known-good state without it. You can use the following commands to update your Skill:

ask api update-skill -s {skillId} -f {skill-file-location} to update the Skill metadata
ask api update-model -s {skillId} -l {locale} -f {model-file-location} to update the Interaction Model

You can also use the ASK CLI to create a Skill from scratch, without ever needing to use the UI. ask new configures and provisions a Skill, and it also creates a folder structure containing the same kind of JSON files I'd pulled from my existing Skill, ready for you to get started.

So that was a rather quick "how to get up and running" with creating an Alexa Skill. The next step is linking the Skill to some backend code to handle the requests. I'll follow this blog up with a how-to on that, but in the meantime if you have any questions feel free to give me a shout!

Also, if you’re reading this and thinking “my business could really benefit from an Alexa Skill”, then please drop me a line at ben.foster@infinityworks.com and let’s talk 🙂