From Java Microservices to Lambda functions – a journey

You may be one of many organisations (or an engineer in one) that operate Java microservices in the cloud and want to move towards a serverless architecture, but can’t justify the steep migration path from those microservices to the likes of AWS Lambda (e.g. decomposing your services into functions, rewriting them in a more suitable language, etc.).

But fear not! With the help of spring-cloud-function you can repurpose your existing microservices into serverless functions in a gradual and controlled manner, with minimal effort or interruption to service.

In this article I’ll explain how you can achieve this utilising the Strangler Fig Pattern to quickly prove out the approach and see if it fits your needs. I’ve used AWS CDK, ECS, ALB and Lambda to demonstrate how you can move traffic from a Java microservice to multiple Lambda functions.

I’ve built a sample codebase and accompanying CDK code. I’ve used git branches to show how you go about transitioning over to Lambda, which I’ll talk through throughout this post:

https://github.com/foyst/spring-petclinic-rest-serverless

(The above repo is freely available for people to base prototypes on, to quickly experiment with this approach in their organisations)

It’s based upon the Spring Petclinic REST Application example. I wanted to use something that was well known and understood, and representative of a real application scenario that demonstrates the potential of spring-cloud-function.

Note that the API and documentation for spring-cloud-function has changed over time, and I personally found it difficult to understand what the recommended approach is. So I hope this article also captures some of those pain points and provides others with a guide on how to implement it.

Along the journey I battled against the above and other gotchas, which can distract and take considerable time to overcome. So as not to derail the main narrative I’ve moved these to the back of this post, but I’ll signpost them at the points they cropped up in my explorations.

Setting up AWS CDK

If you’re following along with the GitHub repo above, make sure you’re on the master branch for the first part of this blog.

I’ll not digress into how to set up CDK as there are plenty of resources out there. These are the ones which I found particularly useful:

https://cdkworkshop.com/15-prerequisites/500-toolkit.html

https://cdkworkshop.com/20-typescript/20-create-project/100-cdk-init.html

https://docs.aws.amazon.com/cdk/api/latest/docs/aws-construct-library.html

https://gitter.im/awslabs/aws-cdk

Using the above references, I created my CDK solution with cdk init sample-app --language typescript as a starting point.

I’ve put together a simple AWS architecture using CDK to demonstrate how to do this. I’ve kept to the sensible defaults CDK prescribes; its default configuration creates a VPC with 2 public and 2 private subnets. This diagram shows what’s deployed by my GitHub repo:

I have an RDS MySQL instance, which is used by both the ECS and Lambda applications. ECS is running the Petclinic Java microservice and Lambda is running its serverless counterparts. An Application Load Balancer is used to balance requests between the Fargate container and Lambda functions.

I used Secrets Manager to handle the generation of the RDS password, as this also allows you to pass the secret securely through to the ECS Container. For info on how to set up Secrets Manager secrets with RDS in CDK so you can do credentials rotation, I used https://dev.to/michaelfecher/i-tell-you-a-secret-provide-database-credentials-to-an-ecs-fargate-task-in-aws-cdk-5f4.
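As a rough sketch of that pattern (not the repo’s exact code – the construct names, env var names and CDK v1 module layout here are assumptions), the secret is generated alongside the database and handed to the container by reference rather than by value:

import * as ecs from '@aws-cdk/aws-ecs';
import * as rds from '@aws-cdk/aws-rds';

// RDS generates the master credentials and stores them in Secrets Manager
const db = new rds.DatabaseInstance(this, 'PetclinicDb', {
  engine: rds.DatabaseInstanceEngine.mysql({ version: rds.MysqlEngineVersion.VER_5_7 }),
  credentials: rds.Credentials.fromGeneratedSecret('petclinic'),
  vpc, // the VPC created elsewhere in the stack
});

// The container only ever receives a reference to the secret, never its value
const taskDefinition = new ecs.FargateTaskDefinition(this, 'PetclinicTask');
taskDefinition.addContainer('petclinic', {
  image: ecs.ContainerImage.fromAsset('../'), // placeholder – point at however you build the Petclinic image
  secrets: {
    SPRING_DATASOURCE_PASSWORD: ecs.Secret.fromSecretsManager(db.secret!, 'password'),
  },
});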

Initially I tried to deploy a VPC without NAT gateways, to keep resources isolated and unnecessary costs down. But this is where I encountered my first Gotcha, due to changes in the way Fargate networking works as of version 1.4.0.

Stage One – All requests to Java Container

So in the first instance, all of the requests are routed by default to the Petclinic Fargate Service:
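In CDK terms, this initial setup is simply a listener whose only action forwards everything to the Fargate service’s target group – something along these lines (a simplified sketch; loadBalancer and fargateService are assumed to be defined elsewhere in the stack):

import * as elbv2 from '@aws-cdk/aws-elasticloadbalancingv2';

const listener = loadBalancer.addListener('HttpListener', {
  port: 80,
  protocol: elbv2.ApplicationProtocol.HTTP,
});

// Stage one: every request is forwarded to the Petclinic container
listener.addTargets('PetclinicEcs', {
  port: 8080,
  protocol: elbv2.ApplicationProtocol.HTTP,
  targets: [fargateService],
});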

Later on, I’ll use weighted target groups and path-based routing to demonstrate how you can use the Strangler Fig Pattern to gradually migrate from microservices to Lambda functions in a controlled fashion.

To deploy the initial infrastructure with the RDS and ECS Services running, cd into the cdk folder and run the following command:

cdk deploy --require-approval=never --all

This will take some time (~30 mins), mainly due to the RDS Instance spinning up. Go put the kettle on…

Transforming a Spring Boot Microservice to Serverless

Now for the meaty part. In my GitHub repo you can switch over to the 1-spring-cloud-function branch which includes additional CDK and Java config for writing Spring Cloud Functions.

There’s a number of good articles out there that demonstrate how to create serverless functions using Java and Spring, such as Baeldung with https://www.baeldung.com/spring-cloud-function. Where this blog hopefully differs is by showing you a worked example on how to decompose an existing Java microservice written in Spring Boot into Lambda functions.

Importantly, make sure you use the latest versions and documentation – a lot has changed, and there are so many search results pointing to outdated articles and docs that it can be confusing to work out which are current. The latest version of spring-cloud-function at the time of writing is 3.2.0-M1, and the documentation for it can be found here:

https://docs.spring.io/spring-cloud-function/docs/3.2.0-M1/reference/html/spring-cloud-function.html#_introduction

This too: https://cloud.spring.io/spring-cloud-function/reference/html/

And example functions can be found in https://github.com/spring-cloud/spring-cloud-function/tree/main/spring-cloud-function-samples

So, can you do dual-track development of the existing application alongside splitting out into Lambda functions? Yes, by pushing all application logic out of the REST and Lambda classes (delegating to a Service or similar if you don’t already have one) and having a separate Maven profile for lambda development. By comparing the master and 1-spring-cloud-function branches you can see the additional changes made to the pom.xml, which include this new “lambda” profile:

The Maven lambda profile is aimed at developing the lambdas. It has the spring-cloud-function specific dependencies and plugins connected to it, which ensures none of those bleed into the existing core Java application. When you do development and want to build the original microservice jar, you can use existing Maven build commands as before. Whenever you want to build the Lambda jar just add the lambda profile, e.g. ./mvnw package -P lambda

For this example I’ve created a couple of functions, to demonstrate both how to define multiple functions within one jar, and how to isolate them in separate AWS Lambda functions. I’ve called them getAllOwners and getOwnerById which can be found in src/main/java/org/springframework/samples/petclinic/lambda/LambdaConfig.java:

    @Bean
    public Supplier<Collection<Owner>> getAllOwners() {
        return () -> {
            LOG.info("Lambda Request for all Owners");
            return this.clinicService.findAllOwners();
        };
    }

    @Bean
    public Function<Integer, Owner> getOwnerById() {
        return (ownerId) -> {
            LOG.info("Lambda Request for Owner with id: " + ownerId);
            final Owner owner = this.clinicService.findOwnerById(ownerId);
            return owner;
        };
    }

This is where I experienced my second gotcha! Spring Cloud Function aspires to provide a cloud-agnostic interface which gives you all the flexibility you may need, but sometimes you want control of the platform internals. In the above, for example, you can’t return a 404 when a resource is not found, because you don’t have access to the payload that’s returned to ALB/API Gateway.

Thankfully, after posting a Stack Overflow question and a GitHub issue, promptly followed by a swift solution and release (many thanks to Oleg Zhurakousky for the speedy turnaround!) you can now bypass the cloud-agnostic abstractions by returning an APIGatewayProxyResponseEvent, which gets returned to the ALB/API GW unmodified:

    @Bean
    public Function<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> getOwnerById() {
        return (requestEvent) -> {
            LOG.info("Lambda Request for Owner");
            final Matcher matcher = ownerByIdPattern.matcher(requestEvent.getPath());
            if (matcher.matches()) {
                final Integer ownerId = Integer.valueOf(matcher.group(1));
                final Owner owner = this.clinicService.findOwnerById(ownerId);
                if (owner != null) {
                    return buildOwnerMessage(owner);
                } else return ownerNotFound();
            }
            else return ownerNotFound();
        };
    }

    private APIGatewayProxyResponseEvent buildOwnerMessage(Owner owner) {
        final Map<String, String> headers = buildDefaultHeaders();
        try {
            APIGatewayProxyResponseEvent responseEvent = new APIGatewayProxyResponseEvent()
                .withIsBase64Encoded(false)
                .withBody(objectMapper.writeValueAsString(owner))
                .withHeaders(headers)
                .withStatusCode(200);
            return responseEvent;
        } catch (JsonProcessingException e) {
            throw new RuntimeException(e);
        }
    }

    private Map<String, String> buildDefaultHeaders() {
        final Map<String, String> headers = new HashMap<>();
        headers.put("Content-Type", "application/json");
        return headers;
    }

    private APIGatewayProxyResponseEvent ownerNotFound() {
        final Map<String, String> headers = buildDefaultHeaders();
        APIGatewayProxyResponseEvent responseEvent = new APIGatewayProxyResponseEvent()
            .withIsBase64Encoded(false)
            .withHeaders(headers)
            .withBody("")
            .withStatusCode(404);
        return responseEvent;
    }

Using the APIGatewayProxyRequestEvent as a function input type does give you access to the full request, which you’ll probably need when extracting resource paths from a request, accessing specific HTTP headers, or needing fine-grained control on handling request payloads.

For local testing of the functions, there are a few ways you can do it. Firstly, you can use the spring-cloud-starter-function-web dependency, which allows you to test the lambda interfaces by calling an HTTP endpoint with the same name as the lambda @Bean method. For example, you can curl localhost:8080/getOwnerById/3 to invoke the getOwnerById function.

Secondly, if you want to debug the full integration path that AWS Lambda hooks into, you can invoke it the same way Lambda does by creating a new instance of the FunctionInvoker class, and passing it the args you’d call it with in AWS. I’ve left an example of how to do this in src/test/java/org/springframework/samples/petclinic/lambda/LambdaConfigTests.java, which is how the function variants are tested within the spring-cloud-function library itself.

When you’ve developed your Lambda functions, tested them, and are ready to build a jar to serve in AWS Lambda, you can run the following command:

./mvnw clean package -P lambda

You’ll see in the next stage that this is run as part of the CDK deployment.

In the target folder, alongside the existing jar that’s used in our Docker microservice container, you’ll see a new jar with an -aws suffix. What’s the difference between this jar and the original, and why do we need a separate variant? Because AWS Lambda doesn’t support uber-jars, where jars are nested inside each other. To work in Lambda you have to generate a “shaded” jar, where all of the dependency classes are flattened into a single level within the jar archive. Additionally, by using the spring-boot-thin-layout plugin, you can reduce the size of the jar by removing dependencies that aren’t required in Lambda, which can bring a small cold start performance improvement (the bigger the jar, the longer it takes to load into the Lambda runtime):

<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot.experimental</groupId>
            <artifactId>spring-boot-thin-layout</artifactId>
            <version>1.0.10.RELEASE</version>
        </dependency>
    </dependencies>
</plugin>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <createDependencyReducedPom>false</createDependencyReducedPom>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>aws</shadedClassifierName>
    </configuration>
</plugin>

Stage Two – Balancing requests between Lambda and ECS

Once you’re at the point where you have Lambda functions in your Java app ready to handle requests, we can configure the ALB to route requests between two target groups – one targeted at the incumbent Petclinic container, and the other at our new functions.

At this point, switch over to the 1-spring-cloud-function branch.

In here you’ll see an additional lambda-stack.ts that contains the AWS Lambda configuration, and additional changes in lb-assoc-stack.ts, which creates a target group per Lambda function and uses weighted rules to balance traffic between Lambda and ECS:
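If you’re not following along in the repo, the core of lb-assoc-stack.ts looks roughly like this (a simplified sketch – the names, path pattern and weights are illustrative, and the listener, Fargate target group and Lambda function are assumed to exist already):

import * as elbv2 from '@aws-cdk/aws-elasticloadbalancingv2';
import * as targets from '@aws-cdk/aws-elasticloadbalancingv2-targets';

// A dedicated target group for each Lambda function
const getOwnerByIdTargetGroup = new elbv2.ApplicationTargetGroup(this, 'GetOwnerByIdTg', {
  targets: [new targets.LambdaTarget(getOwnerByIdFunction)],
});

// Path-based rule splitting traffic between the incumbent container and the new function
listener.addAction('OwnerByIdSplit', {
  priority: 10,
  conditions: [elbv2.ListenerCondition.pathPatterns(['/petclinic/api/owners/*'])],
  action: elbv2.ListenerAction.weightedForward([
    { targetGroup: fargateTargetGroup, weight: 50 },
    { targetGroup: getOwnerByIdTargetGroup, weight: 50 },
  ]),
});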

In this scenario I’m using the ALB integration with Lambda, to demonstrate that the approach is compatible with both ALB and API Gateway, and that both use the same code from an implementation perspective.

In lambda-stack.ts, everything is included to build the Lambda functions. In CDK you can incorporate the building of your code into the deployment of your resources, so you can ensure you’re working with the latest version of all your functions and it can all be managed within the CDK ecosystem.

I followed these two articles to set up a Java Maven application build which delegates the building of the jar files to a Docker container.

https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

https://aws.amazon.com/blogs/opensource/packaging-and-deploying-aws-lambda-functions-written-in-java-with-aws-cloud-development-kit/

The Docker image you use and the commands you run to build your app are configurable so it’s very flexible. AWS provide some Docker images, which ensures that artefacts that are built are compatible with the AWS Lambda runtimes provided.

From the screenshot above you can see that the folder specified by the Code.fromAsset command is mounted at /asset-input, and AWS expects to extract a single archived file from the /asset-output/ folder. What you do in between is up to you. In the code above I trigger a Maven package, using my lambda profile I declared earlier (skipTests on your project at your own risk, it’s purely for demonstration purposes here!).
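For reference, the bundling setup looks something like the sketch below (paths, runtime version and the exact Maven command are assumptions based on the behaviour described above, not lifted verbatim from the repo):

import * as lambda from '@aws-cdk/aws-lambda';

// Build the -aws shaded jar inside an AWS-provided build container,
// then hand the resulting artefact to CDK as the Lambda code asset
const petclinicLambdaCode = lambda.Code.fromAsset('../', {
  bundling: {
    image: lambda.Runtime.JAVA_11.bundlingImage,
    command: [
      'bash', '-c',
      './mvnw clean package -P lambda -DskipTests && cp target/*-aws.jar /asset-output/',
    ],
  },
});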

This is where I encountered the third gotcha… when you see CDK TypeScript compilation issues, double-check your CDK versions are aligned between CDK modules.

Now on the 1-spring-cloud-function branch, rerun the CDK deploy command:

cdk deploy --require-approval=never --all

Rerunning this command will deploy the new lambda functions, and you should see the ALB listener rules change from this:

To this:

Another gotcha to be aware of with Lambda – at the time of writing it doesn’t natively integrate with Secrets Manager, which means your secrets are set statically as environment variables, and are visible through the Lambda console. Ouch.

So at this point, we have an ALB configured to balance requests to 2 owners endpoints between the Petclinic container and new Lambda functions. Let’s test this with a GET request for Owner information:

In doing this we’re presented with a 502 Bad Gateway error. Not ideal, but digging into the Lambda CloudWatch logs we can see the first challenge of deploying this lambda:

Further up the call stack we see this issue is affecting the creation of a rootRestController, which is a REST Controller within the petclinic application.

The cause behind this with the Petclinic app is that there’s a RootRestController bean that configures the REST capabilities in Spring, which requires Servlet-related beans that aren’t available when you start up using the FunctionInvoker entrypoint.

To avoid this issue, we can conditionally omit classes relating to the REST endpoints from being packaged in the jar within the lambda Maven profile:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
        <excludes>
            <exclude>org/springframework/samples/petclinic/rest/**</exclude>
            <exclude>org/springframework/samples/petclinic/security/**</exclude>
        </excludes>
        <testExcludes>
            <exclude>org/springframework/samples/petclinic/rest/**</exclude>
            <exclude>org/springframework/samples/petclinic/security/**</exclude>
        </testExcludes>
    </configuration>
</plugin>

I also needed to exclude classes from both the rest and security packages, as the security packages tried configuring REST security which relied on components no longer being initialised by the RootRestController behind the scenes.

This brings me on to my fifth gotcha – the Petclinic app uses custom serialisers that weren’t being picked up. I scratched my head with this one as I wasn’t able to override the ObjectMapper that was being autoconfigured by spring-cloud-function. However, the following changes fixed the automatic resolution of these serialisers:

  • Upgrade spring-cloud-function-dependencies to 3.2.0-M1 (Bugs in previous versions prevented this from working correctly)
  • Remove spring-cloud-function-compiler dependency (no longer a thing)
  • No longer need to explicitly create a handler class for lambda (i.e. class that extends SpringBootRequestHandler), just use the org.springframework.cloud.function.adapter.aws.FunctionInvoker as the handler
  • If you have more than one function (which you will do as you begin to migrate more actions/endpoints to serverless), use the SPRING_CLOUD_FUNCTION_DEFINITION environment variable to specify the bean that contains the specific function you want to invoke – see the CDK sketch below.
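Putting those last two points together, each AWS Lambda function ends up declared along these lines in CDK (a sketch – the runtime, memory and timeout values are assumptions, and petclinicLambdaCode is the bundled asset from the earlier snippet):

import * as cdk from '@aws-cdk/core';
import * as lambda from '@aws-cdk/aws-lambda';

const getOwnerByIdFunction = new lambda.Function(this, 'GetOwnerById', {
  runtime: lambda.Runtime.JAVA_11,
  code: petclinicLambdaCode,
  // Generic entrypoint provided by the AWS adapter – no bespoke handler class needed
  handler: 'org.springframework.cloud.function.adapter.aws.FunctionInvoker',
  memorySize: 1024,
  timeout: cdk.Duration.seconds(30),
  environment: {
    // Selects which @Bean function this particular Lambda should invoke
    SPRING_CLOUD_FUNCTION_DEFINITION: 'getOwnerById',
  },
});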

Redeploying with all of the above changes resulted in working Lambda functions. At this point we’re able to send requests and they’re picked up by either ECS or Lambda with the same result. Nice!

Stage Three – Strangulating requests from ECS

At this point, you can start to gradually phase out the use of your long-running Java microservices by porting functionality across to Lambdas. This approach allows you to build confidence in the Lambda operations by gradually weighting endpoints in favour of your Lambda function target groups.

You’ll probably have noticed by this point that Lambda variants of Spring Boot apps are very slow to start up, which brings me to my sixth gotcha. This may be a dealbreaker for some situations, but I’d encourage you to explore the points in my conclusion below before deciding whether to adopt (or partially adopt) this approach.

When porting subsequent features or endpoints to Lambda functions, I’d suggest routing a small percentage of your traffic to the Lambda function as a canary test. So long as the error rate is within known tolerances you can gradually route more traffic to the function, until the Lambda is serving 100% of requests. At this point you can deregister the Fargate target group from that particular ALB path condition and repeat the process for other endpoints you want to migrate.

Conclusion and next steps

This blog article aims to give you a guided walkthrough of taking an existing Spring Boot Java microservice and, by applying the Strangler Fig Pattern, transitioning your workloads into Lambda functions in a gradual and controlled fashion. I hope you find this useful, and all feedback is greatly welcomed!

There are further considerations required to make this production-ready, such as performance tuning the Java Lambdas to reduce the cold start time, and removing DB connection pools that are superfluous in Lambdas and will cause additional load on your database(s). Here are some suggestions for these in the meantime, but I’m aiming to follow this up with an in-depth post to cover them in more detail:

  • Analysing the impact of cold-starts on user experience
    • When your Java Lambda initially spins up, it takes a similar amount of time to start up as its microservice counterpart. Given the lower memory and cpu typically allocated to Lambdas, this can result in a cold boot taking anything up to 60 seconds, depending on how bulky your application is.
    • However, considering your particular load profiles, so long as there’s a regular stream of requests that keep your Lambdas warm, it may be that cold-starts are rarely experienced and may be within acceptable tolerances
  • Tweaking the resource allocations of your Lambdas
    • Allocating more memory (and cpu) to your functions can significantly improve the cold start up time of Spring (see the graph below), but with a cost tradeoff. Each subsequent request becomes more expensive to serve but no faster, just to compensate for a slow cold start up. If you have a function that’s infrequently used (e.g. an internal bulk admin operation) this may be fine – for a function that’s used repeatedly throughout the day the cost can quickly be prohibitive.
  • AWS Lambda Provisioned Concurrency
    • The pricing of this can be prohibitive, but depending on the amount of concurrency required and when it’s required (e.g. 2 concurrent lambdas in business hours for a back-office API) this may be suitable
      • Continue to assess this compared with just running ECS containers instead, and weigh it up against the other benefits of Lambda (e.g. less infrastructure to manage & secure) to ensure it’s good value
  • Using Function declarations instead of @Bean definitions (which can help to improve cold start time)
  • Replacing DB Connection pooling (i.e. Hikari, Tomcat etc.) with a simple connection (that closes with the function invocation)
    • And combining this with AWS RDS Proxy to manage connection pooling at an infrastructure level (see the CDK sketch after this list)
  • Disabling Hibernate schema validation (move schema divergence checking out of your lambdas)
  • Experimenting with Lambda resource allocation, to find the right balance between cold start time and cost
    • See AWS Lambda Power Tuning – an open source library which provides an automated way to profile and analyse various resource configurations of your lambda
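As a taster for the RDS Proxy suggestion above, wiring one in with CDK looks roughly like this (a sketch only – it reuses the dbInstance, dbSecret and vpc assumed in earlier snippets):

import * as rds from '@aws-cdk/aws-rds';

const dbProxy = new rds.DatabaseProxy(this, 'PetclinicDbProxy', {
  proxyTarget: rds.ProxyTarget.fromInstance(dbInstance),
  secrets: [dbSecret],
  vpc,
});
// Lambda functions would then connect to dbProxy.endpoint rather than the RDS
// instance directly, letting the proxy manage connection pooling on their behalf.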

Gotchas!

First Gotcha – Fargate 1.4.0 network changes

As of Fargate 1.4.0, Fargate communications to other AWS Services such as ECR, S3 and Secrets Manager use the same network interface that hooks into your VPC. Because of this you have to ensure Fargate has routes available to AWS Services, otherwise you’ll find that ECS is unable to pull container images or acquire secrets. You can either give the containers a public IP, route traffic to a NAT running in a public subnet, or create private VPC endpoints to each of the AWS services you require: https://stackoverflow.com/questions/61265108/aws-ecs-fargate-resourceinitializationerror-unable-to-pull-secrets-or-registry
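If you go the VPC endpoint route, a minimal CDK sketch looks like the following (the exact list of services is an assumption – match it to what your tasks actually call):

import * as ec2 from '@aws-cdk/aws-ec2';

// Give Fargate 1.4.0 tasks in private subnets a path to the AWS services they need,
// without a NAT gateway or public IPs
vpc.addGatewayEndpoint('S3Endpoint', {
  service: ec2.GatewayVpcEndpointAwsService.S3, // ECR image layers are served from S3
});
vpc.addInterfaceEndpoint('EcrApiEndpoint', {
  service: ec2.InterfaceVpcEndpointAwsService.ECR,
});
vpc.addInterfaceEndpoint('EcrDockerEndpoint', {
  service: ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
});
vpc.addInterfaceEndpoint('SecretsManagerEndpoint', {
  service: ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
});
vpc.addInterfaceEndpoint('CloudWatchLogsEndpoint', {
  service: ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS,
});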

Second Gotcha – Spring Cloud Function signature challenges

Update: following Oleg’s quick turnaround and release of spring-cloud-function 3.2.0-M1 a lot of what’s below is moot, but I’ve kept it here for reference.

There have been developments to abstract the AWS specifics away from function definitions, and have the translation happen behind the scenes by the function adapters (AWS, GCP, Azure, etc.) into a more generic Map payload, rather than to specific AWS types (APIGatewayProxyResponseEvent, for example).

I was confused reading the Spring Cloud Function documentation and didn’t find it clear how to make the transition from specific handlers to a generic one (FunctionInvoker). In fact, if you try to follow one of the many guides online (including the latest Spring docs) and use Function<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> the adapter ends up wrapping it in another APIGatewayProxyResponseEvent which is malformed. AWSLambdaUtils tries to be helpful here but with the confusing documentation and behaviour, it just got in my way.

It feels like the approach is to abstract cloud implementation details away from the spring-cloud-function core, and push all of that into the adapters. The problem with that is the adapters then have to map from the generic Message interface into an AWS response (APIGatewayProxyResponseEvent). This is great in that it abstracts the cloud-platform implementation detail from you, but if you want that level of control there’s no way to override it.

The way I got clarity on the recommended approach was to ignore all the docs and examples, and go with the unit tests in the spring-cloud-function-adapter-aws repo. These demo the latest compatible ways of declaring functions.

I experimented with a few styles of function signatures to see which works…

Doesn’t work:

@Bean
public Supplier<Message<APIGatewayProxyResponseEvent>> getAllOwners() {

Doesn’t work (didn’t correctly set the Content-Type header in the HTTP response)

@Bean
public Function<APIGatewayProxyRequestEvent, Collection<Owner>> getAllOwners() 

You have to use the Message construct if you want control over the Content-Type header; without setting it explicitly, the Message construct provides a default contentType header which is incorrect and doesn’t get picked up by the AWS adapter (resulting in a malformed response with a Content-Type of “application/octet-stream” and a separate “contentType” header of “application/json”, which gets ignored).

Works:

@Bean
public Supplier<Message<Collection<Owner>>> getAllOwners() {
final Map<String, Object> headers = new HashMap<>();
headers.put("Content-Type", "application/json");
...
return new GenericMessage<>(allOwners, headers);

Also works:

@Bean
public Function<APIGatewayProxyRequestEvent, Message<Collection<Owner>>> getAllOwners() {
final Map<String, Object> headers = new HashMap<>();
headers.put("Content-Type", "application/json");
...
return new GenericMessage<>(allOwners, headers);

There are some limitations with using the Message construct – you’re not allowed to use null payloads when using GenericMessage, which makes it difficult to handle 404 situations.

Closest I could get was returning a Message<String> response, serialising the Owner object myself into a string using the ObjectMapper before returning it wrapped in a Message. That way when I needed to handle a 404 I just returned an empty string. Not pretty (or consistent with the ECS service) though:

All of the above, however, is no longer an issue – as explained in the main article thread, you can now return an APIGatewayProxyResponseEvent and have full control over the payload that’s returned to API GW/ALB.

Third Gotcha – CDK library module version mismatches

NPM dependencies are fun… between working on the core infrastructure code and adding the Lambda functions, CDK had jumped from 1.109.0 to 1.114.0. When I introduced a new component of CDK, it installed the latest version, which then gave me confusing type incompatibility errors between LambdaTarget and IApplicationLoadBalancerTarget. Aligning all my CDK dependencies in package.json (i.e. "^1.109.0") and running an npm update brought everything back in sync.

Fourth Gotcha – AWS Lambda lacking integration with Secrets Manager

There’s a gotcha with Lambda and secrets currently – there is no way to natively pass secrets into a Lambda like you can with ECS (see this CDK issue). You have to use dynamic references in CloudFormation, which are injected into your Lambda configuration at deploy time. This, however, is far from ideal, as it exposes the secret values within the Lambda console:

One way you can mitigate this is to make the lambda responsible for obtaining its own secrets, using the IAM role to restrict what secrets it’s allowed to acquire. You can do this by hooking into Spring’s startup lifecycle, by adding an event handler that can acquire the secrets, and set them in the application environment context before the database initialisation occurs.
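On the CDK side, that mitigation amounts to giving the function a reference to the secret and the permission to read it, rather than the value itself. A minimal sketch (reusing the dbSecret and function names assumed in earlier snippets; the Spring-side lifecycle hook then fetches the value with the AWS SDK before the DataSource is created):

// Grant read access via the function's IAM role, and pass only the secret's ARN
dbSecret.grantRead(getOwnerByIdFunction);
getOwnerByIdFunction.addEnvironment('DB_SECRET_ARN', dbSecret.secretArn);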

Fifth Gotcha – Custom Serialisers not loaded in function

Curious people will spot that if you explore the shaded lambda jar that’s built, although the *Controller classes have been omitted, the (De)serialisers are still there. That’s because the compiler plugin has detected dependencies on them in other classes (as opposed to the Controllers, which are runtime dependencies and so don’t get picked up). This is fine…

Next challenge… the REST endpoints are using these custom (De)serialisers to massage the payloads and avoid a recursion loop between the entities in the object graph:

At this point I realised maybe Baeldung’s post is a little out of date (even though I’ve bumped up to the latest versions of spring-cloud-function dependencies). So let’s see if I can find a more recent post and port my code changes to that.

But what all of this lacks is an example of how to decompose an existing microservice into one or more serverless functions.

By following the Spring documentation and upgrading to the latest version I was able to resolve the issue with the custom Serialisers not being utilised OOTB.

Sixth Gotcha – Cold start ups are sloooooooooo…w

Spring Boot can be notoriously slow at starting up, depending on the size of your applications and the amount of autoconfiguring it has to do. When running in AWS Lambda it still has to go through the same bootstrapping process, but given the typically lower resources allocated to functions (MBs as opposed to GBs) this can draw out the startup of a lambda to upwards of a minute.

This doesn’t affect every request – once a Lambda has warmed up, its instance is cached and stays in a warm (but paused) state until the next request comes in, or until AWS Lambda cleans up infrequently used functions. This means subsequent function requests typically respond as quickly as their microservice counterparts. However, any requests that don’t hit a warm function, or any concurrent requests that trigger a new Lambda function, will incur this cold start penalty.

This may be acceptable for some workloads, for example requests that are triggered in a fire-and-forget fashion, infrequent (but time-consuming) operations, or endpoints that have a steady stream of requests with little or no concurrency. As part of prototyping this approach I would advise you to look at the recommendations in the Conclusion section and, using the Strangler Fig Pattern approach documented above, trial it out on a subset of requests to understand how it’d affect your workloads.

My expedition from AWS to GCP (with Terraform)

Follow along with my GitHub repo for this blog: https://github.com/foyst/gcp-terraform-quickstart/

TL;DR – Here’s the headline differences that might be useful for those adopting GCP from an AWS background:

  • Difference #1 – In AWS, projects or systems are separated using AWS Accounts, whereas GCP has the built-in concept of “Projects”.
  • Difference #2 – The GCP Console is always at global level – no need to switch between regions.
  • Difference #3 – auto_create_subnetworks = false, otherwise you have a subnet created for every availability zone by default.
  • Difference #4 – You have to enable services before you can use them.
  • Difference #5 – Firewall rules apply globally to the VPC, not specific resources.
  • Difference #6 – You don’t define a subnet as public.
  • Difference #7 – An Internet Gateway exists by default, but you have to explicitly create a NAT gateway and router.
  • Difference #8 – To define a Load Balancer in GCP there’s a number of concepts that fit together: Front-end config, Backend Services, Instance Groups, Health Checks and Firewall config.
  • Difference #9 – Load Balancer resources can be regional or global interchangeably.

As someone who has spent a few years working with AWS I’ve decided to dip my toe into GCP, and I thought it’d be useful to chronicle my experiences getting started with it so others may benefit.

I prefer to learn by doing, so I set myself the task of creating a simple reference architecture for load balancing web applications. The diagram below outlines this architecture:

High-level diagram of demo architecture

The goal is to deploy a load balancer than can route traffic to a number of instances running an nginx server. The nginx server itself is the nginxdemos/hello Docker image, which displays a simple webpage with a few details about the instance:

Overview page presented by the Nginx hello Docker image

We’ll use the status page above to demonstrate load balancing, as the server address and name change when the load balancer round-robins our requests.

Creating a Google Cloud Account

Creating an account is relatively straight forward (if you already have a Google Account – if not you’ll need to set one up). Head over to https://cloud.google.com/ and click on the button in the top right of the page.

Once you’ve got setup with a Google Cloud Account, you’ll be presented with a screen similar to this:

Google Cloud Platform Overview Page

From here, you can start creating resources.

Creating a Project

Difference #1 – In AWS, projects or systems are separated using AWS Accounts, whereas GCP has the built-in concept of “Projects”. This allows you to stay logged in as a user and easily switch between projects. Having this logical grouping also makes it easy to terminate everything running under it when you’re done, by simply deleting the project.

To create a new project, simply navigate to the main burger menu on the left-hand side, select “IAM & Admin” and then create a new project either through the “Manage Resources” or “Create a Project” options. Your Project consists of a name, an internal identifier (which can be overridden from what’s generated but can’t be changed later) and a location. Create a project with a meaningful name (I went with “Quickstart”) and choose “No Organisation” as the location.

Difference #2 – The GCP Console is always at global level – no need to switch between regions.

Setting up Billing

In order to create any resources in GCP you need to set up a Billing Account, which is as simple as providing the usual debit/credit card details such as your name, address, and card numbers. You can do this by opening the burger menu and selecting “Billing”. You’ll also be prompted to do this any time you try to create a resource until Billing is configured.

Like AWS, GCP offers a free tier which you can use to follow this tutorial. During my experiments building a solution and then translating it into Terraform, it cost me a grand total of £1. However, this was with me tearing down all resources between sessions, which I’d strongly encourage you to do too (simple enough if you’re following along with Terraform: just run a terraform destroy).

To avoid any horror stories at the end of the month, create a budget and set an alert on it to warn you when resources are bleeding you dry. Because I’m only spinning up a few resources (which will be short-lived) I’ve set my budget to £5.

To create a budget, open the burger menu and select “Billing”, then select “Budgets & alerts” from the left hand menu.

You can apply a budget to either a single project or across your whole estate, and you can also account for discounts and promotions too. With a budget you’re able to set the following:

  • Scope – What projects and resources you’d like to track in your budget
  • Amount – The spending cap you want to apply to your budget. This can be actual or forecasted (the latter looks new – it predicts whether your budget is likely to be exceeded in the budget period, given your resource utilisation to date)
  • Actions – You can trigger email alerts for billing admins when your budget reaches certain thresholds (i.e. 50%, 70%, 100%). You can also publish events using a pub/sub topic to facilitate automated responses to notifications too.
Overview of Budget and Alerting

Setting up GCloud CLI

https://cloud.google.com/sdk/docs/quickstart

Run the gcloud init command to configure the CLI to connect to your specific Google Account and Project. If you’re like me, you may have both business and personal accounts and projects on the same machine. If so you may need to run gcloud auth login first to switch between your accounts.

The GCloud CLI stores multiple accounts and projects as “configurations”, similar to AWS’ concept of “credentials” and “profiles”.

Setting up Terraform

I chose Terraform because it can be used with both AWS and GCP (and Azure for that matter) reducing the learning curve involved in adopting GCP.

For production use cases, just like with AWS, you’re best off using a service role that adheres to security best practices (such as the principle of least privilege and temporary privilege escalation) for executing automated deployments. But for getting started we’ll use the ‘gcloud auth’ approach.

We’ll be using the TF ‘local’ backend for storing state – when working in teams you can store state in cloud storage to manage concurrent changes where multiple users or tools may interact with a set of resources.

To get started, create a *.tf file in an empty directory (can be called anything, I went with main.tf) and add the following snippets to it:

variable "project_id" {
    type = string
}

locals {
  project_id = var.project_id
}

provider "google" {
  project = local.project_id
  region  = "us-central1"
  zone    = "us-central1-b"
}

The above sets up variables via the terminal (the project_id) and constants inline (the locals declaration), while the provider block configures Terraform with details about which cloud platform you’re using and details about the Project. In my sample code I went with us-central1 region – no reason behind this other than it’s the default.

At this point you’re ready to run terraform init. When you do this it validates and downloads the Google provider libraries needed to compile and interact with the GCP APIs. If you run terraform init before you add the above you’ll see a message warning you that you’ve initialised an empty directory, which means it won’t do anything.

From here, you can start adding resources to the file. Below I’ve linked to some getting started tutorials and useful references for working with the Google provider SDK which I found useful:

Useful links

https://learn.hashicorp.com/tutorials/terraform/install-cli?in=terraform/aws-get-started

https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/getting_started

https://www.terraform.io/docs/cli/init/index.html

https://registry.terraform.io/providers/hashicorp/google/latest/docs

https://stackoverflow.com/a/59056580 – automatically enable APIs (e.g. Compute Engine)

https://github.com/danielpoonwj/gcp-iac-demo

Creating a Network and VM

The first resource we’ll create is a google_compute_network, which is where our compute resources will be deployed.

resource "google_compute_network" "vpc_network" {
  name                    = "terraform-network"
  auto_create_subnetworks = false
  delete_default_routes_on_create = true
  depends_on = [
    google_project_service.compute_service
  ]
}

resource "google_compute_subnetwork" "private_network" {
  name          = "private-network"
  ip_cidr_range = "10.2.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.vpc_network.self_link
}

resource "google_compute_route" "private_network_internet_route" {
  name             = "private-network-internet"
  dest_range       = "0.0.0.0/0"
  network          = google_compute_network.vpc_network.self_link
  next_hop_gateway = "default-internet-gateway"
  priority    = 100
}

There are a few points worth mentioning about the config above:

  • Difference #3 – auto_create_subnetworks = false – In the GCP Console you don’t have separate consoles for each of the regions; resources from all regions are displayed on the same page. The important part here, though: if you don’t explicitly override this flag then GCP will create a subnetwork for every AZ across all regions, which results in 20+ subnets being created. This was a little OTT for my needs and I also wanted to keep the deployment as similar to the AWS approach as possible, although this may be counter to GCP idioms (one for me to learn more about…)
  • delete_default_routes_on_create = true – By default GCP will create a default route to 0.0.0.0/0 on your network, effectively providing internet routing for all VMs. This may not be preferable as you may want more control over how this is configured, so I disabled this.
  • depends_on – Most of the time Terraform can identify the dependencies between resources, and initialise them in that order. Sometimes it needs a little guidance, and in this situation it was trying to create a network before the Compute Service (mentioned later…) was fully initialised. Adding this attribute prevents race conditions between resource creation.

(Later on I also had to apply the depends_on block to my google_compute_health_check, as TF also attempted to create this in parallel, causing more race conditions.)

Once you’ve got a network and subnetwork created, we can go ahead and configure a VM:

resource "google_compute_instance" "vm_instance" {
  name         = "nginx-instance"
  machine_type = "f1-micro"

  tags = ["nginx-instance"]

  boot_disk {
    initialize_params {
      image = "centos-7-v20210420"
    }
  }

  metadata_startup_script = <<EOT
curl -fsSL https://get.docker.com -o get-docker.sh && 
sudo sh get-docker.sh && 
sudo service docker start && 
docker run -p 8080:80 -d nginxdemos/hello
EOT

  network_interface {
    network = google_compute_network.vpc_network.self_link
    subnetwork = google_compute_subnetwork.private_network.self_link

    access_config {
      network_tier = "STANDARD"
    }
  }
}

Hopefully most of the above looks fairly similar to its AWS counterpart. The access_config.network_tier property is the main difference – Google has two different tiers (STANDARD and PREMIUM), of which the latter provides performance benefits by routing traffic through Google networks (instead of public internet) whenever it can at an additional cost.

The metadata_startup_script key is a shortcut TF provides to configure scripts to execute when a VM starts up (similar to the UserData key in AWS). In this case, I used it to install Docker and start an instance of the nginxdemos/hello Docker image (albeit in a slightly crude manner).

Deploying the Resources

At this point, we’re able to run our first terraform apply.

Difference #4 – You have to enable services before you can use them.

When you run a terraform apply, you may find you get an error stating that the Compute Engine API has not been used in the project before or has been disabled. When you use a service for the first time in a project you have to enable it. This can be done by clicking the “Enable” button found in the service’s landing page in the GCP web console, or you can enable it in Terraform like so:

resource "google_project_service" "compute_service" {
  project = local.project_id
  service = "compute.googleapis.com"
}

Once Terraform has successfully applied your infrastructure, you’ll have a newly created VPC and VM running within it. The first thing you might want to try is SSH into it, however you’ll probably find that the connection hangs and you aren’t able to connect.

You can triage the issue by opening Compute Engine -> VM instances, clicking the kebab icon (TIL this is called a kebab icon!) and selecting “view network details”. Under “Ingress Analysis” in “Network Analysis” you can see that there are no firewall rules configured, and the default is to implicitly deny traffic:

So next up, I’m going to create a firewall rule to allow inbound internet traffic into my instance. I created the following rule that allows connections from anywhere to target instances tagged with nginx-instance:

resource "google_compute_firewall" "public_ssh" {
  name    = "public-ssh"
  network = google_compute_network.vpc_network.self_link

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  direction = "INGRESS"
  source_ranges = ["0.0.0.0/0"]
  target_tags = ["nginx-instance"]
}

Difference #5 – Firewall rules apply globally to the VPC, not specific resources.

In GCP, there isn’t an equivalent of a Security Group, which in AWS-world controls access to the resources it’s associated with. In GCP, firewall rules are associated to the network, and are applied on resources by making use of network tags.

Similar to AWS, you can tag your resources to help you manage and organise them with the use of labels. Separately though, the network tag mechanism is what’s used to apply firewall rules. In the code snippet above, you specify the rules you wish to apply, and also which network tags (or ranges if you’d prefer) to apply the rule to.

Difference #6 – You don’t define a subnet as public.

I’ll leave this article here, which for me nicely summarised the differences between AWS and GCP networking:

https://codeburst.io/vpc-networking-gcp-v-s-aws-77a80bc7cfe2

The key takeaways for me are:

  • Networking in GCP is flat, compared to the hierarchical approach taken by AWS.
  • Routing tables and firewall rules are associated directly with the VPC, not subnets or resources
  • VPCs are global concepts (not regional) and traffic automatically flows across regions
  • Subnets are regional concepts (not AZ-bound) and traffic automatically flows across AZs also
  • “GCP: Instances are made public by specifically enabling them with an external IP address; the ‘Default route to the internet‘ automatically routes Internet-bound traffic to the Internet gateway or NAT gateway (if it exists) based on the existence of the external IP address”

For example, you don’t directly assign a firewall rule to instance(s), but use network tags to apply a firewall rule to them. Similarly for routes you don’t have routing tables that you assign to subnets – you simply define a VPC-level route, the next hop traffic should take, and optionally network tags to specify which resources to apply the route to.

Network analysis showing SSH public access enabled

Ok great, so public access is now permitted and we’ve got an instance that we can SSH to and see an Nginx container running. But going forwards I want to secure this instance behind a load balancer, with no public access.

Nginx container running in VM

So how do we make instances private in TF? Simply omit the access_config element from your google_compute_instance resource if you don’t want a public IP to be assigned.

There appears to be some confusion online on what the “Private Google Access” feature does, specifically its influence on whether an instance is private or public-facing. According to the docs, instances without a public IP can only communicate with other instances in the network. This toggle allows these private instances to communicate with Google APIs whilst remaining private. Some articles allege that it’s this toggle which makes your instance public or private, although from what I’ve read I think that’s inaccurate.

Now, when I made my instance private it introduced a new problem: It broke my Docker bootstrapping, because the instance no longer has a route to the internet. Time to introduce a NAT gateway…

Difference #7 – An Internet Gateway exists by default, but you have to explicitly create a NAT gateway and router.

Some areas of the GCP documentation state that traffic will automatically flow to either the default internet gateway or a NAT gateway based on the presence of an external IP address attached to an instance. This led me to believe that a NAT gateway was also provided by default, although this turned out not to be the case when I removed the external IPs from my Nginx instances. When I did this the instances were unable to connect out to download Docker or the Nginx Docker image.

I added the following to my Terraform which re-enabled outbound connectivity, whilst keeping the instances private:

resource "google_compute_router" "router" {
  name    = "quickstart-router"
  network = google_compute_network.vpc_network.self_link
}

resource "google_compute_router_nat" "nat" {
  name                               = "quickstart-router-nat"
  router                             = google_compute_router.router.name
  region                             = google_compute_router.router.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

Creating a Load Balancer

Creating a Load Balancer is interesting. Google manages a lot of this at a global level and provides different “flavours” of load balancer applicable for different use cases:

Table representing different types of Load Balancer available in GCP

Difference #8 – To define a Load Balancer in GCP there’s a number of concepts that fit together: Front-end config, Backend Services (or Backend Buckets), Instance Groups, Health Checks and Firewall config.

For my HTTP-based service, the flow of incoming requests looks like:

Diagram depicting flow of requests through Load Balancer resources

Load Balancers aren’t a tangible resource in TF, rather the result of configuring and connecting the previously mentioned resource types. The entry point for a Load Balancer appears to be the ‘Forwarding Rule’, specifically google_compute_forwarding_rule.

To create a Regional-level Load Balancer using the Standard networking tier, I used the following TF:

resource "google_compute_instance_group" "webservers" {
  name        = "terraform-webservers"
  description = "Terraform test instance group"

  instances = [
    google_compute_instance.vm_instance.self_link,
    google_compute_instance.vm_instance_2.self_link
  ]

  named_port {
    name = "http"
    port = "8080"
  }
}

# Global health check
resource "google_compute_health_check" "webservers-health-check" {
  name        = "webservers-health-check"
  description = "Health check via tcp"

  timeout_sec         = 5
  check_interval_sec  = 10
  healthy_threshold   = 3
  unhealthy_threshold = 2

  tcp_health_check {
    port_name          = "http"
  }

  depends_on = [
    google_project_service.compute_service
  ]
}

# Global backend service
resource "google_compute_backend_service" "webservers-backend-service" {

  name                            = "webservers-backend-service"
  timeout_sec                     = 30
  connection_draining_timeout_sec = 10
  load_balancing_scheme = "EXTERNAL"
  protocol = "HTTP"
  port_name = "http"
  health_checks = [google_compute_health_check.webservers-health-check.self_link]

  backend {
    group = google_compute_instance_group.webservers.self_link
    balancing_mode = "UTILIZATION"
  }
}

resource "google_compute_url_map" "default" {

  name            = "website-map"
  default_service = google_compute_backend_service.webservers-backend-service.self_link
}

# Global http proxy
resource "google_compute_target_http_proxy" "default" {

  name    = "website-proxy"
  url_map = google_compute_url_map.default.id
}

# Regional forwarding rule
resource "google_compute_forwarding_rule" "webservers-loadbalancer" {
  name                  = "website-forwarding-rule"
  ip_protocol           = "TCP"
  port_range            = 80
  load_balancing_scheme = "EXTERNAL"
  network_tier          = "STANDARD"
  target                = google_compute_target_http_proxy.default.id
}

resource "google_compute_firewall" "load_balancer_inbound" {
  name    = "nginx-load-balancer"
  network = google_compute_network.vpc_network.self_link

  allow {
    protocol = "tcp"
    ports    = ["8080"]
  }

  direction = "INGRESS"
  source_ranges = ["130.211.0.0/22", "35.191.0.0/16"]
  target_tags = ["nginx-instance"]
}

Difference #9 – Load Balancer resources can be regional or global interchangeably

Depending on the network tier and level of availability you’re architecting for, you can have regional or global Load Balancers – the latter deploys your Load Balancer across all regions and utilises Google networks as much as possible to improve throughput.

However, this confused me when deciding that I only wanted a regional Load Balancer utilising the Standard network tier. According to the GCP docs, Backend Services used by HTTP(S) Load Balancing are always global, but to use the Standard network tier you have to create a regional Forwarding Rule.

This confusion was compounded by the inconsistent use of global and regional discriminators in TF resource types, which made it a struggle to hook up the resources required to create a Load Balancer. The fact that you create a normal url map and http target proxy, but then attach that to a google_compute_global_forwarding_rule, confused me somewhat!

The name of the Load Balancer appears to come from the google_compute_url_map resource… I’m not quite sure why that is? Maybe because it’s the first LB-related resource that’s created in the chain?

The GCP Console for Load Balancers can be confusing, because when you first open it after deploying the Terraform only a subset of the resources we define are visible:

GCP Console showing Load Balancer Basic view

However, by selecting the “advanced menu” link at the bottom of the page, you get an exploded view of the Load Balancer configuration:

GCP Console showing Load Balancer Advanced view

Even in the Advanced view however, you can’t view URL maps directly (referenced by target proxies). URL maps are what glue the HTTP Target Proxy and Backend Service(s) together, and it’s here where you specify any HTTP routing you’d like to apply (similar to AWS ALB Listener Rules, that map a Rule to a Target Group). You can view existing and attached URL maps by opening the target proxy they’re attached to and following the link that way.

An Instance Group is similar to an Auto Scaling Group in AWS, except you can also have Unmanaged Instance Groups which are a manually maintained group of potentially heterogeneous instances.

I used an Unmanaged Instance Group in this scenario, which combined with the Backend Service is similar to an unmanaged/manually maintained Target Group in AWS terms.

Although Health Checks are related to Instance Groups within the GCP console, they’re not directly linked. This means the service that uses the Instance Group (in our case our Load Balancer) can separately choose which Health Check is most appropriate for its use case.

External HTTP Load Balancers provided by GCP don’t run within your VPC – they’re provided as part of a managed service. Because of this and as per the Load Balancing documentation, you have to create a firewall rule that allows traffic from the following source IP ranges (managed Load Balancer) to your private VMs:

130.211.0.0/22
35.191.0.0/16

You can use the instance network tags we set up earlier to restrict where traffic from the load balancer is allowed to go to.

An awkward limitation I found with the advanced section of the Load Balancer web console is that you can’t create all the configuration from there – you have to create a Load Balancer first using the basic wizard, and only then can you edit the advanced elements.

Scaling out

So at this point I have traffic flowing via a Load Balancer to my single instance which is pretty neat, but how can I demonstrate balancing traffic between two instances? Add another instance in TF and hook it up to our webservers instance group:

resource "google_compute_instance" "vm_instance_2" {
  name         = "nginx-instance-2"
  machine_type = "f1-micro"

  ...copy/paste of existing instance config
}

resource "google_compute_instance_group" "webservers" {
  name        = "terraform-webservers"
  description = "Terraform test instance group"

  instances = [
    google_compute_instance.vm_instance.self_link,
    google_compute_instance.vm_instance_2.self_link
  ]

  named_port {
    name = "http"
    port = "8080"
  }
}

And voila! We have a working example that can load balance requests between two instances… result!

Gotchas

Working with GCP and Terraform, there were a couple of gotchas that caught me out.

  • Terraform defaults a lot of resource parameters if you don’t specify them. Although I imagine most of these are sensible defaults (and I suspect TF takes a similar approach with AWS), if you’re not aware of what they default to they can quickly conflict with the settings you do specify, and it took me a while to identify which parameters were conflicting with each other. GCP wasn’t overly helpful in providing guidance on triaging the conflicts reported back through Terraform.
  • It also appears that some parameters have different defaults between their regional and global resource counterparts, so beware when switching between the two that you don’t unintentionally introduce unexpected config conflicts.
  • The field names aren’t always consistent between the web console and TF, which is something to watch out for. For example, in a backend service the console refers to “Named port”, whereas in TF it’s port_name.
  • The last one is for me to work on (and to find the right tooling for): the lack of compile-time checking (compared to something like CDK) slowed me down. I had to deploy to find out whether I was incorrectly mixing regional resources with global ones, which made for a longer feedback loop.

Final Thoughts

In conclusion, my first impression of GCP is that it’s not too dissimilar to the offerings provided by AWS, once you understand the subtle differences in behaviour and terminology. The UI feels more responsive and looks slicker in my opinion, especially when you compare it to the current mixture of old and new UIs strewn across the AWS services.

Creating resources in GCP with TF was straightforward enough. The fact that VPCs are created at a global level and all resources are displayed globally means you can view your whole estate in one place, which I like. You just need to be mindful of regional vs global resources – specifically which combinations of them you can use, and the pros and cons of each.

How to improve this setup

  • Replace the Unmanaged Instance Group with a managed one. This would be similar to using an Auto Scaling Group in AWS, which could elastically scale instances and create new ones in the event of instance failures. For the purposes of this I wanted to understand all of the pieces and how they fit together, but it wouldn’t be too difficult to convert what’s here to use a Managed Instance Group instead.
  • In GCP you can use “Container-optimised” OS images that start a specified Docker image when the VM boots up. This would remove the need for the metadata_startup_script, which would save a good few minutes when provisioning new VMs. However, I’d probably recommend something a bit more comprehensive for managing containerised applications, such as Google Kubernetes Engine (GKE).
  • If the containerisation route isn’t an option, you could consider ways to provision your VMs in a repeatable and idempotent way. For example, employing the likes of Ansible or Chef to do the provisioning at runtime, or building an OS image with something like Packer to speed up deployment.

Learning next steps

Now I’ve gained a basic understanding of the GCP platform and how to deploy resources with Terraform, my next explorations will be into:

  • GKE – how to automate provisioning of Docker Containers using a combination of Terraform to provision GKE, and then Kubernetes to run a fleet of containers. This would be similar to using either AWS ECS or EKS.
  • Serverless services – Now that I understand more about the lower-level networking concepts, I’ll look to explore and compare GCP’s offerings to the likes of AWS Lambda, Step Functions, SNS, SQS etc.

Team Development Roadmaps

As engineers we spend so much time focusing on the client’s projects, developing Impact Maps, Story Maps, Stakeholder Maps… all sorts of maps to confirm our approach and provide shared understanding between everyone and a strategy for solving the problem. What often doesn’t get considered is how the members of the team can use the opportunities within a project to improve and learn. Making sure that your team feel challenged and invested whilst providing a safe space helps them to feel valued and be at their natural best. At Infinity Works we strive to be “Best for Client, Best for Colleague”, so with that in mind I had an idea around Team Development Roadmaps.

What’s one of those then? I see it as a personal progression plan that ties into the project and its milestones in a way that allows you to link both personal and project targets together. They’re often seen as separate, but I believe providing opportunities within projects to allow team members to develop and experiment is a great way to keep them motivated. I did this recently with a new team of mine and I’ve outlined my approach here, hopefully you might find it useful.

First off, we did an initial brainstorming session where I had the team spend 10 minutes jotting down all of their goals. We used Miro (my new favourite online collaboration tool) to facilitate the session. We then spent the next 30 minutes walking through each of the goals, allowing the team to talk about what’s important to them.

If possible, have this conversation right at the start of a project, or as the team’s entering the forming phase – it shouldn’t be an afterthought. Having an open and honest conversation tunes everyone in to each other’s goals and aspirations. We actually unearthed some common threads between team members, and opportunities to support each other in achieving those goals which we wouldn’t have known about otherwise.

When we had a high level project roadmap and an understanding of what technologies, features and engagements we’d need with various stakeholders, we then revisited our wall of personal goals. We had another session to line up personal goals with project milestones, to identify opportunities in the project that can contribute towards them:

There will often be goals or activities that for whatever reason won’t be achievable within the project itself, such as long-running themes or external activities like engaging in communities of practice. It’s still worth capturing these, as the team can discuss ways to accommodate them too – for example agreeing on lunch breaks long enough to allow people to attend other meetups, or factoring time spent on activities outside of the project into sprint commitments.

Depending on the stage of your project and level of detail known, the output from this session may be as high-level as the diagram above, or it could be as detailed as a Story Map that overlays personal goals on top.

Try to align the two roadmaps as closely as possible to increase the chance of sticking to them. Build Story Maps around your team’s goals, attribute them to milestones and factor them in to your usual team ceremonies. When walking through user stories in planning, identify how they can contribute to individuals’ personal goals and organise the team in the best way to balance both project and personal goals.

Finally, we captured the goals and associated actions in a table. This provided us with a targeted list of actions to focus on, along with owners and a review date so we can regularly review progress and check actions and goals off as they’re completed.

I believe doing this will pay dividends in many ways. By investing in your team this way they’ll feel valued and motivated, and it can help to build camaraderie as they work together to help achieve each other’s goals. I’m keen to hear feedback on this approach, what works well and not so well, and any other suggestions for building teams. Hopefully this gives you food for thought!

To be successful, you’ve got to take the lead

I have been doing a lot of reflection recently. I want to become a great leader – to be a servant to others, to drive myself and those in my charge towards a Just Cause that we all aspire towards. In doing this I’ve been figuring out what to focus on in order to help myself and others achieve greatness in everything we do.

But if you want to be successful in leading others, you first need to be able to lead yourself. What does that even mean?

Here are some ideas I’ve read about recently which really resonate with me, and hopefully may be of use to you too.

Self Leadership

For me, this is about taking charge of your career, getting what you need in order to succeed. It’s about taking ownership and jumping to the challenges that present themselves, no matter how big or small. No one else is going to do the work for you, and you shouldn’t rely on anyone doing all the heavy lifting either.

Yeah sure this may mean taking on the responsibility and accountability, and potentially being exposed to failure if it doesn’t go to plan. Hopefully you work for a company that builds psychological safety and trust into their culture, where taking ownership and stretching yourself should be actively encouraged and not something to be fearful of. If that isn’t the case then look for opportunities to nurture that mindset within your company, maybe start with developing that trust and safety within your team.

It’s important how you ask for that support though – show others you’re enthusiastic about progressing by proactively identifying what you need. Be concise about what you need and ask for it – people will help. You’re more likely to get a positive response this way, as opposed to throwing in the towel with “someone else would be better suited for this work” or “I don’t know what to do”. This could be anything from “I need advice on diagnosing this problem” or “I need support analysing this complex customer requirement” to “I really want to take point on building the new solution, but I need your support with how to design it. Could you help?”.

Finding your Just Cause

Simon Sinek lists this as one of 5 pillars to having an Infinite Mindset. It is a future state or ideal that you are willing to sacrifice yourself for, by means of dedicating your time, energy, career or life towards pursuing. A Just Cause in itself is infinite – a never ending quest for a better future that is so profound and inspiring that you’re constantly energised and passionate about working towards it.

In this self actualisation, money and job security are no longer the motivators – you’re working for a higher cause. Having a Just Cause encourages a “Service Oriented” mindset – being in service to others. From my experience, if you’re able to prioritise your Just Cause and being in service to others, then your personal needs are also rewarded as a result. I do what I can to make an impact and help others, in the hope that the company I’m working for recognises and rewards the right behaviours that contribute to a great collaborative culture.

Motivation and self leadership are significantly easier if you have a Just Cause and your job and company align with it. It makes getting out of bed that much easier and turning up to work enjoyable and less like “work”.

I’m still trying to articulate my Just Cause, but before I can do that I need to discover my “why” – my origin story, why I am who I am and do what I do. All I know right now is that I love fixing problems and building solutions, working, collaborating and coaching fellow colleagues and doing what I can to help them develop. I aim to make a positive impact in everything I work on, and if I get to earn a bit of dollar whilst doing that then all the better.

Having a Worthy Rival

I don’t mean an adversary or someone you despise or detest – it could be someone you work with or a close friend. I’m talking about that someone who forces us to take stock and push ourselves to do better, who excels in areas we want to develop as well. We all have one of those, whether it’s the tech lead in your team, the confident public speaker talking at events, or the person who just oozes charisma while engaging with stakeholders.

It shouldn’t be someone you’re focused on beating however as that invites a finite mindset, which drives you to focus against them and to do what you can to surpass them. It also shouldn’t be someone you can get the better of to bolster your own confidence. The effort it takes to constantly be No.1 (whatever that means) in a contest which has no end is draining and masks the real opportunity that’s available to you.

Don’t envy them… but rather embrace them. Accept humility and acknowledge that you can learn a thing or two from them. Get them to mentor you or study from afar, learn from their strengths and use them to bolster your weaknesses. Find someone who inspires you, figure out what it is they do so well, and challenge yourself to do better.

Challenge Assumed Constraints

I recently attended a coaching course led by two excellent tutors, Andy and Sean from Erskine Nash, where they introduced me to a book called “Self Leadership and the One Minute Manager”. It describes the idea of challenging assumed constraints, also known as Elephant thinking – becoming so acclimatised to a constraint that you no longer challenge it, and it limits your potential as a result.

When was the last time you looked at a job, or a project, or an opportunity to broaden your horizons, and you’ve shied away from it because you fear you’d be lacking the skills or experience needed? Or the last time you worked with a client who “just didn’t get Agile”, or a colleague who didn’t understand your point of view? This is the perfect opportunity to challenge those assumed constraints, to question those limitations that you’ve defined and see what other options you have available to you.

One way of doing this is to imagine having this discussion with a close friend or family member, where they’d ask you “what are you going to do about it?”. There’s just something impactful and thought provoking when it comes from someone you have a close relationship with that causes you to really challenge your self-imposed restraints.

Embracing your Points of Power

As engineers we can be both incredibly proud and stubborn, wanting to prove to everyone and ourselves that we can solve any challenge. Whilst perseverance and determination are great qualities to have, showing vulnerability and humility and asking for support is not a weakness… it is a sign of strength.

In the “One Minute Manager” they refer to this as your “Points of Power” – different sources of power which you can draw from in a situation in order to make things happen. There are 5 different Points of Power which I’ll talk about briefly below.

Task & Knowledge Power

As engineers we’re expected to have task and knowledge power – awareness and understanding of the problems and challenges we face and how to solve them. But if you go into every situation expecting yourself to be able to solve every problem, you’ll burn yourself out very quickly. To me, this is a contributor to Imposter Syndrome – that feeling that you’re not good enough and sooner or later you’ll be found out. By embracing humility and leaning on the task & knowledge power of others, we can tackle any challenge.

For example, this power may refer to skills and expertise with a particular technology, working in a specific industry or with a client’s business domain. Being able to gauge your task and knowledge power against each of these allows you to understand where your strengths are, and when you need to seek the support of others.

Personal Power

Being technically excellent is a great quality for an engineer to have, but being personable and having people skills is what makes the difference between a great engineer and a great consultant. It’s not always easy building your personal power depending on where your comfort zone is – I personally can be quite introverted in some situations, and sometimes I love nothing more than sticking my earphones in and cracking on.

There are many ways you can develop Personal Power: building relatedness inside and outside your team (ice breakers for example, are a great technique when forming a new team); offering your support whenever possible (even if you’re not an expert in the task at hand); developing your active listening skills. Charisma and personality help, but you will be respected by your peers for just helping out when you can.

Relationship Power

This is an extension of utilising your personal power to build your relationships and connections. In the film “2 Guns”, whenever Denzel Washington’s character needs something, he “knows a guy”. I try to think like this too – I won’t always have all the answers, but I know enough people who have the relevant knowledge and task power to get the job done, and I know these people by building relations.

How can you extend this further? Go to conferences, watch talks, give talks, could be 5 minutes or 50 long. Each of these interactions and connections you make along the way help expand your list of contacts and sources of information you can tap into to get what you need.

Position Power

Ideally this is the least-preferable power to rely on, but sometimes just being in the right position can aid you in obtaining the outcome you need. This could be the case if, for example, you are the tech lead, product owner or architect. In the “One Minute Manager”, they refer to this as the Power you hopefully never need to use, but it’s always good to know it’s there if you do.

Your levels of Power can vary not only throughout your career, but from situation to situation. For example, you may have a lot of Knowledge power when working with a particular technology or business domain, but you may need to rely more on your Personal and Relationship Power to get what you need when working with unfamiliar systems or new clients. Knowing where you lie on these scales is valuable, and being willing to draw on your other Points of Power when required can be very advantageous.

Persevere… but be Patient

Personal development doesn’t happen overnight, I’m 12 years into my career and I’m barely figuring this all out myself! If you’ve read this far down then it’s great to see you’re passionate about wanting to become a great self leader, but it’s something that takes a lot of courage, determination, and time.

We’ve grown up in a world where recognition and feedback are instant, and with the likes of smartphones and social media constantly within reach we expect everything to happen with the same immediacy as instant messaging. Simon Sinek refers to this as the “Instant Gratification” model, and the downside to this mindset is that we get disheartened when change doesn’t happen at the same pace.

The journey is long and sometimes it can feel like it’s difficult to be the master of your own destiny, to have the agency to effect change and motivate yourself to constantly improve. Be patient, a lot of this comes with time and experience but if you can keep momentum and focus on continuous improvement, great things will be bestowed upon you – the self leader.

Closing

The above is a very brief insight into some schools of thought I’ve been looking into recently. It’s all too easy for people to tell you to just “take ownership”, and “empower” you without any real support or direction. But hopefully some of the above gives you inspiration on how to take charge of your own career, and get what you need in order to succeed.

References & Further Reading

Videos

Reading

Erskine Nash Associates

Performance Tuning Next.js

TL;DR: Next.js 9.3 introduces getStaticPaths, which allows you to generate a data-driven list of pages to render at build time, potentially allowing you to bypass server-side rendering for some use cases. You can now also use the fallback property to dynamically build pages on request, and serve the generated html instead.

On a recent project we built a website for a client using a combination of Next.js and Contentful headless CMS. The goal of the website was to offer a responsive experience across all devices whilst keeping load times to a minimum and supporting SEO.

I rather like Next.js – it combines the benefits of React with Server Side Rendering (SSR) and static html builds, enabling caching for quick initial page loads and SEO support. Once the cached SSR page has been downloaded, Next.js “hydrates” the page with React and all of the page components, completely seamlessly to the user.

The website is deployed to AWS using CloudFront and Lambda@Edge as our CDN and SSR platform. It works by executing a lambda for Origin Requests and caching the results in CloudFront. Regardless of where the page is rendered (client or server), Next.js runs the same code – which in our case queries Contentful for the content to display – so one code path neatly handles both scenarios.

During testing, we noticed that page requests that weren’t cached in CloudFront could take anything up to 10 seconds to render. Although this only affects requests that miss the cache, it wasn’t acceptable to us, as it impacts every page that needs to be server-side generated, and the issue would be replicated for every edge location in CloudFront. It only affects the first page load of a visitor’s session however, as subsequent requests are handled client-side and only the new page content and assets are downloaded.

Whilst investigating the issue we spotted that the majority of processing time was spent in the lambda. We added extra logging to output the elapsed time at various points in the lambda, and then created custom CloudWatch metrics from these to identify where most of the time was incurred.

We identified that the additional overhead was caused by requiring the requested page’s javascript file, which is embedded within the lambda and loaded dynamically per request. It’s loaded dynamically to avoid loading every page’s assets when only a single page needs rendering, which would add considerable and unnecessary startup time to the lambda.
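
To give a flavour of that instrumentation (an illustrative sketch rather than our actual lambda code), wrapping the dynamic require in a timer and logging a structured line is enough to drive a custom CloudWatch metric via a Logs metric filter:

// Illustrative only – times the per-page dynamic require and logs the result.
// A CloudWatch Logs metric filter matching "page_require_ms" turns these lines into a custom metric.
function timedRequire(pagePath) {
    const start = Date.now();
    const page = require(pagePath); // the dynamic require that dominated our cold starts
    console.log(JSON.stringify({ metric: 'page_require_ms', page: pagePath, elapsed: Date.now() - start }));
    return page;
}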

The lambda we used was based on the Next.js plugin available for the Serverless framework, but as we were using Terraform we took the bits we needed from it to make it work: https://github.com/serverless-nextjs/serverless-next.js/blob/master/README.md

Due to the overhead from the require statement, we experimented with the resource allocation given to the lambda. It was initially set to 128mb, so we played with various configurations and applied load against the website using JMeter to see if extra resources improved the responsiveness.

We found that by tweaking the memory allocation of the lambda, we could improve the average startup time from ~10 seconds to ~2 seconds. The sweet spot was 368mb, just as the curve begins to flatten out. On the surface, increasing from 128mb to 368mb triples our lambda costs, however these are negligible, as the lambda only runs on cache misses and most of our requests are served from the CloudFront cache. That said, adding extra resources for the sake of milliseconds would be superfluous and more expensive.

This improvement in speed was good enough for us, considering it impacted only a small percentage of visitors. A colleague of mine afterwards however suggested a couple of further refinements that could be made, which would reduce this impact even further. These options would require additional development effort which for us wasn’t possible at the time, but would make the website really responsive for all visitors.

Other strategies for mitigating the cold start issue

Multiple cache behaviours for different paths

By identifying which areas of your website are updated more often than others, you can mitigate the lambda issue by tweaking the cache expiries associated with them in CloudFront. For example, your homepage may change several times a day, whereas your news articles once published might stay fairly static. In this case, you could apply a short cache expiry to the root of your website / and a longer one for /news/*.

Invalidating CloudFront caches proactively

You could proactively invalidate CloudFront caches whenever content on your website changes. CloudFront allows you to specify a path to evict the cache for, so you can be really specific on what you want to invalidate. In our scenario, we could use Contentful webhooks to be notified when a piece of content is updated or removed, and use a lambda to trigger a cache invalidation for that path.
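
A minimal sketch of what that invalidation lambda could look like (the slug field, locale key and /news path are assumptions about the Contentful model rather than our actual code):

// Hypothetical handler: triggered by a Contentful webhook, invalidates the matching path in CloudFront
const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

exports.handler = async (event) => {
    const entry = JSON.parse(event.body);
    const slug = entry.fields.slug['en-GB']; // assumption: articles expose a "slug" field per locale

    await cloudfront.createInvalidation({
        DistributionId: process.env.DISTRIBUTION_ID,
        InvalidationBatch: {
            CallerReference: `${Date.now()}`, // must be unique per invalidation request
            Paths: { Quantity: 1, Items: [`/news/${slug}`] }
        }
    }).promise();

    return { statusCode: 204 };
};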

Generating dynamic pages at build time

As of Next.js 9.3 there is now a getStaticPaths function, which allows you to generate dynamic pages (those that use placeholders, i.e. /news/[article-uri]) at build time. This can significantly reduce the need for SSR, depending on your use case.

Initially, you had to generate all of these pages as part of your build, which could be quite inefficient (e.g. rebuilding a website that has thousands of blog articles every time a new one is published). However, as of Next.js 9.3 you can now generate static pages on demand, as announced here, using the fallback key on getStaticPaths.
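
As a rough sketch of how that hangs together (the Contentful helpers here are hypothetical, not our actual code), a dynamic page using getStaticPaths and fallback looks something like this:

// pages/news/[articleUri].js – illustrative sketch; getAllArticleUris/getArticle are hypothetical Contentful helpers
import { useRouter } from 'next/router';

export async function getStaticPaths() {
    const uris = await getAllArticleUris(); // e.g. query Contentful for the article URIs known at build time
    return {
        paths: uris.map((articleUri) => ({ params: { articleUri } })),
        fallback: true // pages not generated at build time are built on their first request
    };
}

export async function getStaticProps({ params }) {
    const article = await getArticle(params.articleUri);
    return { props: { article } };
}

export default function Article({ article }) {
    const router = useRouter();
    if (router.isFallback) {
        // Shown to the first visitor while the page is generated behind the scenes
        return <p>Loading article…</p>;
    }
    return <h1>{article.title}</h1>;
}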

In our project, we could use Contentful WebHooks to trigger website builds, passing through the URI of the new page into the build pipeline to specify what part of the website to rebuild. If you have a page template for /news/* for example, you’d possibly have to trigger a rebuild of all news.

Doing this would negate a lot of the above, as for us we could build a lot of the website upfront, and then new blog articles could be built on demand when visitors accessed them. Next.js’ fallback functionality notifies you when a page is being built for the first time, allowing you to present an intermediary “page loading” screen for the first visitor who triggers the build, giving them visual feedback and keeping them engaged whilst the page builds behind the scenes.

Hopefully this overview gives you some understanding of the potential performance issues faced when using SSR with Next.js, and also the variety of options available to you when tuning your application.

More details of Next.js’ Server Side Rendering and Static Generation capabilities can be found here:

https://nextjs.org/blog/next-9-4

https://nextjs.org/docs/basic-features/data-fetching

https://nextjs.org/docs/basic-features/pages

Booking a Meeting Room with Alexa – Part Two – Coding the Skill

Hey there! In my previous post Booking a Meeting Room with Alexa – Part One, I talk about how to build up the Interaction Model for your Skill using the Alexa Developer Console. Now, I’ll talk about how to write code that can handle the requests.

Setting Up

I chose to use JavaScript to write the skill, as I wanted to try something a little different to Java, which is what I normally use. Alexa has an SDK that allows you to develop Skills in a number of languages including Java and Javascript, but also C#, Python, Go and probably many more. I chose Javascript because of its quick load time and conciseness. I’ve written a previous Skill in both Javascript and Java, the former taking < 1 second to execute and the latter taking ~ 2.5 seconds. They both did the same thing, but Java apps can become bloated quickly and unknowingly if you pick certain frameworks, so be wary when choosing your weapon of choice and make sure it’s going to allow you to write quick-responding skills. Waiting for Alexa to respond is like waiting for a spinning wheel on a UI, or waiting for an elderly relative to acknowledge they’ve heard you… I’m sure you know what I mean.

To develop in Javascript, I used npm for managing my dependencies, and placed my production code under “src” and test code under “test” (sorry, Java idioms kicking in here!). I used npm init to create my package.json, which includes information about my package (such as name, author, git url etc.) and what dependencies my javascript code has. I later discovered that you can use ask new to create a bootstrapped skill, which you can then use to fill the gaps with your business logic.

Regarding dependencies, there’s a couple of key ones you need for Alexa development: ask-sdk-core and ask-sdk-model. I also used the ssml-builder library, as it provides a nice Builder DSL for crafting your responses. 

Skill Structure

Skills have an entrypoint for receiving a request, and then delegate off to a specific handler that’s capable of servicing it. The skeleton of that entry point looks like this:

const Alexa = require('ask-sdk-core');
var Speech = require('ssml-builder');

let skill;

exports.handler = async function (event, context) {
    if (!skill) {
        skill = Alexa.SkillBuilders.custom()
            .addRequestHandlers(
                <Your Handlers Here>
            )
            .addErrorHandlers(ErrorHandler)
            .create();
    }
    const response = await skill.invoke(event, context);
    return response;
};

So in your top-level handler, you specify one or more RequestHandlers, and one or more ErrorHandlers. Upon calling the create() function you get returned a Skill object, which you can then use to invoke with the received request.

The skill object is lazily initialised as a singleton because your lambda can stay active for a period of time after it completes a request, and may handle other requests that subsequently occur. Initialising it only once speeds up those subsequent requests.

Building a RequestHandler

In the middle of the Alexa.SkillBuilders code block, you can see my <Your Handlers Here> placeholder. This is where you pass in RequestHandlers. These allow you to encapsulate the logic for your Skill into manageable chunks. I had a RequestHandler per Intent in my Skill, but it’s quite flexible. The SDK uses something similar to the chain of responsibility pattern, passing your request to each RequestHandler until it finds one that can handle it. Each RequestHandler has a canHandle function, which returns a boolean stating whether it can handle the request or not:

const HelpIntentHandler = {
    canHandle(handlerInput) {
        return handlerInput.requestEnvelope.request.type === 'IntentRequest'
            && handlerInput.requestEnvelope.request.intent.name === 'AMAZON.HelpIntent';
    },
    handle(handlerInput) {
        const speechText = 'Ask me a question about Infinity Works!';

        return handlerInput.responseBuilder
            .speak(speechText)
            .reprompt(speechText)
            .withSimpleCard('Help', speechText)
            .getResponse();
    }
};

As you can see above, the canHandle function can decide whether or not it can handle the request based on properties in the request. Amazon has a number of built-in Intents, such as AMAZON.HelpIntent and AMAZON.CancelIntent, that are available to your Skill by default. So it’s best to have RequestHandlers that can do something with these, such as providing a list of things that your Skill can do.

Under that, you have your handle function, which takes the request and performs some actions with it. For example that could be adding two numbers spoken by the user, or in my case calling an external API to check availability and book a room. Below is a shortened version of my Room Booker Skill, hopefully to give you a flavour for how this would look:

async handle(handlerInput) {

        let accessToken = handlerInput.requestEnvelope.context.System.user.accessToken;
        const deviceId = handlerInput.requestEnvelope.context.System.device.deviceId;
        let deviceLookupResult = await lookupDeviceToRoom(deviceId);
        if (!deviceLookupResult)
            return handlerInput.responseBuilder.speak("This device doesn't have an associated room, please link it to a room.").getResponse();

        const calendar = google.calendar({version: 'v3', auth: oauth2Client});
        const calendarId = deviceLookupResult.CalendarId.S;
        let event = await listCurrentOrNextEvent(calendar, calendarId, requestedStartDate, requestedEndDate);

        if (roomAlreadyBooked(requestedStartDate, requestedEndDate, event)) {

            //Look for other rooms availability
            const roomsData = await getRooms(ddb);
            const availableRooms = await returnAvailableRooms(roomsData, requestedStartDate, requestedEndDate, calendar);
            return handlerInput.responseBuilder.speak(buildRoomBookedResponse(requestedStartDate, requestedEndDate, event, availableRooms))
                .getResponse();
        }
        
        //If we've got this far, then there's no existing event that'd conflict. Let's book!
        await createNewEvent(calendar, calendarId, requestedStartDate, requestedEndDate);
        let speechOutput = new Speech()
            .say(`Ok, room is booked at`)
            .sayAs({
                word: moment(requestedStartDate).format("H:mm"),
                interpret: "time"
            })
            .say(`for ${requestedDuration.humanize()}`);
        return handlerInput.responseBuilder.speak(speechOutput.ssml(true)).getResponse();
    }

Javascript Gotchas

I’ll be the first to admit that Javascript is not my forte, and this is certainly not what I’d call production quality! But for anyone like me there’s a couple of key things I’d like to mention. To handle date and time processing I used Moment.js, a really nice library IMO for handling datetimes, but also for outputting them in human-readable format, which is really useful when Alexa is going to say them.

Secondly… callbacks are fun… especially when they don’t trigger! I smashed my head against a wall for a while wondering why, when I was using the Google SDK that used callbacks, none of them were getting invoked. It took me longer than I’d like to admit to figure out that the lambda was exiting before my callbacks were being invoked. This is due to Javascript running in an event loop, and callbacks being invoked asynchronously. The main block of my code was invoking the 3rd party APIs, passing a callback to execute later on, but was returning way before they had a chance to be invoked. As I was returning the text response within those callbacks, no text was being returned from the main block for Alexa to say, so she didn’t give me any clues as to what was going wrong!

To get around this, I firstly tried using Promises, which would allow me to return a Promise to the Alexa SDK instead of a response. The SDK supports this, and means that you can return a promise that’ll eventually resolve, and can finalise the response processing once it does. After a bit of Googling, I found that it’s fairly straightforward to wrap callbacks in promises using something like:

return new Promise(function (resolve, reject) {

        dynamoDb.getItem(params, function (err, data) {
            if (err) reject(err);
            else {
                resolve(data.Item);
            }
        });
    });

Now that I’d translated the callbacks to promises, it allowed me to return something like the following from the Skill, which the SDK would then resolve eventually:

return createNewEvent(calendar, requestedStartDate, requestedEndDate)
    .then(result => handlerInput.responseBuilder.speak("Room Booked").getResponse());

Unfortunately, I couldn’t quite get this to work, and as it’s been a couple of months now since I did this, I can’t remember what the reason was! But the things to be wary of for me are the asynchronous nature of Javascript, and Closures – make sure that objects you’re trying to interact with are in the scope of the Promises you write. Secondly, using Promises ended up resulting in a lot of Promise chains, which made the code difficult to interpret and follow. Eventually, I ended up using the async/await keywords, which were introduced in ES8. These act as a lightweight wrapper around Promises, but allow you to treat the code as if it were synchronous. This was perfect for my use case, because the process for booking a room is fairly sequential – you need to know what room you’re in first, then check its availability, then book the room if it’s free. It allowed me to write code like this:

let deviceLookupResult = await lookupDeviceToRoom(deviceId, ddb);
let clashingEvent = await listCurrentOrNextEvent(calendar, calendarId, requestedStartDate, requestedEndDate);
if (!clashingEvent) {
    await createNewEvent(calendar, calendarId, requestedStartDate, requestedEndDate);

    let speechOutput = new Speech()
        .say(`Ok, room is booked at`)
        .sayAs({
            word: moment(requestedStartDate).format("H:mm"),
            interpret: "time"
        })
        .say(`for ${requestedDuration.humanize()}`);
    return handlerInput.responseBuilder.speak(speechOutput.ssml(true)).getResponse();
}

That to me just reads a lot nicer for this particular workflow. Using async/await may not always be appropriate to use, but I’d definitely recommend looking into it.

Speech Synthesis Markup Language (SSML)

The last thing I want to discuss in this post is Speech Synthesis Markup Language (SSML). It’s a syntax defined in XML that allows you to construct phrases that a text-to-speech engine can say. It’s a standard that isn’t just used by Alexa but by many platforms. In the code snippet above, I used a library called ssml-builder which provides a nice DSL for constructing responses. This library then takes your input, and converts it to SSML. The code above actually returns:

<speak>Ok, room is booked at <say-as interpret-as='time'>9:30</say-as> for an hour</speak>

Alexa supports the majority of features defined by the SSML standard, but not all of them. I used https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html as a reference of what you can get Alexa to do, and it’s still quite a lot! The main thing I had trouble with was getting SSML to output times in a human-readable way – even using the time hints in the say-as attributes resulted in pretty funky ways to say the time! That’s when moment.js came to the rescue, as it was able to output human-readable forms of the times, so I could avoid using SSML to process them entirely.

If you want to play about with SSML, the Alexa Developer Console provides a sandbox under the “Test” tab, which allows you to write SSML and have Alexa say it. This way you can identify the best way to output what you want Alexa to say, and experiment with tones, speeds, emphasis on certain words etc to make her feel more human:

Wrapping Up

And that’s it for this post – hopefully that gives you an idea of where to start if you’ve not done Alexa or Javascript development before (like me!). In the next post I’ll be touching on how to unit test Skills using Javascript frameworks.

Whilst writing this post, Amazon have been sending me step-by-step guides on Alexa development which I think are useful to share too, so if you get a chance take a look at these as well. And you don’t even need to be a coder to get started with them! Until next time…

Design your Voice Experience
Identify your Customers
Write your Script
Define your Voice Interaction

Build your Front End, Your Way
Developer Console
Command-Line Interface
Third Party Tools – no Coding Required!

Build the Back-End
Start with AWS Lambda
More Tools – No Back-End Setup Required

Booking a Meeting Room with Alexa – Part One

Hey there! This is part one of my adventures in developing an Alexa skill. I was inspired recently on client site, where I saw they’d installed a shiny new room booking system. Each meeting room had a touch screen set up outside of it, and from it you could see who’d booked the room, and also use it to book the room out if it was available.

It had the right idea, but from talking to people I learnt that it wasn’t the most user-friendly, and that it had cost a pretty penny too! I’d been looking for an excuse to dabble with Alexa and Voice UIs, so I decided to see if I could build something similar with commodity hardware.

“Alexa, book this meeting room out for 48 minutes”

Because I like nothing more than diving in at the deep end, I chose a completely unfamiliar tech stack to me. My comfort zone as of late is Java and ECS, so I used AWS Lambda to host the Skill and Javascript as the development language. I used the Serverless framework to manage deployments. The development of a Lambda Skill is split up into two parts – creating and hosting the voice interface, and then the application code that handles your requests.

In this blog post I’ll be focusing on developing the Interaction Model using the Alexa Developer Console. To get started, you can go here and sign in using your Amazon.com account. If you need to create an account you can do that here too.

With Alexa, what you write are Skills – code functions that carry out the action you want to happen. They’re triggered by Invocations – user requests in the form of phrases that Alexa uses to figure out what you’re trying to do. In my case, an Invocation was “Alexa, book out this meeting room for 48 minutes”.

Once you get set up with an account, you’ll end up at a page listing your current skills. Towards the right hand side there’s a button called “Create Skill”, go ahead and click that to be taken to the following page to create your skill:

Amazon gives you a number of template models to choose from, to speed up development and give examples of what you can do with Alexa. You can also “Provision your own” backend resources, directing your Skill either to a http endpoint or an AWS Lambda. Alternatively, you can choose “Alexa-Hosted”, which uses AWS Lambda but integrates the code development into the Alexa Console, so you can do code development alongside in the same UI.

An Alexa Skill can have one or more Intents – actions or requests that your Skills can handle. An Intent can be something like “what’s the weather today”, or “what’s on my agenda today”, or “book me a meeting room” (see where I’m going with this? 😉). Intents can be invoked by one or more Utterances, the phrase you’ll use to request your Intent. You can link one or more Utterances to an Intent, which can be useful to capture all the variations that someone might use to request your Intent.

As part of designing the UX, I found it useful to test how I’d interact with my Skill on an Echo Device, but with the microphone turned off. It was interesting to see how many variations I could come up with to request booking a room, and I noted all of these variations and configured them as Utterances, as you can see below:

Within these Utterances, you can have Slots too – parameter placeholders that allow you to specify variables to the request, making the requests more dynamic. In my case, this was allowing the user to specify the duration of the booking, and optionally providing a start time, but it equally could have been movie actors, days of the week, a telephone number etc. Amazon has various Slot Types, such as animals, dates, countries and so on, which allows Alexa to try to match the user request with a value in that Slot Type. These Slots can be optional as well, so your requests can include one or more parameters. You can do this by configuring multiple Utterances, that use one or more of your Slots.

If you don’t want to use one of the preconfigured Slot Types you can create your own list of values to match the parameter against, or use the AMAZON.SearchQuery Slot Type, although I’ve had varying success with its speech recognition.

Not related to my Meeting Room Booker Skill, but something worth mentioning: Alexa doesn’t always quite catch what you say (or interprets it differently to how you intended), making it difficult to do exact matches or lookups. For example I tried building a “Skills Matrix” Skill, where I could name a technology and Alexa would tell me who knows about it. I didn’t realise you could have so many variations on interpreting the words “Node JS”! The only way I could think of getting around it at the time was to have a custom “Technology” Slot Type, and for the more difficult technologies to pronounce, list all the expected variations in there. You can also employ a “Dialog Delegation Strategy”, which allows you to defer all dialog management to your lambda, opening up far more possibilities for interacting with your user (e.g. you could use fuzzy logic or ML to figure out what the client meant), but it’s a bit more involved to set up.

It’s worth noting at this point that you can have a different Interaction Model per Locale, which makes sense as it allows you to account for things such as language and dialect differences. The key thing to ensure is that when you’re developing and testing your Skill (which I’ll cover in following posts) you’re always using the same Locale, otherwise you just get a rather unhelpful “I don’t know what you mean”-esque response, or an even less helpful and more confusing “Uber can help with that”, which completely threw me off for much longer than I’d like to admit!

Eventually, I had an Interaction Model for the Skill created through the UI. Once you’re past the point of trying it out and want to productionise it, you’ll probably be thinking how to create and modify these Skills programmatically. Thankfully, there’s an Alexa Skills Kit (ASK) SDK, that allows you to do just that.

Here’s a link for installation instructions for the CLI – https://developer.amazon.com/docs/smapi/quick-start-alexa-skills-kit-command-line-interface.html

And here’s a quick start to creating a new Skill using the CLI – https://developer.amazon.com/docs/smapi/quick-start-alexa-skills-kit-command-line-interface.html

You can use the ASK CLI to create, update and delete skills. It’s fairly simple to use, so long as all your json config is correct – the errors it returns don’t give you much insight if you’ve missed a required parameter, or specified an invalid value for example.

As I’d already had a Skill created at this point using the UI, I used the CLI here to pull the metadata generated from the UI, to store in Git. The commands I used in particular were:

ask api list-skills to get the skillId for the newly created Skill
ask api get-skill -s {skillId} to get the Skill metadata as a json payload
ask api get-model -s {skillId} -l {locale} to get the Interaction Model metadata as a json payload

At this point, everything that I did in the UI was now available to me as code, and I was able to check it all in to Git. I found it very useful to do that, just as with any code, because once you start tweaking and trying out various things it can be difficult to get back to a good working state without it should things go wrong. You can use the following commands to update your Skill:

ask api update-skill -s {skillId} -f {skill-file-location} to update the Skill metadata
ask api update-model -s {skillId} -l {locale} -f {model-file-location} to update the Interaction Model

You can also use the ASK CLI to create a Skill from scratch, without ever needing to use the UI. You can use ask new to configure and provision a Skill, and it also creates you a folder structure with the json files I generated from my existing Skill already set up, ready for you to get started.

So that was a rather quick “how to get up and going” creating an Alexa Skill. The next step is linking the Skill to some backend code to handle the requests. I’ll be following this blog up with a how to on that, but in the meantime if you have any questions feel free to give me a shout!

Also, if you’re reading this and thinking “my business could really benefit from an Alexa Skill”, then please drop me a line at ben.foster@infinityworks.com and let’s talk 🙂

Unrewarding Retros? Time to take action!

I love a good retro. It gives you an opportunity to vent, de-stress after a sprint, raise concerns but also for praise and motivation ready for the next sprint. A friendly platform for open, honest and frank discussions.

But what about the venting? The frustration? Those things in your project or amongst your team that cause issues. Those hindrances that we complacently acknowledge as “just the way it is” or “nothing we can do about that for now, but we will do later…”. Most of the time they keep reappearing retro after retro, with little or no attention given to them. So what can we do about them? If all we do is complain for an hour or so every 2 weeks how is that productive?

The step that frequently seems missing to me in a retro is to collate the list of woes that have hindered the sprint, and to do something about them. I’ve been in Retros like this: We collate the obstacles from a Sprint; we make a list of actions; but they’re rarely actioned. They infrequently leave the whiteboard they’re written on, or at best they’re noted down in Confluence and never looked at again. Nothing more than a remnant of good intentions.

Don’t finish the retro until you have a list of actions. A popular method for defining meaningful actions follows these criteria: Specific; Measurable; Attainable; Realistic; and Timely. Having SMART actions provides quantifiable feedback that can motivate your team to engage with them. These criteria can be broken down into the following:

  • Specific and Measurable – An action needs to be tangible, whose progress can be tracked so teams can feel a sense of achievement about working towards correcting issues.
  • Attainable and Realistic – If an action isn’t attainable and realistic, the team won’t be motivated to achieve it. You want people to be engaged in working towards the action so it doesn’t get ignored (and you’re no better off than when you started).
  • Timely – needs to be something that can be accomplished before the next Retro, so its success can be realised quickly. Also prevents the actions from perpetuating (in the same way we keep stories small and focused)

For the things that go well during a Sprint… well, we don’t need to do anything about those… do we? Of course we do! Things that go well in a sprint should be praised, but we should also take action to ensure that those things continue to happen. This isn’t always necessary for every positive, but sometimes you need to keep on top of them… you can’t have too much of a good thing!

But how do you encourage and enable the team to take ownership of these actions? They can be ignored because “someone else will do it” or “didn’t realise it was on me”. People on the team can’t justify complaining about things if they don’t take ownership. Admittedly there are external factors that are outside the control of the team, but this alone warrants its own blog (and even these can be mitigated to some extent).

So this is what I’m proposing – make them visible as sprint goals, maybe on a whiteboard close to the team so they’re constantly reminded of them. Alternatively, track them the same way we track everything else in a Sprint – as a story or ticket, so they’re no less significant than anything else. Sometimes this may not be possible, given the client or the nature of the actions, so use your judgement to determine how best to track them. Their importance needs to be realised, to a greater extent than it currently is.

I’m currently trying the whiteboard trick in my team’s Sprints, so I’ll try to feedback on how well this goes. Give it a go yourself and share your experience!

Reactive Kafka + Slow Consumers = Diagnosis Nightmare

Recently I’ve been working with the combination of Reactive Streams (in the form of Akka Streams) and Kafka, as it’s a good fit for some of the systems we’re building at work.

I hope it’ll be beneficial to others to share a particular nuance I discovered whilst working with this combination, in particular a problem with slow downstream consumers.

To give a brief overview, we were taking messages from a Kafka topic and then sending them as the body of http post requests. This was working fine for the majority of the time, as we only get a message every couple of seconds normally.

However, the issues came when we had to deal with an influx of messages, from an upstream source that every now and then batches about 300 messages and pushes them onto Kafka. What made it even more difficult is that we didn’t know this was a contributing factor at the time…

So what happened? We saw a lot of rebalancing exceptions happen in the consumer, and also because we were using non-committing Kafka consumers, all the messages from offset 0 were constantly being re-read every second or so as a result of the constant rebalancing. Also when you try and use the kafka-consumer-groups script that comes with Kafka, you don’t get a list of partitions and consumers, but a notification that the consumer group either doesn’t exist or is rebalancing.

It turns out, Kafka was constantly redistributing the partitions across the 2 nodes within my affected consumer group. I can’t recall how I eventually figured this out, but the root cause was combining kafka in a reactive stream with a slow downstream consumer (http).

At the time of writing we’re using akka-stream-kafka 0.11-M3, and it has an “interesting” behaviour when working with slow downstream consumers – it stops its scheduled polling when there is no downstream demand, which in turn stops its heartbeating back to Kafka. Because of this, whenever the stream was applying backpressure (because we were waiting on http responses), the backpressure propagated all the way back to the Kafka consumer, which in turn stopped heartbeating.

To replicate this, I created the following Kafka topic:
./kafka-topics.sh --create --zookeeper 192.168.99.102:2181 --topic test_topic --replication-factor 3 --partitions 6

Then I used this code to publish messages onto Kafka, and ran two of these consumers to consume in parallel within the same Kafka consumer group.

What this causes the Kafka broker to do (at least with its default configuration) is to consider that node as slow or unavailable, which triggers a rebalancing of partitions to other nodes (which it deems might be available to pick up the slack). That’s why, when I kept reviewing the state of kafka-consumer-groups, you’d eventually see all partitions being consumed by one node, then the other, then get the rebalancing message. And because both of our nodes were using non-committing consumers, they both kept receiving the full backlog of messages, meaning they both became overwhelmed with messages and applied backpressure, which meant Kafka kept reassigning partitions… it was a vicious cycle!

Using the kafka-consumer-groups script you can see this happening:

benfoster$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.99.102:9000 --describe --group test-consumer
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
test-consumer, test_topic, 3, unknown, 3, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 4, unknown, 2, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 5, unknown, 3, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 0, unknown, 3, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 1, unknown, 2, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 2, unknown, 3, unknown, consumer-1_/192.168.99.1
benfoster$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.99.102:9000 --describe --group test-consumer
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
test-consumer, test_topic, 3, unknown, 3, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 4, unknown, 2, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 5, unknown, 3, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 0, unknown, 3, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 1, unknown, 2, unknown, consumer-1_/192.168.99.1
test-consumer, test_topic, 2, unknown, 3, unknown, consumer-1_/192.168.99.1
benfoster$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.99.102:9000 --describe --group test-consumer
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
test-consumer, test_topic, 0, unknown, 75, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 1, unknown, 74, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 2, unknown, 75, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 3, unknown, 75, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 4, unknown, 75, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 5, unknown, 75, unknown, consumer2_/192.168.99.1
benfoster$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.99.102:9000 --describe --group test-consumer
Consumer group `test-consumer` does not exist or is rebalancing.
benfoster$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.99.102:9000 --describe --group test-consumer
Consumer group `test-consumer` does not exist or is rebalancing.

And within my consumer’s app logs, you can see it constantly rereading the same messages:

2016-07-01 09:37:37,171 [PM-akka.actor.default-dispatcher-22] DEBUG a.kafka.internal.PlainConsumerStage PlainConsumerStage(akka://PM) - Push element ConsumerRecord(topic = test_topic, partition = 0, offset = 0, key = null, value = test2)
2016-07-01 09:42:07,344 [PM-akka.actor.default-dispatcher-14] DEBUG a.kafka.internal.PlainConsumerStage PlainConsumerStage(akka://PM) - Push element ConsumerRecord(topic = test_topic, partition = 0, offset = 0, key = null, value = test2)
2016-07-01 09:38:57,217 [PM-akka.actor.default-dispatcher-22] DEBUG a.kafka.internal.PlainConsumerStage PlainConsumerStage(akka://PM) - Push element ConsumerRecord(topic = test_topic, partition = 1, offset = 3, key = null, value = test24)
2016-07-01 09:43:37,390 [PM-akka.actor.default-dispatcher-20] DEBUG a.kafka.internal.PlainConsumerStage PlainConsumerStage(akka://PM) - Push element ConsumerRecord(topic = test_topic, partition = 1, offset = 3, key = null, value = test24)
etc...

So how did we fix this? Thankfully we knew that the number of elements that would ever appear in a batch was small (a few hundred), so we added an in-memory buffer to the stream. That let the whole backlog sit in memory while the HTTP endpoint worked through it, so the Kafka consumer stage kept polling and was unaffected. It was a quick fix and got us what we needed.
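
A minimal sketch of the quick fix, assuming the same stream shape (and names) as the sketch earlier: the only difference is an explicit in-memory buffer between the Kafka source and the slow HTTP stage.

import akka.stream.OverflowStrategy

Consumer
  .plainSource(consumerSettings, Subscriptions.topics("test_topic"))
  .buffer(500, OverflowStrategy.backpressure) // headroom for the few hundred elements we expect per batch
  .mapAsync(parallelism = 1)(record => callSlowHttpEndpoint(record.value))
  .runWith(Sink.ignore)

With the buffer absorbing the backlog, the consumer stage sees continuous demand, so it keeps polling and heartbeating even while the HTTP calls lag behind.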

As soon as you add a buffer, the two consumers behave, and you get this:

benfoster$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.99.102:9000 --describe --group test-consumer
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
test-consumer, test_topic, 3, unknown, 87, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 4, unknown, 86, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 5, unknown, 86, unknown, consumer2_/192.168.99.1
test-consumer, test_topic, 0, unknown, 87, unknown, consumer1_/192.168.99.1
test-consumer, test_topic, 1, unknown, 86, unknown, consumer1_/192.168.99.1
test-consumer, test_topic, 2, unknown, 86, unknown, consumer1_/192.168.99.1

Is it the right fix? Probably not. If we were dealing with greater volume or velocity we’d have to treat this app as a genuinely slow consumer, and possibly ditch the reactive streams abstraction in favour of the lower-level Kafka API, so we had full control over partition assignment and heartbeat management. But that’s a dedicated article (or several) in its own right.
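
For a flavour of what that lower-level control looks like (to be clear, this is not what we did – just a rough sketch assuming a recent kafka-clients (2.x) and Scala 2.13, where downstreamIsBusy and handOffToDownstream are hypothetical stand-ins for your own logic): keep calling poll() so the consumer stays in the group, and pause/resume the assigned partitions while the downstream catches up.

import java.time.Duration
import java.util.Properties

import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}

import scala.jdk.CollectionConverters._

object ManualFlowControlSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "192.168.99.102:9092")
  props.put("group.id", "test-consumer")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List("test_topic").asJava)

  // Hypothetical stand-ins for your own backlog tracking and processing hand-off
  def downstreamIsBusy(): Boolean = false
  def handOffToDownstream(record: ConsumerRecord[String, String]): Unit = println(record.value)

  while (true) {
    // Polling regularly keeps us in the consumer group, even when paused and fetching nothing
    val records = consumer.poll(Duration.ofMillis(500))
    records.asScala.foreach(handOffToDownstream)

    // Pause fetching while the downstream is saturated and resume once it catches up,
    // so the group coordinator never sees this consumer go quiet and never rebalances
    if (downstreamIsBusy()) consumer.pause(consumer.assignment())
    else consumer.resume(consumer.assignment())
  }
}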

I hope someone else finds this useful: it’s one of the mishaps you can run into when you abstract so far away from the underlying mechanics that you don’t realise what could be going wrong under the hood.

I’ve uploaded the source code for the reactive producer (shamelessly ripped from an Activator template, with the ticker publisher code from my friend Neil Dunlop) and the consumer I used, if you’d like to replicate the scenario. You’ll need a Kafka broker running:
https://github.com/foyst/reactive-kafka-publisher
https://github.com/foyst/reactive-kafka-slow-consumer

Road to Continuous Deployment Part 1 – Performance Testing

The next set of articles documents a presentation I’m working on to demonstrate Continuous Delivery, and how, in a world where CD is desired, it’s becoming increasingly important to have the appropriate level of testing in place. I’ll break them down to tackle one topic at a time, building up to a full demo of how I feel CD can work.

Firstly, I’m going to start with a problem CI/CD can address – performance testing. Not so much how to performance test, or how CI/CD can build systems that don’t need it, but how a continuous delivery pipeline can quickly alert developers to potential problems that would otherwise remain undetected and take ages to debug once in production.

Nothing new or groovy here, but a problem most developers (and definitely SQL developers) are familiar with: the N+1 problem. Java developers using Hibernate will be familiar with the @OneToMany annotation, but less so with the @Fetch annotation, and even less so with the implications of changing the FetchMode. With FetchMode.JOIN, Hibernate loads the parents and their children in a single joined query; with FetchMode.SELECT, it issues one query for the parents and then a further select per parent for its children – the classic N+1. I aim to demonstrate how something as simple as changing a Hibernate @OneToMany fetch strategy can drastically affect the performance of a system.

There are several good reasons why you might want to make this change:
* It can be a perfectly reasonable thing to do: maybe you want to lazy-load children, with no point eagerly loading all their details in one join
* It may not even hurt in practice (perhaps most of the time only a few children are accessed, which negates the overall performance impact), but testing should be done to assess the potential impact

Side-bar: The demo project for this article was originally ported from the Spring Boot data REST example within their samples; however, the @Fetch annotation appears to be ignored there, which makes it difficult to demonstrate.
This article gives a good indication of what I expected to happen and what the problem likely is: http://www.solidsyntax.be/2013/10/17/fetching-collections-hibernate/
I suspect the Spring Boot configuration doesn’t use the Criteria API behind the scenes, which means the @Fetch annotation is ignored.

The application is a simple school class registration system, with the domain modelled around classes and students. A single GET resource is available which returns all classes, with their students as child nodes. Below is a sample of the returned JSON:

  {
    "id": 3,
    "className": "Oakleaf Grammar School - Class 3",
    "students": [
      {
        "id": 970,
        "firstName": "Marie",
        "lastName": "Robinson"
      },
      {
        "id": 2277,
        "firstName": "Steve",
        "lastName": "Parker"
      },
      {
        "id": 4303,
        "firstName": "Lillian",
        "lastName": "Carpenter"
      },
      {
        "id": 9109,
        "firstName": "Samuel",
        "lastName": "Woods"
      }
    ]
  }

So at this point, I have a simple application whose performance can be altered by simply changing the @Fetch annotation, but how can we test the effect of this?

This is what Gatling is designed for. It’s a performance and load-testing library written in Scala, with a really nice DSL for expressing load-test scenarios.

This is the code required to define a scenario for testing our system:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SchoolClassLoadTest extends Simulation {

  val httpConf = http
    .baseURL("http://localhost:8080") // Here is the root for all relative URLs

  val scn = scenario("Scenario Name").repeat(10, "n") {
    exec(http("request_1")      // Using the http protocol
      .get("/v1/class")         // GET http method with the relative URL
      .check(status.is(200)))   // Check response status code is 200 (OK)
  }
}

Not much, is it?

Side-bar: Gatling also comes with a recorder UI, which allows you to record all interactions with applications over HTTP and save them as scenarios. It achieves this by acting as a proxy, which you can use like any other web proxy: route web browser or application traffic via the Gatling Recorder proxy, and it will capture every interaction made through it and save it as a scenario similar to the one above, which can then be tweaked later.

Once you have your scenario defined, you can configure your simulation to meet whatever load and duration parameters you like. For example, for our app I’m going to run the test simulating 10 concurrent users, each making 10 requests:

setUp(scn.inject(atOnceUsers(10)).protocols(httpConf))

This is just a simple example of what you can do with Gatling: you can ramp users up and down to test how an app scales, pause between steps to simulate human interaction, and much more. For more on what Gatling can do, check out their QuickStart page.
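
As a flavour of that, here’s a sketch using the same Gatling 2.x DSL as above (the names and numbers are arbitrary, and it reuses the httpConf and scenario building blocks already shown): it ramps 50 users up over 30 seconds and adds a random think-time between requests to mimic a human.

import scala.concurrent.duration._

val humanScn = scenario("Browse classes")
  .repeat(10, "n") {
    exec(http("list_classes")
      .get("/v1/class")
      .check(status.is(200)))
      .pause(1.second, 3.seconds) // random pause between 1 and 3 seconds, like a person reading the page
  }

setUp(humanScn.inject(rampUsers(50) over (30.seconds))).protocols(httpConf)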

Back to our app. I’m now going to run this simulation against it whilst it’s using the FetchMode.JOIN config:
[Gatling results chart: N+1 with FetchMode.JOIN]

N.B. I warmed up the DB cache before both the before and after benchmarks, by running the simulation once before recording the results.

Above is the baseline for our app – you can see the mean response time is 221ms, the max is 378ms, and the 95th percentile is 318ms. Now look what happens when I simply change the @Fetch strategy from JOIN to SELECT:
[Gatling results chart: N+1 with FetchMode.SELECT]

The average response time has increased from 221ms to 3.4 seconds, with the 95th percentile rocketing up to 8.3 seconds! Something as simple as an annotation change can have a dramatic impact on performance. The worrying thing is how easy this is to do: it could be done by someone unfamiliar with Hibernate and database performance tuning; by someone frantically hunting for a performance gain and changing a number of things arbitrarily; or, what I consider worse, by someone who believes the benefits of the change outweigh the performance knock but fails to gather the evidence to back it up!

Now that we’ve got the tools required to test performance, in my next article I’ll look at how this can be proactively monitored between releases. Stay tuned…