This site's road to serverless architecture

This website has existed for more than five years and has two posts. However, I’m trying to improve the quality of this site, and the first (and most important) step is to give it a proper architecture. Let’s go cloud-native!

What is “serverless”?

A serverless application still runs on servers, but those servers are provided by a cloud provider in its data centers. Pricing is based on the resources an application actually consumes, rather than on pre-purchased units of capacity, and all infrastructure management tasks are shifted to the cloud provider.

Hugo

This site is built with Hugo, a static site generator: you write your content in Markdown and Hugo generates the HTML for you (a minimal workflow is sketched after the list below). This approach has many advantages:

  • No CMS, no PHP, Python, NodeJS, or any other backend technology.
  • No database.
  • No dependency manager for loading thousands of libraries to display some HTML.
  • No upgrade issues (due to the simplicity of the system).
  • It’s secure by design: No SQL injection, no code injection, no zero-day-exploits. Good luck attacking my HTML!
  • You can focus on your actual content, rather than solving technical problems.
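To make this concrete, here is a minimal sketch of the Hugo workflow; the site and post names are just placeholders:

# Create a new Hugo site and a first post (names are placeholders)
hugo new site my-site
cd my-site
hugo new posts/hello-world.md

# Build the site; the generated HTML ends up in the public/ folder
hugo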

The status quo

This website used to run on a web host called DomainFactory. This was quite expensive and included lots of features that I had no use for (PHP, databases, e-mail inboxes, oversized web space).

Also, if my site were suddenly to become popular and attract millions of users, it would not be able to handle the traffic (which, for this website, will of course never happen).

There was no CI/CD - updating the website meant manually updating files on the server via FTP.

The goal

Here are the key points that matter to me when running a web application.

  • Zero maintenance (of the infrastructure) - I don’t want to do sysadmin tasks.
  • Only pay for services that are actually used.
  • No manual deployment - deploying a new version must be as easy as pushing to a Git repository.
  • Use of a CDN (Content Delivery Network) for availability, scalability and monitoring purposes.

The new architecture

For running this website, the following AWS services are used:

S3

Hugo’s generated HTML is stored in an S3 bucket. This is the HTML that is served when the site is requested. This guide from AWS describes how to configure a bucket for static site hosting.
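For reference, the bucket setup can be sketched with the AWS CLI; the bucket name is a placeholder and the exact commands may differ from what I actually clicked together in the console:

# Enable static website hosting on the bucket and upload Hugo's output.
# "example-bucket" is a placeholder.
aws s3 website s3://example-bucket --index-document index.html --error-document 404.html
aws s3 sync public/ s3://example-bucket --delete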

CodePipeline

Whenever changes are pushed to a private GitHub repository, this code is checked out and used as the source of a CodePipeline.

The actual build is defined in buildspec.yml, which is part of the Git repository.

version: 0.2

phases:
  install:
    commands:
      - chmod +x hugo
  build:
    commands:
      - ./hugo
artifacts:
  files:
    - '**/*'
  base-directory: 'public'

As you can see, the configuration executes hugo, which generates the HTML, and then treats the content of the public folder (the folder in which Hugo stores the generated site) as the build artifact.

You can also see that the self-contained hugo binary is part of the repository. I think this is fine for a small project, but there are better solutions, such as downloading Hugo during the build or providing a Docker image that contains Hugo. The drawback of adding the Hugo binary to the Git repository is that it’s a misuse of Git and it inflates the repository size - especially once you start updating the binary.
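As a rough sketch, the install phase could instead download a pinned Hugo release. The version number and archive name below are assumptions - check the gohugoio/hugo releases page for the exact file name:

# Fetch a pinned Hugo release instead of committing the binary (sketch).
HUGO_VERSION=0.76.5
curl -sSL -o hugo.tar.gz "https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.tar.gz"
tar -xzf hugo.tar.gz hugo
chmod +x hugo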

This is more or less what a build of my website looks like. It takes 75 ms to build this website’s HTML! This will take longer if you have more content, but Hugo is known to be a very fast static site generator.

[Container] 2020/10/25 12:02:32 Entering phase INSTALL
[Container] 2020/10/25 12:02:32 Running command chmod +x hugo

...
[Container] 2020/10/25 12:02:32 Running command ./hugo
Start building sites … 

                   | EN  
-------------------+-----
  Pages            | 15  
  Paginator pages  |  0  
  Non-page files   |  0  
  Static files     | 16  
  Processed images |  0  
  Aliases          |  1  
  Sitemaps         |  1  
  Cleaned          |  0  

Total in 75 ms

[Container] 2020/10/25 12:02:32 Phase complete: BUILD State: SUCCEEDED
[Container] 2020/10/25 12:02:32 Phase context status code:  Message: 
[Container] 2020/10/25 12:02:32 Entering phase POST_BUILD
[Container] 2020/10/25 12:02:32 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2020/10/25 12:02:32 Phase context status code:  Message: 
[Container] 2020/10/25 12:02:32 Expanding base directory path: public
[Container] 2020/10/25 12:02:32 Assembling file list
[Container] 2020/10/25 12:02:32 Expanding public
[Container] 2020/10/25 12:02:32 Expanding file paths for base directory public
[Container] 2020/10/25 12:02:32 Assembling file list
[Container] 2020/10/25 12:02:32 Expanding **/*
[Container] 2020/10/25 12:02:32 Found 33 file(s)

Caveats

It took me some time to find out that CodePipeline needs extra configuration to support Git submodules. With Hugo, you normally store your theme as a Git submodule (which makes it really easy to update the theme). However, my build didn’t run because the submodules were not pulled from Git. This StackOverflow question has great answers that explain the problem (and a possible solution).
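One possible workaround - a sketch, not necessarily the fix described in that answer - is to skip the submodule entirely and clone the theme explicitly in the install phase; the theme URL and folder name below are placeholders:

# Clone the theme directly instead of relying on a Git submodule (sketch).
# Replace the URL and folder name with your actual theme.
git clone --depth 1 https://github.com/<theme-author>/<theme-name>.git themes/<theme-name>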

CloudFront

CloudFront is a CDN. Serving your content through a CDN has many advantages: responses are cached at edge locations close to your visitors, and it puts HTTPS and custom domain support in front of the S3 bucket - but we will cover the domain part later.
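Creating a basic distribution in front of the bucket can be sketched with the AWS CLI; the bucket name is a placeholder, and in practice you will also want to configure the alternate domain names and certificate described further below:

# Create a CloudFront distribution with the S3 bucket as its origin (sketch).
# "example-bucket" is a placeholder.
aws cloudfront create-distribution \
  --origin-domain-name example-bucket.s3.amazonaws.com \
  --default-root-object index.html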

URL Rewriting with Lambda@Edge

Once you have managed to automatically push your code to the S3 bucket and have linked a CloudFront distribution to that bucket, you will find that accessing your-id.cloudfront.net shows your index.html, but your-id.cloudfront.net/posts shows an S3 error (typically a 403).

This is because the “root document” (typically index.html) that you configured for your S3 bucket only works for the bucket’s root directory, not for subdirectories. Neither S3 nor CloudFront knows that when you access your-id.cloudfront.net/posts or your-id.cloudfront.net/posts/, you really want your-id.cloudfront.net/posts/index.html. The way to fix this is a Lambda@Edge function that rewrites the URL on every request.

At the time of writing this post, the only supported Runtime for Lambda@Edge functions was NodeJS. I used this function to rewrite the URLs:

'use strict';

exports.handler = (event, context, callback) => {
    var request = event.Records[0].cf.request;

    if(request.uri != "/") {
        let paths = request.uri.split("/");
        let lastPath = paths[paths.length - 1];
        let isFile = lastPath.split(".").length > 1;

        if(!isFile) {
            console.log("Old URL: " + request.uri);
            if(lastPath != "") {
                request.uri += "/";
            }

            request.uri += "index.html";
        }

        console.log("New URL: " + request.uri);
    }

    callback(null, request);

};

Some example calls for this function:

example.com/image.png => example.com/image.png
example.com/posts     => example.com/posts/index.html
example.com/posts/    => example.com/posts/index.html

CloudFront can be configured to run this function on every viewer request event.

To be honest, I was a bit surprised that the apparent best practice (at least, this is the technique used in AWS’ official resource about static site hosting) involves a Lambda function to rewrite the URL. This seems quite over-engineered to me. In any ordinary web server, you can achieve the same result with a simple rewrite rule (in nginx, for example, a single try_files directive).

Add your own domain with Route 53

If you want to use your own external domain (such as andreasfroewis.com), you need to make your domain’s nameservers point to Amazon’s Route 53 nameservers (they are displayed in the console when you create a Route 53 hosted zone). You can then create A (or AAAA) alias records for this domain that point to your CloudFront distribution, but you can add any kind of DNS record. I found Route 53 very easy to use.
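For illustration, creating such an alias record with the AWS CLI looks roughly like this; the hosted zone ID, domain, and distribution domain are placeholders, while Z2FDTNDATAQYW2 is the fixed hosted zone ID used for CloudFront aliases:

# Point example.com at a CloudFront distribution via an A alias record (sketch).
aws route53 change-resource-record-sets \
  --hosted-zone-id ZEXAMPLEZONEID \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "d111111abcdef8.cloudfront.net",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'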

Caveats

After you have configured your nameservers, you must still add your domain(s) - both with and without www - as alternate domain names (CNAMEs) to the CloudFront distribution.

[Screenshot: adding the CNAMEs (alternate domain names) to the CloudFront distribution]

SSL certificate

I created a free certificate with AWS’ ACM (Certificate Manager). ACM verifies that you’re the owner of the domain by asking you to add a special DNS record to the domain’s nameserver. The process is super easy and takes no more than two minutes. Of course, you can also import your own certificate.
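Requesting the certificate can also be sketched from the CLI; the domain names are placeholders, and note that certificates used by CloudFront must be created in the us-east-1 region:

# Request a DNS-validated certificate for the apex and www domains (sketch).
# CloudFront requires the certificate to live in us-east-1.
aws acm request-certificate \
  --region us-east-1 \
  --domain-name example.com \
  --subject-alternative-names www.example.com \
  --validation-method DNS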