How to build Python code bundles for AWS Lambda quickly and easily

AWS Lambda is conceptually really cool, but as soon as your code creeps beyond a single Python file that uses botocore, things start to get messy and cumbersome. It’s tempting to add an entirely new tool to your workflow, but there’s really no need.

The approach I use is good old make. It’s a perfect fit really. We have input files:

  • A requirements.txt or some other definition of our dependencies
  • Some code that is checked in alongside the requirements.txt
  • Possibly some configuration that needs to be bundled alongside the code
  • An entrypoint - such as lambda_handler.py

We want to take these and assemble a lambda.zip.

One of the nice things about this setup is that when you run make it will only rebuild the things that have changed. This means the requirements.txt gathering step only runs when requirements.txt itself changes - rebuilding the zip is really quick.

Make 101

If you are familiar with how a Makefile is plumbed together you can skip this bit. A Makefile is a collection of build targets and the rules for how to build those targets.

lambda.zip: lambda_handler.py
	mkdir -p build/lambda_zip
	cp lambda_handler.py build/lambda_zip/
	rm -f lambda.zip
	cd build/lambda_zip/ && zip -q -X -9 -r ../../lambda.zip *

In this example, lambda.zip is the target. make is responsible for generating that target, and if any of the listed prerequisites (here, lambda_handler.py) are newer than lambda.zip it knows it needs to recreate the zip.

One very important thing is that recipe lines in a Makefile must be indented with a tab character, not spaces.

Sometimes there isn’t a single file generated by a build step - sometimes there isn’t a file at all. For example, you might want to upload a build artifact only when something has changed. The make idiom for this is a stamp file: a zero-byte marker that records that some process completed at a given date and time. For example:

upload.stamp: lambda.zip
	aws lambda update-function-code --function-name MyFunction --zip-file fileb://lambda.zip
	touch $@

The build target is upload.stamp, and it needs rebuilding whenever lambda.zip is updated. awscli uploads the code, then touch $@ creates the stamp file (or updates its modification timestamp). The upload is now effectively idempotent: running make again without changing lambda.zip does nothing.

There are some special rules in make: rules that don’t have targets on disk. For example, make clean. Without a configuration hint, make would believe that you wanted to create a file called clean. And if you happened to have a file called clean, make would think the build was up to date and that it didn’t need to clean anything. What this means is that we need targets that are always built. These are called .PHONY targets, and you need to include a declaration in your Makefile like this:

.PHONY: all clean

Basic Makefile structure

We’ll look at the basic scaffold first before delving into specifics.

I declare a bunch of paths at the top of my Makefile. They are all relative to the current working directory, which I grab with $(shell pwd):

SRC_DIR=$(shell pwd)
BUILD_DIR=$(SRC_DIR)/build
STAGING_DIRECTORY_STAMP=$(BUILD_DIR)/staging-directory-stamp
STAGING_DIRECTORY=$(BUILD_DIR)/staging
OUTPUT_ZIP=$(BUILD_DIR)/lambda.zip

The all target defines what should happen if you just run make with no arguments. We let make know about our .PHONY rules too:

all: $(OUTPUT_ZIP)
.PHONY: all clean

make clean needs to delete any files that were created by running make:

clean:
	rm -f $(STAGING_DIRECTORY_STAMP)
	rm -rf $(STAGING_DIRECTORY)
	rm -f $(OUTPUT_ZIP)

We have a build step to generate a staging directory when the lambda_handler.py code changes:

$(STAGING_DIRECTORY_STAMP): $(SRC_DIR)/lambda_handler.py
	rm -rf $(STAGING_DIRECTORY)
	mkdir $(STAGING_DIRECTORY)
	cp $(SRC_DIR)/lambda_handler.py $(STAGING_DIRECTORY)/
	touch $@

And then we zip it up as build/lambda.zip:

$(OUTPUT_ZIP): $(STAGING_DIRECTORY_STAMP)
	rm -f $(OUTPUT_ZIP)
	cd $(STAGING_DIRECTORY) && zip -q -9 -r $(OUTPUT_ZIP) *

Collecting and extracting wheels
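This section needs a few more path variables. As before, I declare them at the top of the Makefile (the exact names and locations are just my convention):

CACHE_WHEELHOUSE=$(BUILD_DIR)/wheelhouse-cache
CACHE_WHEELHOUSE_STAMP=$(BUILD_DIR)/wheelhouse-cache-stamp
STAGING_WHEELHOUSE=$(BUILD_DIR)/wheelhouse
STAGING_WHEELHOUSE_STAMP=$(BUILD_DIR)/wheelhouse-stamp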

We want to collect all of the packages listed in requirements.txt. We’ll use the pip wheel command to do any compilation up front and build a wheelhouse. Subsequent builds can reuse the same wheels and avoid recompilation:

$(CACHE_WHEELHOUSE_STAMP): $(SRC_DIR)/requirements.txt
	pip wheel -q -r requirements.txt . --wheel-dir=$(CACHE_WHEELHOUSE) --find-links=$(CACHE_WHEELHOUSE)
	touch $@

We want to preserve the built wheels as much as we can, but there is no mechanism to purge old wheels from the cache. Because we also want just the wheels for the current requirements.txt, we use a second wheelhouse that is deleted and repopulated each time. With the first wheelhouse passed as --find-links this is essentially a straight copy, and fast:

$(STAGING_WHEELHOUSE_STAMP): $(CACHE_WHEELHOUSE_STAMP)
	rm -rf $(STAGING_WHEELHOUSE)
	pip wheel -q -r requirements.txt . --wheel-dir=$(STAGING_WHEELHOUSE) --find-links=$(CACHE_WHEELHOUSE)
	touch $@

Now the best part of collecting wheels like this is that we can just unzip them into the staging directory and they will be in the correct location. This rule supersedes the earlier staging one, so in practice it keeps the lambda_handler.py prerequisite and copy step as well:

$(STAGING_DIRECTORY_STAMP): $(STAGING_WHEELHOUSE_STAMP)
	rm -rf $(STAGING_DIRECTORY)
	unzip -q "$(STAGING_WHEELHOUSE)/*.whl" -d $(STAGING_DIRECTORY)
	touch $@

Reproducibility and Idempotence

One nice property of all this is that, in theory, if you run the build twice on the same base OS you should get bit-for-bit identical output. That would let you use the CodeSha256 property returned by the Lambda API both to prove that what is deployed is what you think it is and to build in idempotence. However, it’s not that simple.

If your zip building process is not creating identical output you can use the Debian diffoscope utility to help figure out what went wrong. Here are some things we spotted and fixed.

First we need to add an extra parameter to our zip incantation:

$(OUTPUT_ZIP): $(STAGING_DIRECTORY_STAMP)
	rm -f $(OUTPUT_ZIP)
	cd $(STAGING_DIRECTORY) && zip -q -X -9 -r $(OUTPUT_ZIP) *

This turns on --no-extra mode, which tells zip to leave out non-essential extra file attributes (on Unix, things like uid/gid and file times). By default these extra attributes introduce some non-determinism, so we just get rid of them.

Next up: when a wheel is unpacked, the mtimes of the directories that get created are the current time. This metadata is preserved in the zip, but isn’t interesting or useful to us. I pick an arbitrary date (in this case the commit time of the last commit) and clamp the modification timestamps to it:

BUILD_DATE=$(shell git log --date=local -1 --format="@%ct")

$(STAGING_DIRECTORY_STAMP): $(STAGING_WHEELHOUSE_STAMP)
	rm -rf $(STAGING_DIRECTORY)
	unzip -q "$(STAGING_WHEELHOUSE)/*.whl" -d $(STAGING_DIRECTORY)
	find "$(STAGING_DIRECTORY)" -newermt "$(BUILD_DATE)" -print0 | xargs -0r touch --no-dereference --date="$(BUILD_DATE)"
	touch $@

The next problem is .so files that are generated by the build process. Hopefully you don’t have any, in which case you are done. Right now, if you run a setup.py based compilation of an .so twice you will get different outputs - some of this is down to the use of random /tmp directories. The easiest workaround for now is to pre-compile your binary dependencies as wheels and upload them to a private package repository. The proper fix involves applying the lessons of the Reproducible Builds team to make building Python wheels repeatable.

You should now have reproducible lambda zips.
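Once the zips are reproducible, the CodeSha256 idea from earlier becomes practical. Here is a sketch with botocore (the function and parameter names here are my own); CodeSha256 is the base64-encoded SHA-256 of the deployment package:

import base64
import hashlib

import botocore.session

def code_already_deployed(function_name, zip_path):
    # What Lambda reports for the currently deployed package
    session = botocore.session.get_session()
    client = session.create_client('lambda')
    remote_sha = client.get_function(
        FunctionName=function_name)['Configuration']['CodeSha256']
    # The same digest, computed over our local build
    with open(zip_path, 'rb') as f:
        digest = hashlib.sha256(f.read()).digest()
    return base64.b64encode(digest).decode('ascii') == remote_sha

If this returns True you can skip the upload entirely, and afterwards you can compare hashes again to prove the deploy took.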

Bonus targets

As alluded to earlier, we can upload the zip directly to AWS by calling out to awscli. And why not add a make invoke target that builds, uploads and then runs our function?

UPLOAD_CODE_STAMP=$(BUILD_DIR)/upload-stamp

$(UPLOAD_CODE_STAMP): $(OUTPUT_ZIP)
	aws lambda update-function-code --function-name MyFunction --zip-file fileb://$(OUTPUT_ZIP)
	touch $@

upload: $(UPLOAD_CODE_STAMP)

invoke: $(UPLOAD_CODE_STAMP)
	aws lambda invoke \
		--function-name MyFunction \
		--invocation-type RequestResponse \
		--payload file://example-payload.json \
		output.json

.PHONY: all clean upload invoke

Because of the dependency chain, invoke will build a new lambda.zip if something’s changed, upload it, and then run it. Perfect when developing!

Raciness in Amazon RDS backups

I recently wrote a restore script for an AWS RDS instance. With RDS you spin up a new instance from a backup (rather than restoring into the existing instance). So you can:

  1. Rename the original out of the way (foo -> foo-old), then restore the backup in its place. The final server will have the same hostname.
  2. Restore the backup with a new name (foo-2). This means it will have a different hostname and you’ll need to deploy a settings change to your apps.

I went with option (1) so that I always knew which database was ‘active’. That meant doing as much validation up front as possible: you don’t want to move the existing database out of the way and then, a few minutes into the script, have it fail because you asked it to restore to a point in time last year!
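Sketched with botocore, that flow looks something like this (a sketch only: error handling is elided, and blocking on the rename with a waiter is my assumption about how to sequence it):

import botocore.session

session = botocore.session.get_session()
client = session.create_client('rds')

# 1. Rename the existing instance out of the way
client.modify_db_instance(
    DBInstanceIdentifier='foo',
    NewDBInstanceIdentifier='foo-old',
    ApplyImmediately=True,
)
client.get_waiter('db_instance_available').wait(
    DBInstanceIdentifier='foo-old',
)

# 2. Restore to a point in time under the original name
client.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier='foo-old',
    TargetDBInstanceIdentifier='foo',
    RestoreTime=target,  # the datetime we validate up front, below
)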

So how do you validate the target restore date? You can’t restore to 30 seconds ago - there is a lag of a few minutes. And obviously you can’t restore to before the database existed. But you also can’t restore to before the oldest backup.

With botocore the first part of this is easy:

result = client.describe_db_instances(DBInstanceIdentifier=dbname)
db = result['DBInstances'][0]
if target > db['LatestRestorableTime']:
    raise ValueError("The target time is too recent")
if target < db['InstanceCreateTime']:
    raise ValueError('Cannot restore to before the db was created')

Unfortunately there isn’t an EarliestRestorableTime. As far as I can tell, the best you can do is use the SnapshotCreateTime of the earliest backup:

result = client.describe_db_snapshots(DBInstanceIdentifier=dbname)
snapshots = result.get('DBSnapshots', [])
snapshots.sort(key=lambda snapshot: snapshot['SnapshotCreateTime'])
if not snapshots or target < snapshots[0]['SnapshotCreateTime']:
    raise ValueError("Can't restore before the first backup")

But that’s still not enough. When you are testing your backup restore script you run it a lot, and this validation frequently didn’t stop me passing in an invalid date. It turned out that if you run it in quick succession there are still snapshots from the previous incarnation of foo hanging around. The only way to tell which snapshots belong to the current instance is to filter on InstanceCreateTime:

result = client.describe_db_snapshots(DBInstanceIdentifier=dbname)
snapshots = result.get('DBSnapshots', [])
snapshots = [
    s for s in snapshots
    if s['InstanceCreateTime'] == db['InstanceCreateTime']
]
snapshots.sort(key=lambda snapshot: snapshot['SnapshotCreateTime'])
if not snapshots or target < snapshots[0]['SnapshotCreateTime']:
    raise ValueError("Can't restore before the first backup")

Grim.
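For reference, here are all three checks gathered into one helper - the same logic as above, nothing new:

def validate_restore_target(client, dbname, target):
    """Refuse restore targets that RDS cannot actually satisfy."""
    result = client.describe_db_instances(DBInstanceIdentifier=dbname)
    db = result['DBInstances'][0]
    if target > db['LatestRestorableTime']:
        raise ValueError("The target time is too recent")
    if target < db['InstanceCreateTime']:
        raise ValueError("Cannot restore to before the db was created")
    # Only consider snapshots belonging to the current incarnation
    snapshots = [
        s for s in client.describe_db_snapshots(
            DBInstanceIdentifier=dbname,
        ).get('DBSnapshots', [])
        if s['InstanceCreateTime'] == db['InstanceCreateTime']
    ]
    snapshots.sort(key=lambda s: s['SnapshotCreateTime'])
    if not snapshots or target < snapshots[0]['SnapshotCreateTime']:
        raise ValueError("Can't restore before the first backup")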

Deleting an Amazon ELB properly

Recently I automated deletion of an Elastic Load Balancer and the subnet it was in. Fairly straightforward stuff. One slight problem: an ELB doesn’t expose any lifecycle state. When you delete one it disappears from the API immediately, but in the background parts of it still exist. If you then try to delete the subnet it was in you get a dependency error. After a couple of minutes it does work.

The same is true if you try to delete a security group that the ELB was using.

What’s going on?

If you look at the Network Interfaces view in the EC2 console after you’ve deleted an ELB, you will see that its network interfaces still exist and are cleaned up in the background. Internally it’s in a ‘deleting’ state - we just can’t see that through the ELB API.

We can simulate that state to some extent by polling the network interfaces API. It turns out (thanks Andrew @ AWS support) that ELB sets the description of its ENIs to ‘ELB yournamehere’. So with botocore we can do something like:

description = "ELB " + your_balancers_name
for i in range(120):
    interfaces = client.describe_network_interfaces(
        Filters=[
            {"Name": "description", "Values": [description]},
        ]
    ).get('NetworkInterfaces', [])
    if len(interfaces) == 0:
        return
    time.sleep(1)
raise Exception("ELB not deleted after 2 minutes")

This is now done automatically when deleting an ELB with Touchdown. It won’t consider an ELB deleted until all of its network interfaces have been cleaned up. So you shouldn’t get any errors deleting subnets or security groups!

Support told me this behaviour is unlikely to change (and has been that way for at least 2 years), and they have fed back internally that the possible values for the ENI description field should be documented (they aren’t right now).

Making a node virtualenv

These days npm ships with nodejs, which you can install on Ubuntu with::

sudo add-apt-repository ppa:chris-lea/node.js -y
sudo apt-get update
sudo apt-get install nodejs

By default there are two ways of using npm::

sudo npm -g install bower

This (depending on how and where npm was installed) installs into /usr/local, and the thing you install ends up on your PATH. But I don’t really let anything near /usr unless it’s in a Debian package. So I can also do::

npm install bower

This will create a node_modules directory in the current working directory. If I want to run bower I have to run::

node_modules/.bin/bower

Which is a bit yuck. Neither of these really cuts it for me - I have been spoiled by virtualenv. Can I npm install into my virtualenv - and have . bin/activate work for the node stuff too?

First of all I want a virtualenv::

virtualenv /home/john/myvirtualenv

npm -g tries to write into /usr/local because that is the default prefix config option. But I need -g so that bin/bower gets created. Helpfully, you can override the prefix from the command line::

npm -g --prefix /home/john/myvirtualenv install bower

Now when I . bin/activate, the JS binaries are available too.

Taking this one step further, you can install npm into a virtualenv and then set the default prefix::

npm -g --prefix /home/john/myvirtualenv install npm

cat > /home/john/myvirtualenv/lib/node_modules/npm/npmrc << EOF
prefix = /home/john/myvirtualenv
global = true
EOF

Now I can do this::

. /home/john/myvirtualenv/bin/activate
npm install bower lessc requirejs

And the dependencies are installed into that virtualenv - so as long as I have sourced my activate file I can just::

bower

I’ve written virtualenv-js, which does those two steps for you.

You can use it with mkvirtualenv from virtualenvwrapper by putting it in ~/.virtualenvs/ and adding this to ~/.virtualenvs/postmkvirtualenv::

#!/bin/bash
# This hook is run after a new virtualenv is activated.
~/.virtualenvs/virtualenv-js $VIRTUAL_ENV

(Make sure you chmod +x virtualenv-js. You could also just save virtualenv-js as postmkvirtualenv if you want.)

Greenlets and Gio

I made a thing! It’s like gevent but on top of the GLib stack instead of libev.

Most GLib async code comes in sync and async variants, and the async variants all follow the same pattern. So we can hook into GObjectMeta and automatically wrap them with something like this:

def some_api(*args, **kwargs):
    current = greenlet.getcurrent()

    # Call the function and pass it current.switch as the callback - this
    # is what allows the current coroutine to be resumed
    args = args + (current.switch, None)
    some_api_async(*args, **kwargs)

    # Pause the current coroutine. It will be resumed here when the
    # callback calls current.switch()
    obj, result, _ = current.parent.switch()

    # Actually return the expected value
    return some_api_finish(result)

So every use of a synchronous API that has an asynchronous variant is replaced by an asynchronous, greenlet-based version that still reads like synchronous code.
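The hook itself can be quite small. Here is a sketch of the idea (my illustration, not the actual implementation): a metaclass that pairs foo_async with foo_finish and synthesizes a blocking foo:

import greenlet

def make_blocking(async_name, finish_name):
    """Build a greenlet-blocking wrapper around an async/finish pair."""
    def wrapper(self, *args, **kwargs):
        current = greenlet.getcurrent()
        # Pass current.switch as the GLib-style callback, plus user_data
        getattr(self, async_name)(*(args + (current.switch, None)), **kwargs)
        # Yield to the main loop; the callback resumes us with its arguments
        obj, result, _ = current.parent.switch()
        return getattr(obj, finish_name)(result)
    return wrapper

class AsyncToSyncMeta(type):
    """Synthesize a blocking foo() from foo_async()/foo_finish() pairs."""
    def __new__(mcs, name, bases, attrs):
        for attr in list(attrs):
            if attr.endswith('_async'):
                stem = attr[:-len('_async')]
                if stem + '_finish' in attrs and stem not in attrs:
                    attrs[stem] = make_blocking(attr, stem + '_finish')
        return super().__new__(mcs, name, bases, attrs)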