10 Aug 2016
Today's failure is about xargs. For whatever reason, inside a qemu-user-static environment inside Docker it can no longer do its part to help build a kernel:
CLEAN arch/arm/boot
/usr/bin/xargs: rm: Argument list too long
Makefile:1502: recipe for target 'clean' failed
make[2]: *** [clean] Error 126
make[2]: Leaving directory '/src/debian/linux-source-4.7.0/usr/src/linux-source-4.7.0'
debian/ruleset/targets/source.mk:35: recipe for target 'debian/stamp/install/linux-source-4.7.0' failed
make[1]: *** [debian/stamp/install/linux-source-4.7.0] Error 2
make[1]: Leaving directory '/src'
debian/ruleset/local.mk:96: recipe for target 'kernel_source' failed
It looks like I need to patch the kernel's Makefile to work around some limit qemu is introducing.
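I haven't tried a fix yet, but assuming the offending rule is one of the find ... | xargs rm -f pipelines in the kernel's clean targets, the workaround is probably just to cap how many arguments xargs hands to each rm invocation, something along these lines (the find expression here is purely illustrative):
find . -name '*.o' -type f -print | xargs -n 128 rm -f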
09 Aug 2016
I've recently been using my Docker for Mac install to try my hand at packaging the latest upstream kernel for my Raspberry Pi. I have a Debian Jessie ARM rootfs with kernel-package installed, and the idea was to run make-kpkg on an OS X-local checkout. It was able to build the main kernel but then:
make[3]: *** Documentation/Kbuild: Is a directory. Stop.
Makefile:1260: recipe for target '_clean_Documentation' failed
make[2]: *** [_clean_Documentation] Error 2
make[2]: Leaving directory '/src/debian/linux-source-4.7.0/usr/src/linux-source-4.7.0'
debian/ruleset/targets/source.mk:35: recipe for target 'debian/stamp/install/linux-source-4.7.0' failed
make[1]: *** [debian/stamp/install/linux-source-4.7.0] Error 2
make[1]: Leaving directory '/src'
debian/ruleset/local.mk:96: recipe for target 'kernel_source' failed
make: *** [kernel_source] Error 2
The moral of this story: don't try to build kernels on a case-insensitive filesystem.
26 Jul 2016
I recently built a visualisation of all Django migrations in a project and the dependencies between them. I was most interested in recent migrations, and in particular if a migration had been changed after it had been deployed. So adding the tag a migration was introduced in (and the tag it was last modified in) seemed like a good idea.
My first attempt was to query each migration with git log with a diff filter to find out when it was added. Then I could use git tag to see which tags it was in:
$ git log --format="format:%H" --follow --diff-filter=A touchdown/core/adapters.py
184d8e88017726e695ee9cb22e428b667f6d22de
$ git tag --contains 184d8e88017726e695ee9cb22e428b667f6d22de
0.0.1
0.0.10
0.0.11
0.0.12
0.0.13
0.0.14
<snip>
This was slow. I was traversing the same log again and again, hundreds of times. So for version 2 I traverse the log in tag order just once: What changed between 0.0.1 and 0.0.2? What changed between 0.0.2 and 0.0.3? If a file changed and I've never seen it before then it must be a new file. The new version is much faster.
import subprocess
from distutils.version import StrictVersion

# Finds the root commit of the repository
tags = [
    subprocess.check_output([
        "git", "rev-list", "--max-parents=0", "HEAD"
    ]).strip()
]

# Every tag in order
tags.extend(
    sorted(
        subprocess.check_output(["git", "tag", "-l"]).split(),
        key=StrictVersion,
    ),
)

added = {}
changed = {}

for left, right in zip(tags, tags[1:]):
    files = set(subprocess.check_output([
        "git",
        "show",
        "{}...{}".format(left, right),
        "--no-commit-id",
        "-r",
        "--name-only",
        "--format=format:",
    ]).strip().split())

    for file in files:
        if file not in added:
            added[file] = right
        changed[file] = right

for file, version in added.items():
    print file, version, changed[file]
One additional change I could make is to remove files from added and changed if they aren't in git ls-files {tag}.
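A sketch of that change, using git ls-tree (which takes a tag directly, unlike git ls-files) to list the files that still exist at the newest tag - variable names match the script above:
# Drop entries for files that no longer exist at the newest tag
current_files = set(subprocess.check_output([
    "git", "ls-tree", "-r", "--name-only", tags[-1]
]).split())
added = dict((f, v) for f, v in added.items() if f in current_files)
changed = dict((f, v) for f, v in changed.items() if f in current_files)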
24 Jul 2016
Much has been said about Docker, but for me the most transformational aspect of it has been for my dev boxes. I've been using GitHub Pages for a while but I've always resisted testing with jekyll locally - I don't want to mess around with gem and have it make a mess of a fairly pristine install just so I can blog something. With Docker I've finally moved past this: today I added a docker-compose.yml to this repo:
version: '2'
services:
  jekyll:
    image: jekyll/jekyll:pages
    command: jekyll serve --drafts --watch -H 0.0.0.0
    volumes:
      - .:/srv/jekyll
    ports:
      - "4000:4000"
When I run docker-compose up my checkout is mounted in a Docker container and port 4000 is available on 127.0.0.1 to preview my blog. As I edit files jekyll automatically rebuilds. I just Ctrl+C when I'm done, and if I want to really clean up then I can finish off with docker-compose down.
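Putting that together, the whole loop is just:
$ docker-compose up        # live preview on http://127.0.0.1:4000, rebuilds on save
$ docker-compose down      # optional clean-up once I'm finished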
And this is with Docker for Mac, running a Linux container transparently on an OS X machine. And yes, folder watching works just as well as on Linux, and so does port forwarding.
21 Jul 2016
AWS Lambda is conceptually really cool, but as soon as your code creeps beyond a single Python file that uses botocore things start to get messy and cumbersome. It's tempting to add an entirely new tool to your workflow, but there's really no need.
The approach I use is good old make. It's a perfect fit, really. We have input files:
- A requirements.txt or some other definition of our dependencies
- Some code that is checked in alongside the requirements.txt
- Possibly some configuration that needs to be bundled alongside the code
- An entrypoint, such as lambda_handler.py
We want to take these and assemble a lambda.zip.
One of the nice things about this setup is that when you run make it will only update the things that have changed. This means that the requirements.txt gather step only needs to be run once - rebuilding the zip files can actually be really quick.
Make 101
If you are familiar with how a Makefile is plumbed together you can skip this bit. A Makefile is a collection of build targets and the rules for how to build those targets.
lambda.zip: lambda_handler.py
	mkdir -p build_lambda_zip
	cp lambda_handler.py build_lambda_zip/
	rm -f lambda.zip
	cd build_lambda_zip/ && zip -q -X -9 -r ../lambda.zip *
In this example, lambda.zip is the target. make is responsible for generating that target, and if any of the dependencies listed (lambda_handler.py in this example) are newer than lambda.zip it knows it needs to recreate the zip.
One very important thing: the recipe lines in a Makefile must be indented with a tab, not spaces.
Sometimes there isn't a single file that is generated by a build step. Sometimes there might not even be a file. For example, you might want to upload a build artifact only when something has changed. The make idiom for this is to use a stamp file. A stamp file is a 0 byte marker that indicates some process was completed at a given date and time. For example:
upload.stamp: lambda.zip
	aws lambda update-function-code --function-name MyFunction --zip-file fileb://lambda.zip
	touch $@
The build target is upload.stamp. The target needs building every time lambda.zip is updated. awscli is used to do the code upload, then touch $@ creates the stamp file (or updates its modification timestamp). Re-running make now only uploads when the zip has actually changed.
There are some special rules in make. These are rules that don't have targets on disk. For example, make clean. Without some configuration hint make would believe that you wanted to create a file called clean. If you happened to have a file called clean then make would think that the build was up to date and that it didn't need to clean anything. What this means is that we need targets that are always built. These are called .PHONY targets, and you need to include a declaration in your Makefile like this:
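.PHONY: clean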
Basic Makefile structure
We'll look at the basic scaffolding first before delving into the specifics.
I declare a bunch of paths at the top of my Makefile. They are all relative to the cwd, which I grab with $(shell pwd):
SRC_DIR=$(shell pwd)
BUILD_DIR=$(SRC_DIR)/build
STAGING_DIRECTORY_STAMP=$(BUILD_DIR)/staging-directory-stamp
STAGING_DIRECTORY=$(BUILD_DIR)/staging
OUTPUT_ZIP=$(BUILD_DIR)/lambda.zip
The all target defines what should happen if you just run make with no arguments. We let make know about our .PHONY rules too:
all: $(OUTPUT_ZIP)
.PHONY: all clean
make clean needs to delete any files that were created by running make:
clean:
	rm -f $(STAGING_DIRECTORY_STAMP)
	rm -rf $(STAGING_DIRECTORY)
	rm -f $(OUTPUT_ZIP)
We have a build step to generate a staging directory when the lambda_handler.py code changes:
$(STAGING_DIRECTORY_STAMP): $(SRC_DIR)/lambda_handler.py
	rm -rf $(STAGING_DIRECTORY)
	mkdir $(STAGING_DIRECTORY)
	cp $(SRC_DIR)/lambda_handler.py $(STAGING_DIRECTORY)/
	touch $@
And then we zip it up as $cwd/build/lambda.zip:
$(OUTPUT_ZIP): $(STAGING_DIRECTORY_STAMP)
	rm -f $(OUTPUT_ZIP)
	cd $(STAGING_DIRECTORY) && zip -q -9 -r $(OUTPUT_ZIP) *
Collecting and extracting wheels
We want to collect all the dependencies listed in requirements.txt. We'll use the pip wheel command to do any compilation and build a wheelhouse. Subsequent builds can reuse the same wheels and avoid compilation:
$(CACHE_WHEELHOUSE_STAMP): $(SRC_DIR)/requirements.txt
	pip wheel -q -r requirements.txt . --wheel-dir=$(CACHE_WHEELHOUSE) --find-links=$(CACHE_WHEELHOUSE)
	touch $@
We want to preserve the built wheels as much as we can, but we don't have a mechanism to purge old wheels. Because we want just the wheels related to the current requirements.txt, we use a second wheelhouse that we delete before repopulating it. By using the first wheelhouse as a --find-links source, this is pretty much a straight copy, and fast:
$(STAGING_WHEELHOUSE_STAMP): $(CACHE_WHEELHOUSE_STAMP)
	rm -rf $(STAGING_WHEELHOUSE)
	pip wheel -q -r requirements.txt . --wheel-dir=$(STAGING_WHEELHOUSE) --find-links=$(CACHE_WHEELHOUSE)
	touch $@
Now the best part of collecting wheels like this is that we can just unzip them into the build directory and they will be in the correct location:
$(STAGING_DIRECTORY_STAMP): $(STAGING_WHEELHOUSE_STAMP)
	rm -rf $(STAGING_DIRECTORY)
	unzip -q "$(STAGING_WHEELHOUSE)/*.whl" -d $(STAGING_DIRECTORY)
	touch $@
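The wheelhouse paths used in these rules aren't shown in the variable block earlier; assuming they live under $(BUILD_DIR) like everything else, the definitions would look something like:
CACHE_WHEELHOUSE=$(BUILD_DIR)/wheelhouse-cache
CACHE_WHEELHOUSE_STAMP=$(BUILD_DIR)/cache-wheelhouse-stamp
STAGING_WHEELHOUSE=$(BUILD_DIR)/wheelhouse
STAGING_WHEELHOUSE_STAMP=$(BUILD_DIR)/staging-wheelhouse-stamp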
Reproducibility and Idempotence
One nice property of this is that, in theory, if a build is run twice on the same base OS you should get the same output, bit for bit. That should mean you can use the CodeSha256 property returned from the Lambda API not only to prove that what is deployed is what you think it is, but also to build in idempotence. However, it's not that simple.
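For reference, CodeSha256 is the base64-encoded SHA-256 of the uploaded zip, so once builds really are byte-identical you can compare a local digest against what Lambda reports - a rough sketch (the function name is a placeholder):
$ openssl dgst -sha256 -binary build/lambda.zip | base64
$ aws lambda get-function-configuration --function-name MyFunction --query CodeSha256 --output text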
If your zip building process is not creating identical output you can use the Debian diffoscope utility to help figure out what went wrong. Here are some things we spotted and fixed.
First we need to add an extra parameter to our zip incantation:
$(OUTPUT_ZIP): $(STAGING_DIRECTORY_STAMP)
	rm -f $(OUTPUT_ZIP)
	cd $(STAGING_DIRECTORY) && zip -q -X -9 -r $(OUTPUT_ZIP) *
The -X flag turns on --no-extra mode, which tells zip to leave out non-essential extra file attributes. By default these extra attributes introduce some non-determinism, so we just get rid of them.
Next up: when a wheel is unpacked, the mtime of each directory that gets created is the current time. This metadata is preserved in the zip, but isn't interesting or useful to us. I pick an arbitrary date (in this case the date of the last commit) and clamp the modification timestamps:
BUILD_DATE=$(shell git log --date=local -1 --format="@%ct")

$(STAGING_DIRECTORY_STAMP): $(STAGING_WHEELHOUSE_STAMP)
	rm -rf $(STAGING_DIRECTORY)
	unzip -q "$(STAGING_WHEELHOUSE)/*.whl" -d $(STAGING_DIRECTORY)
	find "$(STAGING_DIRECTORY)" -newermt "$(BUILD_DATE)" -print0 | xargs -0r touch --no-dereference --date="$(BUILD_DATE)"
	touch $@
The next problem is .so files that are generated by the build process. Hopefully you don't have any, in which case you are done. Right now, if you run a setup.py-based compilation of a .so twice you will get different outputs; some of this is down to the use of random /tmp directories. The easiest way to work around this today is to pre-compile your binary dependencies as wheels and upload them to a private repository. The proper fix involves applying the lessons of the Reproducible Builds team to make Python wheels repeatable.
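A rough sketch of that workaround, with a placeholder package and index URL: build the wheel once in a controlled environment, push it to a private index, and let every subsequent build reuse the identical .so files:
$ pip wheel --no-deps --wheel-dir dist/ cryptography==1.4
$ twine upload --repository-url https://pypi.internal.example.com/ dist/*.whl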
You should now have reproducible lambda zips.
Bonus targets
As alluded to earlier, we can upload the zip directly to AWS by calling out to awscli. And why not add a make invoke target to build, upload and run our function?
UPLOAD_CODE_STAMP=$(BUILD_DIR)/upload-stamp

$(UPLOAD_CODE_STAMP): $(OUTPUT_ZIP)
	aws lambda update-function-code --function-name MyFunction --zip-file fileb://$(OUTPUT_ZIP)
	touch $@

upload: $(UPLOAD_CODE_STAMP)

invoke: $(UPLOAD_CODE_STAMP)
	aws lambda invoke \
		--function-name MyFunction \
		--invocation-type RequestResponse \
		--payload file://example-payload.json \
		$(BUILD_DIR)/invoke-output.json

.PHONY: all clean upload invoke
Because of the dependencies, invoke will build a new lambda.zip if something's changed and then deploy it, before finally running it. Perfect when developing!