Multi-core Twisted with systemd socket activation

With a stateless Twisted app you can scale by adding more instances. Unless you are explicitly offloading work to subprocesses, you will often have spare cores on the same box as your existing instance. But to exploit them you usually end up running haproxy in front, or faking a load balancer with iptables.

With the SO_REUSEPORT socket flag, multiple processes can listen on the same port, but Twisted doesn’t expose it (yet). With systemd and socket activation, though, we can use it today.
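
For context, here is a minimal sketch of what the flag does in plain Python, assuming a Linux Python build that exposes socket.SO_REUSEPORT (systemd will be doing the equivalent for us below):

import socket

# Each worker could open its own socket like this; with SO_REUSEPORT set
# before bind, the kernel lets every worker bind :8080 and load-balances
# incoming connections between them.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
s.bind(('0.0.0.0', 8080))
s.listen(5)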

As a proof of concept we’ll make a static HTTP web service that uses 4 cores. In a fresh Ubuntu 16.04 VM, install python-twisted-web.
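
On a stock install that’s:

$ sudo apt-get install python-twisted-web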

In /etc/systemd/system/www@.socket:

[Unit]
Description=Socket for worker %i

[Socket]
# Every instance listens on the same port; ReusePort= turns on SO_REUSEPORT
# so the kernel load-balances connections between the workers.
ListenStream=8080
ReusePort=yes
Service=www@%i.service

[Install]
WantedBy=sockets.target

And in /etc/systemd/system/www@.service:

[Unit]
Description=Worker %i
Requires=www@%i.socket

[Service]
Type=simple
# twistd picks up the listening socket systemd passes in via the systemd:
# endpoint; index=0 means the first inherited file descriptor.
ExecStart=/usr/bin/twistd --nodaemon --logfile=- --pidfile= web --port systemd:domain=INET:index=0 --path /tmp
NonBlocking=yes
User=nobody
Group=nobody
Restart=always

Then to get 4 cores:

$ systemctl enable --now [email protected]
$ systemctl enable --now [email protected]
$ systemctl enable --now [email protected]
$ systemctl enable --now [email protected]
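
At this point you can sanity-check that the port really is shared. ss should show four LISTEN entries for :8080, one per socket unit (exact output varies; systemd itself holds each socket until its worker activates):

$ sudo ss -tlnp 'sport = :8080'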

Let’s test it. In a Python 2 shell (replace the IP with your VM’s address):

import urllib
import time

# Hit the service once a second so we can watch which worker answers.
while True:
    urllib.urlopen('http://172.16.140.136:8080/').read()
    time.sleep(1)

And in another terminal you can tail the logs with journalctl:

$ sudo journalctl -f -u 'www@*.service'
Apr 26 02:43:51 ubuntu twistd[10441]: 2017-04-26 02:43:51-0700 [-] - - - [26/Apr/2017:09:43:51 +0000] "GET / HTTP/1.0" 200 2081 "-" "Python-urllib/1.17"
Apr 26 02:43:52 ubuntu twistd[10441]: 2017-04-26 02:43:52-0700 [-] - - - [26/Apr/2017:09:43:52 +0000] "GET / HTTP/1.0" 200 2081 "-" "Python-urllib/1.17"
Apr 26 02:43:53 ubuntu twistd[10444]: 2017-04-26 02:43:53-0700 [-] - - - [26/Apr/2017:09:43:53 +0000] "GET / HTTP/1.0" 200 2081 "-" "Python-urllib/1.17"
Apr 26 02:43:54 ubuntu twistd[10452]: 2017-04-26 02:43:54-0700 [-] - - - [26/Apr/2017:09:43:54 +0000] "GET / HTTP/1.0" 200 2081 "-" "Python-urllib/1.17"
Apr 26 02:43:55 ubuntu twistd[10452]: 2017-04-26 02:43:55-0700 [-] - - - [26/Apr/2017:09:43:55 +0000] "GET / HTTP/1.0" 200 2081 "-" "Python-urllib/1.17"
Apr 26 02:43:56 ubuntu twistd[10447]: 2017-04-26 02:43:56-0700 [-] - - - [26/Apr/2017:09:43:56 +0000] "GET / HTTP/1.0" 200 2081 "-" "Python-urllib/1.17"
Apr 26 02:43:57 ubuntu twistd[10450]: 2017-04-26 02:43:57-0700 [-] - - - [26/Apr/2017:09:43:57 +0000] "GET / HTTP/1.0" 200 2081 "-" "Python-urllib/1.17"

As you can see, the PID in twistd[pid] changes as different worker processes handle the requests.

If you deploy new code, you can restart all the workers at once with systemctl restart 'www@*.service'.

And because the sockets are enabled, all 4 workers will come back on the next boot, too.

Building the Linux kernel on a Mac inside Docker: Attempt #2

Today’s failure is about xargs. For whatever reason, inside a qemu-user-static environment inside Docker, it can no longer do its part to help build a kernel:

  CLEAN   arch/arm/boot
/usr/bin/xargs: rm: Argument list too long
Makefile:1502: recipe for target 'clean' failed
make[2]: *** [clean] Error 126
make[2]: Leaving directory '/src/debian/linux-source-4.7.0/usr/src/linux-source-4.7.0'
debian/ruleset/targets/source.mk:35: recipe for target 'debian/stamp/install/linux-source-4.7.0' failed
make[1]: *** [debian/stamp/install/linux-source-4.7.0] Error 2
make[1]: Leaving directory '/src'
debian/ruleset/local.mk:96: recipe for target 'kernel_source' failed

It looks like I need to patch the kernel’s Makefile to work around some limit qemu is introducing.
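
I haven’t tried a fix yet, but the likely shape of the workaround is to batch the deletions so each rm invocation stays under whatever argv limit qemu imposes. xargs can do that with -n (hypothetical sketch, not the actual Makefile change):

# Instead of letting xargs build one huge rm command line,
# cap it at 500 arguments per invocation:
$ find . -name '*.o' | xargs -n 500 rm -f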

Building the Linux kernel on a Mac inside Docker: Attempt #1

I’ve recently been using my Docker for Mac install to try my hand at packaging the latest upstream kernel for my Raspberry Pi. I have a Debian Jessie ARM rootfs with kernel-package installed, and the idea was to run make-kpkg on an OS X-local checkout. It was able to build the main kernel, but then:

make[3]: *** Documentation/Kbuild: Is a directory.  Stop.
Makefile:1260: recipe for target '_clean_Documentation' failed
make[2]: *** [_clean_Documentation] Error 2
make[2]: Leaving directory '/src/debian/linux-source-4.7.0/usr/src/linux-source-4.7.0'
debian/ruleset/targets/source.mk:35: recipe for target 'debian/stamp/install/linux-source-4.7.0' failed
make[1]: *** [debian/stamp/install/linux-source-4.7.0] Error 2
make[1]: Leaving directory '/src'
debian/ruleset/local.mk:96: recipe for target 'kernel_source' failed
make: *** [kernel_source] Error 2

The moral of this story is not to try to build kernels on a case-insensitive filesystem. The kernel tree relies on names that differ only in case (presumably the per-directory Kbuild makefiles colliding with the Documentation/kbuild directory here), and the default HFS+ volume conflates them.
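
The fix next time will be to keep the checkout on a case-sensitive volume. I haven’t verified this end to end yet, but on OS X a case-sensitive sparse image should do it:

$ hdiutil create -size 10g -type SPARSE -fs 'Case-sensitive Journaled HFS+' -volname linux-src linux-src
$ hdiutil attach linux-src.sparseimage
$ cd /Volumes/linux-src    # clone the kernel here instead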

Finding when a file was added to Git and when it was last changed

I recently built a visualisation of all the Django migrations in a project and the dependencies between them. I was most interested in recent migrations, and in particular in whether a migration had been changed after it had been deployed. So labelling each migration with the tag it was introduced in (and the tag it was last modified in) seemed like a good idea.

My first attempt was to query each migration with git log, using a diff filter to find the commit that added it, and then use git tag to see which tags contain that commit:

$ git log --format="format:%H" --follow --diff-filter=A touchdown/core/adapters.py
184d8e88017726e695ee9cb22e428b667f6d22de
$ git tag --contains 184d8e88017726e695ee9cb22e428b667f6d22de
0.0.1
0.0.10
0.0.11
0.0.12
0.0.13
0.0.14
<snip>

This was slow: I was traversing the same log again and again, hundreds of times. So for version 2 I traverse the log in tag order just once: what changed between 0.0.1 and 0.0.2? What changed between 0.0.2 and 0.0.3? If a file has changed and I’ve never seen it before then it must be a new file. The new version is much faster.

import subprocess
from distutils.version import StrictVersion


# Start from the root commit of the repository, so files present in the
# very first tag still get an "added in" version.
tags = [
    subprocess.check_output([
        "git", "rev-list", "--max-parents=0", "HEAD"
    ]).strip()
]
# Then every tag, in version order
tags.extend(
    sorted(
        subprocess.check_output(["git", "tag", "-l"]).split(),
        key=StrictVersion,
    ),
)

added = {}
changed = {}

# Walk consecutive pairs of tags, collecting the files touched between them.
for left, right in zip(tags, tags[1:]):
    files = set(subprocess.check_output([
        "git",
        "show",
        "{}...{}".format(left, right),
        "--no-commit-id",
        "-r",
        "--name-only",
        "--format=format:",
    ]).strip().split())

    for file in files:
        # The first time we see a file it was added in this tag; either
        # way, this tag is (so far) the last one in which it changed.
        if file not in added:
            added[file] = right
        changed[file] = right


for file, version in added.items():
    print file, version, changed[file]

One additional change I could make is to remove files from added and changed if they aren’t in git ls-files {tag}, so that files deleted since then don’t show up.
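
Continuing the script above, that pruning step might look something like this (untested, and using git ls-tree rather than ls-files to list the tree at a tag):

# Every file still present at the newest tag...
surviving = set(subprocess.check_output([
    "git", "ls-tree", "-r", "--name-only", tags[-1]
]).split())

# ...and drop anything that has since been deleted.
for file in set(added) - surviving:
    del added[file]
    del changed[file]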

Blogging with the aid of docker-compose

Much has been said about Docker, but for me the most transformational aspect of it has been my dev boxes. I’ve been using GitHub Pages for a while, but I’ve always resisted testing with Jekyll locally: I don’t want to mess around with gem and have it make a mess of a fairly pristine install just so I can blog something. With Docker I’ve finally moved past this: today I added a docker-compose.yml to this repo:

version: '2'
services:
  jekyll:
    image: jekyll/jekyll:pages
    command: jekyll serve --drafts --watch -H 0.0.0.0
    volumes:
      - .:/srv/jekyll
    ports:
      - "4000:4000"

When I run docker-compose up, my checkout is mounted into a Docker container and port 4000 is exposed on 127.0.0.1 to preview the blog. As I edit files, Jekyll rebuilds automatically. I just Ctrl+C when I’m done, and if I want to really clean up I can finish off with docker-compose down.
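
So the whole edit loop is just:

$ docker-compose up
# edit files, preview at http://127.0.0.1:4000, Ctrl+C when done
$ docker-compose down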

And this is with Docker for Mac, running a Linux container transparently on an OS X machine. And yes, folder watching works just as well as on Linux, and so does port forwarding.