Raciness in Amazon RDS backups

I recently wrote a restore script for an AWS RDS instance. With RDS you spin up a new instance from a backup (rather than restoring into the existing instance). So you can:

  1. Rename the original out of the way (foo -> foo-old) then restore the backup in its place. The final server will have the same hostname.
  2. Restore the backup with a new name (foo-2). This means it will have a different hostname and you’ll need to deploy a settings change to your apps.

I went with option (1) so that I always knew which database was ‘active’. That meant doing as much validation up front as possible. You don’t want to move the existing database out of the way and then, a few minutes into the script, have it fail because you asked it to restore to a point in time last year!
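For reference, the whole dance looks something like this (a minimal sketch with boto3-style RDS client calls; the instance names are made up, and in practice you have to wait for the rename to complete before restoring):

import boto3

client = boto3.client('rds')

# Rename the original out of the way (foo -> foo-old)
client.modify_db_instance(
    DBInstanceIdentifier='foo',
    NewDBInstanceIdentifier='foo-old',
    ApplyImmediately=True,
)

# ...wait for the rename to take effect, then restore into its place
client.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier='foo-old',
    TargetDBInstanceIdentifier='foo',
    RestoreTime=target,  # the datetime we validate below
)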

So how do you validate the target restore date? You can’t restore to 30 seconds ago - there is a lag of a few minutes. Obviously you can’t restore to before the database existed. And you can’t restore to before the oldest backup, either.

With botocore the first part of this is easy:

# Note: target must be a timezone-aware datetime - botocore returns
# timezone-aware datetimes, and comparing them to naive ones raises a
# TypeError.
result = client.describe_db_instances(DBInstanceIdentifier=dbname)
db = result['DBInstances'][0]
if target > db['LatestRestorableTime']:
    raise ValueError("The target time is too recent")
if target < db['InstanceCreateTime']:
    raise ValueError("Cannot restore to before the db was created")

Unfortunately there isn’t an EarliestRestorableTime. As far as I can tell the best you can do is use the SnapshotCreateTime of the earliest backup:

result = client.describe_db_snapshots(DBInstanceIdentifier=dbname)
snapshots = result.get('DBSnapshots', [])
snapshots.sort(key=lambda snapshot: snapshot['SnapshotCreateTime'])
if not snapshots or target < snapshots[0]['SnapshotCreateTime']:
    raise ValueError("Can't restore before the first backup")

But that’s still not enough. When you are testing your backup restore script you run it a lot, and this validation frequently didn’t stop me passing in an invalid date. It turned out that if you run it in quick succession there are still snapshots from the previous incarnation of foo hanging around. The only way to tell which snapshots belong to the current instance is to filter on InstanceCreateTime:

result = client.describe_db_snapshots(DBInstanceIdentifier=dbname)
snapshots = result.get('DBSnapshots', [])
# A snapshot belongs to the current incarnation of the instance only if
# its InstanceCreateTime matches. Build a list (rather than using
# filter(), which returns an iterator on Python 3) so we can sort it.
snapshots = [
    s for s in snapshots
    if s['InstanceCreateTime'] == db['InstanceCreateTime']
]
snapshots.sort(key=lambda snapshot: snapshot['SnapshotCreateTime'])
if not snapshots or target < snapshots[0]['SnapshotCreateTime']:
    raise ValueError("Can't restore before the first backup")

Grim.

Deleting an Amazon ELB properly

Recently I automated deletion of an Elastic Load Balancer and the subnet it was in. Fairly straightforward stuff. One slight problem is that an ELB doesn’t expose a lifecycle state. When you delete one it disappears from the API immediately, but in the background it is still being torn down. If you then try to delete the subnet it was in you get a dependency error. After a couple of minutes it does work.

The same is true if you try to delete a security group that the ELB was using.
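You can reproduce it with something like this (a hedged sketch - the names are made up, but both calls are real, and the second one fails with a DependencyViolation error until the background cleanup finishes):

import boto3

elb = boto3.client('elb')
ec2 = boto3.client('ec2')

elb.delete_load_balancer(LoadBalancerName='my-balancer')

# The ELB is already gone from the ELB API, but its network interfaces
# are still attached to the subnet, so this fails for a couple of minutes
ec2.delete_subnet(SubnetId='subnet-12345678')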

What’s going on?

If you look at the Network interfaces view in the EC2 control panel after you’ve deleted an ELB you will see that its network interfaces still exist and are cleaned up in the background. Internally it’s in a ‘deleting’ state, but we just can’t see that in the ELB API.

We can simulate it to some extent by polling the network interfaces API. It turns out (thanks Andrew @ AWS support) that ELB sets the description of its ENIs to ‘ELB yournamehere’. So with botocore we can do something like:

import time

def wait_for_elb_cleanup(client, balancer_name):
    # client is an EC2 client; the ELB's ENIs are identifiable only by
    # their description, which is "ELB <balancer name>"
    description = "ELB " + balancer_name
    for _ in range(120):
        interfaces = client.describe_network_interfaces(
            Filters=[
                {"Name": "description", "Values": [description]},
            ]
        ).get('NetworkInterfaces', [])
        if len(interfaces) == 0:
            return
        time.sleep(1)
    raise Exception("ELB not deleted after 2 minutes")

This is now done automatically when deleting an ELB with Touchdown. It won’t consider an ELB deleted until all of its network interfaces have been cleaned up. So you shouldn’t get any errors deleting subnets or security groups!

Support told me this behaviour is unlikely to change (and has been that way for at least 2 years), and they have fed back internally that the possible values for the ENI description field should be documented (they aren’t right now).

Making a node virtualenv

These days npm ships with nodejs, which you can install on Ubuntu with::

sudo add-apt-repository ppa:chris-lea/node.js -y
sudo apt-get update
sudo apt-get install nodejs

Out of the box there are two ways of using npm::

sudo npm -g install bower

This (depending on how and where npm was installed) installs bower into /usr/local. The thing you install goes on your PATH. But I don’t really let anything near /usr unless it’s in a Debian package. So I can also do::

npm install bower

This will create a node_modules directory in the current working directory. If I want to run bower I can just run::

node_modules/.bin/bower

Which is a bit yuck. Neither of these really cuts it for me - I have been spoiled by virtualenv. Can I npm install into my virtualenv, and have . bin/activate work for the node stuff too?

First of all I want a virtualenv::

virtualenv /home/john/myvirtualenv

npm -g tries to write into /usr/local because that is the default value of the prefix config option. But I need to use -g so that bin/bower appears. Helpfully, you can override the prefix from the command line::

npm -g --prefix /home/john/myvirtualenv install bower

Now when I . bin/activate, the JS binaries are available too.

Taking this one step further, you can install npm into a virtualenv and then set the default prefix::

npm -g --prefix /home/john/myvirtualenv install npm

cat > /home/john/myvirtualenv/lib/node_modules/npm/npmrc << EOF
prefix = /home/john/myvirtualenv
global = true
EOF

Now I can do this::

. /home/john/myvirtualenv/bin/activate
npm install bower lessc requirejs

And the dependencies are installed into that virtualenv - so as long as I have sourced my activate file I can just::

bower

I’ve written virtualenv-js, which does these two steps for you.
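A rough Python equivalent of those two steps (a sketch rather than the actual script, and assuming npm is already on your PATH) would be::

import os
import subprocess
import sys

venv = sys.argv[1]

# Step 1: install npm itself into the virtualenv
subprocess.check_call(['npm', '-g', '--prefix', venv, 'install', 'npm'])

# Step 2: make the virtualenv's npm install there by default
npmrc = os.path.join(venv, 'lib', 'node_modules', 'npm', 'npmrc')
with open(npmrc, 'w') as f:
    f.write('prefix = %s\nglobal = true\n' % venv)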

You can use it with mkvirtualenv from virtualenvwrapper by putting it in ~/.virtualenvs/ and adding this to ~/.virtualenvs/postmkvirtualenv::

#!/bin/bash
# This hook is run after a new virtualenv is activated.
~/.virtualenvs/virtualenv-js $VIRTUAL_ENV

(Make sure you chmod +x virtualenv-js. You could also just save virtualenv-js as postmkvirtualenv if you want.)

Greenlets and Gio

I made a thing! It’s like gevent but on top of the GLib stack instead of libev.

Most GLib async code comes in sync and async variants that follow the same pattern. So we can hook into GObjectMeta and automatically wrap each async/finish pair in something like this:

import greenlet

def some_api(*args, **kwargs):
    current = greenlet.getcurrent()

    # Call the function and pass it current.switch as the callback - this
    # is what allows the current coroutine to be resumed. Gio callbacks
    # receive (source_object, result, user_data), so those are the values
    # switch() will be called with.
    args = args + (current.switch, None)
    some_api_async(*args, **kwargs)

    # Pause the current coroutine. It will be resumed here when the
    # callback calls current.switch()
    obj, result, _ = current.parent.switch()

    # Actually return the expected value
    return some_api_finish(result)

So every use of a synchronous API that has an asynchronous version can be replaced by a greenlet version that only looks synchronous.
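To give a feel for how this is driven (a minimal sketch, assuming some_api has been wrapped as above and that the GLib main loop runs in the root greenlet):

import greenlet
from gi.repository import GLib

loop = GLib.MainLoop()

def task():
    # Looks blocking, but actually switches back to the root greenlet
    # (which is running the main loop) until the Gio callback fires
    result = some_api("example")
    print(result)
    loop.quit()

# Start the task in a child greenlet; it switches back here as soon as
# it would block, then the main loop dispatches the callbacks
greenlet.greenlet(task).switch()
loop.run()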

Patterns for asynchronous javascript

Isn’t writing synchronous code nice?

function do_stuff () {
  do_thing_one ();
  do_thing_two ();
  do_thing_three ();
}

But synchronous is bad! Bad, bad, bad. So then came async. The simple pattern is to pass a callback to the function you are calling. It’s not as bad in JS as in some languages, because we can just do this:

function do_stuff () {
  do_thing_one ( function (result) {
    do_thing_two ( function (result) {
      do_thing_three ( function (result) {
      });
    });
  });
}

I’ve left out error handling. That really depends on the library, but I imagine it’s messy. Now let’s see GIO-style async.

function do_stuff () {
  do_thing_one_async ( function (ar) {
    var result = do_thing_one_finish (ar);
    do_thing_two_async ( function (ar) {
      var result = do_thing_two_finish (ar);
      do_thing_three_async (function (ar) {
        var result = do_thing_three_finish (ar);
      });
    });
  });
}

I like this a lot better than how I’d do it in Python. But wouldn’t it be nice if you could write async code something like this?

var do_stuff = async (function () {
  var result = yield do_thing_one ();
  yield do_thing_two ();
  yield do_thing_three ();
});

Or even:

var do_stuff = async (function () {
  var result = yield do_thing_one ();
  yield do_thing_two ();
  try {
    yield do_thing_three ();
  } catch (e) {
    print("Exception handled");
  }
});

You can in Python. You can in Vala. And for JS? Well, I was going to say now you can, but while I was looking for a good Vala link I noticed Alex already did something like this over a year ago. D’oh.
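For reference, the machinery behind that style in Python is a small generator trampoline. A minimal sketch, assuming each yielded value is a function taking a completion callback (not any particular library’s actual API):

def async_(gen_func):
    def runner(*args, **kwargs):
        gen = gen_func(*args, **kwargs)

        def step(value=None, error=None):
            try:
                # Resume the generator; throwing the error in at the
                # yield site is what makes try/except around yield work
                if error is not None:
                    pending = gen.throw(error)
                else:
                    pending = gen.send(value)
            except StopIteration:
                return
            # The generator yielded a callback-taking function: call it
            # and resume the generator when the callback fires
            pending(lambda result=None, err=None: step(result, err))

        step()
    return runner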

What would be really nice is if the async wrappers could be generated automatically by GI. I had a first stab at this by simply parsing the GIR XML with E4X and providing an alternative import mechanism (thanks for the suggestion jdahlin). However, to get full coverage I’d have to consider interfaces and inspect every object that implements an interface as it lands in JavaScript land to ensure it is wrapped. Ew.