vagrant, ansible_local and docker

This is a minor annoyance for people who want to work with docker on their vagrant boxes and provision them with the ansible_local provisioner.

To have docker installed in your box, you simply need to enable the docker provisioner in your Vagrantfile:

config.vm.provision "docker", run: "once"

Since you’re using the ansible_local provisioner, you might skip this and write a task that installs docker from wherever it suits you anyway, but I prefer this as vagrant knows how to best install docker onto itself.
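For context, a minimal Vagrantfile combining the two provisioners could look like the sketch below (the box name and playbook path are assumptions, not my actual setup):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"   # any box you like
  # let vagrant figure out how to install docker in the guest
  config.vm.provision "docker", run: "once"
  # then hand the rest of the provisioning to ansible inside the guest
  config.vm.provision "ansible_local" do |ansible|
    ansible.playbook = "provisioning/playbook.yml"
  end
end
```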

Now obviously you can have the provisioner pull images for you, but for whatever crazy reason you want to pass most, if not all, of the provisioning to ansible, and thus you want to use, among others, the docker_image module. So you write something like:

- name: install python-docker
  become: true
  package:
    name: python-docker
    state: present

- name: install docker images
  docker_image:
    name: busybox

Well, this is going to greet you with an error message when you up the machine for the first time:

Error message

TASK [install docker images] ***************************************************
fatal: [default]: FAILED! => {"changed": false, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', error(13, 'Permission denied'))"}
	to retry, use: --limit @/vagrant/ansible.retry

Whereas when you happily run vagrant provision right away:

TASK [install docker images] ***************************************************
changed: [default]

Why does this happen? Because even though the installation of docker makes the vagrant user a member of group docker, this becomes effective with the next login.

The quickest way to bypass this is to run that part of your first ansible provisioning as super user:

- name: install docker images
  become: true
  docker_image:
    name: busybox

I am using the docker_image module only as an example here for lack of a better example with other docker modules on a Saturday morning. Pulling images is something that is of course very easy to do with the vagrant docker provisioner itself.
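For completeness, a sketch of letting the docker provisioner pull the image itself (the image name is just the same busybox example):

```ruby
Vagrant.configure("2") do |config|
  config.vm.provision "docker", run: "once" do |d|
    # have the provisioner pull the image instead of ansible
    d.pull_images "busybox"
  end
end
```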

default: Running ansible-playbook…

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [default]

TASK [install python-docker] ***************************************************
changed: [default]

TASK [install docker images] ***************************************************
changed: [default]

PLAY RECAP *********************************************************************
default : ok=3 changed=2 unreachable=0 failed=0


Vagrant, ansible_local and pip

I try to provision my Vagrant boxes with the ansible_local provisioner. The other day I was using the pip ansible module while I was booting the box, but was getting errors while installing packages. It turns out that the pip version I had when I created the environment needed an upgrade. Sure you can run a pip install pip --upgrade from the command line, but how do you do so within a playbook? Pretty easy it seems:

- hosts: all
  tasks:
    - name: create the needed virtual environment and upgrade pip
      pip:
        chdir: /home/vagrant
        virtualenv: work
        virtualenv_command: /usr/bin/python3 -mvenv
        name: pip
        extra_args: --upgrade

    - name: now install the requirements
      pip:
        chdir: /home/vagrant
        virtualenv: work
        virtualenv_command: /usr/bin/python3 -mvenv
        requirements: /vagrant/requirements.txt

(Link to pastebin here in case the YAML above does not render correctly for you.)

I hope it helps you too.

Happy sysadmin day

Hello and happy SysAdmin day. The baby swing below is from 2011. While at rest it looks like a safe swing, it is not. The chains latch too close to the middle and it is very easy for the seat to revolve around a second horizontal axis while swinging. You can understand how I know.

It is SysAdmin day today. We make sure the chains latch properly so your software runs without extra revolutions.

You’re welcome :)

baby swing


unbound, python and conditional replies based on source IP address

We’re using unbound internally for DNS resolution. It works smoothly and allows for some DNS tricks when you want to implement split-brain trickery, but not a complete split-brain deployment. The other day we needed to send out conditional replies based on the IP address of the querying machine. Unbound comes with a python module, but it has some of the weirdest, most unhelpful documentation ever. I am not alone in believing this.

It is very hard to figure out the source IP address of a DNS query using the unbound python library. My first pointer on how to do so was on ServerFault.  I have uploaded my own version of an operate function at pastebin. The code in question that you need to consider is:

# Find out source IP address
rl = qstate.mesh_info.reply_list
while (rl):
  if rl.query_reply:
    q = rl.query_reply
  rl = rl.next

# Careful with this conditional
try: addr = q.addr
except NameError: addr = None

The try … except handling is needed because I found out that sometimes the q.addr may not be defined and thus further down the line you may be bitten by an abnormal exit of your script.

Update: two friends have suggested that I change the while loop with a more Pythonic list comprehension:

q = next((x for x in qstate.mesh_info.reply_list if x.query_reply), None)
try: addr = q.query_reply.addr
except AttributeError: addr = None

One of them actually has a pretty cool pastebin about it.
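The next-with-default pattern itself is easy to try outside unbound. Here is a toy sketch where SimpleNamespace objects stand in for unbound's reply-list entries (the names and address are made up for illustration):

```python
from types import SimpleNamespace

# Stand-ins for unbound's mesh reply_list entries: some entries carry
# a query_reply with an addr, some do not.
reply_list = [
    SimpleNamespace(query_reply=None),
    SimpleNamespace(query_reply=SimpleNamespace(addr="192.0.2.10")),
]

# Pick the first entry that actually has a query_reply, or None if none does
q = next((x for x in reply_list if x.query_reply), None)
addr = q.query_reply.addr if q else None
print(addr)
```

The `if q else None` guard plays the role of the try/except above: with the default of None, a list without any query_reply would otherwise blow up on attribute access.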

Your first steps installing Graylog

A new colleague needed some help to set up a Graylog installation. He had never done this before, so he asked for assistance. What follows is a rehash of an email I sent him on how to proceed and build knowledge on the subject:

So initially I had zero knowledge of Graylog. What I did to familiarize myself with it was to download an OVA file with a prepared virtual machine and run it via VMware Fusion. The same VM can also be imported to VirtualBox and even to AWS, although they also provide ready-made AMIs for deployment in AWS. Links:
Keep in mind that this is a full installation of what Graylog needs to work with and it also comes with a handy little script named “graylog-ctl” that manipulates a lot of configuration. The big catch is that graylog-ctl is not part of any standard Graylog deployment. It only comes with the OVA and the AMI images.
So after I had some fun with it on a VM on my workstation, reading the documentation and testing stuff, I had an initial deployment of the AMI image in AWS. But this is not an installation that can scale.  Which brings us to the next steps:
  • For Graylog to work you need to provide it with a MongoDB and an ElasticSearch database. It is your choice whether these will be clustered for high availability or not, whether they will run in the same machine or not. You control the complete architecture. So in my case I made the following decisions:
  • I am running a MongoDB replica set using three VMs. This is a standard setup as it is described in the MongoDB online documentation. Since it is not password protected, it only accepts connections from the Graylog instance. I used AWS security groups for that.
  • I am using an ElasticSearch cluster with three VMs where the nodes are both data and masters. If you can, use seven nodes: three masters (smaller machines, since they do not run queries and do not index any data) and four data nodes (higher-end machines). Again, since this is not password protected, I used AWS security groups to allow access only from the Graylog instance.
  • I am running a single Graylog instance on a separate VM. Currently it only listens for syslog traffic. When the need arises, I will add two more nodes to increase availability. I think I changed as many as four or five lines in the main configuration file. Graylog uses MongoDB to store its configuration, which includes anything you configure via the web interface.
  • Pay extra attention to the versions of ElasticSearch and MongoDB that your Graylog version requires. Use exactly what is mentioned in the documentation. For example in my case I am not running ES 6.x but the latest 5.x.
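To give an idea of how little needs to change in the main configuration file, the handful of edited lines in server.conf are along these lines (all values and hostnames below are placeholders, not my actual settings):

```ini
# /etc/graylog/server/server.conf -- the few settings typically edited
password_secret = <output of: pwgen -N 1 -s 96>
root_password_sha2 = <sha256 hash of the admin password>
elasticsearch_hosts = http://es-1:9200,http://es-2:9200,http://es-3:9200
mongodb_uri = mongodb://mongo-1:27017,mongo-2:27017,mongo-3:27017/graylog?replicaSet=rs01
```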
Now it is time to up your game. Once you see that your installation is working you have to decide whether to password protect access to MongoDB and ElasticSearch and whether to encrypt traffic between all those instances or not. I say give it a go.
I’ve not even touched issues like database management for Mongo and Elastic, backing them up, restoring, deleting indices, etc., because this is a post taking you from zero to your first week of testing Graylog. There is plenty of stuff out there to take you to the next level, once you get used to the complexity of the software involved.
Should you need any more help, ping me anytime.

terraform, route53 and lots of records

At work we try to manage as much as we can with terraform. This also includes Route53 for zones and records. In a certain situation we had about 14 zones and 1476 records managed in a single state file.

As it happened, I needed a zone recreated (but not erased) and this affected about 409 records. Deleting them with terraform apply took ages, to the point that the temporary STS token expired and botched the process. So after a little facepalming, I decided to clean up the zone from the AWS console and then issue a batch of terraform state rm commands to reconcile the state. Happily, after that, apply took its time (but a reasonable amount) and all was well.

I am thinking that next time I am faced with such a situation, to lock the state file in Dynamo, copy it over from S3, manipulate it locally, unlock and run a plan to see how it all plays out. Or, wherever I can, use a state per zone instead of a state file encompassing a set of zones.
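The batch removal can be sketched like this. The zone and record names are made up, and echo stands in for the real terraform invocation so the pipeline can be shown safely:

```shell
# In real use, replace printf with "terraform state list" and drop the
# echo so that xargs actually runs "terraform state rm" per resource.
printf '%s\n' \
  'aws_route53_record.example-www' \
  'aws_route53_record.example-mail' \
  'aws_route53_zone.example' \
  | grep '^aws_route53_record\.' \
  | xargs -n1 echo terraform state rm
```

The grep keeps only the records of the affected zone, so the zone resource itself stays in the state.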

Rule 110 in Haskell

Because MarkDown does not always render properly here, you may need to read a copy of this post here.

One of the things I learned by reading AIM 239 is the Game of Life and cellular automata. One particular kind of one-dimensional cellular automaton, Rule 110, popped up in my twitter stream the other day, so I thought I could try and code it with the minimal Haskell subset that I can handle.

Rule 110 is special because it is proven to be able to simulate a Turing machine. Head over to its Wikipedia page if you want to learn more about the proof and the interesting story around it.

Rule 110 starts with a string of zeros and ones and a transition table that decides the next state of the automaton. If you stack the successive strings one under the other, interesting patterns can emerge. Let’s see the transition table:

Current pattern            111 110 101 100 011 010 001 000
New state for center cell   0   1   1   0   1   1   1   0

If you look closely, you can use a list of eight digits and its index in order to encode the above state transitions:

rule110 = [
  0, -- ((0,0,0), 0)
  1, -- ((0,0,1), 1)
  1, -- ((0,1,0), 1)
  1, -- ((0,1,1), 1)
  0, -- ((1,0,0), 0)
  1, -- ((1,0,1), 1)
  1, -- ((1,1,0), 1)
  0  -- ((1,1,1), 0)
  ] :: [Int]

But what about the transitions of the leftmost and rightmost digits, you might think? Let’s assume that their missing neighbor is zero. Therefore, given an initial state and a rule that governs the transitions, we may calculate the next state with:

nextState :: [Int] -> [Int] -> [Int]
nextState state rule =
  [ rule !! x |
    let t = [0] ++ state ++ [0],
    i <- [1..(length(t)-2)],
    let x = (t !! (i-1)) * 4 + (t !! i) * 2 + (t !! (i+1)) ]

-- construct an infinite sequence of next states
sequenceState :: [Int] -> [Int] -> [[Int]]
sequenceState state rule =
  [state] ++ sequenceState (nextState state rule) rule


*Main> state = [0,1,1,0]
*Main> nextState state rule110

One of the most interesting patterns occurs when we begin with the rightmost digit being 1 and all the rest being zeros:

*Main> state = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1] :: [Int]
*Main> x = take 30 $ sequenceState state rule110
*Main> showState x
                          ** *
                        **   *
                       ***  **
                      ** * ***
                     ******* *
                    **     ***
                   ***    ** *
                  ** *   *****
                 *****  **   *
                **   * ***  **
               ***  **** * ***
              ** * **  ***** *
             ******** **   ***
            **      ****  ** *
           ***     **  * *****
          ** *    *** ****   *
         *****   ** ***  *  **
        **   *  ***** * ** ***
       ***  ** **   ******** *
      ** * ******  **      ***
     *******    * ***     ** *
    **     *   **** *    *****
   ***    **  **  ***   **   *
  ** *   *** *** ** *  ***  **
 *****  ** *** ****** ** * ***
**   * ***** ***    ******** *

The output was pretty printed with this small helper:

showState [] = return ()
showState state = do
  -- putStrLn $ show (state !! 0)
  putStrLn $ [ c | d <- (state !! 0), let c = if d == 0 then ' ' else '*' ]
  showState $ tail state

I wish I could find the time to play more with cellular automata. I kind of find a day every five years or so.

Update: Here is a pattern using Rule 90:

                           * *
                         * *
                        *   *
                       * * * *
                     * *
                    *   *
                   * * * *
                  *       *
                 * *     * *
                *   *   *   *
               * * * * * * * *
             * *
            *   *
           * * * *
          *       *
         * *     * *
        *   *   *   *
       * * * * * * * *
      *               *
     * *             * *
    *   *           *   *
   * * * *         * * * *
  *       *       *       *
 * *     * *     * *     * *
*   *   *   *   *   *   *   *
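For reference, Rule 90 (each cell becomes the XOR of its two neighbors, ignoring the center) can be encoded in the same scheme as rule110 above, so it plugs straight into nextState and sequenceState:

rule90 = [
  0, -- ((0,0,0), 0)
  1, -- ((0,0,1), 1)
  0, -- ((0,1,0), 0)
  1, -- ((0,1,1), 1)
  1, -- ((1,0,0), 1)
  0, -- ((1,0,1), 0)
  1, -- ((1,1,0), 1)
  0  -- ((1,1,1), 0)
  ] :: [Int]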