Running Redash on Kubernetes

Redash is a very handy tool that allows you to connect to various data sources and produce interesting graphs. Your BI people most likely love it already.

Redash makes use of Redis, Postgres and a number of services written in Python, as can be seen in this example docker-compose.yml file. However, there is very little information on how to run it on Kubernetes. I suspect that part of the reason is that while docker-compose.yml makes use of YAML’s merge keys, kubectl does not support them. So the templates that exist make many redundant copies of a large block of lines. There must be a better way, right?

Since the example deployment with docker-compose runs all services on a single host, I decided to run my example deployment in a single pod with multiple containers. You can always switch to a better deployment to suit your needs if you like afterwards.

Next came my quest to deal with the redundancy of the environment variables shared by the different Redash containers. If only there was a template or macro language I could use. Well, the most readily available one, with the least installation hassle (if it is not already on your system), is m4. And you do not have to do weird stuff, as you will see. Using m4 allows us to run something like m4 redash-deployment-simple.m4 | kubectl apply -f - and be done with it:

define(redash_environment, `
        - name: PYTHONUNBUFFERED
          value: "0"
        - name: REDASH_REDIS_URL
          value: "redis://"
        - name: REDASH_MAIL_USERNAME
          value: "redash"
        - name: REDASH_MAIL_USE_TLS
          value: "true"
        - name: REDASH_MAIL_USE_SSL
          value: "false"
        - name: REDASH_MAIL_SERVER
          value: ""
        - name: REDASH_MAIL_PORT
          value: "587"
        - name: REDASH_MAIL_PASSWORD
          value: "password"
          value: ""
        - name: REDASH_LOG_LEVEL
          value: "INFO"
        - name: REDASH_DATABASE_URL
          value: "postgresql://redash:redash@"
        - name: REDASH_COOKIE_SECRET
          value: "not-so-secret"
          value: "redash.query_runner.python"
')

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redash
  labels:
    app: redash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redash
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: redash
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - name: redis
          containerPort: 6379
      - name: postgres
        image: postgres:11
        env:
        - name: POSTGRES_USER
          value: redash
        - name: POSTGRES_PASSWORD
          value: redash
        - name: POSTGRES_DB
          value: redash
        ports:
        - name: postgres
          containerPort: 5432
      - name: server
        image: redash/redash
        args: [ "server" ]
        env:
        redash_environment
        - name: REDASH_WEB_WORKERS
          value: "2"
        ports:
        - name: redash
          containerPort: 5000
      - name:  scheduler
        image: redash/redash
        args: [ "scheduler" ]
        env:
        redash_environment
        - name: QUEUES
          value: "celery"
        - name: WORKERS_COUNT
          value: "1"
      - name: schedulded-worker
        image: redash/redash
        args: [ "worker" ]
        env:
        redash_environment
        - name: QUEUES
          value: "scheduled_queries,schemas"
        - name: WORKERS_COUNT
          value: "1"
      - name: adhoc-worker
        image: redash/redash
        args: [ "worker" ]
        env:
        redash_environment
        - name: QUEUES
          value: "queries"
        - name: WORKERS_COUNT
          value: "1"
---
apiVersion: v1
kind: Service
metadata:
  name: redash-nodeport
spec:
  type: NodePort
  selector:
    app: redash
  ports:
  - port: 5000
    targetPort: 5000

You can grab redash-deployment.m4 from Pastebin. What we did above was to define the macro redash_environment (with care for proper indentation) and use this in the container definitions in the Pod instead of copy-pasting that bunch of lines four times. Yes, you could have done it with any other template processor too.
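To illustrate the "any other template processor" remark, here is a minimal sketch of the same trick in plain Python: the shared environment is written once and reused for each of the four Redash containers. The helper names (env_block, container, render_containers) are made up for this illustration, only two of the shared variables are listed to keep it short, and the output is just the containers fragment, not a complete manifest.

```python
# Shared environment entries, written once (a shortened, illustrative subset).
SHARED_ENV = [
    ("PYTHONUNBUFFERED", "0"),
    ("REDASH_LOG_LEVEL", "INFO"),
]

# (container name, argument, container-specific variables), as in the manifest above.
CONTAINERS = [
    ("server", "server", [("REDASH_WEB_WORKERS", "2")]),
    ("scheduler", "scheduler", [("QUEUES", "celery"), ("WORKERS_COUNT", "1")]),
    ("schedulded-worker", "worker",
     [("QUEUES", "scheduled_queries,schemas"), ("WORKERS_COUNT", "1")]),
    ("adhoc-worker", "worker", [("QUEUES", "queries"), ("WORKERS_COUNT", "1")]),
]


def env_block(pairs, indent=8):
    """Render (name, value) pairs as Kubernetes `env` list items."""
    pad = " " * indent
    lines = []
    for name, value in pairs:
        lines.append(f"{pad}- name: {name}")
        lines.append(f'{pad}  value: "{value}"')
    return "\n".join(lines)


def container(name, arg, extra_env):
    """Render one container entry that reuses the shared environment."""
    head = "\n".join([
        f"      - name: {name}",
        "        image: redash/redash",
        f'        args: [ "{arg}" ]',
        "        env:",
    ])
    return head + "\n" + env_block(SHARED_ENV + list(extra_env))


def render_containers():
    """Render all four Redash containers, one after the other."""
    return "\n".join(container(n, a, e) for n, a, e in CONTAINERS)


if __name__ == "__main__":
    print(render_containers())
```

The point is the same as with m4: the long list lives in one place, and each container only spells out what is specific to it.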

You’re almost done. Postgres is not configured, so you need to connect to the server container and initialize the database:

$ kubectl exec -it redash-f8556648b-tw949 -c server -- bash
redash@redash-f8556648b-tw949:/app$ ./ database create_tables
redash@redash-f8556648b-tw949:/app$ exit

I used the above configuration to quickly launch Redash on a Windows machine that runs the Docker Desktop Kubernetes distribution. It is deliberately minimal: for example, no permanent storage for Postgres is defined. In a production installation it could very well be that said Postgres lives outside the cluster, so there is no need for such a container. The same might hold true for the Redis container too.

What I wanted to demonstrate was that, in this specific circumstance, a 40+ year old tool may come to your assistance without the need to install any other weird templating tool. It also shows how to react in cases where you need !!merge and it is not supported by your parser.

It turns out we do pay postage for our email

Well, not everybody, but some of us do. Let me explain myself:

Twelve years ago, after reading about TipJoy, a Y Combinator startup, I thought that this might be a scheme that could be used to force mass senders to pay something in order to post to my inbox (an investment cost per message). I just thought of it kind of the wrong way, and not in terms of snail mail: I thought the recipient should be paid to read the email. But anyway, TipJoy folded, life did its own thing, and this was an idea left to collect dust.

It turns out that a whole industry focused on email delivery sprang up in the meantime. IP reputation became a thing, and the small server you operated from your house was part of a dialup or DSL pool that was used by spambots. No matter your intentions and your rigor in setting up your email server, your sender reputation was next to nothing. The same held true for your ISP’s outgoing mail server too. The same is (still) true if you try to set up a VM at a cloud provider (if you’re allowed to send outgoing email at all). Once you needed to know about SMTP and POP3 only (IMAP too if you wanted to be fancier). Now you needed to learn new stuff too: SPF, DKIM, greylisting, RBL, DMARC, the list goes on. That’s how the sending industry was formed, providing guaranteed delivery, metrics like open rate and more complex analytics.

Here I am now, some years later, running a small mailing list for the Greek Database research community (I am not a database researcher, just a friend of the community). This mailing list always had reachability problems, even when I was running it on infrastructure I totally controlled while working at an ISP. Spam bots and inexperienced users always resulted in some sort of blacklisting that a single-person postmaster team struggled to handle. There were delivery problems, and a lot of personal time was devoted to unblocking them.

Since 2014, when I quit the postmaster business, I have been running the list using Mailgun (it could have been any other M3AAWG member; I just picked them because Bob Metcalfe mentioned them on Twitter sometime). They used to provide a free service and the list was well within those limits, but they changed that. So there’s a monthly cost that varies from $1 to $10 depending on the traffic. Delivery has been stellar ever since I switched to Mailgun, and the few issues the mailing list had were glitches on my side.

So it turns out that you do pay some postage to send your email after all, and the M3AAWG members are the couriers.

Moving from vagrant+VirtualBox to WSL2

For the past two years I had been working on Windows 10 Home. I had a setup that was really easy for me, using a vagrant box per project and using minikube as my docker host. But …

… I wanted to upgrade to Windows 10 Pro. The Windows upgrade was relatively easy, and then the issues started. Hyper-V does not play nice with VirtualBox, which was the provider for my vagrant boxes. One option was to

> bcdedit /set hypervisorlaunchtype off

which allowed me to work as before, but one of the reasons to make this upgrade to Pro was to run Docker Desktop natively. With hypervisorlaunchtype set to off this is not possible since WSL2 does not run.

So I took a tarfile of each ~vagrant user per virtual machine, then ran bcdedit /set hypervisorlaunchtype auto and rebooted, and I had both WSL2 and Docker Desktop operational. You can also have minikube run on top of Hyper-V, and of course you can always run the Kubernetes that comes with Docker Desktop.

Because I did not have an elaborate setup with my VMs, the transition seems to have happened without many issues. I now have one user per project in my WSL and still can keep tarfile backups or whatever else if needed.

As for VMs, I am thinking of giving multipass a chance.

Greenspun’s tenth rule and mutt

Mutt is a mail client that I used probably from 199? to 2001. Prior to that I was using elm and felt that because I could not understand its code, a switch was in order.

So what does Greenspun’s tenth rule have to do with a text mail client? Let’s remember what the rule says:

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Yesterday, mutt version 2.0 was announced. I only found out about it because a friend pinged me. And sure enough, in the release notes MuttLisp is mentioned, a first version of which appeared here.

Being a mail client, mutt already follows Zawinski’s Law. Now it follows Greenspun’s tenth rule too.

Today’s #soundtrack

[ I posted this originally on Facebook, but since there’s some dust collected in the blog, well … ]

Today’s #soundtrack is two albums I had not listened to for a long long time:

  • Dreams of Freedom (Ambient Translations of Bob Marley)
  • Chant Down Babylon – a hip-hop remix of Bob Marley songs

As I am writing this, I decided that it is high time I put all my CDs in boxes, store them away and make some space. (I parted with my Rap / Hip-Hop collection in 2005; let’s part with the other half 15 years later.) For sentimental reasons I will only keep the John Coltrane and JAZZ & TZAZ MAGAZINE ones on display.

We do not even have a proper CD player anymore. We have a Tivoli Model One and its aux port is shared between a Google Audio (where we cast sound) and a portable CD player (also not so frequently used).

Streaming automation has made this almost a no-brainer choice. For now at least. I think I own more music than I can listen to anyway. Even on streaming services I rarely look for new releases. I just look for the convenience of recalling something I already own, or know about and may not own for some reason. Like the first 8 albums of Iron Maiden 🙂

The “oil lamp” law

[ Originally a Facebook post, copied here for posterity. ]

Some thirty years ago I was told the story of a server with an oil lamp on the side (the kind that Greek Orthodox people light to honor God and the Saints). It was put there to make light of the situation: the server must not break under any circumstance.

Well, it has been my experience of many years, sectors and shops of different sizes, that no matter what, there is always at least one key system that “needs” an oil lamp by its side in the organization. A system that is critical enough to warrant all the attention it gets, yet so critical that nobody risks upgrading / changing / phasing it out during their tenure (the system is guaranteed to outlive them; I count three such systems that have outlived me). Untouchable systems that get replaced only when they physically die.

Seek out who needs an oil lamp. Plan accordingly.

[ There’s another “law” that follows as a result of the oil-lamp, but maybe for another time. ]

Sometimes you need to change your docker network

I was working locally with some containers running over docker-compose. Everything was OK, except I could not access a specific service within our network. Here is the issue: docker by default assigns IP addresses to containers from one address pool and docker-compose from another. It just so happened that what I needed to access lived in that same address space. So what to do to overcome the nuisance? Obviously you cannot renumber a whole network for some temporary IP overlap. Let’s abuse reserved IP space instead. Here is the relevant part of my daemon.json now:

  "default-address-pools": [
    { "base": "", "size": 28 },
    { "base": "", "size": 28 },
    { "base": "", "size": 28 }
  ]

According to RFC5737, the blocks 192.0.2.0/24 (TEST-NET-1), 198.51.100.0/24 (TEST-NET-2) and 203.0.113.0/24 (TEST-NET-3) are reserved for documentation purposes. I’d say local work is close enough to documentation to warrant the abuse, since we also adhere to its operational implications. Plus, I wager that most people, while they always remember the classic RFC1918 addresses, seldom take TEST-NET-1 and friends into account.
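If you want to convince yourself before touching daemon.json, a quick sanity check with Python’s ipaddress module could look like the sketch below. It assumes each pool base is one whole RFC5737 /24 block (the bases are not spelled out above, so treat that as an assumption) and the helper names are made up for the illustration.

```python
import ipaddress

# RFC5737 documentation blocks: TEST-NET-1, TEST-NET-2, TEST-NET-3.
TEST_NETS = [
    ipaddress.ip_network(n)
    for n in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]

# Classic RFC1918 private blocks, the ones everybody remembers.
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def overlaps_any(candidate, others):
    """Return True if `candidate` overlaps at least one network in `others`."""
    return any(candidate.overlaps(other) for other in others)


def carve(base, size):
    """Split `base` into subnets of prefix length `size`.

    Assumption: this mirrors how a pool entry with "size": 28 hands out
    one /28 per docker network.
    """
    return list(base.subnets(new_prefix=size))


if __name__ == "__main__":
    for net in TEST_NETS:
        pools = carve(net, 28)
        print(net, "overlaps RFC1918:", overlaps_any(net, PRIVATE_NETS),
              "| /28 networks:", len(pools),
              "| addresses each:", pools[0].num_addresses)
```

Under that assumption each base yields sixteen small networks of sixteen addresses, which is plenty for local docker-compose work.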

Happy SysAdmin Day

I’ve been into this for so many years that I cannot even remember when I started. root, IT guy, system administrator, go-to person, SRE, DevOps, whatever acronym life brings next. And all that because, at some point in time while still an undergraduate, I told my friend Panos:

“You see those guys? One day, we’ll be doing their work.”
“Nah”, he said, “they’re gods”.

Because that’s what they looked like to us. They still do to me. Because at least one of them, with whom I’ve kept contact, is a moving CS encyclopedia. And then it struck me. They did not really want to do the work. They needed a platform to test all the cool things they read about, in production. They were architects before it was cool.

And that is what their “divine” power was: to know about all things CS. We did end up doing their work.

Somehow, that’s what drives me. I drop things off along the way (“I will not invest in learning this”) but still, I do not drop as many as the typical 9to5er. And that takes its toll; it is painful and, once in a while, rewarding.

Happy Sysadmin Day.

How to get all the indices in ElasticSearch with Python?

This seems to be a pretty common question. And the most common answer is to use the Python ElasticSearch client and the get_alias() method like this:

import elasticsearch

# ES_HOST is the URL of your cluster
es = elasticsearch.Elasticsearch(hosts=[ES_HOST])
# get_alias() returns a dict keyed by index name
idx_list = list(es.indices.get_alias(index="*").keys())

This is the most common answer one can see on StackOverflow. But ElasticSearch offers us the cat API, which is better suited for such a query. So a better way to approach this can be:

import elasticsearch

es = elasticsearch.Elasticsearch(hosts=[ES_HOST])
# the cat API answers in plain text: with h='index', one index name per line
idx_list = es.cat.indices(index='foobar-20*', h='index', s='index:desc').split()

The above example asks an even more elaborate query: of all the indices, return to us those that match the pattern foobar-20*, return only the index name from the fields that the cat API returns, and by the way, sort the returned index names in descending order.
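For comparison, here is roughly what we would have to do on the client side if we fetched every index name and did the filtering and sorting ourselves. This is only a sketch: the index names are invented, the function name is made up, and a plain string sort is assumed to be close enough to what the cat API does for names like these.

```python
from fnmatch import fnmatchcase


def filter_and_sort(index_names, pattern):
    """Keep the names matching a shell-style pattern, sorted in descending order."""
    matching = (name for name in index_names if fnmatchcase(name, pattern))
    return sorted(matching, reverse=True)


# Invented index names, standing in for the full list a cluster would return.
all_indices = [
    "foobar-2019.12",
    "other-2020.01",
    "foobar-2020.01",
    "foobar-1999.01",
]

idx_list = filter_and_sort(all_indices, "foobar-20*")

if __name__ == "__main__":
    print(idx_list)
```

It works, but it means shipping the whole list over the wire only to throw most of it away.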

If the database offers us a way to do things, it is best that we ask it to.