What if my Kubernetes cluster on AWS cannot pull from ECR?

When you run an RKE cluster on AWS, pulling images from an ECR is something to be expected. However, it is not the easiest of things to do that, since the credentials produced by aws ecr get-login expire every few hours and thus, you need something to refresh them. In Kubernetes world this means we should use a CronJob.

What we do in this post is a simple improvement on other work here and here. The biggest difference is using the Bitnami docker images for aws-cli and kubectl to achieve the same result.

So we need a CronJob that will schedule a pod to run every hour and refresh the credentials. This job will:

  • run in a specific namespace and refresh the credentials there
  • have the ability to delete and re-create the credentials
  • do its best not to leak them
  • use "well known" images outside the ECR in question

To this end our Pod needs an initContainer that is going to run aws ecr get-login and store it in an ephemeral space (emptyDir) for the main container to pick it up. The main container in turn, will pick up the password generated by the init Container and complete the credential refreshing.

Are we done yet? No, because by default the service account the Pod operates on, does not have the ability to delete and create credentials. So we need to create an appropriate Role and RoleBinding for this.

All of the above is reproduced in the below YAML. If you do not wish to hardcode the AWS region and ECR URL, you can of course make them environment variables.

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: dns
  name: ecr-secret-role
rules:
- apiGroups: [""]
  resources:
  - secrets
  - serviceaccounts
  - serviceaccounts/token
  verbs:
  - 'create'
  - 'delete'
  - 'get'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: dns
  name: ecr-secret-rolebinding
subjects:
- kind: ServiceAccount
  name: default
  namespace: dns
roleRef:
  kind: Role
  name: ecr-secret-role
  apiGroup: ""
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: ecr-update-login
  namespace: dns
spec:
  schedule: "38 */1 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          initContainers:
          - name: awscli
            image: bitnami/aws-cli
            command:
            - /bin/bash
            - -c
            - |-
              aws ecr get-login --region REGION_HERE | cut -d' ' -f 6 > /ecr/ecr.token
            volumeMounts:
            - mountPath: /ecr
              name: ecr-volume
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/bash
            - -c
            - |-
              kubectl -n dns delete secret --ignore-not-found ecr-registry
              kubectl -n dns create secret docker-registry ecr-registry --docker-username=AWS --docker-password=$(cat /ecr/ecr.token) --docker-server=ECR_URL_HERE
            volumeMounts:
            - mountPath: /ecr
              name: ecr-volume
          volumes:
          - name: ecr-volume
            emptyDir: {}

You can now use the ECR secret to pull images by adding to your Pod spec:

 imagePullSecrets:
  - name: ecr-registry

And yes, it is possible to solve the same issue with instance profiles:

”ecr:GetAuthorizationToken”,
“ecr:BatchCheckLayerAvailability”,
“ecr:GetDownloadUrlForLayer”,
“ecr:GetRepositoryPolicy”,
“ecr:DescribeRepositories”,
“ecr:ListImages”,
“ecr:BatchGetImage”

This is a solution when for whatever reason, for when you cannot. Plus it may give you ideas for other helpers using the aws-cli and kubectl. Or for other clouds and registries even.

Beware of true in Pod specifications

true is a reserved word in YAML and this can bite you when you least expect it. Consider the following extremely simple Pod and livenessProbe:

apiVersion: v1
kind: Pod
metadata:
  name: true-test
spec:
  containers:
  - name: nginx
    image: nginx
    livenessProbe:
      exec:
        command:
        - true

Let’s create this, shall we?

$ kubectl create -f true.yaml 

Error from server (BadRequest): error when creating "true.yaml": Pod in version "v1" cannot be handled as a Pod: v1.Pod.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.LivenessProbe: v1.Probe.Handler: Exec: v1.ExecAction.Command: []string: ReadString: expects " or n, but found t, error found in #10 byte of ...|ommand":[true]}},"na|..., bigger context ...|age":"nginx","livenessProbe":{"exec":{"command":[true]}},"name":"nginx"}]}}
|...
$

All because in the above specification, true is not quoted, as it should:

    livenessProbe:
      exec:
        command:
        - "true"

A different take, if you do not want to mess with true would be:

   livenessProbe:
      exec:
        command:
        - exit

Beware though, if you want to exit 0 you need to quote again:

   livenessProbe:
      exec:
        command:
        - exit
        - "0"

Oh, the many ways you can waste your time…

Running redash on Kubernetes

Redash is a very handy tool that allows for you to connect to various data sources and produce interesting graphs. Your BI people most likely love it already.

Redash makes use of Redis, Postgres and a number of services written in Django as can be seen in this example docker-compose.yml file. However, there is very scarce information on how to run it on Kubernetes. I suspect that part of the reason is that while docker-compose.yml makes use of YAML’s merge, kubectl does not allow for this. So there exist templates that make a lot of redundant copies of a large block of lines. There must be a better way, right?

Since the example deployment with docker-compose runs all services on a single host, I decided to run my example deployment in a single pod with multiple containers. You can always switch to a better deployment to suit your needs if you like afterwards.

Next, was my quest on how to deal with the redundancy needed for the environment variables used by the different Redash containers. If only there was a template or macro language I could use. Well the most readily available, with the less installation hassle (if not already on your system) is m4. And you do not have to do weird sendmail.cf stuff as you will see. Using m4 allows us to run something like m4 redash-deployment-simple.m4 | kubectl apply -f - and be done with it:

divert(-1)
define(redash_environment, `
        - name: PYTHONUNBUFFERED
          value: "0"
        - name: REDASH_REDIS_URL
          value: "redis://127.0.0.1:6379/0"
        - name: REDASH_MAIL_USERNAME
          value: "redash"
        - name: REDASH_MAIL_USE_TLS
          value: "true"
        - name: REDASH_MAIL_USE_SSL
          value: "false"
        - name: REDASH_MAIL_SERVER
          value: "mail.example.net"
        - name: REDASH_MAIL_PORT
          value: "587"
        - name: REDASH_MAIL_PASSWORD
          value: "password"
        - name: REDASH_MAIL_DEFAULT_SENDER
          value: "redash@mail.example.net"
        - name: REDASH_LOG_LEVEL
          value: "INFO"
        - name: REDASH_DATABASE_URL
          value: "postgresql://redash:redash@127.0.0.1:5432/redash"
        - name: REDASH_COOKIE_SECRET
          value: "not-so-secret"
        - name: REDASH_ADDITIONAL_QUERY_RUNNERS
          value: "redash.query_runner.python"
')

divert(0)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redash
  labels:
    app: redash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redash
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: redash
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - name: redis
          containerPort: 6379
      - name: postgres
        image: postgres:11
        env:
        - name: POSTGRES_USER
          value: redash
        - name: POSTGRES_PASSWORD
          value: redash
        - name: POSTGRES_DB
          value: redash
        ports:
        - name: postgres
          containerPort: 5432
      - name: server
        image: redash/redash
        args: [ "server" ]
        env:
        - name: REDASH_WEB_WORKERS
          value: "2"
        redash_environment
        ports:
        - name: redash
          containerPort: 5000
      - name:  scheduler
        image: redash/redash
        args: [ "scheduler" ]
        env:
        - name: QUEUES
          value: "celery"
        - name: WORKERS_COUNT
          value: "1"
        redash_environment
      - name: schedulded-worker
        image: redash/redash
        args: [ "worker" ]
        env:
        - name: QUEUES
          value: "scheduled_queries,schemas"
        - name: WORKERS_COUNT
          value: "1"
      - name: adhoc-worker
        image: redash/redash
        args: [ "worker" ]
        env:
        - name: QUEUES
          value: "queries"
        - name: WORKERS_COUNT
          value: "1"
        redash_environment
---
apiVersion: v1
kind: Service
metadata:
  name: redash-nodeport
spec:
  type: NodePort
  selector:
    app: redash
  ports:
  - port: 5000
    targetPort: 5000

You can grab redash-deployment.m4 from Pastebin. What we did above was to define the macro redash_environment (with care for proper indentation) and use this in the container definitions in the Pod instead of copy-pasting that bunch of lines four times. Yes, you could have done it with any other template processor too.

You’re almost done. Postgres is not configured so, you need to connect and initialize the database:

$ kubectl exec -it redash-f8556648b-tw949 -c server -- bash
redash@redash-f8556648b-tw949:/app$ ./manage.py database create_tables
:
redash@redash-f8556648b-tw949:/app$ exit
$

I used the above configuration to quickly launch Redash on a Windows machine that runs the Docker Desktop Kubernetes distribution. For example no permanent storage for Postgres is defined. In a production installation it could very well be that said Postgres lives outside the cluster, so there is no need for such a container. The same might hold true for the Redis container too.

What I wanted to demonstrate, was that due to this specific circumstance, a 40+ year old tool may come to your assistance without needing to install any other weird templating tool or what. And also how to react in cases where you need !!merge and it is not supported by your parser.

Running tinyproxy inside Kubernetes

Now why would you want to do that? Because sometimes, you have to get data from customers and they whitelist specific IP addresses for you to get their data from. But the whole concept of Kubernetes means that, in general, you do not care where your process runs on (as long as it runs on the Kubernetes cluster) and also you get some advantage from the general elasticity it offers (because you know, an unspoken design principle is that your clusters are on the the cloud, and basically throwaway machines, alas life has different plans).

So assuming that you have a somewhat elastic cluster with some nodes that never get disposed and some nodes added and deleted for elasticity, how would you secure access to the customer’s data from a fixed point? You run a proxy service (like tinyproxy, haproxy, or whatever) on the fixed servers of your cluster (which of course the client has whitelisted). You need to label them somehow in the beginning, like kubectl label nodes fixed-node-1 tinyproxy=true. You now need a docker container to run tinyproxy from. Let’s build one using the Dockerfile below:

FROM centos:7
RUN yum install -y epel-release
RUN yum install -y tinyproxy
# This is needed to allow global access to tinyproxy.
# See the comments in tinyproxy.conf and tweak to your
# needs if you want something different.
RUN sed -i.bak -e s/^Allow/#Allow/ /etc/tinyproxy/tinyproxy.conf
ENTRYPOINT [ "/usr/sbin/tinyproxy", "-d", "-c", "/etc/tinyproxy/tinyproxy.conf" ]

Let’s build and push it to docker hub (obviously you’ll want to push to your own docker repository):

docker build -t yiorgos/tinyproxy .
docker push yiorgos/tinyproxy

We can now attempt to deploy a deployment in our cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tinyproxy
  namespace: adamo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tinyproxy
  template:
    metadata:
      labels:
        app: tinyproxy
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: tinyproxy
                operator: In
                values:
                - worker-4
                - worker-3
      containers:
      - image: yiorgos/tinyproxy
        imagePullPolicy: Always
        name: tinyproxy
        ports:
        - containerPort: 8888
          name: 8888tcp02
          protocol: TCP```

We can apply the above with `kubectl apply -f tinyproxy-deployment.yml`. The trained eye will recognize however that this YAML file is somehow [Rancher](http://rancher.com) related. Indeed it is, I deployed the above container with Rancher2 and got the YAML back with `kubectl -n proxy get deployment tinyproxy -o yaml`.  So what is left now to make it usable within the cluster? A service to expose this to other deployments within Kubernetes:

apiVersion: v1 kind: Service metadata: name: tinyproxy namespace: proxy spec: type: ClusterIP ports:

  • name: 8888tcp02 port: 8888 protocol: TCP targetPort: 8888

We apply this with `kubectl apply -f tinyproxy-service.yml` and we are now set. All pods within your cluster can now access the tinyproxy and get access to whatever they need to by connecting to `tinyproxy.proxy.svc.cluster.local:8888`. I am running this in its own namespace. This may come handy if you use [network policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) and you want to restrict access to the proxy server for certain pods within the cluster.

This is something you can use with [haproxy](https://hub.docker.com/_/haproxy) containers instead of tinyproxy, if you so like.

Running monit inside Kubernetes

Sometimes you may want to run monit inside a Kubernetes cluster just to validate what you’re getting from your standard monitoring solution with a second monitor that does not require that much configuration or tinkering. In such cases the Dockerfile bellow might come handy:

FROM ubuntu:bionic
RUN apt-get update
RUN apt-get install monit bind9-host netcat fping -y
RUN ln -f -s /dev/fd/1 /var/log/monit.log
COPY monitrc /etc/monit
RUN chmod 0600 /etc/monit/monitrc
EXPOSE 2812
ENTRYPOINT [ "/usr/bin/monit" ]
CMD [ "-I", "-c", "/etc/monit/monitrc" ]

I connected to it via kubectl -n monit-test port-forward --address=0.0.0.0 pod/monit-XXXX-YYYY 2812:2812. Most people do not need --address=0.0.0.0, but I run kubectl inside a VM for some degree of compartmentalization. Stringent, I know…

Why would you need something like this you ask? Well imagine the case where you have multiple pods running, no restarts, everything fine, but randomly you get connection timeouts to the clusterIP address:port pair. If you have no way of reproducing this, don’t you want an alert the exact moment it happens? That was the case for me.

And also the fun of using a tool in an unforeseen way.

Rancher’s cattle-cluster-agent and error 404

It may be the case that when you deploy a new Rancher2 Kubernetes cluster, all pods are working fine, with the exception of cattle-cluster-agent (whose scope is to connect to the Kubernetes API of Rancher Launched Kubernetes clusters) that enters a CrashLoopBackoff state (red state in your UI under the System project).

One common error you will see from View Logs of the agent’s pod is 404 due to a HTTP ping failing:

ERROR: https://rancher-ui.example.com/ping is not accessible (The requested URL returned error: 404)

It is a DNS problem

The issue here is that if you watch the network traffic on your Rancher2 UI server, you will never see pings coming from the pod, yet the pod is sending traffic somewhere. Where?

Observe the contents of your pod’s /etc/resolv.conf:

nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5

Now if you happen to have a wildcard DNS A record in example.com the HTTP ping in question becomes http://rancher-ui.example.com.example.com/ping which happens to resolve to the A record of the wildcard (most likely not the A RR of the host where the Rancher UI runs). Hence if this machine runs a web server, you are at the mercy of what that web server responds.

One quick hack is to edit your Rancher2 cluster’s YAML and instruct the kubelet to start with a different resolv.conf that does not contain a search path with your domain with the wildcard record in it. The kubelet appends the search path line to the default and in this particular case you do not want that. So you tell your Rancher2 cluster the following:

  kubelet:
    extra_args:
      resolv-conf: /host/etc/resolv.rancher

resolv.rancher contains only nameserver entries in my case. The path is /host/etc/resolv.rancher because you have to remember that in Rancher2 clusters, the kubelet itself runs from within a container and access the host’s file system under /host.

Now I am pretty certain this can be dealt with, with some coredns configuration too, but did not have the time to pursue it.

once again bitten by the MTU

At work we use Rancher2 clusters a lot. The UI makes some things easier I have to admit. Like sending logs from the cluster somewhere. I wanted to test sending such logs to an ElasticSearch and thus I setup a test installation with docker-compose:

version: "3.4"

services:
  elasticsearch:
    restart: always
    image: elasticsearch:7.5.1
    container_name: elasticsearch
    ports:
      - "9200:9200"
    environment:
      - ES_JAVA_OPTS=-Xmx16g
      - cluster.name=lala-cluster
      - bootstrap.memory_lock=true
      - discovery.type=single-node
      - node.name=lala-node
      - http.port=9200
      - xpack.security.enabled=true
      - xpack.monitoring.collection.enabled=true
    volumes:
      # ensure chown 1000:1000 /opt/elasticsearch/data please.
      - /opt/elasticsearch/data:/usr/share/elasticsearch/data

  kibana:
    restart: always
    image: kibana:7.5.1
    ports:
      - "5601:5601"
    container_name: kibana
    depends_on:
      - elasticsearch
    volumes:
      - /etc/docker/compose/kibana.yml:/usr/share/kibana/config/kibana.yml

Yes, this is a yellow cluster, but then again, it is a test cluster on a single machine.

This seemed to work for some days, and the it stopped. tcpdump showed packets arriving at the machine, but not really responding back after the three way handshake. So the old mantra kicked in:

It is a MTU problem.

Editing daemon.json to accommodate for that assumption:

{
  "mtu": 1400
}

and logging was back to normal.

I really hate fixes like this, but sometimes when pressed by other priorities they present a handy arsenal.

coreDNS and nodesPerReplica

[ It is always a DNS problem; or systemd]

It is well established that one does not run a Kubernetes cluster that spans more than one region (for whatever the definition of the region is for you cloud provider). Except when sometimes, one does do this, for reasons, and learns what leads to the rule stated above. Instabilities arise.

One such instability is the behavior of the internal DNS. It suffers. Latency is high and the internal services cannot communicate with one another, or things happen become very slow. Imagine for example your coreDNS resolvers running not in the same region where two pods that want to talk to each other are. You may initially think it is the infamous ndots:5, which while it may contribute, is not the issue here. The (geographical) location of the DNS service is.

When you are in a situation like that, maybe it will come handy to run a DNS resolver on each host (kind of a DaemonSet). Is this possible? Yes it is, if you take the time to read Autoscale the DNS Service in a Cluster:

The actual number of backends is calculated using this equation:
replicas = max( ceil( cores × 1/coresPerReplica ) , ceil( nodes × 1/nodesPerReplica ) )

Armed with that information, we edit the coredns-autoscaler configMap:

$ kubectl -n kube-system edit cm coredns-autoscaler
:
linear: '{"coresPerReplica":128,"min":1,"nodesPerReplica":1,"preventSinglePointFailure":true}'

Usually the default value for nodesPerReplica is 4. By assigning to it the value of 1, you’re ensuring you have #nodes of resolver instances, speeding up your DNS resolution in the unfortunate case where your cluster spans more than one region.

The things we do when we break the rules…

rkube: Rancher2 Kubernetes cluster on a single VM using RKE

There are many solutions to run a complete Kubernetes cluster in a VM on your machine, minikube, microk8s or even with kubeadm. So embarking into what others have done before me, I wanted to do the same with RKE. Mostly because I work with Rancher2 lately and I want to experiment on VirtualBox without remorse.

Enter rkube (the name directly inspired from minikube and rke). It does not do the many things that minikube does, but it is closer to my work environments.

We use vagrant to boot an Ubuntu Bionic box. It creates a 4G RAM / 2 CPU machine. We provision the machine using ansible_local and install docker from the Ubuntu archives. This is version 17 for Bionic. If you need a newer version, check the docker documentation and modify ansible.yml accordingly.

Once the machine boots up and is provisioned, it is ready for use. You will find the kubectl configuration file named kube_cluster_config.yml installed in the cloned repository directory. You can now run a simple echo server with:

kubectl --kubeconfig kube_cluster_config.yml apply -f echo.yml

Check that the cluster is deployed with:

kubectl --kubeconfig kube_cluster_config.yml get pod
kubectl --kubeconfig kube_cluster_config.yml get deployment
kubectl --kubeconfig kube_cluster_config.yml get svc
kubectl --kubeconfig kube_cluster_config.yml get ingress

and you can visit the echo server at http://192.168.98.100/echo Ignore the SSL error. We have not created a specific SSL certificate for the Ingress controller yet.

You can change the IP address you can connect to the RKE VM in the Vagrantfile.

Suppose you now want to upgrade the Kubernetes version. vagrant ssh into the VM and run rke config -l -s -a and pick the new version that you want to install. Look for the containers named hypercube. You now edit /vagrant/cluster.yml and run rke up --config /vagrant/cluster.yml.

Note that thanks to vagrant’s niceties, the /vagrant directory within the VM is the directory you cloned the repository into.

I developed the whole thing in Windows 10, so it should be able to run just about anywhere. I hope you like it and help me make it a bit better if you find it useful.

You can browse rkube here

PORT is deprecated. Please use SCHEMA_REGISTRY_LISTENERS instead.

I was trying to launch a schema-registry within a kubernetes cluster and every time I wanted to expose the pod’s port through a service, I was greeted by the nice title message:

if [[ -n "${SCHEMA_REGISTRY_PORT-}" ]]
then
  echo "PORT is deprecated. Please use SCHEMA_REGISTRY_LISTENERS instead."
  exit 1
fi

This happened because I had named my service schema-registry also (which was kind of not negotiable at the time) and kubernetes happily sets the SCHEMA_REGISTRY_PORT environment variable to the value of the port you want to expose. But it turns out that this very named variable has special meaning within the container.

Fortunately, I was not the only one bitten by this error, albeit for a different variable name, but I also used the same ugly hack:

$ kubectl -n kafka-tests get deployment schema-registry -o yaml
:
    spec:
      containers:
      - command:
        - bash
        - -c
        - unset SCHEMA_REGISTRY_PORT; /etc/confluent/docker/run
        env:
        - name: SCHEMA_REGISTRY_LISTENERS
          value: http://0.0.0.0:8081/
: