Stop the Dell Inspiron 7373 fan from making noise all the time.

This seems to be a problem with some Dells. It also does not seem to be remedied by the usual tricks, like BIOS updates, tweaking the BIOS, or running Dell Update or SupportAssist. I know because I have tried all of those suggestions. It made participating in teleconferences (with any tool) extremely difficult.

The last thing I tried was to alter the Maximum Processor State:

[screenshot: the Maximum processor state setting in the advanced power options]

And it seems to be working.
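
The same setting can also be changed from an elevated command prompt with powercfg, in case you want to script it. A sketch, assuming you cap the state at 99% (a commonly suggested value, since it keeps Turbo Boost from kicking in); adjust the number to whatever works for you:

:: cap "Maximum processor state" for both AC and battery power,
:: then re-apply the active plan so the change takes effect
powercfg /setacvalueindex SCHEME_CURRENT SUB_PROCESSOR PROCTHROTTLEMAX 99
powercfg /setdcvalueindex SCHEME_CURRENT SUB_PROCESSOR PROCTHROTTLEMAX 99
powercfg /setactive SCHEME_CURRENT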

Don’t disable IPv6, block the traffic instead

It is a somewhat common practice to disable IPv6 when you have no such traffic to deal with. However, doing so may have some unintended consequences, like when running rpcbind for NFS sharing:

Feb 20 11:27:12 server systemd[1]: rpcbind.socket failed to listen on sockets: Address family not supported by protocol

So, don’t disable IPv6. Aim to block the traffic instead. Or, if you must have it disabled, read here.
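
A minimal sketch with ip6tables, assuming that is the firewall tooling you use; the idea is that the kernel keeps IPv6 enabled (so sockets like rpcbind’s can still bind) while no IPv6 traffic actually flows:

# keep loopback so local services can still talk to ::1
ip6tables -A INPUT -i lo -j ACCEPT
# drop everything else arriving or being forwarded over IPv6
ip6tables -A INPUT -j DROP
ip6tables -A FORWARD -j DROP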

PS: This is not a post advocating against adopting IPv6. It is simply a workaround if the situation arises. You still need to operate in an IPv6 world.

base64 and SASLprep: failed prohibited character check

I was trying something out on a minikube cluster the other day, and my Python application could not connect to a MongoDB database that was deployed with Helm. I had authentication enabled and the secret deployed in minikube, and I was greeted by the following error message:

SASLprep: failed prohibited character check

WTF? The password was definitely made of ASCII printable characters. Could there be prohibited characters among them? I browsed through RFCs 4013 and 3454 but there was nothing I could immediately pinpoint there. However, this tweet by jpmens kept circling in my mind, and:

$ echo qwerty1234 | base64
cXdlcnR5MTIzNAo=

$ printf qwerty1234 | base64
cXdlcnR5MTIzNA==

$ echo -n qwerty1234 | base64
cXdlcnR5MTIzNA==

Yes, the Kubernetes secret had a \n attached to it. Because two things are hard in CS:
– naming things
– cache invalidation
– off by one errors
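
One way to avoid the trap entirely is to let kubectl do the encoding instead of piping through base64 yourself. The secret and key names below are made up:

# --from-literal encodes the value as-is, with no stray \n appended
kubectl create secret generic mongodb-auth \
  --from-literal=mongodb-password=qwerty1234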

aws ssm describe-instance-information for quick ansible dynamic inventory

The aws ssm agent is very useful when working both with EC2 instances and with machinery outside AWS. Once you add an outside instance by installing and configuring the SSM agent, be it on-premises or a VM at another provider, you can tag it for further granularity with aws ssm add-tags-to-resource --resource-type ManagedInstance --resource-id mi-WXYZWXYZ --tags Key=onpremise,Value=true --region eu-west-1, where mi-WXYZWXYZ is the instance ID you see in SSM’s managed instances list (alternatively, you can get this list with aws ssm describe-instance-information, along with lots of other information).

It may be the case that sometimes you want to apply a certain change with ansible to those machines that live outside AWS. Yes, you can run ansible playbooks via SSM directly, but this requires ansible installed on said machines. If you need the simplest of dynamic inventories, just enough to run $ ansible -u user -i ./lala all -m ping, here is the crudest version of ./lala, one that happily ignores the --list argument:

#!/bin/bash
# Crudest possible dynamic inventory: it ignores --list and always prints
# every on-premises managed instance IP as a member of the "all" group.
# The sed wraps each address in quotes so the output is valid JSON.
ips=$(aws ssm describe-instance-information --region eu-west-1 --filter Key=tag:onpremise,Values=true --query "InstanceInformationList[].IPAddress" --output text \
  | tr '[:blank:]' '\n' | sed 's/.*/"&"/' | paste -sd, -)
printf '%s%s%s' '{ "all": { "hosts": [' "$ips" '] } }'

You can go all the way and script something like this for a proper solution, though.

Why printf instead of echo above? Because jpmens suggested so.

Running monit inside Kubernetes

Sometimes you may want to run monit inside a Kubernetes cluster just to validate what you’re getting from your standard monitoring solution with a second monitor that does not require much configuration or tinkering. In such cases the Dockerfile below might come in handy:

FROM ubuntu:bionic
RUN apt-get update && \
    apt-get install -y monit bind9-host netcat fping && \
    rm -rf /var/lib/apt/lists/*
RUN ln -f -s /dev/fd/1 /var/log/monit.log
COPY monitrc /etc/monit
RUN chmod 0600 /etc/monit/monitrc
EXPOSE 2812
ENTRYPOINT [ "/usr/bin/monit" ]
CMD [ "-I", "-c", "/etc/monit/monitrc" ]

I connected to it via kubectl -n monit-test port-forward --address=0.0.0.0 pod/monit-XXXX-YYYY 2812:2812. Most people do not need --address=0.0.0.0, but I run kubectl inside a VM for some degree of compartmentalization. Stringent, I know…

Why would you need something like this, you ask? Well, imagine the case where you have multiple pods running, no restarts, everything fine, but randomly you get connection timeouts to the clusterIP address:port pair. If you have no way of reproducing this, don’t you want an alert the exact moment it happens? That was the case for me.
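
For reference, the kind of monitrc I mean: a minimal sketch with a made-up service name and clusterIP:port pair (you would also need a set mailserver / set alert block if you want notifications delivered by mail):

set daemon 30                      # poll every 30 seconds
set httpd port 2812 and
    use address 0.0.0.0            # the UI we port-forward to
    allow admin:monit              # change these credentials

# the flaky clusterIP:port pair we want to catch in the act
check host flaky-service with address 10.43.0.42
    if failed port 8080 type tcp for 2 cycles then alert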

And also the fun of using a tool in an unforeseen way.

Rancher’s cattle-cluster-agent and error 404

It may be the case that when you deploy a new Rancher2 Kubernetes cluster, all pods are working fine, with the exception of cattle-cluster-agent (whose scope is to connect to the Kubernetes API of Rancher Launched Kubernetes clusters), which enters a CrashLoopBackOff state (red state in your UI under the System project).

One common error you will see from View Logs of the agent’s pod is a 404 due to an HTTP ping failing:

ERROR: https://rancher-ui.example.com/ping is not accessible (The requested URL returned error: 404)

It is a DNS problem

The issue here is that if you watch the network traffic on your Rancher2 UI server, you will never see pings coming from the pod, yet the pod is sending traffic somewhere. Where?

Observe the contents of your pod’s /etc/resolv.conf:

nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5

Now, if you happen to have a wildcard DNS A record in example.com, the HTTP ping in question becomes https://rancher-ui.example.com.example.com/ping, which happens to resolve to the A record of the wildcard (most likely not the A RR of the host where the Rancher UI runs). Hence, if that machine runs a web server, you are at the mercy of whatever that web server responds with.
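
You can see the trap from anywhere that can resolve your domain (substitute your real names for the example.com placeholders):

# the search path expands the name and the wildcard answers it
$ host rancher-ui.example.com.example.com
# ...which is why the ping comes back as a 404 from the wrong web server
$ curl -sk -o /dev/null -w '%{http_code}\n' https://rancher-ui.example.com.example.com/ping
404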

One quick hack is to edit your Rancher2 cluster’s YAML and instruct the kubelet to start with a different resolv.conf, one that does not contain a search path with the wildcard-bearing domain in it. The kubelet carries the host’s search line over into the pods’ resolv.conf, and in this particular case you do not want that. So you tell your Rancher2 cluster the following:

  kubelet:
    extra_args:
      resolv-conf: /host/etc/resolv.rancher

resolv.rancher contains only nameserver entries in my case. The path is /host/etc/resolv.rancher because you have to remember that in Rancher2 clusters the kubelet itself runs from within a container and accesses the host’s file system under /host.
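
For reference, a sketch of what such a file can look like (the resolver addresses are made up):

# /etc/resolv.rancher -- nameserver entries only, deliberately no search line
nameserver 10.0.0.2
nameserver 10.0.0.3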

Now, I am pretty certain this can be dealt with through some coredns configuration too, but I did not have the time to pursue it.

The Uprising (Revolution Book 1)

The Uprising by Konstantinos Christidis
My rating: 3 of 5 stars

This is the first book from the author and if I am not mistaken, it is self-published. It could use some more help with editing. That is why I did not give it more stars.

Think of this book as a prequel to The Expanse. The setting is similar: Earth is governed by the UN, there is a terraforming project for Mars (and Venus), and on Callisto there is an installation that ships material and water (ice) to the terraforming projects.

Think also of the existence of the equivalent of the East India Company with its own private army, monopoly status and control over the judicial system and the government. What can go wrong when some theoretical physicist backed with VC money (to put it in today’s terms) threatens the status quo with faster than light travel?

This is what the book is about.

Could it have been written better? Yes. Does it matter that at times the author does not manage to keep the pace and the book gets a bit boring? I don’t know. Maybe. It took me longer than I anticipated to finish it.

Did I have a good time ultimately reading it? Sure.


once again bitten by the MTU

At work we use Rancher2 clusters a lot. The UI makes some things easier, I have to admit, like sending logs from the cluster somewhere. I wanted to test sending such logs to an Elasticsearch, and thus I set up a test installation with docker-compose:

version: "3.4"

services:
  elasticsearch:
    restart: always
    image: elasticsearch:7.5.1
    container_name: elasticsearch
    ports:
      - "9200:9200"
    environment:
      - ES_JAVA_OPTS=-Xmx16g
      - cluster.name=lala-cluster
      - bootstrap.memory_lock=true
      - discovery.type=single-node
      - node.name=lala-node
      - http.port=9200
      - xpack.security.enabled=true
      - xpack.monitoring.collection.enabled=true
    volumes:
      # ensure chown 1000:1000 /opt/elasticsearch/data please.
      - /opt/elasticsearch/data:/usr/share/elasticsearch/data

  kibana:
    restart: always
    image: kibana:7.5.1
    ports:
      - "5601:5601"
    container_name: kibana
    depends_on:
      - elasticsearch
    volumes:
      - /etc/docker/compose/kibana.yml:/usr/share/kibana/config/kibana.yml

Yes, this is a yellow cluster, but then again, it is a test cluster on a single machine.

This seemed to work for some days, and then it stopped. tcpdump showed packets arriving at the machine, but no real responses going back after the three-way handshake. So the old mantra kicked in:

It is a MTU problem.

Editing daemon.json to accommodate that assumption:

{
  "mtu": 1400
}

and logging was back to normal.
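
For completeness, this is how a change like that gets applied and sanity-checked, assuming the default /etc/docker/daemon.json location:

# the daemon must be restarted for the new MTU to reach the default bridge
$ sudo systemctl restart docker
# quick check from a throwaway container
$ docker run --rm alpine ip link show eth0 | grep -o 'mtu [0-9]*'
mtu 1400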

I really hate fixes like this, but sometimes, when you are pressed by other priorities, they are a handy part of the arsenal.

coreDNS and nodesPerReplica

[ It is always a DNS problem; or systemd]

It is well established that one does not run a Kubernetes cluster that spans more than one region (for whatever the definition of a region is for your cloud provider). Except that sometimes one does do this, for reasons, and learns what led to the rule stated above. Instabilities arise.

One such instability is the behavior of the internal DNS. It suffers. Latency is high and the internal services cannot communicate with one another, or things become very slow. Imagine, for example, your coreDNS resolvers not running in the same region as two pods that want to talk to each other. You may initially think it is the infamous ndots:5, which, while it may contribute, is not the issue here. The (geographical) location of the DNS service is.

When you are in a situation like that, it may come in handy to run a DNS resolver on each host (kind of like a DaemonSet). Is this possible? Yes it is, if you take the time to read Autoscale the DNS Service in a Cluster:

The actual number of backends is calculated using this equation:
replicas = max( ceil( cores × 1/coresPerReplica ) , ceil( nodes × 1/nodesPerReplica ) )

Armed with that information, we edit the coredns-autoscaler configMap:

$ kubectl -n kube-system edit cm coredns-autoscaler
:
linear: '{"coresPerReplica":128,"min":1,"nodesPerReplica":1,"preventSinglePointFailure":true}'

Usually the default value for nodesPerReplica is 4. By setting it to 1, you ensure that you have as many resolver replicas as nodes, speeding up your DNS resolution in the unfortunate case where your cluster spans more than one region.
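
After the autoscaler reconciles, a quick way to confirm you got what you asked for (k8s-app=kube-dns is the usual label on the coredns pods; yours may differ):

# expect as many CoreDNS pods as nodes, spread one per node
$ kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
$ kubectl get nodes --no-headers | wc -l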

The things we do when we break the rules…

on brilliant assholes

[ yet another meaningless micropost ]

From time to time, people read the autobiography, or the memoirs of the specific period when a highly successful individual reached their peak. Fascinated by their success and seeking to improve their own status, these followers* copy the behavior they read about. Interestingly, if this behavior is assholic and abusive, it gets copied even more easily. Someone with a background in psychology would have more to say here, I’m sure. In my “communication radius” this is very easy to observe with people who want to copy successful sports coaches, and you can see this pattern crossing over to other work domains too.

It is not easy to understand that someone can be an asshole whose brilliance may make them somewhat tolerable to their immediate workplace, while the other way round does not hold: assholic behavior does not generate brilliance. Solutions do.

If you think you’re brilliant, just pick a hard problem and solve it. I know, it’s …hard.

[*] Leadership programs and books create followers, not leaders.