Rancher’s cattle-cluster-agent and error 404

It may be the case that when you deploy a new Rancher2 Kubernetes cluster, all pods are working fine, with the exception of cattle-cluster-agent (whose scope is to connect to the Kubernetes API of Rancher Launched Kubernetes clusters) that enters a CrashLoopBackoff state (red state in your UI under the System project).

One common error you will see from View Logs of the agent’s pod is 404 due to a HTTP ping failing:

ERROR: https://rancher-ui.example.com/ping is not accessible (The requested URL returned error: 404)

It is a DNS problem

The issue here is that if you watch the network traffic on your Rancher2 UI server, you will never see pings coming from the pod, yet the pod is sending traffic somewhere. Where?

Observe the contents of your pod’s /etc/resolv.conf:

nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5

Now if you happen to have a wildcard DNS A record in example.com the HTTP ping in question becomes http://rancher-ui.example.com.example.com/ping which happens to resolve to the A record of the wildcard (most likely not the A RR of the host where the Rancher UI runs). Hence if this machine runs a web server, you are at the mercy of what that web server responds.

One quick hack is to edit your Rancher2 cluster’s YAML and instruct the kubelet to start with a different resolv.conf that does not contain a search path with your domain with the wildcard record in it. The kubelet appends the search path line to the default and in this particular case you do not want that. So you tell your Rancher2 cluster the following:

  kubelet:
    extra_args:
      resolv-conf: /host/etc/resolv.rancher

resolv.rancher contains only nameserver entries in my case. The path is /host/etc/resolv.rancher because you have to remember that in Rancher2 clusters, the kubelet itself runs from within a container and access the host’s file system under /host.

Now I am pretty certain this can be dealt with, with some coredns configuration too, but did not have the time to pursue it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s