rkube: Rancher2 Kubernetes cluster on a single VM using RKE

There are many solutions for running a complete Kubernetes cluster in a VM on your machine: minikube, microk8s, or even plain kubeadm. So, embarking on what others have done before me, I wanted to do the same with RKE. Mostly because I have been working with Rancher2 lately and I want to experiment on VirtualBox without remorse.

Enter rkube (the name directly inspired by minikube and rke). It does not do the many things that minikube does, but it is closer to my work environments.

We use vagrant to boot an Ubuntu Bionic box with 4G RAM and 2 CPUs. We provision the machine with the ansible_local provisioner and install docker from the Ubuntu archives; this is version 17 for Bionic. If you need a newer version, check the docker documentation and modify ansible.yml accordingly.

Once the machine boots up and is provisioned, it is ready for use. You will find the kubectl configuration file named kube_cluster_config.yml installed in the cloned repository directory. You can now run a simple echo server with:

kubectl --kubeconfig kube_cluster_config.yml apply -f echo.yml

Check that the echo server is deployed with:

kubectl --kubeconfig kube_cluster_config.yml get pod
kubectl --kubeconfig kube_cluster_config.yml get deployment
kubectl --kubeconfig kube_cluster_config.yml get svc
kubectl --kubeconfig kube_cluster_config.yml get ingress

and you can visit the echo server at http://192.168.98.100/echo. Ignore the SSL error. We have not created a specific SSL certificate for the Ingress controller yet.
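
The same check works from the command line, in case you get redirected to HTTPS (-L follows the redirect, -k skips the certificate verification that the browser would also complain about):

curl -kL http://192.168.98.100/echo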

You can change the IP address on which you connect to the RKE VM in the Vagrantfile.

Suppose you now want to upgrade the Kubernetes version. vagrant ssh into the VM and run rke config -l -s -a, then pick the new version that you want to install. Look for the images named hyperkube. You then edit /vagrant/cluster.yml and run rke up --config /vagrant/cluster.yml.
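
Put together, the upgrade dance looks roughly like this (a sketch; which cluster.yml fields you touch, kubernetes_version or the system images list, depends on your RKE version):

vagrant ssh
rke config -l -s -a                   # list system images per Kubernetes version
vi /vagrant/cluster.yml               # point the cluster at the version you picked
rke up --config /vagrant/cluster.yml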

Note that thanks to vagrant’s niceties, the /vagrant directory within the VM is the directory you cloned the repository into.

I developed the whole thing on Windows 10, so it should be able to run just about anywhere. I hope you like it, and that you help me make it a bit better if you find it useful.

You can browse rkube here

org.elasticsearch.transport.ConnectTransportException or the case of the yellow cluster

Some time ago we needed to add two data nodes to our Elasticsearch cluster, which we happily ordered from our cloud provider. The first one joined OK and shards started moving around nicely. A happy green cluster. However, upon adding the second node, the cluster started accepting shards but remained in yellow state. Consistently. For hours. Even trying to empty the node in order to remove it did not work. Some shards would stay there forever.
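
When a cluster stays yellow like that, the allocation explain API is a good first question to ask; it picks an unassigned shard and reports why the allocator will not place it (a sketch against a local node, available in Elasticsearch 5.0 and later):

curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'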

Upon looking at the node logs, here is what caught our attention:

org.elasticsearch.transport.ConnectTransportException: [datanode-7][10.1.2.7:9300] connect_exception

A similar log entry was found in datanode-7's log file. What was going on here? Well, these two machines were assigned sequential IP addresses, 10.1.2.6 and 10.1.2.7. They could literally ping the whole internet but not find each other. To which the cloud provider's support group replied:

in this case you need to configure a hostroute via the gw as our switch doesn't allow a direct communication.

Enter systemd territory then and, not wanting to make this yet another service, I defaulted to the oldest boot solution: edit /etc/rc.local (in reality /etc/rc.d/rc.local) on both machines with the appropriate route. On 10.1.2.6 that is the following (the mirror route, to 10.1.2.6 via the same gateway, goes into 10.1.2.7's rc.local; the interface name may differ there):

ip route add 10.1.2.7/32 via 10.1.2.1 dev enp0s31f6

and enable it:

# chmod +x /etc/rc.d/rc.local
# systemctl enable rc-local

rc.local will never die. It is that loyal friend that will wait to be called upon when you need it most.

resizing a vagrant box disk

[ I am about to do what others have done before me and blog about it one more time ]

While I do enjoy working with Windows 10, I am still not using WSL (waiting for WSL2) and work with either chocolatey or a vagrant Ubuntu box. It so happens that after pulling a few docker images the default 10G disk is full and you cannot work anymore. So, let's resize the disk.

The disk of my ubuntu/bionic64 box is a VMDK one. So before resizing, we need to convert it to VDI first, which is easier for VirtualBox to handle:

VBoxManage clonehd .\ubuntu-bionic-18.04-cloudimg.vmdk .\ubuntu-bionic-18.04-cloudimg.vdi --format vdi

Now we can resize it, to say 20G:

VBoxManage modifymedium disk .\ubuntu-bionic-18.04-cloudimg.vdi --resize 20000

We're almost there. We need to tell vagrant to boot from the VDI disk now. To do so, open VirtualBox and visit the storage settings of the vagrant VM. Remove the VMDK disk(s) there and add the VDI on SCSI port 0. That's it. Close VirtualBox and vagrant up now to boot from the VDI.
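
The same disk swap can also be done from the command line with storageattach; a sketch, where the VM name and the controller name are assumptions you should check against VBoxManage showvminfo:

VBoxManage storageattach "ubuntu-vagrant-vm" --storagectl "SCSI" --port 0 --device 0 --type hdd --medium .\ubuntu-bionic-18.04-cloudimg.vdi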

Now you have a 20G disk, but still a 10G partition. parted to the rescue:

$ sudo parted /dev/sda
(parted) resizepart 

It will ask you for the partition number. You answer 1 (which is /dev/sda1). It will ask you for the end of the partition. You answer -1 (which means until the end of the disk). quit and you're out.
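
Newer parted versions can also do this non-interactively, if you prefer a one-liner (it may still prompt for confirmation on a mounted partition):

$ sudo parted /dev/sda resizepart 1 100%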

You have changed the partition size, but the filesystem still reports the old size. So, resize2fs (assuming a compatible ext filesystem) and:

$ sudo resize2fs /dev/sda1

Now you're done; df -h should report the new size. You may want to vagrant reload to check whether everything works fine. Once you're sure of that, you can delete the old VMDK disk.

PORT is deprecated. Please use SCHEMA_REGISTRY_LISTENERS instead.

I was trying to launch a schema-registry within a kubernetes cluster, and every time I wanted to expose the pod's port through a service I was greeted by the nice message in the title, courtesy of a check like this in the container's startup script:

if [[ -n "${SCHEMA_REGISTRY_PORT-}" ]]
then
  echo "PORT is deprecated. Please use SCHEMA_REGISTRY_LISTENERS instead."
  exit 1
fi

This happened because I had also named my service schema-registry (which was kind of non-negotiable at the time), and for every service kubernetes happily injects docker-link style environment variables into the pods, among them SCHEMA_REGISTRY_PORT. It turns out that this very variable name has a special meaning within the container.
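
For illustration, this is the family of variables kubernetes injects for a Service named schema-registry listening on port 8081 (the addresses are made up):

SCHEMA_REGISTRY_SERVICE_HOST=10.96.0.42
SCHEMA_REGISTRY_SERVICE_PORT=8081
SCHEMA_REGISTRY_PORT=tcp://10.96.0.42:8081
SCHEMA_REGISTRY_PORT_8081_TCP=tcp://10.96.0.42:8081
SCHEMA_REGISTRY_PORT_8081_TCP_PORT=8081
SCHEMA_REGISTRY_PORT_8081_TCP_ADDR=10.96.0.42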

Fortunately, I was not the only one bitten by this, albeit through a different variable name, and I ended up using the same ugly hack:

$ kubectl -n kafka-tests get deployment schema-registry -o yaml
:
    spec:
      containers:
      - command:
        - bash
        - -c
        - unset SCHEMA_REGISTRY_PORT; /etc/confluent/docker/run
        env:
        - name: SCHEMA_REGISTRY_LISTENERS
          value: http://0.0.0.0:8081/
:
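
These days there is a cleaner way out of this whole class of collisions: kubernetes 1.13 and later can skip injecting these variables altogether if you set enableServiceLinks: false in the pod spec. A sketch, not what I used at the time:

    spec:
      enableServiceLinks: false
      containers:
: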

How I run a small (discussion) mailing list

I used to run a fairly large (for two Ops people) mail system until 2014. I did it all: sendmail, MIMEDefang, my own milters, SPF, and a bunch of other accompanying acronyms and technologies. I've stopped doing that now. The big companies have both the money and the workforce to do it better. But I still run a small mailing list. It used to run on majordomo, but majordomo started having issues with modern Perl and I moved it to Mailman. Which is kind of overkill for a list with just 300 or so people on it. So for a time I even ran it manually, using the free GSuite hosting (yes, there was a time it was free).

Lately I wanted to really run it automated again. So I thought I should try something different, something that would also contribute to the general civility and high SNR of the list. I'd been lurking around the picolisp mailing list for years and thought I should use that, because picolisp comes with a mailing list program:

#!bin/picolisp lib.l
# 19apr17abu
# (c) Software Lab. Alexander Burger

# Configuration
(setq
   *MailingList "foobar@example.com"
   *SpoolFile "/var/mail/foobar"
   *MailingDomain "server.example.com"
   *Mailings (make (in "/home/foobar/Mailings" (while (line T) (link @))))
   *SmtpHost "localhost"
   *SmtpPort 25 )

# Process mails
(loop
   (when (gt0 (car (info *SpoolFile)))
      (protect
         (in *SpoolFile
            (unless (= "From" (till " " T))
               (quit "Bad mbox file") )
            (char)
            (while (setq *From (lowc (till " " T)))
               (line)  # Skip rest of line and "\r\n"
               (off
                  *Name *Subject *Date *MessageID *InReplyTo *MimeVersion
                  *ContentType *ContentTransferEncoding *ContentDisposition *UserAgent )
               (while (trim (split (line) " "))
                  (let L @
                     (while (and (sub? (peek) " \t") (char))  # Skip WSP
                        (conc L (trim (split (line) " "))) )
                     (setq *Line (glue " " (cdr L)))
                     (case (pack (car L))
                        ("From:" (setq *Name *Line))
                        ("Subject:" (setq *Subject *Line))
                        ("Date:" (setq *Date *Line))
                        ("Message-ID:" (setq *MessageID *Line))
                        ("In-Reply-To:" (setq *InReplyTo *Line))
                        ("MIME-Version:" (setq *MimeVersion *Line))
                        ("Content-Type:" (setq *ContentType *Line))
                        ("Content-Transfer-Encoding:" (setq *ContentTransferEncoding *Line))
                        ("Content-Disposition:" (setq *ContentDisposition *Line))
                        ("User-Agent:" (setq *UserAgent *Line)) ) ) )
               (if (nor (member *From *Mailings) (= "subscribe" (lowc *Subject)))
                  (out "/dev/null" (echo "^JFrom ") (msg *From " discarded"))
                  (unless (setq *Sock (connect *SmtpHost *SmtpPort))
                     (quit "Can't connect to SMTP server") )
                  (unless
                     (and
                        (pre? "220 " (in *Sock (line T)))
                        (out *Sock (prinl "HELO " *MailingDomain "^M"))
                        (pre? "250 " (in *Sock (line T)))
                        (out *Sock (prinl "MAIL FROM:" *MailingList "^M"))
                        (pre? "250 " (in *Sock (line T))) )
                     (quit "Can't HELO") )
                  (when (= "subscribe" (lowc *Subject))
                     (push1 '*Mailings *From)
                     (out "Mailings" (mapc prinl *Mailings)) )
                  (for To *Mailings
                     (out *Sock (prinl "RCPT TO:" To "^M"))
                     (unless (pre? "250 " (in *Sock (line T)))
                        (msg T " can't mail") ) )
                  (when (and (out *Sock (prinl "DATA^M")) (pre? "354 " (in *Sock (line T))))
                     (out *Sock
                        (prinl "From: " (or *Name *From) "^M")
                        (prinl "Sender: " *MailingList "^M")
                        (prinl "Reply-To: " *MailingList "^M")
                        (prinl "To: " *MailingList "^M")
                        (prinl "Subject: " *Subject "^M")
                        (and *Date (prinl "Date: " @ "^M"))
                        (and *MessageID (prinl "Message-ID: " @ "^M"))
                        (and *InReplyTo (prinl "In-Reply-To: " @ "^M"))
                        (and *MimeVersion (prinl "MIME-Version: " @ "^M"))
                        (and *ContentType (prinl "Content-Type: " @ "^M"))
                        (and *ContentTransferEncoding (prinl "Content-Transfer-Encoding: " @ "^M"))
                        (and *ContentDisposition (prinl "Content-Disposition: " @ "^M"))
                        (and *UserAgent (prinl "User-Agent: " @ "^M"))
                        (prinl "^M")
                        (cond
                           ((= "subscribe" (lowc *Subject))
                              (prinl "Hello " (or *Name *From) " :-)^M")
                              (prinl "You are now subscribed^M")
                              (prinl "****^M^J^M") )
                           ((= "unsubscribe" (lowc *Subject))
                              (out "Mailings"
                                 (mapc prinl (del *From '*Mailings)) )
                              (prinl "Good bye " (or *Name *From) " :-(^M")
                              (prinl "You are now unsubscribed^M")
                              (prinl "****^M^J^M") ) )
                        (echo "^JFrom ")
                        (prinl "^J-- ^M")
                        (prinl "UNSUBSCRIBE: mailto:" *MailingList "?subject=Unsubscribe^M")
                        (prinl ".^M")
                        (prinl "QUIT^M") ) )
                  (close *Sock) ) ) )
         (out *SpoolFile (rewind)) ) )
   (call "fetchmail" "-as")
   (wait `(* 4 60 1000)) )

# vi:et:ts=3:sw=3

You do not have to understand a lot of Lisp to know how this is configured.
– Line 7 is where you put your mailing list’s address
– Line 8 is where incoming mail for the mailing list is saved. If you have a mail server and you run this there, just run the list as a plain user and point to the user's mailbox under /var/mail. Or, if you run fetchmail, point to the file where fetchmail saves incoming mail; see line 96 for that.
– Related to the above: since I run this on my mail server, I delete line 96.
– Line 9 is my machine’s HELO/EHLO name.
– Line 10 is where the list membership is saved.
– Line 11 is the outgoing mail server.
– Line 12 is the outgoing mail server's SMTP listening port.

Since my own incoming mail server is on a cloud provider and does not enjoy good sending reputation, I am making use of mailgun as a forwarding service. My mail server forwards to mailgun and mailgun delivers to the recipients. If you need to run your own small mail server there are tons of tutorials out there. While I am a die-hard sendmail person, I am running Postfix these days.

The final issue that remains is how to run the mailing list processor. mailing is a console program and you need to run it as a daemon somehow. Docker to the rescue. I am building an image with the following Dockerfile:

FROM debian:buster
RUN apt-get update && apt-get install -y picolisp
WORKDIR /usr/src
COPY ./mailing .
CMD ./mailing
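
Assuming the script above is saved as mailing (and is executable) next to the Dockerfile, the image builds with:

docker build -t foobar_image .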

and I am executing this with:

docker run -d --name=foobar --restart=unless-stopped -v /var/mail/foobar:/var/mail/foobar -v /home/foobar/Mailings:/home/foobar/Mailings foobar_image

I can even inspect the logs with docker logs foobar and have made it resilient through reboots with one command.

There. I hope this gives you some ideas on how to run your own (small) mailing list. With some tinkering, you do not even need to run your own incoming mail server: it can be a mailbox on Gmail or elsewhere, you fetch mails with fetchmail locally, submit to a relay service, and you're done. Also note: relay services require payment after some threshold.

ansible, timezone and JDK8

While one might think that they can change the timezone of a machine with ansible like this:

  tasks:
  - name: set /etc/localtime
    timezone:
      name: UTC

with some JDK apps it is not enough, because the JVM also looks at /etc/timezone. So you need to update that file too, maybe in a cleaner way than:

  - name: set /etc/timezone
    # note: no "creates" guard here; /etc/timezone usually already exists
    # (holding the stale zone) and "creates" would skip the task entirely
    shell: echo UTC > /etc/timezone
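
To double check that the system setting and the file the JVM reads now agree after the play runs:

$ cat /etc/timezone
$ timedatectl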

Kafka, dotnet and SASL_SSL

This is similar to my previous post, only now the question is: how do you connect to a Kafka server using dotnet and SASL_SSL? This is how:

// based on https://github.com/confluentinc/confluent-kafka-dotnet/blob/v1.0.0/examples/Producer/Program.cs

using Confluent.Kafka;
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using System.Collections.Generic;


namespace Confluent.Kafka.Examples.ProducerExample
{
    public class Program
    {
        public static async Task Main(string[] args)
        {
            string topicName = "test-topic";

            var config = new ProducerConfig {
                BootstrapServers = "kafka-server.example.com:19094",
                SecurityProtocol = SecurityProtocol.SaslSsl,
                SslCaLocation = "ca-cert",
                SaslMechanism = SaslMechanism.Plain,
                SaslUsername = "USERNAME",
                SaslPassword = "PASSWORD",
                Acks = Acks.Leader,
                CompressionType = CompressionType.Lz4,
            };

            using (var producer = new ProducerBuilder<string, string>(config).Build())
            {
                for (int i = 0; i < 1000000; i++)
                {
                    var message = $"Event {i}";

                    try
                    {
                        // Note: Awaiting the asynchronous produce request
                        // below prevents flow of execution from proceeding
                        // until the acknowledgement from the broker is
                        // received (at the expense of low throughput).

                        var deliveryReport = await producer.ProduceAsync(topicName, new Message<string, string> { Key = null, Value = message } );
                        // Console.WriteLine($"delivered to: {deliveryReport.TopicPartitionOffset}");

                        // Let's not await then
                        // producer.ProduceAsync(topicName, new Message<string, string> { Key = null, Value = message } );
                        // Console.WriteLine($"Event {i} sent.");
                    }
                    catch (ProduceException<string, string> e)
                    {
                        Console.WriteLine($"failed to deliver message: {e.Message} [{e.Error.Code}]");
                    }
                }

                // producer.Flush(TimeSpan.FromSeconds(120));

                // Since we are producing synchronously, at this point there will be no messages
                // in-flight and no delivery reports waiting to be acknowledged, so there is no
                // need to call producer.Flush before disposing the producer.
            }
        }
    }
}
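
The SslCaLocation above points to a file holding the CA certificate that signed the broker's certificate. If you do not have it at hand, you can at least inspect what the broker presents with openssl (hostname and port taken from the config above):

openssl s_client -connect kafka-server.example.com:19094 -showcerts </dev/null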

Since I am a total .NET newbie, I usually docker run -it --rm microsoft/dotnet and experiment from there.
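
Inside that container, something along these lines gets you to a runnable project (the package name comes from the using directives above):

dotnet new console -o kafka-test && cd kafka-test
dotnet add package Confluent.Kafka
dotnet run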