Beware of true in Pod specifications

true is interpreted as a boolean by YAML and this can bite you when you least expect it. Consider the following extremely simple Pod and livenessProbe:

apiVersion: v1
kind: Pod
metadata:
  name: true-test
spec:
  containers:
  - name: nginx
    image: nginx
    livenessProbe:
      exec:
        command:
        - true

Let’s create this, shall we?

$ kubectl create -f true.yaml 

Error from server (BadRequest): error when creating "true.yaml": Pod in version "v1" cannot be handled as a Pod: v1.Pod.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.LivenessProbe: v1.Probe.Handler: Exec: v1.ExecAction.Command: []string: ReadString: expects " or n, but found t, error found in #10 byte of ...|ommand":[true]}},"na|..., bigger context ...|age":"nginx","livenessProbe":{"exec":{"command":[true]}},"name":"nginx"}]}}
|...
$

All because, in the above specification, true is not quoted, as it should be:

    livenessProbe:
      exec:
        command:
        - "true"

A different take, if you do not want to mess with true, would be:

    livenessProbe:
      exec:
        command:
        - exit

Beware though: if you want to exit 0, you need to quote again:

    livenessProbe:
      exec:
        command:
        - exit
        - "0"

Oh, the many ways you can waste your time…

When will those in charge finally learn?

TL;DR: Never.

One of the best lessons I ever got was at one of the Athens ISACA Chapter Infocom conferences. The speaker started by saying something like:

– How many of you here have identified issues that, if they ever blow up, will be real problems, and you see that management does nothing about them?

The audience looked around at each other, because quite simply the answer was: everyone.

The speaker, a consultant at one of the big 4, then said: this happens because management takes a bet: it will not blow up on my shift.

And how long is that shift? Two years? Three? After that they have moved on to another post or retired. When it blows up, it is somebody else's problem, whoever happens to be holding it at the time.

That is why those in charge only care about projects with a ribbon to cut, because those advance their career and set up their next move. A firebreak or a brush clearing, what photos will that give you? None.

Fine, you will say, what about the employee below them, the one who actually stays in the service for years? The employee, conscientious or not, is perfectly covered by the behaviour of the political leadership. When they are asked "mate, what have you been doing all this time?", they will open the drawer, pull out the memos (with protocol numbers) sent to every administration and say "This".

The public sector does not encourage personal risk-taking by its employees (if things go well they do not even get a "well done", because they spoiled the soup; if things go badly, nobody backs them, because they took the risk on their own) and it diffuses responsibility masterfully.

So every time, the person in charge takes a bet: will it blow up while I am sitting in this chair? If yes, let me get ahead of it. Otherwise, who will remember me in two years if I did nothing?

Management avoids errors of commission by making errors of omission.

[Originally a FB post]

Vagrant was unable to mount VirtualBox shared folders

After upgrading my ubuntu/focal64 box I got greeted by this wonderful message:

Vagrant was unable to mount VirtualBox shared folders. This is usually
because the filesystem "vboxsf" is not available. This filesystem is
made available via the VirtualBox Guest Additions and kernel module.
Please verify that these guest additions are properly installed in the
guest. This is not a bug in Vagrant and is usually caused by a faulty
Vagrant box. For context, the command attempted was:

mount -t vboxsf -o uid=1000,gid=1000 vagrant /vagrant

The error output from the command was:

: Invalid argument

The solution was rather simple: a sudo apt-get install -y virtualbox-guest-dkms inside the VirtualBox guest, followed by a vagrant reload on the host.

In case it matters, the VirtualBox host machine was running Ubuntu 21.04.

F1 (random) thoughts

I had not watched an F1 championship for years. Maybe the occasional race once or twice per year. My interest in the sport was renewed by Formula 1: Drive to Survive. It offered a unique (if somewhat reality-TV flavoured) insight into the sport. So I watched the second half of last year’s championship and I am watching the 2021 one too.

I started wondering about the telemetry, monitoring, and observability tools the teams use. After all, using your current understanding of things to understand something new is what we humans do most of the time. I understand monitoring and analytics infrastructures, so I am interested in how people set these up in F1. Atlas 10 was mentioned in my FB feed by a friend.

I then started paying attention to the small advertising stickers on the cars. Not to the usual suspects like Oracle, Kaspersky and Citrix. JuliaHub, what are you doing there? Julia, for those who don’t know about it, is a programming language for scientific computing. Not everything is about Python in computational science. Fascinating.

And then there was the most interesting observation: Tezos. I’ve seen it on McLaren and Red Bull. And to show that advertising works, I once had $5 invested in Ethereum. I converted it to Tezos :)

While you’re here, bored, check out this WIPO decision about the f1.com domain name from once upon a time.

So which Jenkins system am I running on?

It is often the case that you run a staging / test Jenkins server whose jobs are configured identically to the production one. In such cases you want your pipeline to be able to distinguish which system it runs on.

One way to do so is by checking the value of the BUILD_URL environment variable. However, this is not very helpful when you’re running the master inside a container, in which case you get back the container hostname in response.

There are also a number of solutions on StackOverflow you can look at, but you may opt to utilise the fact that you can label each master accordingly and then query the master for the labels it carries. Our solution depends on the httpRequest plugin in order to query the master.

import groovy.json.JsonSlurper

def get_jenkins_master_labels() {
    def response = httpRequest httpMode: 'GET', url: "http://127.0.0.1:8080/computer/(master)/api/json"
    def j = new JsonSlurper().parseText(response.content)
    return j.assignedLabels.name
}

def MASTER_NODE = get_jenkins_master_labels()

pipeline {
    agent {
        label 'docker'
    }
    stages {
        stage("test") {
            steps {
                println MASTER_NODE
            }
        }
    }
}

The trick here is that the part outside of the pipeline { ... } block runs directly on the master, so we can go ahead and call http://127.0.0.1:8080/computer/(master)/api/json to figure out stuff. get_jenkins_master_labels() queries the master and returns a list of all the labels assigned to it (or a single string, master, if no other labels are assigned to it). By checking the values of the list, one can infer which Jenkins environment they are running in and continue from there.

So is any string proper for a docker image tag?

There was a failure for a build in our system that looked like this:

docker build -t yiorgos/my-cool-application:service/1.1.2 .

invalid argument "yiorgos/my-cool-application:service/1.1.2" for "-t, --tag" flag: invalid reference format
See 'docker build --help'.

Sometimes you get this error when you forget the space between the tag and the dot (the current directory, where your Dockerfile usually lives). But this was not the case. Docker actually did not like the tag for the image.

My hunch was that it does not like slashes inside the actual tag part (right of the :).

And indeed, by checking the source code of podman we can see that this is the case:

//	tag                             := /[\w][\w.-]{0,127}/

reference.go contains the full specification of a tag for anyone interested.
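
To see the rule in action, here is a quick check of a few candidate tags against that pattern (a throwaway Python snippet; the pattern is copied from the comment above and the tags are made up):

import re

TAG = re.compile(r'[\w][\w.-]{0,127}')  # the tag pattern from reference.go

for tag in ('service/1.1.2', '1.1.2', 'service-1.1.2'):
    print(tag, bool(TAG.fullmatch(tag)))

# service/1.1.2 False  <- the slash is not in [\w.-], hence the error above
# 1.1.2 True
# service-1.1.2 True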

What does the file $JENKINS_HOME/.owner do?

I have four books on Jenkins and have read numerous posts on the Net that discuss weird Jenkins details and internals (more than I ever wished to know about), but none that explains what the file $JENKINS_HOME/.owner does (even though they include listings like this). I found out about it recently because I was greeted by the message:

Jenkins detected that you appear to be running more than one instance of Jenkins
that share the same home directory. This greatly confuses Jenkins and you will
likely experience strange behaviours, so please correct the situation.

This Jenkins:  1232342241 contextPath="" at 2288@ip-172.31.0.10
Other Jenkins: 863352860 contextPath="" at 1994@ip-172.31.0.14

[Ignore this problem and keep using Jenkins anyway]

Indeed it appears that Jenkins, after initialisation, does run a test to check whether another process already runs from the same directory. When the check is run, it creates the file $JENKINS_HOME/.owner. The .owner part of the name is hardcoded.

Even more interesting is the fact that, in order to avoid having the two processes write information to .owner at the same time, Jenkins randomises when each process is going to write to the file, so even if both processes start at the same time, the chances that their writes coincide are slim.

What does it write in this file, you ask? There you go. When was this feature added? 2008/01/31. The mechanism is documented in the comments of the code:

The mechanism is simple. This class occasionally updates a known file inside the hudson home directory, and whenever it does so, it monitors the timestamp of the file to make sure no one else is updating this file. In this way, while we cannot detect the problem right away, within a reasonable time frame we can detect the collision.
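
As a rough illustration only (this is my own Python sketch of the idea described above, not Jenkins’ actual code; the file path and the timings are invented), the mechanism boils down to something like:

import os
import random
import time

OWNER_FILE = "/var/jenkins_home/.owner"  # hypothetical location, for illustration only

def guard_home_directory(my_identity):
    last_write = None
    while True:
        try:
            mtime = os.path.getmtime(OWNER_FILE)
            # if the file changed since our last write, someone else is updating it
            if last_write is not None and mtime != last_write:
                print("another process appears to share this home directory")
        except FileNotFoundError:
            pass
        with open(OWNER_FILE, "w") as f:
            f.write(my_identity)
        last_write = os.path.getmtime(OWNER_FILE)
        # sleep a randomised interval so two processes are unlikely to write at the same instant
        time.sleep(random.uniform(10, 30))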

You may want to keep that in mind, especially in cases when you’re greeted by the above message but know for a fact that a second process is not running. Perhaps the previous process ended abruptly and you did not take notice. Or indeed a second process is messing with your CI.

“I tell them that there are three mistakes that people make in their careers, and that those three mistakes limit their potential growth.”

The first mistake is not having a five-year plan. I meet so many people who say: I just want to contribute. But that doesn’t necessarily drive you to where you want to go in your career. You have to know: Where do you want to be? What do you want to accomplish? Who do you want to be in five years?

The second mistake is not telling somebody. If you don’t go talk to your boss, if you don’t go talk to your mentors, if you don’t go talk to people who can influence where you want to be, then they don’t know. And they’re not mind readers.

The third thing is you have to have a mentor. You have to have someone who’s watching out, helping you navigate the decision-making processes, how things get done, how you’re perceived from a third-party view.

source

PS: Funny how sometimes I repeat myself

Mass disabling all Jenkins jobs

There are times when you need to disable all jobs on a Jenkins server. Especially when you’ve made a backup copy for testing or other purposes: you do not want jobs to start executing from that second server before you’re ready. Sure, you can start Jenkins in quiet mode, but at some point you have to exit it and scheduled jobs will start running. What can you do?

Well, there are plenty of pages that show Groovy code that allows you to stop jobs, and there are even suggestions to locate and change every config.xml file by running something like sed -i 's/disabled>false/disabled>true/' config.xml on each of them. Or, even better, to use the Configuration Slicing plugin. Firstly, you may feel uneasy mass-changing all config.xml files from a process external to Jenkins. Secondly, the Configuration Slicing plugin does not give you a "select all" option, nor does it handle Multibranch Pipeline jobs. Thirdly, the Groovy scripts I’ve found shared by others online also do not handle Pipelines and Multibranch Pipelines. If you’re based on Multibranch Pipelines, you’re kind of stuck then. Or you have to go and manually disable each one of them.

Thankfully there’s a solution using Jenkins’s REST API and python-jenkins. An example that connects to the server and cancels any builds already waiting in the queue follows:

import jenkins

server = jenkins.Jenkins('http://127.0.0.1:8080', username='USERNAME', password='PASSWORD_OR_TOKEN')
#print(server.jobs_count())

queue_info = server.get_queue_info()
for i in range(len(queue_info)):
    print(queue_info[i]['id'])
    server.cancel_queue(queue_info[i]['id'])
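
To disable the jobs themselves, a minimal sketch along the same lines could use get_all_jobs() and disable_job() from python-jenkins; as far as I can tell get_all_jobs() also walks into folders and Multibranch Pipeline projects, but treat this as a starting point and test it against your own setup:

import jenkins

server = jenkins.Jenkins('http://127.0.0.1:8080', username='USERNAME', password='PASSWORD_OR_TOKEN')

# walk every job, including those nested in folders / Multibranch Pipeline projects
for job in server.get_all_jobs():
    if 'Folder' in job.get('_class', ''):
        continue  # skip plain folders, which cannot be disabled themselves
    print('disabling', job['fullname'])
    server.disable_job(job['fullname'])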

I hope it helps you out maintaining your Jenkins.