If you need to host your own GitHub Actions runners to execute your workflows, you have multiple options. You can go the traditional route and host them on your own VMs; you can even host multiple runners on the same VM if you want to increase density and reuse the VM's CPU or RAM capacity. The first downside is that VMs are harder to scale and often a bit more costly to run than other options. An even more important reason not to use VMs is that they are not a great option for having a clean environment for every run: GitHub Actions leaves several files on disk for reuse (the actions that were downloaded and used, for example, or the Docker images you ran on). Even the checkout action only cleans up the source code when it is executed, to make sure the latest changes are checked out. You can include a cleanup action at the end of the workflow, but often that is not added.
Even worse are the potential security pitfalls that come from reusing an environment between runs of a workflow, or between different workflows in different repositories: a first run could leave files behind (from a package manager you use, for example) or overwrite a local Docker image. Subsequent runs on that machine will look in the local cache first and use the (potentially) compromised files. These are examples of the supply chain attacks that are becoming more and more common these days.
To combat those risks, you want ephemeral runners: the environment only exists during the execution of the workflow, and after it is done, everything is cleaned up. This does mean that caching things becomes a little harder. There are ways to work around that, for example with a proxy close to your environment that can do the caching for you (note: this is still a potential risk!).
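As an illustration of such a caching setup, the Docker daemon on the runner host can be pointed at a pull-through cache with a registry mirror in its daemon.json; the mirror URL here is an assumption, a sketch only:

```json
{
  "registry-mirrors": ["https://mirror.mydomain.com"]
}
```

The mirror then serves cached image layers to every ephemeral runner, so a fresh environment does not mean a cold cache.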
Ephemeral runners
GitHub does not offer any support for hosting your runner inside a container with a 'bring your own compute' option. It uses that setup for the GitHub-hosted runners, where a runner environment is created just for your run and destroyed afterwards, but hasn't released anything like it for its customers. When you start looking for options, you will find a community-curated list of awesome runners that lets you host your runners inside k8s, AWS EC2, AWS Lambdas, Docker, GKE, OpenShift or Azure VMs (at the time of writing 😄).
Actions runner controller
The one that got recommended to me was actions-runner-controller: it has an active community (82 contributors, including me now, and lots of stars) with a lot of communication on the issues and discussions lists.
Hosting in Azure Kubernetes Service
To test whether I could get things working with my bare minimum of k8s knowledge, I created an Azure Kubernetes Service cluster with all the defaults and installed the actions-runner-controller in it, following the information in the project's documentation. I even used a GitHub App for authentication, and things worked straight out of the box. You can run just one runner for quick testing, use the built-in scaling options to scale up and down between your limits (for example 0 runners when there is nothing to run and 100 runners as a maximum), or scale up and down based on time windows you define (so scale up to 30 runners at 7 AM on workdays, if your company still follows traditional working hours).
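As a sketch of what the queue-based scaling configuration looks like, actions-runner-controller offers a HorizontalRunnerAutoscaler resource; the target deployment name and repository below are assumptions for illustration:

```yaml
# autoscaler.yaml (sketch; names are assumptions)
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: gh-runner-autoscaler
spec:
  scaleTargetRef:
    name: gh-runner-deployment   # assumes a RunnerDeployment with this name exists
  minReplicas: 0                 # scale to zero when nothing is queued
  maxReplicas: 100
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - robbos/testing-grounds
```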
A remark on scaling
During testing I found that the scaling options for actions-runner-controller have a downside: they can only look at the current queue of workflow runs, not at the set of jobs inside those workflows. That is because GitHub currently does not support loading the queue based on the jobs inside workflows. There is development being done on that side, but I have not seen any progress yet.
Hosting on internal Rancher server
My customer wanted this setup for their own GitHub Enterprise Server (GHES) instance, to have local runners as well. The security team also wanted to do a security check on the setup and didn't want to use AKS for that (and have to notify Microsoft of active pen testing activities on the cluster). They had an internal Rancher setup available that they wanted me to use. The thing is, this cluster was already tightened down a lot: it could only pull internal Docker images, and it had a lot of other restrictions, like using a proxy for all its traffic. This is where things got a little more complicated. Their internal image host was a private registry hosted on Artifactory, and pulling from public container registries was not possible.
Configuring images used by the controller manager
The first thing I ran into was that our Rancher setup sat behind an internal proxy / load balancer that forced all images to be downloaded from an internal Artifactory registry. Depending on the original registry, a lookup was done against the allow-listed images in Artifactory. These were the images we added to the allow list in Artifactory:
- summerwind/actions-runner-controller:v0.18.2
- summerwind/actions-runner:v0.18.2
- quay.io/brancz/kube-rbac-proxy:v0.8.0
- docker:dind
The only image that could not be pulled transparently with our setup was the one from quay.io: this registry was not mirrored transparently, which meant that the label was different in Artifactory. As an initial fix I chose to override the image name manually after the controller had been deployed, using this command:
```shell
kubectl set image deployment/controller-manager kube-rbac-proxy=registry.artifactory.mydomain.com/brancz/kube-rbac-proxy:v0.8.0 -n actions-runner-system
```
This means the controller-manager deployment gets a new image assigned for the container named kube-rbac-proxy and reloads that container. After that, things actually started running and I could see the runner become available on either the organization or repository level.
Docker in docker (DinD) with internal certs
Our Rancher setup used an internal proxy to pull our images from an Artifactory server that was signed with an internal certificate (without a full trust chain). This meant that the Docker client used to pull the images had to be configured to trust the internal certificate as well, or you would only get pull errors due to an untrusted certificate. To accomplish this I built a new DinD container on our VM-based runners, which had the certificates installed locally.
The workflow steps (action.yml) that build the image:

```yaml
- uses: actions/checkout@v2
- name: Build
  run: |
    cd dind
    # certs on RHEL are found here:
    cp -R /etc/pki/ca-trust/source/anchors/ certificates/
    docker build -t ${DIND_NAME}:${TAG} -f Dockerfile --build-arg http_proxy="$http_proxy" --build-arg https_proxy="$http_proxy" --build-arg no_proxy="$no_proxy" .
```
This way we could load the local certificates folder into the image (note that you can also use the commented RUN command below to hard-code a specific certificate):
```dockerfile
FROM docker:dind

# Add the certs from the VM we are running on to this container for secured communication with Artifactory
COPY /certificates /etc/ssl/certs/

# add the crt to the local certs in the image and call system update on it:
#RUN cat /usr/local/share/ca-certificates/docker_registry.crt >> /etc/ssl/certs/ca-certificates.crt
RUN update-ca-certificates
```
Note: I tested with `$DOCKER_TLS_CERTDIR` as well; that didn't work:

```dockerfile
# Add the certs to this image for secured communication with Artifactory
COPY docker_registry.crt $DOCKER_TLS_CERTDIR # Docker should load the certs from here, didn't work
```
Note: I tested with a daemon.json as well; that didn't work either. I added a daemon.json:

```json
{
  "insecure-registries" : []
}
```

And then copied that JSON file over:

```dockerfile
COPY daemon.json /etc/docker/daemon.json # see https://docs.docker.com/registry/insecure/, didn't work
```
Somehow this file seemed to be ignored and pulling the images from the internal registry still failed.
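For completeness: the place Docker is documented to look for per-registry CA certificates is `/etc/docker/certs.d/<registry-hostname>/ca.crt`. I have not verified this in the DinD setup above, but a sketch of that approach would look like:

```dockerfile
FROM docker:dind

# Docker checks /etc/docker/certs.d/<registry>/ca.crt when connecting to that registry over TLS
COPY docker_registry.crt /etc/docker/certs.d/registry.artifactory.mydomain.com/ca.crt
```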
Loading the new DinD image
Loading the new DinD image could not be done with the same 'hack' as used for the image from quay.io. After reaching out to the community, they helped me with overwriting the full controller-manager deployment with both the image from quay.io and the new DinD image:
```yaml
# controller-manager.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: controller-manager
  name: controller-manager
  namespace: actions-runner-system
spec:
  replicas: 1
  selector:
    matchLabels:
      control-plane: controller-manager
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
      containers:
      - args:
        - --metrics-addr=127.0.0.1:8080
        - --enable-leader-election
        - --sync-period=10m
        - --docker-image=registry.artifactory.mydomain.com/actions-runner-dind:latest
        command:
        - /manager
        env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              key: github_token
              name: controller-manager
              optional: true
        - name: GITHUB_APP_ID
          valueFrom:
            secretKeyRef:
              key: github_app_id
              name: controller-manager
              optional: true
        - name: GITHUB_APP_INSTALLATION_ID
          valueFrom:
            secretKeyRef:
              key: github_app_installation_id
              name: controller-manager
              optional: true
        - name: GITHUB_APP_PRIVATE_KEY
          value: /etc/actions-runner-controller/github_app_private_key
        image: summerwind/actions-runner-controller:v0.18.2
        name: manager
        ports:
        - containerPort: 9443
          name: webhook-server
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 20Mi
        volumeMounts:
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
        - mountPath: /etc/actions-runner-controller
          name: controller-manager
          readOnly: true
      - args:
        - --secure-listen-address=0.0.0.0:8443
        - --upstream=http://127.0.0.1:8080/
        - --logtostderr=true
        - --v=10
        image: registry.artifactory.mydomain.com/brancz/kube-rbac-proxy:v0.8.0
        name: kube-rbac-proxy
        ports:
        - containerPort: 8443
          name: https
      terminationGracePeriodSeconds: 10
      volumes:
      - name: cert
        secret:
          defaultMode: 420
          secretName: webhook-server-cert
      - name: controller-manager
        secret:
          secretName: controller-manager
```
After that, we could deploy our own Single-runner.yaml, which has an option to specify the image to use for the runner:
```yaml
# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: gh-runner
spec:
  repository: robbos/testing-grounds
  image: registry.artifactory.mydomain.com/actions-runner # overriding the runner to use our own runner image; the DinD image comes from the controller
```
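If you need more than one of these runners, the project also offers a RunnerDeployment resource that manages a set of runner pods; a minimal sketch, where the name and replica count are assumptions:

```yaml
# runner-deployment.yaml (sketch; name and replicas are assumptions)
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: gh-runner-deployment
spec:
  replicas: 2
  template:
    spec:
      repository: robbos/testing-grounds
      image: registry.artifactory.mydomain.com/actions-runner
```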
Runner fix
I actually had to patch the runner image as well, because our setup had a tmp folder on a different device, which was causing errors during boot-up of the container. I copied the container definition over from the actions-runner-controller repo, fixed the script, published our own image, and told the runner deployment to use it by setting spec.image as in the example above.
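As a sketch, such a patched runner image can start from the upstream image and overwrite the offending script; the script name and path below are hypothetical, not taken from the actual setup:

```dockerfile
# sketch: custom runner image based on the upstream one (script name/path are hypothetical)
FROM summerwind/actions-runner:v0.18.2

# overwrite the startup script with the patched copy from our repo checkout
COPY entrypoint.sh /entrypoint.sh
```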
Other observations
Namespace
Something that took me a while to figure out: the namespace actions-runner-system is hardcoded in all deployment files, so you cannot change it (easily). Keep that in mind if you want to land in a pre-existing namespace with internal pod security policies, for example.
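One unsupported workaround is to rewrite the namespace in the release manifest before applying it. This is a self-contained sketch on a stand-in manifest (the real file contains many more resources, and other references to the namespace, such as in webhook configurations, may still break):

```shell
# stand-in for the real release manifest (the real file has many more resources)
cat > sample-manifest.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: actions-runner-system
EOF

# rewrite the hardcoded namespace everywhere before applying the manifest
sed 's/actions-runner-system/my-runner-namespace/g' sample-manifest.yaml > patched-manifest.yaml
cat patched-manifest.yaml
```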
Community
The community creating these runner setups is active in both building the solutions and helping people out. The fact that the setup I used is actively maintained, and that they supported me with my questions, is a great sign of a good community. Without the community, I would not have been able to get this done, so thanks a lot!