OAuth2 Explained

I’ve recently been involved with implementing OAuth2 at my current position. Along the way, I’ve had to read through the OAuth2 specifications and audit our system for compliance. Compliance is pretty cut and dried: it either works, or it doesn’t. Since it was working and nobody on the team understood it, I decided to document our OAuth2 flow through our system when a user attempts to access a protected resource. This proved extremely valuable for every member of the team in understanding the entire flow. When OAuth2-related problems did arise or something needed to be tweaked, this document was invaluable in helping the team understand the problem and address it quickly. We pride ourselves on going fast… We are Go developers after all.

What is OAuth2?

You may have heard the term before, but you have definitely used it. Whenever you sign into something using your social media account (Google, Facebook, etc.), you are most likely using OAuth2. One of the main hallmarks of OAuth2 is granting another service access to some piece of information on your social media account. When you decide to log into a website using your Facebook account, Facebook asks you some questions. These questions are the core of what’s important in OAuth2: delegation. They are OAuth2’s way of asking “Do you trust this third party, and are you OK sharing this data with it?” Facebook is making you aware of what the third party is requesting and asking whether you are OK with that. You can generally decide which permissions you want to delegate to the app you are logging into, and you get to see what the third party needs in order to operate. You can certainly tell a third party is mining data if it asks for everything about you simply to log you in. This is a great layer of transparency! When you agree to these permissions, you are delegating access to this third party. If you’re alarmed, you should be.

One of the many problems OAuth2 looks to address is transparency. Knowing exactly what has access to your data and putting that access under your control is one of the many, many reasons I prefer OAuth2. When I want to revoke access, I can! I simply log into my Facebook account (or whatever account I used to log in) and see what has access to it. I can revoke access to anything I want immediately. This control is precisely why OAuth2 has become omnipresent: fine-grained control of third-party access to your private data is in your hands.

Now, this isn’t necessarily the case everywhere. Some OAuth2 providers (like Facebook and Google) provide well-documented OAuth2 APIs. These APIs define the backbone of access control: scopes. Each OAuth2 authorization request includes a list of these scopes, and each scope represents pre-defined access to your account data. Some of them (like Facebook’s email scope) are generally benign and required as something equivalent to a username so the third party knows who you are within its own system. Others (like Google’s `https://www.googleapis.com/auth/admin.datatransfer` scope) may be used maliciously. In Google’s case, this particular scope allows the relying party to “View and manage data transfers between users in your organization.” Sounds pretty bad, right? This is why you have the opportunity to review the requested scopes when granting (or denying) authorization to any third party. You get to audit permissions before granting them! What a wonderful solution! So, if you are using an app that manages your Google organization, this scope is perfectly normal. However, there is a caveat with this auditing: in certain circumstances, scopes may not be granular enough. A single scope may hand over more protected data than you expect.
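
To make scopes concrete, here is a minimal sketch of building an authorization request with Go’s golang.org/x/oauth2 package. This isn’t our production code, and the client credentials, endpoints, and scope names are placeholders rather than any real provider’s values:

package main

import (
    "fmt"

    "golang.org/x/oauth2"
)

func main() {
    // Hypothetical client registration and endpoints, for illustration only.
    conf := &oauth2.Config{
        ClientID:     "example-client-id",
        ClientSecret: "example-client-secret",
        RedirectURL:  "https://example.com/oauth2/callback",
        // The scopes are the access being requested; the provider shows these
        // to the user on the consent screen.
        Scopes: []string{"email", "profile"},
        Endpoint: oauth2.Endpoint{
            AuthURL:  "https://provider.example.com/oauth2/authorize",
            TokenURL: "https://provider.example.com/oauth2/token",
        },
    }

    // The user is sent to this URL, where the provider lists the requested
    // scopes and asks them to grant or deny the delegation.
    fmt.Println(conf.AuthCodeURL("random-state-value"))
}

The URL this prints is exactly the page where you get to review those requested scopes before granting or denying access.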

When a non-granular scope is used, more information than the user expects is shared. This appears to have been the case in Facebook’s Cambridge Analytica fiasco. Facebook has since addressed the issue, but it looks like a permissions problem related to third-party access and overly broad scopes. This is all speculation, but it appears that one or more scopes provided access to more information than they should have. That loophole would have allowed a third party to access and mine data that was never intended to be shared but was ultimately accessible. This particular case did not provide transparency into data access. But fear not! There is definitely something you can do about some of this!

Revoking previously granted access to third parties involves logging into your provider account (Facebook, Google, etc.) and checking on existing access. For Facebook, check your Account Settings for Apps and Websites (for Google check here). You can review which sites and apps have access to your data and what they have access to. You can also see which access has expired and which has previously been removed. You should review these periodically for anything you don’t use anymore. Apps and websites you no longer use may still have access to your data! Cleaning this up takes about a minute and helps lock down your data. While this won’t prevent data leakage (as in the Cambridge Analytica scenario), it will maximize protection of your account details.

Now that you know how authorization delegation works, let’s look at how OAuth2 keeps it secure: third-party websites and applications must pre-register with the OAuth2 provider. This creates trust between the third party and the provider and gives the provider ultimate control, since it can revoke ALL of the third party’s access by blocking its OAuth2 account. Once this trust is established, the third party is issued an ID and a Secret. These two pieces of information are heavily used in the various OAuth2 flows.

As a third party, you can obtain Access Tokens in a number of ways, called grants. The preferred flow is the Code Grant (detailed below). Others include the Implicit, Resource Owner Password Credentials, and Client Credentials grants. I won’t cover these in this article, but you can read more about them in the OAuth2 Specifications.

Code Grant

The Code Grant. The most secure grant for all parties. The third party begins the OAuth2 delegation by directing the user to the OAuth2 provider, where they are prompted to log in and accept (or deny) the requested permissions (via scopes). If you look at the URL, you can see the Client ID, the grant type, and the requested scopes. Once accepted (or denied), the OAuth2 provider redirects the user back to the third party’s callback handler with an OAuth2 code. The third party must exchange this code for an Access Token by presenting the code it received, the ID it registered with the OAuth2 provider, and the secret the OAuth2 provider issued to it. This proves to the OAuth2 provider that the third party initiated the call and that it isn’t an attempt to hijack an OAuth2 code. Once the exchange is received and validated by the OAuth2 provider, the third party receives an Access Token and an optional Refresh Token.

A simplified view of the OAuth2 Code Grant Flow
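
Here is a rough sketch of the third party’s side of that exchange, again using Go’s golang.org/x/oauth2 package with the same hypothetical registration as above. A real handler would also verify the state parameter before exchanging the code:

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"

    "golang.org/x/oauth2"
)

// conf mirrors the hypothetical registration from the previous sketch.
var conf = &oauth2.Config{
    ClientID:     "example-client-id",
    ClientSecret: "example-client-secret",
    RedirectURL:  "https://example.com/oauth2/callback",
    Scopes:       []string{"email"},
    Endpoint: oauth2.Endpoint{
        AuthURL:  "https://provider.example.com/oauth2/authorize",
        TokenURL: "https://provider.example.com/oauth2/token",
    },
}

// callback handles the redirect from the OAuth2 provider and exchanges the
// short-lived code for an Access Token (and optional Refresh Token).
func callback(w http.ResponseWriter, r *http.Request) {
    code := r.URL.Query().Get("code")
    // Exchange posts the code, client ID, and client secret to the token endpoint.
    tok, err := conf.Exchange(context.Background(), code)
    if err != nil {
        http.Error(w, "token exchange failed", http.StatusBadGateway)
        return
    }
    fmt.Fprintf(w, "access token expires at %s; refresh token present: %t",
        tok.Expiry, tok.RefreshToken != "")
}

func main() {
    http.HandleFunc("/oauth2/callback", callback)
    log.Fatal(http.ListenAndServe(":8080", nil))
}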

Refreshing

The Access Token has a short life, the exact length depending on the OAuth2 provider. The Refresh Token generally has a longer life (around two weeks in many cases). This means the third party has access to the data you agreed to share whenever it wants to retrieve it, until the Access Token expires (or is revoked by you security-minded people out there). Once an Access Token expires (or is revoked), the third party may attempt to renew it using the Refresh Token by providing the client ID, secret, and Refresh Token to the OAuth2 provider. If access has not been revoked, a new Access Token (and optional Refresh Token) is issued, renewing the life of the access. If access has been revoked, neither an Access Token nor a Refresh Token is returned and the third party is notified that it is no longer authorized. If the Refresh Token is not used before it expires, the entire grant expires and the user will need to log in and accept the permissions again.

OAuth2 Refresh Flow
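
In Go, the golang.org/x/oauth2 package wraps this renewal in a TokenSource. A minimal sketch, once more with placeholder credentials and endpoints rather than any real provider’s values:

package main

import (
    "context"
    "fmt"
    "log"

    "golang.org/x/oauth2"
)

func main() {
    // Same hypothetical registration as in the earlier sketches.
    conf := &oauth2.Config{
        ClientID:     "example-client-id",
        ClientSecret: "example-client-secret",
        Endpoint: oauth2.Endpoint{
            AuthURL:  "https://provider.example.com/oauth2/authorize",
            TokenURL: "https://provider.example.com/oauth2/token",
        },
    }

    // A previously stored token whose Access Token has expired but whose
    // Refresh Token is still valid.
    stored := &oauth2.Token{RefreshToken: "stored-refresh-token"}

    // The TokenSource posts the client ID, secret, and Refresh Token to the
    // token endpoint whenever the Access Token needs renewing.
    tok, err := conf.TokenSource(context.Background(), stored).Token()
    if err != nil {
        log.Fatalf("refresh failed; access has likely been revoked: %v", err)
    }
    fmt.Println("new access token valid until", tok.Expiry)
}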

Refresh Tokens have a longer expiration date; Facebook’s is 90 days. Any activity on Facebook through that app can renew the Refresh Token. A third party could access your data every day to keep it up to date, which automatically renews the Access Token on Facebook’s servers, and a new Refresh Token can be requested just as often to keep that from expiring as well. I know I haven’t used Charm Farm in forever, but its access still hasn’t expired on Facebook!

Improvements

I could see an addendum to the OAuth2 spec that increases transparency by allowing an optional expiration the user can change on the permission review page, or by OAuth2 providers letting users adjust the Access Token and Refresh Token expiration dates. This wouldn’t hinder the operation of an app or website, but it would have the added effect of automatically expiring application access you don’t use often, at the option of the user. It would also lessen the burden of requiring the user to revoke access for long-forgotten apps that are still mooching personal data because they have kept up the refresh cycle. Anything free costs something. OAuth2 has made it easier to see that the cost is your information. Putting more control into users’ hands is the logical next step. The best thing you can do right now to stay safe is to be vigilant. Periodically review the access other apps and websites have to your data on all your social media accounts. This will reduce the avenues available for leaking private information.

Managed Website

As I previously mentioned, I recently wiped the host machine for my website. Go ahead, check out the Wayback Machine for it. From beginning to end it has been rather static, yet it consumed an entire remote server all by itself. During this “reset” I wanted to utilize my server more fully. I wanted to do this using Docker and isolate any state (the database behind the article you are reading right now) in a second storage container so that the main container is easily managed in the event things go sideways. This article explains how I did this using Docker and GitLab’s Continuous Integration framework.

Plan the Work

From the start I knew I wanted a WordPress website. WordPress uses MariaDB (a community-developed fork of MySQL) for storage. So, reasonably, I’d want a container for WordPress and a container for MariaDB to separate responsibilities. Conveniently, images for both already exist. We are going to keep the whole orchestration simple by using a single GitLab CI script. To keep things even simpler, there is only one step: deploy. This step handles everything required to deploy the website as an entire solution. This entails:

  1. Create the Storage Container
  2. Create the WordPress Container
  3. Create a Backup Container with scheduled backup job
  4. Create a Dropbox Container with scheduled upload job

Trust me, it sounds way more complicated than it is. We are simply creating 4 containers and 2 volumes then doing some periodic shuffling of data.

Work the Plan

Now that we know all that we need to do, let’s automate it! Our GitLab Runner is a shell-based Runner. This means the commands run on the host computer and not within a Docker-in-Docker container. This lets us create and manipulate containers on the Docker host. Pretty handy for automation! Let’s focus on the script block…

Create the Storage Container

The first step in launching any container is to create it. We create the Storage Container in 5 lines:

  - docker pull mariadb:latest
  - docker stop -t 0 ${STORAGE_CONTAINER} || true
  - docker rm ${STORAGE_CONTAINER} || true
  - docker volume create ${STORAGE_VOLUME}
  - docker run -d --name ${STORAGE_CONTAINER} -v ${STORAGE_VOLUME}:/var/lib/mysql -e "MYSQL_ROOT_PASSWORD=${DB_ROOT_PASSWORD}" -e "MYSQL_DATABASE=${DB_NAME}" -e "MYSQL_USER=${DB_USER}" -e "MYSQL_PASSWORD=${DB_PASSWORD}" --restart always mariadb:latest

On lines 1 – 3 we update the MariaDB image, stop any existing storage container, then remove the stopped container. This sets up the Docker host for creating a brand spanking new container hot off the updated MariaDB image in the Docker repository. In the next step (line 4) we create a new named volume; if this volume already exists, nothing happens. Finally, on line 5, we start a new container with the storage volume mounted to it.

Create the WordPress Container

In a very similar process, we create the WordPress container. This container will simply host the WordPress installation and be configured to talk to the MariaDB database.

  - docker pull wordpress:latest
  - docker stop -t 0 ${WEBSITE_CONTAINER} || true
  - docker rm ${WEBSITE_CONTAINER} || true
  - docker volume create ${WEBSITE_VOLUME}
  - docker run -d -P --link ${STORAGE_CONTAINER}:mysql --name ${WEBSITE_CONTAINER} -v ${WEBSITE_VOLUME}:/var/www/html/wp-content -e WORDPRESS_DB_USER=${DB_USER} -e "WORDPRESS_DB_PASSWORD=${DB_PASSWORD}" -e WORDPRESS_DB_HOST=mysql -e WORDPRESS_DB_NAME=${DB_NAME} -e LETSENCRYPT_EMAIL=${EMAIL} -e "LETSENCRYPT_HOST=${HOSTS}" -e "VIRTUAL_HOST=${HOSTS}" --restart always wordpress:latest

Once again, lines 1 – 3 update the WordPress image from the Docker repository, stop any existing container, then remove the stopped container. On line 4 we create a new named volume before finally launching a new WordPress container. Line 5 is a little different though. Here, we actually link the Storage Container to the WordPress container. This creates a private network between the two for their traffic, which keeps the Storage Container inaccessible from the Internet at large while still allowing the WordPress container to communicate with the database it hosts. The volume we mount to this container stores plugins and uploads and has nothing to do with the database. Oh, and the container starts with configuration to automatically integrate with our NGINX proxy service that handles automatic SSL configuration (maybe a post later on that one)!

Create a Backup Container with scheduled backup job

What’s a website if it’s not backed up? Risk. So, let’s reduce that! Since we have modular storage for each of the containers we are running, we can mount those volumes to other containers as well. This, admittedly, is a bit risky, and I don’t recommend it for very busy sites for the simple reason that the data being backed up may be written to while it’s being read for backup. That can obviously corrupt a backup or cause a web request to fail.

  - docker pull aveltens/wordpress-backup:latest
  - docker stop -t 0 ${BACKUP_CONTAINER} || true
  - docker rm ${BACKUP_CONTAINER} || true
  - docker run --name ${BACKUP_CONTAINER} -v ${BACKUP_VOLUME}:/backups --volumes-from=${WEBSITE_CONTAINER} --link=${STORAGE_CONTAINER}:mysql -e "BACKUP_TIME=0 5 * * *" -d aveltens/wordpress-backup:latest

Lines 1 – 3 update the container image, stop any existing Backup container, then clean up. Line 4 starts a new Backup container with the Backup volume mounted. It also mounts every volume attached to the WordPress container and links to the MariaDB container. This lets the Backup container copy files from the volumes mounted on the WordPress container and gives it database access to MariaDB. The image we use automatically compresses files from the mounted volumes and performs a database dump to a text file before compressing that as well. This all runs on a configurable schedule defined by the BACKUP_TIME environment variable. At each scheduled run (in this case daily at 5:00 AM) these backup files are created and copied to the backups directory, to which we conveniently mounted the Backup Volume.

Create a Dropbox Container with scheduled upload job

Now that we have a Backup Volume containing all our important data, we need to store that somewhere less volatile. I picked Dropbox since it has free storage and I don’t intend to keep every backup I make. Thankfully, there’s a Docker image for that!

    - docker pull janeczku/dropbox:latest
    - docker stop -t 0 ${DROPBOX_CONTAINER} || true
    - docker rm ${DROPBOX_CONTAINER} || true
    - docker run -d --restart=always --name=${DROPBOX_CONTAINER} -v ${BACKUP_VOLUME}:/dbox/Dropbox janeczku/dropbox

Once again, lines 1 – 3 are all about updating, stopping, and cleaning up. Line 4 starts the container using the Dropbox image. We mount the Backup Volume to the Dropbox directory. The Dropbox directory is linked with my Dropbox account, and anything placed in this directory is automatically uploaded to the cloud. Voila! We have automated backups! Since this container automatically removes backups older than 90 days, I don’t have to worry for a while about reaching the storage limit of Dropbox’s free tier.

Pulling It All Together

We have several pieces of our build to orchestrate: the website container, the database, the backup, and the upload. These are all handled in about 20 lines of code. While this isn’t the smoothest of processes, I haven’t had any problems and have been able to replicate it for the other websites I host. This gives me a managed method of automating deployments. It’s pretty cheap to put together and I’m sure it has some room for improvement. If you have any ideas on how to improve this solution, share them with the world! Leave a comment below and I will most likely see if I can incorporate your idea into the next iteration.

Hacktoberfest: Minecraft Server

Hacktoberfest is upon us! This month I’m hacking on small projects each week and sharing them.

Previously…

A few weeks ago we found a problem with our GitLab Runner and fixed it. This week, we attempted to make a persistent Minecraft Server using a Dockerfile and the new GitLab Runner to deploy it. We hope to get backups running on the Minecraft Server.

The Minecraft Server we managed to get working was running great… until we realized it didn’t have any backups. We tried looking for something akin to the WordPress Backup container solution. This didn’t quite pan out, as it required a bit of container-to-container communication. I’d like to scale Minecraft hosting out, so while that is a solution, it isn’t a very clean one. Plus, I don’t really want to rely on a third party to update the Dockerfile. So, here we are.

Redefined Requirements

Knowing what we want is half the battle. Figuring out how to do it is the actual hard part. So, we kicked back, grabbed some cookies, and started to think. What do we really want in a perfect Minecraft Server?

  1. We want maximum uptime. If there’s an update, rebooting should pick it up. Done!
  2. We want security. If we need to ban someone or whitelist someone this should persist across reboots. TODO
  3. We want safety. Rebooting should reload the existing world. If something corrupts it, we should be able to recover from a previous backup. TODO

Safety First

For this week, we focused on safety. We want to save our hard work building amazing things so we don’t lose it unexpectedly. To do this, we need to safely stop the auto-save, manually save the world state, back up all of the world files, then start the auto-save again. This is ideally a scheduled task that kicks off every day (or hour). To have the server interact with Minecraft, we need some sort of RCON utility. So, we leveraged our new-fangled GitLab Runner to help us out.

Getting an RCON utility into a Docker image seemed rather straightforward: go get it, make it available to the build context, then copy it into the image, giving it executable permissions. Seems easy enough; we can even use GitLab artifacting since it’s in the same pipeline!

rcon-setup:
  stage: stage
  image: golang:latest
  script:
    - "go get github.com/SeerUK/minecraft-rcon/..."
    - "go install github.com/SeerUK/minecraft-rcon/..."
    - "mkdir bin"
    - "cp $GOPATH/bin/minecraft-rcon ./bin"
  artifacts:
    paths:
      - bin/

Here we have a stage (conveniently called stage; I know, so creative!) that runs on the latest Go container and simply pulls the source code into the local Go source path, then compiles and installs the binary to the Go binary path. We copy it to the bin directory and artifact it! Now the artifact is in GitLab and available to downstream jobs. Let’s build the Docker image!

build:
  stage: build
  image: docker:latest
  dependencies:
    - rcon-setup
  services:
    - docker:dind
  before_script:
    - "docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY"
  script:
    - "docker build -t ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_NAME} --pull ."
    - "docker push ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_NAME}"
  after_script:
    - "docker logout ${CI_REGISTRY}"
  tags:
    - docker

Simple stuff here. Let’s take a look at the Dockerfile itself.

FROM alpine:latest
ARG MC_VERSION=1.13.1
ARG MC_JAR_SHA1=fe123682e9cb30031eae351764f653500b7396c9
ARG JAR_URL=https://launcher.mojang.com/mc/game/${MC_VERSION}/server/${MC_JAR_SHA1}/server.jar
ARG MIN_MEMORY='256M'
ARG MAX_MEMORY='1024M'
ARG MC_CLIENT="c2technology"
ENV CLIENT ${MC_CLIENT}
ENV _JAVA_OPTIONS '-Xms${MIN_MEMORY} -Xmx${MAX_MEMORY}'
RUN mkdir -pv /opt/minecraft /etc/minecraft
RUN adduser -DHs /sbin/nologin minecraft
COPY bin/minecraft-rcon /usr/bin/minecraft-rcon
COPY backup /usr/bin
COPY entrypoint.sh /etc/minecraft
RUN apk add --update ca-certificates openjdk8-jre-base tzdata wget \
    && wget -O /opt/minecraft/minecraft_server.jar ${JAR_URL} \
    && apk del --purge wget \
    && rm -rf /var/cache/apk/* \
    && chown -R minecraft:minecraft /etc/minecraft /opt/minecraft \
    && chmod +x /etc/minecraft/entrypoint.sh
EXPOSE 25565
USER minecraft
WORKDIR /etc/minecraft
ENTRYPOINT ["./entrypoint.sh"]

Starting from a minimal Alpine Linux image, we set some build arguments in the Dockerfile. These can be overridden by arguments passed to the docker build command, but they must be declared in the Dockerfile in order to be overridden, and we have some reasonably safe defaults here. We set some environment variables for the resulting container, make a couple of directories, add a user, then copy in the RCON Go binary (from the artifacts placed into the Docker build context by GitLab’s artifact system) as well as the backup script we wrote. Then we install some dependencies, download the server jar, expose the Minecraft server port, switch to the minecraft user, set the working directory, and finally run the entrypoint.sh script. Let’s take a look at that entrypoint.

#!/bin/sh
echo 'eula=true' > /etc/minecraft/eula.txt
crontab -l | { cat; echo "0 */6 * * * backup"; } | crontab -
java -jar /opt/minecraft/minecraft_server.jar nogui

Not too complicated. This auto-accepts the EULA (Minecraft requires this to run), then sets up a cron job that runs every 6 hours to execute the backup command, and finally starts the Minecraft server. This is what we wanted to be able to do in the first place: back things up on a schedule. We could make the backup interval configurable, which we will most likely do after we get this thing working (this is Hacktoberfest after all). So… let’s take a look at that backup script.

#!/bin/sh
minecraft-rcon save-off
minecraft-rcon save-all
tar czf /opt/backups/$(date +%Y-%m-%d)-mc-${CLIENT}.tar.gz /opt/minecraft/
minecraft-rcon save-on

Easy peasy! Using that new minecraft-rcon binary, we turn off automatic saving of the Minecraft world so we can access it without it changing on us (and corrupting the backup). We make one final save, tar it all up, then turn automatic saving back on. This seems to be the right way to avoid corrupting the world or saving a corrupted version. We’ll see if this actually works when we get it running. If not, this is the file we can update to make it work correctly, even if that means stopping the Minecraft service and restarting it.

Now that we have the Docker image published to our registry, we can update the existing Minecraft Server YAML to use it!

deploy:
  script:
    - docker pull minecraft-docker:latest
    - docker exec minecraft backup
    - docker stop minecraft || true
    - docker rm minecraft || true
    - docker run -d --name minecraft -p 25565:25565 \
        -v minecraft-world:/opt/minecraft/data/world \
        -v minecraft-config:/opt/minecraft/config \
        -v minecraft-mods:/opt/minecraft/mods \
        -v minecraft-plugins:/opt/minecraft/plugins \
        --restart always minecraft-docker:latest
    - docker cp ./config/* minecraft:/opt/minecraft/config/
    - docker cp ./data/* minecraft:/opt/minecraft/data/
    - docker cp ./mods/* minecraft:/opt/minecraft/mods/
    - docker cp ./plugins/* minecraft:/opt/minecraft/plugins/
    - docker exec minecraft backup
    - docker restart minecraft

We kick things off by pulling the latest minecraft-docker image. This pulls the private repository image we just published into the local Docker-in-Docker container that’s running this build. Then we back up the existing world (if it exists) before stopping the current Minecraft server. After that, we remove it and create a new container with various mounts. We then copy over the configurations and anything else we have version controlled before backing it up once again and restarting it. We back it up so many times right now because we’re not sure if this will corrupt the world data. Once we know what happens, we will come back and clean this up a bit.

Conclusion

Ultimately, we didn’t hit our goal to get this working in a week. However, we will continue to work on this so our world can be saved (if only it were that easy)! If you have any tips or thoughts on this, please comment below! I’d love to hear about your solutions or for you to share your experience if you’ve done something similar.

Hacktoberfest: Gitlab Artifacts

Hacktoberfest is upon us! This month I’m hacking on small projects each week and sharing them.

Background

GitLab is a great alternative to GitHub. One of the main features for me is unlimited private repositories for free. This lets me work on things without exposing them publicly until I’m ready. In addition to private repositories, it also has a private Docker Registry you can store your Docker images in. GitLab also has other built-in CI/CD capabilities, like secrets that are passed to the CI/CD orchestration file. GitHub has CI/CD capabilities too, but GitLab seems less involved to set up.

All of my CI/CD jobs are orchestrated in a .gitlab-ci.yml file that sits in the repository. Couple this with a self-hosted GitLab Runner with Docker installed and I have a true CI/CD solution: tagging master triggers an automatic build of the tagged code, publishes the Docker image built during the job, and deploys that Docker image to a container (replacing the existing one if present). While this does require some thought about how to persist data across deployments, it makes automatic deployments very easy. In the event something does go wrong (and it will), you can easily re-run any previous build, making rollbacks a simple one-click affair. Repeatability for the win!

Artifacts

So, this week I was attempting to learn the Artifact system of the GitLab API. Instead of mounting a directory to pass files between jobs, GitLab has an Artifacts API that allows GitLab to store (permanently or temporarily) any number of artifacts defined in a successful pipeline execution. These artifacts are available via the web and the Artifacts API. I have several Go projects that could benefit from cross-compiling the binaries. Why not store these compilations so they are easy to grab whenever I need them for a specific environment? As an added benefit, after compiling once, I could deploy to various environments using these artifacts in downstream jobs. So, I jumped at this chance and found it less than ideal.

The Job

There is a Job API document defining how to create artifacts within a pipeline. It looks as simple as defining artifacts in your .gitlab-ci.yml file:

artifacts:
  paths:
    - dist/

Creating artifacts is the easiest bit I came across. This automatically uploads the contents of the dist folder to the orchestration service, which makes it available on the GitLab site for that specific job. Easy peasy!

Getting those artifacts to a downstream job is pretty easy as well if you keep in mind the concept of a build context and have navigated GitLab’s various API documents. Thankfully, I’ve done that boring part and will explain (with examples) how to get artifacts working in your GitLab project!

The Breakdown

The API documentation that shows how to upload artifacts also shows how to download them. How convenient! Unfortunately, this is not within the framework of the .gitlab-ci.yml file. The only API documentation you should need for multi-job, single-pipeline artifact sharing is here: YAML API. For other uses (cross-pipeline or scripted artifact downloads) you can see the Jobs API (warning: cross-pipeline artifacts are a premium-only feature at the time of writing).

Looking at the Dependencies section, the dependencies definition should be used in conjunction with artifacts. Declaring a dependency on another job causes ordered execution, and any artifacts from that job are downloaded and extracted into the current build context. Here’s an example:

dependencies:
  - job-name

So, what’s a build context? It’s the thing that’s sent to Docker when a build is triggered. If the artifacts aren’t part of the build context, they won’t be available for a Dockerfile to access (COPY, etc).

Here’s the .gitlab-ci.yml example:

stages:
  - build
  - release

build-binary:
  stage: build
  image: golang:latest
  variables:
    PROJECT_DIR: "/go/src/gitlab.com/c2technology"
  before_script:
    - mkdir -p ${PROJECT_DIR}
    - cp -r $CI_PROJECT_DIR ${PROJECT_DIR}/${CI_PROJECT_NAME}
    - go get github.com/tools/godep
    - go install github.com/tools/godep
    - cd ${PROJECT_DIR}/${CI_PROJECT_NAME}
    - godep restore ./...
  script:
    - ./crosscompile.sh
  after_script:
    - cp -r ${PROJECT_DIR}/${CI_PROJECT_NAME}/dist $CI_PROJECT_DIR
  tags:
    - docker
  artifacts:
    paths:
      - dist/
    expire_in: 2 hours

publish-image:
  stage: release
  image: docker:latest
  services:
    - docker:dind
  only:
    - "master"
  variables:
    DOCKER_HOST: tcp://docker:2375/
    DOCKER_DRIVER: overlay2
  before_script:
    - "docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY"
  script:
    - docker build -t ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_NAME} --pull $CI_PROJECT_DIR
    - docker push ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_NAME}
  after_script:
    - "docker logout ${CI_REGISTRY}"
  tags:
    - docker
  dependencies:
    - build-binary

And here’s the crosscompile.sh:

#!/bin/sh
echo "Cross compiling alexa-bot..."
for GOOS in darwin linux windows; do
  for GOARCH in 386 amd64; do
    echo "Building $GOOS-$GOARCH"
    export GOOS=$GOOS
    export GOARCH=$GOARCH
    go build -o dist/alexa-bot-$GOOS-$GOARCH
  done
done
echo "Complete!"

In this example, we cross-compile a Go binary and artifact it in the build-binary job before executing the publish-image job (which depends on build-binary). The publish-image job downloads and extracts the artifacts from build-binary and sends the project directory (including the downloaded artifacts) as the build context to Docker when building the Dockerfile. Let’s look at the Dockerfile:

FROM alpine:latest
RUN apk add --no-cache --update ca-certificates
COPY dist/alexa-bot-linux-amd64 /bin/alexa-bot
RUN chmod +x /bin/alexa-bot
EXPOSE 443
# Run the binary with all the options (shell form so the environment variables expand at runtime)
ENTRYPOINT alexa-bot -c "$CLIENT_ID" -s "$CLIENT_SECRET" -x "$SERVER_SECRET" -t "$PRODUCT_TYPE_ID" -n 123 -v "$VERSION" -r "$REDIRECT_URL" -z "$TOKEN_URL"

You can see here that the Dockerfile starts with Alpine Linux as the base image, installs the CA certificates, copies the dist/alexa-bot-linux-amd64 binary from the Docker build context into the image, and gives it executable permissions. The rest of the file exposes the port the binary will listen on and passes the runtime configuration to the binary.

Once this Docker image is pushed to GitLab’s private registry, it is available to run (provided some environment configuration)!

Conclusion

All in all, this seems to work out great for single-pipeline builds. When you get into multi-pipeline builds, things get trickier. I found that the artifacts system didn’t quite meet my requirements, so I opted to condense the builds and not artifact the compiled binaries. I could, however, trigger a pipeline that cross-compiles and artifacts the binaries, then run a second pipeline that also cross-compiles the binaries (duplicated work) before creating an image. Ultimately, I didn’t really care about the artifacts, as my purpose was always to create the Docker images. As usual, your mileage may vary!