Continuous Integration is an organisational problem

After 10 years in DevSecOps, I still encounter the assumption that every project is unique and needs to deploy its own Continuous Integration (CI) instance and write its own Continuous Integration/Continuous Deployment (CI/CD) pipelines.


However, a CI pipeline should produce a build artefact which is then supplied to a CD pipeline, and CD pipelines can be triggered by various means. This allows you to manage CI and CD pipelines separately.

While the mechanism and configuration of a deployed product can vary greatly between software products, CI pipelines all face the same constraints, which leaves each project very little room to vary the process it implements. This blog post outlines the reasoning behind that statement.

There are only so many build systems

Modern software languages have build automation and dependency management systems, which aim to automate the mundane tasks of building, testing and releasing a software project. The result of a build automation system is a 'build artefact': a container/wrapper for the software which can be executed or referenced by other software projects.

The key advantages of using a common system over a bespoke solution are that it becomes easier to hire staff with those skills, you don't have to maintain the build automation system yourself, and external training and support options exist. As a result, every programming language tends to converge on a few build automation solutions. Below are some examples, although Wikipedia has a complete list.

C      Java          Node.JS  Python      Ruby
CMake  Apache Ivy    NPM      Anaconda    RubyGems
Meson  Apache Maven  Yarn     Setuptools
       Gradle


Build systems are very similar in how they work

From Ivy to Setuptools, each build automation system expects a configuration file (ivy.xml, package.json, *.gemspec, setup.py, etc.) with the details required to build, test and release a software project. This means any Continuous Integration pipeline for language X and build automation system Y should call the same sequence of commands. Again, I've provided a table below to show some examples.


                       Maven                  NPM                         SetupTools
Retrieve Dependencies  mvn process-resources  npm install                 python setup.py build
Compile                mvn compile            npm run build --if-present  N/A
Test                   mvn test               npm test                    python setup.py test
                       mvn integration-test
Package                mvn package            N/A                         N/A
Publish                mvn deploy             npm publish                 python -m twine upload
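To make the configuration-file point concrete, below is a minimal, hypothetical package.json; the project name and the tools behind the build and test scripts are placeholder choices, but with a file like this in place the npm commands in the table above work unchanged.

{
  "name": "example-service",
  "version": "1.0.0",
  "scripts": {
    "build": "tsc",
    "test": "jest"
  },
  "devDependencies": {
    "jest": "^29.0.0",
    "typescript": "^4.9.0"
  }
}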

The fact that we have a defined set of build steps means there are only a limited number of permutations for building a software project with a given build automation system.

Limited Build Systems and Limited Workflows

While each team can operate under its own software development methodology, each organisation will have its own processes, requirements and constraints.

When you look at the software development methodologies teams are using and the organisational needs, you will discover there are only a limited number of ways a CI system can be used by a software team (typically a smoke test pipeline and a release pipeline).

Example of a generic release pipeline

At this point we have identified that there is a limited set of build automation systems an organisation will use, that there is a limited set of pipelines teams can implement, and that within those pipelines all teams are going to implement the same steps.

Worked Example

Your organisation might require all Java projects to use the latest Java JDK, to run the SonarQube scanner on all releases and capture the code coverage.

From an implementation perspective, every project using the Apache Maven build automation system will have a pipeline that runs the Maven clean, test and integration-test lifecycle phases and calls the SonarQube plugin to scan the source code. Since SonarQube only supports JaCoCo for Java code coverage, every project will use the JaCoCo plugin.
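Written out, the core of every such pipeline reduces to the same command sequence. A minimal sketch, assuming the JaCoCo plugin version shown and assuming the SonarQube server details are configured elsewhere (e.g. in the POM or a settings file):

# Run the build and tests with the JaCoCo agent attached so coverage is captured
mvn clean org.jacoco:jacoco-maven-plugin:0.8.8:prepare-agent test org.jacoco:jacoco-maven-plugin:0.8.8:report

# Scan the source code and upload the results (including coverage) to SonarQube
mvn org.sonarsource.scanner.maven:sonar-maven-plugin:sonar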

Do we need every team to implement that pipeline, or could we do it once and reuse the solution?

Continuous Integration and Build Pipelines

Around 2016, Continuous Integration systems started developing their own domain-specific language (DSL) syntax, so the build process could be scripted in a file and managed like any other source code. This means build scripts can be managed centrally by a single team.

The ideal structure for these scripts is to define the tools and configuration which should be deployed to the build agent, followed by the sequence of build automation commands to run on that agent.

You can see this structure in the GitHub Actions script I've outlined below:
name: Build Verification of Maven project

on:
  pull_request:
    types: [opened, synchronize, reopened]
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Check out Source Code
        uses: actions/checkout@v2.5.0

      - name: Set up JDK 17
        uses: actions/setup-java@v3
        with:
          cache: 'maven'
          distribution: 'temurin'
          java-version: 17

      - name: Build with Maven
        run: mvn --batch-mode --settings .github/maven_settings.xml clean install
        env:
          GITHUB_USER: ${{ secrets.SCM_USER }}
          GITHUB_TOKEN: ${{ secrets.SCM_TOKEN }}

  analyse:
    runs-on: ubuntu-latest
    steps:
      - name: Check out Source Code
        uses: actions/checkout@v2.5.0

      - name: Set up JDK 17
        uses: actions/setup-java@v3
        with:
          cache: 'maven'
          distribution: 'temurin'
          java-version: 17

      - name: Jacoco Coverage
        run: mvn --batch-mode --settings .github/maven_settings.xml org.jacoco:jacoco-maven-plugin:0.8.8:prepare-agent test org.jacoco:jacoco-maven-plugin:0.8.8:report install
        env:
          GITHUB_USER: ${{ secrets.SCM_USER }}
          GITHUB_TOKEN: ${{ secrets.SCM_TOKEN }}

      - name: SpotBugs
        run: mvn --batch-mode --settings .github/maven_settings.xml com.github.spotbugs:spotbugs-maven-plugin:4.6.0.0:spotbugs -Dspotbugs.xmlOutput=true -Dspotbugs.effort=max -Dspotbugs.failOnError=false -Dspotbugs.threshold=low
        env:
          GITHUB_USER: ${{ secrets.SCM_USER }}
          GITHUB_TOKEN: ${{ secrets.SCM_TOKEN }}

      - name: SpotBugs Annotation
        uses: jwgmeligmeyling/spotbugs-github-action@master
        with:
          path: '**/spotbugsXml.xml'

      - name: PMD
        run: mvn --batch-mode --settings .github/maven_settings.xml --file pom.xml org.apache.maven.plugins:maven-pmd-plugin:pmd org.apache.maven.plugins:maven-pmd-plugin:cpd -Daggregate=true -DminimumPriority=1 
        env:
          GITHUB_USER: ${{ secrets.SCM_USER }}
          GITHUB_TOKEN: ${{ secrets.SCM_TOKEN }}

      - name: PMD Annotation
        uses: jwgmeligmeyling/pmd-github-action@master
        with:
          path: '**/pmd.xml'

      - name: SonarCloud Analysis
        run: mvn --batch-mode --settings .github/maven_settings.xml --file pom.xml org.sonarsource.scanner.maven:sonar-maven-plugin:sonar
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

This script provisions an Ubuntu build agent, installs the Eclipse Temurin 17 Java Development Kit and uses the Maven 3 installation already available on the runner.

An M2 user settings file is held within the project repository; it contains the id/location of our GitHub Package repositories, and we supply user credentials to access them. Then all we are doing is running Maven commands to build and analyse our software project.
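For reference, a minimal sketch of what such a settings file might contain; the server id 'github' and the OWNER/REPOSITORY part of the URL are assumptions you would replace with your own values.

<!-- .github/maven_settings.xml: credentials come from the environment variables set in the workflow -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <servers>
    <server>
      <id>github</id>
      <username>${env.GITHUB_USER}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
  </servers>
  <profiles>
    <profile>
      <id>github-packages</id>
      <repositories>
        <repository>
          <id>github</id>
          <url>https://maven.pkg.github.com/OWNER/REPOSITORY</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>github-packages</activeProfile>
  </activeProfiles>
</settings>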

This build file will work on any Java 17 compatible, Maven-built Java project; it doesn't matter whether that project is an Eclipse Rich Client application or a Spring Boot microservice, the commands don't change and so the file can be reused for both.

But we use <Insert latest cool CI>!

One of the common early pitfalls of writing CI Pipeline scripts is not embracing the Domain Specific Language of the CI. People will often write large bash scripts where they manually download a tool, or define a docker image with all their configuration.

The result is a CI solution that becomes incredibly bespoke; it isn't just that the solution won't work on a different CI, it won't work on a different deployment of the same CI.

Embracing the CI DSL and accepting that you are going to run preset commands actually makes it easier to port pipelines between CI instances. For instance, the GitHub workflow above was originally ported from this Jenkinsfile:

pipeline {
    agent any
    environment {
        scmUser="scmHTTPUser"
        scmURL=""
        // Maven Tool Configuration
        jdkTool="openjdk-17"
        mavenTool="maven-3.6.3"
        mavenConfig="maven-user-settings-file"

        // Analysis Specific Configuration
        checkstyleConfig="java-checkstyle-file"
        sonarEnv="sonarqube"
    }
    stages {
        //  Used to confirm the project has a valid POM and so the other steps should work correctly
        stage('clean') {
            steps {
               withMaven(jdk: "${env.jdkTool}", maven: "${env.mavenTool}",  mavenSettingsConfig: "${env.mavenConfig}") {
                   sh "mvn clean"
               }
            }
        }
        // Used to execute Jacoco Coverage reports to ensure even if it is not specified within the POM we get a
        // coverage report to submit to sonar
        stage('test') {
            steps {
                withMaven(jdk: "${env.jdkTool}", maven: "${env.mavenTool}", mavenSettingsConfig: "${env.mavenConfig}") {
                    sh "mvn org.jacoco:jacoco-maven-plugin:0.8.8:prepare-agent test org.jacoco:jacoco-maven-plugin:0.8.8:report install"

                }
            }
        }
        stage('analysis') {
            steps {
                configFileProvider([configFile(fileId:"${env.checkstyleConfig}", targetLocation:"/tmp/checkstyle.xml")]) {
                    withMaven(jdk: "${env.jdkTool}", maven: "${env.mavenTool}", mavenSettingsConfig: "${env.mavenConfig}") {
                        echo "Run standard set of Java analysis on project"
                        sh "mvn org.apache.maven.plugins:maven-pmd-plugin:cpd -Daggregate=true -DminimumPriority=1 || true"
                        sh "mvn org.apache.maven.plugins:maven-pmd-plugin:pmd -Daggregate=true -DminimumPriority=1 || true"
                        sh "mvn com.github.spotbugs:spotbugs-maven-plugin:spotbugs -Dspotbugs.xmlOutput=true -Dspotbugs.failOnError=false -Deffort=Max -Dthreshold=Low || true"
                    }
                }
            }
        }
        stage('sonar-analysis') {
            steps {
                withSonarQubeEnv("${env.sonarEnv}") {
                    withMaven(jdk: "${env.jdkTool}", maven: "${env.mavenTool}", mavenSettingsConfig: "${env.mavenConfig}") {
                        echo "Package build and ensure a version is available for analysis tools"
                        sh "mvn org.sonarsource.scanner.maven:sonar-maven-plugin:sonar"
                    }
                }
            }
        }
    }
    post {
        always {
            jacoco exclusionPattern: '**/*Test*.class', inclusionPattern: '**/*.class'
            junit allowEmptyResults: true, testResults: '**/target/test-reports/*.xml'
            recordIssues(tools: [
                                 cpd(pattern: '**/target/cpd.xml'),
                                 findBugs(pattern: '**/target/spotbugsXml.xml', useRankAsPriority: true),
                                 java(),
                                 mavenConsole(),
                                 pmdParser(pattern: '**/target/pmd.xml')
                                ])
        }
    }
}


You'll notice that instead of passing --settings, we use the Jenkins 'withMaven' DSL to deploy a configuration file along with OpenJDK 17 and Maven 3; apart from that, the Maven commands we use haven't changed.

The only other real change is that Jenkins collects build results in a 'post' phase of the build process, while in GitHub Actions we call annotation actions during the build itself to achieve the same thing.

But What About Deployment?

Each build automation system is designed to produce a 'build artefact'. When a build automation system is coupled with dependency management (as almost all of them are) you will find a large centrally managed repository (Maven Central, the npm registry, PyPI) and typically the ability to self-host a build artefact repository.

The CD pipeline should pull artefacts from the artefact repository and deploy them into the target environment. This creates a nice point of separation: everything before the artefact repository is 'CI' and everything after it is 'CD'.
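As an illustration, a CD pipeline can be as small as the sketch below: it is triggered manually with a version number, pulls exactly that artefact from the repository and hands it to whatever deployment mechanism the product needs. The com.example:my-app coordinates and the deploy.sh script are hypothetical.

name: Deploy
on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Artefact version to deploy'
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Check out deployment scripts
        uses: actions/checkout@v2.5.0

      - name: Fetch the versioned artefact from the artefact repository
        run: mvn --batch-mode org.apache.maven.plugins:maven-dependency-plugin:copy -Dartifact=com.example:my-app:${{ github.event.inputs.version }} -DoutputDirectory=.

      - name: Deploy into the target environment
        run: ./deploy.sh my-app-${{ github.event.inputs.version }}.jar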

Artefact Versioning

When using a dependency or deploying a solution you want a known version of that artefact, so you can identify and assign issues to that specific version and easily reproduce the problem.

When we look at build systems, we find they all offer the ability to generate and release a versioned artefact.


         Maven                                NPM          SetupTools
Publish  mvn release:prepare release:perform  npm publish  python -m twine upload

The fact that there is a standard way to release an artefact with a build automation system means you can write a single pipeline script to perform this action and reuse it across projects.
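With GitHub Actions, for example, that single script could be a reusable workflow living in a central repository; each project then references it with a one-line uses: entry instead of copying the script. A minimal sketch, reusing the secret names from the workflow earlier; the my-org/ci-pipelines repository path is hypothetical.

# my-org/ci-pipelines/.github/workflows/maven-release.yml
name: Release Maven artefact
on:
  workflow_call:
    secrets:
      SCM_USER:
        required: true
      SCM_TOKEN:
        required: true

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2.5.0

      - uses: actions/setup-java@v3
        with:
          cache: 'maven'
          distribution: 'temurin'
          java-version: 17

      - name: Publish the versioned artefact to the artefact repository
        run: mvn --batch-mode --settings .github/maven_settings.xml deploy
        env:
          GITHUB_USER: ${{ secrets.SCM_USER }}
          GITHUB_TOKEN: ${{ secrets.SCM_TOKEN }}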

Lastly, I've included two common issues that occur when you don't implement artefact versioning.

Maven Snapshots and a common pitfall

Maven has the concept of a 'SNAPSHOT': a timestamp appended to the version, which allows Maven projects with multiple 'modules' to always use the latest version of each module.

Maven also has the concept of a local and a remote repository: you 'install' to the local repository and 'deploy' to the remote one. A common pitfall is configuring Maven dependencies to use 'SNAPSHOT' versions and then setting the CI smoke test pipeline to 'deploy'.

This can result in the development team hitting weird errors: the CI build of a pull request may become the latest version of a Maven module, and so is used on each developer's machine in preference to their local artefacts.
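A minimal sketch of the fix, with an illustrative 1.2.0-SNAPSHOT version: keep the smoke test pipeline at 'install' so snapshots never leave the build agent, and reserve 'deploy' for the release pipeline.

# pom.xml declares something like <version>1.2.0-SNAPSHOT</version>

# Pull request / smoke test pipeline: stops at the build agent's local repository
mvn clean install

# Release pipeline only: publishes the artefact to the remote repository
mvn clean deploy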

Docker: the 'latest' headache

Versioning in Docker is performed using tags, and a common pattern is to use the tag 'latest' to indicate the latest release of a Docker image.

When retrieving a Docker image, the Docker host first checks which images it already has by name/tag; if it has an image with that name and tag it doesn't reach out to any registry. This means you can think you are on the latest deployment when you're actually on a six-month-old one.
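A quick sketch of the behaviour and the workaround; the registry.example.com/my-app image and the 1.4.2 tag are hypothetical.

# A 'latest' image is already cached locally, so no registry call is made
docker run registry.example.com/my-app:latest

# Either force a fresh pull of the moving tag...
docker pull registry.example.com/my-app:latest

# ...or, better, deploy a pinned version tag so you always know what is running
docker run registry.example.com/my-app:1.4.2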

But I have a really unique requirement!

A lot of projects will justify building their own CI/CD pipeline because there is a unique aspect to them. While this can be true (and it is usually around deployment), we can focus on building a pipeline just for the unique aspect.

This is because a CI is designed to detect triggering events and schedule runs of a script. There is no requirement that only one script/job/workflow can be triggered for each software project.

This means our organisation can provide common build pipelines (e.g. a smoke test pipeline, a release pipeline) and teams can configure the CI to run project-specific pipelines alongside them, as sketched below.
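With GitHub Actions, a project could reference the organisation's shared pipeline and add its own job next to it; the my-org/ci-pipelines repository and the simulation script are hypothetical.

# .github/workflows/verify.yml in the project repository
name: Verify
on:
  pull_request:

jobs:
  # Reuse the organisation's common smoke test pipeline
  common-build:
    uses: my-org/ci-pipelines/.github/workflows/maven-smoke-test.yml@main
    secrets: inherit

  # Project-specific job covering the genuinely unique requirement
  unique-requirement:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2.5.0
      - run: ./scripts/run-hardware-simulation.sh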


From a technical debt perspective it's actually better to have multiple pipelines. It allows each pipeline to have a specific focus which can be documented, it means we can quickly identify what went wrong in a build, and the more targeted nature makes it easier to reuse a pipeline across projects.
