
How To Manage Cloud Build Alerts

19 Jan, 2023

After automating several processes with Cloud Build, checking build statuses quickly gets boring. Fortunately, Cloud Build notifiers can take care of that. Unfortunately, you need to deploy a notifier to every project. Therefore, this blog leverages custom metrics and notification channels to manage alerts centrally.

Build Alert Management

Each application project publishes a Cloud Build status metric. A central monitoring project uses alert policies to raise incidents and/or send notifications for failed builds.

Implementation

Build Status Metric

Cloud Build currently doesn’t publish any metrics. Therefore, the gcp-cloud-build-status-publisher-utility publishes a status metric based on Cloud Build notifications.
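Cloud Build publishes a message to the `cloud-builds` Pub/Sub topic whenever a build changes state, and the message’s data field holds the base64-encoded JSON of the Build resource. As a minimal sketch, assuming the messages arrive through a Pub/Sub push subscription, unwrapping one could look like this. The function name `decode_build_notification` is hypothetical and not taken from the publisher utility.

```python
import base64
import json


def decode_build_notification(envelope: dict) -> dict:
    """Return the Build resource carried by a Pub/Sub push envelope.

    Assumes the push envelope shape {"message": {"data": "<base64>", ...}, ...}.
    """
    encoded = envelope["message"]["data"]
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample build; real notifications carry many more fields.
    sample = {"id": "b-123", "projectId": "my-app", "status": "FAILURE"}
    payload = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")
    envelope = {"message": {"data": payload}}
    print(decode_build_notification(envelope)["status"])  # FAILURE
```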

The status metric, custom.googleapis.com/build/status_count, is written against a generic_task resource that represents a build trigger: the job label holds the trigger ID and the task_id label holds the build ID. The status label contains the build status (FAILURE, SUCCESS, etc.) at a given time. The mapping from a Cloud Build event to the status metric is shown below.

{
  "event_time": "most recent time of cloud build event, latest(createTime, startTime, finishTime)",
  "metric": {
    "type": "custom.googleapis.com/build/status_count",
    "labels": {
      "status": "build status (queued, working, etc.)",
      "failure_type": "failure reason (timeout, user error, etc.)",
      "failure_detail": "failure information (failed step logs)"
    }
  },
  "resource": {
    "type": "generic_task",
    "labels": {
      "location": "cloud build trigger location (global or region)",
      "namespace": "cloud build trigger project id",
      "job": "cloud build trigger id",
      "task_id": "cloud build job id"
    }
  }
}
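To make the mapping concrete, here is a minimal Python sketch that turns a Build resource (as decoded from a notification) into the structure above. It assumes the Build’s `name` has the form `projects/PROJECT/locations/LOCATION/builds/ID` and uses that location as a stand-in for the trigger location, falling back to `global`. It also assumes UTC timestamps with a `Z` suffix. The function names are hypothetical, and the sketch does not call the Cloud Monitoring API.

```python
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC timestamp like '2023-01-19T10:00:05.123456789Z', truncating to microseconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return parsed.replace(microsecond=int((frac + "000000")[:6]))


def build_event_to_time_series(build: dict) -> dict:
    """Map a Cloud Build Build resource onto the status_count structure shown above."""
    # event_time: the most recent of the timestamps present on the event.
    timestamps = [build[key] for key in ("createTime", "startTime", "finishTime") if build.get(key)]
    event_time = max(timestamps, key=parse_rfc3339)

    # failureInfo is only present on failed builds.
    failure = build.get("failureInfo") or {}

    # Assumption: take the region from the Build name, defaulting to "global".
    parts = build.get("name", "").split("/")
    location = parts[3] if len(parts) > 3 and parts[2] == "locations" else "global"

    return {
        "event_time": event_time,
        "metric": {
            "type": "custom.googleapis.com/build/status_count",
            "labels": {
                "status": build["status"],
                "failure_type": failure.get("type", ""),
                "failure_detail": failure.get("detail", ""),
            },
        },
        "resource": {
            "type": "generic_task",
            "labels": {
                "location": location,
                "namespace": build["projectId"],
                "job": build.get("buildTriggerId", ""),
                "task_id": build["id"],
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical failed build from a regional trigger.
    build = {
        "id": "b-123",
        "projectId": "my-app",
        "buildTriggerId": "t-456",
        "name": "projects/my-app/locations/europe-west1/builds/b-123",
        "status": "FAILURE",
        "createTime": "2023-01-19T10:00:00.5Z",
        "startTime": "2023-01-19T10:00:05.123456789Z",
        "finishTime": "2023-01-19T10:02:00Z",
        "failureInfo": {"type": "USER_BUILD_STEP", "detail": "hypothetical failure detail"},
    }
    print(build_event_to_time_series(build)["resource"]["labels"]["location"])  # europe-west1
```

Parsing the timestamps instead of comparing them as strings matters because Cloud Build timestamps can have different fractional-second precision, which breaks a plain lexicographic comparison.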

Quickly deploy the status publisher using the deployment example on GitHub.

Failed Build Alerts

You most likely check build statuses for failures. Therefore, let’s configure a Failed Build alert that sends an email as soon as a build fails.

Alert policies configure alerts. Each policy consists of two elements: the conditions that raise the alert, and the notification channels that receive it. This policy triggers on FAILURE status events for each individual build, conceptually count(custom.googleapis.com/build/status_count{status=FAILURE}) by (task_id), and uses an email notification channel:

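resource "google_monitoring_alert_policy" "failed_build" {
  display_name = "Failed Build"
  ...

  conditions {
    display_name = "Build status is FAILURE"

    condition_threshold {
      # HCL strings use double quotes, so the quotes inside the filter are escaped.
      filter = "resource.type = \"generic_task\" AND metric.type = \"custom.googleapis.com/build/status_count\" AND metric.labels.status = \"FAILURE\""

      # Count the FAILURE points per build (task_id); group_by_fields only
      # takes effect together with a cross_series_reducer.
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_COUNT"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields      = ["resource.label.task_id"]
      }

      # Fire as soon as any build has more than zero FAILURE points.
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "0s"
      trigger {
        count = 1
      }
    }
  }

  notification_channels = [
    google_monitoring_notification_channel.email.id,
  ]
}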
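The condition reads as “count the FAILURE points per build and alert when that count exceeds zero”. The following Python sketch approximates that grouping locally over time-series dicts shaped like the status metric mapping. It ignores alignment periods and durations, so treat it as an illustration of the per-build grouping, not as how Cloud Monitoring actually evaluates the policy. The function name `failing_builds` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def failing_builds(points: Iterable[Dict], threshold: int = 0) -> Set[str]:
    """Return the task_ids (build IDs) whose FAILURE count is greater than threshold.

    Mirrors the filter (generic_task, status_count, status == FAILURE),
    the grouping by resource.label.task_id, and COMPARISON_GT.
    """
    counts = Counter(
        p["resource"]["labels"]["task_id"]
        for p in points
        if p["resource"]["type"] == "generic_task"
        and p["metric"]["type"] == "custom.googleapis.com/build/status_count"
        and p["metric"]["labels"]["status"] == "FAILURE"
    )
    return {task_id for task_id, n in counts.items() if n > threshold}


if __name__ == "__main__":
    def point(task_id: str, status: str) -> Dict:
        # Hypothetical, trimmed-down time series point.
        return {
            "metric": {"type": "custom.googleapis.com/build/status_count", "labels": {"status": status}},
            "resource": {"type": "generic_task", "labels": {"task_id": task_id}},
        }

    points = [point("b1", "QUEUED"), point("b1", "FAILURE"), point("b2", "SUCCESS")]
    print(failing_builds(points))  # {'b1'}
```

Grouping by task_id rather than by job means each failed build raises its own incident, instead of one incident per trigger.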

Get the full example from GitHub.

Bonus: Alert Documentation

The default “Alert firing!” notification doesn’t include instructions for handling the alert. Fortunately, you can include this information as documentation.

The documentation below, for example, shares the error details and points the receiver to the build logs.

resource "google_monitoring_alert_policy" "failed_build" {
  ...

  documentation {
    mime_type = "text/markdown"
    # <<-EOT strips the common indentation, so the Markdown heading is not rendered as a code block.
    content   = <<-EOT
    ## A build has failed!

    A build failed for project $${resource.label.namespace}, due to the following error:

    $${metric.label.failure_detail}

    For additional information, check the [build logs](https://console.cloud.google.com/cloud-build/builds/$${resource.label.task_id}?project=$${resource.label.namespace})
    EOT
  }

  ...
}
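In Terraform, `$${` escapes to a literal `${`, so the stored policy contains variables such as `${resource.label.namespace}` that Cloud Monitoring fills in when it sends the notification. To preview the text locally, a small Python sketch can mimic that substitution. It only handles `resource.label` and `metric.label` variables and, as an assumption, replaces unknown labels with an empty string, which is not necessarily what Cloud Monitoring does. The function name `render_documentation` is hypothetical.

```python
import re
from typing import Dict

# Matches ${resource.label.KEY} and ${metric.label.KEY}.
_VARIABLE = re.compile(r"\$\{(resource|metric)\.label\.([A-Za-z0-9_]+)\}")


def render_documentation(template: str, resource_labels: Dict[str, str], metric_labels: Dict[str, str]) -> str:
    """Substitute label variables in an alert documentation template."""
    sources = {"resource": resource_labels, "metric": metric_labels}

    def substitute(match: "re.Match[str]") -> str:
        # Assumption: unknown labels render as an empty string.
        return sources[match.group(1)].get(match.group(2), "")

    return _VARIABLE.sub(substitute, template)


if __name__ == "__main__":
    template = "A build failed for project ${resource.label.namespace}: ${metric.label.failure_detail}"
    print(render_documentation(template, {"namespace": "my-app"}, {"failure_detail": "step 0 failed"}))
    # A build failed for project my-app: step 0 failed
```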

Discussion

Cloud Monitoring is great for alerts. For informational notifications, for example about successful builds, you’ll still need to bring your own notification mechanism, such as Cloud Build notifiers, which let you tailor the notification to your needs.

The proposed implementation uses custom metrics. You might opt for log-based metrics instead, to avoid deploying the status publisher and rely on log sink(s). I didn’t do this, because I prefer using a built-in event over my own interpretation of the log lines. Besides preference, the event carries richer information. While this doesn’t matter for the status metric, you could reuse the same infrastructure to add, for example, an execution time metric. Adding more custom metrics, however, doesn’t seem like the right path. Ideally, the Cloud Build service would provide these metrics itself, and it’s unclear why it doesn’t. However, given that there are several feature requests for this, and that Cloud Workflows already offers similar metrics, I expect to see some traction on this topic in the future.
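As an illustration of that richer information, an execution time can be derived from the same Build resource. This is a minimal sketch of a hypothetical execution time value, computed as `finishTime` minus `startTime` and assuming UTC timestamps with a `Z` suffix. Publishing it as a metric is left out, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC timestamp like '2023-01-19T10:00:05.123456789Z', truncating to microseconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return parsed.replace(microsecond=int((frac + "000000")[:6]))


def execution_seconds(build: dict) -> Optional[float]:
    """Return the build's execution time in seconds, or None if it hasn't started or finished."""
    start, finish = build.get("startTime"), build.get("finishTime")
    if not start or not finish:
        return None
    return (parse_rfc3339(finish) - parse_rfc3339(start)).total_seconds()


if __name__ == "__main__":
    # Hypothetical finished build.
    build = {"startTime": "2023-01-19T10:00:05.5Z", "finishTime": "2023-01-19T10:02:00Z"}
    print(execution_seconds(build))  # 114.5
```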

An alternative implementation would forward the Cloud Build events to a (central) monitoring Pub/Sub topic, allowing you to configure the Cloud Build notifiers on that topic and thereby centralize notification management. I didn’t opt for this, because it is not as flexible as Cloud Monitoring metrics scopes. A metrics scope can span multiple projects, and a project can belong to multiple metrics scopes. In Pub/Sub terms, this translates to a subscription for each metrics scope in your Cloud Build project, although you could also use a central topic and fan out from there. Either way, you’ve added Pub/Sub subscription management to your job. Since we set out to reduce boring operations, I doubt this alternative would succeed.

Conclusion

Monitoring the status of your builds is a boring task. Improve your working life by implementing alerts based on Cloud Build events. For a single project, stick to Cloud Build notifiers. For multiple projects, use the custom build status metric to manage alerts centrally.

Image by Mircea Ploscar from Pixabay

As a cloud consultant, I enjoy taking software engineering practices to the cloud, continuously improving customers’ systems, tools, and processes by focusing on integration and quality.
