How to alert on inactive Windows scheduled tasks in Google Cloud

07 Jul, 2023

Recently, a customer reported several incidents with a data pipeline, causing the business to complain about the lack of visibility into the data transfer processes. The processes spanned multiple systems and suppliers, which made it difficult to provide a single pane of glass. To address this, we added checks and balances to the data pipeline steps to identify bottlenecks and make that single view possible.

One of these checks ensures that a scheduled task actually runs. This blog shares the implementation of that check.

Curious about monitoring scheduled task results? Check out this blog.

Scheduled Task Monitoring

The following steps are needed to monitor your scheduled tasks:

  1. Enable Task Scheduler logging
  2. Forward Windows Event Logs to Cloud Logging
  3. Monitor inactive tasks with Cloud Monitoring

Try it yourself using the reference implementation on GitHub.

Scheduled Task Logging

“The Task Scheduler service allows you to perform automated tasks. With this service, you can schedule any program to run at a convenient time for you or when a specific event occurs. The Task Scheduler monitors the time or event criteria that you choose and then executes the task when those criteria are met.” – About the Task Scheduler.

Task Scheduler execution details are logged to the Windows Event Log once you enable history for all tasks. These logs indicate that a task and its associated actions started and completed, for example: “Task Scheduler successfully completed task ‘YourTask’ , instance ‘{72efc060-52b3-4a0a-a656-c7527c912082}’ , action ‘C:\Program Files\PowerShell\7\pwsh.exe’ with return code 0.” You enable these logs using the following command:

wevtutil set-log Microsoft-Windows-TaskScheduler/Operational /enabled:true
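
To make the structure of such a completion message concrete, here is a minimal Python sketch that extracts the task, instance, action and return code from it. The message layout is assumed from the single example quoted above, and the names `TaskCompletion` and `parse_completion` are hypothetical. The pattern accepts both straight and typographic quotes, since the quote style in the actual event log may differ from the rendering in this post.

```python
import re
from typing import NamedTuple, Optional

# Quote characters that may surround the values (straight or typographic).
QUOTES = "\"'\u2018\u2019\u201c\u201d"
Q = "[" + QUOTES + "]"
NOT_Q = "[^" + QUOTES + "]+"

# Layout assumed from the example message in this post.
COMPLETED = re.compile(
    rf"completed task\s+{Q}(?P<task>{NOT_Q}){Q}\s*,\s*"
    rf"instance\s+{Q}(?P<instance>{NOT_Q}){Q}\s*,\s*"
    rf"action\s+{Q}(?P<action>{NOT_Q}){Q}\s+"
    rf"with return code\s+(?P<return_code>-?\d+)"
)


class TaskCompletion(NamedTuple):
    task: str
    instance: str
    action: str
    return_code: int


def parse_completion(message: str) -> Optional[TaskCompletion]:
    """Return the parsed fields, or None if the message has another layout."""
    match = COMPLETED.search(message)
    if match is None:
        return None
    return TaskCompletion(
        task=match.group("task"),
        instance=match.group("instance"),
        action=match.group("action"),
        return_code=int(match.group("return_code")),
    )


if __name__ == "__main__":
    example = (
        'Task Scheduler successfully completed task "\\YourTask" , '
        'instance "{72efc060-52b3-4a0a-a656-c7527c912082}" , '
        'action "C:\\Program Files\\PowerShell\\7\\pwsh.exe" with return code 0.'
    )
    print(parse_completion(example))
```

A return code other than 0 in the parsed result would point at a task that ran but did not succeed, which is a different check from the inactivity alert described in this post.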

Forwarding Windows Event Logs

The Windows Event Log is forwarded to Cloud Logging by the Ops Agent. More specifically, the Task Scheduler Event Log is forwarded using the following configuration:

logging:
  receivers:
    windows_event_log:
      type: windows_event_log
      channels: [System, Application, Security, "Microsoft-Windows-TaskScheduler/Operational"]
      receiver_version: 2
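
Once forwarded, each event is a structured log entry that can be selected by its `jsonPayload` fields. The Python sketch below mimics, in simplified form, the selection that the log-based metric later in this post performs with `Channel`, `EventID` and `StringInserts`. The entry layout is an assumption limited to those three fields, the sample values are hypothetical, and `matches_task_run` is an illustrative name rather than part of any Google Cloud library. It uses exact string comparison, which may not mirror every detail of the Cloud Logging query language.

```python
from typing import Any, Mapping

TASK_CHANNEL = "Microsoft-Windows-TaskScheduler/Operational"
COMPLETED_EVENT_ID = "201"  # event ID used by the metric filter in this post


def matches_task_run(entry: Mapping[str, Any], task_path: str = "\\FailureTask") -> bool:
    """Return True when the entry looks like a completed run of task_path."""
    resource = entry.get("resource", {})
    payload = entry.get("jsonPayload", {})

    # StringInserts is treated as a list; a match on any element counts.
    inserts = payload.get("StringInserts", [])
    if isinstance(inserts, str):
        inserts = [inserts]

    return (
        resource.get("type") == "gce_instance"
        and payload.get("Channel") == TASK_CHANNEL
        and str(payload.get("EventID")) == COMPLETED_EVENT_ID
        and task_path in inserts
    )


if __name__ == "__main__":
    # Hypothetical entry, reduced to the fields the sketch inspects.
    sample = {
        "resource": {"type": "gce_instance"},
        "jsonPayload": {
            "Channel": TASK_CHANNEL,
            "EventID": 201,
            "StringInserts": ["\\FailureTask", "0"],
        },
    }
    print(matches_task_run(sample))
```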

Monitoring Inactive Tasks

A log-based metric tracks task activity by counting recent runs in the scheduled task logs. When no matching logs are detected for 10 minutes, a metric-absence alert fires:

resource "google_logging_metric" "failuretask_not_running" {
  project = var.project_id
  name    = "task-runner/failuretask_run_count"

  # Count Task Scheduler events with ID 201 for the task \FailureTask.
  filter = <<-EOT
    resource.type="gce_instance"
    jsonPayload.Channel="Microsoft-Windows-TaskScheduler/Operational"
    jsonPayload.EventID="201"
    jsonPayload.StringInserts="\\FailureTask"
  EOT

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}

resource "google_monitoring_alert_policy" "failuretask_not_running" {
  project      = var.project_id
  display_name = "FailureTask not running"

  combiner = "OR"
  conditions {
    display_name = "FailureTask not running"

    # Fire when the log-based metric reports no data for the full duration.
    condition_absent {
      filter = <<-EOT
        resource.type = "gce_instance" AND metric.type = "logging.googleapis.com/user/${google_logging_metric.failuretask_not_running.name}"
      EOT
      aggregations {
        group_by_fields = [
          "metadata.system_labels.name",
        ]
        alignment_period     = "300s"
        per_series_aligner   = "ALIGN_SUM"
        cross_series_reducer = "REDUCE_COUNT"
      }

      # 600s is the 10-minute absence window.
      duration = "600s"
      trigger {
        count = 1
      }
    }
  }
}
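
The absence condition can be reasoned about with a small Python sketch. It is a deliberate simplification: it only checks whether each instance had at least one counted run in the last 600 seconds, and it ignores alignment periods and the evaluation schedule that Cloud Monitoring applies. The function name, instance names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

ABSENCE_DURATION = timedelta(seconds=600)  # mirrors duration = "600s"


def inactive_instances(
    runs_by_instance: Dict[str, List[datetime]],
    now: datetime,
    duration: timedelta = ABSENCE_DURATION,
) -> List[str]:
    """Return the instances without any counted run in (now - duration, now]."""
    window_start = now - duration
    inactive = []
    for instance, run_times in runs_by_instance.items():
        if not any(window_start < t <= now for t in run_times):
            inactive.append(instance)
    return sorted(inactive)


if __name__ == "__main__":
    now = datetime(2023, 7, 7, 12, 0, tzinfo=timezone.utc)
    runs = {
        # Hypothetical instances: one ran 5 minutes ago, one 11 minutes ago.
        "vm-a": [now - timedelta(minutes=5)],
        "vm-b": [now - timedelta(minutes=11)],
    }
    print(inactive_instances(runs, now))
```

Grouping by instance in the sketch plays the same role as `group_by_fields` in the policy: each virtual machine is judged on its own runs, so one active machine does not hide an inactive one.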

Discussion

You might consider alerting on inactive tasks using a Windows agent. While this method could offer faster results, it requires you to ensure that the Windows agent itself is up and running. That effectively forces you to add another means of monitoring from outside the virtual machine.

Furthermore, you might consider different means of running the scheduled task, such as Cloud Scheduler or Cloud Workflows. These offer additional visibility into the task execution history and allow for automation that retriggers the pipeline on failures.

Conclusion

Alerting on inactive scheduled tasks is easy with metric-absence alerting policies based on application logs. Furthermore, the metric serves as a check and balance in the customer's data pipeline, allowing other systems to tie into it to identify issues and bottlenecks.

Image by vined mind from Pixabay

Laurens Knoll
As a cloud consultant, I enjoy taking software engineering practices to the cloud, continuously improving customers' systems, tools and processes by focusing on integration and quality.