Tuning Alerts in SCOM – Part 1

Introduction

Hi Everyone,

One of the obstacles when deploying SCOM for the first time is getting a handle on the number of alerts. In my opinion, this is one of the reasons why SCOM sometimes has a bad reputation.
Luckily, there are a few things you can do to relieve some of the ‘alert burden’ :). This post is part one of hopefully many on getting your alerts under control.

The first piece of advice I can give you is to set specific SCOM-related alerts to informational.

Examples of such alerts include:

  • Operations Manager failed to start a process.
  • Workflow Initialization: Failed to start a workflow that runs a process or script.
  • Operations Manager failed to run a WMI query.

Whilst these alerts are not completely unimportant, they are often categorized as ‘Warning’ or ‘Critical’, making them seem like a bigger issue than they actually are.
Once you have a few management packs imported, you will see these alerts recurring a lot, sometimes making up as much as 40% of the alert count, and that is for alerts that relate only to SCOM itself!
The cause is usually a temporary issue such as a running backup, and if the alerts do not recur, they are not worthy of any attention. To troubleshoot them you also need good knowledge of SCOM, as you may want to analyze how the rule or monitor is retrieving its data.
In addition, the truly critical agent alerts, such as Heartbeat failures / Failed to connect to computer, are generated by monitors, which are not affected by these overrides.
In other words, for most operators, these alerts do not offer a lot of value.

Configuration

By setting these alerts to informational, you can filter them out of the Active Alerts view by showing only the Critical / Warning alerts.

If you still want to view these alerts, you can go to the Operations Manager folder. There I would focus on alerts that have a high repeat count, as this may indicate an issue with WMI or other resources.


Using this approach, you can still see which servers are having a lot of SCOM issues, which you would lose if you disabled the rule completely.
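
As a rough illustration of that idea, here is a minimal Python sketch that totals the repeat counts of informational alerts per server, so the noisiest servers float to the top. It assumes you have exported the alerts from SCOM into a list of records yourself; the field names (`server`, `severity`, `repeat_count`), the sample data and the threshold are all hypothetical and not part of any SCOM API.

```python
from collections import Counter

# Hypothetical sample of exported alerts. The field names are assumptions
# made for this sketch, not the schema SCOM itself uses.
alerts = [
    {"name": "Workflow Runtime: Failed to run a WMI query",
     "server": "srv01", "severity": "Information", "repeat_count": 120},
    {"name": "Operations Manager failed to start a process",
     "server": "srv01", "severity": "Information", "repeat_count": 30},
    {"name": "Operations Manager failed to run a WMI query",
     "server": "srv02", "severity": "Information", "repeat_count": 5},
    {"name": "Health Service Heartbeat Failure",
     "server": "srv03", "severity": "Critical", "repeat_count": 200},
]


def noisy_servers(alerts, min_repeat=50):
    """Sum repeat counts of informational alerts per server.

    Returns (server, total) pairs, highest total first, keeping only
    servers whose total is at least min_repeat.
    """
    totals = Counter()
    for alert in alerts:
        if alert["severity"] == "Information":
            totals[alert["server"]] += alert["repeat_count"]
    return [(server, total) for server, total in totals.most_common()
            if total >= min_repeat]


if __name__ == "__main__":
    for server, total in noisy_servers(alerts):
        print(f"{server}: {total} repeats")
```

The threshold of 50 is arbitrary; pick whatever value separates a one-off hiccup from a recurring problem in your environment.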

To create overrides for this, go to the Authoring pane in the SCOM console and scope to Health Service.


I would recommend changing the severity of these alert rules:

  • A generic error occurred during computer verification from the discovery wizard
  • Alert on Backward Compatibility Script Errors.
  • Alert on Dropped Multi instance Performance Module
  • Alert on Dropped Power Shell Scripts
  • Alert on Failed Power Shell Scripts
  • Alert on Failure to Create PowerShell Run space for Power Shell Script
  • Replacement Failure For Suppression During Alert Creation
  • An error occurred during computer verification from the discovery wizard
  • Workflow Initialization: Failed to start a workflow that queries WMI
  • Workflow Initialization: Failed to start a workflow that queries WMI for performance data
  • Workflow Initialization: Failed to start a workflow that queries WMI for WMI events
  • Workflow Runtime: Failed to run a WMI query
  • Workflow Runtime: Failed to run a WMI query for performance data
  • Workflow Runtime: Failed to run a WMI query for WMI events

Right-click the rule whose severity you want to change and choose to override it.

Change the severity to Informational and store the override in your SCOM override management pack. Click OK.
If you are responsible for your SCOM environment, do not forget to check on the Operations Manager alerts, especially when you have imported a new management pack.
This wraps up this blog post. Hopefully it has helped you get those alerts under control!

Br,

Jasper

 
