One of the obstacles when deploying SCOM for the first time is getting a handle on the volume of alerts. In my opinion, this is one of the reasons SCOM sometimes has a bad reputation.
Luckily, there are a few things you can do to relieve some of the ‘alert burden’ :). This post is part one of hopefully many to get your alerts under control.
The first piece of advice I can give you is to set specific SCOM-related alerts to informational severity.
Examples of such alerts:
- Operations Manager failed to start a process.
- Workflow Initialization: Failed to start a workflow that runs a process or script.
- Operations Manager failed to run a WMI query.
Whilst the alerts are not completely unimportant, they are often categorized as ‘Warning’ or ‘Critical’ alerts, making them seem like a bigger issue than they actually are.
Once you have a few management packs imported, you will see these alerts recurring a lot, sometimes comprising up to 40% of the alert count, for alerts that relate purely to SCOM itself!
The cause of these alerts is usually a temporary issue such as a backup, and if they do not recur, they are not worthy of any attention. Furthermore, troubleshooting these alerts requires good knowledge of SCOM, as you may need to analyze how the rule or monitor retrieves its data.
Note that the truly critical agent alerts, such as heartbeat failures and ‘failed to connect to computer’, are generated by monitors and are therefore not affected by these overrides.
In other words, for most operators, these alerts do not offer a lot of value.
By setting these alerts to informational, you can then filter them out of the Active Alerts view by showing only the Critical / Warning alerts.
If you still want to view these alerts, you can go to the Operations Manager folder. I would then focus on alerts that have a high repeat count, as this may indicate an issue with WMI or other resources.
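If you prefer PowerShell over the console view, you can spot the noisiest servers by pulling the open informational alerts and sorting by repeat count. This is a sketch using the OperationsManager module, run from the Operations Manager Shell (or any session connected to a management server):

```powershell
# Requires the OperationsManager module and a connection to a management server
Import-Module OperationsManager

# Open (resolution state 0) informational alerts, highest repeat count first
Get-SCOMAlert -ResolutionState 0 |
    Where-Object { $_.Severity -eq 'Information' } |
    Sort-Object RepeatCount -Descending |
    Select-Object -First 20 RepeatCount, MonitoringObjectDisplayName, Name |
    Format-Table -AutoSize
```

Servers that keep showing up at the top of this list deserve a closer look at WMI health and resource usage.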
Using this approach, you still have an indication of which servers are having a lot of SCOM issues, as opposed to disabling the rules completely.
To create overrides for this, simply go to the Authoring pane in the SCOM console and scope to the Health Service class.
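You can also enumerate the candidate rules from the Operations Manager Shell. A sketch, assuming the standard class name for the agent's Health Service:

```powershell
# Requires the OperationsManager module and a connection to a management server
Import-Module OperationsManager

# List all rules targeting the Health Service class
$healthService = Get-SCOMClass -Name 'Microsoft.SystemCenter.HealthService'
Get-SCOMRule -Target $healthService |
    Sort-Object DisplayName |
    Format-Table DisplayName
```

The rules listed below should appear in this output as well as in the console.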
I would recommend changing the severity of these alert rules:
- A generic error occurred during computer verification from the discovery wizard
- Alert on Backward Compatibility Script Errors.
- Alert on Dropped Multi instance Performance Module
- Alert on Dropped Power Shell Scripts
- Alert on Failed Power Shell Scripts
- Alert on Failure to Create PowerShell Run space for Power Shell Script
- Replacement Failure For Suppression During Alert Creation
- An error occurred during computer verification from the discovery wizard
- Workflow Initialization: Failed to start a workflow that queries WMI
- Workflow Initialization: Failed to start a workflow that queries WMI for performance data
- Workflow Initialization: Failed to start a workflow that queries WMI for WMI events
- Workflow Runtime: Failed to run a WMI query
- Workflow Runtime: Failed to run a WMI query for performance data
- Workflow Runtime: Failed to run a WMI query for WMI events
Right-click the rule whose severity you want to change and create an override.
Change the severity to 0 (Information), store the override in your SCOM override management pack, and click OK.
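If you have many rules to change, the same override can be scripted through the SDK. The sketch below is an assumption-heavy example: the override pack name ('SCOM.Overrides') and override ID are placeholders, and the severity parameter lives on the rule's alert write-action module, whose ID varies per rule, so inspect $rule.WriteActionCollection before relying on it:

```powershell
# Requires the OperationsManager module and a connection to a management server
Import-Module OperationsManager

$rule       = Get-SCOMRule -DisplayName 'Workflow Runtime: Failed to run a WMI query'
$overrideMp = Get-SCOMManagementPack -DisplayName 'SCOM.Overrides'   # your unsealed override MP (assumed name)
$target     = Get-SCOMClass -Id $rule.Target.Id

# Severity is a parameter of the rule's alert write action,
# so a configuration override (not a property override) is needed
$override = New-Object Microsoft.EnterpriseManagement.Configuration.ManagementPackRuleConfigurationOverride($overrideMp, 'WmiQuerySeverityOverride')
$override.Rule      = $rule
$override.Module    = ($rule.WriteActionCollection | Select-Object -First 1).Name  # usually the alert write action
$override.Parameter = 'Severity'
$override.Value     = '0'            # 0 = Information
$override.Context   = $target
$override.DisplayName = 'Severity to Informational: Failed to run a WMI query'

$overrideMp.Verify()
$overrideMp.AcceptChanges()
```

For a handful of rules the console is quicker; the script only pays off when you repeat this across environments.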
If you are responsible for your SCOM environment, do not forget to check the Operations Manager alerts regularly, especially after importing a new management pack.
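A quick way to do that check is to group the open informational alerts by name, so a newly imported management pack that starts generating noise stands out immediately. A sketch, again requiring the OperationsManager module:

```powershell
# Requires the OperationsManager module and a connection to a management server
Import-Module OperationsManager

# Top 10 informational alert names by number of open alerts
Get-SCOMAlert -ResolutionState 0 |
    Where-Object { $_.Severity -eq 'Information' } |
    Group-Object Name |
    Sort-Object Count -Descending |
    Select-Object -First 10 Count, Name
```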
That wraps up this blog post; hopefully it has helped you get those alerts under control!