Today I want to talk about some more-or-less undocumented SCOM 2019 upgrade tips that I picked up during my latest upgrade. It was my first real ‘in-place’ upgrade from 1807 to 2019.
It was quite an undertaking: I had to move the databases to a supported SQL version and install new management servers, because the version running on the old management servers (2012 R2) was no longer supported.
You could ask yourself: why not perform a side-by-side upgrade? Well, the environment was very healthy and quite heavily customized, on the Squared Up side as well. Migrating all of that side-by-side would have taken an enormous effort.
So, without further ado, here are some things I think you should definitely pay attention to whilst upgrading to SCOM 2019.
- When migrating your databases, make sure to check the collation of your original SQL Server. Since 2012 R2 you can choose collations other than the standard SQL_Latin1_General_CP1_CI_AS. The previous consultant had installed this one with a French collation. Long story short, I had to redeploy the instance and migrate the databases again, as SCOM does not play nicely when the collation of the system databases is not the same as that of the OperationsManager DB.
- Size your TempDBs sufficiently before upgrading. By default they are too small, and autogrowing them is performance intensive, so in general you want to avoid this happening during the upgrade.
- Make an inventory of all your paid third-party management packs. These usually require additional software to be installed on the management servers themselves (NiCe MPs, HYCU F5, etc.).
- If you have custom management packs that require certain PowerShell modules, make sure they are installed on the new management servers as well (for example, the SQLServer module).
- If you use a proxy server, make sure the proxy exclusions are also configured for your Linux machines! This was by far the most puzzling issue I have had with cross-platform monitoring so far. The Linux agent uses a web service, so if a system-wide proxy is configured and the exclusions are not in place, all monitoring and installations will fail! Everything seems to be working well, but in fact data collection is failing and you get zero errors in Event Viewer.
You will get a very vague error during the upgrade/installation of the agents, which I described in a forum post.
Error in question:
Unexpected DiscoveryResult.ErrorData type. Please file bug report.ErrorData: System.ArgumentNullException
Value cannot be null.
Parameter name: s
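Since nothing shows up in Event Viewer, it can help to check up front which Linux hosts are not covered by your exclusions. Below is a minimal sketch in Python, assuming the exclusions are a semicolon-separated bypass list with `*` wildcards and an optional `<local>` entry (the kind of list `netsh winhttp show proxy` prints). The host names, the bypass list and the function names are made up for illustration, and the matching is a simplification of what the proxy stack really does.

```python
from fnmatch import fnmatchcase


def parse_bypass_list(raw):
    """Split a semicolon-separated bypass list into lowercase entries."""
    return [entry.strip().lower() for entry in raw.split(";") if entry.strip()]


def is_bypassed(host, entries):
    """Return True if the host matches at least one bypass entry."""
    host = host.strip().lower()
    for entry in entries:
        if entry == "<local>":
            # Assumption: <local> covers names without a dot (short names).
            if "." not in host:
                return True
        elif fnmatchcase(host, entry):
            return True
    return False


def hosts_behind_proxy(hosts, raw_bypass_list):
    """List the hosts that would still be sent through the proxy."""
    entries = parse_bypass_list(raw_bypass_list)
    return [host for host in hosts if not is_bypassed(host, entries)]


if __name__ == "__main__":
    # Hypothetical values; replace them with your own agents and bypass list.
    bypass = "*.corp.example;<local>;10.0.0.*"
    linux_hosts = ["lnx01.corp.example", "lnx02.dmz.example", "lnx03"]
    for host in hosts_behind_proxy(linux_hosts, bypass):
        print("No proxy exclusion for:", host)
```

Any host this prints is a candidate for the silent data collection failure described above, so add an exclusion for it before moving the agent.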
- Make sure sudo rules are in place before upgrading, so you can guarantee an easy Linux upgrade experience. Also, take into account any custom sudo rules that are in place for custom or third-party management packs. Don’t delete the old sudo rules from the previous SCOM version before the migration is done!
A list of sudo rules per version and distro can be found here.
- When moving Linux agents to a new pool, make sure you export all certificates and import them on the new management servers as well.
- Make sure you add the new management servers to the SNMP ACL of all your network devices. Some require explicit IP addresses to allow SNMP communication. Put these in place before you migrate your network monitoring!
- In my case, I had issues with SCOM 1807 and SCOM 2019 management servers running side-by-side. The System Center configuration service was no longer working on the old management servers (a minimum version issue).
- Because of the point above, I would recommend installing the new management servers with the same version as the existing ones, and first migrating all workloads to the new management servers (network devices, Linux/Windows agents, etc.).
Once that is done, decommission the old ones before upgrading! After you have decommissioned them and upgraded your first management server to 2019, try to keep the upgrade period for your other management servers as short as possible (but respect a 30-minute waiting period between management servers).
- Make sure to move any PowerShell scripts that are running locally on the old management servers to the new ones. For example, the ones that are used for command channels.
- Do a manual check of the performance data of agentless objects like network devices, as well as Linux agents, to verify that data collection is occurring properly after the upgrade. I only noticed quite late after the upgrade that Linux monitoring was not functioning as it should.
- If you have a lot of event ID 2115 errors after upgrading and putting new management servers in place, increase the timeouts in the “C:\Program Files\Microsoft System Center\Operations Manager\Server\ConfigService.config” file on your management servers.
Specifically, set ‘DefaultTimeoutSeconds’ to 300 under CMDB, and set ‘DefaultTimeoutSeconds’ to 300 under ConfigStore.
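If you have several management servers to touch, the change itself is easy to preview with a few lines of Python. This is only a sketch under assumptions: I assume the sections are XML elements with a `Name` attribute of `Cmdb` or `ConfigStore`, with the `DefaultTimeoutSeconds` attribute somewhere below them. The sample fragment in the code is hypothetical and trimmed down, so compare it with your own ConfigService.config before relying on it.

```python
import xml.etree.ElementTree as ET

# Section names from the tip above, matched case-insensitively.
TARGET_SECTIONS = {"cmdb", "configstore"}

# Hypothetical, trimmed-down fragment. The real file contains much more.
SAMPLE = """<Config>
  <Category Name="Cmdb">
    <OperationTimeout DefaultTimeoutSeconds="30" />
  </Category>
  <Category Name="ConfigStore">
    <OperationTimeout DefaultTimeoutSeconds="30" />
  </Category>
  <Category Name="Other">
    <OperationTimeout DefaultTimeoutSeconds="30" />
  </Category>
</Config>"""


def raise_default_timeouts(xml_text, seconds=300):
    """Raise DefaultTimeoutSeconds inside the target sections to `seconds`."""
    root = ET.fromstring(xml_text)
    for section in root.iter():
        if section.get("Name", "").lower() not in TARGET_SECTIONS:
            continue
        for node in section.iter():
            current = node.get("DefaultTimeoutSeconds")
            # Only raise values; never lower a timeout that is already higher.
            if current is not None and int(current) < seconds:
                node.set("DefaultTimeoutSeconds", str(seconds))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(raise_default_timeouts(SAMPLE))
```

ElementTree drops comments and the XML declaration when it rewrites a document, so I would treat the output as a preview of what to change by hand (after taking a backup of the file) rather than write it straight back.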
- Another common cause of 2115 errors is a discovery running on the management servers with a very low interval (60 seconds to 5 minutes). For example, you set this temporarily as a test but then forgot to remove it (been there :))
- Make sure the accounts used in your Run As profiles have the ‘Log on as a service’ right on the management servers as well as on the agents that need it. This is documented, but I cannot stress it enough.
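Coming back to the collation tip at the top of the list: the check I should have done first is easy to script. Below is a minimal Python sketch, assuming you have already run the query in `COLLATION_QUERY` against the new instance and have the rows as (name, collation) pairs. The function name and the sample rows are illustrative, and I use `master` as the reference because it carries the instance collation.

```python
# T-SQL to run on the instance; sys.databases lists every database
# together with its collation_name.
COLLATION_QUERY = "SELECT name, collation_name FROM sys.databases;"


def collation_mismatches(rows, reference_db="master"):
    """Return {database: collation} for databases that differ from reference_db."""
    collations = dict(rows)
    reference = collations[reference_db]
    return {
        name: collation
        for name, collation in collations.items()
        # collation_name can be NULL (None), e.g. for offline databases.
        if collation is not None and collation != reference
    }


if __name__ == "__main__":
    # Hypothetical result set: instance installed with a French collation,
    # OperationsManager restored with the standard SCOM collation.
    rows = [
        ("master", "French_CI_AS"),
        ("tempdb", "French_CI_AS"),
        ("OperationsManager", "SQL_Latin1_General_CP1_CI_AS"),
    ]
    for name, collation in collation_mismatches(rows).items():
        print(name, "uses", collation, "but the system databases do not")
```

An empty result means the system databases and the SCOM databases agree. Anything else is worth sorting out before you restore the databases, not after.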
These were more or less the things I ran into when upgrading a heavily customized SCOM environment. It was a learning experience for sure! Good luck to anyone upgrading to SCOM 2019, and as always, make sure you have a good migration plan before starting!