I recently ran into an issue where the SDK and Configuration services couldn’t start. Let me first give a short overview of the SCOM environment where the issue occurred:
SCOM01 – RMS server, HP OpenView connector service – Active
SCOM02 – MS server, OpsMgr DB, DW DB, Report server, HP OpenView connector service – Passive
OMU01 – HP Operations Manager for Unix server
So we have two SCOM servers: one is the RMS and the other is an MS. Both servers have the HP OpenView Connector service installed, but it is active only on SCOM01. The service sends SCOM alerts to the HP Operations Manager for Unix server.
The issue occurred when the following two alerts arrived in SCOM:
At first I thought something was wrong with the connector service. So I logged on to SCOM01 and saw that the connector service was down. I started it. The connector service was now running, but alerts were still not being sent to OMU01. As the connector service uses the SDK service, I thought it would be a good idea to restart the SDK service as well. So I first stopped the Health service and the Configuration service on SCOM01 and then stopped the SDK service. When I tried to start the SDK service I received the error: “The service did not respond to the start or control request in a timely fashion”. As this is a generic message, I looked at the event logs for more information. I first checked the Operations Manager event log, but there was no trace of any error that could point me to the problem. Then I went to the System log, and the only error I found there was this:
The next thing I tried was to restart the SQL service on SCOM02 and then start the SDK service and the Configuration service on SCOM01. The result was the same:
Then I rebooted SCOM02, and as soon as SCOM02 was up I rebooted SCOM01. Once again, neither the SDK service nor the Configuration service could start.
At this point I remembered this article from Kevin Holman (thank you Kevin):
In the Known Issues/Troubleshooting section there is this issue: “3. CU5 fails to apply. The SDK or config service may not start”. SCOM was updated to CU5 a couple of months ago, and I was certainly not updating it now. But the issue seemed very similar. So I opened the article:
and I saw that this article applies to Windows 2003, while SCOM01 runs Windows 2008, but in this situation I decided to give it a try. I opened regedit at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control. It seems that on Windows 2008 there is no ServicesPipeTimeout entry by default. I created it according to the instructions in the article and assigned it the value 180000, as Kevin recommends. I rebooted SCOM01. When I logged on to the server after the reboot, the SDK and Configuration services had started this time. The change resolved the issue.
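For anyone who prefers to script the registry change instead of clicking through regedit, here is a minimal sketch in Python that only generates the text of a .reg file for the value described above. The function name build_reg_file is made up for illustration, and the sketch assumes the value is a DWORD holding a timeout in milliseconds (so 180000 is three minutes). It does not touch the registry itself; the resulting file would still have to be reviewed and imported on the server, followed by a reboot.

```python
def build_reg_file(timeout_ms: int = 180000) -> str:
    """Return .reg file text that sets ServicesPipeTimeout (hypothetical helper).

    Assumption: the value is a REG_DWORD holding a timeout in milliseconds.
    """
    if not 0 < timeout_ms <= 0xFFFFFFFF:
        raise ValueError("timeout_ms must fit in a 32-bit DWORD")

    key = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files store DWORD values as 8 hexadecimal digits
        f'"ServicesPipeTimeout"=dword:{timeout_ms:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings
    return "\r\n".join(lines)


reg_text = build_reg_file(180000)
```

Writing reg_text to a file such as timeout.reg (the file name is arbitrary) gives something that can be imported by double-clicking it or with reg import.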
I then checked whether the connector service was sending alerts again, but that part was not resolved. Alerts were still not being sent to OMU01. As SCOM01 was working fine and I was able to manage the server from the console, this time I ruled out SCOM as the problem. I pinged OMU01 from SCOM01 and got no reply. This was why alerts were not being sent. I contacted the network team and found out that a switch was down. Later on the network issue was fixed and SCOM01 was able to send alerts to OMU01 again.
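In hindsight, a basic reachability check from SCOM01 to OMU01 is worth doing before restarting any services. The sketch below is one way to script that in Python. Note that it tests a TCP connection rather than sending an ICMP ping, and it assumes OMU01 listens on some known TCP port; the function name can_connect and the port are placeholders, not values from this environment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a TCP check, not an ICMP ping; the port is an assumption
    that must match a service actually listening on the target.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


# Hypothetical usage, with omu_port set to a port the OMU server listens on:
# if not can_connect("OMU01", omu_port):
#     print("OMU01 is unreachable, check the network before restarting services")
```

A False result here does not say whether the server, the port or the network path is at fault, but it is a quick hint to look outside SCOM.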