29 November 2012

Monitoring servers in DMZ using SCOM

I know there is a lot of information on this subject out there, and I spent a lot of time reading blogs and KBs, but I was still left with unanswered questions.
So, taking bits from here and there, putting them together, and after a lot of trial and error, these are the steps I will follow the next time I have a server in the DMZ that should be monitored by SCOM.


At my company we have a lot of servers in the DMZ that we wanted to monitor in SCOM (we are still on SCOM 2007 R2, CU5).

We have a functioning CA server, and I have a SCOM Gateway server.


First – we need a Trusted Root Certificate.

This certificate has to be imported on all involved servers (RMS, GTW, DMZ servers).

  • Browse to http://CA/certsrv
  • Download a CA certificate, certificate chain, or CRL
  • Download CA certificate chain
  • Save certnew.p7b in a folder for your certs (e.g. c:\certs)

This certificate needs to be copied to each server and imported there.

  • Open MMC with Certificates (Local Computer) snap-in
  • Import the certificate under “Trusted Root Certification Authorities”


Second – we need a certificate for every server.
  • Browse to http://CA/certsrv
  • Request a certificate
  • “Or, submit an advanced certificate request”
  • Create and submit a request to this CA
  • Name: FQDN of server
  • Type of Certificate Needed: Other…
  • OID: 1.3.6.1.5.5.7.3.1,1.3.6.1.5.5.7.3.2
  • Create new key set
  • CSP: Microsoft Enhanced Cryptographic Provider v1.0
  • Select: “Mark keys as exportable”
  • Store certificate in the local computer certificate store
  • Friendly Name: FQDN of server
  • Submit
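
If you have many servers, the same request settings can also be written to a request file instead of being typed into the web form. The sketch below is only an illustration and assumes you use certreq.exe rather than the certsrv page: the function name, the 2048-bit key length, the KeySpec value, and the example FQDN are my assumptions, while the OIDs, the CSP, and the exportable key come from the list above.

```python
# Enhanced Key Usage OIDs from the request form above.
SERVER_AUTH_OID = "1.3.6.1.5.5.7.3.1"  # Server Authentication
CLIENT_AUTH_OID = "1.3.6.1.5.5.7.3.2"  # Client Authentication


def build_request_inf(fqdn, key_length=2048):
    """Return the text of a certreq request file for one server.

    key_length and KeySpec are assumptions; adjust them to your CA policy.
    """
    lines = [
        "[Version]",
        'Signature="$Windows NT$"',
        "",
        "[NewRequest]",
        f'Subject = "CN={fqdn}"',          # Name: FQDN of server
        f'FriendlyName = "{fqdn}"',        # Friendly Name: FQDN of server
        "Exportable = TRUE",               # Mark keys as exportable
        "MachineKeySet = TRUE",            # local computer certificate store
        "KeySpec = 1",
        f"KeyLength = {key_length}",
        'ProviderName = "Microsoft Enhanced Cryptographic Provider v1.0"',
        "",
        "[EnhancedKeyUsageExtension]",
        f"OID = {SERVER_AUTH_OID}",
        f"OID = {CLIENT_AUTH_OID}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical server name, for illustration only.
print(build_request_inf("dmz01.example.com"))
```

Saved as request.inf, the text would be turned into a request with "certreq -new request.inf request.req" and then submitted to the CA. I have only used the web form myself, so treat this as a starting point.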

Now the certificate request has to be issued.

  • On the CA server, open the Certification Authority console
  • In “Pending Requests”, right-click the certificate > All Tasks > “Issue”

Save the certificate

  • Browse to http://CA/certsrv
  • View the status of a pending certificate request
  • Select your certificate
  • Install this certificate
  • Open MMC with Certificates (Local Computer) snap-in
  • Personal > Certificates
  • Right-click your certificate > All Tasks > Export
  • Yes, export the private key
  • Personal Information Exchange > Enable strong protection
  • Type a password (remember.. remember.. you will need it later)
  • Save it in your cert-folder as FQDN.pfx

Do this for every server, and copy each .pfx file to the server it belongs to.


On the server in DMZ

Hosts file
Can your DMZ server resolve the names of the Gateway or RMS?
If not, add them to the hosts file (C:\Windows\System32\Drivers\etc).
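
A quick way to test name resolution from the DMZ server and to produce the lines for the hosts file, as a minimal sketch. The host name and the 192.0.2.x address are placeholders, not values from my environment.

```python
import socket


def can_resolve(hostname):
    """Return True if this machine can resolve hostname to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def hosts_line(ip, fqdn):
    """Format one entry for the hosts file: address first, then the name."""
    return f"{ip}    {fqdn}"


# Example use on the DMZ server (placeholder name and address):
#   if not can_resolve("gtw01.example.com"):
#       print(hosts_line("192.0.2.10", "gtw01.example.com"))
```

Anything the check reports as unresolvable goes into the hosts file using the line that hosts_line() returns.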

Manually install the SCOM agent
Copy the installation files to your server (also copy MOMCertImport.exe, we will need it) and launch MOMAgent.msi.
You will need to supply the name of your Management Group and the FQDN of your GTW.

MOMCertImport
OK, you have copied MOMCertImport.exe, and the certificate file is on the server too. Then run:
MOMCertImport <path>\<certificate file>
This is where you will need the password for the certificate.

Now you need to bounce (restart) the “System Center Management” service,
then go to Pending Management in your OpsMgr console and approve the agent.

Install CU
Go back to your DMZ server and install the current CU.

Manageable
Your agent is now manually installed, which means that it won’t get updated automatically.
Fortunately, with a query against your OpsMgr database you can flip a bit and in this way make the agent manageable.

29 August 2012

Windows could not start the System Center Management on Local Computer

Today, while checking on my SCOM health, I found several grey servers.
I thought that this was just another "Stop System Center Management service - delete Health Service State - start System Center Management service"...
BUT.... no.... when I started the service, I got a pop-up:


...and the service didn't start.
hmmm....
Further investigation - and a little help from this article:
http://blogs.technet.com/b/smsandmom/archive/2008/04/30/opsmgr-2007-healthservice-service-fails-to-start-with-25362-warning.aspx
sent me in this direction:
Check the WindowsAccountLockDownSD key in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\.
And.. sure... the key was not present on these grey machines.
So, I found a healthy machine, exported the WindowsAccountLockDownSD key, copied the reg-file to the "defective" machines and merged it.

And... VOILA.... the service started nicely again.

BUT.... the service started nicely, but the server was still grey.
Looking into the eventlog I found an error 7005 with the following text:
The Health Service was unable to publish its public key to management group [MyMG] and will be unable to receive secure messages until this key is published. Attempts to publish the key will continue.
As long as the agent can't publish its public key it will not communicate with the SCOM management server.
It turned out that two more keys were missing from the registry.
In the following location there should be two keys with a long coded name (string of about 30 characters):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\mymgmtgroup\SSDB\References\
If they are not there, you can export them from another machine in the same management group and merge them.
Then restart the System Center Management service and ... Voila... the servers go green.
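
To see quickly which machines are missing the two keys under References, something like the sketch below can be run on a healthy and on a grey machine and the output compared. It uses Python's winreg module, so the listing part only works on Windows; the function names are mine, and the management group name is whatever yours is called.

```python
PARAMETERS = r"SYSTEM\CurrentControlSet\Services\HealthService\Parameters"


def references_key(management_group):
    """Build the path (below HKEY_LOCAL_MACHINE) of the SSDB\\References key."""
    return (PARAMETERS + "\\Management Groups\\" + management_group
            + "\\SSDB\\References")


def list_reference_subkeys(management_group):
    """Return the names of the subkeys under References (Windows only).

    Raises FileNotFoundError if the References key itself is missing.
    """
    import winreg  # imported here so the path helper also works off Windows

    path = references_key(management_group)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        return [winreg.EnumKey(key, i) for i in range(subkey_count)]


# Example use on a Windows agent:
#   print(list_reference_subkeys("mymgmtgroup"))
```

On a healthy agent I would expect two long names in the list; an empty list or a FileNotFoundError points to the problem described above.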

29 June 2012

VMPD - it's finally arrived

Finally, the System Center 2012 Visio MP Designer (VMPD) has arrived.
VMPD is an add-in for Visio 2010 Premium that allows you to visually design Management Packs for System Center Operations Manager.

You can download it from here: http://www.microsoft.com/en-us/download/details.aspx?id=30170

I will install it right away, and can't wait to get started.

28 March 2012

Installing a new Management Server in SCOM R2


I installed a new management server in my SCOM R2 environment, but got an error when it started up.
In the Operations Manager log on the new management server, I found a HealthService 7022 error.
As I was sure I had done nothing wrong during the setup, I turned to Google and came up with a solution.
Microsoft KB2027535: New Management Server unable to get configuration in System Center Operations Manager 2007.
As stated in the KB article, I changed the “Default Action Account” Run As profile for my new server to “Local System Action Account” and everything was running again.
After a while I wanted to change the “Default Action Account” back to the domain account (as stated in the article).
But I got an error, and I couldn’t save the profile. I then went to the properties of the Action Account, retyped the password (yes… typed in the same password !!!) and now I was able to change the Run As profile for the server and save it.

31 January 2012

Health Service Store has reached its maximum...

Error
Source: Health Service ESE Store
Event ID: 623
Task Category: Transaction Manager

HealthService (5664) Health Service Store: The version store for this instance (0) has reached its maximum size of 60Mb. It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back.

Possible long-running transaction:

SessionId: 0x0000000000E020C0

Session-context: 0x00000000

Session-context ThreadId: 0x0000000000000F0C

Cleanup: 1
--------------------------------
I have installed Veeam nWorks for VMware, and this error came up on the two agents used as collectors.
Deleting the Health Service Store and restarting only helped for a short while.

Then a small change in the registry and a restart of the service did the trick.
These are the registry updates:

1 - Update ‘Version Store Size’ (the Ops Mgr Agent queue/cache Db)
HKLM\System\CurrentControlSet\Services\HealthService\Parameters\"Persistence Version Store Maximum"
Value should be 5120 (decimal), which equates to 80MB (the value is counted in 16KB pages).

2 - Update value for ‘MaximumQueueSizeKb’
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\
Value should be 102400 (decimal)

3 - Create DWORD value (if it does not exist) for State Queue Items
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters
Value should be 4096 (decimal)
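
With more than a couple of agents it is easier to merge a .reg file than to edit each value by hand. The sketch below only builds the text of such a file for the two values that live directly under Parameters. The helper names are my own, and I have left out MaximumQueueSizeKb because its key sits below Management Groups; it can be generated the same way by passing that key path.

```python
PARAMETERS_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
                  r"\Services\HealthService\Parameters")


def dword_line(name, value):
    """Format one DWORD entry the way regedit writes it (8 hex digits)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f'"{name}"=dword:{value:08x}'


def build_reg_file(key_path, values):
    """Return .reg file text that sets the given DWORD values under key_path."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    lines += [dword_line(name, value) for name, value in values.items()]
    return "\r\n".join(lines) + "\r\n"


reg_text = build_reg_file(PARAMETERS_KEY, {
    "Persistence Version Store Maximum": 5120,  # 5120 x 16KB = 80MB
    "State Queue Items": 4096,
})
print(reg_text)
```

Save the output as a .reg file, merge it on the agent, and restart the System Center Management service as described above.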

Cannot start "System Center Management" service

When starting the System Center Management service you get an error that says that the
"service terminated with service-specific error %%-2130771964".



After digging into this, I found that it was the Health Service store (the cache) that was corrupted.
I renamed the Health Service State folder (in c:\program files\system center operations manager 2007),
and then I could start the service again.
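
The stop, rename, start routine can be scripted roughly as below. This is a sketch, not a tested tool: the function names and the ".old-" suffix are my own choices, the service is addressed by its short name HealthService (the name used in the registry paths in the posts above), and the script has to run elevated on the agent.

```python
import os
import subprocess
import time


def set_aside_state_folder(install_dir, stamp=None):
    """Rename 'Health Service State' so the agent builds a fresh cache.

    Returns the new path of the renamed folder.
    """
    src = os.path.join(install_dir, "Health Service State")
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = src + ".old-" + stamp
    os.rename(src, dst)
    return dst


def reset_health_service_cache(install_dir):
    """Stop the service, set the state folder aside, start the service."""
    # 'net stop' returns an error if the service is already stopped,
    # which is fine here, so its exit code is not checked.
    subprocess.run(["net", "stop", "HealthService"], check=False)
    renamed = set_aside_state_folder(install_dir)
    subprocess.run(["net", "start", "HealthService"], check=True)
    return renamed


# Example use on the agent (folder as in the post above):
#   reset_health_service_cache(
#       r"c:\program files\system center operations manager 2007")
```

Renaming instead of deleting keeps the old store around in case you want to look at it later; delete it once the agent is green again.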