The starting point is a SQL query I got from Kevin Holman. It returns the names of all unhealthy servers that are not pingable and are not reporting back to the SCOM environment:
SELECT bme.DisplayName FROM state AS s, BaseManagedEntity AS bme
WHERE s.basemanagedentityid = bme.basemanagedentityid
AND s.monitorid IN (SELECT MonitorId FROM Monitor
WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown')
AND s.Healthstate = '3' AND bme.IsDeleted = '0'
ORDER BY s.Lastmodified DESC
The workflow then runs a status check for each returned computer to see whether it is accessible on the network. Based on that result, it runs a WMI query to confirm whether the server is unavailable. If the server is unavailable and has not communicated with the SCOM infrastructure in the last 7 days, the workflow kicks off a PowerShell script that removes the server from the SCOM infrastructure (clean-up).
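As a rough illustration of the clean-up decision described above, here is a minimal Python sketch. The function name, its parameters and the strict "more than 7 days" comparison are my own assumptions for illustration; the actual workflow does this with a WMI query and a PowerShell script, which are not shown here.

```python
from datetime import datetime, timedelta

# Assumed threshold, taken from the 7 days mentioned in the text.
MAX_SILENCE = timedelta(days=7)


def should_remove_from_scom(reachable: bool, last_contact: datetime, now: datetime) -> bool:
    """Decide whether a server is a clean-up candidate (hypothetical helper).

    A server qualifies only when it is unreachable AND its last contact
    with SCOM is more than MAX_SILENCE in the past.
    """
    if reachable:
        # Reachable servers go down the repair path instead of clean-up.
        return False
    return now - last_contact > MAX_SILENCE


if __name__ == "__main__":
    now = datetime(2024, 1, 15)
    # Unreachable and silent for 14 days: candidate for removal.
    print(should_remove_from_scom(False, datetime(2024, 1, 1), now))
    # Unreachable but heard from 2 days ago: leave it alone for now.
    print(should_remove_from_scom(False, datetime(2024, 1, 13), now))
```

Keeping the decision separate from the removal step makes it easy to log the candidates first and review them before anything is actually deleted.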
If the server is available, the workflow checks the status of the HealthService. If the service is stopped, it starts the service. If the service is running, it stops the service, clears the HealthService files and starts the service again.
If the service is not installed, the workflow executes the MSI package to install the agent on the server. AD then assigns the agent to the correct management server.
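The HealthService branches above can be summarised as a small lookup from service state to the steps that follow. This is only a sketch in Python: the state names and step labels are hypothetical placeholders for illustration, not the actual activities in the workflow.

```python
from enum import Enum
from typing import List


class ServiceState(Enum):
    """Possible HealthService states on a reachable server (assumed names)."""
    STOPPED = "stopped"
    RUNNING = "running"
    NOT_INSTALLED = "not_installed"


def plan_agent_repair(state: ServiceState) -> List[str]:
    """Return the ordered repair steps for a given HealthService state."""
    if state is ServiceState.STOPPED:
        # A stopped service only needs to be started.
        return ["start_service"]
    if state is ServiceState.RUNNING:
        # Running but unhealthy: stop, flush the HealthService files, restart.
        return ["stop_service", "clear_health_service_files", "start_service"]
    # No service present: install the agent from the MSI package.
    return ["install_agent_msi"]


if __name__ == "__main__":
    for state in ServiceState:
        print(state.value, "->", plan_agent_repair(state))
```

Returning a list of steps rather than executing them directly keeps the sketch free of side effects, so the plan can be inspected or logged before anything touches the server.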
There are still some gaps in this workflow, but it is sufficient as a starting point.