Server Health State Overview

Here is another report that shows the user at a glance the health state of the servers in the infrastructure.

The first part of the report shows general server health, including server uptime, disks with less than 10% free space and the state of some critical Windows services.

The second part of the report shows the hardware health of the servers, obtained from the different hardware management packs, e.g. Dell and HP.

SCOM 2012 – SCOM Agent Health Checks Part 1 of 4

In SCOM there are several monitors that monitor agent health and alert accordingly. This series of blog posts aims to provide in-depth details of how these monitors work.

For a start I’ve decided to blog about two of the most intriguing monitors for agent health.

They are

1. Event Collection Health.

2. Performance data collection health.

The target for both monitors is the “Health Service Watcher” class, and they live in the “System Center Core Monitoring” management pack.

These monitors are disabled by default.

These monitors have no diagnostic or recovery actions linked to them.

Screenshot

Parameters of interest for these monitors are:

1. Max Event Age in Hours – Any event/performance counter older than this number of hours is deemed old data and makes the monitor red. ($Config/MaxEventAgeHr$ in the event query below, $Config/MaxPerfAgeHr$ in the performance query)

2. Query Timeout – The timeout of the SQL query that the monitor runs

3. Watch Period in Hours – The time span the query looks back over when gathering event or performance data ($Config/WatchPeriodHr$ in the queries below, which are run by the OLEDB data source)

4. Interval – How often the monitor performs its check

Performance data collection health checks for any performance data that exists in the OperationsManager database for each and every agent for which this monitor is enabled.

The monitor makes use of an OLEDB data source with the following SQL query:

select CAST(ME.Path as nvarchar(255)),
       CAST(Max(TimeSampled) As nvarchar(50)) As 'LastSample',
       CASE WHEN Isnull(MAX(TimeSampled),'01-01-80') < DateAdd(hh,-$Config/MaxPerfAgeHr$,getutcdate())
            Then 'KO' Else 'OK' END
from dbo.ManagedEntityGenericView ME
inner join dbo.ManagedTypeView MT
  on ME.MonitoringClassId=MT.Id AND MT.Name = 'Microsoft.SystemCenter.HealthService'
left join dbo.PerformanceCounterView C
  on ME.Id = C.ManagedEntityId
left join dbo.PerformanceDataAllView P
  on C.PerformanceSourceInternalId=P.PerformanceSourceInternalId
  and P.TimeSampled > dateadd(hh,-$Config/WatchPeriodHr$,getutcdate())
where ME.IsDeleted=0
group by ME.Path

This query returns the date of the last performance counter received per object of class “Microsoft.SystemCenter.HealthService”, with a status of either “KO” if there is a problem or “OK” if performance data was received in the specified time.
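The status column comes down to one comparison per agent. The following is a minimal sketch in Python of that decision, not the management pack's own code; the function and parameter names are made up for illustration. Because the join only considers samples inside the watch period, an agent with no recent samples arrives here with no timestamp at all and falls back to the 1980 date, just as in the query.

```python
from datetime import datetime, timedelta
from typing import Optional

# Fallback used when no sample was found, mirroring ISNULL(..., '01-01-80') in the query.
NO_DATA = datetime(1980, 1, 1)

def collection_status(last_sample: Optional[datetime], now_utc: datetime, max_age_hours: int) -> str:
    """Return 'KO' when the newest sample is older than max_age_hours, else 'OK'."""
    newest = last_sample if last_sample is not None else NO_DATA
    cutoff = now_utc - timedelta(hours=max_age_hours)
    return "KO" if newest < cutoff else "OK"

# Illustrative values only: a 4 hour maximum age checked at noon.
now = datetime(2014, 1, 1, 12, 0)
print(collection_status(now - timedelta(hours=1), now, 4))  # OK
print(collection_status(now - timedelta(hours=5), now, 4))  # KO
print(collection_status(None, now, 4))                      # KO
```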

Event collection health

This monitor also utilizes an OLEDB data source with the following SQL query:

select #T.Path,
       CAST(MAX(LastTime) as nvarchar(50)) As 'LastEvent',
       CASE WHEN Isnull(MAX(LastTime),'01-01-80') < DateAdd(hh,-$Config/MaxEventAgeHr$,getutcdate())
            Then 'KO' Else 'OK' END As 'Status'
From (
  Select CAST(ME.Path as nvarchar(255)) As [Path],
         CASE WHEN IsNull([Path], '')='' THEN ''
              WHEN CHARINDEX('.', [Path]) = 0 Then [Path]
              ELSE SUBSTRING(Path,1,CHARINDEX('.', Path)-1) END As 'Netbios'
  From dbo.ManagedEntityGenericView ME
  Inner join dbo.ManagedTypeView MT
    on ME.MonitoringClassId=MT.Id AND MT.Name = 'Microsoft.SystemCenter.HealthService'
  where IsDeleted=0
) As #T
left join (
  select distinct LoggingComputer, MAX(TimeGenerated) As 'LastTime'
  from dbo.EventView
  where TimeGenerated > dateadd(hh,-$Config/WatchPeriodHr$,getutcdate())
  group by LoggingComputer
) As #E
  on #E.LoggingComputer = #T.Path or #E.LoggingComputer=#T.[Netbios]
group by Path

The query returns the date and time of the last event received per agent. If the last event falls within the specified maximum age the status is “OK”; otherwise the status returned is “KO”.
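The part of the event query that is easy to miss is the name matching: an event row joins to an agent when its LoggingComputer equals either the full path or the short name derived from it. The following is a rough Python sketch of that CASE expression and join condition. The function names are illustrative, and Python compares strings case-sensitively, whereas SQL Server usually does not, depending on the collation.

```python
def netbios_name(path):
    """Mirror the CASE expression: text before the first dot, or the whole path."""
    if not path:            # NULL or empty path gives an empty string
        return ""
    dot = path.find(".")    # like CHARINDEX('.', Path), but 0-based and -1 when absent
    return path if dot == -1 else path[:dot]

def matches_agent(logging_computer, path):
    """An event row joins to an agent by full path or by short name."""
    return logging_computer in (path, netbios_name(path))

# Host names below are made up for illustration.
print(netbios_name("srv01.contoso.local"))            # srv01
print(matches_agent("srv01", "srv01.contoso.local"))  # True
print(matches_agent("srv02", "srv01.contoso.local"))  # False
```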

Extract from the “Microsoft.SystemCenter.2007” management pack (friendly name “System Center Core Monitoring”)

Example of the alert as it appears in the SCOM console

With these two monitors you can monitor, and get alerted on, any performance and event collection problems experienced by the SCOM agents. It’s really bad when, come the end of the month, you are unable to run availability or performance reports, or the reports do not contain all the servers, because some servers stopped sending performance and event data several weeks earlier.

By enabling these monitors the SCOM administrator can be proactively notified of agent problems and fix them.

SCOM 2012 Product Connector for Alert Enrichment

I have a requirement at one of my clients to perform alert enrichment on the SCOM alerts before the alerts are forwarded into BMC BPPM.

The alert enrichment is simple: CustomField2 of the SCOM alert must be populated with instructions to the 24/7 service desk on what must be done with the alert, i.e. who must be notified/contacted for the alert.

I’ve decided to implement a fully fledged SCOM 2012 product connector and to call it “CustomField2”. This connector “listens” for any new SCOM alerts and then, using the NetbiosName, AlertName and MonitoringObjectName of the SCOM alert, references a SQL table with these fields to get the CustomField2 contents (“AlertText” in the table).

The structure of the SQL table below

The product connector performs 7 checks against the SQL table.

1. Reads the table and looks for entries where the netbiosname, alertname and monitoringobject all match

Example: ServerX, Disk Free Space and C: drive

2. Reads the table and looks for entries where the netbiosname and alertname match

Example: ServerX and Disk Free Space

3. Reads the table and looks for entries where the netbiosname matches

Example: ServerX

4. Reads the table and looks for entries where the alertname matches

Example: Disk Free Space

5. Reads the table and looks for entries where the netbiosname and monitoringobject match

Example: ServerX and C:

6. Reads the table and looks for entries where the monitoringobject and alertname match

Example: C: and Disk Free Space

7. Reads the table and looks for entries where the monitoringobject matches

Example: C:

Having different checks allows for a very granular setup and alert enrichment to take place.
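The seven checks above can be sketched as an ordered lookup. This Python sketch is only an illustration, not the connector's code, and it rests on two assumptions that the post does not state: the first check that finds a row wins, and a row takes part in a check only when the columns that check does not use are left blank. The dictionary keys stand in for the NetbiosName, AlertName, MonitoringObjectName and AlertText columns.

```python
ALL_FIELDS = ("netbios", "alert", "object")

# Each check lists the fields that must match, in the order described above.
CHECKS = [
    ("netbios", "alert", "object"),
    ("netbios", "alert"),
    ("netbios",),
    ("alert",),
    ("netbios", "object"),
    ("object", "alert"),
    ("object",),
]

def find_alert_text(rows, alert):
    """Return the text of the first matching row, trying the checks in order, else None."""
    for fields in CHECKS:
        for row in rows:
            used_match = all(row[f] == alert[f] for f in fields)
            # Assumption: columns this check does not use must be blank in the row.
            unused_blank = all(not row[f] for f in ALL_FIELDS if f not in fields)
            if used_match and unused_blank:
                return row["text"]
    return None

# Made-up table contents for illustration.
rows = [
    {"netbios": "ServerX", "alert": "", "object": "", "text": "Call the server team"},
    {"netbios": "ServerX", "alert": "Disk Free Space", "object": "C:", "text": "Call the storage team"},
]
disk_alert = {"netbios": "ServerX", "alert": "Disk Free Space", "object": "C:"}
cpu_alert = {"netbios": "ServerX", "alert": "CPU Utilization", "object": "CPU0"}
print(find_alert_text(rows, disk_alert))  # Call the storage team
print(find_alert_text(rows, cpu_alert))   # Call the server team
```

The most specific entry is found first, and the server-wide entry acts as a fallback for every other alert on ServerX.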

The product connector checks every 8 seconds for new unacknowledged alerts where CustomField10 <> “Processed”, and processes only these alerts.

After a successful match the product connector also sets CustomField10 to “Processed” and adds “Alert updated with custom fields” to the alert history.

If none of the 7 checks finds a match, CustomField10 is still set to “Processed” and the alert history is updated with “no record found for customfield2”.
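The bookkeeping described above, marking every alert as processed whether or not a match was found, can be sketched as follows. This is a simplified Python illustration in which a plain dictionary stands in for the SCOM alert and `lookup` is a hypothetical callable that returns the CustomField2 text or None; the real connector works through the SCOM SDK.

```python
PROCESSED = "Processed"

def process_alert(alert, lookup):
    """Enrich one alert in place; return False if it was already processed."""
    if alert.get("CustomField10") == PROCESSED:
        return False                      # handled on an earlier poll, skip it
    text = lookup(alert)
    if text is not None:
        alert["CustomField2"] = text
        comment = "Alert updated with custom fields"
    else:
        comment = "no record found for customfield2"
    alert["CustomField10"] = PROCESSED    # set in both cases so the alert is not read again
    alert.setdefault("History", []).append(comment)
    return True

# Made-up alerts for illustration.
matched = {"Name": "Disk Free Space"}
unmatched = {"Name": "Unknown alert"}
process_alert(matched, lambda a: "Call the storage team")
process_alert(unmatched, lambda a: None)
print(matched["CustomField2"], matched["History"])
print(unmatched["CustomField10"], unmatched["History"])
```

Setting CustomField10 even when nothing matched is what keeps the 8 second poll from picking up the same alert over and over.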

The product connector also writes events to a custom event log called “AmmendSCOM”; these events are used for troubleshooting and for tracking what the connector is doing.

Screenshot of an updated SCOM alert with the customfields populated

History tab of the SCOM Alert

Screenshot of unsuccessful update

Screenshot of what this looks like in the Active Alerts view

For the alert to be picked up by the BMC connector, the CustomField2 connector sets the connector id of the updated alert to the id of the BMC connector and also sets the resolution state of the alert to the BMC BPPM connector resolution state.

When the alert arrives at the BMC BPPM connector, the CustomField2 field and the SCOM alert description are concatenated into the BPPM event message.

The 24/7 service desk then knows who should be contacted by referencing a call out portal with the team name as per the BPPM message.

PM me for more details about the connector.

REST API – Forward SCOM alerts into ManageEngine ServiceDesk Plus (IT Assist)

I had a requirement from one of my clients to forward SCOM alerts to ManageEngine ServiceDesk Plus (IT Assist).

I’m using the PowerShell “Invoke-RestMethod” cmdlet to call the ServiceDesk Plus URL and pass it the values needed to log the incident.

Here is the complete script. The script is scheduled in Task Scheduler on the Windows server to run every 130 seconds and look for any new alerts in the last 125 seconds.

For each alert that is found the script calls the “new-helprequest” function, which builds the URI for ServiceDesk Plus and posts the request.

#param($alertid)

function new-helprequest{
    param(
        [string]$technician="<tech name of user in ServiceDesk>",
        [string]$requester="SCOM",
        [string]$subject="<replace this with the alert name or any generic subject>",
        [string][Parameter(Mandatory=$true)]$description,
        [string]$category="<Category in ServiceDesk>",
        [string]$subcategory="<Subcategory in ServiceDesk>",
        [string]$mode="Web Form",
        [string]$impact="Affects User <From Service Desk>",
        [string]$urgency="Low",
        [string]$priority="Low",
        [string]$asset="Low",
        [string]$site="<Site information in Service Desk>",
        [string]$group="<Group in Service Desk for assignment of incident>",
        [string]$requesttemplate="<The template in ServiceDesk>",
        [string]$extension="<extension number of technician in ServiceDesk>")

    #this is an xml template to generate new SDP requests
    $requestXML=[xml]("<Operation>
    <Details>
    <technician></technician>
    <requester></requester>
    <subject></subject>
    <description></description>
    <mode></mode>
    <category></category>
    <subcategory></subcategory>
    <impact></impact>
    <urgency></urgency>
    <priority></priority>
    <site></site>
    <asset></asset>
    <group></group>
    <requesttemplate></requesttemplate>
    <extension></extension>
    </Details>
    </Operation>")

    $url = "<>" #url to your helpdesk server
    $api = "<>" #api path, which is pretty much the same for SDP 8+
    $operation = "ADD_REQUEST" #the operation
    $apiKey = "<Api of the technician account defined above>" #the api key you generated from one of your technician accounts.

    #Configure the parameters for the xml with the content from the command parameters
    $requestXML.operation.details.technician = $technician
    $requestXML.operation.details.requester = $requester
    $requestXML.operation.details.subject = $subject
    $requestXML.operation.details.description = $description
    $requestXML.operation.details.category = $category
    $requestXML.operation.details.mode = $mode
    $requestXML.operation.details.subcategory = $subcategory
    $requestXML.operation.details.impact = $impact
    $requestXML.operation.details.urgency = $urgency
    $requestXML.operation.details.priority = $priority
    $requestXML.operation.details.site = $site
    $requestXML.operation.details.group = $group
    $requestXML.operation.details.asset = $asset
    $requestXML.operation.details.requesttemplate = $requesttemplate
    $requestXML.operation.details.extension = $extension

    #assemble a URI. SDP expects a parameterized URI method to generate requests.
    #The XML is URL-encoded so that characters such as & and spaces in an alert description cannot break the query string.
    $uri = $url + $api + "OPERATION_NAME=" + $operation + "&TECHNICIAN_KEY=" + $apiKey + "&INPUT_DATA=" + [uri]::EscapeDataString($requestXML.InnerXml)

    #write-host $uri
    Invoke-RestMethod -Method post -Uri $uri
}

#Creates a text file for troubleshooting
add-content -Path c:servicedeskoutput.txt -Value "Started the run"

#$alertid=$alertid.tostring()
#Import-Module OperationsManager
add-pssnapin "Microsoft.EnterpriseManagement.OperationsManager.Client"
new-managementgroupconnection -ConnectionString:<MSServerName>
set-location "OperationsManagerMonitoring::"

#$alert = Get-SCOMAlert -Id $alertid
$todaydate = get-date
add-content c:servicedeskoutput.txt $todaydate

#The criteria below is specific to my requirement where I’m looking for devices with specific words in the Path of the object
#You can change the criteria for your application
#Note: addminutes(-125) looks back 125 minutes; use addseconds(-125) if a 125-second window is intended
$alerts = get-alert | where {$_.Timeraised -ge $todaydate.addminutes(-125) -and $_.MonitoringObjectPath -like '*<domainname>*' -and $_.ResolutionState -ne 255}

foreach ($alert in $alerts)
{
    add-content c:servicedeskoutput.txt $alert.description

    #$alert = Get-Alert -Id $alertid
    $alertdescription = $alert.description
    $alertname = $alert.name
    $alertseverity = $alert.severity
    $alertpriority = $alert.priority

    #The steps below map the SCOM alert severity to the severities in ServiceDesk Plus
    if ($alertseverity -eq "Error") { $alertseverity = "Critical" }
    if ($alertseverity -eq "Warning") { $alertseverity = "High" }
    #write-host $alertseverity

    $computername = $alert.NetBiosComputerName
    if (!$computername)
    {
        $computername=$alert.MonitoringObjectDisplayName
    }
    else
    {
        $computername=$alert.PrincipalName
    }

    $computernamefqdn = $computername

    #Only shorten the name when it contains a dot; substring(0,-1) would throw for a name without one
    $where = $computername.indexof(".")
    if ($where -gt 0) { $computername=$computername.substring(0,$where) }

    [xml] $a = new-helprequest -requester "SCOM" -subject "$computername - $alertname" -asset $computernamefqdn -urgency "$alertseverity" -priority "$alertpriority" -description "$alertdescription" -group "<ServiceDesk GroupName>" -requesttemplate "Default Request"
}
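Alert descriptions routinely contain spaces, ampersands and angle brackets, so it matters how the request XML is packed into the query string. The following Python sketch shows the same parameterised-URI idea with every query value percent-encoded. The parameter names OPERATION_NAME, TECHNICIAN_KEY and INPUT_DATA come from the script above; the base URL, the key and the field values are placeholders, and joining the base URL to the query with “?” is an assumption, since the script leaves the API path blank.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from xml.etree import ElementTree as ET

def build_request_uri(base_url, api_key, details):
    """Build the ADD_REQUEST URI with every query value percent-encoded."""
    operation = ET.Element("Operation")
    details_node = ET.SubElement(operation, "Details")
    for name, value in details.items():
        ET.SubElement(details_node, name).text = value
    input_data = ET.tostring(operation, encoding="unicode")
    query = urlencode({
        "OPERATION_NAME": "ADD_REQUEST",
        "TECHNICIAN_KEY": api_key,
        "INPUT_DATA": input_data,
    })
    return base_url + "?" + query   # assumption: the API path does not already end in "?"

# Placeholder URL and key; the subject contains characters that must be encoded.
uri = build_request_uri(
    "https://servicedesk.example.invalid/request",
    "PLACEHOLDER-KEY",
    {"requester": "SCOM", "subject": "srv01 - Disk & CPU alert"},
)
decoded = parse_qs(urlsplit(uri).query)
print(decoded["INPUT_DATA"][0])
```

Decoding the query again returns the original XML intact, which is a quick way to confirm that nothing in the description leaked into the URI structure.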

Screenshot of the requests as they appear in ServiceDesk Plus

Export Effective Configuration of SCOM Objects

Viewing Effective Configuration of the monitoring objects in a SCOM environment has always been a bit of a challenge.

There are tools available, including PowerShell commands, that perform the extract for you, and there is even a SCOM dashboard view that shows the effective configuration of an object/class.

I’ve created a .NET application front-end in which the user selects a class in their SCOM environment; the utility then displays all the objects of the chosen class.

The user can then select the specific objects for which the Effective Configuration export is needed.

Screen shot of the application

Explanation of some of the fields

“File Name: “ This is the directory where the csv files for each object in the class or objects selected will be placed

“Waiting Time: “ This is the milliseconds the utility will wait between batches.

“Simultaneous Process: “ This is the number of objects targeted in parallel to export their effective configuration.

For example, with 15 simultaneous processes and a waiting time of 300,000, the utility kicks off 15 processes and then waits for 300,000 milliseconds (5 minutes). After the 5 minutes the utility launches additional processes to get back to a total of 15.

When you run the utility you can change the values above to suit your environment, and I would advise that you do: each thread uses considerable memory and CPU on the server where you run the utility, and each thread is also a separate connection to your SCOM environment.
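The top-up behaviour is simple to reason about once written down. This is a small Python sketch of the rule, not the utility's actual code; the function name and arguments are illustrative.

```python
def processes_to_launch(running, remaining, limit=15):
    """How many new exports to start so that `limit` run at once."""
    free_slots = max(limit - running, 0)
    return min(free_slots, remaining)

WAIT_MS = 300_000
wait_minutes = WAIT_MS / 1000 / 60   # 300,000 ms = 300 s = 5 minutes

print(processes_to_launch(running=0, remaining=100))   # 15
print(processes_to_launch(running=9, remaining=100))   # 6
print(processes_to_launch(running=10, remaining=3))    # 3
print(wait_minutes)                                    # 5.0
```

If exports regularly finish well inside the waiting time, slots sit idle until the next top-up, so a shorter waiting time or a higher process count shortens the total run at the cost of more load on the server.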

Example of the output csv file

Expansion of the configuration items for the D: disk in above screenshot for the M: Logical Disk Free Space monitor (R: is for SCOM rules and M: is for SCOM monitors)

Directory contents for all the csv files:

The file name of each csv file is the SCOM object FullName with underscores “_” replacing all other characters, for example “;” and “”.
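A naming rule like this is typically a single regular-expression substitution. The Python sketch below assumes that letters, digits and dots are kept and everything else becomes an underscore; the exact set of characters the utility keeps is not stated, and the object name used is made up for illustration.

```python
import re

def csv_file_name(full_name):
    """Replace every character outside A-Z, a-z, 0-9 and '.' with '_' and add .csv."""
    return re.sub(r"[^A-Za-z0-9.]", "_", full_name) + ".csv"

print(csv_file_name("Microsoft.Windows.LogicalDisk:srv01.contoso.local;C:"))
# Microsoft.Windows.LogicalDisk_srv01.contoso.local_C_.csv
```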

If you choose an object that has related/child objects (for example an object of the “Microsoft.Windows.Server.Computer” class), the utility exports all related/child objects and their corresponding monitors and rules as well.

Screenshot showing the tree-like structure of the output csv file.

So there you have it, a csv file per object containing the effective configuration for the rules and monitors.

I’ve timed this utility on a huge SCOM environment, exporting 4432 “Microsoft.Windows.Server.Computer” class objects: the utility ran for 11 hours and 25 minutes.
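To put that run in perspective, the figures work out to roughly 9 seconds of wall-clock time per object, or close to 390 objects per hour. The short Python calculation below uses only the two numbers quoted above. It is an average over the whole run with exports running in parallel, so it says nothing about how long a single export takes.

```python
objects = 4432
runtime_seconds = (11 * 60 + 25) * 60            # 11 h 25 min = 41,100 s
seconds_per_object = runtime_seconds / objects
objects_per_hour = objects / (runtime_seconds / 3600)
print(f"{seconds_per_object:.1f} s per object, {objects_per_hour:.0f} objects per hour")
```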