USER_DEFINED Section

Theme: Configure
Who Is It For? System Administrator

What is it?

Reference for the SMA_RM USER_DEFINED section, which allows custom scripts or programs to report normal/alarm status and return loggable values during each scan cycle.

  • Use a <user_defined> section when you need to monitor a resource that SMA_RM's built-in disk and process monitoring cannot address — for example, the number of files in a directory, the number of records in a database, or the number of users currently logged in to the system.
  • Use a <user_defined> section when your monitoring logic requires a custom script or program to determine the normal/alarm status of a resource, and you want SMA_RM to send an OpCon event to SAM or run a local action when that status changes.
  • Use multiple <user_defined> sections with the same <alarm_group> and ascending <alarm_level> values when you need graduated responses to the same custom resource — for example, sending a warning event at one threshold and an escalation event at a higher threshold — using the same alarm-level processing that applies to disk and process monitors.
  • Use a <user_defined> section with a <window> entity when user-defined monitoring should be active only during a specific time of day.

User-defined monitors are specified via one or more <user_defined> sections, per the following template:


<user_defined>

<alarm_group>____</alarm_group>

<alarm_level>____</alarm_level>

<log_name>___</log_name>

<window>____</window>

<start_image>____</start_image>

<log>NONE | SCANS | EVENTS</log>

<outputs>_x_ _y_ ...</outputs>

<alarm>

<event>____</event>

<action>____</action>

<sleep>____</sleep>

</alarm>

<normal>

<event>____</event>

<action>____</action>

<sleep>____</sleep>

</normal>

</user_defined>

All entities within <user_defined> are optional except <log_name> and <start_image>. Duplicate entities constitute an error.

<alarm_group> and <alarm_level> are discussed in the "Multiple Alarm Levels" section. Either both must be specified or neither may be specified.

<log_name> is the identifier to be associated with this user-defined monitor, is limited to 64 characters (alpha, numeric, '-' and '_'), and should be unique across multiple <user_defined> sections.

<start_image> is the complete pathname of a shell script or program, along with any required start parameters. SMA_RM runs it to produce the values to be logged and to determine whether an alarm condition exists. When the monitor runs, the timestamp for the current scan cycle is appended to <start_image> as a final argument. Its format is as returned by the standard UNIX 'date' command, e.g., "Tue Aug 14 11:50:07 CDT 2016". An example appears later in this section.

<window> defines when the specification is to be active.

<outputs> defines the values to be generated by <start_image>. "_x_" and "_y_" serve as placeholders in the template; you are free to choose any names. As many values as required can be defined. Value names are limited to 64 characters (alpha, numeric, '-' and '_') and need not be unique across multiple <user_defined> sections.

<start_image> runs synchronously, i.e., SMA_RM will wait for it to complete. It must produce, to standard output (stdout), an indicator of status and a listing of its return values as specified in <outputs>. The first line of output must be either ALARM or NORMAL. Each additional line of output, one for each generated value, is of the form "VALUE_NAME=value" (no quotes). "VALUE_NAME" is one of the names in the associated <outputs> spec, and the lines must appear in the order listed within <outputs>. Spaces may be used around the '='. "value" constitutes everything from the first non-space/tab after the '=' to end-of-line (including trailing spaces/tabs). To include leading spaces/tabs in "value", change the '=' to '=\' (no embedded spaces), in which case "value" extends from the character immediately following the '\' to end-of-line.
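The output rules above can be sketched as a small shell helper. This is an illustration only: report() is a hypothetical name, not part of SMA_RM, and the choice to switch to the '=\' form only when a value begins with a space or tab is an assumption made for this sketch.

```shell
#!/bin/sh
# Sketch only: report() is a hypothetical helper, not part of SMA_RM.
# Usage: report STATUS NAME1 VALUE1 [NAME2 VALUE2 ...]
report() {
    status=$1
    shift
    # The first line of output must be ALARM or NORMAL.
    case $status in
        ALARM|NORMAL) printf '%s\n' "$status" ;;
        *) echo "report: status must be ALARM or NORMAL" >&2; return 1 ;;
    esac
    tab=$(printf '\t')
    # One NAME=value line per output, in the order the names were passed.
    while [ $# -ge 2 ]; do
        case $2 in
            " "*|"$tab"*) printf '%s=\\%s\n' "$1" "$2" ;;  # '=\' keeps leading blanks
            *)            printf '%s=%s\n' "$1" "$2" ;;
        esac
        shift 2
    done
}
```

A monitor script could end with a call such as report ALARM directory /usr/spool/data number_of_files 93, passing the names in the same order as they appear in <outputs>.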

Upon completion of <start_image>, SMA_RM will inspect the returned standard output and log returned values as directed by the <log> spec. If, from one scan to the next, the first line of output changes from NORMAL to ALARM (or is initially ALARM), <alarm> defines what is to occur (as discussed in the next section). If the first line changes from ALARM to NORMAL, then <normal> runs. Along with %LOG_NAME%, which is the contents of <log_name>, the returned values for the defined output variables will be available as event variables for use within <alarm> and <normal>.
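The status-change rule described above can be sketched as follows. This is a model of the documented behavior, not SMA_RM's own code; the function name transition and its return words are hypothetical.

```shell
#!/bin/sh
# Sketch only: models the rule in the text, not SMA_RM's implementation.
# $1 = first line of output from the previous scan (empty on the first scan)
# $2 = first line of output from the current scan
# Prints which spec applies: alarm, normal, or none.
transition() {
    prev=$1
    curr=$2
    if [ "$curr" = ALARM ] && [ "$prev" != ALARM ]; then
        echo alarm      # NORMAL -> ALARM, or ALARM on the initial scan
    elif [ "$curr" = NORMAL ] && [ "$prev" = ALARM ]; then
        echo normal     # ALARM -> NORMAL
    else
        echo none       # status unchanged, or initial NORMAL
    fi
}
```

Under this model, two consecutive ALARM scans trigger <alarm> only once, on the first of them.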

For an example of a user-defined monitor, consider the following spec:


<user_defined>

<log_name>checker</log_name>

<start_image>/usr/me/check this that "the other"</start_image>

<log>SCANS</log>

<outputs>hokus pokus rokus disk</outputs>

<alarm>

<event>$%log_name% -- hokus=%HOKUS%, pokus=%POKUS%</event>

</alarm>

</user_defined>

When the timestamp for the current scan is included, the monitor would run with a command line similar to:

/usr/me/check this that "the other" "Tue Aug 14 11:50:07 CDT 2016"

If 'check' is a UNIX Shell script, it would see the following values for $0 through $4:


$0 = /usr/me/check

$1 = this

$2 = that

$3 = the other

$4 = Tue Aug 14 11:50:07 CDT 2016
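Because the timestamp is appended after the configured start parameters, a script that accepts a varying number of parameters can read it as its last argument. A minimal sketch, assuming that ordering holds; scan_timestamp is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: scan_timestamp is a hypothetical helper name.
# Prints the last argument it was given, which per the text above
# is the timestamp of the current scan cycle.
scan_timestamp() {
    last=
    for arg in "$@"; do
        last=$arg
    done
    printf '%s\n' "$last"
}
```

Inside a monitor script this could be called as stamp=$(scan_timestamp "$@"), which keeps the quoted timestamp together as a single value.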

And it might return, via standard output, something like:


ALARM

HOKUS = 25

POKUS =\ not so fast!!!

ROKUS = does not include preceding spaces!!!

DISK=Just tossing a monkey wrench into the system...

%HOKUS%, %POKUS%, %ROKUS% and %DISK% would each be logged, without regard to the alarm/normal status, since <log> is set to SCANS. Then, assuming that two spaces appear after "25", and that this is the initial execution (the first line of output is ALARM), the event specified in <alarm> would get sent to SAM, i.e.:

$checker -- hokus=25  , pokus= not so fast!!!

Notice that since no <normal> is defined, nothing will occur when the first line of output changes to NORMAL.
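To make the value rules concrete, the following sketch extracts the name and value from one output line in the way the text describes. It is an illustration under the stated rules, not SMA_RM's parser; parse_name and parse_value are hypothetical names.

```shell
#!/bin/sh
# Sketch only: illustrates the documented rules, not SMA_RM's parser.

# Print the name: everything before the first '=', minus trailing blanks.
parse_name() {
    printf '%s\n' "${1%%=*}" | sed 's/[[:blank:]]*$//'
}

# Print the value part of one "VALUE_NAME=value" line.
parse_value() {
    rest=${1#*=}                       # everything after the first '='
    case $rest in
        '\'*) printf '%s\n' "${rest#?}" ;;   # '=\' form: keep leading blanks
        *)    printf '%s\n' "$rest" | sed 's/^[[:blank:]]*//' ;;  # skip leading blanks
    esac
}
```

With the sample output above, the POKUS line yields a value that starts with a space, while the leading spaces on the ROKUS line are dropped. Trailing spaces are kept in both forms.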

The following is an example of a script for a user-defined monitor to report the number of files in the specified directory and to base the reported normal/alarm status on that number being less than a specified value:


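#!/bin/sh

if [ "$2" -eq 80 ]   # first alarm level: take a fresh count

then

    rm -f lll 2>&-

    # plain ls prints one entry per line; ls -l would add a "total" line

    ls "$1" | wc -l >lll

fi

n=`cat lll`   # other alarm levels reuse the count saved in ./lll

if [ $n -ge "$2" ]

then

    echo ALARM

else

    echo NORMAL

fi

echo "directory=$1"

echo number_of_files=$n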

Start parameter $1 is the directory to be sized, and $2 is the alarm boundary. The '80' in line 2 is the first-level alarm boundary (refer to "Multiple Alarm Levels"): the script counts the files only when it is called with that boundary and saves the count in the file lll, which later calls reuse. If the script is used in a multi-level alarm setup, the first <user_defined> that references it must therefore have '80' as the second start parameter within <start_image>, and higher-level alarms must use boundaries greater than 80. The output variables would be defined with an "<outputs>directory number_of_files</outputs>" entity within the <user_defined>.

Examples

The following example uses the directory-sizing script described above to monitor the number of files in /usr/spool/data. SMA_RM runs the script during each scan cycle, passing the directory path and the alarm boundary of 80. If the script returns ALARM (80 or more files present), the event defined in <alarm> is sent to SAM. When the file count drops back below 80, the <normal> specification sends an all-clear event. The <log> entity is set to EVENTS so that the directory and file-count values are logged only when the alarm status changes.


<user_defined>

<log_name>spool-file-count</log_name>

<start_image>/usr/local/monitors/count_files /usr/spool/data 80</start_image>

<log>EVENTS</log>

<outputs>directory number_of_files</outputs>

<alarm>

<event>$CONSOLE:DISPLAY,<%DIRECTORY%> has %NUMBER_OF_FILES% files -- ACTION REQUIRED></event>

</alarm>

<normal>

<event>$CONSOLE:DISPLAY,<%DIRECTORY%> file count normal: %NUMBER_OF_FILES% files></event>

</normal>

</user_defined>
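Before adding a monitor to the configuration, it can help to run it by hand the way the text describes, with a 'date' timestamp appended as the final argument. The following is a minimal sketch of such a check; try_monitor is a hypothetical helper, and it verifies only that the first line of output is ALARM or NORMAL.

```shell
#!/bin/sh
# Sketch only: try_monitor is a hypothetical helper, not part of SMA_RM.
# Usage: try_monitor COMMAND [START_PARAMETERS ...]
try_monitor() {
    # Append the current time in default 'date' format as the last argument.
    out=$("$@" "$(date)")
    first=$(printf '%s\n' "$out" | sed -n '1p')
    case $first in
        ALARM|NORMAL) printf '%s\n' "$out" ;;
        *) echo "try_monitor: first line is '$first', expected ALARM or NORMAL" >&2
           return 1 ;;
    esac
}
```

For the example above, the call would be try_monitor /usr/local/monitors/count_files /usr/spool/data 80, which prints the monitor's output if the status line is valid.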