Unix LSAM Messages
This section covers messages generated by the LSAM which may appear in the LSAM's log and/or error file. Whether informational or to report an error condition, messages begin with text generated by the LSAM, followed by (in parentheses):
- The message ID
- The system-specific error number returned by the operating system
- The symbolic name for this error (common to the vast majority of UNIX systems)
- A brief [system-specific] description of the error returned by the operating system
Not every message will contain all the items within the parentheses. In general, if the message was not the result of failure within a system routine, then only the message ID will be included. The system-specific error number and description varies from system to system. Moreover, a system call often has multiple failure modes; consequently, the same message ID may be followed by a variety of error codes.
The following example shows a Detailed LSAM Message which might be returned for a LSAM exit code of 000010205:
[sma_lsam] (monitor) - Unable to execute [/usr/john/job_1] for [JOB1] (120 / 2 [ENOENT] - No such file or directory)
In this case, the user should check what was entered for "Start Image" on the Job Details screen.
There are currently over 500 messages which might be output, making a listing of each impractical. The following table lists the messages which address conditions that can generally be understood and/or corrected by the user without intervention of SMA Technologies Support personnel. Within the Message Text, "[x]" indicates variable data.
UNIX LSAM Messages
[x] multiply-defined
The indicated step [STARTING_STEP, ENDING_STEP, RESTART_STEP] was found to be included more than once. Check "Start Image" and "Parameters" in the job's Job Details screen.
The [x] multiply-defined message text applies only to jobs based on the SMA Technologies-supplied template job script discussed in Job Script Requirements.
Invalid job step [x]
Shell variable 'next_step' set to a non-existent job Step Label as displayed. Check all settings of 'next_step' – especially for use of incorrect letter case.
The Invalid job step [x] applies only to jobs based on the SMA Technologies-supplied template job script discussed in Job Script Requirements.
Infinite loop detected
The job experienced 'number_of_steps' iterations and was terminated. Check the following:
- Starting or Restart Step comes after Ending step in script
- Step gets repeated, possibly via intervening steps
- Job logic correct but 'number_of_steps' set too small
The Infinite loop detected applies only to jobs based on the SMA Technologies-supplied template job script discussed in Job Script Requirements.
Unable to change access for [x]
Access permissions for the indicated job stdout/dtderr output file could not be restricted per the LSAM Configuration setting.
Unable to change ownership for [x]
Ownership of the indicated job stdout/dtderr output file could not be changed from 'root/root' per the LSAM Configuration setting.
Process [x] did not start
The LSAM health monitor has determined that the indicated LSAM process did not start (or started but them immediately terminated). Stop the LSAM. If the cause for the lost process cannot be determined from other error messages, corrected, and the LSAM successfully re-started, then wait five minutes and re-start the LSAM. It should then start and run properly if the cause was of a momentary nature.
Process [x] has terminated
The LSAM health monitor has determined that the indicated LSAM process abnormally terminated. Stop the LSAM. If the cause for the lost process cannot be determined from other error messages, corrected, and the LSAM successfully re-started, then wait five minutes and re-start the LSAM. It should then start and run properly if the cause was of a momentary nature.
Sending SIGUSR1 (die) to all LSAM processes
The "bin/lsam<SAM_Socket> stop"
command was given; this message is confirmation that the LSAM is in normal termination processing.
Unable to find [x]
"[x]" cannot be located for processing.
Unable to create [x]
"[x]" cannot be created for processing.
Unable to open [x]
"[x]" cannot be opened for processing.
Unable to read [x]
Data cannot be read from "[x]".
Unable to write [x]
Data cannot be written to "[x]".
Deleting [x]
"[x]" was deleted. This is normally just an informational message, and not indicative of a problem unless "[x]" is later referenced in an "Unable to ..." message.
Unable to connect to [x]
A socket-based connection cannot be made to the indicated remote service or system. Determine if the indicated system or service is up-and-running, and/or if the network is operational, and correct the condition.
Unable to send [x1] to [x2]
A message "[x1]" cannot be sent over the socket-based connection to the indicated remote service or system ("[x2]"). Determine if the indicated system or service is up-and-running, and/or if the network is operational, and correct the condition.
LSAM health monitor operating
Informational message to confirm that the LSAM started with the LSAM health monitor in proper operation.
No LSAM health monitor
Informational message to confirm that the LSAM started without the LSAM health monitor in proper operation. Execute the LSAM Configuration program to confirm that LSAM health monitoring has been disabled.
LSAM health monitor disabled
The LSAM health monitor has ceased operation. This is normal after issuance of the "LSAM stop" command.
Sender not allowed by the LSAM configuration
The portion of the LSAM which communicates with the SAM has received a message over a socket-based connection from an address not configured to be acceptable. The received message was ignored. It may be necessary to run the LSAM Configuration program to update the LSAM's configuration.
Received unauthorized message
Some portion of the LSAM has received a message over a socket-based connection from an unexpected address. The received message was ignored. It may be necessary to run the LSAM Configuration program to update the LSAM's configuration.
Received a message with a blank machine name
The portion of the LSAM which communicates with the SAM has received a message without the required machine name which identifies to the LSAM the intended recipient for the message. Mark the LSAM down in the Enterprise Manager and stop the LSAM. Wait five minutes, then re-start the LSAM and mark it back up in the Enterprise Manager.
Received a message from SAM with invalid action specified
The portion of the LSAM which communicates with the SAM has received a message with an invalid action specified. Mark the LSAM down in the Enterprise Manager and stop the LSAM. Wait five minutes, then re-start the LSAM and mark it back up in the Enterprise Manager.
Received a duplicate message from SAM
The LSAM has received a duplicate message from SAM. If this happens only occasionally, it indicates timing issues which sometimes arise in distributed processing systems like OpCon, and is not indicative of a problem. If it happens in a series over a few minutes, it indicates an acute problem with the SAM or the intervening network hardware.
Missing environment variable [x]
The indicated UNIX shell environment variable is not defined. The variable may be either expected to be defined at the system level, or it may be an LSAM-defined variable which is not getting passed around as required.
Environment variable [x] is not defined
The indicated UNIX shell environment variable is not defined. The variable may be either expected to be defined at the system level, or it may be an LSAM-defined variable which is not getting passed around as required.
Received a SIGUSR1
Each process in the LSAM should output this message during normal termination of the LSAM. If the "bin/lsam<SAM_Socket> stop
" command was not given, then some process or user on the system issued a "kill" command on the indicated process.
Received order to stop LSAM Health Monitor This message is output during normal termination processing.
[x] Operational
Each process in the LSAM should output this message during normal start-up of the LSAM.
Too many arguments for [x]
The addition of the text for "Start Image" and "Parameters" in the Enterprise Manager resulted in > 100 arguments. An "argument" is any text but a space. For example, the Start Image/Parameters combination "/usr/john/job x y abc123 1", the start image is "/usr/john/job", and the four arguments are "x", "y", "abc123", and "1". (It is possible to include parameters in the "Start Image" text box in the Enterprise Manager.)
Blank start image for [x]
The indicated job contained no start image, i.e., script or program to execute. Check the Job Details screen in the Enterprise Manager.
Job [x] contained no UID
The indicated job had no User ID.
Job [x] contained no GID
The indicated job had no Group ID.
Job [x1] contained an invalid UID [x2]
The indicate User ID for the indicated job was either non-existent or incorrectly formatted.
Job [x1] contained an invalid GID [x2]
The indicate Group ID for the indicated job was either non-existent or incorrectly formatted.
Not allowed to start job [x] as 'root'
The LSAM is configured to not allow jobs to execute with 'root' privileges. This can be changed by running the LSAM Configuration program.
Unable to access start image [x1] for [x2]
The indicated Start Image could not be executed for the indicated job. The parenthesized portion of the message provides additional details, on which corrective action can be based.
Unable to execute [x1] for [x2]
The indicated item could not be executed for the indicated job. The parenthesized portion of the message provides additional details, on which corrective action can be based.
Unable to exec() [x1] to start [x2]
The indicated Start Image could not be executed for the indicated job. The parenthesized portion of the message provides additional details, on which corrective action can be based. If the LSAM determines that this was caused by an invalid reference to a shell/interpreter, e.g., the first line of the job script reads, "!#/bin/bogus", an additional message will be generated to indicate this possibility.
Unable to 'cd' to $HOME for [x]
LSAM configuration parameter require_HOME_directory is enabled and the indicated job cannot be executed within the associated user's HOME directory. The most likely cause is an error defining the HOME directory when the user was added to the system, or that the HOME directory was either never created or it was deleted.
Received TX0 for pid = [x] - [x]
This is an informational message output to confirm receipt of a command from SAM to terminate the indicated job after the user executed the "Kill Job" command from the Enterprise Manager. This does not indicate that the job actually was terminated; that event/inability will be confirmed in a subsequent message.
Can't fork() for [x] - decrementing max jobs to [x]
The LSAM is unable to create a process in which to run the indicated job, and is decreasing the number of jobs it will attempt to simultaneously run in an effort to preclude further fork() errors. If this is a recurring problem, the LSAM Configuration program can be executed to lower the number of allowed jobs and/or the OS may need tuning to handle the load.
Job [x] completed
Informational message upon proper termination of an OpCon job – doesn't reflect the OK/Failed termination status.
Data has not arrived in [x] seconds - closing socket
The LSAM has not received any communication from the SAM in the indicated amount of time. The LSAM will close the connection in preparation for SAM initiating a new connection. "[x]" can be set with the LSAM Configuration program.