User Documentation

The LoadLeveler Parallel Job Scheduling System

To maximize resource utilization, the LoadLeveler parallel job scheduling system balances the processing needs and priority of jobs against the available resources. When you submit a job to LoadLeveler, the order in which it executes therefore depends on criteria such as priority, resource requirements, and special instructions. Examples of special instructions include running long jobs during off-hours, fitting short jobs in around long ones, or giving certain users or groups priority.

Submit a Batch Job

Use the llsubmit command to submit a job command file to LoadLeveler.
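As an illustrative sketch, the following script writes a minimal serial job command file and then submits it. The file name hello.cmd, the job name, the time limit, and the class name short are hypothetical; substitute a class that exists on your cluster. The script calls llsubmit only where the command is installed.

```shell
# Write a minimal job command file. Lines beginning with "# @" are
# LoadLeveler keywords; the rest is an ordinary shell script.
# The quoted 'EOF' keeps the shell from expanding $(jobid) here.
cat > hello.cmd <<'EOF'
#!/bin/sh
# @ job_name = hello
# @ job_type = serial
# @ class = short
# @ output = hello.$(jobid).out
# @ error = hello.$(jobid).err
# @ wall_clock_limit = 00:05:00
# @ queue
echo "Hello from $(hostname)"
EOF

# Submit the file (only on a machine where LoadLeveler is installed).
if command -v llsubmit >/dev/null 2>&1; then
    llsubmit hello.cmd
else
    echo "llsubmit not found; hello.cmd was written but not submitted"
fi
```

The "# @ queue" keyword ends the job step definition, so it comes after the other keywords.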

Obtain Queue Information

You can obtain information about each queue by entering the llclass command, which lists the job classes (queues) defined on the cluster.
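The two helper functions below are a small sketch of how llclass is typically invoked. The function names are made up for this example, and the -l (long listing) option should be confirmed against the llclass man page on your system.

```shell
# Summarize every job class (queue) defined on the cluster.
list_queues() {
    llclass
}

# Long listing for one class; pass the class name as the argument.
queue_details() {
    llclass -l "$1"
}
```

For example, queue_details short would print the long listing for a class named short (a hypothetical class name).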

View Job Status

The llq command is used to obtain information on the status of your job. You can enter options with the command to view:

  • all jobs in the queue
  • all of your own jobs
  • details about why a job has not yet started
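The three views listed above could be wrapped as shell helpers like these. This is a sketch: the function names are hypothetical, and the job ID passed to the last one is whatever llq prints in its first column.

```shell
# All jobs in the queue.
all_jobs() {
    llq
}

# Only the jobs owned by the current user.
my_jobs() {
    llq -u "${USER:-$(id -un)}"
}

# Why a job has not started yet; pass the job ID reported by llq.
why_waiting() {
    llq -s "$1"
}
```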

The key information is located at the end of the output, which either explains why the job has not yet started or indicates its estimated start time.

  • detailed job information
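Detailed information is usually requested with the long-listing option of llq. The helper name below is hypothetical and the job ID is a placeholder for one reported by llq.

```shell
# Long listing for one job; pass the job ID reported by llq.
job_details() {
    llq -l "$1"
}
```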

The possible job status states, with their abbreviations, are:

  • Canceled (CA): The job has been canceled, for example by the llcancel command.
  • Completed (C): The job has completed.
  • Complete Pending (CP): The job is completed. Some tasks are finished.
  • Deferred (D): The job will not be assigned until a specified date. The start date may have been specified by the user in the job command file, or it may have been set by LoadLeveler because a parallel job could not obtain enough machines to run.
  • Idle (I): The job is being considered to run on a machine, though no machine has been selected yet.
  • NotQueued (NQ): The job is not being considered to run. A job may enter this state due to an error in the command file or because LoadLeveler cannot obtain information that it needs to act on the request.
  • Not Run (NR): The job will never run because a stated dependency in the job command file evaluated to false.
  • Pending (P): The job is in the process of starting on one or more machines. The request to start the job has been sent but has not yet been acknowledged.
  • Rejected (X): The job did not start because there was a mismatch between the requirements of your job and the resources on the target machine, or because the user does not have a valid ID on the target machine.
  • Reject Pending (XP): The job is in the process of being rejected.
  • Removed (RM): The job was canceled by either LoadLeveler or the owner of the job.
  • Remove Pending (RP): The job is in the process of being removed.
  • Running (R): The job is running.
  • Starting (ST): The job is starting.
  • Submission Error (SX): The job cannot start due to a submission error. Please notify the Bluedawg administration team if you encounter this error.
  • System Hold (S): The job has been put on hold by a system administrator.
  • System User Hold (HS): Both the user and a system administrator have put the job on hold.
  • Terminated (TX): The job was terminated, presumably by means beyond LoadLeveler's control. Please notify the Bluedawg administration team if you encounter this error.
  • User Hold (H): The job has been put on hold by the owner.
  • Vacated (V): The started job did not complete. The job will be scheduled again, provided that it can be rescheduled.
  • Vacate Pending (VP): The job is in the process of vacating.
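When scripting around llq output, it can help to translate the status abbreviations back into state names. The function below is a sketch built only from the states listed above; the function name and the "Unknown" fallback are assumptions of this example.

```shell
# Translate a job status abbreviation into its full state name.
status_name() {
    case "$1" in
        CA) echo "Canceled" ;;
        C)  echo "Completed" ;;
        CP) echo "Complete Pending" ;;
        D)  echo "Deferred" ;;
        I)  echo "Idle" ;;
        NQ) echo "NotQueued" ;;
        NR) echo "Not Run" ;;
        P)  echo "Pending" ;;
        X)  echo "Rejected" ;;
        XP) echo "Reject Pending" ;;
        RM) echo "Removed" ;;
        RP) echo "Remove Pending" ;;
        R)  echo "Running" ;;
        ST) echo "Starting" ;;
        SX) echo "Submission Error" ;;
        S)  echo "System Hold" ;;
        HS) echo "System User Hold" ;;
        TX) echo "Terminated" ;;
        H)  echo "User Hold" ;;
        V)  echo "Vacated" ;;
        VP) echo "Vacate Pending" ;;
        *)  echo "Unknown" ;;   # abbreviation not in the list above
    esac
}
```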

Cancel a Job

Use the llcancel command to cancel:

  • a specific job
  • all of your jobs
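Both cases can be sketched as shell helpers. The function names are hypothetical, and the job ID is whatever llq reports for the job you want to cancel.

```shell
# Cancel one job; pass the job ID reported by llq.
cancel_job() {
    llcancel "$1"
}

# Cancel every job owned by the current user.
cancel_my_jobs() {
    llcancel -u "${USER:-$(id -un)}"
}
```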

Job History and Usage Summaries

On each cluster, there is a file named /var/loadl/archive/history.archive that contains the history of all jobs run under LoadLeveler. You can query this file using the llsummary command. There are several options you can use with the llsummary command, which are described in its man page.
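A minimal sketch of querying the history file follows. The function names are hypothetical, and the -u option (restrict the report to one user) is given from memory of the llsummary syntax, so confirm it in the man page before relying on it.

```shell
# Path of the job history file named above.
HISTORY_FILE=/var/loadl/archive/history.archive

# Summary report covering every job in the history file.
usage_summary() {
    llsummary "$HISTORY_FILE"
}

# Summary report restricted to one user; pass the user name.
user_summary() {
    llsummary -u "$1" "$HISTORY_FILE"
}
```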

Check the Status of Each Node

You can check the status of each node by using the llstatus command.
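As a sketch, node status can be queried for the whole cluster or for one machine. The function names are hypothetical, and the host name passed to the second helper is a placeholder.

```shell
# One summary line per machine in the cluster.
node_status() {
    llstatus
}

# Long listing for a single machine; pass its host name.
node_details() {
    llstatus -l "$1"
}
```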

