
JobExecution

stockiNail edited this page Oct 12, 2015 · 3 revisions

Process building

The goal of a JEM node is to execute a job, defined by its JCL, on the machine where the node is running. Each node can execute several jobs at a time; the limit is set by a node configuration property (which you can update from the web user interface) or by an affinity loader, if one is defined. To avoid memory leaks or other problems inside the node itself, the node creates a new process for every job. This also makes it possible to cancel a job during its execution.
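
The one-process-per-job idea can be sketched with Java's standard ProcessBuilder. This is only an illustration under stated assumptions, not JEM's actual code: the class name JobLauncher, the log file name job-output.log and the stand-in command are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not JEM code): one OS process per job.
final class JobLauncher {

    // Prepares a separate process for the job command. Standard error is merged
    // into standard output and both are appended to the given log file.
    static ProcessBuilder prepare(List<String> command, File logFile) {
        ProcessBuilder builder = new ProcessBuilder(new ArrayList<>(command));
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return builder;
    }

    // Cancels a running job: asks the process to terminate, then kills it
    // forcibly if it is still alive after the grace period.
    static void cancel(Process process, long graceSeconds) throws InterruptedException {
        process.destroy();
        if (!process.waitFor(graceSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly().waitFor();
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: "java -version" stands in for a real job command.
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> command = Arrays.asList(java, "-version");
        Process job = prepare(command, new File("job-output.log")).start();
        System.out.println("exit code: " + job.waitFor());
    }
}
```

Because the job lives in its own process, a failure or leak in the job cannot corrupt the node's JVM, and cancelling the job is just a matter of terminating that process.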

http://www.pepstock.org/resources/01-05-processing.png

The job can communicate via RMI with JEM node to:

  • notify the start and the end of every step and of the job itself (for JCLs on Java frameworks, the job passes its process ID, which is used to extract some resource allocation information and to cancel the job). The JEM node writes a summary of the step execution every time a step ends (see the Outputs section).
  • request locks for resources
  • request common resource definitions (currently only for Java)
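
The notifications in the list above could be modelled as an RMI remote interface. The sketch below is purely illustrative: JobCallbacks, RecordingCallbacks and every method name are hypothetical and are not JEM's real remote API; lock and resource requests are left out.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote interface (illustrative names, not JEM's API).
// RMI requires it to extend Remote and each method to declare RemoteException.
interface JobCallbacks extends Remote {
    void jobStarted(String jobId, String processId) throws RemoteException;
    void stepStarted(String jobId, String stepName) throws RemoteException;
    void stepEnded(String jobId, String stepName, int returnCode) throws RemoteException;
    void jobEnded(String jobId, int returnCode) throws RemoteException;
}

// In-memory implementation that simply records every notification it receives.
final class RecordingCallbacks implements JobCallbacks {
    final List<String> events = new ArrayList<>();

    public void jobStarted(String jobId, String processId) {
        events.add(jobId + " STARTED pid=" + processId);
    }

    public void stepStarted(String jobId, String stepName) {
        events.add(jobId + " STEP " + stepName + " STARTED");
    }

    public void stepEnded(String jobId, String stepName, int returnCode) {
        events.add(jobId + " STEP " + stepName + " ENDED rc=" + returnCode);
    }

    public void jobEnded(String jobId, int returnCode) {
        events.add(jobId + " ENDED rc=" + returnCode);
    }
}
```

In a real deployment the node side would export such an object (for example with UnicastRemoteObject.exportObject) and the job process would look it up and call it; here the implementation is called directly to keep the sketch self-contained.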

Each job is composed of one or more steps, and each step ends with a specific return code. The job return code (or result) is the highest return code among its steps, with some exceptions.
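
As a minimal sketch of the general rule, the job return code can be computed as the maximum of the step return codes. The class JobResult is hypothetical, and the exceptions to the rule mentioned above are not modelled because this page does not list them.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not JEM code): general rule only, exceptions not modelled.
final class JobResult {

    // Returns the highest return code among the steps of a job.
    static int jobReturnCode(List<Integer> stepReturnCodes) {
        if (stepReturnCodes.isEmpty()) {
            throw new IllegalArgumentException("a job has at least one step");
        }
        int highest = stepReturnCodes.get(0);
        for (int rc : stepReturnCodes) {
            highest = Math.max(highest, rc);
        }
        return highest;
    }

    public static void main(String[] args) {
        // Three steps ending with 0, 4 and 0 give a job return code of 4.
        System.out.println(jobReturnCode(Arrays.asList(0, 4, 0)));
    }
}
```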

Here is the table of JEM return codes:

Return Code   Description
0             Execution successful
1             Execution ended with errors: an exception occurred
12            A node crashed while the job was executing. The coordinator moves the job to the output queue and sets the return code to 12. Check the job execution for possibly inconsistent data
16            An exception occurred while JCLFactory was creating the job, usually because of mistakes in the JCL (syntax errors found during the validation phase)
222           Job execution was cancelled by the operator
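
For code that has to interpret these values, the table above can be captured in a small lookup type. The enum below is a sketch: the constant names are hypothetical, only the numeric codes and their meanings come from the table.

```java
import java.util.Optional;

// Hypothetical enum (illustrative names); codes and meanings are taken from the table.
enum JemReturnCode {
    SUCCESS(0, "Execution successful"),
    ERROR(1, "Execution ended with errors: an exception occurred"),
    NODE_CRASHED(12, "Node crashed while the job was executing"),
    JCL_ERROR(16, "Exception during job creation by JCLFactory"),
    CANCELLED(222, "Job execution cancelled by operator");

    final int code;
    final String description;

    JemReturnCode(int code, String description) {
        this.code = code;
        this.description = description;
    }

    // Returns the documented meaning, or empty for a code that is not in the table
    // (for example a step return code or an OS-level code).
    static Optional<JemReturnCode> fromCode(int code) {
        for (JemReturnCode rc : values()) {
            if (rc.code == code) {
                return Optional.of(rc);
            }
        }
        return Optional.empty();
    }
}
```

Returning an empty Optional for unknown values keeps the lookup honest: steps and the operating system can produce codes that the table does not cover.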

Because JEM starts a new process every time, the OS can return different return codes if the creation of the new process fails.

Shells

JEM executes a job by starting a new process. To do that, it must use a shell, which depends on the OS where JEM is running. Java's Process class can launch a process on any OS without the user specifying a shell. However, we found some minor problems in redirecting standard output and standard error correctly when no shell is specified.

For this reason, JEM uses the BASH shell to execute jobs on UNIX-like systems and the usual cmd.exe shell on Windows. BASH is the default shell only on some operating systems, so it is necessary that JEM is able to call it. On Unix systems, with the right JCL factory configuration, it is possible to launch jobs as another user by means of sudo. For more information on how this works, see the security page.
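
The shell choice described above can be sketched as a function that builds the command line for ProcessBuilder. This is an assumption-laden illustration, not JEM's implementation: the class ShellCommand is hypothetical, and this page does not state which shell or sudo options JEM actually passes. The options used here (bash -c, cmd.exe /c, sudo -n -u) are standard ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not JEM code): wraps a job command in an OS-specific shell.
final class ShellCommand {

    // osName is expected in the form returned by System.getProperty("os.name").
    // runAsUser may be null; it is only honoured on non-Windows systems.
    static List<String> build(String osName, String jobCommand, String runAsUser) {
        List<String> command = new ArrayList<>();
        boolean windows = osName.toLowerCase(Locale.ROOT).contains("windows");
        if (windows) {
            command.add("cmd.exe");
            command.add("/c");
        } else {
            if (runAsUser != null) {
                // -n: never prompt for a password, -u: target user
                command.add("sudo");
                command.add("-n");
                command.add("-u");
                command.add(runAsUser);
            }
            command.add("bash");
            command.add("-c");
        }
        command.add(jobCommand);
        return command;
    }

    public static void main(String[] args) {
        System.out.println(build(System.getProperty("os.name"), "echo hello", null));
    }
}
```

Passing the whole job command as a single argument after -c or /c lets the shell handle redirection of standard output and standard error, which is the problem the shell is meant to solve here.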

Outputs

The JEM node manages all the output produced by the job: standard output, standard error and the system output streams used during the execution of single steps. For each job, the JEM node creates a specific folder (inside the output folder), as in the following picture:

http://www.pepstock.org/resources/01-06-folders.png

The JCL file contains the code used for that job. Here's an ANT JCL file example:

<project name="JEMSTO1" default="run" basedir=".">
    <description>test</description>

    <property environment="env"></property>
    <property name="jem.job.name" value="JEMSTO1"></property>
    <property name="jem.job.environment" value="ENV-1"></property>
    <!--property name="jem.job.domain" value="domain"/-->
    <!--property name="jem.job.affinity" value="affinity1, affinity2"/-->
 
    <target name="run" description="run">
        <echo> Embed another:${user.name} </echo>
        <echo> Embed another:${env.TEMP} </echo>
        <echo> Embed another:${jem.ant.source.code.path} </echo>
    </target>

</project>

The JEM log file (called JEM.log) contains a summary of the job execution from JEM's perspective. The format is the same for all jobs and looks like this:

2012 10 24 14:19:28   J E M  job log -- Node xxx.xxx.xxx.xxx
2012 10 24 14:19:28   ---- WEDNESDAY, 24 OCTOBER 2012 ----
2012 10 24 14:19:28   USERID XXXXX IS ASSIGNED TO THIS JOB
2012 10 24 14:19:28   JEMSTO1 ENVIRONMENT ENV-1 - DOMAIN *** - AFFINITY ***
2012 10 24 14:19:28   JEMSTO1 STARTED - TIME=14:19:27
2012 10 24 14:19:28   JEMSTO1 PROCESS-ID (xxxxx@xxxxxxx) - TIME=14:19:27
2012 10 24 14:19:28
2012 10 24 14:19:28   STEPNAME         RC   CPU(ms)    MEMORY(kb)       
2012 10 24 14:19:28   [init]           -    1138       57588            
2012 10 24 14:19:28   run              0    16         57588            
2012 10 24 14:19:28
2012 10 24 14:19:28   JEMSTO1 ENDED - TIME=14:19:28  - RC=0
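
Since the format is fixed, the step rows of the log above can be extracted with a regular expression. The sketch below is derived only from the sample shown here; the class StepSummary is hypothetical, and it assumes that step names contain no whitespace and that a missing return code is printed as "-".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser (not JEM code) for step rows such as:
// 2012 10 24 14:19:28   run              0    16         57588
final class StepSummary {

    // timestamp, step name, RC ("-" when absent), CPU in ms, memory in KB
    private static final Pattern ROW = Pattern.compile(
            "\\d{4} \\d{2} \\d{2} \\d{2}:\\d{2}:\\d{2}\\s+(\\S+)\\s+(-|\\d+)\\s+(\\d+)\\s+(\\d+)\\s*");

    final String name;
    final Integer returnCode; // null when the log shows "-"
    final long cpuMillis;
    final long memoryKb;

    private StepSummary(String name, Integer returnCode, long cpuMillis, long memoryKb) {
        this.name = name;
        this.returnCode = returnCode;
        this.cpuMillis = cpuMillis;
        this.memoryKb = memoryKb;
    }

    // Returns the parsed row, or empty when the line is not a step row
    // (header, banner, STARTED/ENDED lines and so on).
    static Optional<StepSummary> parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        Integer rc = "-".equals(m.group(2)) ? null : Integer.valueOf(m.group(2));
        return Optional.of(new StepSummary(
                m.group(1), rc, Long.parseLong(m.group(3)), Long.parseLong(m.group(4))));
    }
}
```

Lines that are not step rows, including the STEPNAME header, do not match the pattern and are skipped.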

The JOB log file (called JOB.log) contains all the standard output and standard error of the job execution. Its content differs depending on the framework that manages the JCL. Here is an example from an ANT job:

Build sequence for target(s) "run" is [run]
Complete build sequence is [run, ]

run:
     [echo]  Embed another: XXXXX
     [echo]  Embed another:C:\Users\XXXXX\AppData\Local\Temp
     [echo]  Embed another:D:/XXXXX/JEM/resources/jcl/ant

BUILD SUCCESSFUL
Total time: 0 seconds

The sysout folders and their contents depend on the JCL definition and its steps: in the JCL, you can define a data set as SYSOUT, which lets the job write to a system output file instead of to a data set. For more details, see the job control languages section.
