
ConfiguringJEMCluster

stockiNail edited this page Jul 5, 2016 · 6 revisions

Configuration

The configuration folder of the environment is located in the persistence folder of the GFS (global file system) and contains all the configuration for a specific JEM environment (cluster).

The files are the following:

  • emailTemplate.xml is the template to use in conjunction with the EmailEndJobListener
  • jem-env.xml is the configuration common to every node of the cluster; each part of this file is described in detail below
  • jem-env-hazelcast.xml is the Hazelcast configuration file
  • customResourcesConfiguration.xml is the configuration file used to handle custom resources
  • datasetsRules.xml is the configuration file used to configure multiple data paths
  • log4j.xml is the Apache Log4J XML configuration file used by all cluster nodes. In this way, with a single file you can modify the logging policy of the entire environment (cluster).

The following sections describe the configuration files in detail.

jem-env.xml

The JEM environment configuration file is an XML file containing the following information:

  • Database connection information
  • Node plugin
  • JCL factories
  • Statistics manager
  • Job life-cycle listeners

Here is a configuration file sample:

<configuration>

    <database>
        <driver>com.mysql.jdbc.Driver</driver>
        <url>jdbc:mysql://192.168.127.27:3306/JEM</url>
        <user>jem</user>
        <password>ASKJDPouisadalseioqwe</password>
        <properties>
            <property name="key" value="value">
            </property>
        </properties>
    </database>

    <!--node classname="org.pepstock.jem.grs.GrsNode">
        <properties/>
    </node-->
 
    <factories>
        <factory classname="org.pepstock.jem.ant.AntFactory">
            <properties>
                <property name="key" value="value"></property>
            </properties>
        </factory>
        <factory classname="org.pepstock.jem.springbatch.SpringBatchFactory">
            <properties>
                <property name="key" value="value"></property>
            </properties>
        </factory>
    </factories>
 
    <statistics-manager/>     
    
    <listeners>
        <listener classname="org.pepstock.jem.node.events.DefaultJobChangeStatusListener">
            <properties>
                <property name="key" value="value"></property>
            </properties>
        </listener>
    </listeners>
 
</configuration>                          

Node implementation

JEM provides a standard node implementation which is not able to manage locks.

Anyone can implement a custom node by extending the class org.pepstock.jem.node.NodeInfo and overriding its methods. A set of properties can be passed to configure the node implementation, using the properties tag. The GRS node (org.pepstock.jem.grs.GrsNode) is a special node which serializes access to resources (such as files) from different jobs, avoiding concurrent access to the same resource.
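To make "serialized access to resources" concrete, here is a minimal, single-JVM sketch that keeps one fair lock per resource name so that two tasks touching the same file never overlap. It is only an illustration of the idea: `ResourceLockRegistry` is a hypothetical class, and the real GRS node works across the whole cluster rather than inside one process.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical illustration (NOT the GRS node implementation): serializes
 * access to named resources, such as file paths, with one lock per resource.
 */
class ResourceLockRegistry {
    // One fair lock per resource name, created lazily on first use
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String resource) {
        return locks.computeIfAbsent(resource, r -> new ReentrantLock(true));
    }

    /** Runs the task while holding the lock on the given resource. */
    void withResource(String resource, Runnable task) {
        ReentrantLock lock = lockFor(resource);
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock(); // always release, even if the task fails
        }
    }

    /** True if some thread currently holds the lock on the resource. */
    boolean isLocked(String resource) {
        ReentrantLock lock = locks.get(resource);
        return lock != null && lock.isLocked();
    }
}
```

Tasks on different resource names proceed in parallel; only tasks on the same name wait for each other.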

Database connection information

Persistence to a relational database is mandatory. It is configured with the database element in the JEM node configuration. JEM supports MySQL, Oracle, DB2 and many others, and can be extended to any relational database. The database persists the following:

  • Pre Input queue
  • Input queue
  • Output queue
  • Routing queue
  • Roles with users and permissions
  • Resources
  • Swarm configuration
  • User preferences

The database is shared among the cluster's nodes, so every node of the cluster must have the same database configuration.
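As a sketch of how the `<database>` element maps onto standard JDBC, the code below reads it with the JDK DOM parser and builds the `Properties` object accepted by `java.sql.DriverManager.getConnection(String url, Properties info)`. The class and method names are hypothetical; JEM's own loader may treat the extra `<property>` entries differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the <database> element of jem-env.xml (not JEM code). */
class DatabaseConfigReader {

    /** Text of the first child element with the given tag, or null if absent. */
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    /** Parses the XML and returns the <database> element (null if missing). */
    static Element readDatabase(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("database").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Unable to parse jem-env.xml", e);
        }
    }

    /** JDBC properties: user and password plus every <property name=".." value=".."/>. */
    static Properties jdbcProperties(Element database) {
        Properties info = new Properties();
        info.setProperty("user", text(database, "user"));
        info.setProperty("password", text(database, "password"));
        NodeList props = database.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            info.setProperty(p.getAttribute("name"), p.getAttribute("value"));
        }
        return info;
    }
}
```

With the driver class from `<driver>` on the classpath, a connection would then be opened with `DriverManager.getConnection(text(db, "url"), jdbcProperties(db))`. Because every node reads the same shared file, every node ends up with identical connection settings.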

JCL factories

JEM can manage different job control languages via JCL factories. A factory must implement the org.pepstock.jem.factories.JemFactory interface, which defines methods to validate the JCL, create the job tasks to be executed and return the tag for the language type.

public interface JemFactory extends JclFactory, JobTaskFactory, Serializable {
 
	/**
	 * Called to initialize the factory. A set of properties is passed, or an
	 * empty collection if the properties are not defined
	 */
	void init(Properties properties) throws JemException;

	/**
	 * Used to identify the type of job control language. This is the unique key
	 * under which the factory is registered when it is loaded during startup.
	 */
	String getType();
	
	/**
	 * Used to describe the type of job control language.
	 */
	String getTypeDescription();
	
	/**
	 * Returns all properties passed as argument on initialization.
	 */
	Properties getProperties();
}

The factory is loaded during JEM node startup and is configured by passing all necessary properties to its init method.
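The loading step can be pictured as plain Java reflection: instantiate the class named in `<factory classname="...">` through its no-argument constructor, then call `init` with the `<property>` values (or an empty collection). This is a sketch under assumptions: `MiniFactory` is a hypothetical, reduced stand-in for `JemFactory`, and `EchoFactory` is an invented example factory.

```java
import java.util.Properties;

/** Hypothetical reduced stand-in for org.pepstock.jem.factories.JemFactory. */
interface MiniFactory {
    void init(Properties properties);
    String getType();
    Properties getProperties();
}

class FactoryLoader {
    /** Instantiates the configured class and initializes it with its properties. */
    static MiniFactory load(String className, Properties properties) {
        try {
            Class<?> clazz = Class.forName(className);
            // A no-argument constructor is required for reflective instantiation
            MiniFactory factory = (MiniFactory) clazz.getDeclaredConstructor().newInstance();
            // Pass the configured properties, or an empty collection if none are defined
            factory.init(properties != null ? properties : new Properties());
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unable to load factory " + className, e);
        }
    }
}

/** Hypothetical example factory registered under the type "echo". */
class EchoFactory implements MiniFactory {
    private Properties properties = new Properties();
    public void init(Properties properties) { this.properties = properties; }
    public String getType() { return "echo"; }
    public Properties getProperties() { return properties; }
}
```

The value returned by `getType()` is what a node would use as the key for the loaded factory.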

The JemFactory interface extends two other interfaces, org.pepstock.jem.factories.JclFactory and org.pepstock.jem.factories.JobTaskFactory.

public interface JclFactory extends Serializable {
 
	/**
	 * Called to create a Jcl object from the string containing the source code.
	 * It should validate the job control language and throw an exception when
	 * the JCL syntax is not correct
	 */
	Jcl createJcl(String content, List<String> inputArguments) throws JclFactoryException;
}

The JclFactory interface is called to check the JCL syntax and to extract all properties needed for execution (execution environment, hold status, input queue priority, memory needed). For more details see the JCL section.

public interface JobTaskFactory extends Serializable {
 
	/**
	 * Called to create a job task object from a job object. It creates the
	 * command line and prepares the environment to execute the job
	 */
	JobTask createJobTask(Job job);
	
	/**
	 * Called to set the classpath of the factory, when a specific classloader is used.
	 */
	void setClassPath(List<String> classpath);
	
	/**
	 * Called to return the classpath of the factory, when a specific classloader is used.
	 */
	List<String> getClassPath();
}

The JobTaskFactory interface is called to create the process which executes the job by running its JCL. It must define the command that executes the job's steps, passing all necessary arguments.
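As an illustration of "defining the command", the sketch below assembles a command line that launches a step in a separate JVM, joining the entries received through `setClassPath(...)` with the platform path separator. `JobCommandBuilder`, the Java home path and the main class are hypothetical; the actual JEM factories build their own, language-specific commands.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds a "java -cp ... Main args" command for a job step. */
class JobCommandBuilder {
    static List<String> build(String javaHome, List<String> classPath,
                              String mainClass, List<String> args) {
        List<String> command = new ArrayList<>();
        command.add(javaHome + File.separator + "bin" + File.separator + "java");
        if (classPath != null && !classPath.isEmpty()) {
            command.add("-cp");
            // Entries are joined with ':' on Unix and ';' on Windows
            command.add(String.join(File.pathSeparator, classPath));
        }
        command.add(mainClass);
        command.addAll(args); // job-specific arguments, e.g. the JCL file
        return command;
    }
}
```

The resulting list can be passed directly to `new ProcessBuilder(command)`, which avoids shell quoting problems with paths that contain spaces.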

JEM supports 6 languages out of the box:

Furthermore, JEM allows you to use a scripting language as JCL, using a generic JCL script factory.

Listeners

Each node of the JEM cluster, as well as any client, can be notified about the life-cycle of a job. A typical use case is job submission via the web application, which is a client: in this case, users can watch the execution of their jobs from the web GUI. Note that this is an internal notification, unrelated to the external communication of the job state (done via topic).

The notification is done by implementing the org.pepstock.jem.node.events.JobLifecycleListener interface:

public interface JobLifecycleListener extends EventListener {
 
        /**
         * Called to initialize the listener. A set of properties is passed, or an
         * empty collection if the properties are not defined
         */
        public void init(Properties properties);
 
        /**
         * Called when a job is moved in INPUT queue
         */
        public void queued(Job job);
 
        /**
         * Called before a job is executed
         */
        public void running(Job job);
 
        /**
         * Called after a job is ended
         */
        public void ended(Job job);
}

Job life-cycle listeners are loaded into the JEM node: the directives for loading them, as well as their configuration, are stored in the JEM node configuration file.

As an example, JEM provides an out-of-the-box listener that sends an e-mail when a job ends. To configure it, add the following lines to the JEM node configuration file:
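A minimal sketch of a custom listener follows. To stay self-contained it mirrors the shape of `JobLifecycleListener` but uses a hypothetical `StubJob` class instead of JEM's `Job`; a real listener would implement `org.pepstock.jem.node.events.JobLifecycleListener` directly and be declared in `<listeners>`. The `prefix` property is an invented example of a value read in `init`.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.List;
import java.util.Properties;

/** Hypothetical stand-in for JEM's Job, reduced to its name. */
class StubJob {
    final String name;
    StubJob(String name) { this.name = name; }
}

/** Same shape as JobLifecycleListener, typed on StubJob for this sketch. */
interface StubLifecycleListener extends EventListener {
    void init(Properties properties);
    void queued(StubJob job);
    void running(StubJob job);
    void ended(StubJob job);
}

/** Example listener that records every transition, e.g. for auditing. */
class RecordingListener implements StubLifecycleListener {
    final List<String> events = new ArrayList<>();
    private String prefix = "";

    public void init(Properties properties) {
        // Properties come from the <properties> block of the <listener> element
        prefix = properties.getProperty("prefix", "");
    }
    public void queued(StubJob job) { events.add(prefix + "QUEUED " + job.name); }
    public void running(StubJob job) { events.add(prefix + "RUNNING " + job.name); }
    public void ended(StubJob job) { events.add(prefix + "ENDED " + job.name); }
}
```

Since listeners are invoked by the node, they should stay fast; slow work such as sending e-mails is better kept brief or handed off.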

<listeners>
    <listener classname="org.pepstock.jem.node.events.EmailEndJobListener">
        <properties>
            <property name="jem.emailServer" value="value"></property>
            <property name="jem.bounceAddress" value="value"></property>
            <property name="jem.emailTemplateFile" value="value"></property>
            <property name="jem.smtpPort" value="value"></property>
            <property name="jem.isSSLProtocol" value="value"></property>
            <property name="jem.isTLSProtocol" value="value"></property>
        </properties>
    </listener>
</listeners>

These properties configure and prepare the e-mail engine:

  • jem.emailServer: (mandatory) the e-mail server used to send e-mails
  • jem.emailTemplateFile: (mandatory) path of the XML template file for the e-mail to send
  • jem.bounceAddress: (optional) bounce address, used as the target address for e-mails that cannot be delivered for any reason
  • jem.smtpPort: (optional) SMTP port to use
  • jem.isSSLProtocol: (optional) indicates whether the SSL protocol is used. Default is false
  • jem.isTLSProtocol: (optional) indicates whether the TLS protocol is used. Default is false
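The rules in the list above (two mandatory keys, optional port and bounce address, SSL/TLS flags defaulting to false) can be sketched as a small settings class. This is not the EmailEndJobListener source; `EmailSettings` is hypothetical, and it reads the TLS flag under the key `jem.isTLSProtocol` used in the XML sample.

```java
import java.util.Properties;

/** Hypothetical validation of the e-mail listener properties (not JEM code). */
class EmailSettings {
    final String server;
    final String templateFile;
    final String bounceAddress; // optional, null when not set
    final Integer smtpPort;     // optional, null when not set
    final boolean ssl;
    final boolean tls;

    EmailSettings(Properties p) {
        server = required(p, "jem.emailServer");
        templateFile = required(p, "jem.emailTemplateFile");
        bounceAddress = p.getProperty("jem.bounceAddress");
        String port = p.getProperty("jem.smtpPort");
        smtpPort = port == null ? null : Integer.valueOf(port.trim());
        // Both flags default to false when not set
        ssl = Boolean.parseBoolean(p.getProperty("jem.isSSLProtocol", "false"));
        tls = Boolean.parseBoolean(p.getProperty("jem.isTLSProtocol", "false"));
    }

    /** Fails fast when a mandatory property is missing or blank. */
    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing mandatory property " + key);
        }
        return value.trim();
    }
}
```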

Here is a sample of e-mail template file:

<email-template>
    <!-- Email format: TEXT_HTML or TEXT_PLAIN (default: TEXT_PLAIN) -->
    <!--format></format-->

    <!-- Name of the Email Sender: optional -->
    <!-- from-user-name is alias of fromUserName -->
    <from-user-name>My Name</from-user-name>

    <!-- Email address of the Email Sender: MANDATORY -->
    <!-- from-user-email-address is alias of fromUserEmailAddress -->
    <from-user-email-address>my.name@pepstock.org</from-user-email-address>

    <!-- Subject of the Email -->
    <subject>
      <![CDATA[
        End Job test Subject for job ${jem.job.name}
      ]]>
    </subject>

    <!-- Text of the Email -->
    <text>
      <![CDATA[
        Hi,
        The Job ${jem.job.name} terminated with result: ${jem.job.result.returnCode}!
        StartedTime: ${jem.job.startedTime}.
      ]]>
    </text>
</email-template>

Here is the list of variables that can be used inside the e-mail template to compose the e-mail content:

Variable                          Description
jem.job.id                        Unique ID of the job
jem.job.name                      Job name
jem.job.user                      ID of the job submitter
jem.job.submittedTime             Job submission time
jem.job.startedTime               Job start time
jem.job.endedTime                 Job end time
jem.job.memberLabel               IP address and port of the JEM node which ran the job
jem.job.processId                 Process ID of the job during execution
jem.job.result.returnCode         Return code of the job execution
jem.job.result.exceptionMessage   Exception message (if any), i.e. java.lang.Exception.getMessage()
jem.job.jcl.type                  JCL type
jem.job.jcl.content               Full JCL source
jem.job.jcl.environment           Job execution environment
jem.job.jcl.domain                Job execution domain
jem.job.jcl.affinity              Job execution affinity
jem.job.jcl.memory                Amount of memory required by the JCL

The template file is automatically reloaded after any change.

Statistics Manager

The statistics manager is an internal component of the node which collects statistics about the node. You can configure it to store all collected data in a specific path. Here is a sample:
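To show how the `${...}` variables in the table above end up in the e-mail text, here is a minimal substitution sketch based on `java.util.regex`. It is an illustration only: JEM's actual template engine may resolve variables differently. In this sketch, unknown variables are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical ${variable} renderer for the e-mail subject and text. */
class TemplateRenderer {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    static String render(String template, Map<String, String> values) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Keep the placeholder when the variable is unknown;
            // quoteReplacement protects '$' and '\' inside values
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```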

<statistics-manager path="stats"/>

If the element is missing from the configuration file, the statistics manager is enabled by default but does not save data on the file system. If the element is present and the path attribute is set, JEM tries to store all statistics in that folder, creating it if it does not exist. The path is relative to the data path of the global file system, where the folder is created.
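A short sketch of the path handling just described, using `java.nio.file`: the `path` attribute is resolved against the GFS data path and the folder is created when missing. `StatisticsFolder` and its method are hypothetical names, not part of JEM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper mirroring the documented <statistics-manager path="..."/> behaviour. */
class StatisticsFolder {
    /** Returns the folder to write statistics to, or null if nothing should be saved. */
    static Path prepare(Path dataPath, String pathAttribute) {
        if (pathAttribute == null || pathAttribute.isEmpty()) {
            return null; // no path: statistics are collected but not saved
        }
        // The attribute is relative to the data path of the global file system
        Path folder = dataPath.resolve(pathAttribute).normalize();
        try {
            return Files.createDirectories(folder); // does nothing if it already exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```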

Hazelcast

The Hazelcast configuration file is mandatory to create a Hazelcast instance. It can be (conceptually) split into different sections.

Properties

The first section contains the properties that can be set in Hazelcast. As an example, here is how to configure Apache Log4J as the logging engine in Hazelcast (strongly suggested, because JEM uses Apache Log4J as well).

<properties>
    <property name="hazelcast.logging.type">log4j</property> 
</properties>

Properties are also used to configure Cluster Security.

For more information about the use of properties, see the Hazelcast documentation.

Group

The next section is about the group: Hazelcast defines a group by its name and password.

The group name must be the same as the environment defined in the JEM node's configuration file (see previous section); otherwise, JEM will fail to start.

The password is a mandatory Hazelcast configuration element and must be the same for all the group's members. To enforce this, the actual value is set statically in the code, so the value in the configuration file is ignored. In JEM, the password is not the only authentication method to join the cluster: see the Cluster Security section.

<group>
    <name>ENV-1</name>
    <password>jem_password</password>
</group>

Partition rule

To avoid any risk of losing data due to a machine crash, Hazelcast is configured with a partition group which keeps backups of data on a different machine, never on the same one. For more details, see the Hazelcast documentation.

<partition-group enabled="true" group-type="HOST_AWARE" />

Network

The next section is about the network, which is mandatory to manage the cluster. Hazelcast provides several ways to create connections among members, but the most helpful network configuration for a cloud computing approach is multicast.

Also consider using interface mapping (the interfaces element) to avoid using a different IP stack.

<network>
    <port auto-increment="true">5710</port>
    <join>
        <multicast enabled="true">
            <multicast-group>233.0.0.1</multicast-group>
            <multicast-port>54327</multicast-port>
        </multicast>
        <tcp-ip enabled="false">
            <interface>127.0.0.1:5710</interface>
        </tcp-ip>
    </join>
    <interfaces enabled="false">
        <interface>10.10.1.*</interface>
    </interfaces>
</network>

For more information about network configuration, see Hazelcast documentation.

Executor

JEM uses Hazelcast executor capabilities (for internal tasks only, e.g. for commands scheduled by the web application, not for job execution), so configuring the executor is recommended.

Executors provide an efficient way to execute code on one or more cluster members.

<executor-service>
    <core-pool-size>16</core-pool-size>
    <max-pool-size>64</max-pool-size>
    <keep-alive-seconds>60</keep-alive-seconds>
</executor-service>
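As an analogy only, the three `<executor-service>` values correspond to the familiar parameters of `java.util.concurrent.ThreadPoolExecutor`. Hazelcast builds and manages its own executor, so this sketch (with the hypothetical helper `ExecutorSettings`) just illustrates what each setting means, not how Hazelcast wires it.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Analogy: the <executor-service> settings expressed as a plain ThreadPoolExecutor. */
class ExecutorSettings {
    static ThreadPoolExecutor fromConfig(int corePoolSize, int maxPoolSize, long keepAliveSeconds) {
        // A hand-off queue lets the pool grow from core-pool-size up to max-pool-size;
        // with an unbounded queue the pool would never exceed its core size.
        // Threads above the core size are released after keep-alive-seconds of idleness.
        // Note: with this queue, tasks submitted when all max threads are busy are rejected.
        return new ThreadPoolExecutor(corePoolSize, maxPoolSize,
                keepAliveSeconds, TimeUnit.SECONDS, new SynchronousQueue<>());
    }
}
```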

Map

The most important part of the Hazelcast configuration is the map configuration.

JEM requires the maps to use Hazelcast persistence. The first parameter to define for all maps is the backup count, set to 1 (so that another member holds a copy of each map entry).

<!--
    Number of sync-backups. If 1 is set as the backup-count for example,
    then all entries of the map will be copied to another JVM for
    fail-safety. Valid numbers are 0 (no backup), 1, 2, 3.
-->
<backup-count>1</backup-count>

As usual, you can find more information about Map configuration in Hazelcast documentation.

It is also mandatory to configure persistence to a relational database. JEM uses the database definition in the JEM node XML configuration file. Synchronous (write-through) persistence must be configured to ensure data consistency.

The following sample is used to configure the input queue (see org.pepstock.jem.node.persistence.InputMapManager class name).

<map-store enabled="true">
    <!-- Name of the class implementing MapLoader and/or MapStore. The class 
         should implement at least one of these interfaces and contain a no-argument constructor. 
         Note that the inner classes are not supported. -->
    <class-name>org.pepstock.jem.node.persistence.InputMapManager</class-name>
    <!-- Number of seconds to delay to call the MapStore.store(key, value). 
        If the value is zero then it is write-through so MapStore.store(key, value) 
        will be called as soon as the entry is updated. Otherwise it is write-behind 
        so updates will be stored after write-delay-seconds value by calling Hazelcast.storeAll(map). 
        Default value is 0. -->
    <write-delay-seconds>0</write-delay-seconds>
</map-store>
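To see why `write-delay-seconds` set to 0 gives the synchronous persistence JEM requires, here is a simplified model of the two modes. It is not Hazelcast code: `SimpleStore` is a hypothetical stand-in for Hazelcast's `MapStore`, and the write-behind flush is triggered manually instead of by a timer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hazelcast's MapStore. */
interface SimpleStore<K, V> {
    void store(K key, V value);
    void storeAll(Map<K, V> entries);
}

/** Simplified model of write-through (delay 0) versus write-behind (delay > 0). */
class PersistentMap<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Map<K, V> pending = new HashMap<>();
    private final SimpleStore<K, V> store;
    private final int writeDelaySeconds;

    PersistentMap(SimpleStore<K, V> store, int writeDelaySeconds) {
        this.store = store;
        this.writeDelaySeconds = writeDelaySeconds;
    }

    void put(K key, V value) {
        memory.put(key, value);
        if (writeDelaySeconds == 0) {
            store.store(key, value);   // write-through: persisted before put returns
        } else {
            pending.put(key, value);   // write-behind: persisted later by flush()
        }
    }

    V get(K key) {
        return memory.get(key);
    }

    /** In Hazelcast, pending updates are stored after write-delay-seconds; here it is manual. */
    void flush() {
        if (!pending.isEmpty()) {
            store.storeAll(new HashMap<>(pending));
            pending.clear();
        }
    }
}
```

With write-behind, a node crash between `put` and `flush` would lose the pending updates, which is why the input queue configuration uses a delay of 0.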

Apache Log4j

JEM uses Apache Log4J as logging engine. The (optional) configuration file must be an XML file (not a properties file).

A custom appender (org.pepstock.jem.log.InMemoryAppender) is defined and used to show the last 100 rows of the log in the web user interface. The number of rows is configurable with the MaxRows parameter.

The default configuration (shown below) provides a console appender, a file appender and the in-memory appender:
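The MaxRows behaviour (keep only the most recent N rows) amounts to a bounded buffer that drops the oldest row when full. The sketch below shows that idea with an `ArrayDeque`; it is not the source of `InMemoryAppender`, and `LastRowsBuffer` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical bounded buffer illustrating the MaxRows idea. */
class LastRowsBuffer {
    private final int maxRows;
    private final Deque<String> rows = new ArrayDeque<>();

    LastRowsBuffer(int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        this.maxRows = maxRows;
    }

    /** Appends a log row, discarding the oldest one when the buffer is full. */
    synchronized void append(String row) {
        if (rows.size() == maxRows) {
            rows.removeFirst();
        }
        rows.addLast(row);
    }

    /** Copy of the current rows, oldest first, e.g. for display in a UI. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(rows);
    }
}
```

Methods are synchronized because logging can happen from many threads at once.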

<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="false">
    <appender name="consoleAppender" class="org.apache.log4j.ConsoleAppender">
        <param name="Threshold" value="INFO" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{yyyy MM dd HH:mm:ss} %-6p [%t] %m%n" />
        </layout>
    </appender>

    <appender name="fileAppender" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="File" value="${jem.node}/logs/jem" />
        <param name="DatePattern" value="'.'yyyy-MM-dd" />
        <param name="Append" value="true" />
        <param name="Threshold" value="INFO" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{yyyy MM dd HH:mm:ss} %-6p [%t] %m%n" />
        </layout>
    </appender>

    <appender name="inMemoryAppender" class="org.pepstock.jem.log.InMemoryAppender">
        <param name="Threshold" value="INFO" />
        <param name="MaxRows" value="100" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{yyyy MM dd HH:mm:ss} %-6p [%t] %m%n" />
        </layout>
    </appender>

    <root>
        <priority value="info" />
        <appender-ref ref="consoleAppender" />
        <appender-ref ref="fileAppender" />
        <appender-ref ref="inMemoryAppender" />
    </root>

</log4j:configuration>