
ConfiguringJEMNode

stockiNail edited this page Oct 12, 2015 · 12 revisions

Configuration

The configuration folder contains all configuration files necessary to start a JEM node.

The files are the following:

  • jem-node.xml is the JEM node configuration file

See the details for configuration files.

JEM node

The JEM node configuration file is an XML file which contains the following information:

  • Execution environment
  • Directory structure (see previous section about file-systems)

All other information about the node is kept in the jem-env.xml file, because it is considered common configuration for all nodes of the cluster.

Here is a configuration file sample:

<configuration>
 
    <execution-environment>
        <environment>ENV-1</environment>
        <domain>domain</domain>
        <affinity>classA,classB</affinity>
        <affinityfactory classname="org.pepstock.jem.node.affinity.JSPolicyAffinityLoader">
            <properties>
                <property name="jem.affinity.loader.policy" value="gfs/persistence/environment/policy/policy.js">
                </property>
            </properties>
        </affinityfactory>
	<parallel-jobs>1</parallel-jobs>
	<memory>128</memory>
    </execution-environment>
 
    <paths>
        <output>C:/JemFS/output</output>
        <data>
           <path name="default">C:/JemFS/default</path>
           <path name="hdfs">C:/JemFS/hdfs</path>
        </data>
        <source>C:/JemFS/source</source>
        <binary>C:/JemFS/binary</binary>
        <classpath>C:/JemFS/classpath</classpath>
        <library>C:/JemFS/library</library>
        <persistence>C:/JemFS/persistence</persistence>
    </paths>

    <java-runtimes>
        <java name="jdk6">C:\Program Files\Java\jdk1.6\</java>
        <java name="jdk7" default="true">C:\Program Files\Java\jdk1.7\</java>
    </java-runtimes>

</configuration>                          

Execution environment

Inside the environment element you define the JEM cluster name. It must be the same as the tag used inside the Hazelcast configuration file (see next section). This configuration element is mandatory.

In the domain element you define the Domain of the JEM node. This is optional: if missing, the JEM node will get all jobs with the default Domain.

In the affinity element you define the set of static Affinities of the JEM node, as a comma-separated list of tags. This is optional: if missing, the JEM node will get all the jobs with the default Affinity.

In the affinityfactory element you define a factory class which the JEM node calls at startup to get a list of affinity tags to add to the static ones. The factory is configured via properties/property elements. This is optional: if missing, the JEM node will use only the static Affinities.

Affinity Loader

The factory class must implement the org.pepstock.jem.node.affinity.AffinityLoader interface, which is defined as follows:

/**
 * Is a interface which could be implemented by a custom class to assign a set of affinities labels.<br>
 * These labels are used by node to evaluate if a job could be executed or not in the node. 
 */
public interface AffinityLoader {
 
        /**
         * Called to initialize the listener. A set of properties are passed, or a
         * empty collection if the properties are not defined
         */
        void init(Properties properties);
 
        /**
         * Called to have the list of affinities and memory, using the system information if needed.
         */
        Result load(SystemInfo info) throws Exception;
         
}
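
As a rough illustration of the contract above, here is a minimal sketch of a custom loader. SystemInfo, Result and AffinityLoader are declared locally as stand-ins so that the example compiles without JEM on the classpath; they mirror only the members that appear on this page, and the real JEM types may well differ. The class name OsAffinityLoader and the property name example.extra.tag are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Local stand-ins (NOT the real JEM classes): only the members shown on this page.
class SystemInfo {
    private final Properties system = new Properties();
    Properties getSystemProperties() { return system; }
}

class Result {
    private final List<String> affinities = new ArrayList<>();
    List<String> getAffinities() { return affinities; }
}

interface AffinityLoader {
    void init(Properties properties);
    Result load(SystemInfo info) throws Exception;
}

// Hypothetical loader: tags the node with its OS family plus an optional configured tag.
class OsAffinityLoader implements AffinityLoader {
    private String extraTag;

    @Override
    public void init(Properties properties) {
        // "example.extra.tag" is an invented property name, used only in this sketch
        extraTag = properties.getProperty("example.extra.tag");
    }

    @Override
    public Result load(SystemInfo info) {
        Result result = new Result();
        String osName = info.getSystemProperties().getProperty("os.name", "");
        String osTag = osName.contains("Windows") ? "Windows"
                : osName.contains("Linux") ? "Linux"
                : "AnyOS";
        result.getAffinities().add(osTag);
        if (extraTag != null) {
            result.getAffinities().add(extraTag);
        }
        return result;
    }
}
```

With the real JEM classes, the property read in init would come from a property element nested in the affinityfactory element of the node configuration.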

JEM passes to the affinity loader implementation, through the load(SystemInfo info) method, an object holding system information that can be used to work out the right affinities to set. All the information is stored inside the object as key-value pairs. Here is the list of keys and their meanings:

| Category | Description | Keys |
|---|---|---|
| System | The current system properties | The standard Java system property keys (for example os.name) |
| Runtime | Runtime information, like processors and memory | availableProcessors: the number of processors available; totalMemory: the total amount of memory; freeMemory: the amount of free memory |
| Environment | The current system environment and variables | |
| Network | Basic network information | ipaddresses: comma-separated list of IP addresses of the system; hostnames: comma-separated list of host names of the system |
| Running processes | List of the processes currently in execution | List of strings with the following format: "PID", "USER", "STIME", "SIZE", "RSS", "SHARE", "STATE", "TIME", "%CPU", "COMMAND" |

JEM has 2 out-of-the-box (OOTB) Affinity Factory classes, which use JavaScript and Groovy code to define the right affinity tags for the JEM node.

With the parallel-jobs element it's possible to configure the initial maximum number of jobs that the JEM node is able to manage in parallel. The default is 1. This value can be changed at runtime from the web interface or from an affinity loader (if defined).

With the memory element it's possible to configure the initial maximum amount of memory (in MB) to use to execute jobs. The default is 128. This value can be changed at runtime from the web interface or from an affinity loader (if defined).

JavaScript AffinityLoader

The class is org.pepstock.jem.node.affinity.JSPolicyAffinityLoader: it loads and evaluates an Affinity JS definition file, specified by the jem.affinity.loader.policy property. The suggested location for policy files is the policy directory.

The JS Affinity Factory prepares a global read-only variable, named SYSINFO, which contains the system information that the policy can use to define the affinities. After the policy has run, the factory reads the RESULT variable, which provides the methods to set:

  • AFFINITIES: a string used as affinity tag by the node
  • MEMORY: an integer value representing the memory (in MB) to be used to execute jobs
  • PARALLELJOBS: an integer value representing the maximum number of jobs which can be executed at the same time inside the node

Two other global read-only variables are set:

  • ENVIRONMENT: the name of environment of JEM cluster
  • DOMAIN: the name of domain of JEM node where JS script is running

An example of a JavaScript policy is the following:

/**
 * Sets maximum number of parallel jobs
 */
RESULT.setParallelJobs(1);

/**
 * Extracts the OS type
 */
var osName = SYSINFO.getSystemProperties().getProperty('os.name');

if (osName.contains('Windows')){
	RESULT.getAffinities().add("Windows");
} else if (osName.contains('Linux')){
	RESULT.getAffinities().add("Linux");
} else if (osName.contains('Mac')){
	RESULT.getAffinities().add("MacOS");
} else if (osName.contains('Solaris')){
	RESULT.getAffinities().add("Solaris");
} else if (osName.contains('HP')){
	RESULT.getAffinities().add("HPUX");
} else if (osName.contains('AIX')){
	RESULT.getAffinities().add("AIX");	
} else {
	RESULT.getAffinities().add("AnyOS");
}

/**
 * Extracts the hostname
 */
var hostnames = SYSINFO.getNetworkProperties().getProperty('hostnames');
if (hostnames != null)
	RESULT.getAffinities().add(hostnames);

/**
 * Computes 80% of free memory, in slots of 64 MB (minimum 64 MB, maximum 1024 MB)
 */
var freemem = SYSINFO.getRuntimeProperties().getProperty('freeMemory');
freemem = freemem / 1024 / 1024 * 0.80;
RESULT.setMemory(Math.min(Math.max(Math.floor(freemem / 64) * 64, 64), 1024));

In this way, the JEM node can recalculate its own dynamic affinities at runtime, either on command or periodically at a given interval, without shutting down the JEM node.

org.pepstock.jem.node.affinity.JSPolicyAffinityLoader detects when the JS script file has changed and can reload the new affinities, removing the old ones.
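
The memory rule at the end of the sample policy can be worked through outside JEM. The following is only a sketch of the same arithmetic in plain Java (the class and method names are made up for illustration): take 80% of the free memory in MB, round it down to a multiple of 64, then clamp the result between 64 and 1024.

```java
// Worked example of the memory rule used by the sample policy (not JEM code).
final class PolicyMemory {
    static final long SLOT_MB = 64;   // granularity used by the sample policy
    static final long MIN_MB = 64;    // lower bound
    static final long MAX_MB = 1024;  // upper bound

    /** Returns the memory in MB that the sample policy would set for the given free bytes. */
    static long memoryFor(long freeBytes) {
        double usableMb = freeBytes / 1024.0 / 1024.0 * 0.80;
        long rounded = (long) Math.floor(usableMb / SLOT_MB) * SLOT_MB;
        return Math.min(Math.max(rounded, MIN_MB), MAX_MB);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(memoryFor(500 * mb));   // 400 MB usable, rounded down to 384
        System.out.println(memoryFor(50 * mb));    // 40 MB usable, raised to the 64 MB minimum
        System.out.println(memoryFor(4096 * mb));  // capped at the 1024 MB maximum
    }
}
```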

Groovy AffinityLoader

The class is org.pepstock.jem.node.affinity.GroovyPolicyAffinityLoader: it loads and evaluates an Affinity Groovy definition file, specified by the jem.affinity.loader.policy property. The suggested location for policy files is the policy directory.

The Groovy Affinity Factory prepares a global object, named SYSINFO, which contains the system information that the policy can use to define the affinities. After the policy has run, the factory reads an object (named RESULT), which provides the fields to set:

  • AFFINITIES: a string used as affinity tag by the node
  • MEMORY: an integer value representing the memory (in MB) to be used to execute jobs
  • PARALLELJOBS: an integer value representing the maximum number of jobs which can be executed at the same time inside the node

Two other global read-only variables are set:

  • ENVIRONMENT: the name of environment of JEM cluster
  • DOMAIN: the name of domain of JEM node where Groovy script is running

An example of a Groovy policy is the following:

/**
 * Sets maximum number of parallel jobs
 */
RESULT.parallelJobs = 1; 

/**
 * Extracts the OS type
 */ 
def osName = SYSINFO.getSystemProperties().get("os.name");

if (osName.contains("Windows")){
	RESULT.affinities.add("windows");	
} else if (osName.contains("Linux")){
	RESULT.affinities.add("linux");
} else if (osName.contains("Mac")){
	RESULT.affinities.add("macos");
} else if (osName.contains("Solaris")){
	RESULT.affinities.add("solaris");
} else if (osName.contains("HP")){
	RESULT.affinities.add("hpux");
} else if (osName.contains("AIX")){
	RESULT.affinities.add("aix");	
} else { 
	RESULT.affinities.add("anyos");
}

/**
 * Extracts the hostname
 */
def hostnames = SYSINFO.getNetworkProperties().get("hostnames");

if (hostnames != null){
	items = hostnames.split(',')
	items.each{ 
		if (it.indexOf(".") > -1){
			// split() takes a regular expression, so the dot must be escaped
			names = it.split('\\.');
			shortHostname = names[0];
			RESULT.affinities.add(shortHostname);			
		} else {
			RESULT.affinities.add(it);
		}
	}
}

/**
 * Computes 80% of free memory, in slots of 64 MB (minimum 64 MB, maximum 1024 MB)
 */
 
def freemem = SYSINFO.getRuntimeProperties().get("freeMemory").toLong();
freemem = freemem / 1024 / 1024 * 0.80; 
RESULT.memory = Math.min(Math.max(Math.floor(freemem / 64) * 64, 64), 1024);

return

In this way, the JEM node can recalculate its own dynamic affinities at runtime, either on command or periodically at a given interval, without shutting down the JEM node.

org.pepstock.jem.node.affinity.GroovyPolicyAffinityLoader detects when the Groovy script file has changed and can reload the new affinities, removing the old ones.
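
The Groovy policy shortens fully qualified host names by splitting them on the dot. In both Groovy and Java, String.split takes a regular expression, so an unescaped "." matches every character and yields an empty array. The sketch below (plain Java, with an invented class name) shows one way to extract the short names safely from the comma-separated hostnames value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative helper, not part of JEM.
final class ShortHostnames {
    /** Splits a comma-separated host list and keeps the part before the first dot. */
    static List<String> shortNames(String hostnames) {
        List<String> result = new ArrayList<>();
        for (String host : hostnames.split(",")) {
            String trimmed = host.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            // Pattern.quote makes the dot literal instead of "any character"
            result.add(trimmed.split(Pattern.quote("."))[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shortNames("node1.example.com,localhost"));
        // The unescaped form splits on every character and leaves nothing behind:
        System.out.println("node1.example.com".split(".").length);
    }
}
```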

Paths

In the paths element, you must define the mount points of the file systems used by JEM for common purposes:

  • data path where all the Environment datasets (read and written) are stored.
  • output path where JEM nodes will store all output produced by a job during its execution (see output management section)
  • source path where JCL sources are stored. Typically, it contains INCLUDE or other JCL snippets.
  • library path where all native system libraries (like .dll, .so), needed by the executable files present in the binary folder, are stored
  • binary path where all the executable files (like .exe, .cmd, .sh) called by the JCL are stored
  • classpath path where all the Java libraries (like jar and zip files), needed at runtime by a Java JCL, are stored
  • persistence contains the following directory structure:
    • [ENV-NAME] is the folder containing all the information relative to a specific environment
    • config path containing the configuration files relative to the environment. See Configuring Jem Cluster
    • keystores path where key-stores, used by JEM when running in secure mode, are stored. For more information see JEM cluster security
    • policy path containing the standard policy file provided by the JEM installation (policy.js). This file can be modified to meet custom needs.

Multiple data paths

It's possible to define more than 1 mount point (and thus more than one global file system) for data. This is very helpful if you want to separate files by their nature or if you want to have different kinds of files (e.g. Hadoop files) depending on their purpose.

To activate this feature, you must:

  1. modify the JEM node configuration, defining all mount points for data paths
  2. modify the JEM environment configuration, setting the definition file of all dataset rules
  3. create or modify the data rules file, adding all the rules necessary to address the right path, based on file name

JEM node configuration

It's necessary to change the JEM node XML configuration file, setting all necessary mount points. Each path must have a unique name and the absolute path of its mount point.

<paths>
   <data>
      <path name="default">/mnt/jem/default</path>
      <path name="hdfs">/mnt/jem/hdfs</path>
   </data>
   ...
   ...
</paths>

The names of the data paths are checked inside the cluster because they must be consistent. That means that all nodes should see all data paths (possibly through different absolute paths).

If multiple data paths are defined, a dataset rules XML file must be set in the JEM env configuration.

JEM env configuration

Because the rules to allocate files must be the same across the whole cluster, the XML definition of dataset rules must be set in the JEM environment configuration.

To do that, it's mandatory to indicate the absolute path of a dataset rules file.

<datasetsRules>${jem.persistence}/#[jem.environment.name]/config/datasetsRules.xml</datasetsRules>

Datasets rules file

It's an XML file which contains all the regular expressions, applied to file names, used to allocate and address the files to the right file systems.

If a file name doesn't match any regular expression, JEM will throw an exception.

Be aware that the regular expressions apply only to files defined by a dataset entity in JCL. If a file is allocated dynamically, there is no way to address it to the right file system.

<rules>
  <rule pathName="hdfs">
    <dataSetPattern>hadoop/.*</dataSetPattern>
  </rule>

  <rule pathName="default">
    <dataSetPattern>.*</dataSetPattern>
  </rule>
</rules>
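
To make the effect of such a rules file more concrete, here is a small sketch of rule resolution in plain Java. It is not JEM's implementation: the class name is invented, and it assumes that rules are tried in declaration order and that a pattern must match the whole file name. Under those assumptions the catch-all .* rule has to come last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative resolver, not part of JEM. Assumes first-match-wins in declaration order.
final class DatasetRules {
    private static final class Rule {
        final Pattern pattern;
        final String pathName;
        Rule(Pattern pattern, String pathName) {
            this.pattern = pattern;
            this.pathName = pathName;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule, mirroring one rule element with its dataSetPattern. */
    void addRule(String dataSetPattern, String pathName) {
        rules.add(new Rule(Pattern.compile(dataSetPattern), pathName));
    }

    /** Returns the path name of the first rule whose pattern matches the whole file name. */
    String resolve(String fileName) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(fileName).matches()) {
                return rule.pathName;
            }
        }
        // Mirrors the documented behaviour: no matching rule is an error
        throw new IllegalArgumentException("No dataset rule matches " + fileName);
    }

    public static void main(String[] args) {
        DatasetRules rules = new DatasetRules();
        rules.addRule("hadoop/.*", "hdfs");
        rules.addRule(".*", "default");
        System.out.println(rules.resolve("hadoop/input.csv")); // hdfs
        System.out.println(rules.resolve("gdg/report.txt"));   // default
    }
}
```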

Java runtimes

JEM allows you to use a specific Java runtime to execute your job, if you have business logic which needs a specific Java version and a specific Java vendor.

For each JEM node, you can configure the JAVA_HOMEs installed on the machine, relating each of them to a tag. This tag is automatically added to the static affinities, so a job which needs a specific JRE will be executed on the right node.

Each java element contains:

  • name is a mandatory attribute which is the tag related to the JRE
  • default is an optional attribute which marks the JRE as the default one.

The content of a java element is the absolute path of JAVA_HOME.

If the java-runtimes definition is missing, JEM executes all jobs using the same JRE that the JEM node is using.

PAY ATTENTION: because some JEM classes are used during job execution, be aware that JEM is compiled with JDK 1.7 and tested on JDK 1.6 and 1.7.
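
As an illustration of the java-runtimes structure described above, the following sketch reads such a fragment with the standard Java DOM parser and looks up a JAVA_HOME by tag. It is not JEM code: the class name, the Unix-style paths in the usage note and the fallback to the default runtime when a tag is unknown are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative reader for a java-runtimes fragment, not part of JEM.
final class JavaRuntimes {
    private final Map<String, String> homes = new LinkedHashMap<>();
    private String defaultName;

    static JavaRuntimes parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            JavaRuntimes runtimes = new JavaRuntimes();
            NodeList nodes = root.getElementsByTagName("java");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element java = (Element) nodes.item(i);
                String name = java.getAttribute("name");            // mandatory tag
                runtimes.homes.put(name, java.getTextContent().trim());
                if (Boolean.parseBoolean(java.getAttribute("default"))) {
                    runtimes.defaultName = name;                    // optional default marker
                }
            }
            return runtimes;
        } catch (Exception e) {
            throw new IllegalStateException("Invalid java-runtimes fragment", e);
        }
    }

    /** JAVA_HOME for the tag; falls back to the default runtime (assumption), else null. */
    String homeFor(String tag) {
        String home = homes.get(tag);
        return home != null ? home : homes.get(defaultName);
    }
}
```

For example, parsing a fragment that declares jdk6 at /opt/java/jdk1.6 and jdk7 (with default="true") at /opt/java/jdk1.7 would make homeFor("jdk6") return /opt/java/jdk1.6, while an unknown tag would resolve to the jdk7 path.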
