Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default job name length for DistributedLzoIndexer #117

Merged
merged 8 commits into from
Sep 24, 2016
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import com.hadoop.mapreduce.LzoSplitRecordReader;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
Expand All @@ -25,6 +26,15 @@

public class DistributedLzoIndexer extends Configured implements Tool {
private static final Log LOG = LogFactory.getLog(DistributedLzoIndexer.class);
/**
* Override the default job name which is generated from the arguments.
*/
public static final String JOB_NAME_KEY = "lzo.indexer.distributed.job.name";
/**
* Override the default length to which the job name will be truncated. Set non-positive to disable.
*/
public static final String JOB_NAME_MAX_LENGTH_KEY = "lzo.indexer.distributed.job.name.max.length";
private static final int DEFAULT_JOB_NAME_MAX_LENGTH = 200;
private final String LZO_EXTENSION = new LzopCodec().getDefaultExtension();

private final PathFilter nonTemporaryFilter = new PathFilter() {
Expand Down Expand Up @@ -66,6 +76,20 @@ private void walkPath(Path path, PathFilter pathFilter, List<Path> accumulator)
}
}

static void setJobName(Job job, String[] args) {
final Configuration conf = job.getConfiguration();

String name = conf.get(JOB_NAME_KEY, "Distributed Lzo Indexer " + Arrays.toString(args));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define a constant for "Distributed Lzo Indexer" so that it can be reused by the test?


final int maxLength = conf.getInt(JOB_NAME_MAX_LENGTH_KEY, DEFAULT_JOB_NAME_MAX_LENGTH);

if (maxLength > 0 && name.length() > maxLength) {
name = name.substring(0, maxLength) + "...";
}

job.setJobName(name);
}

public int run(String[] args) throws Exception {
if (args.length == 0 || (args.length == 1 && "--help".equals(args[0]))) {
printUsage();
Expand All @@ -85,7 +109,7 @@ public int run(String[] args) throws Exception {
}

Job job = new Job(getConf());
job.setJobName("Distributed Lzo Indexer " + Arrays.toString(args));
setJobName(job, args);

job.setOutputKeyClass(Path.class);
job.setOutputValueClass(LongWritable.class);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
package com.hadoop.compression.lzo;

import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TestDistLzoIndexerJobName extends TestCase {

public void testDefaultName() throws Exception {
String[] args = new String[]{
"hdfs://cluster/user/test/output/file-m-00000.lzo",
};

Job job = new Job(new Configuration(false));
DistributedLzoIndexer.setJobName(job, args);

String expected = "Distributed Lzo Indexer [hdfs://cluster/user/test/output/file-m-00000.lzo]";

assertEquals(expected, job.getJobName());
}

public void testCustomeName() throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: CustomeName -> CustomName

String[] args = new String[]{
"ignored",
};
String customName = "-<Custom Job Name>-";

Configuration conf = new Configuration(false);
conf.set("lzo.indexer.distributed.job.name", customName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the constant.

Job job = new Job(conf);
DistributedLzoIndexer.setJobName(job, args);

assertEquals(customName, job.getJobName());
}

public void testCustomeNameTruncation() throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same typo

String[] args = new String[]{
"ignored",
};

Configuration conf = new Configuration(false);
conf.set("lzo.indexer.distributed.job.name", "123456789");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the constants in these 2 lines.

conf.setInt("lzo.indexer.distributed.job.name.max.length", 5);
Job job = new Job(conf);
DistributedLzoIndexer.setJobName(job, args);

assertEquals("12345...", job.getJobName());
}

public void testDefaultLengthTruncation() throws Exception {
Copy link
Collaborator

@sjlee sjlee Sep 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think it's important to test the default length. It should be sufficient between the custom name test and custom length test IMO. I'm OK with dropping this test. The fact that this test needs to rely on the default value (200) implicitly is also bit of a drawback (should the default change in the future).

String[] args = new String[]{
"hdfs://cluster/user/test/output/file-m-00000.lzo",
"hdfs://cluster/user/test/output/file-m-00001.lzo",
"hdfs://cluster/user/test/output/file-m-00002.lzo",
"hdfs://cluster/user/test/output/file-m-00003.lzo",
"hdfs://cluster/user/test/output/file-m-00003.lzo",
};

Job job = new Job(new Configuration(false));
DistributedLzoIndexer.setJobName(job, args);

String expected = "Distributed Lzo Indexer [hdfs://cluster/user/test/output/file-m-00000.lzo, hdfs://cluster/user/test/output/file-m-00001.lzo, hdfs://cluster/user/test/output/file-m-00002.lzo, hdfs://cluster/user/test/...";
// Truncated length should be 200 + 3 for the "..."
assertEquals(203, expected.length());

assertEquals(expected, job.getJobName());
}

public void testCustomLengthTruncation() throws Exception {
String[] args = new String[]{
"hdfs://cluster/user/test/output/file-m-00000.lzo",
"hdfs://cluster/user/test/output/file-m-00001.lzo",
"hdfs://cluster/user/test/output/file-m-00002.lzo",
"hdfs://cluster/user/test/output/file-m-00003.lzo",
"hdfs://cluster/user/test/output/file-m-00003.lzo",
};

Configuration conf = new Configuration(false);
conf.setInt("lzo.indexer.distributed.job.name.max.length", 50);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the constant.

Job job = new Job(conf);
DistributedLzoIndexer.setJobName(job, args);

String expected = "Distributed Lzo Indexer [hdfs://cluster/user/test/...";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the constant here too.

// Truncated length should be 50 + 3 for the "..."
assertEquals(53, expected.length());

assertEquals(expected, job.getJobName());
}

public void testDisabledTruncation() throws Exception {
String[] args = new String[]{
"hdfs://cluster/user/test/output/file-m-00000.lzo",
"hdfs://cluster/user/test/output/file-m-00001.lzo",
"hdfs://cluster/user/test/output/file-m-00002.lzo",
"hdfs://cluster/user/test/output/file-m-00003.lzo",
"hdfs://cluster/user/test/output/file-m-00003.lzo",
};

Configuration conf = new Configuration(false);
conf.setInt("lzo.indexer.distributed.job.name.max.length", 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the constant.

Job job = new Job(conf);
DistributedLzoIndexer.setJobName(job, args);

String expected = "Distributed Lzo Indexer [hdfs://cluster/user/test/output/file-m-00000.lzo, hdfs://cluster/user/test/output/file-m-00001.lzo, hdfs://cluster/user/test/output/file-m-00002.lzo, hdfs://cluster/user/test/output/file-m-00003.lzo, hdfs://cluster/user/test/output/file-m-00003.lzo]";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

assertEquals(expected, job.getJobName());
}

}