[ADAM-1328] Rename Transform to TransformAlignments.
Resolves #1328.
fnothaft authored and heuermh committed May 21, 2017
1 parent b14abc8 commit 2820e94
Showing 12 changed files with 58 additions and 58 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -70,7 +70,7 @@ Choose one of the following commands:
ADAM ACTIONS
countKmers : Counts the k-mers/q-mers from a read dataset.
countContigKmers : Counts the k-mers/q-mers from a read dataset.
transform : Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations
transformAlignments : Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations
transformFeatures : Convert a file with sequence features into corresponding ADAM format and vice versa
mergeShards : Merges the shards of a file
reads2coverage : Calculate the coverage from a given ADAM file
@@ -93,7 +93,7 @@ PRINT
You can learn more about a command by calling it without arguments or with `--help`, e.g.

```
$ adam-submit transform --help
$ adam-submit transformAlignments --help
INPUT : The ADAM, BAM or SAM file to apply the transforms to
OUTPUT : Location to write the transformed data in ADAM/Parquet format
-add_md_tags VAL : Add MD Tags to reads based on the FASTA (or equivalent) file passed to this option.
@@ -145,7 +145,7 @@ $ adam-submit transform --help
to LENIENT
```

The ADAM `transform` command allows you to mark duplicates, run base quality score recalibration (BQSR), and perform other pre-processing steps on your data.
The ADAM `transformAlignments` command allows you to mark duplicates, run base quality score recalibration (BQSR), and perform other pre-processing steps on your data.
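
For example, a single preprocessing pass that marks duplicates and then recalibrates base qualities might look like the sketch below. The paths are hypothetical, and the flag names (`-mark_duplicate_reads`, `-recalibrate_base_qualities`, `-known_snps`) come from the full `--help` listing rather than the excerpt above, so verify them against your ADAM version:

```
# sketch: mark duplicates, then run BQSR against a known-sites VCF (hypothetical paths)
adam-submit transformAlignments \
  -mark_duplicate_reads \
  -recalibrate_base_qualities \
  -known_snps known_sites.vcf \
  sample1.bam sample1.preprocessed.adam
```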

# Getting Started

@@ -209,11 +209,11 @@ These aliases call scripts that wrap the `spark-submit` and `spark-shell` commands

Now you can try running some simple ADAM commands:

### `transform`
### `transformAlignments`
Make your first `.adam` file like this:

````
adam-submit transform $ADAM_HOME/adam-core/src/test/resources/small.sam /tmp/small.adam
adam-submit transformAlignments $ADAM_HOME/adam-core/src/test/resources/small.sam /tmp/small.adam
````

If you didn't obtain your copy of ADAM from GitHub, you can [grab `small.sam` here](https://raw.githubusercontent.com/bigdatagenomics/adam/master/adam-core/src/test/resources/small.sam).
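
A sketch of that path, assuming `curl` is available and using the URL from the link above:

```
# download the test file, then convert it to ADAM/Parquet format
curl -O https://raw.githubusercontent.com/bigdatagenomics/adam/master/adam-core/src/test/resources/small.sam
adam-submit transformAlignments small.sam /tmp/small.adam
```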
@@ -34,7 +34,7 @@ object ADAMMain {
List(
CountReadKmers,
CountContigKmers,
Transform,
TransformAlignments,
TransformFeatures,
MergeShards,
Reads2Coverage
@@ -33,16 +33,16 @@ import org.bdgenomics.utils.cli._
import org.bdgenomics.utils.misc.Logging
import org.kohsuke.args4j.{ Argument, Option => Args4jOption }

object Transform extends BDGCommandCompanion {
val commandName = "transform"
object TransformAlignments extends BDGCommandCompanion {
val commandName = "transformAlignments"
val commandDescription = "Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations"

def apply(cmdLine: Array[String]) = {
new Transform(Args4j[TransformArgs](cmdLine))
new TransformAlignments(Args4j[TransformAlignmentsArgs](cmdLine))
}
}

class TransformArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetArgs {
class TransformAlignmentsArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetArgs {
@Argument(required = true, metaVar = "INPUT", usage = "The ADAM, BAM or SAM file to apply the transforms to", index = 0)
var inputPath: String = null
@Argument(required = true, metaVar = "OUTPUT", usage = "Location to write the transformed data in ADAM/Parquet format", index = 1)
@@ -91,13 +91,13 @@ class TransformArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetArgs {
var forceShuffle: Boolean = false
@Args4jOption(required = false, name = "-sort_fastq_output", usage = "Sets whether to sort the FASTQ output, if saving as FASTQ. False by default. Ignored if not saving as FASTQ.")
var sortFastqOutput: Boolean = false
@Args4jOption(required = false, name = "-force_load_bam", usage = "Forces Transform to load from BAM/SAM.")
@Args4jOption(required = false, name = "-force_load_bam", usage = "Forces TransformAlignments to load from BAM/SAM.")
var forceLoadBam: Boolean = false
@Args4jOption(required = false, name = "-force_load_fastq", usage = "Forces Transform to load from unpaired FASTQ.")
@Args4jOption(required = false, name = "-force_load_fastq", usage = "Forces TransformAlignments to load from unpaired FASTQ.")
var forceLoadFastq: Boolean = false
@Args4jOption(required = false, name = "-force_load_ifastq", usage = "Forces Transform to load from interleaved FASTQ.")
@Args4jOption(required = false, name = "-force_load_ifastq", usage = "Forces TransformAlignments to load from interleaved FASTQ.")
var forceLoadIFastq: Boolean = false
@Args4jOption(required = false, name = "-force_load_parquet", usage = "Forces Transform to load from Parquet.")
@Args4jOption(required = false, name = "-force_load_parquet", usage = "Forces TransformAlignments to load from Parquet.")
var forceLoadParquet: Boolean = false
@Args4jOption(required = false, name = "-single", usage = "Saves OUTPUT as single file")
var asSingleFile: Boolean = false
@@ -125,8 +125,8 @@ class TransformArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetArgs {
var storageLevel: String = "MEMORY_ONLY"
}

class Transform(protected val args: TransformArgs) extends BDGSparkCommand[TransformArgs] with Logging {
val companion = Transform
class TransformAlignments(protected val args: TransformAlignmentsArgs) extends BDGSparkCommand[TransformAlignmentsArgs] with Logging {
val companion = TransformAlignments

val stringency = ValidationStringency.valueOf(args.stringency)

@@ -61,7 +61,7 @@ class ADAMMainSuite extends FunSuite {
test("single command group") {
val stream = new ByteArrayOutputStream()
Console.withOut(stream) {
new ADAMMain(List(CommandGroup("SINGLE COMMAND GROUP", List(Transform)))).apply(Array())
new ADAMMain(List(CommandGroup("SINGLE COMMAND GROUP", List(TransformAlignments)))).apply(Array())
}
val out = stream.toString()
assert(out.contains("Usage"))
@@ -72,7 +72,7 @@
test("add new command group to default command groups") {
val stream = new ByteArrayOutputStream()
Console.withOut(stream) {
val commandGroups = defaultCommandGroups.union(List(CommandGroup("NEW COMMAND GROUP", List(Transform))))
val commandGroups = defaultCommandGroups.union(List(CommandGroup("NEW COMMAND GROUP", List(TransformAlignments))))
new ADAMMain(commandGroups)(Array())
}
val out = stream.toString()
@@ -97,7 +97,7 @@ class ADAMMainSuite extends FunSuite {
Console.withOut(stream) {
val module = new AbstractModule with ScalaModule {
def configure() = {
bind[List[CommandGroup]].toInstance(List(CommandGroup("SINGLE COMMAND GROUP", List(Transform))))
bind[List[CommandGroup]].toInstance(List(CommandGroup("SINGLE COMMAND GROUP", List(TransformAlignments))))
}
}
val injector = Guice.createInjector(module)
@@ -115,7 +115,7 @@ class ADAMMainSuite extends FunSuite {
Console.withOut(stream) {
val module = new AbstractModule with ScalaModule {
def configure() = {
bind[List[CommandGroup]].toInstance(defaultCommandGroups.union(List(CommandGroup("NEW COMMAND GROUP", List(Transform)))))
bind[List[CommandGroup]].toInstance(defaultCommandGroups.union(List(CommandGroup("NEW COMMAND GROUP", List(TransformAlignments)))))
}
}
val injector = Guice.createInjector(module)
@@ -26,7 +26,7 @@ class MergeShardsSuite extends ADAMFunSuite {
val inputPath = copyResource("unordered.sam")
val actualPath = tmpFile("unordered.sam")
val expectedPath = inputPath
Transform(Array("-single", "-defer_merging", inputPath, actualPath)).run(sc)
TransformAlignments(Array("-single", "-defer_merging", inputPath, actualPath)).run(sc)
MergeShards(Array(actualPath + "_tail", actualPath,
"-header_path", actualPath + "_head")).run(sc)
checkFiles(expectedPath, actualPath)
@@ -36,7 +36,7 @@ class MergeShardsSuite extends ADAMFunSuite {
val inputPath = copyResource("unordered.sam")
val actualPath = tmpFile("ordered.sam")
val expectedPath = copyResource("ordered.sam")
Transform(Array("-single",
TransformAlignments(Array("-single",
"-sort_reads",
"-sort_lexicographically",
"-defer_merging",
@@ -49,7 +49,7 @@
sparkTest("merge sharded bam") {
val inputPath = copyResource("unordered.sam")
val actualPath = tmpFile("unordered.bam")
Transform(Array("-single",
TransformAlignments(Array("-single",
"-defer_merging",
inputPath, actualPath)).run(sc)
MergeShards(Array(actualPath + "_tail", actualPath,
@@ -66,7 +66,7 @@
println(referencePath)

val actualPath = tmpFile("artificial.cram")
Transform(Array("-single",
TransformAlignments(Array("-single",
"-sort_reads",
"-sort_lexicographically",
"-defer_merging",
@@ -28,7 +28,7 @@ class Reads2FragmentsSuite extends ADAMFunSuite {
val expectedPath = copyResource("ordered.sam")
Reads2Fragments(Array(inputPath, fragmentsPath)).run(sc)
Fragments2Reads(Array(fragmentsPath, readsPath)).run(sc)
Transform(Array("-single", "-sort_reads", "-sort_lexicographically",
TransformAlignments(Array("-single", "-sort_reads", "-sort_lexicographically",
readsPath, actualPath)).run(sc)
checkFiles(expectedPath, actualPath)
}
@@ -20,20 +20,20 @@ package org.bdgenomics.adam.cli
import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.adam.util.ADAMFunSuite

class TransformSuite extends ADAMFunSuite {
class TransformAlignmentsSuite extends ADAMFunSuite {
sparkTest("unordered sam to unordered sam") {
val inputPath = copyResource("unordered.sam")
val actualPath = tmpFile("unordered.sam")
val expectedPath = inputPath
Transform(Array("-single", inputPath, actualPath)).run(sc)
TransformAlignments(Array("-single", inputPath, actualPath)).run(sc)
checkFiles(expectedPath, actualPath)
}

sparkTest("unordered sam to ordered sam") {
val inputPath = copyResource("unordered.sam")
val actualPath = tmpFile("ordered.sam")
val expectedPath = copyResource("ordered.sam")
Transform(Array("-single", "-sort_reads", "-sort_lexicographically", inputPath, actualPath)).run(sc)
TransformAlignments(Array("-single", "-sort_reads", "-sort_lexicographically", inputPath, actualPath)).run(sc)
checkFiles(expectedPath, actualPath)
}

@@ -42,8 +42,8 @@ class TransformSuite extends ADAMFunSuite {
val intermediateAdamPath = tmpFile("unordered.adam")
val actualPath = tmpFile("unordered.sam")
val expectedPath = inputPath
Transform(Array(inputPath, intermediateAdamPath)).run(sc)
Transform(Array("-single", intermediateAdamPath, actualPath)).run(sc)
TransformAlignments(Array(inputPath, intermediateAdamPath)).run(sc)
TransformAlignments(Array("-single", intermediateAdamPath, actualPath)).run(sc)
checkFiles(expectedPath, actualPath)
}

@@ -52,15 +52,15 @@ class TransformSuite extends ADAMFunSuite {
val intermediateAdamPath = tmpFile("unordered.adam")
val actualPath = tmpFile("ordered.sam")
val expectedPath = copyResource("ordered.sam")
Transform(Array(inputPath, intermediateAdamPath)).run(sc)
Transform(Array("-single", "-sort_reads", "-sort_lexicographically", intermediateAdamPath, actualPath)).run(sc)
TransformAlignments(Array(inputPath, intermediateAdamPath)).run(sc)
TransformAlignments(Array("-single", "-sort_reads", "-sort_lexicographically", intermediateAdamPath, actualPath)).run(sc)
checkFiles(expectedPath, actualPath)
}

sparkTest("put quality scores into bins") {
val inputPath = copyResource("bqsr1.sam")
val finalPath = tmpFile("binned.adam")
Transform(Array(inputPath, finalPath, "-bin_quality_scores", "0,20,10;20,40,30;40,60,50")).run(sc)
TransformAlignments(Array(inputPath, finalPath, "-bin_quality_scores", "0,20,10;20,40,30;40,60,50")).run(sc)
val qualityScoreCounts = sc.loadAlignments(finalPath)
.rdd
.flatMap(_.getQual)
@@ -32,8 +32,8 @@ class ViewSuite extends ADAMFunSuite {
sparkBefore("initialize 'reads' Array from flag-values.sam") {

val transform =
new Transform(
Args4j[TransformArgs](
new TransformAlignments(
Args4j[TransformAlignmentsArgs](
Array(
inputSamPath,
"unused_output_path"
4 changes: 2 additions & 2 deletions docs/source/01_intro.md
@@ -140,7 +140,7 @@ PRINT
You can learn more about a command by calling it without arguments or with `--help`, e.g.

```
$ adam-submit transform
$ adam-submit transformAlignments
Argument "INPUT" is required
INPUT : The ADAM, BAM or SAM file to apply the transforms to
OUTPUT : Location to write the transformed data in ADAM/Parquet format
@@ -191,7 +191,7 @@ Argument "INPUT" is required
to LENIENT
```

The ADAM transform command allows you to mark duplicates, run base quality score recalibration (BQSR), and perform other pre-processing steps on your data.
The ADAM transformAlignments command allows you to mark duplicates, run base quality score recalibration (BQSR), and perform other pre-processing steps on your data.

There are also a number of projects built on ADAM, e.g.

14 changes: 7 additions & 7 deletions docs/source/40_deploying_ADAM.md
@@ -93,7 +93,7 @@ spark-master using `scp` and then copy to HDFS using
From the ADAM shell, or as a parameter to ADAM submit, you would refer to HDFS URLs
such as:
```
adam-submit transform hdfs://spark-master/work_dir/sample1.bam \
adam-submit transformAlignments hdfs://spark-master/work_dir/sample1.bam \
hdfs://spark-master/work_dir/sample1.adam
```
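
The copy into HDFS referenced above (cut off by the hunk boundary) is a standard Hadoop CLI step; a minimal sketch, assuming the `hdfs` command is on the path and the same hypothetical paths as the example:

```
# stage the input BAM into HDFS so the Spark executors can read it
hdfs dfs -mkdir -p /work_dir
hdfs dfs -put sample1.bam hdfs://spark-master/work_dir/sample1.bam
```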

@@ -197,9 +197,9 @@ can cause jobs to fail. To eliminate this issue, you can set the
resource request to YARN over the JVM Heap size indicated by `--driver-memory`
or `--executor-memory`.

As a final example, to run the ADAM [transform](#transform) CLI using YARN
cluster mode on a 64 node cluster with one executor per node and a 2GB per
executor overhead, we would run:
As a final example, to run the ADAM [transformAlignments](#transformAlignments)
CLI using YARN cluster mode on a 64 node cluster with one executor per node and
a 2GB per executor overhead, we would run:

```
./bin/adam-submit \
@@ -212,7 +212,7 @@ executor overhead, we would run:
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.executor.instances=64 \
-- \
transform in.sam out.adam
transformAlignments in.sam out.adam
```

In this example, we are allocating 200GB of JVM heap space per executor and for
@@ -255,7 +255,7 @@ include:
this workflow was demonstrated in [@vivian16] and sets up a Spark cluster
which then runs ADAM's [`countKmers` CLI](#countKmers).
* [adam-pipeline](https://github.com/BD2KGenomics/toil-scripts/tree/master/src/toil_scripts/adam_pipeline):
this workflow runs several stages in the ADAM [`transform` CLI](#transform).
this workflow runs several stages in the ADAM [`transformAlignments` CLI](#transformAlignments).
This pipeline is the ADAM equivalent to the GATK's "Best Practice" read
preprocessing pipeline. We then stitch together this pipeline with
[BWA-MEM](https://github.com/lh3/bwa) and the GATK in the [adam-gatk-pipeline](
@@ -443,7 +443,7 @@ does the following work:
# convert the file
_log.info('Converting %s into ADAM format at %s.', hdfs_tmp_file, hdfs_input_file)
call_adam(master_ip,
['transform',
['transformAlignments',
hdfs_tmp_file, hdfs_input_file],
memory=memory, override_parameters=spark_conf)
```
24 changes: 12 additions & 12 deletions docs/source/50_cli.md
@@ -101,22 +101,22 @@ Beyond the [default options](#default-args), both `countKmers` and
* `-print_histogram`: If provided, prints a histogram of the $k$-mer count
distribution to standard out.

### transform {#transform}
### transformAlignments {#transformAlignments}

The `transform` CLI is the entrypoint to ADAM's read preprocessing tools. This
command provides drop-in replacement commands for several commands in the
[Genome Analysis Toolkit](https://software.broadinstitute.org/gatk/) "Best
Practices" read preprocessing pipeline and more [@depristo11]. This CLI tool
takes two required arguments:
The `transformAlignments` CLI is the entrypoint to ADAM's read preprocessing
tools. This command provides drop-in replacement commands for several commands
in the [Genome Analysis Toolkit](https://software.broadinstitute.org/gatk/)
"Best Practices" read preprocessing pipeline and more [@depristo11]. This CLI
tool takes two required arguments:

1. `INPUT`: The input path. A file containing reads in any of the supported
ADAM read input formats.
2. `OUTPUT`: The path to save the transformed reads to. Supports any of ADAM's
read output formats.
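
A minimal invocation, with hypothetical paths, is therefore:

```
adam-submit transformAlignments sample.bam sample.alignments.adam
```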

Beyond the [default options](#default-args) and the [legacy output
options](#legacy-output), `transform` supports a vast range of options. These
options fall into several general categories:
options](#legacy-output), `transformAlignments` supports a vast range of options.
These options fall into several general categories:

* General options:
* `-cache`: If provided, the results of intermediate stages will be cached.
@@ -204,7 +204,7 @@ options fall into several general categories:
fragment to load. Defaults to 10,000bp.
* `-md_tag_overwrite`: If provided, recomputes and overwrites the
`mismatchingPositions` field for records where this field was provided.
* Output options: `transform` supports the [legacy output](#legacy-output)
* Output options: `transformAlignments` supports the [legacy output](#legacy-output)
options. Additionally, there are the following options:
* `-coalesce`: Sets the number of partitions to coalesce the output to.
If `-force_shuffle_coalesce` is not provided, the Spark engine may ignore
@@ -399,9 +399,9 @@ options](#default-args). Additionally, `adam2fasta` takes the following options:

### adam2fastq

While the [`transform`](#transform) command can export to FASTQ, `adam2fastq`
provides a simpler CLI with more output options. `adam2fastq` takes two
required arguments and an optional third argument:
While the [`transformAlignments`](#transformAlignments) command can export to
FASTQ, `adam2fastq` provides a simpler CLI with more output options.
`adam2fastq` takes two required arguments and an optional third argument:

1. `INPUT`: The input read file, in any ADAM-supported read format.
2. `OUTPUT`: The path to save an unpaired or interleaved FASTQ file to, or the
4 changes: 2 additions & 2 deletions scripts/jenkins-test
@@ -218,12 +218,12 @@ then
# once fetched, convert BAM to ADAM
echo "Converting BAM to ADAM read format"
rm -rf ${READS}
${ADAM} transform ${BAM} ${READS}
${ADAM} transformAlignments ${BAM} ${READS}

# then, sort the BAM
echo "Converting BAM to ADAM read format with sorting"
rm -rf ${SORTED_READS}
${ADAM} transform -sort_reads ${READS} ${SORTED_READS}
${ADAM} transformAlignments -sort_reads ${READS} ${SORTED_READS}

# convert the reads to fragments to re-pair the reads
echo "Converting read file to fragments"