General cleanup of documentation.

* Removed URL text from view.
* Removed broken links.
* Moved intro paragraph to top, consistent w/ README
* Updating links to be consistent w/ intro in docs
* Removed contractions.
* Minor edits to:
  * 01_intro.md
  * 02_installation.md
  * 30_running_example.md
  * 40_deploying_ADAM.md
bigdatagenomics · Aug 2, 2017 · 9032698
1 parent 6573997
Showing 8 changed files with 113 additions and 109 deletions.
diff --git a/README.md b/README.md
@@ -5,15 +5,16 @@ ADAM
 
 # Introduction
 
-ADAM is a genomics analysis platform with specialized file formats built using [Apache Avro][Avro], [Apache Spark][Spark] and [Apache Parquet][Parquet]. Apache 2 licensed. Some quick links: 
+ADAM is a genomics analysis platform with specialized file formats built using [Apache Avro](http://avro.apache.org), [Apache Spark](http://spark.apache.org/) and [Parquet](http://parquet.io/). Apache 2 licensed.
 
-* [Follow our Twitter account](https://twitter.com/bigdatagenomics/).
-* [Chat with ADAM developers in Gitter](https://gitter.im/bigdatagenomics/adam).
-* [Join our mailing list](http://bdgenomics.org/mail/).
-* [Checkout the current build status](https://amplab.cs.berkeley.edu/jenkins/view/Big%20Data%20Genomics/).
-* [Download official releases][releases].
-* [View our software artifacts on Maven Central](http://search.maven.org/#search%7Cga%7C1%7Corg.bdgenomics) ([…including snapshots](https://oss.sonatype.org/index.html#nexus-search;quick~bdgenomics)).
-* [Look at our CHANGES file](https://github.com/bigdatagenomics/adam/blob/master/CHANGES.md).
+* [Follow](https://twitter.com/bigdatagenomics/) our Twitter account
+* [Chat](https://gitter.im/bigdatagenomics/adam) with ADAM developers on Gitter
+* [Join](http://bdgenomics.org/mail) our mailing list
+* [Check out](https://amplab.cs.berkeley.edu/jenkins/view/Big%20Data%20Genomics/) the current build status
+* [Download](https://github.com/bigdatagenomics/adam/releases) official releases
+* [View](http://search.maven.org/#search%7Cga%7C1%7Corg.bdgenomics) our software artifacts on Maven Central
+* [See](https://oss.sonatype.org/index.html#nexus-search;quick~bdgenomics) our snapshots
+* [Look](https://github.com/bigdatagenomics/adam/blob/master/CHANGES.md) at our CHANGES file  
 
 ## Why ADAM?
 
@@ -414,4 +415,4 @@ architecture generalized beyond genomics. To cite this paper, please cite:
 ```
 
 We prefer that you cite both papers, but if you can only cite one paper, we
-prefer that you cite the SIGMOD 2015 manuscript.
+prefer that you cite the SIGMOD 2015 manuscript.
diff --git a/docs/source/01_intro.md b/docs/source/01_intro.md
@@ -1,21 +1,21 @@
 # Introduction
 
-* Follow our Twitter account at [https://twitter.com/bigdatagenomics/](https://twitter.com/bigdatagenomics/)
-* Chat with ADAM developers at [https://gitter.im/bigdatagenomics/adam](https://gitter.im/bigdatagenomics/adam)
-* Join our mailing list at [http://bdgenomics.org/mail](http://bdgenomics.org/mail)
-* Checkout the current build status at [https://amplab.cs.berkeley.edu/jenkins/](https://amplab.cs.berkeley.edu/jenkins/view/Big%20Data%20Genomics/)
-* Download official releases at [https://github.com/bigdatagenomics/adam/releases](https://github.com/bigdatagenomics/adam/releases)
-* View our software artifacts on Maven Central at [http://search.maven.org/#search%7Cga%7C1%7Corg.bdgenomics](http://search.maven.org/#search%7Cga%7C1%7Corg.bdgenomics)
-* See our snapshots at [https://oss.sonatype.org/index.html#nexus-search;quick~bdgenomics](https://oss.sonatype.org/index.html#nexus-search;quick~bdgenomics)
-* Look at our CHANGES file at [https://github.com/bigdatagenomics/adam/blob/master/CHANGES.md](https://github.com/bigdatagenomics/adam/blob/master/CHANGES.md)
+ADAM is a genomics analysis platform with specialized file formats built using [Apache Avro](http://avro.apache.org), [Apache Spark](http://spark.apache.org/) and [Parquet](http://parquet.io/). Apache 2 licensed.
 
-ADAM is a genomics analysis platform with specialized file formats built using [Apache Avro](http://avro.apache.org), [Apache Spark](http://spark.apache.org/) and [Parquet](http://parquet.io/). Apache 2 licensed.  
+* [Follow](https://twitter.com/bigdatagenomics/) our Twitter account
+* [Chat](https://gitter.im/bigdatagenomics/adam) with ADAM developers on Gitter
+* [Join](http://bdgenomics.org/mail) our mailing list
+* [Check out](https://amplab.cs.berkeley.edu/jenkins/view/Big%20Data%20Genomics/) the current build status
+* [Download](https://github.com/bigdatagenomics/adam/releases) official releases
+* [View](http://search.maven.org/#search%7Cga%7C1%7Corg.bdgenomics) our software artifacts on Maven Central
+* [See](https://oss.sonatype.org/index.html#nexus-search;quick~bdgenomics) our snapshots
+* [Look](https://github.com/bigdatagenomics/adam/blob/master/CHANGES.md) at our CHANGES file  
 
 ## Apache Spark
 
 [Apache Spark](http://spark.apache.org/) allows developers to write algorithms in succinct code that can run fast locally, on an in-house cluster or on Amazon, Google or Microsoft clouds. 
 
-For example, the following code snippet will print the top 10 21-mers in `NA2114` from 1000 Genomes.
+For example, the following code snippet will print the top ten 21-mers in `NA2114` from 1000 Genomes:
 
 ```scala
 val ac = new ADAMContext(sc)
@@ -51,10 +51,10 @@ Executing this Spark job will output the following:
 (32484,CCTCCCAAAGTGCTGGGATTA)
 ```
 
-You don't need to be Scala developer to use ADAM. You could also run the following ADAM CLI command for the same result:
+You do not need to be a Scala developer to use ADAM. You could also run the following ADAM CLI command for the same result:
 
 ```bash
-$ adam-submit count_kmers \
+adam-submit count_kmers \
        /data/NA21144.chrom11.ILLUMINA.adam \
        /data/results.txt 21
 ```
@@ -71,7 +71,7 @@ $ adam-submit count_kmers \
 - Parquet is simply a file format which makes it easy to sync and share data using tools like `distcp`, `rsync`, etc
 - Parquet provides a command-line tool, `parquet.hadoop.PrintFooter`, which reports useful compression statistics 
 
-In the counting k-mers example above, you can see there is a defined *predicate* and *projection*. The *predicate* allows rapid filtering of rows while a *projection* allows you to efficiently materialize only specific columns for analysis. For this k-mer counting example, we filter out any records that are not mapped or have a `MAPQ` less than 20 using a `predicate` and only materialize the `Sequence`, `ReadMapped` flag and `MAPQ` columns and skip over all other fields like `Reference` or `Start` position, e.g.
+In the counting k-mers example above, you can see that there is a defined *predicate* and *projection*. The *predicate* allows rapid filtering of rows while a *projection* allows you to efficiently materialize only specific columns for analysis. For this k-mer counting example, we filter out any records that are not mapped or have a `MAPQ` less than 20 using a `predicate` and only materialize the `Sequence`, `ReadMapped` flag and `MAPQ` columns and skip over all other fields like `Reference` or `Start` position, e.g.
 
 Sequence| ReadMapped | MAPQ | ~~Reference~~ | ~~Start~~ | ...
 --------|------------|------|-----------|-------|-------
@@ -81,21 +81,20 @@ TACTGAA | true | 30 | ~~chrom1~~ | ~~34232~~ | ...
 
 ## Apache Avro
 
-
 - Apache Avro is a data serialization system ([http://avro.apache.org](http://avro.apache.org))
 - All Big Data Genomics schemas are published at [https://github.com/bigdatagenomics/bdg-formats](https://github.com/bigdatagenomics/bdg-formats)
 - Having explicit schemas and self-describing data makes integrating, sharing and evolving formats easier
 
 Our Avro schemas are directly converted into source code using Avro tools. Avro supports a number of computer languages. ADAM uses Java; you could 
-just as easily use this Avro IDL description as the basis for a Python project. Avro currently supports c, c++, csharp, java, javascript, php, python and ruby. 
+just as easily use this Avro IDL description as the basis for a Python project. Avro currently supports C, C++, C#, Java, JavaScript, PHP, Python and Ruby. 
 
 ## More than k-mer counting
 
-ADAM does much more than just k-mer counting. Running the ADAM CLI without arguments or with `--help` will display available commands, e.g.
+ADAM does much more than just k-mer counting. Running the ADAM CLI without arguments or with `--help` will display available commands.
 
+```bash
 $ adam-submit
 
-```
        e         888~-_          e             e    e
       d8b        888   \        d8b           d8b  d8b
      /Y88b       888    |      /Y88b         d888bdY88b
@@ -130,9 +129,9 @@ PRINT
                 view : View certain reads from an alignment-record file.
 ```
 
-You can learn more about a command, by calling it without arguments or with `--help`, e.g.
+You can learn more about a command, by calling it without arguments or with `--help`.
 
-```
+```bash
 $ adam-submit transformAlignments
 Argument "INPUT" is required
  INPUT                                                           : The ADAM, BAM or SAM file to apply the transforms to
@@ -186,9 +185,7 @@ Argument "INPUT" is required
 
 The ADAM transformAlignments command allows you to mark duplicates, run base quality score recalibration (BQSR) and other pre-processing steps on your data.
 
-There are also a number of projects built on ADAM, e.g.
+There are also a number of projects built on ADAM:
 
 - [Avocado](https://github.com/bigdatagenomics/avocado) is a variant caller built on top of ADAM for germline and somatic calling
-- [Mango](https://github.com/bigdatagenomics/mango) a library for visualizing large scale genomics data with interactive latencies
-
-
+- [Mango](https://github.com/bigdatagenomics/mango) is a library for visualizing large scale genomics data with interactive latencies
diff --git a/docs/source/02_installation.md b/docs/source/02_installation.md
@@ -9,10 +9,10 @@ installed in order to build ADAM.
 > 1.6.3. To build for Spark 2, run the `./scripts/move_to_spark2.sh` script.
 
 ```bash
-$ git clone https://github.com/bigdatagenomics/adam.git
-$ cd adam
-$ export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
-$ mvn clean package -DskipTests
+git clone https://github.com/bigdatagenomics/adam.git
+cd adam
+export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
+mvn clean package -DskipTests
 ```
 Outputs
 ```
@@ -43,16 +43,16 @@ alias adam-shell="${ADAM_HOME}/bin/adam-shell"
 
 `$ADAM_HOME` should be the path to where you have checked ADAM out on your local filesystem. 
 The first alias should be used for running ADAM jobs that operate locally. The latter two aliases 
-call scripts that wrap the `spark-submit` and `spark-shell` commands to set up ADAM. You'll need
+call scripts that wrap the `spark-submit` and `spark-shell` commands to set up ADAM. You will need
 to have the Spark binaries on your system; prebuilt binaries can be downloaded from the
 [Spark website](http://spark.apache.org/downloads.html). Our [continuous integration setup](
 https://amplab.cs.berkeley.edu/jenkins/job/ADAM/) builds ADAM against Spark versions 1.6.1 and 2.0.0,
 Scala versions 2.10 and 2.11, and Hadoop versions 2.3.0 and 2.6.0.
 
-Once this alias is in place, you can run ADAM by simply typing `adam-submit` at the commandline, e.g.
+Once this alias is in place, you can run ADAM by simply typing `adam-submit` at the command line.
 
 ```bash
-$ adam-submit
+adam-submit
 ```
 
 ## Building for Python {#python-build}

diff --git a/docs/source/30_running_example.md b/docs/source/30_running_example.md
@@ -2,10 +2,10 @@
 
 Once you have data converted to ADAM, you can gather statistics from the ADAM
 file using [`flagstat`](#flagstat). This command will output stats identically
-to the samtools `flagstat` command, e.g.
+to the samtools `flagstat` command.
 
 ```bash
-$ ./bin/adam-submit flagstat NA12878_chr20.adam
+./bin/adam-submit flagstat NA12878_chr20.adam
 ```
 Outputs:
 ```
@@ -22,11 +22,11 @@ Outputs:
 105812 + 0 with mate mapped to a different chr (mapQ>=5)
 ```
 
-In practice, you'll find that the ADAM `flagstat` command takes orders of magnitude less
-time than samtools to compute these statistics. For example, on a MacBook Pro the command 
-above took 17 seconds to run while `samtools flagstat NA12878_chr20.bam` took 55 secs.
+In practice, you will find that the ADAM `flagstat` command takes orders of magnitude less
+time than samtools to compute these statistics. For example, on a MacBook Pro, the command 
+above took 17 seconds to run while `samtools flagstat NA12878_chr20.bam` took 55 seconds.
 On larger files, the difference in speed is even more dramatic. ADAM is faster because
-it's multi-threaded and distributed and uses a columnar storage format (with a projected
+it is multi-threaded, distributed and uses a columnar storage format (with a projected
 schema that only materializes the read flags instead of the whole read). 
 
 ## Running on a cluster