CGCloud deploy docs #1279
Couple of small nits, otherwise looks great! Thanks @jpdna!
#### Launch a cluster

Spin up a Spark cluster with one master and two slave nodes with the command:
Prefer leader/worker to master/slave.
Also, I would note in the documents that you're setting up a cluster where the workers are `m3.large`. Somewhat obvious, I concede, but it's useful to note that you can set a different leader node type. Also, doesn't this command need you to provide a cluster name?
export MY_KEYFILE="?????.pem"
export MY_CLUSTER_NAME="adam_cluster"
export MY_CLUSTER_SIZE=10
[CGCloud](https://github.com/BD2KGenomics/cgcloud) lets you automate the creation, management and provisioning of VMs and clusters of VMs in Amazon EC2.
Can you wrap lines at 80 characters throughout?
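One way to check a wrapping request like this before pushing is to list the lines that exceed the limit. A minimal sketch, assuming a POSIX shell with `awk`; the function name `long_lines` and the sample file are hypothetical and not part of this PR:

```shell
# Print "line_number: length" for every line longer than the limit (default 80).
long_lines() {
  awk -v max="${2:-80}" 'length($0) > max { print NR ": " length($0) }' "$1"
}

# Build a small sample file: one short line and one 100-character line.
sample="$(mktemp)"
printf 'short line\n' > "$sample"
printf '%0100d\n' 0 >> "$sample"

# Only the second line exceeds 80 characters.
long_lines "$sample"   # prints: 2: 100
```

Lines inside code fences and long URLs will also be reported; those usually cannot be wrapped and can be ignored.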
```
cgcloud ssh spark-master
```
Nit: extra whitespace.
Export the path to your `spark-ec2` script,
To use the ADAM application on top of Spark, we need to download and install ADAM on `spark-master`
From the command line on `spark-master` download a release from:
https://github.com/bigdatagenomics/adam/releases
Nit: missing period at EOL.
alias spark_ec2_login="$SPARK_EC2_SCRIPT -k $MY_KEYPAIR -i $MY_KEYFILE login $MY_CLUSTER_NAME"
The typical flow of data to and from your ADAM application on EC2 will be:
- Upload data to AWS S3
- Use Conductor (described below) or otherwise transfer from S3 to the HDFS on your cluster
Can you add an anchor link `{#conductor}` in the section where conductor is described, and link from here: `(described below)` -> `[(described below)](#conductor)`. This'll make navigation a bit easier.
To transfer large amounts of data back and forth from S3, we suggest using [Conductor](https://github.com/BD2KGenomics/conductor).

Its also possible to directly use AWS S3 as a distributed file system, but with some loss of performance.
( example to be added )
Nit: I might drop the `example to be added` bit and remove the paragraph break between this paragraph and the conductor paragraph.
Test PASSed.
e8bed96 to 4d4ab71
ready for further review or merge
Few small nits, otherwise LGTM!
alias spark_ec2_destroy="$SPARK_EC2_SCRIPT destroy $MY_CLUSTER_NAME"
alias spark_ec2_login="$SPARK_EC2_SCRIPT -k $MY_KEYPAIR -i $MY_KEYFILE login $MY_CLUSTER_NAME"
Spin up a Spark cluster named `cluster1` with one leader and two workers nodes
of instance type `m3.large`with the command:
Space between words in `m3.large`with
#### Install ADAM

To use the ADAM application on top of Spark, we need to download and install
ADAM on `spark-master`
period at EOL
To use the ADAM application on top of Spark, we need to download and install
ADAM on `spark-master`
From the command line on `spark-master` download a release from:
https://github.com/bigdatagenomics/adam/releases
Punctuation at EOL? Maybe remove paragraph break.
As of this writing, CGCloud supports Spark 1.6.2, not Spark 2.x, so download
the Spark 1.x Scala2.10 release:
```
wget https://repo1.maven.org/maven2/org/bdgenomics/adam/\
```
I would remove the `\`ed linebreak here.
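For context: a backslash at the end of a line continues a shell command onto the next line, and the shell removes the backslash-newline pair. A URL split this way is rejoined before the command sees it, provided the continuation line has no leading whitespace. A minimal sketch using a placeholder URL (hypothetical, not the ADAM download path):

```shell
# Backslash-newline is removed by the shell, so the two pieces join into one word.
split_url=https://example.org/releases/\
tool.tar.gz

# The same URL written on a single line.
single_url=https://example.org/releases/tool.tar.gz

# Both variables hold the same string.
[ "$split_url" = "$single_url" ] && echo "identical"
```

The single-line form is easier to copy from rendered docs and cannot break if a reader indents the continuation line.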
cgcloud doc edits; edits to cgcloud docs; more cgcloud edits; more cgcloud docs edits; more cgcloud docs edits; edit cgcloud docs; more cgcloud doc edits
ready again for more review or merge
Merged! Thanks @jpdna!