CGCloud deploy docs #1279

Merged: 1 commit, Nov 18, 2016

@jpdna (Member) commented Nov 17, 2016

No description provided.

@jpdna changed the title from "CGCloud deply docs" to "CGCloud deploy docs" on Nov 17, 2016

@fnothaft (Member) left a comment

Couple of small nits, otherwise looks great! Thanks @jpdna!


#### Launch a cluster

Spin up a Spark cluster with one master and two slave nodes with the command:
Member

Prefer leader/worker to master/slave.

Also, I would note in the documents that you're setting up a cluster where the workers are m3.large. Somewhat obvious, I concede, but it's useful to note that you can set a different leader node type. Also, doesn't this command need you to provide a cluster name?

export MY_KEYFILE="?????.pem"
export MY_CLUSTER_NAME="adam_cluster"
export MY_CLUSTER_SIZE=10
[CGCloud](https://github.com/BD2KGenomics/cgcloud) lets you automate the creation, management and provisioning of VMs and clusters of VMs in Amazon EC2.
Member

Can you wrap lines at 80 characters throughout?
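
A quick way to find lines over the limit before pushing, as a minimal sketch. The function name `long_lines` and the throwaway demo file are illustrative and not part of the ADAM repository; note that `awk` may count bytes rather than characters, depending on the implementation and locale.

```shell
# Print every line longer than <max_width> as "file:line: N characters".
# Usage: long_lines <max_width> <file>...
long_lines() {
    max_width="$1"
    shift
    awk -v max="$max_width" '
        length($0) > max {
            printf "%s:%d: %d characters\n", FILENAME, FNR, length($0)
        }
    ' "$@"
}

# Demo on a temporary file: one short line and one 95-character line.
demo_file="$(mktemp)"
{
    echo "a short line"
    awk 'BEGIN { for (i = 0; i < 95; i++) printf "x"; print "" }'
} > "$demo_file"
long_lines 80 "$demo_file"   # reports only the second line
rm -f "$demo_file"
```

Pointing the function at the Markdown file under review lists each line that still needs wrapping; a line of exactly 80 characters is not reported.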

```
cgcloud ssh spark-master
```

Member

Nit: extra whitespace.

Export the path to your `spark-ec2` script,
To use the ADAM application on top of Spark, we need to download and install ADAM on `spark-master`
From the command line on `spark-master` download a release from:
https://github.com/bigdatagenomics/adam/releases
Member

Nit: missing period at EOL.

alias spark_ec2_login="$SPARK_EC2_SCRIPT -k $MY_KEYPAIR -i $MY_KEYFILE login $MY_CLUSTER_NAME"
The typical flow of data to and from your ADAM application on EC2 will be:
- Upload data to AWS S3
- Use Conductor (described below) or otherwise transfer from S3 to the HDFS on your cluster
Member

Can you add an anchor link `{#conductor}` in the section where Conductor is described, and link to it from here: `(described below)` -> `[(described below)](#conductor)`? This'll make navigation a bit easier.

To transfer large amounts of data back and forth from S3, we suggest using [Conductor](https://github.com/BD2KGenomics/conductor).

Its also possible to directly use AWS S3 as a distributed file system, but with some loss of performance.
( example to be added )
Member

Nit: I might drop the "( example to be added )" bit and remove the paragraph break between this paragraph and the Conductor paragraph.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1622/

@jpdna force-pushed the cgcloud_doc branch 4 times, most recently from e8bed96 to 4d4ab71, on November 17, 2016 20:43

@jpdna (Member, Author) commented Nov 17, 2016

ready for further review or merge

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1623/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1624/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1625/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1626/

@fnothaft (Member) left a comment

Few small nits, otherwise LGTM!

alias spark_ec2_destroy="$SPARK_EC2_SCRIPT destroy $MY_CLUSTER_NAME"
alias spark_ec2_login="$SPARK_EC2_SCRIPT -k $MY_KEYPAIR -i $MY_KEYFILE login $MY_CLUSTER_NAME"
Spin up a Spark cluster named `cluster1` with one leader and two workers nodes
of instance type `m3.large`with the command:
Member

Nit: missing space between `m3.large` and "with".

#### Install ADAM

To use the ADAM application on top of Spark, we need to download and install
ADAM on `spark-master`
Member

period at EOL

To use the ADAM application on top of Spark, we need to download and install
ADAM on `spark-master`
From the command line on `spark-master` download a release from:
https://github.com/bigdatagenomics/adam/releases
Member

Punctuation at EOL? Maybe remove paragraph break.

As of this writing, CGCloud supports Spark 1.6.2, not Spark 2.x, so download
the Spark 1.x Scala2.10 release:
```
wget https://repo1.maven.org/maven2/org/bdgenomics/adam/\
```

Member

I would remove the `\`-escaped line break here.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1627/

cgcloud doc edits

edits to cgcloud docs

more cgcloud edits

more cgcloud docs edits

more cgcloud docs edits

edit cgcloud docs

more cgcloud doc edits

@jpdna (Member, Author) commented Nov 17, 2016

ready again for more review or merge

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1628/

@fnothaft merged commit 20a0eb2 into bigdatagenomics:master on Nov 18, 2016

@fnothaft (Member) commented

Merged! Thanks @jpdna!
