Update usage docs running for EC2 and CDH #493

Closed
arahuja opened this issue Nov 19, 2014 · 13 comments

Comments

@arahuja
Contributor

arahuja commented Nov 19, 2014

  • Update to use adam-submit
  • Update CDH docs to have YARN and standalone information
@jpdna
Member

jpdna commented Oct 7, 2016

Regarding EC2, the existing (out-of-date) documentation is here:
https://github.com/bigdatagenomics/adam/blob/master/docs/source/40_running_on_EC2.md

To confirm: do we want the updated ADAM EC2 quickstart to deploy Spark via the spark-ec2 scripts, as described here?
http://spark.apache.org/docs/1.6.1/ec2-scripts.html

Or do we want to provide cgcloud/Toil or other instructions?

@jpdna
Member

jpdna commented Oct 7, 2016

As of Spark 1.6.1, Spark's spark-ec2 script doesn't work for us without modification, because it uses Java 7 and we need Java 8.

I still feel that cgcloud/Toil is a bit heavyweight for some users. Perhaps we should modify the spark-ec2 script, or write an equivalent, that can be used for simple ADAM deployment on EC2.

But if you think we should only document a cgcloud/Toil path for ADAM on EC2, that is fine too.
What do you think, @fnothaft and @heuermh?

@fnothaft
Member

fnothaft commented Oct 7, 2016

> So Spark's spark-ec2 script as of Spark 1.6.1 doesn't work for us, unless we modify it, due to it using java 7 and we need java 8.

Officially, the spark-ec2 script is beta/unsupported, so we shouldn't use it anyway.

> I still feel like cgcloud/toil is a bit heavy weight for some users, and perhaps we should modify spark-ec2 script, or make an equivalent, that can be used for simple ADAM deployment on EC2.

I would go with cgcloud-spark, which is fairly battle-tested and well supported by Hannes and co. I would definitely not modify the spark-ec2 script or roll our own. There's a non-trivial amount of work needed to get all of this working, and you run into a shocking number of bugs and weird edge cases.

If time allowed, I would also document ADAM on Elastic Spark, but I would not plan on that as a primary documented route, since the cost model for Elastic Spark is different. If we are interested in that, I can connect you to some folks at AWS.

@jpdna
Member

jpdna commented Oct 7, 2016

It looks like cgcloud-spark is on Spark 1.5.2 and Java 7.

We should have Java 8 and Spark 1.6.1 to match the next release and current master.

Should we talk with Hannes about updating spark_box?

Also, what is the interaction here with the Joda-Time stuff?

@fnothaft
Member

fnothaft commented Oct 7, 2016

I'm OK with staying on Spark 1.5.2 or moving to 1.6.1; it's a one-line change. I'll open a PR to move to Java 8.

> Also - what is the interaction here with the joda time stuff?

Conductor is impacted by the Joda-Time/Java 8 mismatch, but ADAM isn't. There's a WAR at BD2KGenomics/cgl-docker-lib#187.

@fnothaft
Member

fnothaft commented Oct 7, 2016

Opened BD2KGenomics/cgcloud#231.

@jpdna
Member

jpdna commented Oct 13, 2016

Should I go ahead and try out this branch?
https://github.com/fnothaft/cgcloud/tree/issues/231-spark-162-java8

@fnothaft
Member

@jpdna yes, that'd be great!

@jpdna
Member

jpdna commented Oct 16, 2016

BD2KGenomics/cgcloud#234 has me stuck. It must be something dumb I'm missing, as this has worked for me several times in the past, including last week on a different machine. I hit the same problem with the current cgcloud installed via pip install cgcloud-core, so it's not due to using your branch, Frank. Let me know if you have any thoughts.

@jpdna
Member

jpdna commented Oct 16, 2016

> has me stuck,

I got unstuck: I needed a new IAM key.

@jpdna
Member

jpdna commented Nov 16, 2016

Blocked on this cgcloud error: BD2KGenomics/cgcloud#245

@fnothaft
Member

EC2 docs resolved by #1279. CDH docs are forthcoming. @jpdna do you plan to tackle the CDH docs? If not, I'll knock them out tomorrow.

fnothaft added a commit to fnothaft/adam that referenced this issue Dec 4, 2016
heuermh pushed a commit that referenced this issue Dec 6, 2016
@jpdna
Member

jpdna commented Dec 6, 2016

> CDH docs

I'm not sure what the CDH docs should include.
AFAIK, if you have CDH installed and download an ADAM distribution, it should just work.
I'd probably show an example of pointing at yarn-client:

adam-submit --master yarn-client --num-executors 10 --executor-cores 2 --executor-memory 20g -- transform hdfs://hdfsmaster/data/input/file1.sam hdfs://hdfsmaster/data/output/file1.adam

If you have an idea of what the docs should include and it's faster for you to write them, go ahead; otherwise, point me in the right direction and I'll look at it today.
