Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rough spike at adding Java driver #15

Merged
merged 6 commits into from
Sep 3, 2015
Merged

Rough spike at adding Java driver #15

merged 6 commits into from
Sep 3, 2015

Conversation

catsby
Copy link
Contributor

@catsby catsby commented Sep 1, 2015

⚠️ 🚧
This is a spike for adding a java driver, copied-paste and butchered from the standard exec driver.

It's pretty simple and naive. I submit now for feedback on the approach, or to catch any major flaws in my simple thinking

Does:

  • accepts a jar file from a remote location (http(s)://)
  • downloads into the AllocDir
  • executes with java -jar <file> [args]

Assumes:

  • publicly accessible jar file for download
  • file is http(s) (not ftp, ssh)

The jar file referenced should be public to anyone (at time of writing). It's a simple app the outputs "Hi" every few seconds. The source can be found here:

TODOs

  • Abstract out the http fetching into a Fetcher interface of sorts
  • Test with a batch-like Java app, that does not run indefinitely, to confirm exits etc
  • Probably several other things

At this point, I need feedback on what I'm doing right/wrong 😐


func (d *JavaDriver) Fingerprint(cfg *config.Config, node *structs.Node) (bool, error) {
// We can always do a fork/exec
node.Attributes["driver.java"] = "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can test for this with java -version. I think it's also useful to capture the version name. This might be

java version "1.8.0_51"
openjdk version "1.8.0_45-internal"

And whether or not it's oracle java, which I think we can detect from Java(TM) or Hotspot(TM) in the version string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derp, totally glossed over this 😱

I updated it in a337df4

select {
case <-h.doneCh:
return nil
case <-time.After(5 * time.Second):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can fix this in this PR since we don't really have job configs yet (or do we?), but I think this should be configurable to give the app a chance to do cleanup or checkpointing. Curious: How would we pass configuration into this block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but I also think we should be firm here.
I think burden of responding to interrupt and cleaning up should fall on the developer, and I think 5 seconds should be ok, but maybe 10 seconds is more reasonable (more Heroku-isms coming out of me 😄 )

If not, I don't see where else this leads to except for us accepting a "shut down" command string that we exec, and then we're exec'ing arbitrary strings 😱

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a chance to do cleanup or checkpointing

Checkpointing for sure. Cleanup though seems like it's a thing happening implicitly with the Driver. For Java, we're using the java command and all that's in a VM. Same with Docker, right?

Unless you're talking about cleanup as in "get things in order before we go away". In which case, I think 10 seconds is still good..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you're talking about cleanup as in "get things in order before we go away". In which case, I think 10 seconds is still good..

Yes exactly. Sorry about the confusing wording. EC2 gives you a 120 second warning. I know that there are use-cases to extend this to a much longer duration (1 hour, for example). Also there are probably different thresholds appropriate for different scenarios:

  • Out of resources, so pre-empting something (terminate immediately)
  • Upgrading (wait for currently running app to finish doing its work)

For example, at Riot we might have a game server that needs to be taken down for upgrade or hotfix, but we want to let the current games finish (~1 hour) without booting people out. In the mean time the game server will receive the sigint and will simply not accept new games.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think there are a bunch of future enhancements possible. I don't think allowing an arbitrary drain time will work, since it blocks the ability of the scheduler to make progress. It also can cause priority inversion, if a low priority job refuses to drain and a high priority job cannot be scheduled. Probably 5 second max is fine for now, we can address in 0.2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can address in 0.2

👍

@armon
Copy link
Member

armon commented Sep 3, 2015

@catsby Looking good! I agree with the TODOs, we can probably abstract out some of the helper things like downloading, waiting for PID, etc.

Where the rubber hits the road on a lot of these will be the isolation mechanisms. Those I think we want to put in helper libs and share as much as possible.

Chroot to restrict disk, various controls for CPU etc, and those will improve over time.

@catsby catsby mentioned this pull request Sep 3, 2015
@catsby catsby added this to the v0.1 milestone Sep 3, 2015
@catsby catsby added the Driver label Sep 3, 2015
@cbednarski
Copy link
Contributor

This looks ready to me! We can iterate details but let's mainline this so we can do x-plat testing and such. 👍

catsby added a commit that referenced this pull request Sep 3, 2015
Rough spike at adding Java driver
@catsby catsby merged commit ff0383b into master Sep 3, 2015
@catsby catsby deleted the b-java-driver branch September 3, 2015 19:31
grembo pushed a commit to grembo/nomad that referenced this pull request Jan 30, 2022
Improve logging on pot create fail and pot destroy failure
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants