-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rough spike at adding Java driver #15
Conversation
|
||
func (d *JavaDriver) Fingerprint(cfg *config.Config, node *structs.Node) (bool, error) { | ||
// We can always do a fork/exec | ||
node.Attributes["driver.java"] = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can test for this with java -version
. I think it's also useful to capture the version name. This might be
java version "1.8.0_51"
openjdk version "1.8.0_45-internal"
And whether or not it's oracle java, which I think we can detect from Java(TM)
or Hotspot(TM)
in the version string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derp, totally glossed over this 😱
I updated it in a337df4
select { | ||
case <-h.doneCh: | ||
return nil | ||
case <-time.After(5 * time.Second): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can fix this in this PR since we don't really have job configs yet (or do we?), but I think this should be configurable to give the app a chance to do cleanup or checkpointing. Curious: How would we pass configuration into this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, but I also think we should be firm here.
I think burden of responding to interrupt and cleaning up should fall on the developer, and I think 5 seconds should be ok, but maybe 10 seconds is more reasonable (more Heroku-isms coming out of me 😄 )
If not, I don't see where else this leads to except for us accepting a "shut down" command string that we exec, and then we're exec'ing arbitrary strings 😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a chance to do cleanup or checkpointing
Checkpointing for sure. Cleanup though seems like it's a thing happening implicitly with the Driver. For Java, we're using the java
command and all that's in a VM. Same with Docker, right?
Unless you're talking about cleanup as in "get things in order before we go away". In which case, I think 10 seconds is still good..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you're talking about cleanup as in "get things in order before we go away". In which case, I think 10 seconds is still good..
Yes exactly. Sorry about the confusing wording. EC2 gives you a 120 second warning. I know that there are use-cases to extend this to a much longer duration (1 hour, for example). Also there are probably different thresholds appropriate for different scenarios:
- Out of resources, so pre-empting something (terminate immediately)
- Upgrading (wait for currently running app to finish doing its work)
For example, at Riot we might have a game server that needs to be taken down for upgrade or hotfix, but we want to let the current games finish (~1 hour) without booting people out. In the mean time the game server will receive the sigint and will simply not accept new games.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think there are a bunch of future enhancements possible. I don't think allowing an arbitrary drain time will work, since it blocks the ability of the scheduler to make progress. It also can cause priority inversion, if a low priority job refuses to drain and a high priority job cannot be scheduled. Probably 5 second max is fine for now, we can address in 0.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can address in 0.2
👍
@catsby Looking good! I agree with the TODOs, we can probably abstract out some of the helper things like downloading, waiting for PID, etc. Where the rubber hits the road on a lot of these will be the isolation mechanisms. Those I think we want to put in helper libs and share as much as possible. Chroot to restrict disk, various controls for CPU etc, and those will improve over time. |
This looks ready to me! We can iterate details but let's mainline this so we can do x-plat testing and such. 👍 |
Rough spike at adding Java driver
Improve logging on pot create fail and pot destroy failure
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
This is a spike for adding a
java
driver, copied-paste and butchered from the standardexec
driver.It's pretty simple and naive. I submit now for feedback on the approach, or to catch any major flaws in my simple thinking
Does:
http(s)://
)AllocDir
java -jar <file> [args]
Assumes:
http(s)
(notftp
,ssh
)The
jar
file referenced should be public to anyone (at time of writing). It's a simple app the outputs "Hi" every few seconds. The source can be found here:TODOs
Fetcher
interface of sortsAt this point, I need feedback on what I'm doing right/wrong 😐