This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

[WIP] Task Driver Plugin Support for Nomad 0.9.x+ #17

Open. Wants to merge 9 commits into base: master


justenwalker
Contributor

@justenwalker justenwalker commented Sep 3, 2019

Add support for TaskDriver plugin

Change Log:

  • Split out the command into separate command folders for stand-alone and plugin
  • Metrics refactor to allow performance counters to be read without needing prometheus
  • Refactor of the container/process API to make it compatible with the requirements of the Nomad TaskDriver plugin interface
  • Implement TaskDriver features
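
To illustrate the metrics refactor, here is a minimal sketch of how counter values could be kept in a plain in-memory store that any package can read, with a Prometheus exporter being only one possible consumer. The `Counters` type, its methods, and the counter names are hypothetical and are not taken from this PR's code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Counters is a hypothetical in-memory store holding the latest sample of
// each performance counter. It has no dependency on Prometheus.
type Counters struct {
	mu     sync.RWMutex
	values map[string]float64
}

// NewCounters returns an empty store.
func NewCounters() *Counters {
	return &Counters{values: make(map[string]float64)}
}

// Set records the most recent value for a named counter.
func (c *Counters) Set(name string, v float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[name] = v
}

// Snapshot returns a copy of all counters, so callers (for example a task
// driver reporting stats) cannot mutate the store's internal map.
func (c *Counters) Snapshot() map[string]float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make(map[string]float64, len(c.values))
	for k, v := range c.values {
		out[k] = v
	}
	return out
}

func main() {
	counters := NewCounters()
	counters.Set("cpu_percent", 12.5) // assumed counter name
	counters.Set("memory_bytes", 4096) // assumed counter name

	snap := counters.Snapshot()
	names := make([]string, 0, len(snap))
	for name := range snap {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	for _, name := range names {
		fmt.Printf("%s=%v\n", name, snap[name])
	}
}
```

Returning a copy from `Snapshot` keeps readers decoupled from the collector, which is one way to let several consumers query the same counters safely.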

Known Issues:

  • When a task is restarted, it will fail to connect to the Windows FIFO and the allocation will enter a failed state.
  • When the Nomad Agent is restarted, the task plugin will not load correctly. It's unclear why this is the case. I have exhausted most avenues of debugging within the plugin itself, so I might have to bake a custom Nomad agent with additional debug logging to see where it's getting hung up.

- Keep track of all the performance counters so they can be queried
later by other packages such as the Nomad Task Driver plugin
- Wait semantics are more manageable
- Containers are started by RunContained returning a running container
- Can send shutdown signal to container explicitly via function call
with timeout/delay before hard kill
- Can wait for a container to be done and return results
@justenwalker justenwalker self-assigned this Sep 3, 2019
@justenwalker justenwalker added the enhancement New feature or request label Sep 3, 2019
@justenwalker justenwalker changed the title Task Driver Plugin Support for Nomad 0.9.x+ [WIP] Task Driver Plugin Support for Nomad 0.9.x+ Oct 14, 2019
@ddreier
Contributor

ddreier commented Oct 20, 2019

@justenwalker you may find the second known issue related to this: hashicorp/go-plugin#125

We ran into it when developing our own Task Driver plugin in C# for IIS.

@justenwalker
Contributor Author

> @justenwalker you may find the second known issue related to this: hashicorp/go-plugin#125
>
> We ran into it when developing our own Task Driver plugin in C# for IIS.

Thanks for the tip! Looks like this change is at hashicorp/go-plugin@8091134

Unfortunately, this means the fix isn't in the 0.9.x line, and it didn't make it into v0.10.0-rc1 either.

However, it is in master now, so perhaps it will be in the GA version of v0.10.0 when the release gets cut. The plugin will likely not work correctly on 0.9.x unless this update gets back-ported and they release 0.9.7.

Still, this is a good place to start debugging, to see if the issue still persists on Nomad HEAD.

@ddreier
Contributor

ddreier commented Oct 22, 2019

Yeah, I should have mentioned that it was slated for Nomad 0.10.1.

We were only able to reproduce the issue when Nomad is not stopped cleanly. Not sure if you're experiencing different. For us, the infrequency of that happening in our environment makes the risk worth the reward. But this is a different scenario, so I understand being more cautious. :)

@justenwalker
Contributor Author

> We were only able to reproduce the issue when Nomad is not stopped cleanly. Not sure if you're experiencing different.

In my experience, it is unable to start the plugin regardless of how cleanly the plugin is shut down, and the failure persists across subsequent restarts until the local client state is wiped. That's a pretty serious issue.

2 participants