Jk/in parallel #3

Closed
wants to merge 7 commits into from

jkeiser
Contributor

@jkeiser jkeiser commented Feb 3, 2014

No description provided.

@sethvargo

I prefer group to in_parallel with an optional hash of params:

group(parallel: true, threads: 10) do
  group(serial: true) do
  end
end

Also, is there value in this approach, or does it make more sense to have async as a top-level field on resources? Something like:

remote_file '/var/foo' do
  source '...'
  async true
  notifies :restart, 'service[bar]'
end

I think people are also more familiar with the word "async".

@jkeiser
Contributor Author

jkeiser commented Feb 3, 2014

I like the group syntax.

async true in resources could work as long as Chef is collecting groups of consecutive async resources, but group provides a much more explicit beginning and end to the collection.

@sethvargo

@jkeiser they are different use cases (I think):

template 'foo'

group(parallel: true) do
  thing_1
  thing_2
end

template 'bar'

In my mind, this says, create the foo template, run (thing_1 and thing_2) in parallel, create the bar template.

Whereas:

template 'foo'

thing_1 do
  async true
end

thing_2 do
  async true
end

template 'bar'

Says, create the foo template, schedule thing_1 at your earliest convenience, schedule thing_2 at your earliest convenience, create the bar template.
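For readers following along, the "parallel group" reading above can be modeled in plain Ruby. This is only a sketch under stated assumptions: `run_group` is a hypothetical helper, each resource is stood in for by a lambda, and nothing here is Chef's actual implementation. It omits the `threads:` cap from the earlier example and simply starts one thread per member.

```ruby
# Hypothetical stand-in for the proposed group(parallel: true) block.
# Not Chef code: each "resource" is just a callable.
def run_group(resources, parallel: false)
  if parallel
    # Start every member, then block until all of them have finished.
    resources.map { |r| Thread.new { r.call } }.map(&:value)
  else
    # Serial group: members run one after another, in declaration order.
    resources.map(&:call)
  end
end

events = Queue.new # thread-safe log of what ran, in completion order

events << "template[foo]"
run_group([
  -> { events << "thing_1" },
  -> { events << "thing_2" }
], parallel: true)
events << "template[bar]"

order = Array.new(events.size) { events.pop }
```

The point of the sketch is the barrier: thing_1 and thing_2 may finish in either order, but both always land between the two templates.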

@jkeiser
Contributor Author

jkeiser commented Feb 4, 2014

So we're clear, under an async true regime, this would never run any of the resources in parallel, correct?

template 'foo'

thing_1 do
  async true
end

template 'bar'

thing_2 do
  async true
end

@sethvargo

Depends on what you mean by "parallel". If you mean "at the same exact time", no. If you mean "in a non-blocking thread", yes.

@sethvargo

It makes more sense with remote_file:

remote_file '/var/file' do
  source 'really-really-big-file'
  async true
  notifies :restart, 'service[foo]', :immediately
end

package 'foo'
package 'bar'

This will queue the remote file for download, but it won't wait for the download to complete. That runs in a background thread and then executes notifications upon completion.
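A rough model of that behavior in plain Ruby, assuming a hypothetical `run_async` helper and lambdas in place of real resources (this is not Chef code, and the sleep merely simulates a slow download):

```ruby
# Hypothetical sketch of `async true` with a notification on completion.
def run_async(action, on_complete:)
  Thread.new do
    result = action.call
    on_complete.call(result) # fires only after the action has finished
    result
  end
end

events = Queue.new # thread-safe log of what happened, in order

download = run_async(
  -> { sleep 0.05; events << "remote_file downloaded"; :updated },
  on_complete: ->(_result) { events << "service[foo] restarted" }
)

# The main run carries on without waiting for the download.
events << "package[foo] installed"
events << "package[bar] installed"

# Assumption: the run must still wait for stragglers before it exits.
download.join
order = Array.new(events.size) { events.pop }
```

The only ordering this guarantees is that the restart follows the download; where the package installs fall relative to the download depends on timing.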

@jkeiser
Contributor Author

jkeiser commented Feb 4, 2014

OK, now I see what you mean by async. They are indeed different use cases. There is a use for parallel_group (sometimes you want to create your three database servers before you create any web servers), and I can see uses for async too. I'll leave async for a later time, but I think it's a great idea.

@aabes

aabes commented Apr 19, 2014

Jotting down a couple of the thoughts we chatted about during ChefConf, to see if any resonate:

a) support "m out of n" semantics. Sample use case: provision a cluster of 30 nodes, but accept a first go where only 20 are successful (and possibly converge in subsequent runs toward the desired 30).

b) allow for compensating activities. use case: if you provision 10 nodes, but wanted 30, define a block that will handle whatever partial successes have already been performed. [1]

(As a side note, these are inspired by BPEL for-each activities and compensation handlers. BPEL was created to handle coordination among asynchronously interacting web services. IMHO, while most of BPEL is not relevant here, these two concepts can be pretty useful.)

[1] Compensating transactions: https://en.wikipedia.org/wiki/Compensating_transaction
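To make (a) and (b) concrete, here is a small plain-Ruby sketch. Everything in it is an assumption for illustration: `provision_all`, the `minimum:` option, and the use of a block as the compensation handler are hypothetical names, not part of the RFC or of Chef.

```ruby
# Hypothetical "m out of n" runner with a compensation hook.
# The default requires every task to succeed.
def provision_all(tasks, minimum: tasks.size)
  threads = tasks.map do |task|
    Thread.new do
      begin
        [:ok, task.call]
      rescue StandardError => e
        [:failed, e]
      end
    end
  end
  outcomes = threads.map(&:value)

  succeeded = outcomes.select { |status, _| status == :ok }.map(&:last)
  failed = outcomes.select { |status, _| status == :failed }.map(&:last)

  if succeeded.size < minimum
    # Compensation: let the caller deal with whatever partially succeeded.
    yield succeeded if block_given?
    raise "only #{succeeded.size} of #{tasks.size} succeeded, needed #{minimum}"
  end

  { succeeded: succeeded, failed: failed }
end

# 30 simulated nodes, of which the last 10 fail to provision.
nodes = (1..30).map do |i|
  -> { raise "provisioning failed" if i > 20; "node-#{i}" }
end

result = provision_all(nodes, minimum: 20)

cleaned_up = []
begin
  provision_all(nodes) { |partial| cleaned_up = partial }
rescue RuntimeError => e
  strict_error = e.message
end
```

With `minimum: 20` the run is accepted despite ten failures; with the strict default the same tasks fail the group and the block receives the 20 nodes that would need compensating.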

@colindean

Per conversation, I'm 👍 on some kind of parallelization, because I'd really like my recipes to execute in parallel when possible.

@adamhjk
Contributor

adamhjk commented Jul 14, 2014

@jkeiser - this RFC still relevant? I think I'm +1 on both this and on async.

@jkeiser
Contributor Author

jkeiser commented Sep 1, 2014

@adamhjk yep, this is still totally relevant. As you are +1, I'll merge.

@aabes totally there on the enhancements. I think there are a number of potential failure conditions we can handle more gracefully. (Though I think the default needs to be 100% completion to succeed.)

@jkeiser
Contributor Author

jkeiser commented Sep 13, 2014

Just realized this needs to be in a community meeting to merge @adamhjk

@Tech356

Tech356 commented Sep 17, 2014

I was excited to find and read this RFC. Currently remote_file resources take up the bulk of the time for my runs and making them run in parallel would be a huge speedup.

The specific use case I thought about was using async on my remote_file resources so they would start downloading at the start of the run. When the chef-client gets to a remote_file resource in a recipe, it checks whether that resource has completed. If it has, the chef-client continues; if it has not, the chef-client halts and waits for the resource to complete.

This would parallelize all remote_file resources regardless of which recipe they are in and would not break dependencies on those resources.

From my understanding, the currently proposed version of async would start the resource when it comes to it in the recipe, and then the chef-client would continue on to the next resource. The only way to depend on the async resource would be to have a notifies, which would not work in my use cases since I need the dependent resource to run every time.

The group format looks useful but only for resources in the same recipe. Generally I only have one remote_file per recipe which would prevent me from taking advantage of the group format.

@jeremyolliver

@Tech356 That's my interpretation as well - however I think that's a necessary constraint. There are plenty of legitimate reasons you couldn't start immediately (perhaps download directory is created via recipe or package install partway through the chef run, or the user that will own the file) - you can't expect a set of assumptions to be figured out magically for you, you need to optimise explicitly.

I would have suggested running the remote_file during compile time to get them to trigger early, while still keeping the logic with relevant cookbooks, but how would running the remote_file resources at compile time work? Is parallelism or async still possible at that point?

@Tech356

Tech356 commented Sep 18, 2014

@jeremyolliver I agree that there are constraints that are necessary, but in my case the remote files only depend on node attributes which are resolved during compile.

I was thinking more along the lines of adding a couple of options to async to define how it performs. If the remote_files don't have dependencies (or all the dependencies are resolved at compile time) then one could add flags like:

# starts at convergence and then waits at the resource declaration for the resource to be complete
async true, :at_run_start, :wait_at_declaration

# starts at convergence and just runs with no blocking
async true, :at_run_start, :no_wait

# starts at declaration and waits till finished. Not async at all.
async true, :at_declaration, :wait_at_declaration

# starts at declaration and just runs with no blocking
async true, :at_declaration, :no_wait

The last one is like what is currently proposed. The first one is like what I am looking for. Maybe there are other options that might be useful to others.
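The first combination (start at run start, wait at declaration) behaves like a future. A minimal plain-Ruby sketch, assuming a hypothetical `EarlyStart` class and a sleep in place of a real download (none of this is Chef code):

```ruby
# Hypothetical model of :at_run_start combined with :wait_at_declaration.
class EarlyStart
  def initialize
    @pending = {}
  end

  # Called once, before the first resource converges.
  def start(name, &work)
    @pending[name] = Thread.new(&work)
  end

  # Called where the resource is declared; blocks until its work is done
  # and returns the result (re-raising any error the work raised).
  def wait_for(name)
    @pending.fetch(name).value
  end
end

run = EarlyStart.new
run.start("remote_file[/var/file]") { sleep 0.05; :downloaded }

steps = ["package[foo]"]                        # other resources converge meanwhile
steps << run.wait_for("remote_file[/var/file]") # declaration point: block here
steps << "service[foo]"                         # dependents can rely on the file
```

Because the wait happens at the declaration point, resources declared later can depend on the file on every run without needing a notifies.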

@thommay
Collaborator

thommay commented Mar 4, 2015

@adamhjk @jkeiser any reason we can't get this merged tomorrow?

@thommay
Collaborator

thommay commented Mar 9, 2015

👍

@jkeiser
Contributor Author

jkeiser commented Mar 9, 2015

I won't be here until next week :)

@nathenharvey
Contributor

This PR is currently on the agenda for our next IRC developers' meeting.

Please let me know if it gets merged or otherwise closed before then so that the agenda can be updated.

Thanks!

@nathenharvey
Contributor

@jkeiser can you please add the appropriate copyright notice to this RFC before our meeting tomorrow?

## Copyright

This work is in the public domain. In jurisdictions that do not allow for this,
this work is available under CC0. To the extent possible under law, the person
who associated CC0 with this work has waived all copyright and related or
neighboring rights to this work.

@lamont-granquist
Contributor

👍

@jonlives
Contributor

Once @jkeiser has added a clarification that "future things" will require a new RFC, this is approved for merge @chef/rfc-editors

@btm
Contributor

btm commented Mar 26, 2015

Squashed and merged in 5e13978. This was accepted as RFC044.

@btm btm closed this Mar 26, 2015
@btm btm deleted the jk/in_parallel branch March 26, 2015 23:44
jonlives pushed a commit that referenced this pull request Oct 6, 2017
# This is the 1st commit message:

This commit proposes an RFC to replace the existing RFC-075 (Multiple Policy Files and Teams)

Signed-off-by: Jon Cowie <jcowie@chef.io>

# This is the commit message #2:

More specification details added

Signed-off-by: Jon Cowie <jcowie@chef.io>

# This is the commit message #3:

Add more specification details and problems section

Signed-off-by: Jon Cowie <jonlives@gmail.com>

# This is the commit message #4:

Add path parameter to git source

Signed-off-by: Jon Cowie <jonlives@gmail.com>