integrate Intel TBB #1376

wds15 · 2019-10-01T07:39:44Z

Summary

This PR integrates the Intel TBB with stan-math. Specifically, it ensures that the TBB threadpool is initialised and switches the C++11 threads implementation of map_rect over to tbb::parallel_for (which has scheduling and a more efficient threadpool).

In order to keep the Intel TBB optional in the next release, the TBB code is only used when STAN_THREADS is set. This makes the TBB mandatory only when threads are being used. The Intel TBB will become mandatory in the near future regardless of whether STAN_THREADS is used (which has licensing implications for GPL-2 projects using Stan-math).

The initialisation of the TBB happens in two places:

General initialisation of the maximal concurrency level in the TBB threadpool. This is handled by the new init_threadpool_tbb function defined in stan/math/prim/core/init_threadpool_tbb.hpp. The number of threads used is defined in the STAN_NUM_THREADS environment variable.
For all TBB worker threads, the AD tape is initialised through an observer which is registered with the Intel TBB. Every time a worker thread enters the threadpool, the observer is called and ensures proper initialisation of the AD tape resource automatically. See stan/math/rev/core/init_chainablestack_tbb.hpp.

Note: This PR branches off feature/intel-tbb-lib, so it contains the changes from that branch in addition to the changes introduced here. feature/intel-tbb-lib has been selected as the reference branch to make it easier to follow the actual changes relative to the base branch.

Performance Tests

All reported times are in seconds.

threads	Ubuntu i7 develop	Ubuntu i7 TBB	Ubuntu i7 speedup	Ubuntu AMD TBB	Ubuntu AMD develop	Ubuntu AMD speedup
1	166	156	0.060	166.768	148.131	-0.125
2	91	84	0.077	90.459	155.651	0.418
4	80	45	0.438	49.331	93.949	0.475
8	76	42	0.448	28.109	80.378	0.650
12				22.028	72.776	0.697
14				22.421	72.742	0.697
16				23.166	66.685	0.653
18				23.056	69.001	0.666
24				23.525	80.779	0.709
32				23.047	87.403	0.736

threads	Mac i9 develop	Mac i9 TBB	Mac i9 speedup
1	253	184	0.272
2	145	103	0.290
4	91	64	0.297
6	81	55	0.321

threads	Win i7 develop	Win i7 TBB	Win i7 speedup
1	452	464	-0.027
2	222	230	-0.036
4	148	132	0.108
6	146	116	0.205

Tests

test/unit/math/prim/core/init_threadpool_tbb_test.cpp tests initialisation of the threadpool
test/unit/math/prim/core/init_threadpool_tbb_late_test.cpp tests initialisation of the threadpool in case a C++ user has already initialized the scheduler (in which case nothing happens)
test/unit/math/rev/mat/functor/gradient_test.cpp tests if the threaded gradient code runs fine when using the Intel TBB as backend. This tests if the effects of AD tape initialisation have taken place.

Side Effects

Initialising the TBB threadpool is optional. That is, should client code not call the init_threadpool_tbb function, then the first execution of any TBB code will trigger default initialisation of the TBB threadpool, which is the default behaviour of the TBB. If the client code first initialises the TBB through explicit instantiation of the tbb::task_scheduler_init interface, then a subsequent call to init_threadpool_tbb has no effect - this is again the default behaviour of the TBB (only the very first call is honoured).

For interfaces it is recommended to call stan::math::init_threadpool_tbb() prior to using Stan-math. This will ensure the proper initialisation of the threadpool and ensure that no more than STAN_NUM_THREADS threads are running in the threadpool.

Todo

agreement on design
a few more tests for init_threadpool_tbb
agree on the fate of stan::math::get_num_threads (moved to stan/math/prim/scal/fun/get_num_threads.hpp), see comments within

Checklist

Math issue #(issue number)
Copyright holder: Sebastian Weber

The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
- unit tests pass (to run, use: ./runTests.py test/unit)
- header checks pass, (make test-headers)
- docs build, (make doxygen)
- code passes the built in C++ standards checks (make cpplint)
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested

…ure/intel-tbb-init

…stable/2017-11-14)

SteveBronder · 2019-10-01T09:41:51Z

stan/math/prim/scal.hpp

@@ -78,6 +79,7 @@
 #include <stan/math/prim/scal/fun/identity_free.hpp>
 #include <stan/math/prim/scal/fun/if_else.hpp>
 #include <stan/math/prim/scal/fun/inc_beta.hpp>
+#include <stan/math/prim/scal/fun/init_threadpool_tbb.hpp>


Is there a reason these sit in prim/scal/fun? Maybe we should have a backend folder where we put both TBB and OpenCL stuff

I am open to suggestions. Another option could be stan/math/core which is motivated by rev/core - but I am really open to where to stick this; I thought prim/scal/fun reflects the right dependency which is why I put it there.

I think core is fine. I imagine there won't be much "TBB core" code outside of this which would justify making another folder. Other TBB code is going to live either inside of Stan code or specific stan math functions, right?

Either way this is a review-time detail.

The map_rect which uses the TBB will, of course, stay where it is. So not much other TBB code (if any) will go into stan/math/core. It just needs to be included always.

SteveBronder · 2019-10-01T09:53:10Z

stan/math/prim/scal/fun/get_num_threads.hpp

+// 2. Consider to switch from std::thread::hardware_concurrency() to
+// tbb::task_scheduler_init::default_num_threads() once TBB is
+// mandatory.
+// 3. pull out of internal?


Would you need to use get_num_threads outside of this file? If so then yeah probs

In the future I can imagine that client code will want to find out about STAN_NUM_THREADS and at that point it would be good to have this in the stan::math namespace... but maybe that is not needed and we can keep it where it is.

SteveBronder · 2019-10-01T09:55:12Z

stan/math/prim/scal/fun/get_num_threads.hpp

+inline int get_num_threads(int num_jobs) {
+  int num_threads = 1;
+#ifdef STAN_THREADS
+  const char* env_stan_num_threads = std::getenv("STAN_NUM_THREADS");


Sidenote: We should start thinking about how to add the number of threads and the OpenCL devices so they are flags at runtime

Maybe... not sure. There are pros and cons for both ways (runtime environment vs interface configuration). I am continuing here the system we started for map_rect.

(moving all my comments up to here so that it's easier for others to read the code from the github page)

line 50: personally, especially with multiple if elses, I like having brackets to show leaving and entering a scope

From the docs for hardware_concurrency

Number of concurrent threads supported. If the value is not well defined or not computable, returns 0

Should we have a check here and if the returned value is zero return 1?

Also does this return the number of logical or physical cores? I think previously we saw that hyperthreading etc was not that useful

The get_num_threads function is right now used inside map_rect, but its use there will go away once we switch map_rect over to use the TBB.

When we switch over map_rect to the TBB, we can then include this function inside the init_threadpool_tbb.hpp file, sure. Then we should also remove the num_jobs argument which is really not needed. The get_num_threads should only parse the STAN_NUM_THREADS and that's it.

The value -1 means "use all cores". I would change this to then return the TBB default for that. As I recall that is without hyper-threading as I saw that last time.
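
To make the points above concrete, here is a minimal sketch of what a get_num_threads without the num_jobs argument could look like. It is not the Stan Math implementation: the names parse_num_threads and num_threads_from_env are made up, and the rules it applies (unset means 1 thread, -1 means all cores, a hardware_concurrency() result of 0 falls back to 1, everything else invalid throws std::invalid_argument) are assumptions pieced together from this thread.

```cpp
#include <cstddef>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper (not the Stan Math code): interpret the raw value of
// the STAN_NUM_THREADS environment variable.
//   nullptr        -> 1 thread (variable not set; assumed default)
//   "-1"           -> all cores, or 1 if the core count is not computable
//   "N" with N > 0 -> N threads
//   anything else  -> std::invalid_argument
inline int parse_num_threads(const char* raw) {
  if (raw == nullptr) {
    return 1;
  }
  const std::string text(raw);
  std::size_t consumed = 0;
  int value = 0;
  try {
    value = std::stoi(text, &consumed);
  } catch (const std::exception&) {
    throw std::invalid_argument("STAN_NUM_THREADS is not an integer: " + text);
  }
  if (consumed != text.size()) {
    throw std::invalid_argument("STAN_NUM_THREADS has trailing characters: "
                                + text);
  }
  if (value == -1) {
    // hardware_concurrency() returns 0 if the value is not well defined.
    const unsigned int hw = std::thread::hardware_concurrency();
    return hw == 0 ? 1 : static_cast<int>(hw);
  }
  if (value <= 0) {
    throw std::invalid_argument("STAN_NUM_THREADS must be positive or -1: "
                                + text);
  }
  return value;
}

// Reads the environment variable and delegates to the parser above.
inline int num_threads_from_env() {
  return parse_num_threads(std::getenv("STAN_NUM_THREADS"));
}
```

Splitting the parsing from the std::getenv call keeps the parser testable without touching the process environment.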

rok-cesnovar · 2019-10-01T10:04:59Z

stan/math/prim/scal/fun/get_num_threads.hpp

+ *
+ * @param num_jobs number of jobs
+ * @return number of threads to use
+ * @throws std::runtime_error if the value of STAN_NUM_THREADS env. variable


This is more of a final review-style comment than what you need right now, but this throws an std::invalid_argument.

oh, you are right... will fix.

rok-cesnovar · 2019-10-01T10:08:05Z

stan/math/prim/scal/fun/init_threadpool_tbb.hpp

+ * @throws std::runtime_error if the value of STAN_NUM_THREADS env. variable
+ * is invalid
+ */
+inline bool init_threadpool_tbb(tbb::stack_size_type stack_size = 0) {


What happens if someone calls this more than once? Should this throw, or return silently without changing anything? Is there a TBB API call you can make to check whether the threadpool was initialized?

Is that what is_active does? Should we return immediately if is_active is true?

Comment 1: First note that the tbb::task_scheduler_init is static. So you only ever create a single instance of this thing. Any further call is basically ignored. This is in line with the TBB way of working in this case. Repeated initialisation is not possible. You do it once and that's it - no more reconfiguration after that. If you want to have differently sized working areas then you can use so-called task_arenas.

Comment 2: Why should we return immediately? In case the static init object is active, then this one was used (and the parsing has worked), so we will just return true again if it was true before.
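
The "only the first call counts" behaviour follows from plain C++ function-local statics. A minimal sketch, using a made-up stand-in type instead of tbb::task_scheduler_init so that it compiles without the TBB (fake_scheduler_init and init_threadpool_sketch are hypothetical names, and the stand-in does not model is_active):

```cpp
// Stand-in for tbb::task_scheduler_init (hypothetical): it only records the
// requested size so that the one-time behaviour can be observed.
struct fake_scheduler_init {
  explicit fake_scheduler_init(int max_threads) : max_threads_(max_threads) {}
  int max_threads() const { return max_threads_; }

 private:
  int max_threads_;
};

// The function-local static is constructed on the first call only (and
// thread-safely since C++11). Later calls reuse the same object and their
// argument is ignored.
inline const fake_scheduler_init& init_threadpool_sketch(int max_threads) {
  static fake_scheduler_init scheduler(max_threads);
  return scheduler;
}
```

Calling init_threadpool_sketch(4) and then init_threadpool_sketch(8) leaves the stored size at 4, which mirrors why a repeated call cannot reconfigure the pool.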

stan/math/prim/scal/fun/init_threadpool_tbb.hpp

SteveBronder

Couple quick comments, I'll look over the tests more tonight or tomorrow

SteveBronder · 2019-10-01T10:23:21Z

stan/math/prim/scal/fun/init_threadpool_tbb.hpp

+ * @throws std::runtime_error if the value of STAN_NUM_THREADS env. variable
+ * is invalid
+ */
+inline bool init_threadpool_tbb(tbb::stack_size_type stack_size = 0) {


Just so I'm clear, stack_size sets the amount of memory each thread has at initialization, right? Is there a reason to say 0 as the default and not something like 64 just to give them a bit?

0 lets the TBB choose that initial size. We should leave the default here as is, I think.

Gah totally missed that in the docs my bad

SteveBronder · 2019-10-01T10:26:14Z

stan/math/prim/scal/fun/init_threadpool_tbb.hpp

+
+  static tbb::task_scheduler_init tbb_scheduler(tbb_max_threads, stack_size);
+
+  return tbb_scheduler.is_active();


From docs

By default, Intel TBB 2.2 automatically creates a task scheduler the first time that a thread uses task scheduling services and destroys it when the last such thread exits.

Does this function have a side effect somewhere that actually turns on some global scheduler?

The side effect is initialisation of the threadpool with the scheduler.

Would be good to add that as a @note in the doxygen

SteveBronder · 2019-10-01T10:27:10Z

stan/math/rev/core/init_chainablestack_tbb.hpp

+namespace stan {
+namespace math {
+
+static std::mutex cout_mutex;


For stuff like this I find it helpful to have a // FIXME(DELETE) somewhere just so I remember to delete this stuff

SteveBronder · 2019-10-01T10:29:11Z

stan/math/rev/core/init_chainablestack_tbb.hpp

+static std::mutex cout_mutex;
+
+// initialize AD tape for TBB threads
+struct ad_tape_observer : public tbb::task_scheduler_observer {


General docs would be nice. With this stuff especially I, Rok, pretty much any other reviewer will be reading a lot of these methods and schemes for the first time. So a little doc here is nice to lay down intent

Sure... I was in a hurry yesterday night... can add some URL refs to TBB doc, for example.

stan/math/rev/core/init_chainablestack_tbb.hpp

SteveBronder · 2019-10-01T10:43:02Z

stan/math/rev/core/init_chainablestack_tbb.hpp

+  }
+
+ private:
+  std::unordered_map<std::thread::id, ChainableStack*> ad_tape_;


You can use using to reuse this type in the code for a little nicer readability

Why not a vector pair? I've only really found unordered_map to actually be faster when you have a pretty large N

unordered_map is just a convenient way to have an index set which is indexed with std::thread::id keys. Speed is totally irrelevant here (this is called just once per created thread for the entire lifetime of the thread in the pool). So I want to use std::thread::id as a key.
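
For readers new to the observer idea, here is a rough sketch of a per-thread registry keyed by std::thread::id, including the using alias suggested above. It uses only the standard library: tape_stub stands in for ChainableStack, tape_registry is a made-up name, and the two hooks are called by hand here, whereas in the PR the TBB calls the observer whenever a worker thread enters the threadpool.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Stand-in for the per-thread AD tape (ChainableStack in the PR).
struct tape_stub {};

// Hypothetical registry, not the PR code: one resource per thread, looked up
// by std::thread::id.
class tape_registry {
 public:
  using tape_map
      = std::unordered_map<std::thread::id, std::unique_ptr<tape_stub>>;

  // To be called when a thread joins the pool: create its tape once.
  void on_scheduler_entry() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::unique_ptr<tape_stub>& slot = tapes_[std::this_thread::get_id()];
    if (!slot) {
      slot.reset(new tape_stub());
    }
  }

  // To be called when a thread leaves the pool: release its tape.
  void on_scheduler_exit() {
    std::lock_guard<std::mutex> lock(mutex_);
    tapes_.erase(std::this_thread::get_id());
  }

  // Number of threads that currently own a tape.
  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return tapes_.size();
  }

 private:
  mutable std::mutex mutex_;  // hooks may run concurrently on many threads
  tape_map tapes_;
};
```

The mutex is there because several worker threads may enter the pool at the same time; since each thread registers only once, the locking cost does not matter.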

test/unit/math/rev/mat/functor/gradient_test.cpp

rok-cesnovar · 2019-10-01T10:55:58Z

From a design perspective I agree with how this is done. I don't think you can make a cleaner solution here. There are minor review-time issues with braces, docs and such but I think Steve has already flagged most of those.

Initializing the threadpool is straightforward now with another layer and the details of get_num_threads hidden away. The open question there is what to do if someone calls the init threadpool more than once.

The chainable stack is also pretty straightforward, with the only question, I think, being whether unordered map is the way to go or not.

Can we write any sort of tests for the get_num_threads function? That is probably going to be tricky.

rok-cesnovar · 2019-10-01T11:00:26Z

How much additional work is changing map_rect going to be? I am starting to think this is so close we need to merge the Cmdstan and Stan Math TBB PRs, the former does still need some work. Otherwise reviewing and preparing all these branches is going to be a huge pain for everyone.

For the sake of complying with our agreement made at the Stan Math meeting, I am stating here that I will take care of reverting all PRs if we, for some reason, fail to meet the demands we set.

wds15 · 2019-10-01T11:36:22Z

There are three cases which will lead to no effect for a call to init_threadpool_tbb:

Initializing with tbb::task_scheduler_init the TBB threadpool before calling init_threadpool_tbb. In that case the return will be false for the active state.
Calling init_threadpool_tbb repeatedly will lead to initialising the threadpool on the first time. Then the active state returned will always be true. Any call after the first will not change anything.
If you use any TBB code which leads to tasks being dispatched (so any access to scheduling facilities), then the TBB code will on the spot initialise its threadpool using defaults (use all cores). In that case calling init_threadpool_tbb will have no effect and the active state is false.

This is the default behaviour of the TBB. I don't see why we should deviate from that. I can add some URLs to the TBB doc if that helps?

get_num_threads is already well tested in test/unit/math/prim/mat/functor/num_threads_test.cpp... those need to be adapted when we take away the num_jobs argument.

For the observer: I want to use std::thread::id as a key and as such I have to use an unordered_map... I don't see another convenient way of doing that.

I can certainly add the map_rect code here. That is straightforward. I just thought it would be better to do it in a follow-up PR, no? Either way is fine with me.

I will address these things later tonight. Let me know if my responses don't make sense / you think differently / I misunderstood.

With this PR I would also like to update the wiki page which talks about threading.

SteveBronder · 2019-10-01T11:37:17Z

Also it looks like the design here differs from the design doc's ScopedChainableStack?

https://github.com/wds15/design-docs/blob/parallel_autodiff/designs/0003-parallel_autodiff.md#step-2-independent-ad

wds15 · 2019-10-01T11:38:34Z

What do you think differs? I am relatively sure that things are consistent with one another (maybe not the same... the parallel design RFC is months old).

rok-cesnovar · 2019-10-01T14:42:21Z

There are three cases which will lead to no effect for a call to init_threadpool_tbb:
This is the default behaviour of the TBB. I don't see why we should deviate from that.

I just wanted to double check what would happen. A line in the doc stating something like "calling this after the threadpool has already been initialized will have no effect" would suffice in my opinion.

get_num_threads is already well tested in test/unit/math/prim/mat/functor/num_threads_test.cpp... those need to be adapted when we take away the num_jobs argument.

👍

I can certainly add the map_rect code here. That is straightforward. I just thought it would be better to do it in a follow-up PR, no? Either way is fine with me.

I would do it here. If you say the map_rect changes are small, I think adding them here makes sense. The only non-trivial thing here is the observer code.
I would then run the warfarin testbench we did for TLSv* (I can run it on Ubuntu and Windows) to double check everything is as expected. Would that work for you?

wds15 · 2019-10-01T15:00:43Z

Adding the map_rect things here is easy. Ok. Then I suggest to do:

address things mentioned (including moving init things to stan/math/core)
add the map_rect with the TBB
once that is merged we do in another PR moving map_rect back into prim where it belongs. If you want me to do that in this PR, then you will see a few files being moved around... what do you prefer? This PR for this moving or in another? There are no more references to rev-things in the map_rect using the TBB when we change.

Performance testing? Ok... if you think it's needed.

Cool! This thing is landing!

rok-cesnovar · 2019-10-01T15:30:50Z

once that is merged we do in another PR moving map_rect back into prim

Yes, do that separately.

Performance testing? Ok... if you think it's needed.

Yes, I think it would be great to have them here, this is the first thing with TBB. We have the scripts ready and it should not take that much time. I would take it more as an additional advertisement for TBB in Stan. I will prepare the scripts, I think that is reasonable as I am the one requesting the tests and to take some burden off you.

Cool! This thing is landing!

🚀

…math into feature/intel-tbb-init

stan-buildbot · 2019-10-08T17:35:27Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.96)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.93)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.01)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.02)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.95)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.98)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.07)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.9)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.98147692471
Commit hash: 2479cce900cd81e521a74808da13b051bd996495

stan-buildbot · 2019-10-08T18:02:26Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.95)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.98)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.02)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.02)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.05)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.01)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.99428447690
Commit hash: 81cbff2

stan-buildbot · 2019-10-08T19:23:59Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.92)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.94)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.02)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.08)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.0)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.98)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.01)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.94)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.98153594776
Commit hash: 395372d47f28cee7f4937a8c8d5585907a7a015b

stan-buildbot · 2019-10-08T21:47:33Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 1.0)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.01)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.02)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.02)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.87)
Result: 0.99192952134
Commit hash: 3d337877304b04edbe7b8e07ab1d425b49b870ae

stan-buildbot · 2019-10-08T22:15:05Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.04)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.02)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.02)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.84)
Result: 0.99333679134
Commit hash: 3d337877304b04edbe7b8e07ab1d425b49b870ae

stan-buildbot · 2019-10-08T22:28:33Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.0)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.03)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.01)
(performance.compilation, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.96)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.98738357614
Commit hash: 3d337877304b04edbe7b8e07ab1d425b49b870ae

stan-buildbot · 2019-10-09T03:10:11Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.98)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.97)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.05)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.03)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.02)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.96)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.84)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.98)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.97)
Result: 0.98710218725
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

stan-buildbot · 2019-10-09T05:29:09Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.97)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.97)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.93)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.95)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.3)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.07)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.02)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.02)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.14)
Result: 1.02407227782
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

stan-buildbot · 2019-10-09T06:04:45Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.77)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.01)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.86)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.6)
(performance.compilation, 1.17)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.11)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.08)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.2)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.06)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.12)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.98)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.2)
Result: 1.07446600559
Commit hash: 0c6f6c89ae4e3855eacbd2f02905881c2a49e9d6

SteveBronder · 2019-10-09T07:27:13Z

Why does this keep going off?

rok-cesnovar · 2019-10-09T07:28:20Z

We are running a cmdstan PR with a submodule reference to this PR. And that also sends the cmdstan perf results here.

stan-buildbot · 2019-10-09T08:28:15Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.03)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.98)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.01)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.01)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.02)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.04)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.92)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.99)
Result: 0.99942056911
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

rok-cesnovar · 2019-10-09T09:14:00Z

I am restarting this again. It got stuck in the Stan upstream part with

Cannot contact EC2 (Windows) - Build, Docker, WSL (i-0f525173fa0eaaf67): hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on EC2 (Windows) - Build, Docker, WSL (i-0f525173fa0eaaf67) failed. The channel is closing down or has closed down

Could not connect to EC2 (Windows) - Build, Docker, WSL (i-0f525173fa0eaaf67) to send interrupt signal to process

and there was no other way than simply restarting...

stan-buildbot · 2019-10-09T09:40:47Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.98)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.07)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.02)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.02)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.09)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.94)
Result: 1.01064086204
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

wds15 · 2019-10-09T12:33:29Z

Again the same error? Is there a Jenkins / Windows problem or a problem on this branch?

rok-cesnovar · 2019-10-09T12:38:59Z

Jenkins/Windows. I restarted the Always run part 2 now.

wds15 · 2019-10-09T15:05:59Z

Will this ever go through? I hope this time we go through the Windows stuff...

wds15 · 2019-10-09T17:41:01Z

ok let me merge develop into this one since the allow spaces went in.

…math into feature/intel-tbb-init

serban-nicusor-toptal · 2019-10-09T17:42:12Z

Windows seems to be fixed now, shouldn't happen again!

@wds15 It failed at the testing stage (because of Jenkins), restarted it here: https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1376/34/pipeline/

rok-cesnovar · 2019-10-09T17:44:06Z

Yeah, but it still points to PR-744 that was merged and doesn't exist anymore. So it will def fail again. It seems that Windows Jenkins is stable now so hopefully this is all over in the morning CET time.

wds15 · 2019-10-09T17:45:01Z

thanks for restarting! I need to merge now develop into this... but looking at stan downstream tests it looks to me as if we need there possibly as well the tbb.dll copy thing?

stan-buildbot · 2019-10-09T23:34:16Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.99)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.98)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.01)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.03)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 0.99)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.02)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.02)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.05)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.89)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.01)
Result: 0.99778605623
Commit hash: cd626022fdee22a0603909203220f6ac79b68f42

stan-buildbot · 2019-10-10T00:51:39Z

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.94)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.01)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.01)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.03)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.92)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.88)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.0)
Result: 0.98252425644
Commit hash: fd680d3

rok-cesnovar

Let's get this merged! Thank you @wds15!!

weberse2 and others added 11 commits September 30, 2019 18:12

current state of affairs

c2716b6

Merge remote-tracking branch 'origin/feature/intel-tbb-lib' into feat…

e3a2301

…ure/intel-tbb-init

add TBB thread initializer of the AD tapes

01d2f4e

add tbb threadpool init function

caf482a

add doc

7270d12

doc todo

4d8d8af

add more tests for threadpool tbb init

890f3e8

remind us of more tests to todo

f2339a7

fix ad tape observer and add gradient tests

6c35b7a

simplify init_threadpool_tbb

ab07d95

[Jenkins] auto-formatting by clang-format version 6.0.0 (tags/google/…

1e4b65b

…stable/2017-11-14)

SteveBronder reviewed Oct 1, 2019

rok-cesnovar reviewed Oct 1, 2019

SteveBronder reviewed Oct 1, 2019

stan/math/prim/scal/fun/init_threadpool_tbb.hpp (outdated, resolved)

SteveBronder reviewed Oct 1, 2019

wds15 added 4 commits October 1, 2019 19:43

Merge branch 'feature/intel-tbb-init' of https://github.com/stan-dev/…

2e4ae19

…math into feature/intel-tbb-init

address PR comments on get_num_threads

4c4a348

address review comments for global observer

a3a156f

switch map_rect_concurrent to use tbb parallel_for

76bd90a

rok-cesnovar mentioned this pull request Oct 9, 2019

allow spaces in path leading to stan-directory in makefiles #1395

Merged

5 tasks

Merge branch 'feature/intel-tbb-init' of https://github.com/stan-dev/…

a5088bd

…math into feature/intel-tbb-init

align readme

fd680d3

rok-cesnovar approved these changes Oct 10, 2019

wds15 merged commit 731e9a5 into develop Oct 10, 2019

serban-nicusor-toptal added this to the 3.0.0 milestone Oct 18, 2019

mcol deleted the feature/intel-tbb-init branch January 5, 2020 12:47


		static tbb::task_scheduler_init tbb_scheduler(tbb_max_threads, stack_size);

		return tbb_scheduler.is_active();
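
As a rough illustration of where `tbb_max_threads` in the fragment above might come from: the PR summary says the thread count is defined by the `STAN_NUM_THREADS` environment variable. The sketch below is not the stan-math code. `parse_num_threads` and `num_threads_from_env` are hypothetical names, and the conventions (unset means 1, `-1` means all hardware threads, any other non-positive or non-numeric value is an error) are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper, NOT the stan-math implementation.
// Assumed conventions for the text of STAN_NUM_THREADS:
//   unset (nullptr)  -> 1 thread
//   "-1"             -> hardware_threads (at least 1)
//   positive integer -> that value
//   anything else    -> std::invalid_argument
inline int parse_num_threads(const char* text, int hardware_threads) {
  if (text == nullptr) {
    return 1;
  }
  const std::string s(text);
  std::size_t pos = 0;
  int value = 0;
  try {
    value = std::stoi(s, &pos);
  } catch (const std::exception&) {
    throw std::invalid_argument("STAN_NUM_THREADS is not a valid integer");
  }
  if (pos != s.size()) {
    throw std::invalid_argument("STAN_NUM_THREADS has trailing characters");
  }
  if (value == -1) {
    return hardware_threads > 0 ? hardware_threads : 1;
  }
  if (value > 0) {
    return value;
  }
  throw std::invalid_argument("STAN_NUM_THREADS must be positive or -1");
}

// Reads the environment and queries the hardware. The result plays the role
// of tbb_max_threads in the fragment above.
inline int num_threads_from_env() {
  return parse_num_threads(
      std::getenv("STAN_NUM_THREADS"),
      static_cast<int>(std::thread::hardware_concurrency()));
}
```

The value returned by `num_threads_from_env()` would then be passed to the static scheduler object shown in the fragment above. Keeping the parsing separate from the environment lookup makes the rules easy to unit test without touching process state.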
