JNI concurrent interface extensions #1215
Conversation
* replaced NativeUtils with Java's own library loader; added a MulticlassMultilabel interface to JNI; generated OS-dependent lib files; modified the pom to include the generated lib file in the Java library path; removed an unused variable in the multiclass learner
* Since an instance of a vw model is not thread-safe for multiple predictions in parallel, we introduce a concurrent learner (for multilabel and multiline multiclass learners) that works with a pool of learners to get high throughput for predictions in an online setting.
* The learner pool now creates vw instances of the same model using the seed_vw_model method instead of initialize. The former reuses shared variables from the seed learner instance and hence has a much smaller memory footprint than the latter, which allocates new memory for every new instance created.
* Fixed a space-alignment issue and changed the getLearner API to return Optional to stop any unwanted NPEs.
* Removed a TODO, as we now use seed_vw_model to instantiate multiple learner instances of the same model.
I’m all for a unified C++ layer...
…-Markus
I think a lot of this stuff sounds pretty reasonable. A few concerns.
- This won't work with the current language level. And bumping the language level should not be taken lightly.
- This also seems to include changes to the library build that I proposed in another PR. In my opinion, the changes to the way this is built should not go in here.
```cpp
JNIEXPORT jlong JNICALL Java_vowpalWabbit_learner_VWLearners_seedVWModel(JNIEnv *env, jclass obj, jlong vwPtr)
{
  jlong cloneVwPtr = 0;
  try
  {
    vw* vwInstance = VW::seed_vw_model((vw*)vwPtr, "");
```
What does seed_vw_model do?
It initializes a new VW instance using the parameters from an existing instance (the seed); these cloned VW instances share internal data with the seed instance. The seed_vw_model method is expected to make the concurrent methods more memory-efficient.
```java
import vowpalWabbit.jni.NativeUtils;

import java.io.IOException;
import java.util.Optional;
```
This won't work. The pom.xml file specifies the language level as Java 6, and this is a Java 8 construct.
Good catch. Optional helped to maintain cleaner and more semantically meaningful code. Shall we upgrade to Java 8, since Java 6 is no longer updated?
```java
    finally {
      STATIC_LOCK.unlock();
    }
  }
}

private static native long initialize(String command);

private static native long seedVWModel(long nativePointer);
```
What's the difference between seeding and initializing? If you are no longer using initialize, that should be removed.
See the comments on seed_vw_model.
Let's say you have a vw model of size 1GB and you create 2 instances of the same model using initialize; the memory footprint in Java is 2 * 1GB. If you use seed_vw_model instead, internal data structures are shared, so the memory footprint is still 1GB.
We still need initialize to create the first instance; seed_vw_model requires a seed model to clone from.
@deaktator could you comment on this too?
@jmorra I have removed the usage of …
I wanted to get the motivation for this PR. Is the main motivation improving throughput, or simply providing timeout semantics in the prediction methods? If it's throughput, I would suggest adding JMH benchmarks with the following:
1. Your code
2. A thread-local model with an executor service backed by a fixed-size thread pool sized to the desired concurrency level
3. Serial execution of a single-threaded model
Is there a concern about the queue growing too large with the threadlocal approach? It seems simpler and possibly faster than using a blocking queue. Also, on closer inspection, you are polling the model queue inside the submitted Callable passed to the ExecutorService. This will add to the queue in the ExecutorService. Therefore the same issue would exist in either your or the threadlocal implementation.
Some other questions:
1. If the predict method is always called by the same thread, won't this likely perform worse than a normal model? By calling the following, you block the calling thread, right? Am I wrong?
```java
O result = future.get(timeoutInMillis, TimeUnit.MILLISECONDS);
```
If so, won't your throughput be no better than serial model performance?
Why is your asynchronous boundary inside predict? Why block in the predict method vs return a Future? If you decide to return a Future, consider Guava ListenableFutures so you can potentially avoid blocking altogether by providing a callback mechanism in the model.
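As a sketch of the non-blocking shape being suggested here (the class and method names are hypothetical, not the PR's actual API), predict could hand back the Future and let the caller decide when, or whether, to block. A plain `Future` is used below to stay within the repo's Java 6 language level; Guava's ListenableFuture would additionally allow callbacks.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncPredictSketch {
    private final ExecutorService exec = Executors.newFixedThreadPool(2);

    // Stand-in for a real (blocking) VW predict call.
    private int predictBlocking(String example) { return example.length(); }

    // Returns the Future instead of blocking inside predict;
    // the caller chooses where the asynchronous boundary lives.
    public Future<Integer> predictAsync(final String example) {
        return exec.submit(new Callable<Integer>() {
            public Integer call() { return predictBlocking(example); }
        });
    }

    public static void main(String[] args) throws Exception {
        AsyncPredictSketch p = new AsyncPredictSketch();
        Future<Integer> f = p.predictAsync("1 | a b c");
        System.out.println(f.get()); // prints 9 (length of the example string)
        p.exec.shutdown();
    }
}
```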
Why provide both a LinkedBlockingQueue and an ExecutorService to the multithreaded learner constructor? If the executor service doesn't have the same number of threads as the size of the queue, you'll be wasting resources. This is especially true when there are fewer threads than elements in the queue: memory will be wasted. In the other case, threads will be unnecessarily created and used without additional throughput, right?
I would update the javadoc to say **null** rather than empty.
I would update the docs, if necessary, to indicate that concurrent models should be read-only (don't learn, or whatever terminology you like). This is so you don't get into a situation where the concurrent model is not idempotent.
Speaking of idempotency, I would add a test to ensure idempotency by predicting many times based on the same input. Stick the results in a Set and test that the size of the set is 1.
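A minimal version of that idempotency check might look like the following. The `predict` stand-in is hypothetical; a real test would call the concurrent JNI learner with the same VW example string each time.

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotencyCheck {
    // Stand-in for the concurrent model's predict; the real test
    // would call the JNI learner on the same VW example string.
    static int predict(String example) { return example.hashCode(); }

    public static void main(String[] args) {
        Set<Integer> results = new HashSet<Integer>();
        for (int i = 0; i < 1000; i++) {
            results.add(predict("1 | a b c"));
        }
        // Idempotent predictions collapse to a single distinct result.
        System.out.println(results.size()); // prints 1
    }
}
```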
If you choose not to change the implementation, I would consider adding an AtomicLong containing the number of interrupted predictions.
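A sketch of that counter, under hypothetical names, wrapping the timeboxed `future.get` from the earlier snippet: a `TimeoutException` bumps an `AtomicLong` so operators can monitor how many predictions were abandoned.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

public class TimeoutCounterSketch {
    static final AtomicLong timedOutPredictions = new AtomicLong();

    // Stand-in predict that takes ~100 ms, bounded by a caller timeout.
    static Integer predict(ExecutorService exec, long timeoutMillis) throws Exception {
        Future<Integer> future = exec.submit(new Callable<Integer>() {
            public Integer call() throws Exception { Thread.sleep(100); return 42; }
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            timedOutPredictions.incrementAndGet(); // count abandoned predictions
            future.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        predict(exec, 1);    // too tight: times out
        predict(exec, 5000); // generous: completes
        System.out.println(timedOutPredictions.get()); // prints 1
        exec.shutdownNow();
    }
}
```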
I was wondering about the shared state in VW. Is this known or guaranteed to be thread-safe? If not, the cloned-seed-learner approach could result in really nasty bugs that will only be observable under heavy load and will likely manifest in ways that are hard to diagnose. The worst case would be a corruption that doesn't raise errors but causes predictions to be "unstable" or nonsensical.
Am I way off on this stuff? If so, let me know. What do you think @jmorra?
Also, if we can guarantee that models will only ever be called on specific threads, maybe we should have non-locking variants of the serial models to be used exclusively inside the concurrent models. This will avoid locking and will use the concurrency mechanisms in the concurrent models to ensure thread safety.
Here's some scala doc that does a quick and dirty benchmark. Shows the multithreaded approach could be up to 3 orders of magnitude slower than serial execution. I would definitely recommend benchmarking this PR's code against serial execution.
```scala
// Scala Code
//
// ------------------------------------------------------------
// Example output
// ------------------------------------------------------------
// sum     unthreaded_time_sec  threaded_time_sec  unthreaded_speedup
// 100000  1.9319E-5            0.017919362         927.55115X
// 100000  1.7781E-5            0.016643956         936.0529X
// 100000  2.0025E-5            0.023529027        1174.9827X
// 100000  1.5941E-5            0.01935222         1213.9904X
// 100000  1.209E-5             0.015618176        1291.8259X
// 100000  2.0106E-5            0.028387927        1411.9132X
// 100000  4.1961E-5            0.018170957         433.0439X
// 100000  8.384E-6             0.017119417        2041.9153X
// 100000  1.9023E-5            0.019923734        1047.3497X
// 100000  1.8086E-5            0.018302908        1011.9931X

import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future
import java.util.concurrent.Callable
import scala.annotation.tailrec

class Test(threads: Int) {
  private[this] val exSvc = Executors.newFixedThreadPool(threads)

  /**
   * Return length of string. Compute asynchronously and block to get result.
   * @param s input string
   * @return length of string
   */
  def strLength(s: String) = exSvc.submit(new Test.StrLen(s)).get
}

object Test {
  def strLength(s: String) = s.length

  class StrLen(s: String) extends Callable[Int] {
    def call() = strLength(s)
  }
}

def time[A](a: => A) = {
  val t1 = System.nanoTime
  val r = a
  val t2 = System.nanoTime
  (r, (1.0e-9 * (t2 - t1)).toFloat)
}

@tailrec
def sumThreaded(t: Test, n: Int, i: Int, strs: Array[String], s: Int): Int = {
  if (i < n)
    sumThreaded(t, n, i + 1, strs, s + t.strLength(strs(i)))
  else s
}

@tailrec
def sumUnthreaded(n: Int, i: Int, strs: Array[String], s: Int): Int = {
  if (i < n)
    sumUnthreaded(n, i + 1, strs, s + Test.strLength(strs(i)))
  else s
}

def threadedLengths(n: Int, seed: Long, threads: Int) = {
  val r = new scala.util.Random(seed)
  val strs = Array.fill(n)(r.nextString(100))
  val t = new Test(threads)
  time(sumThreaded(t, strs.length, 0, strs, 0))
}

def unthreadedLengths(n: Int, seed: Long) = {
  val r = new scala.util.Random(seed)
  val strs = Array.fill(n)(r.nextString(100))
  time(sumUnthreaded(strs.length, 0, strs, 0))
}

println("sum\tunthreaded_time_sec\tthreaded_time_sec\tunthreaded_speedup")
1 to 10 foreach { i =>
  val (u, ut) = unthreadedLengths(1000, i)
  val (t, tt) = threadedLengths(1000, i, 2)
  val utInv = 1d / ut
  val ttInv = 1d / tt
  val speedUp = (utInv / ttInv).toFloat
  println(s"$u\t$ut\t$tt\t${speedUp}X")
}
```
I think all of @deaktator's comments seem reasonable. @JohnLangford @zhilians what do you think?
The assumption is that the calling thread itself is not necessarily the main thread, which is the case when you use a webserver where every rest api request is assigned a dedicated thread.
Same as above
I don't completely follow this comment. As a user of this functionality, one can decide how many threads to dedicate to prediction. The ideal rule of thumb is to keep it close to (but less than) the number of cores in the CPU, since predict calls are pure CPU calls (no IO). The LinkedBlockingQueue is there to handle scenarios where the application queues predict requests faster than actual predictions can be made.
That is correct. We need to update the doc and potentially disable learn by default for such read-only, predict-only learners.
testMultilabelsConcurrency is doing that. Also, seed_vw_model is already being used in C#, so we assume the method works as advertised, and our tests indicate that this is so.
We thought about it. The lock call itself doesn't contribute significantly to the latency; it's only when many consumers block on the lock due to contention that it starts contributing significantly to the overall latency.
I agree; that was part of our initial plan until one of our colleagues made us aware of some weird issues ThreadLocal can cause. Besides, for our 10GB model, which took about 30-60 secs to initialize, we had to use a not-so-robust approach to initialize the learners before they could be used for prediction when using thread-locals.
@deaktator can you tell us about the size of the model you used? From the latency numbers you published, it seems like it is even less than 1 msec. We used a 10GB model that produced a minimum latency of around 12 msecs.
Thanks @deaktator for the detailed review. Adding to @atulvkamat 's comments:
The main focus for this PR is to increase the throughput in a high concurrency container without multiplying the memory usage. I think it's not a bad idea to run benchmark tests for different implementations over different sizes of the model. Will schedule time to do so this week.
I understand that you are concerned about the manipulation of the …
You are correct that the caller will be blocked waiting for the result or a timeout. But we picture this being used mostly in a multi-consumer environment where multiple consumers try to get predictions from the same model; they will benefit from having more than one predictor available, in which scenario the whole system scales roughly linearly. If the use case really is predicting from a single thread, one should rather use the original predictor implementation, which will still be available.
It's a good point. The motivation behind this constructor was to make it flexible for people who want to reuse a shared executor service across different concurrent predictor instances. You are right that memory will be wasted if we have more predictor instances than threads in the executor, but not in a very bad way, since we use … But to simplify the API, I will add another simplified constructor that creates an executor and a predictor queue of the same size. Will update the javadoc as well.
@zhilians I get what you're saying, and it seems like you've adequately fleshed out your use case and done a fair amount of performance testing. What I am concerned with is that the API is dictated by your use case. The API (signature, not documentation) doesn't make it clear to less astute users that it must be called in a multi-threaded environment. It's obvious in hindsight, but I had to read the code before it clicked. If instead you returned Futures or, even better, ListenableFutures, I feel like you wouldn't be imposing your use case on all users of the API. I like the "Your Server as a Function" paper by Marius Eriksen (https://monkey.org/~marius/funsrv.pdf) and the reactive paradigm, so I'm likely biased toward Futures that are asynchronously transformable (and non-blocking) into other Futures.
What do you think about the possibility of returning futures?
One other thought. I would make a constructor that takes a timeout and have two predict methods: one that takes a custom timeout and one that doesn't take a timeout and uses the one from the constructor parameter. Is this dumb? I feel like it is more uniform with the current serial models.
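That uniform shape could be sketched as follows (the class and its internals are hypothetical, not the PR's actual API): the constructor fixes a default timeout, and the two predict overloads either use it or accept a one-off override.

```java
// Hypothetical sketch: a predictor with a constructor-supplied default
// timeout and two predict overloads, mirroring the suggestion above.
public class TimeoutPredictorSketch {
    private final long defaultTimeoutMillis;

    public TimeoutPredictorSketch(long defaultTimeoutMillis) {
        this.defaultTimeoutMillis = defaultTimeoutMillis;
    }

    // Uses the timeout fixed at construction time.
    public int predict(String example) throws Exception {
        return predict(example, defaultTimeoutMillis);
    }

    // Caller-supplied timeout for one-off overrides.
    public int predict(String example, long timeoutMillis) throws Exception {
        // Stand-in for submitting to the learner pool and waiting
        // up to timeoutMillis for the result.
        return example.length();
    }

    public static void main(String[] args) throws Exception {
        TimeoutPredictorSketch p = new TimeoutPredictorSketch(50);
        System.out.println(p.predict("abc"));      // prints 3
        System.out.println(p.predict("abcd", 10)); // prints 4
    }
}
```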
We certainly don't want to impose our use case on all users of the API. Our contribution is more focused on providing another way to speed up predictions in a concurrent environment. IMHO, when the user creates a concurrent learner from the factory, it's immediately clear that the user is going to get a pool of predictors rather than a parallelized predictor that could be sped up even in a single-threaded environment. We can certainly polish the documentation a bit to make this clearer.
In addition to our timeboxed predict method, I've added a method that returns a Future to the abstract concurrent predictor.
added method to return Future in abstract concurrent predictor; added method to create concurrent predictor using thread pool and predictor pool of same size in factory
Thank you for another in-depth review. PR comments addressed in 4703f57. I strongly believe we should merge this as soon as possible, as it provides an additional, simple, and memory-efficient wrapper for speeding up concurrent prediction tasks. We could ship it now if there's no major flaw in the implementation, and then work on feedback from the community regarding this new API. @jmorra @deaktator @atulvkamat do you have other concerns regarding the core functionality?
@zhilians Change looks good to me.
The only two outstanding concerns I have are
1. The queue polling inside the Callable on a possibly unbounded queue in the Executor. As @atulvkamat stated previously, big models can have latency on the order of milliseconds, so if you get ~M*500 calls/sec, where M is the number of models in the blocking queue, the current predictors will die under back pressure with an OutOfMemoryError or a heap-size exception, right? I didn't scrutinize the javadoc too much, but I think this should be documented if it isn't already.
2. The thread-safety of the seeded models. I can't speak to this as it lies in the VW internals in the C code. It would be nice to have a sign-off. Can @JohnLangford or someone else attest to the thread safety of this approach? Is the shared state between such models thread-safe? If not, this seeded-model approach will also not be thread-safe. I know @atulvkamat commented on the unit tests checking for this, but 100 examples is most likely not enough to catch a race condition gone bad, and a passing test isn't definitive proof, merely probabilistic (and the ε isn't really known).
Other than those, I think this stuff looks good.
The seeded-models approach is routinely in very heavy use.
-John
Fine by me. @jmorra merge?
Looks good.
Looks good
Merged, thank you everyone :-)
* Add prerequisites: zlib-devel (or zlib1g-dev) (#1226)
  * Added zlib-devel (or zlib1g-dev) to Prerequisites
  * Update README.md
* do not use deprecated sklearn module (#1223)
* Enable the C wrapper to extract model to memory and initialize model from memory (#1189)
  * extract model to memory and initialize model from memory
  * add get_confidence C wrapper function
* JNI concurrent interface extensions (#1215)
  * replaced NativeUtils with Java's own library loader; Added MulticlassMultilabel Interface to JNI; generated OS dependant lib files; modified pom to include generated lib file in java library path; removed unused variable in multiclassleaner
  * Since an instance of vw model is not thread safe for multiple predictions in parallel, we introduce a concurrent learner (for multilabel and multiline multiclass learners) that works with a pool of learners to get high throughput for predictions in an online setting.
  * Learner pool now creates vw instances of the same model using seed_vw_model method instead of initialize. Former method reuses shared variables from the seed learner instance and hence has much less memory footprint in comparison to latter which allocates new memory for all new instances created.
  * Fixed space alignment issue and changed getLearner api to return Optional to stop any unwanted NPEs.
  * Removed TODO as we now use seed_vw_model to instantiate multiple learner instances of the same model.
  * remove java8 components
  * added method to return Future in abstract concurrent predictor; added method to create concurrent predictor using thread pool and predictor pool of same size in factory
  * tweaks
This patch is causing problems and will need to be backed out unless it can be fixed: https://travis-ci.org/JohnLangford/vowpal_wabbit#L2107 Testing locally, I can only reproduce it 5% of the time, so this is some form of race condition. @zhilians can you investigate and nail it down?
Taking a look. I did a quick test and couldn't reproduce this error at all on Linux/Mac. I am now trying to replicate the same CI test environment (https://travis-ci.org/JohnLangford/vowpal_wabbit/jobs/238178163/config).
My local repro was:
repeat 100 (mvn test -f java/pom.xml | grep BUILD)
which generated 5 "BUILD FAILURE" events.
…-John
Just some updates. I've been trying to fix the test failures; they seem really consistent, and I can reproduce them every time on Travis CI. Running the tests 100 times on my virtual machine, set up exactly like the Travis CI environment, didn't throw any errors. The only place where a race condition could exist is when using seed_model to create shallow copies of a given VW instance, which might not be thread-safe. So I tried adding a lock on the seed vw instance. The test still unfortunately fails: https://travis-ci.org/zhilians/vowpal_wabbit/builds/238682875 So I think I can rule that part out. Next step is to look into the IllegalArgumentExceptions thrown.
I've pulled this from the master branch now. I would like to get it back in, so if you can debug that would be great.
This PR introduces concurrent interfaces, cross-platform support improvements, and multilabel multiline prediction support to VW's JNI.
* Added a learner/concurrent/ package containing our concurrent components. We need this because the VW learner in C is not thread-safe, so the VW learner in JNI synchronizes on every predict call, which can hamper the throughput of high-volume, low-latency services that need to make predict calls.
* Replaced NativeUtils, which was used to load the JNI lib from the jar, with Java's own library loader, and added a proper OS-dependent target. This is partially in line with New Java build #1193; we will need to merge fixes from New Java build #1193.