Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized CircularQueue by using ArrayBlockingQueue with constant time size implementation #1237

Merged
merged 5 commits into from
Sep 30, 2019

Conversation

gavlyukovskiy
Copy link
Contributor

Since AbstractInstanceRegistry already have synchronization on recentRegisteredQueue and recentCanceledQueue it's safe to change CircularQueue implementation to extend LinkedList with constant time size().

Benchmark:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
@Fork(value = 2)
@Warmup(iterations = 3, batchSize = 10000)
@Measurement(iterations = 3, batchSize = 10000)
public class CircularQueueBenchmark {

    private final EurekaCircularQueue<String> eureka = new EurekaCircularQueue<>(10000);
    private final EurekaCircularQueueLinkedList<String> eurekaLinkedList = new EurekaCircularQueueLinkedList<>(10000);

    @Setup(Level.Iteration)
    public void setup(){
        eureka.clear();
        eurekaLinkedList.clear();
    }

    @Benchmark
    public EurekaCircularQueue<String> eureka() {
        synchronized (eureka) {
            eureka.add("something");
        }
        return eureka;
    }

    @Benchmark
    public EurekaCircularQueueLinkedList<String> eurekaLinkedList() {
        synchronized (eurekaLinkedList) {
            eurekaLinkedList.add("something");
        }
        return eurekaLinkedList;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(CircularQueueBenchmark.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }
}

Results:

Benchmark                                 Mode  Cnt     Score     Error  Units
CircularQueueBenchmark.eureka            thrpt    6     3.297 ±   0.005  ops/s
CircularQueueBenchmark.eurekaLinkedList  thrpt    6  3893.782 ± 110.320  ops/s

fixes #1236

Copy link
Contributor

@troshko111 troshko111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    @Override
    public void clearRegistry() {
        overriddenInstanceStatusMap.clear();
        **recentCanceledQueue.clear();**
        **recentRegisteredQueue.clear();**
        recentlyChangedQueue.clear();
        registry.clear();
    }

The highlighted (**) calls are not thread-safe now.

Also ideally this should be an array-based circular buffer as both the old and the new implementations are not cache-friendly and are slow to iterate which is one of their primary uses, but it does not matter in this context much (if at all), I'm ok with this change as it makes perfect sense to me as long as you address the issues.

Copy link
Contributor

@troshko111 troshko111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments.

@gavlyukovskiy
Copy link
Contributor Author

Also ideally this should be an array-based circular buffer as both the old and the new implementations are not cache-friendly and are slow to iterate which is one of their primary uses, but it does not matter in this context much (if at all), I'm ok with this change as it makes perfect sense to me as long as you address the issues.

I was also thinking about it, and I even have working implementation, but I thought it would be too risky for you to accept it considering low performance difference (I measured only add) between array and linked list backed queues :)

I added commit with additional synchronization and array backed implementation. I also noticed that iteration just moves elements to array list, which is much faster when using collection as a source, especially with efficient toArray implementation. I made some measurements and it looks pretty impressive:

CircularQueueIterateBenchmark.arrayCircularQueue                             thrpt    3   14177.309 ±  291.964  ops/s
CircularQueueIterateBenchmark.arrayCircularQueueConstructor                  thrpt    3   45361.993 ± 2427.956  ops/s
CircularQueueIterateBenchmark.arrayCircularQueueConstructorOptimizedToArray  thrpt    3  264552.864 ± 5394.090  ops/s
CircularQueueIterateBenchmark.eurekaLinkedList                               thrpt    3   11591.028 ±  819.217  ops/s

I don't know how to create a benchmark that has proper memory layout to show higher cache hit rate for array backed implementation, but even difference between loop copy - arrayCircularQueue and constructor with efficient toArray - arrayCircularQueueConstructorOptimizedToArray is impressive!

@troshko111
Copy link
Contributor

Sorry for the delay on this one, I'm giving this full attention now.

@@ -1278,29 +1280,113 @@ long getCurrentTimeNano() { // for testing

}

private class CircularQueue<E> extends ConcurrentLinkedQueue<E> {
/* visible for testing */ static class CircularQueue<E> extends AbstractQueue<E> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me but considering you're rolling out a data structure, the amount of unit tests is expected to be much larger, I don't see test cases for [0, 1, even #, odd #] capacity buffers, tests for ops when under capacity, same tests for at capacity, etc. The risk/benefit ratio is just too high.

Based on your benchmarks and iterations so far, it seems like the best way forward is to

This would be correct, about the same performance, get rid of unnecessary synchronization, require no additional testing.

Let me know what you think.

Copy link
Contributor

@brharrington brharrington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it would be simpler to wrap an ArrayBlockingQueue. Something like:

import java.util.AbstractQueue;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;

public class CircularQueue<T> extends AbstractQueue<T> {

  private final ArrayBlockingQueue<T> underlying;

  CircularQueue(int capacity) {
    this.underlying = new ArrayBlockingQueue<>(capacity);
  }

  @Override
  public Iterator<T> iterator() {
    return underlying.iterator();
  }

  @Override
  public int size() {
    return underlying.size();
  }

  @Override
  public boolean offer(T item) {
    while (!underlying.offer(item)) {
      underlying.poll();
    }
    return true;
  }

  @Override
  public T poll() {
    return underlying.poll();
  }

  @Override
  public T peek() {
    return underlying.peek();
  }

  // Override for efficiency

  @Override
  public void clear() {
    underlying.clear();
  }

  @Override
  public Object[] toArray() {
    return underlying.toArray();
  }
}

@gavlyukovskiy
Copy link
Contributor Author

@brharrington I agree with you, I checked the code and implementation is the same as mine, except locking. And I more believe Doug Lea than myself to implement and test data structures :)

Also it looks like with this approach we can remove locking on insertion as this code will be sufficient even under race:

@Override
  public boolean offer(T item) {
    while (!underlying.offer(item)) {
      underlying.poll();
    }
    return true;
  }

@troshko111 what do you think on this approach?

@troshko111
Copy link
Contributor

This collection is thread-safe and based on the usage in the instance registry, all of the sync code can be removed, i.e. clear, element addition, toArray.

@gavlyukovskiy
Copy link
Contributor Author

@troshko111 I updated the PR with suggestions from @brharrington. Please review

Copy link
Contributor

@troshko111 troshko111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@troshko111 troshko111 merged commit d744f4c into Netflix:master Sep 30, 2019
@gavlyukovskiy gavlyukovskiy changed the title Optimized CircularQueue by extending LinkedList with constant time size implementation Optimized CircularQueue by using ArrayBlockingQueue with constant time size implementation Oct 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AbstractInstanceRegistry$CircularQueue is inneficient
3 participants