updated readme, added comments, updated jmh
tomfran committed Oct 21, 2023
1 parent 4176c4a commit f161355
Showing 15 changed files with 263 additions and 58 deletions.
63 changes: 47 additions & 16 deletions README.md
@@ -6,8 +6,11 @@ An implementation of the Log-Structured Merge Tree (LSM tree) data structure in

1. [Sorted String Table](#SSTable)
2. [Skip-List](#Skip-List)
3. [Benchmarks](#Benchmarks)
4. [Implementation status](#Implementation-status)
3. [Tree](#Tree)
4. [Benchmarks](#Benchmarks)
5. [Implementation status](#Implementation-status)

To interact with a toy tree, run `./gradlew run`.

---

@@ -89,8 +92,6 @@ use higher levels to skip unwanted nodes.
Given `n` elements, a skip list has roughly `log(n)` levels, with the first level containing all the elements.
Each higher level holds roughly half as many elements as the one below it.

![readme_imgs/skip-list.png](misc/skip-list.png)

To locate an element, we start from the top level and move forward until we find an element greater than the one
we are looking for. Then we move down to the next level and repeat the process until we find the element.
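
The search described above can be sketched as follows. This is a minimal illustration over `int` keys, not the repository's implementation: the class name `SkipListSketch`, the `MAX_LEVEL` cap and the fixed random seed are assumptions made for the example.

```java
import java.util.Random;

/** Minimal integer skip list, for illustration only (hypothetical class, not the repository's code). */
final class SkipListSketch {
    private static final int MAX_LEVEL = 16; // assumed cap on the number of levels

    private static final class Node {
        final int key;
        final Node[] next; // next[i] is the successor on level i

        Node(int key, int levels) {
            this.key = key;
            this.next = new Node[levels];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    private final Random random = new Random(42);

    /** Each extra level is kept with probability 1/2, so level i holds about n / 2^i nodes. */
    private int randomLevels() {
        int levels = 1;
        while (levels < MAX_LEVEL && random.nextBoolean()) levels++;
        return levels;
    }

    void add(int key) {
        Node[] update = new Node[MAX_LEVEL]; // last node visited on each level
        Node current = head;
        for (int level = MAX_LEVEL - 1; level >= 0; level--) {
            while (current.next[level] != null && current.next[level].key < key)
                current = current.next[level];
            update[level] = current;
        }
        Node candidate = current.next[0];
        if (candidate != null && candidate.key == key) return; // already present

        Node node = new Node(key, randomLevels());
        for (int level = 0; level < node.next.length; level++) {
            node.next[level] = update[level].next[level];
            update[level].next[level] = node;
        }
    }

    boolean contains(int key) {
        Node current = head;
        // Start at the top level, move forward while the next key is smaller, then drop one level.
        for (int level = MAX_LEVEL - 1; level >= 0; level--) {
            while (current.next[level] != null && current.next[level].key < key)
                current = current.next[level];
        }
        Node candidate = current.next[0];
        return candidate != null && candidate.key == key;
    }
}
```

Insertion reuses the same descent and only records the last node visited on each level, which is why both operations share the same average cost.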

@@ -99,10 +100,34 @@ the operation on the node. All of them have an average time complexity of `O(log

---

## Tree

...

### Components

...

### Insertion

...

### Lookup

...

### Write-ahead logging

...

---

## Benchmarks

I am using [JMH](https://openjdk.java.net/projects/code-tools/jmh/) to run benchmarks,
the results are obtained on a MacBook Pro (16-inch, 2021) with an M1 Pro processor and 16 GB of RAM.
the results are obtained on an AMD Ryzen™ 5 4600H with 16 GB of RAM and a 512 GB SSD.

> Take these with a grain of salt; development is still in progress.
To run them use `./gradlew jmh`.

@@ -113,9 +138,9 @@ To run them use `./gradlew jmh`.

```
Benchmark Mode Cnt Score Error Units
c.t.l.sstable.SSTableBenchmark.negativeAccess thrpt 10 3541989.316 ± 78933.780 ops/s
c.t.l.sstable.SSTableBenchmark.randomAccess thrpt 10 56157.613 ± 264.314 ops/s
Benchmark Mode Cnt Score Error Units
c.t.l.sstable.SSTableBenchmark.negativeAccess thrpt 5 3316202.976 ± 32851.546 ops/s
c.t.l.sstable.SSTableBenchmark.randomAccess thrpt 5 7989.945 ± 40.689 ops/s
```

@@ -125,10 +150,9 @@ c.t.l.sstable.SSTableBenchmark.randomAccess thrpt 10 56157.613 ± 264.314 ops/s
- Contains: test whether the keys are present in the Bloom filter.

```
Benchmark Mode Cnt Score Error Units
c.t.l.bloom.BloomFilterBenchmark.add thrpt 10 9777191.526 ± 168208.916 ops/s
c.t.l.bloom.BloomFilterBenchmark.contains thrpt 10 10724196.205 ± 20411.741 ops/s
Benchmark Mode Cnt Score Error Units
c.t.l.bloom.BloomFilterBenchmark.add thrpt 5 3190753.307 ± 74744.764 ops/s
c.t.l.bloom.BloomFilterBenchmark.contains thrpt 5 3567392.634 ± 220377.613 ops/s
```

@@ -139,16 +163,22 @@ c.t.l.bloom.BloomFilterBenchmark.contains thrpt 10 10724196.205 ± 20411.741 ops

```
Benchmark Mode Cnt Score Error Units
c.t.l.memtable.SkipListBenchmark.addRemove thrpt 10 684885.546 ± 21793.787 ops/s
c.t.l.memtable.SkipListBenchmark.get thrpt 10 823423.128 ± 83028.354 ops/s
Benchmark Mode Cnt Score Error Units
c.t.l.memtable.SkipListBenchmark.addRemove thrpt 5 430239.471 ± 4825.990 ops/s
c.t.l.memtable.SkipListBenchmark.get thrpt 5 487265.620 ± 8201.227 ops/s
```

### Tree

- Get: get elements from a tree with 1M keys.
- Add: add 1M distinct elements to a tree with a memtable size of 2^18.

```
...
Benchmark Mode Cnt Score Error Units
c.t.l.tree.LSMTreeAddBenchmark.add thrpt 5 540788.751 ± 54491.134 ops/s
c.t.l.tree.LSMTreeGetBenchmark.get thrpt 5 9426.951 ± 241.190 ops/s
```

---
@@ -170,6 +200,7 @@ c.t.l.memtable.SkipListBenchmark.get thrpt 10 823423.128 ± 83028.354 ops/s
- [x] Operations
- [x] Background flush
- [x] Background compaction
- [ ] Write ahead log
- [x] Benchmarks
- [x] SSTable
- [x] Bloom filter
5 changes: 2 additions & 3 deletions build.gradle
@@ -25,10 +25,9 @@ tasks.test {
}

jmh {
includes = ["LSMTreeAddBenchmark.*"]
fork = 1
warmupIterations = 0
iterations = 1
warmupIterations = 1
iterations = 5
benchmarkMode = ['thrpt']
jmhTimeout = '15s'
jmhVersion = '1.37'
Binary file removed misc/skip-list.png
Binary file not shown.
3 changes: 2 additions & 1 deletion src/jmh/java/com/tomfran/lsm/tree/LSMTreeAddBenchmark.java
@@ -28,8 +28,9 @@ public void setup() throws IOException {
}

@TearDown
public void teardown() throws IOException {
public void teardown() throws IOException, InterruptedException {
tree.stop();
Thread.sleep(5000);
BenchmarkUtils.deleteDir(DIR);
}

7 changes: 5 additions & 2 deletions src/jmh/java/com/tomfran/lsm/tree/LSMTreeGetBenchmark.java
@@ -9,6 +9,8 @@
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

import static com.tomfran.lsm.utils.BenchmarkUtils.shuffleItems;

@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class LSMTreeGetBenchmark {
@@ -23,12 +25,13 @@ public class LSMTreeGetBenchmark {
ByteArrayPair[] items;

@Setup
public void setup() throws IOException, InterruptedException {
public void setup() throws IOException {
tree = BenchmarkUtils.initTree(DIR, MEMTABLE_SIZE, LEVEL_SIZE);
items = BenchmarkUtils.fillItems(NUM_ITEMS);
for (var i : items)
tree.add(i);
Thread.sleep(5000);

shuffleItems(items);
}

@TearDown
11 changes: 11 additions & 0 deletions src/jmh/java/com/tomfran/lsm/utils/BenchmarkUtils.java
@@ -6,6 +6,7 @@
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

import static com.tomfran.lsm.TestUtils.getRandomPair;

@@ -27,6 +28,16 @@ public static ByteArrayPair[] fillItems(int n) {
return items;
}

public static void shuffleItems(ByteArrayPair[] v) {
var rn = new Random();
for (int i = 0; i < v.length; i++) {
var tmp = v[i];
int j = rn.nextInt(i, v.length);
v[i] = v[j];
v[j] = tmp;
}
}
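
The loop in `shuffleItems` is a Fisher-Yates shuffle: position `i` is swapped with a uniformly chosen position in `[i, length)`. Note that the two-argument `nextInt(origin, bound)` it calls is only available on `java.util.Random` from Java 17. The sketch below is a generic variant that uses the one-argument `nextInt(bound)` instead and takes the `Random` as a parameter so a run can be reproduced with a seed; the class name `ShuffleSketch` and the method signature are assumptions for illustration, not part of this commit.

```java
import java.util.Random;

/** Hypothetical helper, shown only to illustrate the shuffle used in the benchmark setup. */
final class ShuffleSketch {

    /** In-place Fisher-Yates shuffle: every permutation is equally likely for a uniform source. */
    static <T> void shuffle(T[] items, Random random) {
        for (int i = 0; i < items.length - 1; i++) {
            // Pick j uniformly in [i, items.length) and swap it into position i.
            int j = i + random.nextInt(items.length - i);
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}
```

Passing `new Random(seed)` makes the lookup order of a benchmark run repeatable, which can help when comparing results across commits.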

public static void deleteDir(Path dir) throws IOException {
try (var files = Files.list(dir)) {
files.forEach(f -> {
8 changes: 4 additions & 4 deletions src/main/java/com/tomfran/lsm/bloom/BloomFilter.java
@@ -1,8 +1,8 @@
package com.tomfran.lsm.bloom;


import com.tomfran.lsm.io.BaseInputStream;
import com.tomfran.lsm.io.BaseOutputStream;
import com.tomfran.lsm.io.ExtendedInputStream;
import com.tomfran.lsm.io.ExtendedOutputStream;
import it.unimi.dsi.fastutil.longs.LongLongMutablePair;
import it.unimi.dsi.fastutil.longs.LongLongPair;
import org.apache.commons.codec.digest.MurmurHash3;
@@ -78,7 +78,7 @@ public BloomFilter(int size, int hashCount, long[] bits) {
* @return The Bloom filter.
*/
public static BloomFilter readFromFile(String filename) {
BaseInputStream is = new BaseInputStream(filename);
ExtendedInputStream is = new ExtendedInputStream(filename);
try {
int size = is.readVByteInt();
int hashCount = is.readVByteInt();
@@ -140,7 +140,7 @@ private LongLongMutablePair getHash(byte[] key) {
* @param filename The file to write to.
*/
public void writeToFile(String filename) {
BaseOutputStream os = new BaseOutputStream(filename);
ExtendedOutputStream os = new ExtendedOutputStream(filename);

os.writeVByteInt(size);
os.writeVByteInt(hashCount);
@@ -6,11 +6,20 @@
import java.io.FileInputStream;
import java.io.IOException;

public class BaseInputStream {
/**
* This class uses a FastBufferedInputStream as a base and adds
* utility methods to it, mainly for reading variable-byte encoded longs and integers.
*/
public class ExtendedInputStream {

private final FastBufferedInputStream fis;

public BaseInputStream(String filename) {
/**
* Initialize an input stream on a file.
*
* @param filename the name of the file to read.
*/
public ExtendedInputStream(String filename) {
try {
fis = new FastBufferedInputStream(new FileInputStream(filename));
fis.position(0);
@@ -19,10 +28,27 @@ public BaseInputStream(String filename) {
}
}

/**
* Read a variable-byte int from the stream; see {@link #readVByteLong()}.
*
* @return the next V-Byte int.
*/
public int readVByteInt() {
return (int) readVByteLong();
}

/**
* Read a variable byte long from the stream.
* <p>
* A variable byte long is written as:
* <tt>|continuation bit| 7-bits payload|</tt>
* <p>
* For instance the number 10101110101010110 is represented using 24 bits as follows:
* <p>
* |1|1010110|1|0000101|0|0111010|
*
* @return the next V-Byte long.
*/
public long readVByteLong() {
long result = 0;
int b;
@@ -39,6 +65,11 @@ public long readVByteLong() {
return result - 1;
}
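
The body of `readVByteLong` is mostly hidden in this hunk, so the sketch below is not the repository's codec. It is a generic variable-byte encoder and decoder under stated assumptions: 7-bit groups are written least-significant first, a set high bit means another byte follows, and no offset is applied (the real method returns `result - 1`, which this sketch does not reproduce). The class name `VByteSketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical variable-byte codec; layout is an assumption, not the repository's format. */
final class VByteSketch {

    /** Encode a non-negative long, least-significant 7-bit group first. */
    static byte[] encode(long value) {
        if (value < 0) throw new IllegalArgumentException("value must be non-negative");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) (value & 0x7F) | 0x80); // continuation bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // last group, continuation bit clear
        return out.toByteArray();
    }

    /** Decode one value from the start of the array, stopping at the first byte without the high bit. */
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return result;
    }
}
```

Under these assumptions, the 17-bit value from the Javadoc, `10101110101010110`, takes three bytes: `|1|1010110|1|0111010|0|0000101|`.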

/**
* Read 8 bytes representing a long.
*
* @return the next long in the stream.
*/
public long readLong() {
try {
long result = 0;
@@ -52,6 +83,11 @@ public long readLong() {
}
}

/**
* Read a single byte as an int.
*
* @return the next 8-bit integer in the stream.
*/
public int readByteInt() {
try {
return fis.read();
@@ -60,6 +96,12 @@ public int readByteInt() {
}
}

/**
* Read N bytes.
*
* @param n the wanted number of bytes.
* @return an array with the next N bytes.
*/
public byte[] readNBytes(int n) {
try {
return fis.readNBytes(n);
@@ -68,6 +110,13 @@ public byte[] readNBytes(int n) {
}
}

/**
* Read a ByteArrayPair from the stream.
* <p>
* Each array is encoded as length, payload.
*
* @return the next item in the stream.
*/
public ByteArrayPair readBytePair() {
try {
int keyLength = readVByteInt();
@@ -82,6 +131,12 @@ public ByteArrayPair readBytePair() {
}
}
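
The "length, payload" layout mentioned in the `readBytePair` Javadoc can be illustrated with a small round trip. This sketch makes two assumptions that the diff does not confirm: lengths are fixed 4-byte ints rather than V-Byte ints, and the key's length and payload come before the value's. `LengthPrefixedSketch` is a hypothetical name, and a plain `byte[][]` stands in for `ByteArrayPair`.

```java
import java.nio.ByteBuffer;

/** Hypothetical length-prefixed pair codec; field order and length width are assumptions. */
final class LengthPrefixedSketch {

    /** Write key and value, each as a 4-byte length followed by the raw bytes. */
    static byte[] write(byte[] key, byte[] value) {
        ByteBuffer buffer = ByteBuffer.allocate(2 * Integer.BYTES + key.length + value.length);
        buffer.putInt(key.length).put(key);
        buffer.putInt(value.length).put(value);
        return buffer.array();
    }

    /** Read back the pair: element 0 is the key, element 1 is the value. */
    static byte[][] read(byte[] encoded) {
        ByteBuffer buffer = ByteBuffer.wrap(encoded);
        byte[] key = new byte[buffer.getInt()];
        buffer.get(key);
        byte[] value = new byte[buffer.getInt()];
        buffer.get(value);
        return new byte[][]{key, value};
    }
}
```

Prefixing each array with its length lets a reader skip or slice entries without scanning for a delimiter, which is why binary values need no escaping.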

/**
* Skip N bytes from the stream.
*
* @param n the number of bytes to skip.
* @return the number of bytes skipped.
*/
public long skip(int n) {
try {
return fis.skip(n);
@@ -90,6 +145,11 @@ public long skip(int n) {
}
}

/**
* Position the stream at the wanted offset.
*
* @param offset the offset to move the stream to.
*/
public void seek(long offset) {
try {
fis.position(offset);
@@ -98,6 +158,9 @@ public void seek(long offset) {
}
}

/**
* Close resources.
*/
public void close() {
try {
fis.close();