
The throughput is inconsistent with the experimental results of the paper #3

Open
Whiteleaf3er opened this issue Apr 25, 2023 · 3 comments

Comments

@Whiteleaf3er

I configured it as described in the paper. The trace is the 7:3 write/read ratio sample provided in this GitHub repository, with a 64 MiB cache size and a 128 MiB WSS.

For hit ratio, Austere does indeed perform better.

However, the throughput of AC-D (800 MiB/s) is only about half that of CD-LRU-D (1900 MiB/s), whereas in the paper AC-D performs better (and reaches only about 40 MiB/s there, which is not even the same order of magnitude as my results).

According to my observation, the main performance bottleneck lies in the index update, which may be related to the bucket structure you mentioned.

@Whiteleaf3er
Author

"According to my observation, the main performance bottleneck lies in the index update"
Time Elapsed:
Time elapsed for compression: 0
Time elapsed for decompression: 0
Time elapsed for computeFingerprint: 36133
Time elapsed for dedup: 27030
Time elapsed for lookup: 18817
Time elapsed for update_index: 190536
Time elapsed for io_ssd: 5681
Time elapsed for io_hdd: 5024
Time elapsed for debug: 0
The above is the result for CacheDedup CD-LRU. The time elapsed for update_index is still very high, which is confusing, since index updates usually do not take much time.

@fallfish
Owner

fallfish commented Apr 30, 2023

Hi Whiteleaf3er,

Thanks for sharing with us your finding! I'm able to reproduce your results with the current prototype. I want to share some of my thoughts with you.

I suppose that you also ran the program with the sample configuration (with which I can reproduce your result), where I set "fakeIO" to 1. That setting means no real I/O is issued to any physical storage device - it is there for testing purposes. In the paper, we used a SATA SSD (probably also much slower than what we have now, i.e., NVMe devices) as the cache device. Also, be aware of the OS page cache when performing a performance test (we use direct I/O to bypass it).

As for the index update in the DLRU design, my current suspicion is that the overhead is reasonable. The OPS (operations per second) is 163840 / (190536 / 1000000.0) ~ 860K (or 5G / 0.19 ~ 26 GB/s). In the current prototype, we use std::map for both the LBA index and the FP index, along with their LRU lists. I ran a test with std::map<uint64_t, FP> and 163840 entries inserted on an i5-7267U CPU @ 3.10GHz. With "-O3", the time consumed is already 39657.8 microseconds (around 1/5 of your test). Given that we have multiple such structures and several memcpy calls around, the results look reasonable to me, though there is surely room for optimization.

Nevertheless, I admit that the complexity of the Austere Cache design, e.g., the re-arrangement of all the items in a single bucket of the sketch, can lead to software inefficiency. That inefficiency can make it inferior to existing designs on platforms with faster devices; it seems now is the time to re-examine the design : ). One possible optimization is to use vectorized instructions, so that the memory movement (of a whole bucket) can be accomplished in a single instruction and potentially even pipelined.

Should you have new findings or any questions, please feel free to reach me.

Thanks,
Qiuping

@Whiteleaf3er
Copy link
Author

Hi Qiuping,

Thank you for your reply! Yes, I used the sample configuration for testing, in which fakeIO=1.
Whether for CD-LRU or Austere, the main throughput bottleneck lies in update_index; for Austere especially, the time overhead of update_index is high. I will try to optimize it. Thank you very much for open-sourcing your work.

best wishes,
Whiteleaf3er
