Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs-index: Track API #85

Merged
merged 7 commits into from
Sep 3, 2024
Merged

fs-index: Track API #85

merged 7 commits into from
Sep 3, 2024

Conversation

tareknaser
Copy link
Collaborator

@tareknaser tareknaser commented Aug 10, 2024

Description

This PR adds a new method to the ResourceIndex in the fs-index crate. The added method, update_one, allows the index to be updated with the latest information from the file system for a single resource. It handles updates for resources that have been added, removed, or modified.

Fixes #84

Changes

  • New Method: ResourceIndex::update_one
  • Added Tests:
    • test_track_removal
    • test_track_modification
    • test_track_removal_with_collision
    • test_track_addition
    • test_track_modification_with_collision
    • test_track_addition_with_collision
  • Added Benchmarks:
    • index_update_one

Copy link

Benchmark for c2371c5

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 251.5±3.10µs 253.0±1.26µs +0.60%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.08µs 15.7±0.18µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1351.2±18.31ns 1345.5±4.14ns -0.42%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.3±0.66µs 197.6±0.56µs +0.15%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1773.2±5.55µs 1783.2±18.68µs +0.56%
crc32_resource_id_creation/compute_from_bytes:large 91.9±1.32µs 92.1±0.34µs +0.22%
crc32_resource_id_creation/compute_from_bytes:medium 5.7±0.01µs 5.7±0.06µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 96.7±0.44ns 96.8±0.54ns +0.10%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 64.8±0.35µs 68.2±0.58µs +5.25%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 961.5±15.87µs 953.8±3.66µs -0.80%
resource_index/index_build//tmp/ark-fs-index-benchmarks5tAZqO 109.5±1.86ms N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarkssKpmwg 108.4±2.69ms N/A N/A
resource_index/index_get_resource_by_id 95.2±0.93ns 98.5±0.48ns +3.47%
resource_index/index_get_resource_by_path 48.2±0.29ns 54.6±0.76ns +13.28%
resource_index/index_update_all 1091.1±34.55ms 1100.1±40.04ms +0.82%

fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
Copy link

Benchmark for 95ad4a0

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 251.1±1.71µs 248.6±1.01µs -1.00%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.05µs 15.7±0.27µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1384.8±7.61ns 1380.7±20.55ns -0.30%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.2±0.44µs 197.6±0.55µs +0.20%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1747.4±7.78µs 1752.7±21.17µs +0.30%
crc32_resource_id_creation/compute_from_bytes:large 86.9±0.27µs 86.8±0.71µs -0.12%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.05µs 5.4±0.02µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.9±0.22ns 93.1±0.79ns +0.22%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.1±0.21µs 65.8±2.95µs +1.08%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 983.5±6.18µs 990.6±30.84µs +0.72%
resource_index/index_build//tmp/ark-fs-index-benchmarksEFluuv 107.6±1.54ms N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksW9sZ4f 108.5±2.81ms N/A N/A
resource_index/index_get_resource_by_id 98.2±1.03ns 97.1±0.97ns -1.12%
resource_index/index_get_resource_by_path 52.5±0.19ns 52.6±0.29ns +0.19%
resource_index/index_update_all 1103.8±29.80ms 1151.4±45.51ms +4.31%

@tareknaser tareknaser marked this pull request as ready for review August 27, 2024 17:31
Copy link

Benchmark for f653836

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 249.8±0.98µs 249.0±1.14µs -0.32%
blake3_resource_id_creation/compute_from_bytes:medium 15.6±0.07µs 15.7±0.09µs +0.64%
blake3_resource_id_creation/compute_from_bytes:small 1370.1±8.87ns 1386.1±5.77ns +1.17%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.3±0.54µs 197.5±0.69µs +0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1744.0±9.17µs 1752.7±19.67µs +0.50%
crc32_resource_id_creation/compute_from_bytes:large 86.7±0.39µs 86.9±0.36µs +0.23%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.08µs 5.4±0.01µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.9±0.24ns 92.9±0.26ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.2±0.16µs 64.9±0.30µs -0.46%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 996.8±14.09µs 993.3±7.19µs -0.35%
resource_index/index_build//tmp/ark-fs-index-benchmarks3iyrap 112.7±1.85ms N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarks5pHxeV 111.3±0.84ms N/A N/A
resource_index/index_get_resource_by_id 112.6±0.74ns 99.5±0.37ns -11.63%
resource_index/index_get_resource_by_path 53.8±0.28ns 55.0±0.48ns +2.23%
resource_index/index_update_all 1131.1±36.64ms 1127.5±48.74ms -0.32%
resource_index/index_update_one 508.7±16.94ms N/A N/A

@@ -10,6 +10,7 @@ The most important struct in this crate is `ResourceIndex` which comes with:

- **Reactive API**
- `update_all`: Method to update the index by rescanning files and returning changes (additions/deletions).
- `update_one`: Method to update the index by rescanning a single file (addition/deletion/modification).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be mentioned in other section. "Reactive" means that we have a stream of updates and we react on these updates, even if we manually poll the stream. But this "cherry-pick" is completely manual approach, not reactive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. What about the term "Selective API"?

fs-index/src/index.rs Outdated Show resolved Hide resolved
@@ -666,3 +666,245 @@ fn test_update_all_files_with_same_hash() {
assert_eq!(index.collisions().len(), 1, "{:?}", index);
});
}

/// Simple test for tracking a single resource addition.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the arklib PR we also tried to implement an automated test scenario, where we'd test one specific invariant that "any sequence of update_one calls must be equivalent to update_all on the full data". My idea how to approach it is something like:

  1. Start with empty directory.
  2. Generate an update, randomly one of 3 kinds (modification and deletion could be generated only when we have some files).
  3. Call update_one on the updated file.
  4. In isolation, build new index from the current state of the directory.
  5. Compare outputs of step 3 and step 4. Must be equal.

This is not exactly a unit test, because it's slow and tests multiple corner cases in one scenario. But I think it must be useful to run it periodically. By the way, I think that Ishan and Pushkar do in fs-storage with property-based testing should reflect the same idea.

Do you think it's feasible to implement in this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a plan.
We could create multiple tests to verify this as well.
To do this properly and cleanly, we’ll likely need to set up some helper functions.

Given the scope, I think this should be handled in its own PR. This current PR is blocking #36, so once it's merged, I can add the additional tests along with the index watch API.

What do you think?

Copy link

github-actions bot commented Sep 1, 2024

Benchmark for ac3828d

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 249.3±1.06µs 250.6±0.87µs +0.52%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.02µs 15.7±0.05µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1375.7±17.36ns 1391.4±10.58ns +1.14%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 198.3±3.82µs 198.3±6.57µs 0.00%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1739.7±3.91µs 1748.7±23.97µs +0.52%
crc32_resource_id_creation/compute_from_bytes:large 86.9±1.12µs 86.9±0.82µs 0.00%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.03µs 5.4±0.02µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 93.0±0.44ns 93.0±0.22ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.0±0.32µs 65.7±4.19µs +1.08%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 983.7±2.83µs 983.2±4.01µs -0.05%
resource_index/index_build//tmp/ark-fs-index-benchmarks9KGeK6 108.3±2.56ms N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarkspves2w 105.7±1.86ms N/A N/A
resource_index/index_get_resource_by_id 96.8±0.44ns 97.0±0.41ns +0.21%
resource_index/index_get_resource_by_path 53.4±0.33ns 53.2±0.24ns -0.37%
resource_index/index_update_all 1099.1±25.96ms 1098.0±36.31ms -0.10%
resource_index/index_update_one 659.0±21.32ms N/A N/A

Copy link
Member

@kirillt kirillt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great 👍

Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
@tareknaser tareknaser merged commit 56343b4 into main Sep 3, 2024
4 checks passed
@tareknaser tareknaser deleted the fs_index_track_API branch September 3, 2024 16:18
Copy link

github-actions bot commented Sep 3, 2024

Benchmark for 52515f0

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 247.8±0.99µs 247.6±1.08µs -0.08%
blake3_resource_id_creation/compute_from_bytes:medium 15.5±0.15µs 15.5±0.07µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1366.4±23.70ns 1351.8±4.71ns -1.07%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.4±0.55µs 197.7±0.72µs +0.15%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1764.6±18.36µs 1772.4±39.04µs +0.44%
crc32_resource_id_creation/compute_from_bytes:large 86.7±0.23µs 86.7±0.17µs 0.00%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.01µs 5.4±0.05µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.3±0.13ns 92.3±0.23ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 64.6±0.29µs 65.0±0.17µs +0.62%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 957.0±15.95µs 952.1±2.78µs -0.51%
resource_index/index_build//tmp/ark-fs-index-benchmarksMREvgy 114.6±1.05ms N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksXtEFlT 112.4±1.10ms N/A N/A
resource_index/index_get_resource_by_id 97.0±0.78ns 96.6±0.45ns -0.41%
resource_index/index_get_resource_by_path 53.0±1.17ns 53.5±0.38ns +0.94%
resource_index/index_update_all 1118.2±38.69ms 1132.8±54.47ms +1.31%
resource_index/index_update_one 696.4±26.04ms 682.9±30.14ms -1.94%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fs-index: Track API
2 participants