Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build(deps): Upgrade OpenDAL to 0.47 #4224

Merged
merged 10 commits into from
Jul 1, 2024
Merged

Conversation

tisonkun
Copy link
Contributor

@tisonkun tisonkun commented Jun 27, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Upgrade OpenDAL to 0.47.1.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • Refactor
    • Improved the efficiency and flexibility of cache handling by making LruCacheLayer and LruCacheAccess generic over a type that implements Access.
    • Renamed read_cache_stat to cache_stat for consistency.
    • Updated various method signatures to enhance performance and reliability.

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun requested review from evenyag, v0y4g3r, waynexia and a team as code owners June 27, 2024 21:31

This comment was marked as spam.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jun 27, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b6585e3 and 6e324fc.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (12)
  • src/common/datasource/src/file_format.rs (1 hunks)
  • src/common/datasource/src/file_format/csv.rs (1 hunks)
  • src/common/datasource/src/file_format/json.rs (1 hunks)
  • src/common/datasource/src/file_format/parquet.rs (2 hunks)
  • src/mito2/src/cache/write_cache.rs (1 hunks)
  • src/mito2/src/sst/index/applier.rs (2 hunks)
  • src/mito2/src/sst/index/store.rs (1 hunks)
  • src/object-store/Cargo.toml (1 hunks)
  • src/object-store/src/layers/lru_cache.rs (2 hunks)
  • src/object-store/src/layers/lru_cache/read_cache.rs (8 hunks)
  • src/object-store/src/layers/prometheus.rs (2 hunks)
  • src/operator/src/statement/copy_table_from.rs (2 hunks)
Files skipped from review due to trivial changes (1)
  • src/object-store/src/layers/prometheus.rs
Additional comments not posted (11)
src/object-store/Cargo.toml (1)

20-20: Dependency version updated successfully.

The opendal dependency has been updated to version "0.47". Ensure that all features listed under this dependency are compatible with the new version.

src/common/datasource/src/file_format.rs (1)

152-154: Enhanced asynchronous handling in open_with_decoder.

The function now properly handles asynchronous operations with .await and improved error mapping. This aligns with Rust's best practices for asynchronous programming and error handling.

src/common/datasource/src/file_format/json.rs (1)

90-97: Improved error handling in JsonFormat.

The addition of .context(error::ReadObjectSnafu { path })? after asynchronous reader operations provides better error context, enhancing maintainability and debuggability of the code.

src/common/datasource/src/file_format/parquet.rs (1)

55-62: Consistent and robust error handling in ParquetFormat.

The addition of .context(error::ReadObjectSnafu { path })? after asynchronous operations is a significant enhancement for error handling, ensuring more reliable and maintainable code.

src/mito2/src/sst/index/store.rs (1)

70-72: Enhanced asynchronous operation and error handling in reader method.

The changes introduce proper awaiting of asynchronous operations and improve error handling by using .context(OpenDalSnafu). This ensures that any errors during file operations are caught and handled appropriately, enhancing the robustness of the method.

src/mito2/src/sst/index/applier.rs (1)

128-140: Refined asynchronous read operation in cached_puffin_reader.

The changes here enhance the asynchronous read operation by properly awaiting the operation and adding context-specific error handling. This improvement is crucial for robust error reporting and handling, ensuring that errors are not only caught but also described with sufficient context.

src/common/datasource/src/file_format/csv.rs (1)

172-179: Improved asynchronous operation and error handling in infer_schema.

The changes introduce proper awaiting of asynchronous operations and enhance error handling using .context(error::ReadObjectSnafu). This ensures that any errors during file operations are caught and handled appropriately, enhancing the robustness and reliability of the schema inference process.

src/object-store/src/layers/lru_cache/read_cache.rs (1)

164-186: Refined read operation handling in read method.

The changes here improve how read operations are handled by properly awaiting asynchronous operations and introducing a boxed reader, which is crucial for handling polymorphism in Rust effectively. This ensures that the read operation is not only asynchronous but also adaptable to different reader implementations.

src/mito2/src/cache/write_cache.rs (1)

191-193: Enhanced asynchronous read and error handling in upload method.

The changes introduce proper awaiting of asynchronous operations and enhance error handling using .context(error::OpenDalSnafu). This ensures that any errors during file operations are caught and handled appropriately, enhancing the robustness of the upload process.

src/operator/src/statement/copy_table_from.rs (2)

159-160: Enhanced error handling in asynchronous file reading operations.

The addition of .await and .context for error handling in the collect_metadata method under the Parquet format case is a significant improvement. This ensures that any errors during the file reading operation are properly handled and contextualized, which aligns with the PR's objectives to enhance error handling mechanisms.


306-307: Consistent application of enhanced error handling in asynchronous operations.

Similar to the previous comment, the use of .await and .context in the build_read_stream method for the Parquet case is commendable. This consistency in applying enhanced error handling across different parts of the file ensures that the system's robustness and reliability are maintained, especially when dealing with asynchronous operations.

src/object-store/src/layers/lru_cache.rs Outdated Show resolved Hide resolved
Signed-off-by: tison <wander4096@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6e324fc and c4025fb.

Files selected for processing (1)
  • src/mito2/src/sst/index/applier.rs (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/mito2/src/sst/index/applier.rs

@Xuanwo
Copy link
Contributor

Xuanwo commented Jun 28, 2024

The experience of implementing caching in OpenDAL layer is not cool. I plan to make it easier in the future.

* Fix cache for 0471

Signed-off-by: Xuanwo <github@xuanwo.io>

* Make clippy happy

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c4025fb and 7867276.

Files selected for processing (4)
  • src/datanode/src/store.rs (3 hunks)
  • src/object-store/src/layers/lru_cache.rs (4 hunks)
  • src/object-store/src/layers/lru_cache/read_cache.rs (7 hunks)
  • src/object-store/tests/object_store_test.rs (4 hunks)
Additional comments not posted (19)
src/object-store/src/layers/lru_cache.rs (3)

Line range hint 36-51: Ensure correctness of async operations and error handling.

The new function initializes the LruCacheLayer with a ReadCache and then recovers the cache. Verify the correctness of the async operations and error handling.


Line range hint 78-119: Ensure correctness and thread safety of async and blocking operations.

The LayeredAccess implementation for LruCacheAccess defines various async and blocking operations. Verify the correctness and thread safety of these operations.


Line range hint 59-65: Ensure correctness of cloning operation.

The Layer implementation for LruCacheLayer defines the LayeredAccess type and the layer function. Verify the correctness of the cloning operation.

src/datanode/src/store.rs (2)

Line range hint 32-44: Ensure correctness of async operations and error handling.

The new_object_store function initializes an object store based on the provided configuration and data home. Verify the correctness of the async operations and error handling.

Verification successful

Line range hint 32-44: Verified async operations and error handling.

The new_object_store function correctly handles async operations using await and ensures proper error handling with context.

  • Verified usage of await for async operations.
  • Verified usage of context for error handling.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the correctness of async operations and error handling in the `new_object_store` function.

# Test: Check for the usage of `await` and `context` in the `new_object_store` function.
rg 'await' src/datanode/src/store.rs
rg 'context' src/datanode/src/store.rs

Length of output: 854


112-114: Ensure correctness of async operations and error handling.

The create_object_store_with_cache function initializes a cache layer for the object store based on the provided configuration. Verify the correctness of the async operations and error handling.

Verification successful

Correctness of async operations and error handling verified.

The create_object_store_with_cache function correctly uses async operations and handles errors as expected.

  • The function is defined as async fn.
  • It uses await for async operations.
  • Error handling is done using the context method.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the correctness of async operations and error handling in the `create_object_store_with_cache` function.

# Test: Check for the usage of `await` and `context` in the `create_object_store_with_cache` function.
rg 'await' src/datanode/src/store.rs
rg 'context' src/datanode/src/store.rs

Length of output: 854



Script:

#!/bin/bash
# Extract the implementation of the `create_object_store_with_cache` function.
ast-grep --lang rust --pattern 'async fn create_object_store_with_cache($_) -> $_ { $$$ }' src/datanode/src/store.rs

Length of output: 118



Script:

#!/bin/bash
# Broaden the search to include both async and non-async functions for `create_object_store_with_cache`.
rg 'fn create_object_store_with_cache' src/datanode/src/store.rs

Length of output: 107



Script:

#!/bin/bash
# Extract the complete implementation of the `create_object_store_with_cache` function.
rg -A 20 'async fn create_object_store_with_cache' src/datanode/src/store.rs

Length of output: 875



Script:

#!/bin/bash
# Extract more lines to get the complete implementation of the `create_object_store_with_cache` function.
rg -A 50 'async fn create_object_store_with_cache' src/datanode/src/store.rs

Length of output: 2017



Script:

#!/bin/bash
# Extract more lines to get the complete implementation of the `create_object_store_with_cache` function.
rg -A 100 'async fn create_object_store_with_cache' src/datanode/src/store.rs

Length of output: 3721

src/object-store/src/layers/lru_cache/read_cache.rs (5)

Line range hint 81-252: Ensure correctness of async operations and error handling.

The ReadCache implementation defines various async operations for cache management. Verify the correctness of these operations and error handling.


72-79: Ensure correctness of cloning operation.

The Clone implementation for ReadCache clones the file cache and memory cache. Verify the correctness of the cloning operation.


Line range hint 302-312: Ensure correctness of test cases.

The test_can_cache function tests the can_cache function. Verify the correctness of the test cases.

Verification successful

Ensure correctness of test cases.

The test cases in the test_can_cache function are correct and consistent with the implementation of the can_cache function.

  • The function can_cache returns true if the path does not end with _last_checkpoint.
  • The test cases correctly cover both scenarios where the path ends and does not end with _last_checkpoint.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the correctness of test cases in the `test_can_cache` function.

# Test: Check for the usage of `assert!` in the `test_can_cache` function.
rg 'assert!' src/object-store/src/layers/lru_cache/read_cache.rs

Length of output: 394



Script:

#!/bin/bash
# Locate the implementation of the `can_cache` function to verify its correctness.
ast-grep --lang rust --pattern 'fn can_cache($_) { $$$ }'

Length of output: 59



Script:

#!/bin/bash
# Broaden search to locate the `can_cache` function across all Rust files.
rg 'fn can_cache' --type rust

Length of output: 118



Script:

#!/bin/bash
# Extract the implementation of the `can_cache` function from the identified file.
ast-grep --lang rust --pattern 'fn can_cache($_) { $$$ }' src/object-store/src/layers/lru_cache/read_cache.rs

Length of output: 111



Script:

#!/bin/bash
# Extract the implementation of the `can_cache` function from the identified file using sed.
sed -n '/fn can_cache/,/}/p' src/object-store/src/layers/lru_cache/read_cache.rs

Length of output: 198


55-60: Ensure correctness of cache key generation.

The read_cache_key function generates a unique cache key for the read path and range. Verify the correctness of the key generation.


298-299: Ensure correctness of reader conversion.

The to_output_reader function converts the input reader to the output reader. Verify the correctness of the conversion.

Verification successful

Verification successful.

The conversion in the to_output_reader function is correct. The type alias Reader is defined as Box<dyn ReadDyn>, making the use of Box::new(input.1) appropriate.

  • The to_output_reader function takes a tuple (RpRead, R) and returns (RpRead, Reader).
  • The second element of the tuple, input.1, is boxed into a Box<dyn ReadDyn>.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the correctness of reader conversion in the `to_output_reader` function.

# Test: Check for the usage of `Box::new` in the `to_output_reader` function.
rg 'Box::new' src/object-store/src/layers/lru_cache/read_cache.rs

Length of output: 98



Script:

#!/bin/bash
# Description: Extract the types of `input.0` and `input.1` in the `to_output_reader` function to verify the correctness of the conversion.

# Locate the `to_output_reader` function and extract its definition along with the types of `input.0` and `input.1`.
ast-grep --lang rust --pattern $'fn to_output_reader($_: ($_, $_)) -> $_ {
  $$$
}' src/object-store/src/layers/lru_cache/read_cache.rs

Length of output: 136



Script:

#!/bin/bash
# Description: Locate the `to_output_reader` function definition and extract its details to verify the correctness of the conversion.

# Search for the `to_output_reader` function definition in the specified file.
rg 'fn to_output_reader' src/object-store/src/layers/lru_cache/read_cache.rs -A 10

Length of output: 319



Script:

#!/bin/bash
# Description: Confirm the type of `Reader` to ensure the correctness of the conversion in the `to_output_reader` function.

# Search for the definition or type alias of `Reader` in the specified file.
rg 'type Reader' src/object-store/src/layers/lru_cache/read_cache.rs

Length of output: 70



Script:

#!/bin/bash
# Description: Confirm the type of `Reader` to ensure the correctness of the conversion in the `to_output_reader` function.

# Search for the definition or type alias of `Reader` in the entire repository.
rg 'type Reader' -t rust

Length of output: 946

src/object-store/tests/object_store_test.rs (9)

25-25: Import List and Read from opendal::raw::oio.

This change aligns with the updated opendal dependency.


26-26: Import Access, OpList, and OpRead from opendal::raw.

This change aligns with the updated opendal dependency.


232-232: Use builder.build() instead of ObjectStore::new(builder).unwrap().finish().

This change aligns with the updated opendal dependency and simplifies the code.


240-240: Use builder.build() instead of ObjectStore::new(builder).unwrap().finish().

This change aligns with the updated opendal dependency and simplifies the code.


245-247: Use OperatorBuilder to create the store with cache_layer.

This change aligns with the updated opendal dependency and improves the readability and maintainability of the code.


257-257: Add assert_lru_cache utility function.

This function enhances the test coverage for the LRU cache functionality.


263-286: Add assert_cache_files utility function.

This function enhances the test coverage for the LRU cache functionality.


315-315: Use file_cache.clone() instead of file_cache.

This change aligns with the updated opendal dependency and simplifies the code.


315-315: Use file_cache.clone() instead of file_cache.

This change aligns with the updated opendal dependency and simplifies the code.

src/object-store/src/layers/lru_cache.rs Show resolved Hide resolved
Copy link

codecov bot commented Jun 29, 2024

Codecov Report

Attention: Patch coverage is 76.42276% with 29 lines in your changes missing coverage. Please review.

Project coverage is 84.55%. Comparing base (e531326) to head (0834917).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4224      +/-   ##
==========================================
- Coverage   84.84%   84.55%   -0.29%     
==========================================
  Files        1049     1049              
  Lines      186267   186312      +45     
==========================================
- Hits       158029   157541     -488     
- Misses      28238    28771     +533     

Signed-off-by: tison <wander4096@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7867276 and c8b3204.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (2)
  • src/datanode/src/store.rs (3 hunks)
  • src/object-store/src/layers/lru_cache/read_cache.rs (7 hunks)
Files skipped from review as they are similar to previous changes (2)
  • src/datanode/src/store.rs
  • src/object-store/src/layers/lru_cache/read_cache.rs

@Xuanwo
Copy link
Contributor

Xuanwo commented Jun 29, 2024

coderabbitai doesn't look good to me, more like spam.

Copy link
Member

@WenyXu WenyXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c8b3204 and c6e9f12.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (2)
  • src/object-store/src/layers/lru_cache.rs (4 hunks)
  • src/object-store/src/layers/lru_cache/read_cache.rs (7 hunks)
Files skipped from review as they are similar to previous changes (2)
  • src/object-store/src/layers/lru_cache.rs
  • src/object-store/src/layers/lru_cache/read_cache.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c6e9f12 and 8e434db.

Files selected for processing (2)
  • src/object-store/src/layers/lru_cache.rs (4 hunks)
  • src/object-store/src/layers/lru_cache/read_cache.rs (7 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/object-store/src/layers/lru_cache.rs
Additional comments not posted (10)
src/object-store/src/layers/lru_cache/read_cache.rs (10)

55-60: LGTM!

The function read_cache_key now uses OpRead for generating the cache key. This change is consistent with the new structure.


65-67: LGTM!

The ReadCache struct is now generic over a type C that implements the Access trait. The Clone implementation has been added correctly. No issues found.

Also applies to: 72-79


Line range hint 81-109: LGTM!

The new function now accepts an Arc<C> instead of an Operator. The changes are consistent and correctly implemented.


120-123: LGTM!

The stat method has been renamed to cache_stat. The changes are consistent and correctly implemented.


Line range hint 145-165: LGTM!

The recover_cache method now uses OpList and OpStat. The changes are consistent and correctly implemented.


178-231: LGTM!

The read_from_cache method now returns a Box<dyn ReadDyn> instead of Arc<dyn ReadDyn>. The changes are consistent and correctly implemented.


234-252: LGTM!

The try_write_cache method now accepts a Reader instead of a Buffer. The changes are consistent and correctly implemented.


Line range hint 254-294: LGTM!

The read_remote method now accepts a Reader instead of a Buffer. The changes are consistent and correctly implemented.


298-299: LGTM!

The function to_output_reader has been updated to return a Box<dyn ReadDyn> instead of an Arc<dyn ReadDyn>. This change is consistent with the new structure.


Line range hint 303-312: LGTM!

The test cases for the can_cache function cover various scenarios and are correctly implemented.

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun enabled auto-merge July 1, 2024 16:50
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8e434db and 0834917.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (1)
  • src/object-store/src/layers/lru_cache.rs (4 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/object-store/src/layers/lru_cache.rs

@tisonkun tisonkun added this pull request to the merge queue Jul 1, 2024
Merged via the queue into GreptimeTeam:main with commit 2665616 Jul 1, 2024
52 checks passed
@tisonkun tisonkun deleted the opendal-0471 branch July 1, 2024 17:23
v0y4g3r pushed a commit to v0y4g3r/greptimedb that referenced this pull request Jul 3, 2024
* catch up changes

Signed-off-by: tison <wander4096@gmail.com>

* fmt

Signed-off-by: tison <wander4096@gmail.com>

* Fix cache for 0471 (GreptimeTeam#7)

* Fix cache for 0471

Signed-off-by: Xuanwo <github@xuanwo.io>

* Make clippy happy

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* tidy

Signed-off-by: tison <wander4096@gmail.com>

* use opendal's exported type

Signed-off-by: tison <wander4096@gmail.com>

* clippy

Signed-off-by: tison <wander4096@gmail.com>

* fmt

Signed-off-by: tison <wander4096@gmail.com>

---------

Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Xuanwo <github@xuanwo.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants