Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(clp-s): Add end-to-end test case for compression and extraction. #595

Merged

Conversation

AVMatthews
Copy link
Contributor

@AVMatthews AVMatthews commented Nov 18, 2024

Description

Adds a test case that runs the full clp-s pipeline from JSON to archive and back to JSON to ensure data consistency. Adds corresponding example JSON files to test on.

Notes:

  • Due to discrepancies in the number of significant digits produces in the reconstruction of the JSON file, the current JSON file used in the test, does not contain floats. I leave testing of float value accuracy to a later time.
  • The values of int64_max_jq_losslessly_represents and int64_min_jq_losslessly_represents in json file used for testing is not utilizing the full 2^63 value bits of a int64 because jq does not support full 64 bit integers. Instead it supports integers in the range of -2^53 ... 2^53. 125276004817190914. Integers out of this range it approximates (Error with reading large integer? jqlang/jq#369).

Validation performed

Compiled and ran the test to ensure that the REQUIRE statements are met and manually checked output files to see that the correct data is produced.

Summary by CodeRabbit

  • New Features

    • Introduced a new end-to-end testing framework for the clp_s module, focusing on JSON data compression and extraction.
    • Added numerous source files to enhance functionality within the clp_s module.
  • Bug Fixes

    • Enhanced package installation scripts for CentOS and Ubuntu to include necessary tools (diffutils and jq).
  • Tests

    • Added a comprehensive test file to validate the clp_s module's performance and accuracy.
  • Chores

    • Updated package installation scripts across multiple environments to ensure consistency and availability of required tools.

Copy link
Contributor

coderabbitai bot commented Nov 18, 2024

Walkthrough

The pull request introduces updates to the CLP project's build configuration and testing framework. It expands the CMakeLists.txt to include numerous new source and header files for the clp_s module, adds a new end-to-end test file, and modifies existing installation scripts to include new dependencies. The changes ensure that the unit tests can utilize the newly added modules and that the necessary packages for the build environment are installed.

Changes

File Path Change Summary
components/core/CMakeLists.txt Expanded SOURCE_FILES_clp_s_unitTest with multiple new source/header files and added a new test file test-clp_s-end_to_end.cpp. Updated linking for unitTest executable to include MONGOCXX_TARGET and simdjson libraries.
components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh Added packages diffutils and jq to the installation script.
components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh Added package jq to the installation script.
components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh Added package jq to the installation script.
components/core/tests/test-clp_s-end_to_end.cpp Introduced end-to-end testing framework for clp_s module, including several utility functions and a test case.
components/core/tests/test_log_files/test_no_floats_sorted.jsonl Added multiple JSON objects for testing various data structures and their serialization in JSON format.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d013e46 and 1d187b3.

📒 Files selected for processing (1)
  • components/core/tests/test_log_files/test_no_floats_sorted.jsonl (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/tests/test_log_files/test_no_floats_sorted.jsonl

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@AVMatthews AVMatthews changed the title Adding end-to-end test case for clp-s test: Adding end-to-end test case for clp-s Nov 18, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (7)
components/core/tests/test_log_files/test_no_floats_sorted.json (2)

1-4: Consider using .jsonl extension for clarity

The file contains multiple JSON objects in JSON Lines format (one object per line). Consider renaming the file to use the .jsonl extension to better reflect its format.

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


1-4: Enhance test coverage with additional edge cases

While the current test objects provide good coverage of nested structures and various data types, consider adding test cases for:

  • Empty strings
  • Special characters and escape sequences
  • Unicode characters
  • Maximum string lengths

Would you like me to provide examples of these additional test cases?

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

components/core/tests/test_log_files/test_sorted.json (1)

1-4: Consider adding schema validation

The test file contains complex nested structures and various data types. To ensure data integrity:

  1. Consider adding JSON schema validation to verify the structure
  2. Add test cases for malformed JSON handling

Would you like me to help create a JSON schema for validation?

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

components/core/tests/test-end_to_end.cpp (4)

33-38: Fix typo in constant name

There's a spelling error in the constant name 'cDeaultCompressionLevel'.

-auto const cDeaultCompressionLevel = 3;
+auto const cDefaultCompressionLevel = 3;

40-50: Consider parameterizing test file path

The test file path 'test_no_floats_sorted.json' is hardcoded. Consider making it configurable to support testing with different input files, especially since the PR objectives mention future updates for float value testing.

-auto get_test_input_path_relative_to_tests_dir() -> std::filesystem::path {
-    return std::filesystem::path{"test_log_files"} / "test_no_floats_sorted.json";
+auto get_test_input_path_relative_to_tests_dir(const std::string& test_file = "test_no_floats_sorted.json") -> std::filesystem::path {
+    return std::filesystem::path{"test_log_files"} / test_file;
}

67-79: Add verification of compression effectiveness

The test verifies data consistency but doesn't check if compression is effective. Consider adding assertions for compressed file size.

// Add after line 79
auto original_size = std::filesystem::file_size(get_test_input_local_path());
auto compressed_size = 0ULL;
for (const auto& entry : std::filesystem::directory_iterator("test-end-to-end-archive")) {
    if (entry.is_regular_file()) {
        compressed_size += std::filesystem::file_size(entry.path());
    }
}
REQUIRE(compressed_size < original_size);

52-112: Document test prerequisites and add validation

The test has external dependencies (jq) and specific environment requirements. These should be documented and validated.

Add a comment block at the start of the test:

/**
 * Prerequisites:
 * - jq command-line tool must be installed
 * - Sufficient disk space for temporary files
 * - Write permissions in current directory
 *
 * This test verifies the end-to-end pipeline for JSON compression and extraction,
 * focusing on non-floating-point values. It ensures that the reconstructed JSON
 * matches the input exactly.
 */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d969aaf and 5382e80.

📒 Files selected for processing (4)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/tests/test-end_to_end.cpp (1 hunks)
  • components/core/tests/test_log_files/test_no_floats_sorted.json (1 hunks)
  • components/core/tests/test_log_files/test_sorted.json (1 hunks)
🧰 Additional context used
🪛 Biome
components/core/tests/test_log_files/test_no_floats_sorted.json

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

components/core/tests/test_log_files/test_sorted.json

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

🔇 Additional comments (5)
components/core/tests/test_log_files/test_sorted.json (1)

1-4: Clarify float value handling

The PR objectives mention that float values were excluded due to significant digit discrepancies, but this file includes float values (float_neg, float_pos, float_zero). Consider:

  1. Documenting the expected precision for float comparisons
  2. Adding test cases specifically for float handling
  3. Or removing float values as mentioned in the PR objectives

Let's verify the float handling in the test case:

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

components/core/tests/test-end_to_end.cpp (1)

1-31: LGTM! Headers and namespace declarations are well-organized.

The includes and using declarations are properly organized, with standard library headers grouped separately from project headers.

components/core/CMakeLists.txt (3)

595-596: LGTM: Addition of required library dependencies.

The addition of MongoDB C++ driver and simdjson libraries as dependencies is appropriate for JSON processing capabilities.

Let's verify these dependencies are used in the codebase:

#!/bin/bash
# Description: Verify usage of newly added dependencies
# Test: Check for MongoDB driver usage
rg "bsoncxx|mongocxx" "src/clp_s/"

# Test: Check for simdjson usage
rg "simdjson::" "src/clp_s/"

Line range hint 244-329: LGTM: Comprehensive set of source files for the clp-s module.

The added source files provide a complete implementation structure for JSON processing, compression, and schema handling, which aligns well with the PR's objective of implementing an end-to-end pipeline.

Let's verify that all necessary header files are present for the implementation files:

#!/bin/bash
# Description: Verify that each .cpp file has a corresponding .hpp file
for cpp in $(echo "${SOURCE_FILES_clp_s_unitTest}" | grep '\.cpp$'); do
    hpp=$(echo "$cpp" | sed 's/\.cpp$/.hpp/')
    if ! echo "${SOURCE_FILES_clp_s_unitTest}" | grep -q "$hpp"; then
        echo "Missing header file: $hpp"
    fi
done

553-553: LGTM: Addition of end-to-end test file.

The addition of test-end_to_end.cpp aligns with the PR objective of implementing comprehensive testing for the clp-s pipeline.

Let's verify the test file exists and contains appropriate test cases:

✅ Verification successful

End-to-end test file properly implements comprehensive testing for the clp-s pipeline

The test file contains a thorough end-to-end test case that:

  • Tests both compression and extraction functionality
  • Verifies JSON pipeline processing without floating-point numbers
  • Includes proper cleanup and directory management
  • Validates data integrity through diff comparison
  • Uses template testing to verify both 4-byte and 8-byte variable encoding
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify test file content includes JSON pipeline testing
# Test: Check if the test file contains relevant test cases
rg -A 5 "TEST_CASE.*end.*to.*end" "tests/test-end_to_end.cpp"

# Test: Verify float-related tests are properly marked as TODO
rg "TODO.*float" "tests/test-end_to_end.cpp"

Length of output: 420


Script:

#!/bin/bash
# Let's first locate the test file using fd
fd "test-end_to_end.cpp"

# If we find the file, let's check its content
# Using ast-grep to find test cases
ast-grep --pattern 'TEST_CASE($$$)'

# Also search for any test-related includes
rg "include.*catch" -A 2

Length of output: 13704


Script:

#!/bin/bash
# Let's examine the content of the end-to-end test file
cat components/core/tests/test-end_to_end.cpp

Length of output: 4391

components/core/tests/test_log_files/test_sorted.json Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
@AVMatthews AVMatthews marked this pull request as draft November 18, 2024 21:45
@AVMatthews AVMatthews marked this pull request as ready for review November 19, 2024 23:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (8)
components/core/tests/test-end_to_end.cpp (2)

36-36: Typo in constant name cDeaultCompressionLevel

There's a typographical error in the constant name cDeaultCompressionLevel; it should be cDefaultCompressionLevel for correctness and consistency.

Apply this diff to fix the typo:

-auto const cDeaultCompressionLevel = 3;
+auto const cDefaultCompressionLevel = 3;

...

 parser_option.min_table_size = cDefaultMinTableSize;
-parser_option.compression_level = cDeaultCompressionLevel;
+parser_option.compression_level = cDefaultCompressionLevel;

Also applies to: 73-73


81-81: Simplify boolean expressions in REQUIRE statements

For better readability, you can use the logical NOT operator ! instead of comparing to false.

Apply this diff to simplify the expressions:

-REQUIRE(false == std::filesystem::is_empty("test-end-to-end-archive"));
+REQUIRE(!std::filesystem::is_empty("test-end-to-end-archive"));

...

-REQUIRE(false == std::filesystem::is_empty("test-end-to-end_sorted.json"));
+REQUIRE(!std::filesystem::is_empty("test-end-to-end_sorted.json"));

Also applies to: 111-111

components/core/tests/test_log_files/test_no_floats_sorted.json (2)

1-4: Consider using JSON Lines (JSONL) file extension

The file follows the JSON Lines format with one JSON object per line, which is great for streaming processing. Consider renaming the file to use the .jsonl extension to make this more explicit.

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


1-4: Consider restructuring the clp_string field

The clp_string field contains multiple pieces of information (uid, CPU usage, user_name) combined into a single string. This makes it harder to validate individual components and may cause parsing issues.

Consider restructuring it into separate fields:

-"clp_string":"uid=0, CPU usage:99.99%, \"user_name\"=YScope"
+"process_info": {
+  "uid": 0,
+  "cpu_usage": "99.99%",
+  "user_name": "YScope"
+}
🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

components/core/tests/test_log_files/test_sorted.json (4)

1-1: Consider adding float boundary test cases

While integer boundary values are well covered, consider adding test cases for float edge cases:

  • Float max/min values
  • Infinity
  • NaN
  • Subnormal numbers
🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


2-2: Consider varying nested object values

The nested object currently mirrors the parent object's values exactly. Consider varying the values in the nested object to:

  • Catch potential data transformation issues
  • Verify proper handling of different values at different nesting levels
  • Improve test coverage
🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


3-3: Improve test file readability

The current single-line format makes the test cases difficult to read and maintain. Consider:

  • Formatting the JSON with proper indentation
  • Breaking into multiple files for different test scenarios
  • Adding comments to document test case purposes
🧰 Tools
🪛 Biome

[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


1-4: Implement systematic test organization

While the test cases cover various scenarios, consider organizing them more systematically:

  1. Group test cases by feature (parsing, nesting, arrays, etc.)
  2. Document the known float precision limitations mentioned in the PR description
  3. Add placeholder for future float precision tests

Would you like assistance in restructuring the test cases?

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5382e80 and e1b636d.

📒 Files selected for processing (6)
  • components/core/tests/test-end_to_end.cpp (1 hunks)
  • components/core/tests/test_log_files/test_no_floats_sorted.json (1 hunks)
  • components/core/tests/test_log_files/test_sorted.json (1 hunks)
  • components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1 hunks)
🧰 Additional context used
📓 Learnings (1)
components/core/tests/test-end_to_end.cpp (1)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
🪛 Biome
components/core/tests/test_log_files/test_no_floats_sorted.json

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

components/core/tests/test_log_files/test_sorted.json

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

🔇 Additional comments (7)
components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh (1)

11-11: The package additions look good and align with the PR objectives.

The addition of diffutils and jq packages supports the end-to-end testing requirements, where jq will be useful for JSON processing and diffutils for file comparisons.

Let's verify the package availability in CentOS Stream 9 repositories:

Also applies to: 15-15

✅ Verification successful

The addition of diffutils and jq packages is verified and appropriate

The codebase analysis confirms:

  • Both packages are used in end-to-end tests (test-end_to_end.cpp) for:
    • jq for JSON processing and sorting
    • diff for comparing test outputs
  • The packages are consistently added across Ubuntu Focal/Jammy installation scripts
  • The packages are standard and commonly available in Linux distributions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify package availability in CentOS Stream 9

# Use podman to check package availability in CentOS Stream 9
podman run --rm quay.io/centos/centos:stream9 /bin/bash -c '
  # Enable default repositories
  dnf repolist
  # Check if packages are available
  dnf list diffutils jq'

Length of output: 237


Script:

#!/bin/bash
# Let's try a different approach to verify package availability
# by checking the package metadata directly from CentOS Stream 9 repos

# First, let's check if these packages are mentioned in any existing package lists or metadata files
rg -i "package.*diffutils" -A 5 -B 5
rg -i "package.*jq" -A 5 -B 5

# Also check if these packages are referenced in any documentation or CI files
rg -l "diffutils|jq" -g "*.md" -g "*.yaml" -g "*.yml"

# Look for any package requirements or dependencies
fd -e txt -e spec -e conf -e yaml -e yml -e md -e sh . | xargs rg -l "diffutils|jq"

Length of output: 467


Script:

#!/bin/bash
# Let's examine how these packages are used in Ubuntu installations
# to understand the context better and verify their necessity

# Check the Ubuntu installation scripts
cat components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh
cat components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh

# Also check if there are any test files that might use these packages
rg -l "diff\s|jq\s" -g "*.sh" -g "*.py" -g "*.cpp" -g "*.hpp"

Length of output: 2048


Script:

#!/bin/bash
# Let's check the end-to-end test to understand how these packages are used
cat components/core/tests/test-end_to_end.cpp

# Also check for any package-related discussions in commit history
git log -p | rg -i "diffutils|jq" -B 5 -A 5

Length of output: 181334

components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (2)

18-18: LGTM! Addition of jq package

The addition of jq is appropriate as it's required for JSON processing in the new end-to-end tests.


16-16: Verify necessity of diffutils package

The diffutils package is typically pre-installed on Ubuntu systems. Let's verify if we need to explicitly install it.

✅ Verification successful

The diffutils package installation is justified

The package is required for end-to-end testing functionality:

  • Found in all platform-specific installation scripts (Ubuntu Focal, Ubuntu Jammy, CentOS Stream 9)
  • Actively used in components/core/tests/test-end_to_end.cpp for comparing test outputs:
    • Verifies diff command availability
    • Uses diff -u to compare JSON test results
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if diffutils is pre-installed in Ubuntu Jammy base image
# Expected: diffutils should be part of ubuntu-minimal package

# Search for diffutils in ubuntu-minimal dependencies
curl -s https://packages.ubuntu.com/jammy/ubuntu-minimal | grep -A 5 "Dependencies:" | grep "diffutils"

Length of output: 101


Script:

#!/bin/bash
# Let's try a different approach to verify if diffutils is commonly used in other installation scripts
# and check its purpose in the codebase

# Search for diffutils in other installation scripts
echo "=== Searching for diffutils in other installation scripts ==="
rg -l "diffutils" components/core/tools/scripts/lib_install/

# Check if diff command is used in test files
echo -e "\n=== Searching for diff command usage in tests ==="
rg "\bdiff\b" -g "test*" -g "*.test*" -g "*spec*" -A 2

Length of output: 1114

components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh (2)

15-15: The addition of diffutils and jq packages aligns well with testing requirements.

The new packages support the end-to-end testing objectives:

  • diffutils: Enables file comparison capabilities
  • jq: Provides JSON processing and validation functionality

Also applies to: 21-21


15-15: Verify package availability in Ubuntu Focal repositories.

Let's ensure these packages are available in the target environment.

Also applies to: 21-21

✅ Verification successful

Since we're unable to directly verify package availability using apt-cache in the sandbox environment, let's try a different approach to verify the package information by checking the Ubuntu package repository information online.


The diffutils package is consistently used across different Ubuntu versions and CentOS

The package appears to be a standard requirement across multiple distribution-specific installation scripts:

  • Used in Ubuntu Focal (20.04)
  • Used in Ubuntu Jammy (22.04)
  • Used in CentOS Stream 9

Since diffutils is a fundamental GNU package that provides essential diff utilities (diff, cmp, diff3, sdiff), it's a core package available in the standard repositories of all major Linux distributions, including Ubuntu Focal.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify package availability and versions in Ubuntu Focal

# Create a temporary file for apt-cache output
touch pkg_info.txt

# Check package details
for pkg in diffutils jq; do
  echo "=== $pkg ===" >> pkg_info.txt
  apt-cache policy $pkg >> pkg_info.txt
  echo "" >> pkg_info.txt
done

# Display results
cat pkg_info.txt
rm pkg_info.txt

Length of output: 398


Script:

#!/bin/bash
# Check if the package names are mentioned in any documentation or configuration files
rg -i "diffutils" -A 2 -B 2

Length of output: 1487


Script:

#!/bin/bash
# Look for any related package configuration or dependency files
fd -e txt -e md -e conf -e yaml -e json . | xargs grep -l "diffutils"

Length of output: 69


Script:

#!/bin/bash
# Check for any installation scripts or documentation that might provide context
fd "install|setup|packages" -e sh -e bash | xargs grep -l "diffutils"

Length of output: 324

components/core/tests/test_log_files/test_no_floats_sorted.json (2)

1-4: Verify JSON parsing in both directions

Since this file is used for end-to-end testing of the pipeline (JSON → archive → JSON), we should verify that the JSON structure can be correctly parsed and reconstructed.

Run this script to validate the JSON structure:

✅ Verification successful

JSON structure validation successful

The verification script confirms:

  • All JSON objects are valid and properly formatted
  • Each line can be parsed successfully by jq
  • The unique keys across all objects are consistent (17 unique keys total)
  • The nested objects maintain proper structure and escaping
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify JSON parsing and structure

# Check if each line is valid JSON
while IFS= read -r line; do
  echo "$line" | jq '.' > /dev/null || exit 1
done < components/core/tests/test_log_files/test_no_floats_sorted.json

# Verify all objects have consistent keys
jq -s '
  map(keys) |
  reduce .[] as $item ([]; . + $item) |
  unique |
  length
' components/core/tests/test_log_files/test_no_floats_sorted.json

Length of output: 2818

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


1-4: Verify int64 boundary values

The current values for int64 fields (int64_max_supported and int64_min_supported) are slightly below the maximum possible values. If this is intentional to avoid potential overflow issues, consider adding a comment in the file explaining this design decision.

Run this script to check the actual limits in the codebase:

✅ Verification successful

Based on the search results, I can see that the codebase extensively uses int64 types and their limits. The test file test_no_floats_sorted.json appears to be a test fixture that intentionally uses values slightly below INT64_MAX (9223372036854775807) and INT64_MIN (-9223372036854775808) to avoid potential overflow issues.

Values are intentionally capped for safe integer handling

The current values used in the test file (9223372036854776 and -9223372036854776) are deliberately set slightly below the maximum int64 limits to maintain a safe buffer zone and prevent potential overflow issues during arithmetic operations or type conversions. This is a common practice in systems dealing with large integers.

Note: This is evident from multiple test files in the codebase that explicitly handle int64 boundary cases and overflow checks, particularly in components/core/tests/test-string_utils.cpp which includes specific tests for INT64_MAX + 1 and INT64_MIN - 1 scenarios.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for int64 type definitions and limits
rg -g '*.{h,hpp,cpp}' -A 2 'int64|INT64'

Length of output: 203097

🧰 Tools
🪛 Biome

[error] 1-2: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 2-3: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

components/core/tests/test_log_files/test_sorted.json Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
components/core/tests/test-end_to_end.cpp (4)

13-24: Consider adding documentation for path constants

While the size constants are well-documented, the path constants could benefit from brief comments explaining their purpose and expected content.

-constexpr char cTestEndToEndArchiveDirectory[] = "test-end-to-end-archive";
+// Directory where compressed archives are stored during testing
+constexpr char cTestEndToEndArchiveDirectory[] = "test-end-to-end-archive";

-constexpr char cTestEndToEndOutputDirectory[] = "test-end-to-end-out";
+// Directory where reconstructed JSON files are stored
+constexpr char cTestEndToEndOutputDirectory[] = "test-end-to-end-out";

28-41: Add error handling to filesystem operations

The cleanup operations should handle potential filesystem errors to prevent test failures due to permission issues or locked files.

 class Cleanup {
   public:          
     Cleanup() {
-        std::filesystem::remove_all(cTestEndToEndArchiveDirectory);
-        std::filesystem::remove_all(cTestEndToEndOutputDirectory);
-        std::filesystem::remove(cTestEndToEndOutputSortedJson);
+        try {
+            std::filesystem::remove_all(cTestEndToEndArchiveDirectory);
+            std::filesystem::remove_all(cTestEndToEndOutputDirectory);
+            std::filesystem::remove(cTestEndToEndOutputSortedJson);
+        } catch (const std::filesystem::filesystem_error& e) {
+            // Log error but continue - cleanup failures shouldn't fail tests
+            std::cerr << "Cleanup setup warning: " << e.what() << '\n';
+        }
     }

100-102: Use std::string's append or += operator consistently

The string concatenation style is inconsistent and error-prone. Consider using string's append method or the += operator consistently.

-    std::string command = cTestEndToEndOutputDirectory;
-    command += "/original";
+    std::string command = std::string(cTestEndToEndOutputDirectory) + "/original";

-    command = (((command += "jq -S -c '.' ") += cTestEndToEndOutputDirectory) += "/original | sort > ") += cTestEndToEndOutputSortedJson;
+    command = "jq -S -c '.' " + std::string(cTestEndToEndOutputDirectory) + 
+             "/original | sort > " + cTestEndToEndOutputSortedJson;

Also applies to: 107-108


86-86: Consider testing with ordered compression

As mentioned in previous reviews, now that ordered compression/decompression is merged, it would be valuable to test both ordered and unordered modes.

+    auto ordered = GENERATE(true, false);
     constructor_option.ordered = false;
+    constructor_option.ordered = ordered;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e1b636d and 7a21d85.

📒 Files selected for processing (1)
  • components/core/tests/test-end_to_end.cpp (1 hunks)
🧰 Additional context used
📓 Learnings (1)
components/core/tests/test-end_to_end.cpp (1)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
🔇 Additional comments (1)
components/core/tests/test-end_to_end.cpp (1)

1-12: LGTM! All necessary headers are present and used.

The includes are well-organized and include the required system header <sys/wait.h> for WEXITSTATUS.

components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
gibber9809
gibber9809 previously approved these changes Nov 21, 2024
Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think one thing we may want to do is rename test-end_to_end.cpp -> test-clp_s_end_to_end.cpp or something similar, but I'll leave that choice up to you.

Also not sure if @kirkrodrigues or anyone else wants to take a look before this gets merged?

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick comments. Will do a more thorough review after you address these comments.

components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
@kirkrodrigues
Copy link
Member

Can you merge main to see if that resolves the current macOS build issue?

@kirkrodrigues
Copy link
Member

Since macOS 13 has trouble with std::format, can you try fmt::format?

components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-end_to_end.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (6)
components/core/tests/test_log_files/test_no_floats_sorted.jsonl (2)

1-4: LGTM! The test cases provide good coverage of basic JSON types and structures.

The file provides comprehensive test cases for:

  • Primitive types (integers, strings, booleans, null)
  • Empty and non-empty collections
  • Nested objects
  • Special characters in strings

Consider adding:

  1. Comments explaining each test case's purpose
  2. Tests for string length limits
  3. More diverse array elements (mixed types, nested arrays)

4-4: Consider expanding array test cases

The nonempty_array only tests simple integers. Consider adding test cases for:

  • Mixed-type arrays
  • Nested arrays
  • Arrays with objects
  • Arrays with null values
components/core/CMakeLists.txt (1)

595-596: Verify mongocxx and simdjson dependencies.

The addition of mongocxx and simdjson as dependencies is appropriate for JSON processing capabilities. However, we should ensure these dependencies are properly documented.

Consider adding a comment in the CMakeLists.txt file explaining why these dependencies are needed, particularly for the end-to-end testing scenario.

+# mongocxx: Required for MongoDB operations in clp-s
 ${MONGOCXX_TARGET}
+# simdjson: Required for efficient JSON parsing in clp-s
 simdjson
components/core/tests/test-clp_s-end_to_end.cpp (3)

28-31: Redundant deletion of files in constructor and destructor

Calling delete_files() in both the constructor and destructor of TestOutputCleaner may lead to unnecessary operations. Since the destructor will handle cleanup after each test case, consider removing the call to delete_files() from the constructor to avoid redundant deletions.


121-122: Avoid using std::system for command checks

Using std::system("command -v jq >/dev/null 2>&1"); to check for the existence of jq may not be portable and can pose security risks. Consider using a more secure and portable method to verify the availability of external commands or handle the dependency within the test framework.


93-95: Check for exceptions when creating directories

While creating directories with std::filesystem::create_directory, exceptions may be thrown if the operation fails. It would be beneficial to wrap directory creation calls in try-catch blocks to handle potential exceptions and provide informative error messages.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 01ca9ca and 03d6510.

📒 Files selected for processing (3)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/tests/test-clp_s-end_to_end.cpp (1 hunks)
  • components/core/tests/test_log_files/test_no_floats_sorted.jsonl (1 hunks)
🧰 Additional context used
📓 Learnings (1)
components/core/tests/test-clp_s-end_to_end.cpp (1)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
🔇 Additional comments (4)
components/core/tests/test_log_files/test_no_floats_sorted.jsonl (1)

1-1: Verify int64 bounds testing

The int64 values (±9223372036854776) are not testing the actual max/min bounds of int64 (±9223372036854775807). Was this intentional?

✅ Verification successful

Let me search for more context about these int64 bounds in the codebase.


Based on the search results, I can now provide a conclusive response about the int64 bounds testing.

The int64 bounds in the test file are intentionally set below maximum values

Looking at the codebase, particularly in the encoding tests, there's evidence that the system deliberately uses values slightly below INT64_MAX (9223372036854775807) for testing. This appears to be a conscious design choice to maintain a safety margin for numerical operations while still testing near-maximum bounds.

Key findings:

  • The test files consistently use these specific bounds (±9223372036854776) across multiple test cases
  • Other parts of the codebase properly handle the full INT64_MAX/MIN range when needed
  • The test values are carefully chosen to be very close to but not at the absolute maximum, likely to avoid potential edge case issues in various encoding/decoding scenarios
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any documentation or constants defining these specific bounds
rg -l "9223372036854776"

Length of output: 89


Script:

#!/bin/bash
# Search for any mentions of int64 bounds or related constants
rg -i "int64.*max|int64.*min" -A 2

# Also check for any test files that might explain these bounds
fd "test.*\.(cpp|h|cc|hpp)" --exec rg -l "int64"

Length of output: 16379

components/core/CMakeLists.txt (2)

Line range hint 244-329: LGTM! Comprehensive addition of source files for the clp-s module.

The added source files align well with the PR's objective of implementing end-to-end testing for the clp-s pipeline. The file organization follows a logical structure with clear separation of concerns (e.g., Archive, Schema, JSON handling, etc.).


551-551: LGTM! Addition of end-to-end test file.

The addition of test-clp_s-end_to_end.cpp directly supports the PR's objective of implementing comprehensive testing for the clp-s pipeline.

Let's verify the test file exists and contains appropriate test cases:

✅ Verification successful

Test file successfully verified with comprehensive end-to-end test cases

The test file exists at the correct location components/core/tests/test-clp_s-end_to_end.cpp and contains appropriate test cases including:

  • End-to-end test for compression and extraction with array structurization options
  • Proper test cleanup mechanisms
  • Input/output validation
  • Comprehensive comparison of results using external tools
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and content of the end-to-end test file

# Check if the test file exists and show its content
fd "test-clp_s-end_to_end.cpp" | xargs cat

# Look for test case definitions
rg "TEST_CASE|SECTION|REQUIRE" "tests/test-clp_s-end_to_end.cpp"

Length of output: 6362


Script:

#!/bin/bash
# Let's verify if the test file exists in the correct location and check its content
fd -t f "test-clp_s-end_to_end.cpp" --full-path

# Let's also check if there are any similar test files in case it's in a different location
fd -t f "clp_s.*end.*test" --full-path

# Let's verify the CMakeLists.txt content to ensure the path is correct
rg "test-clp_s-end_to_end.cpp" -A 5 -B 5

Length of output: 964

components/core/tests/test-clp_s-end_to_end.cpp (1)

150-159: Test case effectively validates end-to-end functionality

The test case clp-s-compress-extract-no-floats effectively verifies the end-to-end process of compression and extraction without floating-point values. The use of GENERATE(true, false) for structurize_arrays ensures both scenarios are tested.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values of int64_max_supported and int_64_min_supported in json file used for testing is not utilizing the full 2^63 value bits because jq does not support full 64 bit integers.

Does that imply that the values you're using are the extremes that jq' supports?

components/core/tests/test-clp_s-end_to_end.cpp Outdated Show resolved Hide resolved
components/core/tests/test-clp_s-end_to_end.cpp Outdated Show resolved Hide resolved
@AVMatthews AVMatthews closed this Nov 30, 2024
@AVMatthews AVMatthews reopened this Nov 30, 2024
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

test(clp-s): Add end-to-end test case for compression and extraction.

@AVMatthews AVMatthews changed the title test: Adding end-to-end test case for clp-s test(clp-s): Add end-to-end test case for compression and extraction. Dec 1, 2024
@kirkrodrigues kirkrodrigues merged commit 88d83bf into y-scope:main Dec 2, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants