-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clp-s: Add support for projecting of a subset of columns during search. #510
Conversation
Seems like |
Issue being tracked in #545. You can merge main to temporarily resolve the (new) violations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Do you want to point out --projection
can only be used after all positional options or do you want to support something like ./clp-s s --projection a b c -- archive query
? Maybe we should also add a projection case in the doc later.
* rewrite a simpler version. Probably best to write the simpler version for now since types are | ||
* leaf-only. The type-per-token idea solves this problem (in the absence of wildcards). | ||
*/ | ||
void resolve_columns(std::shared_ptr<SchemaTree> tree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a description for this method?
} | ||
|
||
private: | ||
void resolve_column(std::shared_ptr<SchemaTree> tree, std::shared_ptr<ColumnDescriptor> column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for this method.
std::shared_ptr<search::Projection> m_projection{ | ||
new search::Projection{search::ProjectionMode::ReturnAllColumns} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using make_shared
?
/** | ||
* Add a column to the set of columns that should be included in the projected results | ||
* @param column | ||
* @throws OperationFailed if column contains a wildcard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more throw cases?
)( | ||
"projection", | ||
po::value<std::vector<std::string>>(&m_projection_columns) | ||
->multitoken() | ||
->value_name("COLUMN_A COLUMN_B ..."), | ||
"The set of projected columns that will be marshalled for matching results" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reduce the indent of this code block?
Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
WalkthroughThe changes introduce enhancements to the Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (13)
components/core/src/clp_s/search/Projection.hpp (2)
13-16
: LGTM: ProjectionMode enum is well-defined.The enum definition is clear and concise. Using
uint8_t
as the underlying type is appropriate for this small enum.Consider adding a brief comment explaining the purpose of each enum value for improved clarity:
enum ProjectionMode : uint8_t { ReturnAllColumns, // Return all columns regardless of selection ReturnSelectedColumns // Return only the columns specified in the projection };
62-64
: LGTM: Private members are well-chosen and initialized.The use of
std::vector
for selected columns andabsl::flat_hash_set
for matching nodes is appropriate and efficient. The default projection mode ofReturnAllColumns
aligns with the expected behavior.Consider making the
m_projection_mode
member const since it's initialized in the constructor and doesn't appear to change:const ProjectionMode m_projection_mode;
components/core/src/clp_s/JsonSerializer.hpp (1)
78-84
: Improved robustness for closing JSON documentsThe changes to the
end_document()
method enhance its ability to handle various scenarios, including empty documents or improperly started ones. This is a good improvement in terms of robustness.However, we could optimize this slightly by using a ternary operator:
void end_document() { m_json_string.back() != '{' ? m_json_string.back() = '}' : m_json_string += '}'; }This achieves the same result with a more concise syntax, potentially improving readability and performance.
components/core/src/clp_s/ArchiveReader.hpp (1)
190-192
: Consider usingstd::make_shared
for initialization.While the current initialization is correct, using
std::make_shared
could potentially be more efficient:std::shared_ptr<search::Projection> m_projection{ std::make_shared<search::Projection>(search::ProjectionMode::ReturnAllColumns) };This approach performs a single allocation for both the
Projection
object and the control block, which can be more efficient than separate allocations.components/core/src/clp_s/CommandLineArguments.hpp (1)
197-197
: Consider initializing m_projection_columns in the constructor.The new
m_projection_columns
member variable is appropriately typed and named, following the class's conventions. However, it's not initialized in the constructor. While this might be intentional, consider adding an empty initialization in the constructor to ensure consistent behaviour across all instances of the class.Example:
CommandLineArguments(std::string const& program_name) : m_program_name(program_name), m_projection_columns() {}This would guarantee that
m_projection_columns
starts as an empty vector, preventing any potential issues with uninitialized data.components/core/src/clp_s/SchemaReader.hpp (1)
75-83
: LGTM: Updated reset method signatureThe addition of the
projection
parameter to thereset
method signature is appropriate and aligns with the PR objectives to support column projection.Suggestion: Consider updating the method documentation to include a brief description of the
projection
parameter.components/core/src/clp_s/SchemaReader.cpp (2)
560-562
: Good extension of projection to unordered objectsThis change correctly extends the projection feature to unordered objects in the schema, maintaining consistency in applying user-specified column projections. Well done!
For improved code consistency, consider extracting the
matches_node
check into a separate method, as it's used in multiple places.Here's a suggested refactor:
+bool should_include_in_local_tree(int32_t node_id) { + return m_projection->matches_node(node_id); +} void SchemaReader::initialize_serializer() { // ... for (int32_t global_column_id : m_ordered_schema) { - if (m_projection->matches_node(global_column_id)) { + if (should_include_in_local_tree(global_column_id)) { generate_local_tree(global_column_id); } } for (auto it = m_global_id_to_unordered_object.begin(); it != m_global_id_to_unordered_object.end(); ++it) { - if (m_projection->matches_node(it->first)) { + if (should_include_in_local_tree(it->first)) { generate_local_tree(it->first); } } // ... }This refactoring improves readability and makes it easier to modify the inclusion logic in the future if needed.
567-569
: Good defensive programming practiceThis change prevents attempting to generate a JSON template for an empty schema tree, which is an excellent defensive programming practice. It avoids potential errors and improves the robustness of the code.
To improve clarity, consider adding a brief comment explaining why this check is necessary:
+ // Only generate JSON template if we have nodes in the local schema tree + // This can happen if no columns match the projection if (false == m_local_schema_tree.get_nodes().empty()) { generate_json_template(m_local_schema_tree.get_root_node_id()); }This comment would help future maintainers understand the purpose of this check quickly.
components/core/src/clp_s/CommandLineArguments.cpp (1)
428-433
: LGTM! Consider clarifying the description.The implementation of the new
--projection
option looks good. It correctly allows for multiple column names and stores them appropriately.Consider slightly modifying the description to be more explicit about its purpose:
- "The set of projected columns that will be marshalled for matching results" + "Specify columns to include in the output for matching results"This change makes it clearer that the option affects the output content rather than the matching process itself.
components/core/src/clp_s/search/Projection.cpp (2)
38-53
: Consider moving extensive comments to documentationThe block comment from lines 38 to 53 provides valuable context but is quite lengthy. Consider moving this explanation to external documentation or a detailed comment in the header file to keep the codebase clean and maintainable.
66-66
: Simplify boolean expression for enhanced readabilityUsing
!last_token
instead offalse == last_token
improves clarity.Apply this diff:
- if (false == last_token && child_node.get_type() != NodeType::Object) { + if (!last_token && child_node.get_type() != NodeType::Object) {components/core/src/clp_s/clp-s.cpp (2)
198-198
: Enhance Error Logging for ClarityThe current error message may lack context, making it harder to diagnose issues.
Consider adding more descriptive information to the error log to assist with debugging:
-SPDLOG_ERROR("{}", e.what()); +SPDLOG_ERROR("Failed to add projection columns: {}", e.what());This change provides clearer insight into what operation was being performed when the error occurred.
185-189
: Simplify the Initialization ofprojection
ObjectThe ternary operator used for determining the
ProjectionMode
can be simplified for better readability.Consider simplifying the initialization as follows:
auto projection = std::make_shared<Projection>( - command_line_arguments.get_projection_columns().empty() - ? ProjectionMode::ReturnAllColumns - : ProjectionMode::ReturnSelectedColumns + command_line_arguments.get_projection_columns().empty() ? + ProjectionMode::ReturnAllColumns : + ProjectionMode::ReturnSelectedColumns );Alternatively, assign the mode to a variable for clarity:
ProjectionMode mode = command_line_arguments.get_projection_columns().empty() ? ProjectionMode::ReturnAllColumns : ProjectionMode::ReturnSelectedColumns; auto projection = std::make_shared<Projection>(mode);
This enhances readability by breaking down the logic into manageable parts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
- components/core/src/clp_s/ArchiveReader.cpp (1 hunks)
- components/core/src/clp_s/ArchiveReader.hpp (3 hunks)
- components/core/src/clp_s/CMakeLists.txt (1 hunks)
- components/core/src/clp_s/CommandLineArguments.cpp (1 hunks)
- components/core/src/clp_s/CommandLineArguments.hpp (2 hunks)
- components/core/src/clp_s/JsonSerializer.hpp (1 hunks)
- components/core/src/clp_s/SchemaReader.cpp (1 hunks)
- components/core/src/clp_s/SchemaReader.hpp (4 hunks)
- components/core/src/clp_s/clp-s.cpp (3 hunks)
- components/core/src/clp_s/search/Projection.cpp (1 hunks)
- components/core/src/clp_s/search/Projection.hpp (1 hunks)
🧰 Additional context used
cppcheck
components/core/src/clp_s/clp-s.cpp
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
🔇 Additional comments not posted (20)
components/core/src/clp_s/search/Projection.hpp (5)
1-12
: LGTM: File structure and includes are well-organized.The file header, include guards, and necessary includes are properly set up. The namespace is correctly defined, following best practices for C++ code organization.
24-39
: Enhance exception documentation and consider additional throw cases.The
OperationFailed
exception is well-defined, but consider expanding on potential throw cases in theadd_column
method documentation.Update the method documentation to include more specific throw cases:
/** * Add a column to the set of columns that should be included in the projected results * @param column The column descriptor to add * @throws OperationFailed if: * - column contains a wildcard * - column is null * - column name is empty * - column name is duplicated */ void add_column(std::shared_ptr<ColumnDescriptor> column);
40-47
: Add method description for resolve_columns.As per the previous review comment, please add a description for this method to improve code documentation.
Add a method description:
/** * Resolve the added columns against the provided schema tree. * This method should be called after adding all desired columns and before calling matches_node. * @param tree The schema tree to resolve columns against * @throws OperationFailed if resolution fails for any column */ void resolve_columns(std::shared_ptr<SchemaTree> tree);
59-60
: Add method description for resolve_column.As per the previous review comment, please add a description for this private method to improve code maintainability.
Add a method description:
/** * Resolve a single column against the provided schema tree. * @param tree The schema tree to resolve the column against * @param column The column descriptor to resolve * @throws OperationFailed if resolution fails for the column */ void resolve_column(std::shared_ptr<SchemaTree> tree, std::shared_ptr<ColumnDescriptor> column);
49-57
: LGTM: matches_node method is well-implemented.The
matches_node
method is efficiently implemented as an inline function. It correctly checks the projection mode and uses the hash set for fast matching, which aligns well with the PR objectives of allowing users to specify which columns to include in the search results.components/core/src/clp_s/ArchiveReader.hpp (2)
15-15
: LGTM: Appropriate include added for new functionality.The inclusion of "search/Projection.hpp" is necessary and correctly placed for the new projection functionality being introduced.
137-139
: LGTM: Well-designed setter for projection.The
set_projection
method is appropriately implemented:
- It uses
std::shared_ptr
for managing shared ownership of the projection object.- The method is concise and follows the Single Responsibility Principle.
components/core/src/clp_s/CMakeLists.txt (1)
148-149
: LGTM! The new Projection files are correctly added.The addition of
Projection.cpp
andProjection.hpp
to theCLP_S_SEARCH_SOURCES
variable is correct and aligns with the PR objectives of adding support for projecting a subset of columns during search.A few considerations:
- Ensure that any necessary include directives for these new files are added in other source files that will use the Projection functionality.
- If the Projection feature requires any new dependencies, make sure they are added to the
target_link_libraries
command for theclp-s
target.- Consider updating any relevant documentation or README files to reflect the new Projection feature.
components/core/src/clp_s/CommandLineArguments.hpp (2)
111-112
: LGTM: Well-designed getter method for projection columns.The new
get_projection_columns()
method is a well-implemented getter. It provides efficient, read-only access to the projection columns using a const reference return type. The method is appropriately marked as const, ensuring it doesn't modify the object's state.
111-112
: Verify implementation of parsing and setting projection columns.The changes to
CommandLineArguments
class provide a solid foundation for handling projection columns. However, the actual parsing of the--projection
command-line option and setting of them_projection_columns
vector is not visible in this file.To ensure complete implementation:
- Verify that the
--projection
option is properly parsed in the command-line parsing logic (likely in a separate file or method).- Confirm that
m_projection_columns
is correctly populated with the parsed values.- Check for any validation of the projection column names (e.g., disallowing wildcards, handling case sensitivity) as mentioned in the PR objectives.
Also applies to: 197-197
components/core/src/clp_s/SchemaReader.hpp (3)
14-14
: LGTM: Include statement for Projection headerThe addition of the include statement for "search/Projection.hpp" is appropriate and necessary for using the
search::Projection
type in theSchemaReader
class.
298-298
: LGTM: New member variable for projectionThe addition of the
m_projection
member variable is consistent with the PR objectives and matches the type used in thereset
method signature.
Line range hint
1-303
: Overall assessment: Changes align with PR objectivesThe modifications to the
SchemaReader
class, including the new include statement, updatedreset
method signature, and newm_projection
member variable, are well-implemented and consistent with the PR objectives to support column projection.These changes lay the groundwork for implementing the projection feature in the
SchemaReader
class, allowing for more efficient data retrieval based on user-specified columns.Ensure that the implementation of the projection functionality in other parts of the codebase (e.g., in the
reset
method body and other methods that will usem_projection
) is consistent with the design introduced here.components/core/src/clp_s/ArchiveReader.cpp (1)
238-238
: LGTM! Verify SchemaReader class update.The addition of the
m_projection
parameter toreader.reset()
aligns well with the PR objective of supporting column projection during search. This change appears to be a key component in implementing the new--projection
command line option.To ensure consistency across the codebase, please run the following script to verify the
SchemaReader
class update:✅ Verification successful
Verified SchemaReader class updates. No issues found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify SchemaReader class update to handle projections # Test 1: Check if SchemaReader class has been updated to handle projections rg --type cpp -g 'SchemaReader.*' 'class SchemaReader' -A 10 # Test 2: Verify if reader.reset() method signature has been updated rg --type cpp -g 'SchemaReader.*' 'void reset\(' -A 5 # Test 3: Check for any new methods related to projection handling rg --type cpp -g 'SchemaReader.*' 'projection'Length of output: 2715
components/core/src/clp_s/SchemaReader.cpp (2)
551-553
: Excellent implementation of column projection!This change effectively implements the projection feature, ensuring that only the columns specified by the user are included in the local tree. This aligns perfectly with the PR objective of allowing users to specify which columns to include in the search results using the
--projection
option.
551-569
: Overall excellent implementation of the projection featureThe changes in this file successfully implement the projection feature as described in the PR objectives. The modifications allow users to specify which columns to include in the search results, and they are well-integrated into the existing code structure.
Key points:
- The projection is correctly applied to both ordered and unordered schema elements.
- Defensive programming practices have been added to handle potential edge cases.
- The changes are consistent with the existing codebase style and structure.
Minor suggestions for improvement have been provided in the individual comments, but these are not critical for the functionality of the feature. Great job on this implementation!
components/core/src/clp_s/CommandLineArguments.cpp (1)
Line range hint
1-1
: Verify related changes in other files.While the addition of the
--projection
option is implemented correctly here, there are a couple of points to consider:
- Ensure that the
m_projection_columns
member variable is properly declared in the corresponding header file (likelyCommandLineArguments.hpp
).- Verify that the actual implementation of the projection functionality is present in the appropriate file(s) where the search results are processed and output.
To check for the member variable declaration and related implementations, you can run the following script:
This script will help identify where the member variable is declared and where the projection functionality is implemented.
✅ Verification successful
Verification Successful
The
m_projection_columns
member variable is properly declared inCommandLineArguments.hpp
, and the projection functionality is correctly implemented across the relevant files.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for m_projection_columns declaration echo "Checking for m_projection_columns declaration:" rg "std::vector<std::string>\s+m_projection_columns;" --type cpp echo -e "\nChecking for projection-related implementations:" rg "projection|m_projection_columns" --type cpp -g '!CommandLineArguments.cpp'Length of output: 3175
components/core/src/clp_s/clp-s.cpp (3)
32-32
: IncludeProjection.hpp
to Use Projection FunctionalityThe inclusion of
"search/Projection.hpp"
is necessary to utilize theProjection
class for handling column projections.
43-43
: AddStringUtils
to Namespace ScopeThe using directive
using clp_s::StringUtils;
correctly bringsStringUtils
into scope, allowing for cleaner code when calling its methods.
194-194
: Check for Exceptions fromColumnDescriptor::create
Ensure that
ColumnDescriptor::create(descriptor_tokens)
handles invalid input gracefully and does not throw unexpected exceptions.Run the following script to verify how
ColumnDescriptor::create
handles invalid descriptors:Ensure that any exceptions are properly documented and managed.
✅ Verification successful
Verification Complete: Exceptions Handling in
ColumnDescriptor::create
ConfirmedThe
ColumnDescriptor::create(descriptor_tokens)
implementation does not contain anythrow
statements, indicating that it handles invalid input without raising exceptions directly. Additionally, the surrounding code inclp-s.cpp
appropriately manages potential exceptions.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Inspect ColumnDescriptor::create for exception safety. # Find the implementation of ColumnDescriptor::create rg --type cpp 'ColumnDescriptor::create\(' -A 10 # Look for any throw statements or exception handling mechanismsLength of output: 7727
Script:
#!/bin/bash # Description: Inspect ColumnDescriptor::create for exception safety. # Find the implementation files of ColumnDescriptor::create rg --type cpp 'ColumnDescriptor::create\s*\(' > create_implementations.txt # Extract file paths from the previous command FILES=$(rg --type cpp --files-with-matches 'ColumnDescriptor::create\s*\(') # Search for throw statements within these implementations for file in $FILES; do echo "Checking $file for throw statements..." rg 'throw\s+' "$file" -C 2 doneLength of output: 1688
Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
components/core/src/clp_s/CommandLineArguments.cpp (1)
428-435
: LGTM! Consider clarifying the projection documentation.The new options "archive-id" and "projection" are well-implemented and properly integrated into the existing command-line argument structure. The code follows the established patterns and is consistent with the rest of the file.
A minor suggestion for improvement:
Consider adding a brief example of how to use the "projection" option in the documentation string. This could help users understand the feature more quickly. For instance:
"Project only the given set of columns for matching results. This option must be" " specified after all positional options. Values that are objects or structured" -" arrays are currently unsupported." +" arrays are currently unsupported. Example: --projection column1 column2 column3"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- components/core/src/clp_s/CommandLineArguments.cpp (1 hunks)
- components/core/src/clp_s/search/Projection.hpp (1 hunks)
🔇 Additional comments (1)
components/core/src/clp_s/search/Projection.hpp (1)
63-66
: Functionmatches_node
implementation is correct.The function correctly determines whether a node should be included based on the projection mode and matching nodes.
|
||
std::vector<std::shared_ptr<ColumnDescriptor>> m_selected_columns; | ||
absl::flat_hash_set<int32_t> m_matching_nodes; | ||
ProjectionMode m_projection_mode{ProjectionMode::ReturnAllColumns}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove redundant in-class initialization of m_projection_mode
.
Since m_projection_mode
is always initialized by the constructor, the in-class initializer is redundant and can be removed to avoid potential confusion.
Apply this diff to remove the redundant initialization:
- ProjectionMode m_projection_mode{ProjectionMode::ReturnAllColumns};
+ ProjectionMode m_projection_mode;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
ProjectionMode m_projection_mode{ProjectionMode::ReturnAllColumns}; | |
ProjectionMode m_projection_mode; |
class OperationFailed : public TraceableException { | ||
public: | ||
// Constructors | ||
OperationFailed(ErrorCode error_code, char const* const filename, int line_number) | ||
: TraceableException(error_code, filename, line_number) {} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider renaming OperationFailed
to ProjectionOperationFailed
for clarity.
The exception class OperationFailed
has a generic name that may lead to confusion if similar exceptions exist elsewhere in the codebase. Renaming it to ProjectionOperationFailed
will make its purpose more explicit and improve readability.
Apply this diff to rename the exception class:
- class OperationFailed : public TraceableException {
+ class ProjectionOperationFailed : public TraceableException {
Remember to update all references to OperationFailed
accordingly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
class OperationFailed : public TraceableException { | |
public: | |
// Constructors | |
OperationFailed(ErrorCode error_code, char const* const filename, int line_number) | |
: TraceableException(error_code, filename, line_number) {} | |
}; | |
class ProjectionOperationFailed : public TraceableException { | |
public: | |
// Constructors | |
ProjectionOperationFailed(ErrorCode error_code, char const* const filename, int line_number) | |
: TraceableException(error_code, filename, line_number) {} | |
}; |
…h. (y-scope#510) Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com> Co-authored-by: Kirk Rodrigues <2454684+kirkrodrigues@users.noreply.github.com>
…h. (y-scope#510) Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com> Co-authored-by: Kirk Rodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Description
This PR adds support for basic projection during search. We allow the user to specify which columns they want to be marshalled for matching search results using the --projection command line option.
For example
clp-s s archives '*' --archive-id XYZ --projection @timestamp level msg.error
Would return all records in the archive XYZ, showing only the @timestamp, level, and msg.error fields.
Note that if a record doesn't contain all of columns we are asked to project we simply output the record without doing anything about the missing columns. In our example above if a record were missing all three of the fields we asked for it would simply be marshalled as
{}
.Other things to note are that we allow projected columns to be polymorphic, we do not allow the set of projected columns to contain wildcards, and we do not allow projection to travel inside of arrays. Specifically if you have an array a containing objects with the field b then asking to project "a" is legal (leading to the whole array being marshalled), but asking for "a.b" is illegal.
Also note that unlike languages like SQL the names of columns being projected are case sensitive.
We can add support for things like case-insensitive column matching, advanced handling for records that don't contain a column we're asking for, polymorphism, and more in future PRs.
The implementation is quite straightforward. We take in a set of columns, expand them against the MPT to find all matching nodes, and then check if nodes in the schema of an ERT belong to that set when deciding whether to marshal the corresponding column.
Validation performed
Summary by CodeRabbit
Release Notes
New Features
--projection
for specifying projected columns.Projection
class for managing and resolving column projections.Bug Fixes
Enhancements
ArchiveReader
andSchemaReader
to utilize projection information for schema processing.