Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Manifest v12 json schema #130

Merged
merged 2 commits into from
Dec 16, 2024
Merged

Update Manifest v12 json schema #130

merged 2 commits into from
Dec 16, 2024

Conversation

ulixius9
Copy link
Contributor

@ulixius9 ulixius9 commented Dec 15, 2024

User description

Current json schema does not match the original v12 schema https://schemas.getdbt.com/dbt/manifest/v12.json

hence when parsing manifest file in some cases where offset_to_grain is not one of the previous enums it fails to parse.

This pr updates the json schema and updates the model.


PR Type

Bug fix


Description

  • Modified manifest schema to accept any string value for time granularity fields instead of restricting to predefined enum values
  • Consolidated duplicate OffsetWindow classes into a single reusable class
  • Updated JSON schema to match the original v12 schema specification
  • Fixed parsing issues when offset_to_grain contains values outside the previous enum set

Changes walkthrough 📝

Relevant files
Enhancement
manifest_v12.py
Refactor time granularity fields to accept string values 

dbt_artifacts_parser/parsers/manifest/manifest_v12.py

  • Removed enum restrictions for time-related fields (granularity,
    offset_to_grain, time_granularity)
  • Consolidated duplicate OffsetWindow classes into a single class
  • Changed granularity and time-related fields from enum to string type
  • +34/-104
    manifest_v12.json
    Update JSON schema to support flexible time granularity values

    dbt_artifacts_parser/resources/manifest/manifest_v12.json

  • Updated JSON schema to allow any string value for time granularity
    fields
  • Removed enum restrictions from granularity, offset_to_grain, and
    time_granularity properties
  • +20/-260

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    @ulixius9 ulixius9 requested a review from yu-iskw as a code owner December 15, 2024 10:43
    Copy link

    coderabbitai bot commented Dec 15, 2024

    Walkthrough

    The pull request introduces significant modifications to the manifest_v12.py file, focusing on restructuring classes and attribute types within the dbt artifacts parser. The changes primarily involve converting certain attributes from Granularity enum to string types, introducing a new TimeGranularity enum, and consolidating multiple OffsetWindow classes into a single implementation. These modifications aim to simplify the manifest structure, enhance type flexibility, and improve the overall maintainability of the parsing logic.

    Changes

    File Change Summary
    dbt_artifacts_parser/parsers/manifest/manifest_v12.py - Converted granularity and offset_to_grain attributes to str type in multiple classes
    - Added TimeGranularity enum with various time unit definitions
    - Updated TypeParams2 to include time_granularity attribute
    - Consolidated multiple OffsetWindow classes into a single implementation

    Possibly related PRs

    • Fix the returned types #79: Updates to the parse_manifest function to support additional manifest versions, directly complementing the structural changes in manifest_v12.py

    Poem

    🐰 In the realm of dbt's grand design,
    Where manifests dance and types align,
    Enums transform, classes refine,
    A rabbit's code, both clear and fine!
    Parsing magic, version twelve's shrine! 🔍

    Tip

    CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. This feature will be included in our Pro Plan when released.


    Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

    ❤️ Share
    🪧 Tips

    Chat

    There are 3 ways to chat with CodeRabbit:

    • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
      • I pushed a fix in commit <commit_id>, please review it.
      • Generate unit testing code for this file.
      • Open a follow-up GitHub issue for this discussion.
    • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
      • @coderabbitai generate unit testing code for this file.
      • @coderabbitai modularize this function.
    • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
      • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
      • @coderabbitai read src/utils.ts and generate unit testing code.
      • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
      • @coderabbitai help me debug CodeRabbit configuration file.

    Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

    CodeRabbit Commands (Invoked using PR comments)

    • @coderabbitai pause to pause the reviews on a PR.
    • @coderabbitai resume to resume the paused reviews.
    • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
    • @coderabbitai full review to do a full review from scratch and review all the files again.
    • @coderabbitai summary to regenerate the summary of the PR.
    • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
    • @coderabbitai resolve resolve all the CodeRabbit review comments.
    • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
    • @coderabbitai help to get help.

    Other keywords and placeholders

    • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
    • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
    • Add @coderabbitai anywhere in the PR title to generate the title automatically.

    CodeRabbit Configuration File (.coderabbit.yaml)

    • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
    • Please see the configuration documentation for more information.
    • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

    Documentation and Community

    • Visit our Documentation for detailed information on how to use CodeRabbit.
    • Join our Discord Community to get help, request features, and share feedback.
    • Follow us on X/Twitter for updates and announcements.

    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Data Validation
    Removing enum restrictions on time granularity fields allows any string value. Consider adding input validation to ensure values are still reasonable time units.

    Code Duplication
    TimeGranularity enum is still defined but no longer used after switching to string type. Consider removing unused enum definition.

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add validation to ensure time granularity values match the allowed set of values

    Add input validation to ensure that string values for granularity and
    time_granularity fields match the previously defined enum values, to maintain data
    integrity and prevent invalid time granularity values.

    dbt_artifacts_parser/parsers/manifest/manifest_v12.py [1621-1626]

     class OffsetWindow(BaseParserModel):
         model_config = ConfigDict(
             extra='forbid',
         )
         count: int
         granularity: str
    +    
    +    @field_validator('granularity')
    +    def validate_granularity(cls, v):
    +        valid_values = {'nanosecond', 'microsecond', 'millisecond', 'second', 
    +                      'minute', 'hour', 'day', 'week', 'month', 'quarter', 'year'}
    +        if v not in valid_values:
    +            raise ValueError(f'Invalid granularity value. Must be one of: {valid_values}')
    +        return v
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: The suggestion adds important validation to ensure data integrity is maintained after removing enum restrictions, preventing invalid time granularity values from being used. This helps catch errors early and maintains backwards compatibility.

    8

    Copy link

    @coderabbitai coderabbitai bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Actionable comments posted: 1

    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 19a5d05 and c6f8e7d.

    ⛔ Files ignored due to path filters (1)
    • dbt_artifacts_parser/resources/manifest/manifest_v12.json is excluded by !**/*.json
    📒 Files selected for processing (1)
    • dbt_artifacts_parser/parsers/manifest/manifest_v12.py (14 hunks)
    🔇 Additional comments (13)
    dbt_artifacts_parser/parsers/manifest/manifest_v12.py (13)

    3226-3227: Ensure consistency in 'offset_window' and 'offset_to_grain' attributes

    The offset_window and offset_to_grain attributes have been updated to Optional[OffsetWindow] and Optional[str]. Verify that all code interacting with these attributes in this class is updated accordingly.


    3244-3245: Ensure consistency in 'offset_window' and 'offset_to_grain' attributes

    The offset_window and offset_to_grain attributes have been updated similarly in this section. Please refer to the earlier comments for verification steps.


    3270-3271: Ensure consistency in 'offset_window' and 'offset_to_grain' attributes

    These attributes have undergone the same type changes. Ensure all related code is updated accordingly.


    3397-3397: Adjust code for 'time_granularity' type change in Disabled10 class

    The time_granularity attribute in the Disabled10 class is now Optional[str]. Ensure that any code utilizing this attribute handles the new type appropriately.

    Run the following script to find usages of Disabled10.time_granularity:


    1777-1778: Review changes in CumulativeTypeParams class attributes

    The window attribute now uses Optional[Window1], and grain_to_date is updated to Optional[str]. Ensure that the codebase correctly handles these updated types wherever CumulativeTypeParams is utilized.

    Run the following script to find usages of CumulativeTypeParams attributes:


    3334-3335: Update 'CumulativeTypeParams1' class attributes

    The window attribute is updated to Optional[Window4], and grain_to_date is Optional[str]. Verify that these changes are accommodated in all relevant parts of the code.

    Run the following script to find usages of CumulativeTypeParams1 attributes:


    1744-1744: Ensure 'granularity' type consistency in Window1 class

    The granularity attribute in the Window1 class has been changed to type str. Please verify that all instances where Window1 is used are updated accordingly and that any related logic is adjusted.

    Run the following script to locate usages of Window1.granularity:


    1654-1655: Confirm updates to 'Denominator' class attributes

    The offset_window and offset_to_grain attributes in the Denominator class have been updated to Optional[OffsetWindow] and Optional[str] respectively. Ensure that any code utilizing these attributes is updated to handle the new types.

    Run the following script to check references to Denominator class attributes:


    1637-1637: Verify the type change of 'offset_to_grain' in the Numerator class

    The offset_to_grain attribute in the Numerator class is now of type Optional[str]. Please ensure that this change is consistently reflected throughout the codebase and that any logic depending on this attribute is updated accordingly.

    Run the following script to locate usages of offset_to_grain:

    ✅ Verification successful

    Let me gather more information about the type definition and usage.


    Let me try one more verification to understand the type changes across versions.


    Type change in offset_to_grain is an intentional API evolution

    The change from Optional[OffsetToGrain] (an enum) to Optional[str] in manifest v12 appears to be an intentional API evolution. Previous versions (v10, v11) defined OffsetToGrain as an enum with a 'day' value, while v12 simplifies this to use strings directly. This change is consistently applied in both Numerator and Numerator1 classes within the same file.

    • manifest_v10.py: Uses Optional[OffsetToGrain] with enum
    • manifest_v11.py: Uses Optional[OffsetToGrain] with enum
    • manifest_v12.py: Uses Optional[str] in both Numerator classes
    🏁 Scripts executed

    The following scripts were executed for the analysis:

    Script:

    #!/bin/bash
    # Description: Find all usages of 'offset_to_grain' outside the updated file.
    
    # Search for 'offset_to_grain' in all Python files, excluding the current file.
    rg --type py 'offset_to_grain' --glob '!dbt_artifacts_parser/parsers/manifest/manifest_v12.py'
    

    Length of output: 306


    Script:

    #!/bin/bash
    # Check the definition and usage of OffsetToGrain type and its context
    ast-grep --pattern 'class $_ {
      $$$
      offset_to_grain: $_
      $$$
    }'
    
    # Also check for any direct string assignments or comparisons
    rg --type py "offset_to_grain.*=.*['\"]" -A 2
    

    Length of output: 119


    Script:

    #!/bin/bash
    # Check imports and type definitions in all manifest versions
    rg --type py "^from.*OffsetToGrain|^class.*OffsetToGrain|type.*OffsetToGrain" -B 1 -A 1
    
    # Check the Numerator class definition in v12
    rg --type py "class Numerator" -B 2 -A 10 dbt_artifacts_parser/parsers/manifest/manifest_v12.py
    

    Length of output: 1153


    1850-1850: Adjust code for 'time_granularity' type change in Metrics class

    The time_granularity attribute in the Metrics class is now Optional[str]. Verify that all usages of this attribute are updated to handle the new type, and update any validations or dependent code as necessary.

    Run the following script to find usages of Metrics.time_granularity:


    1694-1695: Update usages of 'Metric' class attributes

    The offset_window and offset_to_grain attributes in the Metric class are now Optional[OffsetWindow] and Optional[str]. Verify that all code interacting with these attributes accommodates the type changes.

    Run the following script to find usages of Metric class attributes:

    ✅ Verification successful

    Type changes in Metric class are consistent with previous versions

    The type changes for offset_window and offset_to_grain in manifest_v12.py are actually maintaining consistency with previous manifest versions (v10 and v11), where these attributes were already defined as Optional types. The only difference is the type of offset_window changing from MetricTimeWindow to OffsetWindow, and offset_to_grain from OffsetToGrain to str, which appears to be a deliberate type refinement.

    🏁 Scripts executed

    The following scripts were executed for the analysis:

    Script:

    #!/bin/bash
    # Description: Find usages of 'Metric' class attributes outside the updated file.
    
    # Search for references to 'Metric.offset_window' and 'Metric.offset_to_grain'.
    rg --type py 'Metric\.(offset_window|offset_to_grain)' --glob '!dbt_artifacts_parser/parsers/manifest/manifest_v12.py'
    

    Length of output: 120


    Script:

    #!/bin/bash
    # Let's try a broader search to find any references to these attributes
    # First, let's find all Python files that mention 'offset_window' or 'offset_to_grain'
    rg --type py -l "(offset_window|offset_to_grain)"
    
    # Then, let's also look for any usage of Metric class to understand the context
    rg --type py -l "class Metric"
    

    Length of output: 726


    Script:

    #!/bin/bash
    # Let's examine how these attributes are used in v10 and v11 manifest files
    rg -A 2 -B 2 "(offset_window|offset_to_grain)" dbt_artifacts_parser/parsers/manifest/manifest_v10.py dbt_artifacts_parser/parsers/manifest/manifest_v11.py
    
    # Also let's check if there are any type hints for these attributes in previous versions
    ast-grep --pattern 'class Metric:
      $$$
      offset_window: $_
      $$$'
    

    Length of output: 1248


    1626-1626: Ensure consistent handling of 'granularity' attribute

    The granularity attribute in the OffsetWindow class has been changed to type str from the Granularity enum. Ensure that all references to this attribute across the codebase are updated accordingly, and that any enum-based validations or constraints are appropriately adjusted.

    Run the following script to identify usages of the granularity attribute:


    3315-3315: Verify 'granularity' type change in Window4 class

    The granularity attribute in the Window4 class is now of type str. Ensure that this change is propagated throughout the codebase where Window4 is utilized.

    Run the following script to find usages of Window4.granularity:

    Comment on lines +3602 to +3615
    class TimeGranularity(Enum):
    nanosecond = 'nanosecond'
    microsecond = 'microsecond'
    millisecond = 'millisecond'
    second = 'second'
    minute = 'minute'
    hour = 'hour'
    day = 'day'
    week = 'week'
    month = 'month'
    quarter = 'quarter'
    year = 'year'


    Copy link

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    🛠️ Refactor suggestion

    Consider using 'TimeGranularity' enum for 'granularity' attributes

    The TimeGranularity enum is now defined, providing a set of valid time granularities. To enhance type safety and maintain consistency, consider updating the granularity attributes in the respective classes to use TimeGranularity instead of str.

    Apply this diff to update the OffsetWindow class:

     class OffsetWindow(BaseParserModel):
         count: int
    -    granularity: str
    +    granularity: TimeGranularity

    Repeat similar updates for other classes where granularity is currently typed as str.

    Committable suggestion skipped: line range outside the PR's diff.

    Copy link
    Owner

    @yu-iskw yu-iskw left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    @ulixius9 Thank you for your contribution.

    @yu-iskw yu-iskw merged commit 26539ed into yu-iskw:main Dec 16, 2024
    6 checks passed
    @yu-iskw
    Copy link
    Owner

    yu-iskw commented Dec 16, 2024

    This change has been released in v0.8.0.

    https://pypi.org/project/dbt-artifacts-parser/0.8.0/

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants