feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size #11051
Conversation
Walkthrough

The recent updates to the DataHub CLI bring important enhancements for managing data ingestion.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
docs/how/updating-datahub.md (2)
84-84: Possible missing article. Consider revising the sentence for clarity.

- mimick transformers when a transformer for aspect being written does not exist.
+ mimic transformers when a transformer for the aspect being written does not exist.

Tools: LanguageTool
[uncategorized] ~84-~84: Possible missing article found.
Context: ...ick transformers when a transformer for aspect being written does not exist. - #11051 ...(AI_HYDRA_LEO_MISSING_THE)
85-85: Document the character limit for summary text. The documentation should clearly state the reason for limiting the summary text to 800,000 characters: to avoid generating oversized `dataHubExecutionRequestResult` objects.

- Ingestion reports will now trim the summary text to a maximum of 800k characters to avoid generating `dataHubExecutionRequestResult` that are too large for GMS to handle.
+ Ingestion reports will now trim the summary text to a maximum of 800,000 characters to avoid generating `dataHubExecutionRequestResult` objects that are too large for GMS to handle.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- docs/how/updating-datahub.md (1 hunks)
- metadata-ingestion/src/datahub/ingestion/reporting/datahub_ingestion_run_summary_provider.py (1 hunks)
Additional comments not posted (1)
metadata-ingestion/src/datahub/ingestion/reporting/datahub_ingestion_run_summary_provider.py (1)
212-214: Ensure the truncation logic is correct.

The truncation logic ensures that the `summary` does not exceed 800,000 characters. This approach mitigates the risk of exceeding GMS's payload limit. However, consider adding a comment explaining why 800,000 characters was chosen as the limit, for future maintainability.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/reporting/datahub_ingestion_run_summary_provider.py (3 hunks)
Additional comments not posted (1)
metadata-ingestion/src/datahub/ingestion/reporting/datahub_ingestion_run_summary_provider.py (1)
214-216: Ensure truncation logic is appropriate.

The truncation logic retains only the last 800,000 characters of the summary. Verify that this does not remove critical information from the summary.
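One way to sanity-check the reviewer's concern is to confirm that tail truncation keeps the end of the log, where error summaries are usually emitted. The log content and the small 50-character limit below are made-up demonstration values; the PR itself uses an 800,000-character limit.

```python
# Illustrative check that keeping only the last N characters of a summary
# preserves its final lines. Log text and limit are fabricated for the demo.

log = "\n".join(
    ["DEBUG noisy line %d" % i for i in range(10)]
    + ["ERROR final failure summary"]
)

max_chars = 50
trimmed = log[-max_chars:] if len(log) > max_chars else log

# The tail slice keeps the closing error line even after truncation.
print("ERROR final failure summary" in trimmed)  # prints True
```

The trade-off is the mirror image: anything at the *start* of the log (e.g. initial configuration output) is what gets dropped, which is presumably acceptable since failures surface near the end.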
When ingestion jobs produce large log output (e.g. when debug logging is enabled), the request to send the `dataHubExecutionRequestResult` aspect fails with a 413 Payload Too Large error. This then causes the entire job to exit with status 1, even if the ingestion run itself was successful.
Checklist

Summary by CodeRabbit

- Introduced a `--run-id` parameter for the `put` command, allowing users to associate data writes with specific ingestion processes.
- `DatahubClientConfig`, requiring users to initialize or set environment variables for configuration.