Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modular subdag example #1251

Merged
merged 6 commits into from
Dec 12, 2024
Merged

Modular subdag example #1251

merged 6 commits into from
Dec 12, 2024

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Dec 7, 2024

Shows modular subdag example, and improves subdag config behavior.

Changes

  • example for modularity using subdag
  • enables subdag config to handle source and value

How I tested this

  • locally
  • unit tests

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz skrawcz requested a review from elijahbenizzy December 7, 2024 07:15
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 2d0746b in 35 seconds

More details
  • Looked at 302 lines of code in 8 files
  • Skipped 2 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. examples/model_examples/modular_example/inference.py:7
  • Draft comment:
    Ensure that fit_model.predict returns a pd.DataFrame. If not, consider converting the output to a DataFrame or handling the type accordingly.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. examples/model_examples/modular_example/run.py:14
  • Draft comment:
    The dr.execute line is commented out, so the pipeline won't run. Uncomment it to execute the pipeline.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The run.py script has the dr.execute line commented out, which means the pipeline won't actually run when the script is executed. This might be intentional for testing purposes, but it should be noted.
3. examples/model_examples/modular_example/inference.py:6
  • Draft comment:
    Consider renaming the function predicted_data to something more descriptive, such as generate_predictions, to better reflect its purpose and avoid conflicts with other functions of the same name.
  • Reason this comment was not posted:
    Marked as duplicate.
4. examples/model_examples/modular_example/features.py:8
  • Draft comment:
    Consider adding a comment to explain the purpose of the transformed_data function, as it involves data transformation which might not be immediately clear to all users.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The transformed_data function performs a simple operation of dropping missing values. This is a common and straightforward operation in data processing, and the function name is descriptive enough. The comment suggests adding an explanation, but the operation is not complex enough to warrant additional comments.
    I might be underestimating the need for comments for users who are not familiar with data processing operations. However, the function name is quite descriptive, which should suffice for most users.
    Even for users unfamiliar with data processing, the function name transformed_data and the use of dropna() are clear enough to convey the function's purpose.
    The comment is not necessary as the function is simple and the name is descriptive. The comment should be deleted.

Workflow ID: wflow_yru3LWtVyiY7KYqN


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped PR review on 7495006 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two thoughts:

  1. Break it out into a function to make it easier
  2. The API is a bit odd -- source in the rest of Hamilton means that it's another node's output. Maybe we want to add a config/from_config? Or maybe a config mapping? Hmm...

hamilton/function_modifiers/recursive.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on da4cc39 in 41 seconds

More details
  • Looked at 257 lines of code in 6 files
  • Skipped 1 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. examples/model_examples/modular_example/pipeline.py:8
  • Draft comment:
    The import statement for 'configuration' is not used in this file. Consider removing it to clean up the code.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The import statement for 'configuration' in 'pipeline.py' is not used. It should be removed to clean up the code.
2. examples/model_examples/modular_example/run.py:12
  • Draft comment:
    Consider uncommenting the dr.execute line if the intention is to run the pipeline. If it's commented out for testing purposes, please ignore this suggestion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The function run in examples/model_examples/modular_example/run.py is commented out, which might be intentional for testing purposes. However, if this is not intentional, it should be uncommented to ensure the pipeline executes as expected.

Workflow ID: wflow_T4hwTbaPMAYlNIAb


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on d616e14 in 2 minutes and 26 seconds

More details
  • Looked at 45 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_a1896twfhc5So3Ed


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -44,6 +44,10 @@ def fn() -> int:
assert node_copy_copy.name == "rename_fn_again"


np_version = np.__version__
major, minor, _ = map(int, np_version.split("."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numpy version parsing assumes the version string has exactly three components. Consider using packaging.version.parse for more robust version comparison.

),
"other": (float, DependencyType.OPTIONAL),
}
if major == 2 and minor > 1: # greater that 2.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for checking the numpy version and setting the expected dictionary is repeated. Consider refactoring to avoid repetition and adhere to the DRY principle.

@skrawcz skrawcz force-pushed the modular_subdag_example branch from d616e14 to b59b99a Compare December 11, 2024 07:12
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on b59b99a in 29 seconds

More details
  • Looked at 48 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_PERN6kwrxUHr5mQ4


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -44,6 +44,10 @@ def fn() -> int:
assert node_copy_copy.name == "rename_fn_again"


np_version = np.__version__
major, minor, _ = map(int, np_version.split("."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version parsing logic for numpy is incorrect. It assumes the version string has exactly three components, which may not be true for versions like '1.21.0.dev0'. Consider using packaging.version.parse for more robust version handling.

),
"other": (float, DependencyType.OPTIONAL),
}
if major == 2 and minor > 1: # greater that 2.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for determining the expected dictionary based on the numpy version is repeated. Consider refactoring this into a separate function to adhere to the DRY principle.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, two small nits

examples/model_examples/modular_example/README.md Outdated Show resolved Hide resolved
hamilton/function_modifiers/recursive.py Show resolved Hide resolved
@skrawcz skrawcz force-pushed the modular_subdag_example branch from b59b99a to 169862d Compare December 12, 2024 07:14
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 169862d in 27 seconds

More details
  • Looked at 257 lines of code in 6 files
  • Skipped 1 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. examples/model_examples/modular_example/pipeline.py:8
  • Draft comment:
    The import statement for 'configuration' is unnecessary since it's already imported from 'hamilton.function_modifiers'.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The import statement for 'configuration' in 'pipeline.py' is unnecessary since it's already imported from 'hamilton.function_modifiers'.
2. examples/model_examples/modular_example/run.py:6
  • Draft comment:
    The run function is not following the Single Responsibility Principle. Consider splitting it into separate functions for building the driver, displaying functions, and executing the pipeline.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_6LbjNhLOW7mTmAuo


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 4611553 in 18 seconds

More details
  • Looked at 48 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. hamilton/function_modifiers/recursive.py:152
  • Draft comment:
    Consider using a single string for the exception message for better readability and consistency.
                f"`{value}` is not allowed in the config= part of the subdag decorator. Please use `configuration()` or `value()` or literal python values."
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code in recursive.py has a potential issue with the exception message formatting. The message is split across multiple lines, which can be improved for readability and consistency.
2. hamilton/function_modifiers/recursive.py:147
  • Draft comment:
    This function is complex and would benefit from a comment explaining its purpose and logic.
  • Reason this comment was not posted:
    Comment was not on a valid diff hunk.

Workflow ID: wflow_lqla674eX1Z60VE2


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on cf90cc6 in 7 seconds

More details
  • Looked at 10 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. examples/model_examples/modular_example/README.md:34
  • Draft comment:
    Ensure that the image link my_dag_annotated.png is correct and that the image displays as expected. Also, verify that any new functionality or changes are reflected in this README as per the checklist in the PR description.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The README file is clear and concise, explaining the modular pipeline example well. However, the checklist in the PR description mentions that project documentation should be updated if functionality changes. Since this README is part of the example, it might be beneficial to ensure that any new functionality or changes are reflected here. Additionally, the image link at the end should be verified to ensure it displays correctly.
2. examples/model_examples/modular_example/README.md:31
  • Draft comment:
    Consider adding this README to the sphinx documentation under docs/ to enhance visibility and understanding of the modular subdag example.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The README file is part of the documentation and should be included in the sphinx documentation for better visibility and understanding of the modular subdag example.

Workflow ID: wflow_I9XbHofCkOZqX61l


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

To be consistent and to allow people to remap external values,

e.g. subdag config is foo, but we need two different values
because we have two subdags that need it. So we can
remap the values from external config:

Global config: {"foo1": "foo_v1", "foo2": "foo_v2"}
Subdag1 config: {"foo": source("foo1")}
Subdag2 config: {"foo": source{"foo2")}

This also handles if someone wants to use value().

Otherwise to preserve backwards compatibility we need
to do this in a specific order and only override config values
appropriately.
This shows how you can construct a pipeline from components
and then use subdag to parameterize it for reuse.
Named it configuration to not clash with config decorator.
@skrawcz skrawcz force-pushed the modular_subdag_example branch from cf90cc6 to 3e86556 Compare December 12, 2024 07:48
@skrawcz skrawcz merged commit ed90580 into main Dec 12, 2024
20 of 24 checks passed
@skrawcz skrawcz deleted the modular_subdag_example branch December 12, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants