split content of `examples/ibis` into `examples/ibisml` #741

zilto · 2024-03-05T19:37:46Z

Split the code from examples/ibis and examples/ibisml.

This includes a few fixes:

temporarily removed schema validator to investigate potential bug
changed cross-validation dataflow structure because of task executor edge case
fixed requirements
keep only table-level dataflow for ibisml example

Changes

How I tested this

ibis example runs for table and column level
ibisml example runs for linear, random_forest, and boosting models

Notes

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

	🚀 This PR description was created by Ellipsis for commit `26dc7c8`.

Summary:

This PR splits the examples/ibis into examples/ibisml, removes column_dataflow.py, makes changes to model_training.py and run.py, fixes requirements, changes cross-validation dataflow structure, keeps only table-level dataflow for ibisml example, and temporarily removes schema validator.

Key points:

Split the content of examples/ibis into examples/ibisml
Removed column_dataflow.py from examples/ibisml
Made changes to model_training.py and run.py
Fixed requirements and changed cross-validation dataflow structure
Kept only table-level dataflow for ibisml example
Temporarily removed schema validator

Generated with ❤️ by ellipsis.dev

- Added a README with the basic instructions to run the example and additional resources links - removed the schema validator from `table_dataflow.py` because it's causing some unexpected bugs; currently investigating - the IbisML team is saying their API is under refactor

ellipsis-dev

👍 Looks good to me!

Reviewed the entire pull request up to 7ab2b5d
Looked at 363 lines of code in 7 files
Took 1 minute and 47 seconds to review

More info

Skipped 4 files when reviewing.
Skipped posting 4 additional comments because they didn't meet confidence threshold of 50%.

1. examples/ibis/table_dataflow.py:25:

Assessed confidence : 50%
Comment:
The schema validator has been removed from the feature_set function. Please ensure to add it back once the potential bug has been investigated.
Reasoning:
The PR description mentions that the schema validator was temporarily removed to investigate a potential bug. This is confirmed by the removal of the @check_output decorator in table_dataflow.py. This could potentially lead to issues if the output of the feature_set function does not match the expected schema. However, since this is a temporary measure and the author is aware of it, it's not a major concern.

2. examples/ibisml/model_training.py:102:

Assessed confidence : 50%
Comment:
The cross_validation_fold function now returns the actual validation set instead of the indices. Please ensure that all parts of the codebase that use this function have been updated to handle the new return type.
Reasoning:
The cross_validation_fold function in model_training.py has been modified to return the actual validation set instead of the indices. This is a logical change that should improve the function's usability. However, it's important to ensure that the rest of the codebase that uses this function has been updated to handle the new return type.

3. examples/ibisml/model_training.py:82:

Assessed confidence : 0%
Comment:
Good job on ensuring that the y_train and y_val outputs are converted to numpy arrays and reshaped. This ensures compatibility with the scikit-learn models used later in the code.
Reasoning:
The prepare_data function in model_training.py has been updated to convert the y_train and y_val outputs to numpy arrays and reshape them. This is likely done to ensure compatibility with the scikit-learn models used later in the code. This is a good practice as it ensures that the data is in the correct format for the model.

4. examples/ibisml/run.py:19:

Assessed confidence : 0%
Comment:
The run.py script is well-structured and follows good practices such as using argparse for command-line arguments and using a main function to encapsulate the script's functionality.
Reasoning:
The run.py file in the ibisml directory has been added as part of this PR. It appears to be a script to run the machine learning model training process using the Hamilton library. The script seems to be well-structured and follows good practices such as using argparse for command-line arguments and using a main function to encapsulate the script's functionality.

Workflow ID: wflow_xwTqm5G8DH2oQfTM

Not what you expected? You can customize the content of the reviews using rules. Learn more here.

ellipsis-dev

👍 Looks good to me!

Performed an incremental review on 26dc7c8
Looked at 164 lines of code in 3 files
Took 9 minutes and 25 seconds to review

More info

Skipped 0 files when reviewing.
Skipped posting 1 additional comments because they didn't meet confidence threshold of 50%.

1. examples/ibisml/model_training.py:89:

Assessed confidence : 100%
Grade: 0%
Comment:
The data_split parameter is not used in this function. If it's not needed, consider removing it to avoid confusion.


def cross_validation_fold(
    X_train: pd.DataFrame,
    X_val: pd.DataFrame,
    y_train: pd.DataFrame,
    y_val: pd.DataFrame,
    base_model: BaseEstimator,
) -> dict:
    """Train model and make predictions on validation"""
    model = clone(base_model)

    model.fit(X_train, y_train)

    y_val_pred = model.predict(X_val)
    score = mean_squared_error(y_val, y_val_pred)

    return dict(y_true=y_val, y_pred=y_val_pred, score=score)

Reasoning:
The code seems to be well-structured and follows good practices. However, there is a minor issue in the model_training.py file. The function cross_validation_fold has a parameter data_split which is not used in the function. This could be a leftover from a previous version of the code and should be removed if it's not needed.

Workflow ID: wflow_4OCqSWQtPEhyUO0e

Not what you expected? You can customize the content of the reviews using rules. Learn more here.

zilto added 4 commits March 5, 2024 13:51

split ibis and ibisml

6fdeed8

updated ibisml directory

285474b

simplified to only table-level dataflow

7ab2b5d

zilto requested a review from skrawcz March 5, 2024 19:37

ellipsis-dev bot reviewed Mar 5, 2024

View reviewed changes

skrawcz approved these changes Mar 5, 2024

View reviewed changes

skrawcz merged commit 26dc7c8 into main Mar 5, 2024
23 checks passed

skrawcz deleted the examples/ibisml branch March 5, 2024 20:47

ellipsis-dev bot reviewed Mar 5, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

split content of `examples/ibis` into `examples/ibisml` #741

split content of `examples/ibis` into `examples/ibisml` #741

zilto commented Mar 5, 2024 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

split content of examples/ibis into examples/ibisml #741

split content of examples/ibis into examples/ibisml #741

Conversation

zilto commented Mar 5, 2024 • edited by ellipsis-dev bot Loading

Changes

How I tested this

Notes

Checklist

Summary:

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

split content of `examples/ibis` into `examples/ibisml` #741

split content of `examples/ibis` into `examples/ibisml` #741

zilto commented Mar 5, 2024 •

edited by ellipsis-dev bot

Loading