Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split content of examples/ibis into examples/ibisml #741

Merged
merged 4 commits into from
Mar 5, 2024
Merged

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Mar 5, 2024

Split the code from examples/ibis and examples/ibisml.

This includes a few fixes:

  • temporarily removed schema validator to investigate potential bug
  • changed cross-validation dataflow structure because of task executor edge case
  • fixed requirements
  • keep only table-level dataflow for ibisml example

Changes

How I tested this

  • ibis example runs for table and column level
  • ibisml example runs for linear, random_forest, and boosting models

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Ellipsis 🚀 This PR description was created by Ellipsis for commit 26dc7c8.

Summary:

This PR splits the examples/ibis into examples/ibisml, removes column_dataflow.py, makes changes to model_training.py and run.py, fixes requirements, changes cross-validation dataflow structure, keeps only table-level dataflow for ibisml example, and temporarily removes schema validator.

Key points:

  • Split the content of examples/ibis into examples/ibisml
  • Removed column_dataflow.py from examples/ibisml
  • Made changes to model_training.py and run.py
  • Fixed requirements and changed cross-validation dataflow structure
  • Kept only table-level dataflow for ibisml example
  • Temporarily removed schema validator

Generated with ❤️ by ellipsis.dev

zilto added 4 commits March 5, 2024 13:51
- Added a README with the basic instructions to
run the example and additional resources links
- removed the schema validator from `table_dataflow.py`
because it's causing some unexpected bugs; currently
investigating
- the IbisML team is saying their API is under
refactor
@zilto zilto requested a review from skrawcz March 5, 2024 19:37
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me!

  • Reviewed the entire pull request up to 7ab2b5d
  • Looked at 363 lines of code in 7 files
  • Took 1 minute and 47 seconds to review
More info
  • Skipped 4 files when reviewing.
  • Skipped posting 4 additional comments because they didn't meet confidence threshold of 50%.
1. examples/ibis/table_dataflow.py:25:
  • Assessed confidence : 50%
  • Comment:
    The schema validator has been removed from the feature_set function. Please ensure to add it back once the potential bug has been investigated.
  • Reasoning:
    The PR description mentions that the schema validator was temporarily removed to investigate a potential bug. This is confirmed by the removal of the @check_output decorator in table_dataflow.py. This could potentially lead to issues if the output of the feature_set function does not match the expected schema. However, since this is a temporary measure and the author is aware of it, it's not a major concern.
2. examples/ibisml/model_training.py:102:
  • Assessed confidence : 50%
  • Comment:
    The cross_validation_fold function now returns the actual validation set instead of the indices. Please ensure that all parts of the codebase that use this function have been updated to handle the new return type.
  • Reasoning:
    The cross_validation_fold function in model_training.py has been modified to return the actual validation set instead of the indices. This is a logical change that should improve the function's usability. However, it's important to ensure that the rest of the codebase that uses this function has been updated to handle the new return type.
3. examples/ibisml/model_training.py:82:
  • Assessed confidence : 0%
  • Comment:
    Good job on ensuring that the y_train and y_val outputs are converted to numpy arrays and reshaped. This ensures compatibility with the scikit-learn models used later in the code.
  • Reasoning:
    The prepare_data function in model_training.py has been updated to convert the y_train and y_val outputs to numpy arrays and reshape them. This is likely done to ensure compatibility with the scikit-learn models used later in the code. This is a good practice as it ensures that the data is in the correct format for the model.
4. examples/ibisml/run.py:19:
  • Assessed confidence : 0%
  • Comment:
    The run.py script is well-structured and follows good practices such as using argparse for command-line arguments and using a main function to encapsulate the script's functionality.
  • Reasoning:
    The run.py file in the ibisml directory has been added as part of this PR. It appears to be a script to run the machine learning model training process using the Hamilton library. The script seems to be well-structured and follows good practices such as using argparse for command-line arguments and using a main function to encapsulate the script's functionality.

Workflow ID: wflow_xwTqm5G8DH2oQfTM


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@skrawcz skrawcz merged commit 26dc7c8 into main Mar 5, 2024
23 checks passed
@skrawcz skrawcz deleted the examples/ibisml branch March 5, 2024 20:47
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me!

  • Performed an incremental review on 26dc7c8
  • Looked at 164 lines of code in 3 files
  • Took 9 minutes and 25 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 1 additional comments because they didn't meet confidence threshold of 50%.
1. examples/ibisml/model_training.py:89:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The data_split parameter is not used in this function. If it's not needed, consider removing it to avoid confusion.

def cross_validation_fold(
    X_train: pd.DataFrame,
    X_val: pd.DataFrame,
    y_train: pd.DataFrame,
    y_val: pd.DataFrame,
    base_model: BaseEstimator,
) -> dict:
    """Train model and make predictions on validation"""
    model = clone(base_model)

    model.fit(X_train, y_train)

    y_val_pred = model.predict(X_val)
    score = mean_squared_error(y_val, y_val_pred)

    return dict(y_true=y_val, y_pred=y_val_pred, score=score)
  • Reasoning:
    The code seems to be well-structured and follows good practices. However, there is a minor issue in the model_training.py file. The function cross_validation_fold has a parameter data_split which is not used in the function. This could be a leftover from a previous version of the code and should be removed if it's not needed.

Workflow ID: wflow_4OCqSWQtPEhyUO0e


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants