examples/ibis and Ibis plugin #725

Merged
merged 13 commits into from
Mar 5, 2024

@zilto (Collaborator) commented Mar 1, 2024

An example of how to use Hamilton with Ibis for feature engineering and machine learning model training.

Changes

  • added an examples/ibis directory with table-level and column-level feature engineering
  • the example includes machine learning training, with ibisml used for preprocessing
  • added hamilton.plugins.ibis_extensions to support column-level operations on Ibis tables
  • ibis_extensions also provides a SchemaValidatorIbis that uses the Ibis Schema.equals() method (docs)
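
To make the schema check in the last bullet more concrete, here is a minimal, library-free sketch of what an equality-based schema validation does. It uses neither Ibis nor Hamilton and is not the plugin's code: `schemas_equal`, `schema_diff`, and the column names are hypothetical, and treating column order as significant is an assumption of this sketch.

```python
from typing import Dict, List

# Hypothetical stand-in for a table schema: column name -> type name.
Schema = Dict[str, str]


def schemas_equal(actual: Schema, expected: Schema) -> bool:
    """True when both schemas have the same columns, order, and types."""
    return list(actual.items()) == list(expected.items())


def schema_diff(actual: Schema, expected: Schema) -> List[str]:
    """Describe every mismatch so that a failed validation is actionable."""
    messages = []
    for name, expected_type in expected.items():
        if name not in actual:
            messages.append(f"missing column: {name}")
        elif actual[name] != expected_type:
            messages.append(f"{name}: expected {expected_type}, got {actual[name]}")
    for name in actual:
        if name not in expected:
            messages.append(f"unexpected column: {name}")
    # Same columns and types, but a different order (assumed to matter here).
    if not messages and list(actual) != list(expected):
        messages.append("column order differs")
    return messages


expected = {"id": "int64", "has_pet": "boolean"}
actual = {"id": "int64", "has_pet": "int64"}
ok = schemas_equal(actual, expected)
problems = schema_diff(actual, expected)
```

A validator built this way would pass only when `ok` is true and could surface `problems` in its failure message.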

How I tested this

  • No tests have been added yet; I would welcome advice on which tests are required. I can start by looking at hamilton.plugins.vaex_extensions.

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Ellipsis 🚀 This PR description was created by Ellipsis for commit 2dadf51.

Summary:

This PR introduces an example of using Hamilton with Ibis for feature engineering and machine learning model training, adds a new plugin for supporting column-level operations on Ibis tables, and modifies hamilton/function_modifiers/base.py to include 'ibis' in the list of plugin modules.

Key points:

  • Introduced an example of using Hamilton with Ibis for feature engineering and machine learning model training.
  • Added a new plugin for supporting column-level operations on Ibis tables.
  • Modified hamilton/function_modifiers/base.py to include 'ibis' in the list of plugin modules.
  • Added examples/ibis directory with scripts for table-level and column-level feature engineering.
  • Included machine learning training with ibisml for preprocessing in the example.
  • Included a SchemaValidatorIbis in the ibis_extensions for schema validation.
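
The "list of plugin modules" mentioned above suggests a registration step in which optional integrations are loaded only when their dependency is installed. The following is a rough sketch of that general pattern, not Hamilton's actual code: `load_optional_plugins` is a hypothetical helper, and standard-library module names stand in for plugin modules.

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable


def load_optional_plugins(module_names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import each named module, skipping any that is not installed."""
    loaded: Dict[str, ModuleType] = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ModuleNotFoundError:
            # The optional dependency is absent: skip it rather than fail.
            continue
    return loaded


# "json" stands in for an installed plugin; the second name does not exist.
plugins = load_optional_plugins(["json", "plugin_that_is_not_installed"])
available = sorted(plugins)
```

With this pattern, adding a name to the list is enough to make an integration available to users who have the corresponding library installed, without breaking imports for everyone else.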

Generated with ❤️ by ellipsis.dev

@zilto requested a review from elijahbenizzy on March 1, 2024 16:06
@zilto added the enhancement (New feature or request) label on Mar 1, 2024
@zilto marked this pull request as ready for review on March 3, 2024 19:11
@ellipsis-dev (bot, Contributor) left a comment

👍 Looks good to me!

  • Reviewed the entire pull request up to 85d7554
  • Looked at 471 lines of code in 6 files
  • Took 49 seconds to review
More info
  • Skipped 5 files when reviewing.
  • Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%.
1. examples/ibis/model_training.py:1:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The PR lacks tests for the new functionality. Please add tests to ensure the new functionality works as expected.
  • Reasoning:
    The PR adds support for Ibis in the Hamilton library. It includes examples of how to use Hamilton with Ibis for feature engineering and machine learning model training. The code seems to follow good practices, and the logic seems sound. However, there are no tests included in the PR, which is a concern. The author should add tests to ensure the new functionality works as expected.

Workflow ID: wflow_7SBEevi6qQlLm96O


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@zilto merged commit 2dadf51 into main on Mar 5, 2024
23 checks passed
@zilto deleted the examples/ibis branch on March 5, 2024 13:28
@ellipsis-dev (bot, Contributor) left a comment

👍 Looks good to me!

  • Performed an incremental review on 2dadf51
  • Looked at 480 lines of code in 6 files
  • Took 4 minutes and 53 seconds to review
More info
  • Skipped 1 file when reviewing.
  • Skipped posting 6 additional comments because they didn't meet confidence threshold of 50%.
1. hamilton/plugins/ibis_extensions.py:69:
  • Assessed confidence : 10%
  • Comment:
    The ibis_extensions.py file is adding support for Ibis tables and columns in the Hamilton framework. It defines a new SchemaValidatorIbis class for schema validation of Ibis tables. The get_column_ibis function is registered to handle column extraction from Ibis tables. The code seems to follow the best practices and there are no apparent logical or performance issues.
2. hamilton/function_modifiers/base.py:37:
  • Assessed confidence : 10%
  • Comment:
    The base.py file in the function_modifiers directory has been modified to include 'ibis' in the list of plugin modules. This change is necessary to enable the use of the Ibis plugin that has been added in this PR. The change is correctly implemented and there are no apparent logical or performance issues.
3. examples/ibis/table_dataflow.py:48:
  • Assessed confidence : 10%
  • Comment:
    The table_dataflow.py file in the ibis examples directory provides an example of table-level feature engineering using Ibis and Hamilton. The raw_table function reads a CSV file into an Ibis table and renames the columns to snake_case. The feature_table function adds new feature columns to the table. The feature_set function selects feature columns and filters rows. The code seems to follow the best practices and there are no apparent logical or performance issues.
4. examples/ibis/run.py:83:
  • Assessed confidence : 10%
  • Comment:
    The run.py file in the ibis examples directory is the main script to run the Ibis examples. It imports the necessary modules and defines the main function which builds the Hamilton driver with the appropriate dataflow components based on the command-line arguments. The main function also visualizes the execution of the driver and prints the keys of the result. The code seems to follow the best practices and there are no apparent logical or performance issues.
5. examples/ibis/model_training.py:172:
  • Assessed confidence : 10%
  • Comment:
    The model_training.py file in the ibis examples directory provides an example of machine learning model training using Ibis and Hamilton. It defines several functions for model training, data preprocessing, and cross-validation. The base_model__linear, base_model__random_forest, and base_model__boosting functions define the base models for linear regression, random forest regression, and gradient boosting regression, respectively. The preprocessing_recipe function defines the preprocessing steps. The data_split function generates indices for train/validation splits. The prepare_data function splits the data and applies the preprocessing recipe. The cross_validation_fold function trains the model and makes predictions on the validation set. The cross_validation_fold_collection function collects the results from cross-validation folds. The prediction_table function creates a table with cross-validation predictions. The store_predictions function stores the cross-validation predictions table. The train_full_model function trains a model on the full dataset for inference. The code seems to follow the best practices and there are no apparent logical or performance issues.
6. examples/ibis/column_dataflow.py:61:
  • Assessed confidence : 10%
  • Comment:
    The column_dataflow.py file in the ibis examples directory provides an example of column-level feature engineering using Ibis and Hamilton. The raw_table function reads a CSV file into an Ibis table and renames the columns to snake_case. The has_children, has_pet, and is_summer_brazil functions define new feature columns based on the existing columns. The feature_table function adds the new feature columns to the table. The feature_set function selects feature columns and filters rows. The code seems to follow the best practices and there are no apparent logical or performance issues.
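
The train/validation index splitting that item 5 attributes to data_split can be illustrated with a small, dependency-free sketch. This is not the example's code, and the real function may well rely on a library splitter: `kfold_indices` is a hypothetical helper, and using contiguous, unshuffled folds is an assumption made here for simplicity.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_rows: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each contiguous fold."""
    if not 2 <= n_splits <= n_rows:
        raise ValueError("n_splits must be between 2 and n_rows")
    base, extra = divmod(n_rows, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds receive one additional row.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_rows))
        yield train, validation
        start = stop


folds = list(kfold_indices(n_rows=7, n_splits=3))
validation_sets = [validation for _, validation in folds]
```

Each row index appears in exactly one validation set, so collecting the per-fold predictions yields one out-of-fold prediction per row, which is the kind of table the prediction_table step is described as producing.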

Workflow ID: wflow_Ne78jOX24ONR8E0K



Labels
enhancement New feature or request
3 participants