v0.9.0
Breaking Changes
We have changed the execution engine for derived features to Spark SQL. This may break derived feature definitions for users who are not running the up-to-date sample notebooks. Specifically, they might see this failure:
Preprocessed DataFrames are:
{'feature_user_age,feature_user_gift_card_balance,feature_user_has_valid_credit_card,feature_user_tax_rate': JavaObject id=o243}
Traceback (most recent call last):
  File "feathr_pyspark_driver.py", line 107, in <module>
    submit_spark_job(feature_names_funcs)
  File "feathr_pyspark_driver.py", line 85, in submit_spark_job
    py4j_feature_job.mainWithPreprocessedDataFrame(job_param_java_array, new_preprocessed_df_map)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
pyspark.sql.utils.AnalysisException: Undefined function: 'toBoolean'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 84
)
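To fix this, replace the legacy functions 'toBoolean' and 'if_else' in the transform expression with the Spark SQL functions 'boolean' and 'if'. Users should change: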
feature_user_purchasing_power = DerivedFeature(
    name="feature_user_purchasing_power",
    key=user_id,
    feature_type=FLOAT,
    input_features=[feature_user_gift_card_balance, feature_user_has_valid_credit_card],
    transform="feature_user_gift_card_balance + if_else(toBoolean(feature_user_has_valid_credit_card), 100, 0)",
)
to
feature_user_purchasing_power = DerivedFeature(
    name="feature_user_purchasing_power",
    key=user_id,
    feature_type=FLOAT,
    input_features=[feature_user_gift_card_balance, feature_user_has_valid_credit_card],
    transform="feature_user_gift_card_balance + if(boolean(feature_user_has_valid_credit_card), 100, 0)",
)
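If many derived features need the same change, the rewrite shown above could be applied to transform strings mechanically. The following is a minimal sketch, not part of Feathr: the helper name migrate_transform and the mapping table are hypothetical, and the table covers only the two functions from the example above (toBoolean to boolean, if_else to if). Other legacy functions may need different Spark SQL equivalents and should be checked by hand.

```python
import re

# Hypothetical mapping, taken only from the example in these notes.
# Any other legacy function is NOT covered and must be migrated manually.
LEGACY_TO_SPARK_SQL = {
    "toBoolean": "boolean",
    "if_else": "if",
}

# Match a legacy name only when it is used as a function call, e.g. "toBoolean(".
# The leading \b keeps longer identifiers such as "my_if_else(" untouched.
_LEGACY_CALL = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in LEGACY_TO_SPARK_SQL) + r")\s*\("
)


def migrate_transform(expr: str) -> str:
    """Rewrite known legacy function calls in a transform expression string."""
    return _LEGACY_CALL.sub(lambda m: LEGACY_TO_SPARK_SQL[m.group(1)] + "(", expr)


old_transform = (
    "feature_user_gift_card_balance + "
    "if_else(toBoolean(feature_user_has_valid_credit_card), 100, 0)"
)
new_transform = migrate_transform(old_transform)
print(new_transform)
```

The sketch works on plain text, so it would also rewrite a matching name that appears inside a quoted string literal in the expression. Review the output before passing it to DerivedFeature.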
What's Changed
- Fix a feature type bug by @jaymo001 in #701
- Fix wheel building problem in Windows by @xiaoyongzhu in #702
- Fix Purview+RBAC registry web app issue by @Yuqing-cat in #700
- Remove hard coded resources in docs by @enya-yx in #696
- Add e2e test for purview registry and rbac registry by @blrchen in #689
- Update tests use runtime jar from maven for spark submission to cover Databricks by @blrchen in #706
- Enhance databricks submission error message by @enya-yx in #710
- Enhance purview registry error messages by @blrchen in #709
- [WIP] hot fix databricks es dependency issue by @Yuqing-cat in #713
- Fix materialize to sql e2e test failure by @blrchen in #717
- Add Data Models in Feathr by @hyingyang-linkedin in #659
- Revert "Enhance purview registry error messages (#709)" by @blrchen in #720
- Improve Avro GenericRecord and SpecificRecord based row-level extractor performance by @jaymo001 in #723
- Fix lookup feature missing issue when converting feature definition to HOCON files by @jaymo001 in #732
- Fix function string parsing by @loomlike in #725
- Apply a same credential within each sample [ Docs ] by @enya-yx in #718
- Enable incremental for HDFS sink by @enya-yx in #695
- #492 fix, fail only if different sources have same name by @windoze in #733
- Remove unused credentials and deprecated purview settings by @enya-yx in #708
- Revoke adb token submitted by mistaken by @blrchen in #730
- Fix synapse errors not print out issue by @enya-yx in #734
- Spark config passing bug fix for local spark submission by @loomlike in #729
- Fix direct purview client missing transformation by @YihuiGuo in #736
- Support SQL expression in derived feature transformation by @jaymo001 in #731
- Support SWA with groupBy to 1d tensor conversion by @jaymo001 in #748
- Rijai/armfix by @jainr in #742
- bump version to 0.8.2 by @Yuqing-cat in #722
- Added latest deltalake version by @ahlag in #735
- Fix #474 Disable local mode by @windoze in #738
- Allow recreating entities for PurView registry by @windoze in #691
- Adding DevSkim linter to Github actions by @jainr in #657
- Fix icons in UI cannot auto scale (#737) by @Fendoe in #744
- Expose 'timePartitionPattern' in Python API [ WIP ] by @enya-yx in #714
- Setting up component governance pipeline by @jainr in #655
- Add docs to explain on feature materialization behavior by @xiaoyongzhu in #688
- Fix protobuf version by @enya-yx in #711
- Add some notes based on on-call issues by @enya-yx in #753
- Refine spark runtime error message by @Yuqing-cat in #755
- Serialization bug due to version incompatibility between azure-core and msrest by @jainr in #763
- Unify Python SDK Build Version and decouple Feathr Maven Version by @Yuqing-cat in #746
- Replace hard code string in notebook and align with others by @Yuqing-cat in #765
- Add flag to enable generation non-agg features by @windoze in #719
- roll back 0.8.2 version bump by @Yuqing-cat in #771
- Refactor Product Recommendation sample notebook by @jainr in #743
- Update role-management page in UI (#751) by @Fendoe in #764
- Create Feature less module in UI code and import alias by @Fendoe in #768
- Add extra dependencies to setup.py by @loomlike in #773
- Fix Windows compatibility issues by @xiaoyongzhu in #776
- UI: Replace logo icon by @Fendoe in #778
- Refine example notebooks by @loomlike in #756
- UI: Display version by @Fendoe in #779
- Add nightly Notification to PR Test GitHub Action by @Yuqing-cat in #783
- Fix broken links for #743 by @Yuqing-cat in #789
- Update notebook image links for github rendering by @loomlike in #787
- Revert 756 by @blrchen in #798
- remove unnecessary spark job from registry test by @Yuqing-cat in #790
- Revert "Expose 'timePartitionPattern' in Python API [ WIP ]" by @blrchen in #799
- Update CONTRIBUTING.md with committers information by @hangfei in #793
- Fix test_azure_spark_maven_e2e ci test error by @blrchen in #800
- Add failure warning and run link to daily notification by @Yuqing-cat in #802
- Minor documentation update to add info about maven automated workflow by @jainr in #795
- Fix doc dead links by @blrchen in #805
- Fix more dead links on docs by @blrchen in #807
- Improve UI experience and clean up ui code warnings by @Fendoe in #801
- Add release instructions for Release Candidate by @blrchen in #809
- Bump version to 0.9.0-rc1 by @blrchen in #810
- Fix bug in empty array dense tensor default value by @bozhonghu in #806
- Fix sql-based derived feature by @jaymo001 in #812
- Replacing webapp-deploy action with workflow-webhook action. by @jainr in #813
- Fix passthrough feature reference in sql-based derived feature by @jaymo001 in #815
- Revert databricks example notebook until fixing issues by @loomlike in #814
- Add retry logic for purview project-ids logic by @Yuqing-cat in #821
- Bump version to 0.9.0-rc2 by @blrchen in #822
- Fix Not display management menu by @Fendoe in #826
- Update text and link by @Fendoe in #828
- fix sample issues due to derived feature engine change by @xiaoyongzhu in #829
- Add exception if materialize features defined on 'INPUT_CONTEXT' by @enya-yx in #785
- Fix only first Key will show even if multiple keys are added by @Fendoe in #837
- Move the version information to the bottom of the sidemenu. by @Fendoe in #832
- Fix key cannot read properties of undefined (reading 'map') by @Fendoe in #841
- Model by @hyingyang-linkedin in #769
- Bump loader-utils from 2.0.2 to 2.0.3 in /ui by @dependabot in #846
- Maven Package Version Configuration Fix by @Yuqing-cat in #845
- Copy/paste typo by @windoze in #849
- Update outdated docs (WASB_ to BLOB_) by @loomlike in #850
- Update registry nightly deploy CICD by @blrchen in #853
- Windoze/purview registry error log by @windoze in #851
- Fix duplicate action id in registry CICD by @blrchen in #854
- Improve Feathr Client initialization logs by @blrchen in #856
- Enhance error messages of synapse jobs by @enya-yx in #855
- Fix avro files read failure under timePartitionPattern paths by @enya-yx in #808
- Bump version to 0.9.0-rc3 by @blrchen in #860
- Enhance sample notebook by @enya-yx in #848
New Contributors
- @hyingyang-linkedin made their first contribution in #659
- @loomlike made their first contribution in #725
- @Fendoe made their first contribution in #744
- @bozhonghu made their first contribution in #806
Full Changelog: v0.8.0...v0.9.0