Update Python SDK so FeatureSet can import Schema from Tensorflow metadata #450


@davidheryanto davidheryanto commented Jan 30, 2020

What this PR does / why we need it:
This PR extends the FeatureSet, Entity and Feature classes in the Python SDK so that they can carry constraints defined in the Schema from Tensorflow metadata:
https://github.com/tensorflow/metadata/blob/ddf582f66eeeddb862de6d53c3e03d6eed1c04a6/tensorflow_metadata/proto/v0/schema.proto

These constraints (presence_constraints, shape_type and domain_info) can be used in Feast to validate feature values and presence. Typical usage, when one is already using Tensorflow Data Validation, is as follows:

import tensorflow_data_validation as tfdv

from feast.entity import Entity
from feast.feature import Feature
from feast.feature_set import FeatureSet
from feast.value_type import ValueType

# Use tensorflow_data_validation to generate an initial schema for validation
train_stats = tfdv.generate_statistics_from_csv(data_location="/data/train.csv")
schema = tfdv.infer_schema(statistics=train_stats)

# Create a new FeatureSet or retrieve an existing FeatureSet in Feast
feature_set = FeatureSet(
    name="myfeatureset",
    entities=[Entity(name="id", dtype=ValueType.INT64)],
    features=[
        Feature(name="feature1", dtype=ValueType.STRING),
        Feature(name="feature2", dtype=ValueType.INT64),
        # ...
    ],
)

# Update the entities and features with constraints defined in the schema
feature_set.update_schema(schema)
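
To make the intent of these constraints concrete, here is a minimal sketch in plain Python of the kind of check they enable. It is not the Feast implementation: `FeatureConstraint` and `validate` are hypothetical names introduced only for illustration, and they loosely mirror two ideas from the Tensorflow metadata schema, a presence constraint (minimum fraction of rows that have a value) and a string domain (the set of allowed values).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FeatureConstraint:
    """Hypothetical stand-in for the constraints attached to one feature."""

    name: str
    # Loosely mirrors presence.min_fraction in the Tensorflow metadata schema.
    min_fraction: float = 0.0
    # Loosely mirrors the values of a string domain; None means "no domain".
    allowed_values: Optional[Sequence[str]] = None


def validate(constraint: FeatureConstraint, values: List[Optional[str]]) -> List[str]:
    """Return human-readable violations; an empty list means the values pass."""
    errors = []
    present = [v for v in values if v is not None]
    fraction = len(present) / len(values) if values else 0.0

    # Presence check: enough rows must carry a value.
    if fraction < constraint.min_fraction:
        errors.append(
            f"{constraint.name}: present in {fraction:.0%} of rows, "
            f"expected at least {constraint.min_fraction:.0%}"
        )

    # Domain check: every present value must be in the allowed set.
    if constraint.allowed_values is not None:
        unexpected = sorted(set(present) - set(constraint.allowed_values))
        if unexpected:
            errors.append(f"{constraint.name}: unexpected values {unexpected}")
    return errors


constraint = FeatureConstraint("feature1", min_fraction=1.0, allowed_values=["a", "b"])
ok_errors = validate(constraint, ["a", "b", "a"])
bad_errors = validate(constraint, ["a", None, "c"])
```

Here `ok_errors` is empty, while `bad_errors` reports both a presence violation (one missing value) and a domain violation (the value "c").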

Which issue(s) this PR fixes:

Related to #172

Does this PR introduce a user-facing change?:

NONE

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidheryanto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@woop
Member

woop commented Jan 30, 2020

Other than the above, it looks good as a first cut.

@davidheryanto
Collaborator Author

/hold
Need to resolve the review comments and add a method to export a schema from a feature set.

@davidheryanto davidheryanto force-pushed the update-python-sdk-import-export-tf-metadata-schema branch from 7966a5d to 9a1f24a Compare January 31, 2020 08:11
@davidheryanto
Collaborator Author

/hold cancel

@woop
Member

woop commented Feb 11, 2020

@davidheryanto this PR contains the same code as #449

@davidheryanto
Collaborator Author

Yes, because the end-to-end tests are written in Python and depend on the Python SDK,
so for the end-to-end tests I have to include the Python SDK changes too.

@ches ches added this to the v0.5.0 milestone Feb 14, 2020
@davidheryanto davidheryanto force-pushed the update-python-sdk-import-export-tf-metadata-schema branch from 9439ea1 to ffc26fc Compare March 1, 2020 08:12
@ches ches mentioned this pull request Mar 7, 2020
sdk/python/feast/entity.py (outdated, resolved)
sdk/python/feast/feature.py (outdated, resolved)
sdk/python/feast/field.py (outdated, resolved)
davidheryanto and others added 8 commits April 10, 2020 14:18
- Update documentation for properties in Field
- Deduplication refactoring in FeatureSet
They are not necessary for now and to avoid unexpected breaking changes.
In the import_tfx_schema method, the domain info is first made inline, so schema-level domain info is not needed when updating the Feast Entity and Feature.

Also added documentation to the property setter methods in field.py
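
As a rough illustration of what "making the domain info inline" means in the commit message above, here is a sketch in plain Python. It is not the code from this PR: the real schema is a protobuf message, and the `StringDomain`, `SchemaFeature`, `Schema` and `inline_domains` names below are hypothetical stand-ins. The assumption is that a feature may either reference a shared schema-level domain by name or hold its own copy, and that inlining replaces every by-name reference with a copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StringDomain:
    """Hypothetical stand-in for a named set of allowed string values."""

    name: str
    values: List[str]


@dataclass
class SchemaFeature:
    """Hypothetical stand-in for a feature entry in a schema."""

    name: str
    domain_ref: Optional[str] = None  # name of a schema-level domain
    string_domain: Optional[StringDomain] = None  # inline copy of the domain


@dataclass
class Schema:
    """Hypothetical stand-in for a schema with shared, schema-level domains."""

    features: List[SchemaFeature] = field(default_factory=list)
    string_domains: List[StringDomain] = field(default_factory=list)


def inline_domains(schema: Schema) -> None:
    """Replace each by-name domain reference with a copy of the domain."""
    by_name: Dict[str, StringDomain] = {d.name: d for d in schema.string_domains}
    for feature in schema.features:
        if feature.domain_ref is None:
            continue  # nothing to inline for this feature
        shared = by_name[feature.domain_ref]
        # Copy the values so later edits to one feature do not affect others.
        feature.string_domain = StringDomain(shared.name, list(shared.values))
        feature.domain_ref = None


schema = Schema(
    features=[
        SchemaFeature("feature1", domain_ref="colors"),
        SchemaFeature("feature2"),
    ],
    string_domains=[StringDomain("colors", ["red", "green"])],
)
inline_domains(schema)
```

After inlining, each feature carries everything needed to describe its own domain, which is why a consumer that works feature by feature would no longer need to look at the schema-level list.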
@zhilingc zhilingc force-pushed the update-python-sdk-import-export-tf-metadata-schema branch from 283d52d to 6af74c6 Compare April 10, 2020 08:55
@zhilingc
Collaborator

/lgtm

@feast-ci-bot feast-ci-bot merged commit e7482af into feast-dev:master Apr 10, 2020
@ches ches added area/sdks kind/feature New feature or request labels Apr 26, 2020
5 participants