Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support TFRecord as one of the output formats for historical feature retrieval #1222

Merged
merged 14 commits into from
Jan 8, 2021

Conversation

khorshuheng
Copy link
Collaborator

What this PR does / why we need it:
This PR allows user to specify tfrecord as the output format of historical feature retrieval, which is useful in cases where the users wish to generate statistics from the retrieved dataset using tfdv, or if the user's machine learning model is based on tensorflow.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

User can now specify `tfrecord` as the output format of historical feature retrieval.

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment



def test_dataproc_job_tfrecord_output(
dataproc_launcher: DataprocClusterLauncher, # noqa: F811
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is with all these noqa: F811? Can it be fixed please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because we are importing the fixtures from another module, and flake8 doesn't understand pytest fixtures.

One way to fix it would be not to import the fixture and include it directly in test_launchers, though that would mean it's harder for us to reuse the fixture across different test files.

I will research for way on how this can be circumvented without explicitly adding # noqa: F811, but i am not certain if it is possible.

Comment on lines 303 to 307
"Args": ["spark-submit", pyspark_script_path]
+ args
+ ["--packages", ",".join(packages)]
if packages
else [],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Args": ["spark-submit", pyspark_script_path]
+ args
+ ["--packages", ",".join(packages)]
if packages
else [],
"Args": ["spark-submit", pyspark_script_path]
+ args
+ (["--packages", ",".join(packages)]
if packages
else []),

@oavdeev
Copy link
Collaborator

oavdeev commented Jan 5, 2021

/test test-end-to-end-sparkop

3 similar comments
@oavdeev
Copy link
Collaborator

oavdeev commented Jan 5, 2021

/test test-end-to-end-sparkop

@oavdeev
Copy link
Collaborator

oavdeev commented Jan 5, 2021

/test test-end-to-end-sparkop

@oavdeev
Copy link
Collaborator

oavdeev commented Jan 5, 2021

/test test-end-to-end-sparkop

@jklegar
Copy link
Collaborator

jklegar commented Jan 5, 2021

/test test-end-to-end-azure

…retrieval

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
@khorshuheng khorshuheng changed the title (WIP) Support TFRecord as one of the output formats for historical feature retrieval Support TFRecord as one of the output formats for historical feature retrieval Jan 7, 2021
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
@feast-ci-bot
Copy link
Collaborator

feast-ci-bot commented Jan 7, 2021

@khorshuheng: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
test-end-to-end-azure 6d89310 link /test test-end-to-end-azure

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@khorshuheng
Copy link
Collaborator Author

/test test-end-to-end

@pyalex pyalex merged commit c4df8c9 into feast-dev:master Jan 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants