Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate feature reference and project functionality #631

Closed
woop opened this issue Apr 18, 2020 · 15 comments · Fixed by #693
Closed

Validate feature reference and project functionality #631

woop opened this issue Apr 18, 2020 · 15 comments · Fixed by #693
Assignees
Milestone

Comments

@woop
Copy link
Member

woop commented Apr 18, 2020

This card is a follow up on #479, and specifically this comment

The goal is to add tests and/or the functionality referenced in the above comment.

  • Feature references cannot contain a project when it is in string form. This is to prevent multiple projects from being selected in a single feature request (for retrieval).
  • Feature References can contain projects to prevent breaking changes to existing clients. An additional field feature_set_name is used to reference a feature when a single feature name is ambiguous.
  • Projects can only be defined once per incoming request. There should be a single key on each Feast client to set the project, since projects aren't defined in the feature reference any more.
  • SDKs to add a field project to allow users to override projects for all given feature references. Moving forward this would be the preferred way to set projects, not in feature references, although that is still accepted for backwards compatibility.
  • If a project isn't defined then the default project is used. The default project is just called, "default".
  • The default project is set on the server side, not the client side. Clients should not have to provide a project.
  • Feature names may be duplicated across feature sets in the same project. Currently they have to be unique in a project.
  • All tests need to be updated according to the above functionality.
    • Specifically we should make sure that we do not set or create a project by default in any test, unless we are testing project functionality specifically. Projects should be invisible if the user is just using the default project.
@woop woop added this to the v0.5.0 milestone Apr 18, 2020
@woop
Copy link
Member Author

woop commented Apr 18, 2020

Relevant #453

@mrzzy
Copy link
Collaborator

mrzzy commented May 4, 2020

Proposed changes to support this:

Protobuf

  1. ServingService.proto
  • FeatureReference - update documentation that the project field becomes is optional and defaults to default project: 'default'

Feast Core

  1. SpecService
  • listFeatureSet() - project field in ListFeatureSet.Filter becomes optional (default: 'default')
  • getFeatureSet() - project field in GetFeatureSetRequest becomes optional (default: 'default' project).
  • applyFeatureSet() - project field in FeatureSet becomes optional (default: 'default' project).
  1. AccessManagementService
  • constructor() - check for and create default project if does not exist.

DB changes no longer required. Fixed by #655

Feast Serving

  1. CachedSpecService
  • getFeatureSets() - Add ability to match project-less feature references to Feature Sets by polyfilling the default project default.

Feast Go SDK

  1. request.go - Document the projects are optional in feature references.
  • buildFeatures() - Change parsing of string feature reference to make project optional.
  1. client.go
  • GetOnlineFeatures() - Add param project to set project explicitly.

Feast Java SDK

  1. FeastClient
  • Update documentation that project part of feature references is optional.
  • getOnlineFeatures() - Change defaultProject param to project. project used to set project explicitly.
  1. RequestUtil
  • createFeatureRefs() - Change parsing of feature ref to make project optional.

Feast Python SDK

  1. client.py
  • Update documentation that project part of feature references is optional.
  • get_online_features() - Add param project to set project explicitly.
  • get_batch_features() - Change default_project param to project. project used to set project explicitly.
  • def _build_feature_references() - change parsing of string feature ref to make project optional.

In all SDKs, when both the project param and feature reference in string form specifies the project, the project specified by the project param will take precedence.

@woop
Copy link
Member Author

woop commented May 4, 2020

I have renumbered the tasks so that it's easier to address them individually.

  1. lgtm
  2. lgtm
  3. lgtm
  4. Not sure if you need to add a constraint. You might just have to remove a constraint. I'll let you figure this out once Split Field model into distinct Feature and Entity objects #655 gets merged.
  5. Is it possible to add the logic to KafkaRecordToFeatureRowDoFn instead?
  6. lgtm
  7. lgtm
  8. lgtm
  9. lgtm
  10. lgtm
  11. lgtm

Where is the default project configured (or hardcoded?)

Thanks for taking the effort @mrzzy. You've been very detailed.

@mrzzy
Copy link
Collaborator

mrzzy commented May 4, 2020

Is it possible to add the logic to KafkaRecordToFeatureRowDoFn instead?

I think this should be a separate transform as it has nothing to Kafka. An inline transform would add little boilerplate if that is a concern:

PCollection<FeatureRow> featureRows = featureRows.apply("ApplyDefaultProject", 
ParDo.of(new DoFn<FeatureRow, FeatureRow>() { 
  @ProcessElement
  public void processElement(ProcessContext context) {
     FeatureRow featureRow = context.element();
     // if project is unset, polyfill default project
     // ... polyfill.
  }
}

This would be added directly to to ReadFromSource.expand()

Where is the default project configured (or hardcoded?)

The default project is:

  • created in Feast Core's AccessManagementService's constructor.
  • used as default when project is not explicitly specified.

@woop
Copy link
Member Author

woop commented May 4, 2020

Ok, looks good to me.

@mrzzy
Copy link
Collaborator

mrzzy commented May 6, 2020

Additional changes are required that was not in the original list of proposed changes to the serving if we want to allow Serving to be able to serve features with names that can be duplicated across a project, but unique across a feature set.

Specifically if we want to support this in serving, we have to add a Feature Set name to Feature References in serving:

message FeatureReference {
    string project = 1;
    string feature_set_name = 5;
    string name = 2;
    int32 version = 3;
    google.protobuf.Duration max_age = 4;
}

We can continue clients that do not specify feature_set field by trying to match the feature to the first feature set that has the feature with that name. So adding this field would not cause backwards incompatibility.

If multiple feature sets match for a given feature name, we will just throw an exception.

String feature references would be accepted in the following formats:

  • project/feature_set:feature:version
  • project/feature_set:feature
  • project/feature:version
  • project/feature
  • feature_set/feature:version
  • feature_set/feature
  • feature:version
  • feature

Where a unspecified:

  • project implies the default project default
  • feature_set serving would try to find the FeatureSet with the feature. If there are multiple matching FeatureSet would be an Error.
  • version implies the latest (highest no.) version

In a post version world post #676, version would be accepted but ignored

@woop
Copy link
Member Author

woop commented May 6, 2020

LGTM.

@ches
Copy link
Member

ches commented May 7, 2020

Feature references cannot contain a project when it is in string form. This is to prevent multiple projects from being selected in a single feature request (for retrieval).

With that said, should the first several cases @mrzzy listed as accepted actually not be?

  • project/feature_set:feature:version
  • project/feature_set:feature
  • project/feature:version
  • project/feature

Is it planned for v0.5 to be a grace period of backward compatibility for migrating these, or it's something intended to swallow the breakage now?

The migration plan from v0.4 for retrieval in the breaking case would be for users to update their calls to not include project, and possibly qualify by feature set instead if necessary, before the Feast upgrade—does that cover it? Would it then be forward compatible for old SDK? Also I guess requests that use multiple projects would need to be changed to multiple requests.

Possibly a way to transition if desired could be to accept projects in the refs, return an error if more than one project is used in a request, then translate the requests internally into refs with their project removed and the one validated project set at the request level overriding the default project. New SDK could even do this client-side perhaps. That might enable marking the project field on FeatureReference with [deprecated = true] and making sure service-side is deprecation warning-free with perhaps only a @SuppressWarnings at the RPC edge where handling and translating requests from the old clients.

  • Projects can only be defined once per incoming request. There should be a single key on each Feast client to set the project, since projects aren't defined in the feature reference any more.

Should project be added to GetOnlineFeaturesRequest proto message? That one seems missing from @mrzzy's spec. Or alternatively would it be gRPC metadata, outside of the message payload?

@woop
Copy link
Member Author

woop commented May 8, 2020

With that said, should the first several cases @mrzzy listed as accepted actually not be?

I think they should be accepted for the time being at the Feast Serving layer. However, at the SDK level we want to remove that option.

That would allow old clients to continue to work with a new serving, but provides encouragement to move to a single project definition per incoming request. It could still temporarily translate into project/featureset:feature when sent over the wire for 0.5, but long term we can move that project ref out if they are only being defined as a single field.

The migration plan from v0.4 for retrieval in the breaking case would be for users to update their calls to not include project, and possibly qualify by feature set instead if necessary, before the Feast upgrade—does that cover it?

Yea, we are moving all of our production feature sets into a single default project, and asking users to not define the project in feature refs.

For teams that can't do this, the pathological case is selecting features in different projects with the same name. That would result in an exception. Leaning to softly supporting multiple projects in requests if they are explicitly defined, but I am also open to raising an exception in that case. Ideally we would move to a single project definition.

then translate the requests internally into refs with their project removed and the one validated project set at the request level overriding the default project.

Not 100% clear what you mean here. What is the default project in this case, the one set by the client or the server?

I do like the gist of what you are saying and I think its important that we capture all these decisions publicly.

@mrzzy
Copy link
Collaborator

mrzzy commented May 8, 2020

With that said, should the first several cases @mrzzy listed as accepted actually not be? Is it planned for v0.5 to be a grace period of backward compatibility for migrating these, or it's something intended to swallow the breakage now?

We will still allow project to be specified in feature references for backward compatibility with older SDKs/clients, as @woop has discussed.

Should project be added to GetOnlineFeaturesRequest proto message? That one seems missing from @mrzzy's spec. Or alternatively would it be gRPC metadata, outside of the message payload?

Leaning to softly supporting multiple projects in requests if they are explicitly defined, but I am also open to raising an exception in that case.

This makes sense to add so that we can centralize the effect of setting the project explicit in a separate field in GetOnlineFeaturesRequest/GetBatchFeaturesRequest protos. Right there is code duplication across SDKs to achieve this effect as it is implemented client/SDK side with a project param in each get online/batch features method.

My understanding of how this effect should work currently is to allow projects to still be specified in feature references as usual. However if the project is specified in a explicitly in a separate field, that takes precedence and overrides the projects set in feature references (if set).

Possibly a way to transition if desired could be to accept projects in the refs, return an error if more than one project is used in a request.

If we where to throw an error here, this would constitute a breaking change to old client/SDKs as there were no limitations on cross projects feature references before this.

That might enable marking the project field on FeatureReference with [deprecated = true] and making sure service-side is deprecation warning-free with perhaps only a @SuppressWarnings at the RPC edge where handling and translating requests from the old clients.

This make sense to setup deprecation notices on FeatureReference.project and signposts in the documentation urging users to transition to the use of explicit project field way of setting projects instead.

For teams that can't do this, the pathological case is selecting features in different projects with the same name. That would result in an exception.

This should not create any breaking changes as we are moving from a Project name spaced features in v0.4 which have project-wide unique names.

@ches
Copy link
Member

ches commented May 8, 2020

then translate the requests internally into refs with their project removed and the one validated project set at the request level overriding the default project.

Not 100% clear what you mean here. What is the default project in this case, the one set by the client or the server?

The server. I'll try to clarify: if an old client sends a retrieval request for these references:

  • projectA/featureA
  • projectA/featureB

Then, server side at the edge RPC controller layer (this could even be an interceptor):

  • Extract projectA treating it as the request-level project
  • Turn featureA and featureB into FeatureReference model instances, don't set their project field, that field can be deprecated in the proto and ideally removed entirely, now, in its domain model.

By "validated" I meant returning an error if the above case had two different projects in the request. This is TBD, it will break some old client calls but would allow more aggressively removing project from refs for internal implementation now.

New SDK could do this request translation client-side, raising an exception if there are > 1 projects, so it's likely to be caught in testing and QA, and could emit deprecation warnings in Java if the project field is deprecated in the proto. That would warn in cases where there is only one project (no exception), but projects are included in refs.

@ches
Copy link
Member

ches commented May 8, 2020

I don't have a dog in this fight since we don't have projects being used in retrieval requests, but I've been insistent before about not breaking Serving. This scenario is possibly different, because if I'm seeing it accurately, there is a migration path for old clients, but it requires human coordination to:

  1. Update requests to be forward-compatible, before Feast upgrade
  2. Ideally run your old code in a QA environment with new Feast, to verify the forward compatibility
  3. Upgrade Feast
  4. Eventually migrate to new SDK

I can't say whether that's practically feasible for everyone, or how much the tradeoff is worth in Feast maintenance complexity saved for code path and data model duplicity, if you maintain total backward compatibility for the release.

@woop
Copy link
Member Author

woop commented May 13, 2020

By "validated" I meant returning an error if the above case had two different projects in the request.

Yip, agree with this approach. Especially since it's an uncommon pattern for most teams afaik.

That would warn in cases where there is only one project (no exception), but projects are included in refs.

I like this.

I can't say whether that's practically feasible for everyone, or how much the tradeoff is worth in Feast maintenance complexity saved for code path and data model duplicity, if you maintain total backward compatibility for the release.

I'd optimize for moving faster since we still have a lot of ground to cover. The fact that there is an upgrade path here does mean that it's an easier pill to swallow, as long as its clearly documented.

@ches
Copy link
Member

ches commented May 13, 2020

The PR description edit became considerably harder to follow than #479 (comment) – that may be necessary to some extent if we've yielded more concession to backward compatibility than initially planned, but I think it'd help to break it down by milestones, keeping the more forceful specification tone, asserting deprecations and when they will be removed.

As a possible example, instead of:

  • Projects can only be defined once per incoming request. There should be a single key on each Feast client to set the project, since projects aren't defined in the feature reference any more.

  • SDKs to add a field project to allow users to override projects for all given feature references. Moving forward this would be the preferred way to set projects, not in feature references, although that is still accepted for backwards compatibility.

Maybe something like:

Feast v0.5

  • A project may be defined once for an incoming retrieval request, in a new request-level key. If set, feature references in the request must not include a project and will be assumed to refer to the one given project.

    Support for multiple projects in a retrieval request is deprecated as of this release, and clients should move to the new method.

Feast v0.6

  • Feature references in string form must not include a project.
  • A project must be defined once per incoming retrieval request. Feature references must no longer contain projects, therefore multiple projects in a request are no longer supported in this release as they were deprecated in v0.5.

I'm not sure whether @woop's original "there should be a single key on each Feast client to set the project" meant it applies to the whole client instance for its lifecycle, or per-request, so my "request-level" part might be wrong.

Not sure if it'd actually be v0.6 either since it's meant to come close on the heels of v0.5, but just illustration.

@woop
Copy link
Member Author

woop commented May 18, 2020

I'm not sure whether @woop's original "there should be a single key on each Feast client to set the project" meant it applies to the whole client instance for its lifecycle, or per-request, so my "request-level" part might be wrong.

From the user's perspective. I am happy with our current approach of having the project within the feature reference temporarily. So this should maybe be clarified.

Not sure if it'd actually be v0.6 either since it's meant to come close on the heels of v0.5, but just illustration.

I'd like to sneak it into 0.6, but it wouldn't be a train smash if it got held up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants