Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add a Kubernetes Operator for the Feast Feature Server #4145

Merged
merged 3 commits into from
May 1, 2024

Conversation

tchughesiv
Copy link
Contributor

@tchughesiv tchughesiv commented Apr 24, 2024

What this PR does / why we need it:

This PR wraps the existing feast-feature-server helm chart into a simple helm-based operator.

We should leverage the power of kubernetes controllers to manage the deployment state of the feast-feature-server. This is more operationally sound, as it can “repair” issues with deployments in a cluster. It also simplifies the management of multiple deployments within a cluster and opens the door to integrations with other operator-based projects.

This will require the creation of a new image repo @ docker.io/feastdev/feast-operator. Builds are automated, same as the existing feature server images.

While this solution will leverage a few kustomize manifests to deploy the operator itself, these manifests will require little to no maintenance. They should not change often (if at all) as they are not impacted by changes to the helm chart, nor do they impact how the feature server is deployed.

By using a symlink to the feast-feature-server helm directory, future operator images will always have the latest chart version enhancements.

Below is a sample FeastFeatureServer Custom Resource. As you can see, the fields in the FeastFeatureServer.spec directly match existing helm chart parameters -

apiVersion: charts.feast.dev/v1alpha1
kind: FeastFeatureServer
metadata:
  name: feastfeatureserver-sample
spec:
  # Default values copied from helm-charts/feast-feature-server/values.yaml
  affinity: {}
  # base64 encoding of `sdk/python/feast/templates/local/feature_repo/feature_store.yaml`
  feature_store_yaml_base64: "cHJvamVjdDogbXlfcHJvamVjdAojIEJ5IGRlZmF1bHQsIHRoZSByZWdpc3RyeSBpcyBhIGZpbGUgKGJ1dCBjYW4gYmUgdHVybmVkIGludG8gYSBtb3JlIHNjYWxhYmxlIFNRTC1iYWNrZWQgcmVnaXN0cnkpCnJlZ2lzdHJ5OiBkYXRhL3JlZ2lzdHJ5LmRiCiMgVGhlIHByb3ZpZGVyIHByaW1hcmlseSBzcGVjaWZpZXMgZGVmYXVsdCBvZmZsaW5lIC8gb25saW5lIHN0b3JlcyAmIHN0b3JpbmcgdGhlIHJlZ2lzdHJ5IGluIGEgZ2l2ZW4gY2xvdWQKcHJvdmlkZXI6IGxvY2FsCm9ubGluZV9zdG9yZToKICAgIHR5cGU6IHNxbGl0ZQogICAgcGF0aDogZGF0YS9vbmxpbmVfc3RvcmUuZGIKZW50aXR5X2tleV9zZXJpYWxpemF0aW9uX3ZlcnNpb246IDIK"
  fullnameOverride: ""
  image: {}
  imagePullSecrets: []
  livenessProbe:
    initialDelaySeconds: 30
    periodSeconds: 30
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podSecurityContext: {}
  readinessProbe:
    initialDelaySeconds: 20
    periodSeconds: 10
  replicaCount: 1
  resources: {}
  securityContext: {}
  service:
    port: 80
    type: ClusterIP
  tolerations: []

NOTE: This operator could easily be expanded in the future to also handle the release of the feast-release helm chart

Which issue(s) this PR fixes:

Fixes #4144

@tchughesiv tchughesiv changed the title feat: Add a Kubernetes Operator for the Feast Feature Server wip: feat: Add a Kubernetes Operator for the Feast Feature Server Apr 24, 2024
@tchughesiv tchughesiv changed the title wip: feat: Add a Kubernetes Operator for the Feast Feature Server feat: Add a Kubernetes Operator for the Feast Feature Server Apr 24, 2024
# Default values copied from helm-charts/feast-feature-server/values.yaml
affinity: {}
# base64 encoding of `sdk/python/feast/templates/local/feature_repo/feature_store.yaml`
feature_store_yaml_base64: "cHJvamVjdDogbXlfcHJvamVjdAojIEJ5IGRlZmF1bHQsIHRoZSByZWdpc3RyeSBpcyBhIGZpbGUgKGJ1dCBjYW4gYmUgdHVybmVkIGludG8gYSBtb3JlIHNjYWxhYmxlIFNRTC1iYWNrZWQgcmVnaXN0cnkpCnJlZ2lzdHJ5OiBkYXRhL3JlZ2lzdHJ5LmRiCiMgVGhlIHByb3ZpZGVyIHByaW1hcmlseSBzcGVjaWZpZXMgZGVmYXVsdCBvZmZsaW5lIC8gb25saW5lIHN0b3JlcyAmIHN0b3JpbmcgdGhlIHJlZ2lzdHJ5IGluIGEgZ2l2ZW4gY2xvdWQKcHJvdmlkZXI6IGxvY2FsCm9ubGluZV9zdG9yZToKICAgIHR5cGU6IHNxbGl0ZQogICAgcGF0aDogZGF0YS9vbmxpbmVfc3RvcmUuZGIKZW50aXR5X2tleV9zZXJpYWxpemF0aW9uX3ZlcnNpb246IDIK"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we document somewhere what's the actual feature_store.yaml that this is representing? (and maybe in the future we can also add a more complex sample with integrated psql DB deployment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's currently documented in the comment above field... that said, i can certainly add something additional to the getting started docs i'll be adding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, to keep maintenance low... i'd like to lean heavily on existing feast docs and samples. there's no need/reason for us to duplicate anything. for example -
https://github.com/feast-dev/feast/blob/master/sdk/python/feast/templates/postgres/feature_repo/feature_store.yaml
https://docs.feast.dev/project/development-guide#feast-python-feature-server-helm-chart

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've added some additional docs, including how one might pass in their own feature_store.yaml during a deployment.

Signed-off-by: Tommy Hughes <tohughes@redhat.com>
Signed-off-by: Tommy Hughes <tohughes@redhat.com>
Signed-off-by: Tommy Hughes <tohughes@redhat.com>
@jeremyary
Copy link
Collaborator

@tokoko @shuchu @HaoXuAI @sudohainguyen hey all, may I kindly ask for your considerations of the effort here? We'd love to hear any feedback or questions/convo around technical and/or objective aspects from some of our heavier contributors, so if you're able, we'd really appreciate it!

@tokoko
Copy link
Collaborator

tokoko commented Apr 30, 2024

This is more operationally sound, as it can “repair” issues with deployments in a cluster. It also simplifies the management of multiple deployments within a cluster and opens the door to integrations with other operator-based projects.

I'll be completely honest, this is my first time hearing of a helm-based operator and I'm not sure I understand the point. 😄What's the difference with normal helm deployment in any practical sense other than the fact that parameters are used to populate a CRD instead of helm values file? Previously I thought (maybe just my limited experience) that controllers came in handy when you needed some very custom behavior in cluster or maybe even orchestrating stuff outside the cluster. So my questions right now are 1) Is something like that even possible with helm-based operators? and 2) do we have any use case for it (actual or expected)? java feature server deployment doesn't look that unusual to me. I'm probably unaware of some other benefits, so some more context there will be appreciated...

Copy link
Collaborator

@HaoXuAI HaoXuAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I personally think the feast deployment process is not devops friendly. Hope this feature streamlines the process better

@tokoko
Copy link
Collaborator

tokoko commented Apr 30, 2024

Another slightly unrelated point to note here. I think feast has bit of an identity crisis when it comes to deployments like this. For example we have lambda feature server which has it's own registry object and is actually deployed when user runs feast apply (not a fan of this myself). At the same time we have ways to start servers either with cli or deploy with helm. I think we should try to make all this less confusing. Personally I think deploying anything anywhere with a feast apply needs to be killed off.

@franciscojavierarceo
Copy link
Member

Another slightly unrelated point to note here. I think feast has bit of an identity crisis when it comes to deployments like this. For example we have lambda feature server which has it's own registry object and is actually deployed when user runs feast apply (not a fan of this myself). At the same time we have ways to start servers either with cli or deploy with helm. I think we should try to make all this less confusing. Personally I think deploying anything anywhere with a feast apply needs to be killed off.

I think we need to be able to allow users to pick and choose based on tradeoffs. Being able to deploy feature changes via feast apply is a choice that has been very successful for some and not within the risk tolerance of others. So I don't think we should be overly prescriptive on that.

@tokoko
Copy link
Collaborator

tokoko commented Apr 30, 2024

@franciscojavierarceo sorry, let me clarify. I wasn't referring to feature changes, I meant deployments like aws lambda feature server.

@tchughesiv
Copy link
Contributor Author

tchughesiv commented Apr 30, 2024

@tokoko let me know if you have additional Qs

This is more operationally sound, as it can “repair” issues with deployments in a cluster. It also simplifies the management of multiple deployments within a cluster and opens the door to integrations with other operator-based projects.

I'll be completely honest, this is my first time hearing of a helm-based operator and I'm not sure I understand the point. 😄What's the difference with normal helm deployment in any practical sense other than the fact that parameters are used to populate a CRD instead of helm values file?

  • The Operator’s controller will automatically recover missing objects in the cluster, and fix misconfigured ones. For example, if a feature server’s k8s Service or Deployment were accidentally deleted, the operator will recreate it. In critical environments, this is invaluable.
  • Once an admin installs the operator to a cluster, all users can take advantage using native k8s tools to deploy their own feature servers, with the use of the new FeastFeatureServer CR. No helm knowledge or installation is required. This will be identical to how a user might create a Deployment object in their own namespace.
  • Changes made to a FeastFeatureServer CR in-cluster will automatically be reflected in the feature server deployment. Nothing else has to be done.

Previously I thought (maybe just my limited experience) that controllers came in handy when you needed some very custom behavior in cluster or maybe even orchestrating stuff outside the cluster. So my questions right now are 1) Is something like that even possible with helm-based operators?

  • Helm-based operators are limited in that respect. We would have to use a Go-based operator instead. But, as I stated in the related issue (Add a Kubernetes Operator for the Feast Feature Server, to the infra directory #4144), a go-based operator would add significant complexity to the development and maintenance of feast. Given that feast and its users already utilize, and prefer, helm… this seemed unnecessary. My intention with this approach was to continue to funnel future deployment contributions to the helm charts… rather than other deployment methods. Any future changes made to the helm charts will be automatically reflected in this helm-based operator.
  • That said, do we think the community would prefer a Go-based operator instead? They can be a lot to take on and maintain, but i have significant experience with them.

and 2) do we have any use case for it (actual or expected)?

  • IMO, aside from the controller benefits stated above, it's a great way to potentially consolidate all of the helm charts into a single, optional deployment mechanism. In the future, should the community so choose, all helm chart installs could be handled by this same feast-operator. Each would have its own CRD. e.g. FeastJavaFeatureServer or FeastTransformationService, etc.
  • Also, should the community choose to, in the future this feast-operator could be added to https://operatorhub.io/ for greater visibility to potential k8s users… or to OpenShift’s internal OperatorHub, which is then made available to all OpenShift & OKD cluster users.
  • More immediately, we’d be able to easily integrate the feast-feature-server into the Open Data Hub (ODH) Operator.

java feature server deployment doesn't look that unusual to me. I'm probably unaware of some other benefits, so some more context there will be appreciated...

@sudohainguyen
Copy link
Collaborator

totally support this, since there's a lot of room to consolidate deployment for feast
I would prefer going with the helm-based operator first due to its simplification 🙌

@tokoko
Copy link
Collaborator

tokoko commented May 1, 2024

@tchughesiv thanks for the lengthy response. I do see the benefits and since people will also be able to use normal helm charts as well, I don't see why we shouldn't do this. as for the go-based approach, I agree with Harry, right now helm is enough. If a suitable use case arises for a go-based one, we can revisit this later.

@tchughesiv
Copy link
Contributor Author

tchughesiv commented May 1, 2024

thanks @tokoko @sudohainguyen @HaoXuAI @franciscojavierarceo
if you get a minute, there's a related PR #4161 that affects the full functionality of this operator. very simple helm fix.

@jeremyary jeremyary merged commit 4a696dc into feast-dev:master May 1, 2024
15 checks passed
tmihalac pushed a commit to tmihalac/feast that referenced this pull request May 3, 2024
…ev#4145)

* add a kubernetes operator for the feast-feature-server

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

* operator makefile fixes

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

* addt'l operator documentation

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

---------

Signed-off-by: Tommy Hughes <tohughes@redhat.com>
tmihalac pushed a commit to tmihalac/feast that referenced this pull request May 3, 2024
…ev#4145)

* add a kubernetes operator for the feast-feature-server

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

* operator makefile fixes

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

* addt'l operator documentation

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

---------

Signed-off-by: Tommy Hughes <tohughes@redhat.com>
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
franciscojavierarceo pushed a commit that referenced this pull request May 24, 2024
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24)

### Bug Fixes

* Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6))
* Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff))
* Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084)
* Changed the code the way mysql container is initialized.  ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126)
* Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594))
* Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669)
* Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde))
* Get rid of empty string `name_alias` during feature view projection deserialization  ([#4116](#4116)) ([65056ce](65056ce))
* Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164))
* Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0))
* Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae))
* Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7))
* Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7))
* Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb))
* Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7))
* Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4))
* Update doc ([#4153](#4153)) ([e873636](e873636))
* Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3))
* Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96))
* Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11))

### Features

* Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc))
* Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0))
* Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98))
* Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419))
* Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48))
* Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745))
* Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5))
* Adding support for dictionary writes to online store  ([#4156](#4156)) ([abfac01](abfac01))
* Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640))
* Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef))
* Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562))
* Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621))
* Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9))
* Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4))
* Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150))
* Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f))
* Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea))
* Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716))
* Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d))
* Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114)

### Reverts

* Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
franciscojavierarceo pushed a commit that referenced this pull request May 27, 2024
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24)

### Bug Fixes

* Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6))
* Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff))
* Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084)
* Changed the code the way mysql container is initialized.  ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126)
* Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594))
* Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669)
* Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde))
* Get rid of empty string `name_alias` during feature view projection deserialization  ([#4116](#4116)) ([65056ce](65056ce))
* Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164))
* Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0))
* Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae))
* Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7))
* Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7))
* Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb))
* Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7))
* Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4))
* Update doc ([#4153](#4153)) ([e873636](e873636))
* Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3))
* Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96))
* Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11))

### Features

* Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc))
* Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0))
* Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98))
* Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419))
* Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48))
* Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745))
* Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5))
* Adding support for dictionary writes to online store  ([#4156](#4156)) ([abfac01](abfac01))
* Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640))
* Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef))
* Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562))
* Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621))
* Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9))
* Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4))
* Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150))
* Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f))
* Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea))
* Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716))
* Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d))
* Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114)

### Reverts

* Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a Kubernetes Operator for the Feast Feature Server, to the infra directory
7 participants