Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Instrument Feast using Prometheus and OpenTelemetry #4300

Closed
wants to merge 52 commits into from

Conversation

tsisodia10
Copy link
Contributor

What this PR does / why we need it:

This draft pull request introduces monitoring functionality for the Feast feature store(python server). It instruments the feature store using Prometheus and OpenTelemetry to capture system-level metrics.

@tsisodia10 tsisodia10 changed the title Instrument Feast using Prometheus and OpenTelemetry feat: Instrument Feast using Prometheus and OpenTelemetry Jun 20, 2024
Copy link
Contributor

@tchughesiv tchughesiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be a good idea to make this optional by adding something like a --metrics flag to the CLI and a similar param to the helm chart as well. imo, this should not be started / configured unless the user chooses to enable it.

@tsisodia10 tsisodia10 changed the base branch from v0.38-branch to master June 21, 2024 20:02
infra/charts/feast-feature-server/templates/rbac.yml Outdated Show resolved Hide resolved
sdk/python/feast/feature_store.py Outdated Show resolved Hide resolved
@@ -0,0 +1,34 @@
FROM python:3.11-slim
Copy link
Contributor

@redhatHameed redhatHameed Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why needed new Dockerfile why not using existing one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 , do we need a separate image?

Copy link
Contributor Author

@tsisodia10 tsisodia10 Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on merging this Dockerfile with the existing ones, the downside I am facing is that the new image build is not picking the updated code with the existing Dockerfiles.

@tsisodia10
Copy link
Contributor Author

it would be a good idea to make this optional by adding something like a --metrics flag to the CLI and a similar param to the helm chart as well. imo, this should not be started / configured unless the user chooses to enable it.

@tchughesiv makes sense, I will modify it accordingly.

franciscojavierarceo and others added 8 commits June 26, 2024 10:33
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
@tsisodia10
Copy link
Contributor Author

it would be a good idea to make this optional by adding something like a --metrics flag to the CLI and a similar param to the helm chart as well. imo, this should not be started / configured unless the user chooses to enable it.

Done, PTAL!

Copy link
Contributor

@tchughesiv tchughesiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting there...
now, a similar bool metrics param should be added to the feast-feature-server helm chart. this param should be used to enable all metrics related configs when set. it should default to false. be sure to run make build-helm-docs afterward.

sdk/python/feast/cli.py Outdated Show resolved Hide resolved
@@ -52,7 +57,8 @@ spec:
- "serve"
- "-h"
- "0.0.0.0"
{{- end }}
{{- end }}
command: ["feast", "serve", "-h", "0.0.0.0"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should likely turn into something like -

{{- if .Values.metrics }}
command: ["feast", "serve", "--metrics", "-h", "0.0.0.0"]
{{- else }}
command: ["feast", "serve", "-h", "0.0.0.0"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please take a look!

Copy link
Contributor Author

@tsisodia10 tsisodia10 Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchughesiv I think we would not need the check here, the --metric flag is working perfectly fine without it being added in the deployment. The user can pass the bool value during helm deployment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsisodia10 if the --metrics flag is NOT specified, prom and telemetry should NOT be enabled. Metrics should only enabled if the flag is used. Based on this understanding, we would need this helm check. Is this not currently how the cli flag is functioning?

Copy link
Contributor

@tchughesiv tchughesiv Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see it currently being passed anywhere in the helm chart's deployment.yaml... how it is currently being enabled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmartinol related thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's strange to have the metrics flag work even if it's not present in the deployment manifest. On enabling it during feast deployment, the monitoring manifests gets deployed and vice versa.
Will add the check in the deployment file then.

infra/charts/feast-feature-server/values.yaml Outdated Show resolved Hide resolved
infra/charts/feast-feature-server/values.yaml Outdated Show resolved Hide resolved
infra/charts/feast-feature-server/values.yaml Outdated Show resolved Hide resolved
infra/charts/feast-feature-server/values.yaml Outdated Show resolved Hide resolved
@@ -0,0 +1,34 @@
FROM python:3.11-slim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 , do we need a separate image?

@tokoko
Copy link
Collaborator

tokoko commented Jun 26, 2024

@tsisodia10 prometheus-client should be added to setup.py explicitly under an extra, I think (maybe metrics or prometheus). (Right now it's part of our ci dependencies only under ge extra "thanks to" a whole mess of unnecessary dependencies great-expectations library likes to drag along wherever it goes)

Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
sdk/python/feast/cli.py Outdated Show resolved Hide resolved
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
Copy link
Contributor

@tchughesiv tchughesiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsisodia10 it might be a good idea to add a couple tests... maybe in sdk/python/tests/integration/online_store? We should make sure that the metrics components are only starting when the --metrics flag is present.

@tokoko
Copy link
Collaborator

tokoko commented Jul 16, 2024

@tsisodia10 Can you take care of a DCO check? I think the easiest way is to rebase all commits and force push a single signed commit (git rebase HEAD~49 --signoff and git push --force-with-lease origin inst-stable).

Also please run make lint-python and make format-python before pushing the changes.

# - "--metrics"
# - "-h"
# - "0.0.0.0"
# {{- else }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this block of configs be uncommented and enabled only when metrics are configured in the chart?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these chart changes should all be fully vetted in a cluster to ensure it continues to work as expected with or without metrics configured.

@@ -14,6 +14,7 @@ spec:
{{- with .Values.podAnnotations }}
annotations:
{{- toYaml . | nindent 8 }}
# instrumentation.opentelemetry.io/inject-python: "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to stay? a check added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think good to have for reference rather than lost in the README. If the user wants to enable monitoring, he just needs to uncomment this statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure! i was thinking it made more sense though to incorporate it into the metrics param chart checks. e.g.

{{- if .Values.metrics }}

tsisodia10 and others added 19 commits July 23, 2024 17:52
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
remove providers

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
…v#4303)

* Makefile: Formatting

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Makefile: Exclude Snowflake tests for postgres offline store tests

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Bootstrap: Use conninfo

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Tests: Make connection string compatible with psycopg3

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Tests: Test connection type pool and singleton

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Global: Replace conn.set_session() calls to be psycopg3 compatible

Set connection read only

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Offline: Use psycopg3

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Online: Use psycopg3

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Online: Restructure online_write_batch

Addition

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Online: Use correct placeholder

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Online: Handle bytes properly in online_read()

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Online: Whitespace

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Online: Open ConnectionPool

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Online: Add typehint

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Utils: Use psycopg3

Use new ConnectionPool

Pass kwargs as named argument

Use executemany over execute_values

Remove not-required open argument in psycopg.connect

Improve

Use SpooledTemporaryFile

Use max_size and add docstring

Properly write with StringIO

Utils: Use SpooledTemporaryFile over StringIO object

Add replace

Fix df_to_postgres_table

Remove import

Utils

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Lint: Raise exceptions if cursor returned no columns or rows

Add log statement

Lint: Fix _to_arrow_internal

Lint: Fix _get_entity_df_event_timestamp_range

Update exception

Use ZeroColumnQueryResult

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Add comment on +psycopg string

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Docs: Remove mention of psycopg2

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Lint: Fix

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Default to postgresql+psycopg and log warning

Update warning

Fix

Format warning

Add typehints

Use better variable name

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

* Solve merge conflicts

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>

---------

Signed-off-by: Job Almekinders <job.almekinders@teampicnic.com>
* move get_online_features to OnlineStore interface

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>

* fix pydantic warnings

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>

* run ruff format

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>

---------

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
adding try and except block for import

Co-authored-by: Francisco Javier Arceo <franciscojavierarceo@users.noreply.github.com>
Signed-off-by: Shuchu Han <shuchu.han@gmail.com>
Signed-off-by: Francisco Javier Arceo <franciscojavierarceo@users.noreply.github.com>
Co-authored-by: Francisco Javier Arceo <franciscojavierarceo@users.noreply.github.com>
…batch and fix progress bar (feast-dev#4331)

* Remove batching and fix tqdm progress bar

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

* Comment

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

* Remove test changes

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

* Update comment

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

---------

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>
…v#4327)

* Add async retrieval for postgres

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

* Format

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

* Update _prepare_keys method

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

* Fix typo

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>

---------

Signed-off-by: TomSteenbergen <tomsteenbergen1995@gmail.com>
* Add sql registry async refresh

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* make refresh code a daemon thread

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Change RegistryConfig to cacheMode

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Only run async when ttl > 0

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* make refresh async run in a loop

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* make refresh async run in a loop

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Reorder async refresh call

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Add documentation

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Update test_universal_registry.py

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Force rerun of tests

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Force rerun of tests

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

* Format repo config file

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>

---------

Signed-off-by: Stanley Opara <a-sopara@expediagroup.com>
Co-authored-by: Stanley Opara <a-sopara@expediagroup.com>
* Update maintainers.md

* Update maintainers.md

* Update maintainers.md
upgrade ibis version

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
* update dask version to support pandas 1.x

Signed-off-by: cmuhao <sduxuhao@gmail.com>

* update dask version to support pandas 1.x

Signed-off-by: cmuhao <sduxuhao@gmail.com>

* update dask version to support pandas 1.x

Signed-off-by: cmuhao <sduxuhao@gmail.com>

---------

Signed-off-by: cmuhao <sduxuhao@gmail.com>
Fix OnDemandFeatureView type inference for array types

Signed-off-by: Alex Mirrington <alex.mirrington@rokt.com>
…east-dev#4337)

Bumps [zipp](https://github.com/jaraco/zipp) from 3.18.1 to 3.19.1.
- [Release notes](https://github.com/jaraco/zipp/releases)
- [Changelog](https://github.com/jaraco/zipp/blob/main/NEWS.rst)
- [Commits](jaraco/zipp@v3.18.1...v3.19.1)

---
updated-dependencies:
- dependency-name: zipp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
@tsisodia10
Copy link
Contributor Author

I encountered issues while rebasing and squashing commits from the master branch, resulting in an incorrect commit history for this PR. To resolve this, I will close this PR and open a new one after manually selecting and adding the necessary changes. This process might take some time, so I appreciate your understanding and patience during this period. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet