-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Redundant feature materialization and premature incremental materialization timestamp updates #3789
Conversation
great, I have been working on the similar PR, but you did it first and way better 💯 |
sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_engine.py
Show resolved
Hide resolved
@james-crabtree-sp Thanks again for putting this PR together. I think in order to merge, you'll need to sign your commits: https://docs.feast.dev/project/development-guide#signing-off-commits |
sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_engine.py
Show resolved
Hide resolved
I think they are all signed off already. I see that the DCO task that checks this completed successfully https://github.com/feast-dev/feast/pull/3789/checks?check_run_id=17541354092 |
Sorry for the confusion, I was referring to the requirements to merge, and linked to the wrong documentation. This is the requirement that I was referring to: Merging is blocked |
hey @james-crabtree-sp this is a great PR, you need any hands to make it merged? I see there's a lot of commits in the past unverified if you don't mind I can create another PR to mimic this one |
0631425
to
baf108c
Compare
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
* SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
baf108c
to
fabd417
Compare
@whoahbot @sudohainguyen Ahh, I understand now. Commits are now verified |
exactly 🤣 looks good now but you need to run code format though |
fabd417
to
c35b0c1
Compare
Noticed that too. Weird rebase issue. Should be fixed now |
…error Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
9920df7
to
938b903
Compare
@james-crabtree-sp I've been adopting your changes on my side so far do we need to wait the job to be done, like forever without timeout? |
Well, this PR adds the configurable active_deadline_seconds setting, which can be used to impose a timeout on the kubernetes jobs. Does that answer your question? |
cool clear |
# [0.35.0](v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([#3812](#3812)) ([9583ed6](9583ed6)) * Adopt connection pooling for HBase ([#3793](#3793)) ([b3852bf](b3852bf)) * Bytewax engine create configmap from object ([#3821](#3821)) ([25e9775](25e9775)) * Fix warnings from deprecated paths and update default log level ([#3757](#3757)) ([68a8737](68a8737)) * improve parsing bytewax job status ([5983f40](5983f40)) * make bytewax settings unexposed ([ae1bb8b](ae1bb8b)) * Make generated temp table name escaped ([#3797](#3797)) ([175d796](175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([#3789](#3789)) ([417b16b](417b16b)), closes [#6](#6) [#7](#7) * Resolve hbase hotspot issue when materializing ([#3790](#3790)) ([7376db8](7376db8)) * Set keepalives_idle None by default ([#3756](#3756)) ([8717e9b](8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](2151c39)) * UI project cannot handle fallback routes ([#3766](#3766)) ([96ece0f](96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](5dc0b24)) * Update jackson and remove unnecessary logging ([#3809](#3809)) ([018d0ea](018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](052182b)) ### Features * Add get online feature rpc to gprc server ([#3815](#3815)) ([01db8cc](01db8cc)) * Add materialize and materialize-incremental rest endpoints ([#3761](#3761)) ([fa600fe](fa600fe)), closes [#3760](#3760) * add redis sentinel support ([3387a15](3387a15)) * add redis sentinel support ([4337c89](4337c89)) * add redis sentinel support format lint ([aad8718](aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([#3762](#3762)) ([6a728fe](6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([#3754](#3754)) ([2192e65](2192e65)) * Apply cache to load proto registry for performance ([#3702](#3702)) ([709c709](709c709)) * Make bytewax job write as mini-batches ([#3777](#3777)) ([9b0e5ce](9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](9cf9d96)) * Support GCS filesystem for bytewax engine ([#3774](#3774)) ([fb6b807](fb6b807))
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
…rialization timestamp updates (feast-dev#3789) * SAASMLOPS-767 wait for jobs to complete Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-805 Stopgap change to fix duplicate materialization of data Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-805 save BYTEWAX_REPLICAS=1 Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-809 fix bytewax workers so they only process a single file (feast-dev#6) * SAASMLOPS-809 fix bytewax workers so they only process a single file * SAASMLOPS-809 fix newlines Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-833 add configurable job timeout (feast-dev#7) * SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * develop Run large materializations in batches of pods Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master Set job_batch_size at least equal to max_parallelism Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master clarity max_parallelism description Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master resolve bug that causes materialization to continue after job error Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master resolve bug causing pod logs to not be printed Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> --------- Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> Signed-off-by: Attila Toth <hello@attilatoth.dev>
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: Attila Toth <hello@attilatoth.dev>
What this PR does / why we need it:
This PR fixes 2 issues encountered while attempting to run very large materializations (Several billion rows as part of an initial run of online materialization).
Which issue(s) this PR fixes:
Fixes #3786
Fixes #3787
Fixes #3788