Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward-port Support for Cassandra [WIP] #475

Closed

Conversation

Wirick
Copy link
Contributor

@Wirick Wirick commented Feb 13, 2020

I'm still having trouble getting the serving unit tests to pass, I think because of something that I'm not mocking, but I'm not sure. Wanted to share the work if people are interested, like @ches. I will be cleaning this up further, but I have a bit of a queue, so please be patient. This logic however seems to work well in our staging cluster right now. But want to discuss best practices and optimization

This pr ports the cassandra logic forward to be compatible with feast>=.4
Fixes #335

Allows for configuration of cassandra as an online serving datastore and as a job service.

woop and others added 30 commits January 7, 2020 13:55
…#385)

Co-authored-by: Willem Pienaar <6728866+woop@users.noreply.github.com>
Rather than the Maven protobuf plugin running on the same symlinked
definitions in several Java modules, localize this process into one
module that the others depend on.

This provides a single module that can be depended on by third-party
extensions with the bare minimum of dependencies.

Also removes proto files that are no longer used.
* Add PR template

* Remove line about not using fixes

* Add notes about CLA and release notes
* Initial commit of changelog up to 0.4.3

* Remove unreleased changes on master

* Add missing changelog manually

Co-authored-by: Khor Shu Heng <32997938+khorshuheng@users.noreply.github.com>
This forward-ports a straggling commit from feast-dev#407: it was missed when
initially creating the datatypes module because Sonatype publishing
setup was added concurrently.
* Add documentation for bigquery batch retrieval

* Fix formatting for multiline comments
Allow log level to be set via environmental variable.
Add ability to set appender type in serving.
Remove logback-classic from ingestion as it is a library so should not bring its own impl.
Upgrade log4j to 2.12.1 to support objectMessageAsJsonObject.
Fix logger config targeting feast package in serving an add same concept for core.
* Add presence, shape and domain fields to EntitySpec and FeatureSpec
These are copied from definition in Schema.proto in tensorflow data validation

* Add presence,shape and domain fields to Field class
* Add retry options to BigQuery

Add two fields to job properties and use them to set retry options in all BigQuery job waitFor calls, as per Google example.

* Change timeout defaults and add config to e2e test to fix failure.
* Change job service connection to use a connection pool

* Close connection

* Remove isConnected condition

* Add simple test to test that pool recovers broken connections

* Catch JedisExceptions separately

* Catch only JedisConnectionException
* Allow users not to set max age for batch retrieval

* Fix typo in test assertion
* Update Changelog

* Remove closed issues
* Deduplicate example notebooks

* Merge docker-compose.yml for both batch and online serving.
* Update README.md and remove versions from Helm Charts

* Fix type in README.md
woop and others added 11 commits March 9, 2020 13:54
* Encode feature row before storing in Redis

* Include encoding as part of RedisMutationDoFn

Co-authored-by: Khor Shu Heng <khor.heng@gojek.com>
Please see this issue for more details feast-dev#527
…east-dev#534)

This seems to make the test pass more deterministically
If this value is higher than the one used for sending output (50ms) some messages may be lost
leading to failed test
)

* Update base Docker image for building Feast Serving image
- Add clean phase before packaging for more deterministic build (in case host directory is dirty)
- Move the downloading of grpc-health-probe in Feast Serving to build stage so the production stage does not need extra tools like wget, for slimmer production image.

* Update base Docker image for Feast Serving in Dockerfile.dev
* create cassandra store for registration and ingestion

  * Downgraded Guava to 25
    * Beam 2.16 uses Cassandra 3.4.0 (So we cannot use Cassandra 4.x which shades Guava)
    * Cassandra 3.4.0 uses Guava version 16.0 but has a compatibility check to use a different class when we use version > 19.0.
    * Guava version 26 (version previously used) has breaking change to method used in compatibility check in Cassandra's dependency, hence version 25
  * Using Cassandra's internal field 'writetime' to handle out of order arrivals. When older records where the primary key already exist in Cassandra are ingested, they are set as tombstones in Cassandra and ignored on retrieval.
    * Aware that this way of handling out of order arrival is specific to Cassandra, but until we have a general way to handle out of order arrivals we need to do it this way
  * Cassandra's object mapper requires stating table's name along with @table annotation
    * table_name is still part of CassandraConfig for use in serving module
    * if user registers CassandraConfig with a different table name other than "feature_store", this will throw an exception

* add cassandra serving service

  * Abstracted OnlineServingService for common implementation of online serving stores
  * Complete tests remain in RedisServingServiceTest while Cassandra tests only contain basic tests for writes, and some other implementation specific to Cassandra

* update documentation to reflect current API and add cassandra store to docs

* add default expiration to cassandra config for when featureset does not have max age

* docs update, spotless check, and bug fix on cassandra schema
* create cassandra store for registration and ingestion

  * Downgraded Guava to 25
    * Beam 2.16 uses Cassandra 3.4.0 (So we cannot use Cassandra 4.x which shades Guava)
    * Cassandra 3.4.0 uses Guava version 16.0 but has a compatibility check to use a different class when we use version > 19.0.
    * Guava version 26 (version previously used) has breaking change to method used in compatibility check in Cassandra's dependency, hence version 25
  * Using Cassandra's internal field 'writetime' to handle out of order arrivals. When older records where the primary key already exist in Cassandra are ingested, they are set as tombstones in Cassandra and ignored on retrieval.
    * Aware that this way of handling out of order arrival is specific to Cassandra, but until we have a general way to handle out of order arrivals we need to do it this way
  * Cassandra's object mapper requires stating table's name along with @table annotation
    * table_name is still part of CassandraConfig for use in serving module
    * if user registers CassandraConfig with a different table name other than "feature_store", this will throw an exception

* add cassandra serving service

  * Abstracted OnlineServingService for common implementation of online serving stores
  * Complete tests remain in RedisServingServiceTest while Cassandra tests only contain basic tests for writes, and some other implementation specific to Cassandra

* update documentation to reflect current API and add cassandra store to docs

* add default expiration to cassandra config for when featureset does not have max age

* docs update, spotless check, and bug fix on cassandra schema
Pr Comments and using ttl for cassandra job service

1. Still unresolved is why the CassandraServingServiceITTest is failing,
I'm thinking its because I'm not mocking the feature request correctly
@Wirick Wirick force-pushed the crw/feast_cassandra_forward_port branch from 1ed0956 to f002093 Compare March 24, 2020 15:25
@ches
Copy link
Member

ches commented Mar 28, 2020

A belated thanks for getting the ball rolling on this! I've been reluctant to push forward on it myself due to a few reasons I'll give below.

Couple of general remarks from my PoV when we get back 'round to this:

  • Cassandra-backed job management DB is a new feature for which I'd prefer to be proposed as a separate PR.

  • I'm 👎on upgrading to v4 of the Cassandra driver for Serving, until Beam does. Unless there's a very strong reason, I feel that using different versions of significant dependencies across Feast components will ultimately bring more pain and confusion than it's worth. Dependency version declarations should probably go in the root POM's <dependencyManagement> as a default development practice, in hindsight we should've put the Cassandra driver's there in the original PR.

    Now, v4 finally shades its Guava dependency, which is a perennial pain and may very well qualify as "a very strong reason", but I think justification would need to be explained since v3 worked for Serving thus far, modulo the small Guava downgrade that aligned compatibility.

Coming back to what has stalled me:

@woop
Copy link
Member

woop commented Mar 28, 2020

Just want to chime in her as well. The effort that has been put into this is very impressive @Wirick. Forward porting Cassandra support would be pretty big.

I'll not comment specifically on the PR itself at the moment, but I'll echo the reasons for delaying this forward port previously. #386 is a big change, but dare I say relatively predictable in how it will change our APIs. #567 will hopefully add structure to this PR and others, and should not bring any big surprises. I'd recommend waiting for these to both be merged in before proceeding with this PR.

The biggest uncertainty for me is still which path we will take with regards to feature references and our data model, documented in #479. Getting the API right is critical. We have one more window of opportunity (with the removal of versions) to potentially make other breaking changes. If possible I'd want to come to some kind of consensus in that issue as well before merging this PR, but it probably does not have to be a blocker.

@Wirick
Copy link
Contributor Author

Wirick commented Mar 31, 2020

@ches @woop I generally agree with everything that we've discussed. In our case because of what I was outlining in #479 we've mostly abstracted the api into a centralized module called features, which can be used to create, publish and serve features using minimal feast API interaction. This means it's relatively to stay up to date as long as we change those integration points. I'm going to be keeping this current with master, because some recent changes, such as the new feature value metrics provide a lot of value, and I'm the boldly going forward type. I'm fine waiting and rebasing as development proceeds. In terms of the new drivers I've been really enjoying the new api and creation of cassandra sessions especially, and don't think that having a different dependency in ingestion outweighs the significant improvement in intuitiveness. But at the end of the day I'm flexible

@woop woop changed the title Crw/feast cassandra forward port [WIP] Forward-port Support for Cassandra [WIP] Mar 31, 2020
* Ordering keys in ingestion and serving

* Notracing
* Versionless serving option

* Adding in tracing

* Make nulls not update feature values
@Wirick Wirick force-pushed the crw/feast_cassandra_forward_port branch from 3563323 to b43f466 Compare June 1, 2020 18:58
@Wirick Wirick requested a review from pyalex as a code owner June 1, 2020 18:58
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Wirick
To complete the pull request process, please assign woop
You can assign the PR to them by writing /assign @woop in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Wirick Wirick changed the base branch from master to v0.4-branch June 1, 2020 19:00
@feast-ci-bot
Copy link
Collaborator

@Wirick: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
test-serving 356332326001ae37e9fe14dc6c2260d359cb1765 link /test test-serving
test-core-and-ingestion 356332326001ae37e9fe14dc6c2260d359cb1765 link /test test-core-and-ingestion
test-end-to-end-batch b43f466 link /test test-end-to-end-batch
test-end-to-end b43f466 link /test test-end-to-end
test-end-to-end-redis-cluster b43f466 link /test test-end-to-end-redis-cluster
publish-docker-images b43f466 link /test publish-docker-images
test-core-and-ingestion-java-8 b43f466 link /test test-core-and-ingestion-java-8
test-end-to-end-batch-java-8 b43f466 link /test test-end-to-end-batch-java-8
test-serving-java-8 b43f466 link /test test-serving-java-8
test-end-to-end-java-8 b43f466 link /test test-end-to-end-java-8
test-java-sdk-java-8 b43f466 link /test test-java-sdk-java-8

Full PR test history

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@stale
Copy link

stale bot commented Jul 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 31, 2020
@stale stale bot closed this Aug 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test size/XXL wontfix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Forward-port Cassandra storage support for v0.4+