feat: BigTable online store #3140

Merged on Oct 5, 2022 (24 commits; changes shown from 19 commits)

Commits
b349ad6
Initial implementation of BigTable online store.
chhabrakadabra Aug 25, 2022
f9f45bb
Attempt to run bigtable integration tests.
chhabrakadabra Sep 5, 2022
2a1a863
Got the BigTable tests running in local containers
chhabrakadabra Sep 5, 2022
96814d6
Set serialization version when computing entity ID
chhabrakadabra Sep 6, 2022
6e6233f
Switch to the recommended layout in bigtable.
chhabrakadabra Sep 6, 2022
2a0d09a
Minor bugfixes.
chhabrakadabra Sep 10, 2022
98532a5
Move BigTable online store out of contrib
chhabrakadabra Sep 27, 2022
2a65fef
Attempt to run integration tests in CI.
chhabrakadabra Sep 28, 2022
de795f3
Delete tables for entity-less feature views.
chhabrakadabra Sep 28, 2022
bf798e8
Table names should be smaller than 50 characters
chhabrakadabra Sep 28, 2022
eb3ab91
Optimize bigtable reads.
chhabrakadabra Sep 28, 2022
1383b7e
dynamodb: switch to `mock_dynamodb`
chhabrakadabra Sep 28, 2022
6986fa9
minor: rename `BigTable` to `Bigtable`
chhabrakadabra Sep 29, 2022
3cd76a8
Wrote some Bigtable documentation.
chhabrakadabra Sep 29, 2022
c7449cc
Bugfix: Deal with missing row keys.
chhabrakadabra Sep 30, 2022
f356312
Fix linting issues.
chhabrakadabra Sep 30, 2022
ff62c6b
Generate requirements files.
chhabrakadabra Sep 30, 2022
cce3602
Don't bother materializing created timestamp.
chhabrakadabra Sep 30, 2022
943ee3f
Remove `tensorflow-metadata`.
chhabrakadabra Sep 30, 2022
ab80b42
Minor fix to Bigtable documentation.
chhabrakadabra Oct 5, 2022
4755745
update roadmap docs
adchia Oct 5, 2022
c0f2d8e
Fix roadmap doc
adchia Oct 5, 2022
2d6bdac
Change link to point to roadmap page
adchia Oct 5, 2022
992c318
change order in roadmap
adchia Oct 5, 2022
56 changes: 56 additions & 0 deletions docs/reference/online-stores/bigtable.md
@@ -0,0 +1,56 @@
# Bigtable online store

## Description

The [Bigtable](https://cloud.google.com/bigtable) online store provides support for
materializing feature values into Cloud Bigtable. The data model used to store feature
values in Bigtable is described in more detail
[here](../../specs/online_store_format.md#google-bigtable-online-store-format).

## Getting started

To use this online store, first run `pip install 'feast[gcp]'`. You can then create a
new feature repository with `feast init REPO_NAME -t gcp`.

## Example

{% code title="feature_store.yaml" %}
```yaml
project: my_feature_repo
registry: data/registry.db
provider: gcp
online_store:
  type: bigtable
  project_id: my_gcp_project
  instance: my_bigtable_instance
```
{% endcode %}

The full set of configuration options is available in
[BigtableOnlineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.online_stores.bigtable.BigtableOnlineStoreConfig).

## Functionality Matrix

The set of functionality supported by online stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the Bigtable online store.

| | Bigtable |
|-----------------------------------------------------------|----------|
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
| teardown infrastructure (e.g. tables) in the online store | yes |
| generate a plan of infrastructure changes | no |
| support for on-demand transforms | yes |
| readable by Python SDK | yes |
| readable by Java | no |
| readable by Go | no |
| support for entityless feature views | yes |
| support for concurrent writing to the same key | no |
| support for ttl (time to live) at retrieval | no |
| support for deleting expired data | no |
| collocated by feature view | yes |
| collocated by feature service | no |
| collocated by entity key | yes |

To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
25 changes: 24 additions & 1 deletion docs/specs/online_store_format.md
@@ -92,6 +92,29 @@ Other types of entity keys are not supported in this version of the specificatio

![Datastore Online Example](datastore_online_example.png)

## Google Bigtable Online Store Format

The [Bigtable storage model](https://cloud.google.com/bigtable/docs/overview#storage-model)
consists of massively scalable tables, in which each row is identified by a "row key".
Rows in a table are stored in lexicographic order of their row keys.
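
As a rough illustration of why this ordering matters, the sketch below models a table as
a sorted list of byte-string row keys and shows that all keys sharing a prefix form one
contiguous run. The row keys and the `prefix_scan` helper are hypothetical and exist only
for this example; they are not part of Feast or the Bigtable client library.

```python
from bisect import bisect_left

# Hypothetical row keys, kept in sorted order to mimic a lexicographically sorted table.
row_keys = sorted([
    b"user#0002#driver_stats",
    b"user#0001#driver_stats",
    b"user#0010#trips",
    b"user#0001#trips",
])


def prefix_scan(sorted_keys, prefix):
    """Return the contiguous run of keys that start with `prefix`."""
    # Binary search finds the first key that is >= the prefix.
    start = bisect_left(sorted_keys, prefix)
    end = start
    # Matching keys are adjacent because the list is sorted.
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


print(prefix_scan(row_keys, b"user#0001#"))
# [b'user#0001#driver_stats', b'user#0001#trips']
```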

We use the following structure to store feature data in Bigtable:

* All feature data for an entity or a specific group of entities is stored in the same
table. The table name is derived by concatenating the lexicographically sorted names
of entities.
* This implementation only uses one column family per table, named `features`.
* Each row key is created by concatenating a hash derived from the specific entity keys
and the name of the feature view. Each row only stores feature values for a specific
feature view. This arrangement also means that feature values for a given group of
entities are colocated.
* The columns used in each row are named after the features in the feature view.
Bigtable tables are sparse, so columns that are unused in a given row take up no space.
* By default, we store one version of each feature value. This can be configured
using the `max_versions` setting in `BigtableOnlineStoreConfig`. This implementation
of the online store cannot revert a value to an earlier version. To read historical
versions, you'll have to use custom code.
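
The layout in the list above can be sketched in plain Python. This is a minimal
illustration under stated assumptions, not the code in this PR: the `-` and `#`
separators, the SHA-256 hash, the naive entity-key serialization, and the helper names
`table_name`, `row_key`, and `write_cell` are all hypothetical. Version trimming is also
done eagerly here for simplicity.

```python
import hashlib


def table_name(entity_names):
    # Assumption: "-" as separator. The spec only says the sorted names are concatenated.
    return "-".join(sorted(entity_names))


def row_key(entity_key, feature_view_name):
    # Assumption: SHA-256 over a naive serialization stands in for the real entity-key hash.
    serialized = "|".join(f"{k}={entity_key[k]}" for k in sorted(entity_key))
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    # Hash first, feature view name second, so rows for one entity key sort together.
    return f"{digest}#{feature_view_name}".encode("utf-8")


def write_cell(row, feature_name, value, timestamp, max_versions=1):
    """Store a value in a sparse row and keep only the newest `max_versions` cells."""
    cells = row.setdefault(feature_name, [])
    cells.append((timestamp, value))
    cells.sort(key=lambda cell: cell[0], reverse=True)  # newest first
    del cells[max_versions:]  # drop versions beyond the limit


row = {}  # column name -> list of (timestamp, value); absent columns use no space
write_cell(row, "conv_rate", 0.5, timestamp=1)
write_cell(row, "conv_rate", 0.7, timestamp=2)
print(table_name(["driver", "customer"]))  # customer-driver
print(row)  # {'conv_rate': [(2, 0.7)]}
```

With `max_versions=2`, the second write would leave both cells in the row, newest first,
which is the kind of history that custom code would need to read.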

## Cassandra/Astra DB Online Store Format

### Overview
@@ -250,4 +273,4 @@ message BoolList {
repeated bool val = 1;
}

```
```