chore: Rename FileOfflineStore to DaskOfflineStore #4349

Merged: 1 commit, Jul 14, 2024
4 changes: 2 additions & 2 deletions docs/SUMMARY.md
@@ -76,7 +76,7 @@
* [Azure Synapse + Azure SQL (contrib)](reference/data-sources/mssql.md)
* [Offline stores](reference/offline-stores/README.md)
* [Overview](reference/offline-stores/overview.md)
* [File](reference/offline-stores/file.md)
* [Dask](reference/offline-stores/dask.md)
* [Snowflake](reference/offline-stores/snowflake.md)
* [BigQuery](reference/offline-stores/bigquery.md)
* [Redshift](reference/offline-stores/redshift.md)
@@ -119,7 +119,7 @@
* [Feature servers](reference/feature-servers/README.md)
* [Python feature server](reference/feature-servers/python-feature-server.md)
* [\[Alpha\] Go feature server](reference/feature-servers/go-feature-server.md)
* [Offline Feature Server](reference/feature-servers/offline-feature-server)
* [Offline Feature Server](reference/feature-servers/offline-feature-server.md)
* [\[Beta\] Web UI](reference/alpha-web-ui.md)
* [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
* [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
@@ -1,9 +1,8 @@
# File offline store
# Dask offline store

## Description

The file offline store provides support for reading [FileSources](../data-sources/file.md).
It uses Dask as the compute engine.
The Dask offline store provides support for reading [FileSources](../data-sources/file.md).

{% hint style="warning" %}
All data is downloaded and joined using Python and therefore may not scale to production workloads.
@@ -17,28 +16,28 @@ project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
type: file
type: dask
```
{% endcode %}

The full set of configuration options is available in [FileOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.file.FileOfflineStoreConfig).
The full set of configuration options is available in [DaskOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.dask.DaskOfflineStoreConfig).

## Functionality Matrix

The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the file offline store.
Below is a matrix indicating which functionality is supported by the dask offline store.

| | File |
| | Dask |
| :-------------------------------- | :-- |
| `get_historical_features` (point-in-time correct join) | yes |
| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
| `offline_write_batch` (persist dataframes to offline store) | yes |
| `write_logged_features` (persist logged features to offline store) | yes |

Below is a matrix indicating which functionality is supported by `FileRetrievalJob`.
Below is a matrix indicating which functionality is supported by `DaskRetrievalJob`.

| | File |
| | Dask |
| --------------------------------- | --- |
| export to dataframe | yes |
| export to arrow table | yes |
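
The `feature_store.yaml` snippet in the diff above also has a programmatic equivalent. A minimal sketch, assuming the `RepoConfig` keyword arguments used in this PR's own tests; the project name and registry path are placeholders:

```python
from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
from feast.repo_config import RepoConfig

# Equivalent to `offline_store: {type: dask}` in feature_store.yaml.
config = RepoConfig(
    project="my_feature_repo",    # placeholder
    registry="data/registry.db",  # placeholder
    provider="local",
    offline_store=DaskOfflineStoreConfig(),
    entity_key_serialization_version=2,
)
assert config.offline_store.type == "dask"
```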
6 changes: 3 additions & 3 deletions docs/reference/offline-stores/overview.md
@@ -25,13 +25,13 @@ The first three of these methods all return a `RetrievalJob` specific to an offl

## Functionality Matrix

There are currently four core offline store implementations: `FileOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
There are currently four core offline store implementations: `DaskOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
There are several additional implementations contributed by the Feast community (`PostgreSQLOfflineStore`, `SparkOfflineStore`, and `TrinoOfflineStore`), which are not guaranteed to be stable or to match the functionality of the core implementations.
Details for each specific offline store, such as how to configure it in a `feature_store.yaml`, can be found [here](README.md).

Below is a matrix indicating which offline stores support which methods.

| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| `get_historical_features` | yes | yes | yes | yes | yes | yes | yes |
| `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes |
@@ -42,7 +42,7 @@ Below is a matrix indicating which offline stores support which methods.

Below is a matrix indicating which `RetrievalJob`s support what functionality.

| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
| --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- |
| export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes |
| export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes |
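
As a usage illustration for the matrices above, the `get_historical_features` row corresponds to the standard `FeatureStore` API; a minimal sketch, with hypothetical entity and feature names:

```python
import pandas as pd

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # a repo configured with `type: dask`

# Point-in-time correct join: each entity row is joined against feature
# values as of its event_timestamp.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],  # hypothetical entity keys
        "event_timestamp": pd.to_datetime(["2024-07-01", "2024-07-02"]),
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],  # hypothetical feature
).to_df()
```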
8 changes: 0 additions & 8 deletions sdk/python/docs/source/feast.infra.contrib.rst
@@ -4,14 +4,6 @@ feast.infra.contrib package
Submodules
----------

feast.infra.contrib.azure\_provider module
------------------------------------------

.. automodule:: feast.infra.contrib.azure_provider
:members:
:undoc-members:
:show-inheritance:

feast.infra.contrib.grpc\_server module
---------------------------------------

2 changes: 0 additions & 2 deletions sdk/python/docs/source/feast.infra.feature_servers.rst
@@ -7,8 +7,6 @@ Subpackages
.. toctree::
:maxdepth: 4

feast.infra.feature_servers.aws_lambda
feast.infra.feature_servers.gcp_cloudrun
feast.infra.feature_servers.local_process
feast.infra.feature_servers.multicloud

12 changes: 6 additions & 6 deletions sdk/python/docs/source/feast.infra.offline_stores.rst
@@ -28,18 +28,18 @@ feast.infra.offline\_stores.bigquery\_source module
:undoc-members:
:show-inheritance:

feast.infra.offline\_stores.duckdb module
-----------------------------------------
feast.infra.offline\_stores.dask module
---------------------------------------

.. automodule:: feast.infra.offline_stores.duckdb
.. automodule:: feast.infra.offline_stores.dask
:members:
:undoc-members:
:show-inheritance:

feast.infra.offline\_stores.file module
---------------------------------------
feast.infra.offline\_stores.duckdb module
-----------------------------------------

.. automodule:: feast.infra.offline_stores.file
.. automodule:: feast.infra.offline_stores.duckdb
:members:
:undoc-members:
:show-inheritance:
1 change: 0 additions & 1 deletion sdk/python/docs/source/feast.infra.registry.contrib.rst
@@ -8,7 +8,6 @@ Subpackages
:maxdepth: 4

feast.infra.registry.contrib.azure
feast.infra.registry.contrib.postgres

Module contents
---------------
24 changes: 0 additions & 24 deletions sdk/python/docs/source/feast.infra.rst
@@ -19,22 +19,6 @@ Subpackages
Submodules
----------

feast.infra.aws module
----------------------

.. automodule:: feast.infra.aws
:members:
:undoc-members:
:show-inheritance:

feast.infra.gcp module
----------------------

.. automodule:: feast.infra.gcp
:members:
:undoc-members:
:show-inheritance:

feast.infra.infra\_object module
--------------------------------

@@ -51,14 +35,6 @@ feast.infra.key\_encoding\_utils module
:undoc-members:
:show-inheritance:

feast.infra.local module
------------------------

.. automodule:: feast.infra.local
:members:
:undoc-members:
:show-inheritance:

feast.infra.passthrough\_provider module
----------------------------------------

sdk/python/feast/infra/offline_stores/dask.py
@@ -39,20 +39,20 @@
from feast.saved_dataset import SavedDatasetStorage
from feast.utils import _get_requested_feature_views_to_features_dict

# FileRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
# DaskRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
# This is not the desired behavior for our use case, so we set the convert-string option to False
# See (https://github.com/dask/dask/issues/10881#issuecomment-1923327936)
dask.config.set({"dataframe.convert-string": False})


class FileOfflineStoreConfig(FeastConfigBaseModel):
"""Offline store config for local (file-based) store"""
class DaskOfflineStoreConfig(FeastConfigBaseModel):
"""Offline store config for dask store"""

type: Literal["file"] = "file"
type: Union[Literal["dask"], Literal["file"]] = "dask"
""" Offline store type selector"""


class FileRetrievalJob(RetrievalJob):
class DaskRetrievalJob(RetrievalJob):
def __init__(
self,
evaluation_function: Callable,
@@ -122,7 +122,7 @@ def supports_remote_storage_export(self) -> bool:
return False


class FileOfflineStore(OfflineStore):
class DaskOfflineStore(OfflineStore):
@staticmethod
def get_historical_features(
config: RepoConfig,
@@ -133,7 +133,7 @@ def get_historical_features(
project: str,
full_feature_names: bool = False,
) -> RetrievalJob:
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
for fv in feature_views:
assert isinstance(fv.batch_source, FileSource)

@@ -283,7 +283,7 @@ def evaluate_historical_retrieval():

return entity_df_with_features.persist()

job = FileRetrievalJob(
job = DaskRetrievalJob(
evaluation_function=evaluate_historical_retrieval,
full_feature_names=full_feature_names,
on_demand_feature_views=OnDemandFeatureView.get_requested_odfvs(
@@ -309,7 +309,7 @@ def pull_latest_from_table_or_query(
start_date: datetime,
end_date: datetime,
) -> RetrievalJob:
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
assert isinstance(data_source, FileSource)

# Create lazy function that is only called from the RetrievalJob object
@@ -372,7 +372,7 @@ def evaluate_offline_job():
return source_df[list(columns_to_extract)].persist()

# When materializing a single feature view, we don't need full feature names. On demand transforms aren't materialized
return FileRetrievalJob(
return DaskRetrievalJob(
evaluation_function=evaluate_offline_job,
full_feature_names=False,
)
@@ -387,10 +387,10 @@ def pull_all_from_table_or_query(
start_date: datetime,
end_date: datetime,
) -> RetrievalJob:
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
assert isinstance(data_source, FileSource)

return FileOfflineStore.pull_latest_from_table_or_query(
return DaskOfflineStore.pull_latest_from_table_or_query(
config=config,
data_source=data_source,
join_key_columns=join_key_columns
@@ -410,7 +410,7 @@ def write_logged_features(
logging_config: LoggingConfig,
registry: BaseRegistry,
):
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
destination = logging_config.destination
assert isinstance(destination, FileLoggingDestination)

@@ -441,7 +441,7 @@ def offline_write_batch(
table: pyarrow.Table,
progress: Optional[Callable[[int], Any]],
):
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
assert isinstance(feature_view.batch_source, FileSource)

pa_schema, column_names = get_pyarrow_schema_from_batch_source(
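
One detail worth calling out in the diff above: `DaskOfflineStoreConfig.type` is widened to accept both the new `"dask"` literal and the legacy `"file"` literal, so existing configurations keep validating. A minimal sketch of what that annotation implies:

```python
from feast.infra.offline_stores.dask import DaskOfflineStoreConfig

# The default is the new name...
assert DaskOfflineStoreConfig().type == "dask"
# ...but the legacy spelling still passes pydantic validation.
assert DaskOfflineStoreConfig(type="file").type == "file"
```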
7 changes: 4 additions & 3 deletions sdk/python/feast/repo_config.py
@@ -68,7 +68,8 @@
}

OFFLINE_STORE_CLASS_FOR_TYPE = {
"file": "feast.infra.offline_stores.file.FileOfflineStore",
"file": "feast.infra.offline_stores.dask.DaskOfflineStore",
"dask": "feast.infra.offline_stores.dask.DaskOfflineStore",
"bigquery": "feast.infra.offline_stores.bigquery.BigQueryOfflineStore",
"redshift": "feast.infra.offline_stores.redshift.RedshiftOfflineStore",
"snowflake.offline": "feast.infra.offline_stores.snowflake.SnowflakeOfflineStore",
@@ -205,7 +206,7 @@ def __init__(self, **data: Any):
self.registry_config = data["registry"]

self._offline_store = None
self.offline_config = data.get("offline_store", "file")
self.offline_config = data.get("offline_store", "dask")

self._online_store = None
self.online_config = data.get("online_store", "sqlite")
@@ -348,7 +349,7 @@ def _validate_offline_store_config(cls, values: Any) -> Any:

# Set the default type
if "type" not in values["offline_store"]:
values["offline_store"]["type"] = "file"
values["offline_store"]["type"] = "dask"

offline_store_type = values["offline_store"]["type"]

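
The `OFFLINE_STORE_CLASS_FOR_TYPE` change above is what keeps `type: file` working after the rename: both strings resolve to the same class path. A sketch of that resolution; the loader function here is illustrative, not Feast's actual helper:

```python
import importlib

# Both type strings map to the same implementation, as in the diff above.
OFFLINE_STORE_CLASS_FOR_TYPE = {
    "file": "feast.infra.offline_stores.dask.DaskOfflineStore",  # legacy alias
    "dask": "feast.infra.offline_stores.dask.DaskOfflineStore",
}

def load_offline_store_class(store_type: str) -> type:
    """Illustrative: split the dotted path and import the named class."""
    module_path, class_name = OFFLINE_STORE_CLASS_FOR_TYPE[store_type].rsplit(".", 1)
    return getattr(importlib.import_module(module_path), class_name)

assert load_offline_store_class("file") is load_offline_store_class("dask")
```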
@@ -20,8 +20,8 @@
from feast.data_format import DeltaFormat, ParquetFormat
from feast.data_source import DataSource
from feast.feature_logging import LoggingDestination
from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
from feast.infra.offline_stores.duckdb import DuckDBOfflineStoreConfig
from feast.infra.offline_stores.file import FileOfflineStoreConfig
from feast.infra.offline_stores.file_source import (
FileLoggingDestination,
SavedDatasetFileStorage,
@@ -84,7 +84,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
return f"{self.project_name}.{suffix}"

def create_offline_store_config(self) -> FeastConfigBaseModel:
return FileOfflineStoreConfig()
return DaskOfflineStoreConfig()

def create_logged_features_destination(self) -> LoggingDestination:
d = tempfile.mkdtemp(prefix=self.project_name)
@@ -334,7 +334,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
return f"{suffix}"

def create_offline_store_config(self) -> FeastConfigBaseModel:
return FileOfflineStoreConfig()
return DaskOfflineStoreConfig()

def teardown(self):
self.minio.stop()
@@ -23,7 +23,7 @@
from feast.infra.offline_stores.contrib.trino_offline_store.trino import (
TrinoRetrievalJob,
)
from feast.infra.offline_stores.file import FileRetrievalJob
from feast.infra.offline_stores.dask import DaskRetrievalJob
from feast.infra.offline_stores.offline_store import RetrievalJob, RetrievalMetadata
from feast.infra.offline_stores.redshift import (
RedshiftOfflineStoreConfig,
@@ -100,7 +100,7 @@ def metadata(self) -> Optional[RetrievalMetadata]:
@pytest.fixture(
params=[
MockRetrievalJob,
FileRetrievalJob,
DaskRetrievalJob,
RedshiftRetrievalJob,
SnowflakeRetrievalJob,
AthenaRetrievalJob,
@@ -112,8 +112,8 @@
]
)
def retrieval_job(request, environment):
if request.param is FileRetrievalJob:
return FileRetrievalJob(lambda: 1, full_feature_names=False)
if request.param is DaskRetrievalJob:
return DaskRetrievalJob(lambda: 1, full_feature_names=False)
elif request.param is RedshiftRetrievalJob:
offline_store_config = RedshiftOfflineStoreConfig(
cluster_id="feast-int-bucket",
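
The fixture change above constructs a `DaskRetrievalJob` directly with a throwaway evaluation function. Judging by the `.persist()` calls in the dask.py diff, the evaluation function is expected to return a Dask DataFrame and is only invoked when results are materialized; a hedged sketch:

```python
import dask.dataframe as dd
import pandas as pd

from feast.infra.offline_stores.dask import DaskRetrievalJob

def evaluate() -> dd.DataFrame:
    # Only called when the job materializes results (lazy evaluation).
    pdf = pd.DataFrame({"driver_id": [1001], "conv_rate": [0.75]})
    return dd.from_pandas(pdf, npartitions=1)

job = DaskRetrievalJob(evaluation_function=evaluate, full_feature_names=False)
df = job.to_df()  # triggers evaluate() and computes the Dask DataFrame
```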
@@ -5,7 +5,7 @@
import pytest
from moto import mock_dynamodb

from feast.infra.offline_stores.file import FileOfflineStoreConfig
from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
from feast.infra.online_stores.dynamodb import (
DynamoDBOnlineStore,
DynamoDBOnlineStoreConfig,
@@ -40,7 +40,7 @@ def repo_config():
provider=PROVIDER,
online_store=DynamoDBOnlineStoreConfig(region=REGION),
# online_store={"type": "dynamodb", "region": REGION},
offline_store=FileOfflineStoreConfig(),
offline_store=DaskOfflineStoreConfig(),
entity_key_serialization_version=2,
)
