Ability to provide a custom `feature_store.yaml` during CLI operations #1556

MattDelac · 2021-05-12T22:13:30Z

Is your feature request related to a problem? Please describe.
We often want to run feast apply (or other CLI operations) on different GCP projects.

Therefore, it would be nice if we could point to a specific "feature_store.yaml" when we use the CLI.

Describe the solution you'd like
Something easy like feast apply --conf feature_store_prod.yaml. By default --conf would be feature_store.yaml
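To make the request concrete, here is a minimal sketch of the flag behaviour described above, written with the standard-library `argparse`. It is not Feast's actual CLI code: the `feast-sketch` program name and the `parse_cli` helper are hypothetical, and only the `--conf` flag with its `feature_store.yaml` default comes from this request.

```python
import argparse
from pathlib import Path

# Default taken from the request: without --conf, fall back to feature_store.yaml.
DEFAULT_CONF = Path("feature_store.yaml")


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that mimics `feast <command> --conf <file>` (hypothetical)."""
    parser = argparse.ArgumentParser(prog="feast-sketch")
    parser.add_argument("command", choices=["apply", "teardown"])
    parser.add_argument(
        "--conf",
        type=Path,
        default=DEFAULT_CONF,
        help="Path to the feature store configuration file.",
    )
    return parser


def parse_cli(argv):
    """Return (command, config path) for the given argument list."""
    args = build_parser().parse_args(argv)
    return args.command, args.conf
```

With this shape, `parse_cli(["apply"])` keeps today's behaviour, while `parse_cli(["apply", "--conf", "feature_store_prod.yaml"])` selects the production file without any copying.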

Describe alternatives you've considered
Copying a specific yaml to feature_store.yaml when we need to perform CLI operations on different environments


woop · 2021-05-12T22:16:50Z

Hey @MattDelac

Do you think #1509 (with separate repositories) would address this problem?

MattDelac · 2021-05-12T22:23:46Z

> Hey @MattDelac
>
> Do you think #1509 (with separate repositories) would address this problem?

I don't think so as you are staying in the same repository.

It's just that we would like the flexibility to have multiple configurations

conf/
conf/feature_store_adhoc.yaml
conf/feature_store_prod.yaml
conf/feature_store_local.yaml

For example, this would be useful to let the user confirm that their new FeatureView is properly applied by doing feast apply --conf conf/feature_store_local.yaml

woop · 2021-05-25T21:30:45Z

@MattDelac Some options that I can imagine

Option 1: One repo, one config

This is what we have today. The idea is that the feature_store.yaml anchors a configuration repository. It tells us where to scan for all feature definitions within the repository (and what the root folder is). If you have multiple environments then the idea is that you will have something like

.
├── prod
│   ├── feature_store.yaml
│   └── my_feature_def.py
└── staging
    ├── feature_store.yaml
    └── my_feature_def.py

but I think it's possible to do

.
├── common
│   └── my_feature_def.py
├── prod
│   ├── feature_store.yaml
│   └── my_feature_def.py
└── staging
    ├── feature_store.yaml
    └── my_feature_def.py

so your prod and staging would pull definitions from common or another python package.

Option 2: One repo, many configs

Alternatively, we could make it possible to specify a remote configuration file. My main concern with that is that it could be unintuitive how it would function. Would we still consider it to be the root of a feature repo?

When I see a command like feast apply --conf conf/feature_store_local.yaml then I don't think there is anything special about the --conf file. But in reality, that conf file location is important since we will use its location to scan for feature definition files.
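A small sketch of why the location matters, assuming the repo root is derived from the config file's parent directory as described above. The `resolve_repo_root` function and its optional `repo_root` override are hypothetical names used for illustration, not part of Feast.

```python
from pathlib import Path
from typing import Optional


def resolve_repo_root(conf_path: Path, repo_root: Optional[Path] = None) -> Path:
    """Return the directory that would be scanned for feature definitions.

    If no explicit root is given, fall back to the directory that holds the
    config file, which is the implicit behaviour discussed in this comment.
    """
    if repo_root is not None:
        return repo_root
    return conf_path.parent


# Implicit behaviour: the scan would start in conf/, where no definitions live.
implicit = resolve_repo_root(Path("conf/feature_store_local.yaml"))

# Hypothetical override: keep scanning from the repository root instead.
explicit = resolve_repo_root(Path("conf/feature_store_local.yaml"), repo_root=Path("."))
```

Here `implicit` is `conf`, which is the unintuitive part: a user passing `--conf` would probably expect the scan to start where `explicit` points.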

shihgianlee · 2021-05-25T22:44:14Z

We tried to organize our code with common which is suggested in Option 1. Personally, I like to group the relevant modules under a package/folder, i.e. prod, staging. We have GCP projects created for dev, qa and prod. In our repo, we have dev, qa and prod that points to the corresponding GCP projects. From CLI, we should be able to execute feast apply -c dev/.

I may not have a good understanding of the problem statement. What benefit does one repo with multiple feature store definitions give us if we can structure our repo to match GCP projects?

MattDelac · 2021-05-26T12:24:37Z

> We tried to organize our code with common which is suggested in Option 1. Personally, I like to group the relevant modules under a package/folder, i.e. prod, staging. We have GCP projects created for dev, qa and prod. In our repo, we have dev, qa and prod that points to the corresponding GCP projects. From CLI, we should be able to execute feast apply -c dev/.

Same thing on our side!

We basically have

.
├── config
│   ├── feature_store_prod.yaml
│   └── feature_store_dev.yaml
└── features
    ├── my_feature_def.py
    └── my_feature_def_2.py

Then once we merge a new PR, our CD tool is going to spin two jobs that basically do

update_registry_in_prod:
  run:
    - cp config/feature_store_production.yaml ./feature_store.yaml
    - feast apply

update_registry_in_dev:
  run:
    - cp config/feature_store_dev.yaml ./feature_store.yaml
    - feast apply

That's where I would like to skip the copy step and directly run feast apply -c config/feature_store_production.yaml

Also to give you more details, in our code we change the GCP project of our table_ref based on the config (if it's prod or development)

We have something like

table_ref: str = f"{get_bigquery_project()}.{BIGQUERY_SCHEMA}.{entity}_{feature_view}"

So the two registries (prod & dev) do not contain exactly the same information (as the table_ref will be different).
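For readers following along, a minimal sketch of how such environment-dependent table references could be built. Everything except the f-string shape is an assumption: the `FEAST_ENV` variable, the project names, and the `feature_store` schema are placeholders, and the real `get_bigquery_project()` reads from the active config rather than an environment variable.

```python
import os

# Hypothetical values for illustration only.
BIGQUERY_SCHEMA = "feature_store"
_PROJECT_BY_ENV = {"prod": "my-company-prod", "dev": "my-company-dev"}


def get_bigquery_project(env=None):
    """Pick the GCP project for the given environment (defaults to FEAST_ENV, then dev)."""
    env = env or os.environ.get("FEAST_ENV", "dev")
    return _PROJECT_BY_ENV[env]


def build_table_ref(entity, feature_view, env=None):
    """Build a fully qualified BigQuery table reference for one feature view."""
    return f"{get_bigquery_project(env)}.{BIGQUERY_SCHEMA}.{entity}_{feature_view}"


prod_ref = build_table_ref("driver", "hourly_stats", env="prod")
dev_ref = build_table_ref("driver", "hourly_stats", env="dev")
```

The same feature definition yields two different `table_ref` values, which is why the prod and dev registries diverge even though the Python files are identical.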

woop · 2021-07-05T16:28:47Z

So more tangibly @MattDelac, are you suggesting that all parameterization should happen in the feature_store.yaml and that you would only have a single feature repo, and that feature repo would then have conditional logic based on this configuration?

I'm just trying to figure out what the most natural approach is here for users.

MattDelac · 2021-07-05T18:03:58Z

> So more tangibly @MattDelac, are you suggesting that all parameterization should happen in the feature_store.yaml and that you would only have a single feature repo, and that feature repo would then have conditional logic based on this configuration?

Yes

woop · 2021-07-05T19:21:30Z

Digression warning

One of the things I have been thinking about a lot is the philosophy behind Black. The idea is basically that we should stop thinking about formatting and just let a tool handle it. The reason I think this may apply to Feast is because we could also let Feast take a more opinionated approach to managing a feature repository.

Let's take feature inferencing for instance. Today, you have something like

driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    input=driver_hourly_stats
)

after which you should run

feast apply

which infers features and adds them to the registry. The repo itself is generalized and lightweight. At first glance this sounds great, but I have been thinking about whether this is actually a good practice. How does a user constrain the schema of a feature view? They should add specific features to the features argument, but then why don't we follow the same approach for inferencing? I think it may make more sense to do something like

feast discover

which infers schemas for defined feature views and updates them in the repository like

driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    input=driver_hourly_stats
)

A benefit of this approach is that we can version control all schema changes in git, and we have a consistent way to define features (all of it is in the repo, as opposed to some in the repo and some of them inferred).
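As a rough illustration of the code-generation half of this idea, the sketch below renders an inferred schema into the explicit `features=[...]` form shown above. Note that `feast discover` is only a proposal in this thread; `render_features` and the shape of its input mapping are hypothetical, and the step that would actually query the data source is left out.

```python
from typing import Mapping


def render_features(schema: Mapping[str, str]) -> str:
    """Render {column name: ValueType member name} as a `features=[...]` argument."""
    lines = ["features=["]
    for name, dtype in schema.items():
        lines.append(f'    Feature(name="{name}", dtype=ValueType.{dtype}),')
    lines.append("],")
    return "\n".join(lines)


# Schema as it might be inferred from the driver_hourly_stats source.
inferred = {"conv_rate": "FLOAT", "acc_rate": "FLOAT", "avg_daily_trips": "INT64"}
rendered = render_features(inferred)
```

Because the rendered text would be written back into the repository, every schema change shows up as an ordinary diff in git.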

How does this relate to this particular issue? Well, if we have a single repo then the user probably has conditional logic within their FeatureView, meaning Feast will probably have trouble updating/adding the FeatureView in the repo. Also, if we go the single repo (or folder) route, then it's not possible to easily diff different commits and see the changes over time, either inside a single environment (prod/staging) or across environments.

Don't feel too strongly, but just some things on my mind.

MattDelac · 2021-07-07T16:03:13Z

> One of the things I have been thinking about a lot is the philosophy behind Black. The idea is basically that we should stop thinking about formatting and just let a tool handle it. The reason I think this may apply to Feast is because we could also let Feast take a more opinionated approach to managing a feature repository.

I am clearly not against a more opinionated approach. It might be hard, though, as Feast is trying to be a tool that lets users connect an OfflineStore and an OnlineStore (through a Provider).
The big difference is that for a linter it makes sense to have one config for a given repo. Feast is here to help manage data, so I believe that the flexibility of different environments (prod, adhoc, staging, etc.) is very important.

> which infers schemas for defined feature views and updates them in the repository like

Oh, I see what you mean here, and it might be a good approach. The problem (at least for us) is that our FS repo is also our source of truth about which features are published and which are not. Moreover, we add extra information like

Team owning a FeatureView
Description of a Feature

I don't know if we could easily infer the description of the Features with an OfflineStore other than BigQuery (e.g. Presto). Even if every OfflineStore supports it, it means that it's the responsibility of the upstream pipeline to properly document a FeatureView. This will be harder to enforce as we would need to create this logic in all of our upstream tools.

> Well if we have a single repo then the user probably has conditional logic within their FeatureView, meaning Feast will probably have trouble updating/adding the FeatureView in the repo

I mean, it depends on how we can save metadata. It sounds like adding tags to FeatureViews gives a lot of flexibility to the user. This gives them the freedom to "tweak" Feast to make it work in a specific environment (each company is different). Keeping track of those tags (or another form of metadata) should be trivial, I believe, and is key.

> Also, if we go the single repo (or folder) route, then it's not possible to easily diff different commits and see the changes over time.

I don't understand what you mean here

> Don't feel too strongly, but just some things on my mind.

Same on my side. I really enjoy this chat as it helps me think out of the box 🙂

stale · 2021-11-06T08:19:16Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

achals · 2022-08-26T17:32:01Z

Closed by #3077

MattDelac mentioned this issue Jun 1, 2021

Add current directory in sys path for CLI commands that might depend on custom providers #1594

Merged

stale bot added the wontfix This will not be worked on label Nov 6, 2021

stale bot closed this as completed Nov 14, 2021

woop reopened this Nov 17, 2021

stale bot removed the wontfix This will not be worked on label Nov 17, 2021

woop added the keep-open label Nov 17, 2021

adchia added the kind/feature New feature or request label Jan 7, 2022

kevjumba mentioned this issue Aug 3, 2022

Support passing in feature_store.yaml explicitly to FeatureStore #2848

Closed

cburroughs mentioned this issue Aug 26, 2022

feat: Allow passing repo config path via flag #3077

Merged

achals closed this as completed Aug 26, 2022
