Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fs/s3): Add support for prefix filtering #1938

Merged
merged 18 commits into from
Aug 8, 2023

Conversation

ahobson
Copy link
Contributor

@ahobson ahobson commented Aug 1, 2023

This PR allows for using part of an s3 bucket for feature flags, using the S3 native Prefix filter.

@GeorgeMac You wondered about prefix listing and I realized it was much easier to do than I thought and also a feature that I think I might be able to use sooner rather than later.

I tried to update the tests to reflect this new feature, but I didn't test all combinations. Let me know if you think it needs more coverage.

@ahobson ahobson requested a review from a team as a code owner August 1, 2023 18:54
* This allows for using part of an s3 bucket for feature flags, using
  the S3 native `Prefix` filter
@codecov
Copy link

codecov bot commented Aug 1, 2023

Codecov Report

Merging #1938 (9d92187) into main (24108a4) will increase coverage by 0.06%.
The diff coverage is 83.63%.

@@            Coverage Diff             @@
##             main    #1938      +/-   ##
==========================================
+ Coverage   71.07%   71.13%   +0.06%     
==========================================
  Files          67       67              
  Lines        6541     6572      +31     
==========================================
+ Hits         4649     4675      +26     
- Misses       1629     1634       +5     
  Partials      263      263              
Files Changed Coverage Δ
internal/config/storage.go 79.68% <ø> (ø)
internal/s3fs/s3fs.go 68.21% <82.00%> (+2.88%) ⬆️
internal/storage/fs/s3/source.go 79.41% <100.00%> (+1.28%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@GeorgeMac GeorgeMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! Definitely worth considering. Just having a thought before we go any further though.

The .flipt.yml index file we already parse and respect when using the filesystem backend might actually give you the capabilities you already need without a code change.
Flipt will attempt to fetch and parse this file first, it includes both an include and exclude list. If the file does not exist, it will use a default index file:

version: "1.0"
include:
  - "**/features.yaml"
  - "**/features.yml"
  - "**/*.features.yaml"
  - "**/*.features.yml"
exclude: []

However, it will respect any provided override of that file instead. So, given you just want to source from a prefix today, e.g. someprefix you could add a .flipt.yml with:

version: "1.0"
include:
  - "someprefix/**/features.yaml"
  - "someprefix/**/features.yml"
  - "someprefix/**/*.features.yaml"
  - "someprefix/**/*.features.yml"
exclude: []

I haven't tried this particular example yet though, but that would be my assumption at least.

Obviously, the one advantage to a prefix server side like you have here is the size of the list response fetched from S3 can be reduced. But I don't know how much of a problem it is that we might need that.

What do you think @ahobson ? Do you think server side prefixed fetching is still valuable regardless of the index capability? i.e. to reduce the number of keys fetched.

My one thought originally around using prefixed search was just to make the fs.FS implementation behave like a much like an FS as possible. Have it e.g. tokenize on / and walk prefixes, but after reflecting on it, I dont see any real value in that. What you implemented and proposed already seems pretty perfect already.

@ahobson
Copy link
Contributor Author

ahobson commented Aug 2, 2023

TL; DR: for my specific use case, the prefix would be helpful, but I'm not sure how common that use case is.

Nice, thanks! Definitely worth considering. Just having a thought before we go any further though.

The .flipt.yml index file we already parse and respect when using the filesystem backend might actually give you the capabilities you already need without a code change. Flipt will attempt to fetch and parse this file first, it includes both an include and exclude list.

I hadn't realized that!

Obviously, the one advantage to a prefix server side like you have here is the size of the list response fetched from S3 can be reduced. But I don't know how much of a problem it is that we might need that.

What do you think @ahobson ? Do you think server side prefixed fetching is still valuable regardless of the index capability? i.e. to reduce the number of keys fetched.

As I explored how we might deploy flipt for one of our applications, I realized we already had an S3 bucket configured, but we use it for multiple purposes, including user uploads. The result set could be quite large.

... and that has made me realize I didn't implement any support for dealing with truncated responses because the bucket has more than 1000 objects. I should probably implement that anyway.

However, if I could set a prefix, the result size would be much more manageable. I'm not sure how common a situation it is to have an s3 bucket used for multiple purposes like this.

@ahobson
Copy link
Contributor Author

ahobson commented Aug 4, 2023

The .flipt.yml index file we already parse and respect when using the filesystem backend might actually give you the capabilities you already need without a code change.

I was thinking about this a bit more and my understanding is that filtering happens at the strorage level, not at the filesystem level. I believe that means for all filesystems, the entire directory tree will be walked regardless of .flipt.yaml exludes or includes. For S3, that means listing all objects in the bucket and returning them from ReadDir (and making however many calls to AWS to get all the objects in the bucket).

Supporting a prefix for S3 means that only the objects that match the prefix will be returned, limiting the size of the results and the number of calls to AWS. I suppose that's just an optimization, but to me it seems like a pretty important one.

Copy link
Contributor

@GeorgeMac GeorgeMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow review @ahobson , I was away end of last week.

One question around how the prefix in enforced internally. Otherwise, this looks great and the reasoning makes sense.

Comment on lines 71 to 74
// If a prefix is provided, cannot Open files without the prefix
if f.prefix != "" && !strings.HasPrefix(name, f.prefix) {
return nil, pathError
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought on this:

I wonder if instead of enforcing that all keys provide the prefix, we should instead forcibly prepend the prefix.

My reasoning is for e.g. .flipt.yml if we dont do this, given a prefix is defined, the user will never be able to override the .flipt.yml. Since the storage layer will always request it without the prefix. This might be a confusing and easy mistaken to make.

However, if we forcibly apply the prefix, then the user can still put a .flipt.yml at the prefix in their bucket and get the same experience of controlling the overrides that way.

What do you think @ahobson ?

Copy link
Contributor Author

@ahobson ahobson Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Nice catch! Yes, supporting opening the .flipt.yml file in s3 under the prefix seems like an important use case.

The current implementation of ReadDir in the s3fs returns the Key that includes the prefix and thus Open gets the full path.

I think it would be less confusing (and less error prone in the code) if the includes and excludes in the .flipt.yml file matched against the S3 Key instead of the s3fs removing the prefix.

If a path provided to Open does not have a prefix, then one is prepended before the S3 GetObject call. That would support the .flipt.yml file, but still allow everything else to operate on the full Key.

@GeorgeMac I think I probably just restated your proposal, but I wanted to make sure we are on the same page about the implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yeah I think I see what you're saying there r.e. the subtle interaction there.
Kind of append if not already prefixed?:

if !strings.HasPrefix(path, prefix) {
    path = prefix + path
}

That sounds great if what I am saying equates with what you're thinking 👍

GeorgeMac and others added 13 commits August 7, 2023 10:23
…-io#1962)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.56.2 to 1.57.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.56.2...v1.57.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: George <me@georgemac.com>
…flipt-io#1961)

Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.11.0 to 0.11.1.
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](golang/tools@v0.11.0...v0.11.1)

---
updated-dependencies:
- dependency-name: golang.org/x/tools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
)

Bumps [eslint](https://github.com/eslint/eslint) from 8.45.0 to 8.46.0.
- [Release notes](https://github.com/eslint/eslint/releases)
- [Changelog](https://github.com/eslint/eslint/blob/main/CHANGELOG.md)
- [Commits](eslint/eslint@v8.45.0...v8.46.0)

---
updated-dependencies:
- dependency-name: eslint
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.18.30 to 1.18.32.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md)
- [Commits](aws/aws-sdk-go-v2@config/v1.18.30...config/v1.18.32)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: George <me@georgemac.com>
Co-authored-by: Mark Phelps <209477+markphelps@users.noreply.github.com>
@GeorgeMac
Copy link
Contributor

Unsure if you have access to the GH action output but I am seeing this error currently in CI @ahobson :

--- FAIL: Test_SourceGetPrefix (0.00s)
    source_test.go:65: 
        	Error Trace:	/src/internal/storage/fs/s3/source_test.go:65
        	Error:      	An error is expected but got nil.
        	Test:       	Test_SourceGetPrefix

@ahobson
Copy link
Contributor Author

ahobson commented Aug 8, 2023

@GeorgeMac 🤦 I'm sorry, I missed that. Thank you for the direct ping.

I've updated the tests and they now pass for me locally. Hopefully I haven't missed anything else.

Copy link
Contributor

@GeorgeMac GeorgeMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome. Thanks @ahobson

@GeorgeMac GeorgeMac merged commit ed6b7f0 into flipt-io:main Aug 8, 2023
22 checks passed
@markphelps
Copy link
Collaborator

Thank you @ahobson !! Very awesome indeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants