
Limit on number of buckets used #60

Open
evanstade opened this issue Jan 19, 2023 · 3 comments

@evanstade (Collaborator) commented Jan 19, 2023

Using this space to follow up on the discussion in #36

I think we want some cap on the number of buckets in use, since each bucket incurs overhead, and (intentionally or unintentionally) malicious sites should not be able to vandalize a user's machine/user agent. However, we have a hard time coming up with a fixed number that is appropriate, and we also don't want to create a new error condition for sites to handle.

The solution suggested by @asutherland, tying into the quota system, seems the most elegant way to address all of these issues: if a site is near its quota limit, creating a new bucket would fail in the same way as attempting to use additional storage without buckets.

For the purposes of discussion, here are a couple options:

  1. We could attach a constant amount of space in the accounting for each bucket, i.e. each bucket is counted as [the amount of space the APIs within it actually use] + [the overhead of one bucket]. Pro: since each bucket has real costs (e.g. putting 10MB into IDB split between two buckets takes up more disk space than the same amount of data put into IDB in one bucket or the default bucket), it makes sense that the same amount of data spread across more buckets should count as more usage. Cons: this could discourage bucket usage at the margin, since using buckets would always count against the site's quota; it also allows sites to misuse buckets, since the overhead figure is probably small relative to a site's quota.

  2. Alternatively, we count each bucket as max([the amount of space the APIs within it actually use], 20MB). Pros: discourages creation of a zillion tiny buckets; doesn't penalize sites at all for using buckets when each bucket holds at least 20MB (or whatever number we pick). Cons: introduces a discrepancy between actual usage and reported usage.

I am leaning towards the latter, which also matches @asutherland's suggestion on #36.
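For concreteness, the two accounting options can be sketched as follows. The overhead and minimum-charge constants are placeholders for illustration, not proposed values:

```javascript
const MB = 1024 * 1024;
const BUCKET_OVERHEAD = 1 * MB; // option 1: assumed fixed per-bucket overhead
const MIN_CHARGE = 20 * MB;     // option 2: minimum charge per bucket

// Option 1: actual usage plus a constant overhead for each bucket.
function usageOption1(bucketSizes) {
  return bucketSizes.reduce((sum, size) => sum + size + BUCKET_OVERHEAD, 0);
}

// Option 2: each bucket is charged at least MIN_CHARGE.
function usageOption2(bucketSizes) {
  return bucketSizes.reduce((sum, size) => sum + Math.max(size, MIN_CHARGE), 0);
}
```

Under option 1 every bucket always costs a little extra; under option 2 a bucket's charge is indistinguishable from its real usage once it exceeds the minimum.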

@evanstade (Collaborator, Author)

In addition, we will want to reject the `open()` promise with a `QuotaExceededError` DOMException.
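A hypothetical sketch of how a user agent might produce that rejection; the `state` object and per-bucket overhead figure are illustrative, not from the spec:

```javascript
// Hypothetical sketch: reject a bucket open() with a QuotaExceededError
// DOMException once the origin has no quota headroom left.
function openBucket(name, state) {
  if (state.usage + state.perBucketOverhead > state.quota) {
    return Promise.reject(
      new DOMException(`No quota available for bucket "${name}"`,
                       'QuotaExceededError'));
  }
  state.buckets.add(name);
  return Promise.resolve({ name });
}
```

This mirrors how other storage APIs already fail when a write would exceed quota, so sites would not need a new error-handling path.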

@evanstade (Collaborator, Author) commented Jan 26, 2023

After some more thought on this, conflating quota usage with bucket count creates bad behavior in certain edge cases. For (2) above (taking the minimum charge per bucket to be 10MB), imagine a situation where the total quota is 100MB and there are 10 buckets, some of which actually contain 10MB of data and some of which contain less. The total amount of space actually in use is 80MB. The site can't create an 11th bucket; fine. But where can it store another 20MB? Some buckets will be OK to store in, since adding to a bucket that contains less than 10MB does not change the total usage calculation. Other buckets will not be OK to store in, because they will grow past 10MB and usage accounting will then report more than 100MB. This is a weird situation to be in, because forcing the developer to put new data in one bucket but not another serves no particular purpose for the user or anyone else involved.

It strikes me that at the end of the day we don't want a site to treat buckets as totally free to create, and we also don't want a site to use up too much space, but these two things are not necessarily related. And from a developer's perspective, combining these two requirements is likely to create unpredictable behavior, as outlined above. We already know of cases where the majority of usage would be focused in a single bucket and the other buckets are just used for holding smaller, less important data like logs. Such a site wants to allot as much space as possible to the user-facing data, and making small buckets take up a minimum amount of quota (such as 10MB) would just cause the site to not use buckets at all.

Hence I think we should implement a cap on the absolute number of buckets: both a static limit of 10,000 and a dynamic limit calculated from the quota and a reasonable minimum bucket size. So if the quota is 100MB, and we take 10MB as a reasonable-sounding typical bucket size, we arrive at a cap of 10 buckets. But the actual usage that's reported to the developer, and which is bounded by the site quota, will not depend on the number of buckets created, and the 100MB can be distributed among those buckets however the site chooses.
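One possible reading of the combined cap, using the numbers from the comment (10,000 static limit, 10MB as the assumed typical bucket size):

```javascript
const STATIC_LIMIT = 10000;                   // absolute cap on bucket count
const TYPICAL_BUCKET_SIZE = 10 * 1024 * 1024; // assumed typical bucket size

// The effective cap is whichever of the two limits is smaller:
// the static limit, or quota divided by the typical bucket size.
function bucketCap(quotaBytes) {
  const dynamicLimit = Math.floor(quotaBytes / TYPICAL_BUCKET_SIZE);
  return Math.min(STATIC_LIMIT, dynamicLimit);
}
```

For small quotas the dynamic limit dominates; only origins with very large quotas would ever hit the 10,000 ceiling.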

@asutherland (Collaborator)

> It strikes me that at the end of the day we don't want a site to treat buckets as totally free to create and we also don't want a site to use up too much space, but these two things are not necessarily related.

Agreed.

> Hence I think we should implement a cap on the absolute number of buckets, both a static limit of 10,000 and another dynamic limit that is calculated based on quota space and a reasonable minimum bucket size.

Yeah, 10,000 sounds like a reasonable hard cap to help developers reason about how to map their data model to buckets without them worrying too much about getting anywhere near the limit. In my hypothetical offline music application example comment, that would suggest a bucket-per-album could work, which feels like a nice granularity for sites and users as a unit of data eviction.

This also cleans up implementation-related questions for me; a previously proposed max of 10 was at an order of magnitude where we might be tempted to keep all of a shelf's buckets in memory most of the time, but at 10k we would probably use a limited cache. In turn, I'm not quite as concerned about the storage used by the buckets themselves, as long as it's allowed to count against quota.

I do think that proposal was right about how many buckets most sites would need, though. So in terms of minimum buckets, maybe some (non-normative?) text like: "Implementations are expected to support a minimum of 10 buckets for each storage shelf (origin) and allow at least 1 additional bucket for every 10 MiB of quota provided."
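That suggested non-normative floor could be read as the following formula (a sketch of one possible reading; the function name is mine):

```javascript
const MIB = 1024 * 1024;

// At least 10 buckets per storage shelf (origin), plus 1 additional
// bucket for every 10 MiB of quota provided.
function minimumSupportedBuckets(quotaBytes) {
  return 10 + Math.floor(quotaBytes / (10 * MIB));
}
```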

> But the actual usage that's reported to the developer and which is bound by the site quota will not depend on the number of buckets created, and the 100MB can be distributed among those buckets however the site chooses.

As long as the spec allows the quota used to be increased by some implementation-defined amount, and each implementation ideally shows some increase, that makes sense to me under our current implementation-defined quota tracking. Ideally we would get to standardizing storage action sizes, at which point we would probably want to quantify an official number for the quota impact.

Note: at that point quota will very much be a made-up unit, like "board game money", since browser optimizations like compression and de-duplication of data can cause reported usage to diverge wildly from actual on-disk usage, but I would expect browsers to factor that into their implementation-defined quota limit policies. There isn't much harm in growing the quota limit thanks to on-disk savings, as long as it doesn't reveal information about opaque Response bodies.
