-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No way to query past Query.Prefix? #116
Comments
Some datastores may not be able to support this (like flatfs, but it doesn't even support ordered iterating, so it's probably fine). 👍 from me, just make sure that it's optional and not enabled by default (default = zero value) cc @Stebalien @bigs |
Ok, cool, thanks for the quick response. How about I whip up some changes, post them here for initial review/discussion and if they look good, we can take it from there. |
Ok, so so here's my proposed change to
type Query struct {
Prefix string // namespaces the query to results whose keys have Prefix
IterationPrefix string // if set, this overrides Prefix during query iteration (n/a for some impls) At first I was thinking a So If we go with adding the If y'all like this, give me the green light and I'll start submitting the changes. I'm OCD when it comes to naming and comments, so don't be shy for any name, style, or comment edits -- I get it. :) |
type Query struct {
Prefix string
StartPrefix string
[...]
}
func Query[...]
if q.StartPrefix == "" {
q.StartPrefix = q.Prefix
}
if !strings.HasPrefix(q.StartPrefix, q.Prefix) {
return errors.New("StartPrefix not within Prefix")
}
[...]
it.Seek(q.StartPrefix)
for [...] it.ValidForPrefix(q.Prefix) |
Love it -- great suggestion! |
SeekPrefix or StartPrefix? SeekPrefix may more accurately describe the behavior for impls that literally have seek (vs prefix) behavior. |
Yeah, |
i'm definitely not against this—i think it could be quite useful. i think we'll need to update datastores that don't natively support this to provide it or at least heavily document its absence. |
How about: type Query struct {
Prefix string // namespaces the query to results whose keys have Prefix
SeekPrefix string // if set, an alternate to Prefix that initially positions the query (n/a for some impls)
... |
Ok, so I made the discussed changes to leveldb, badger, and bolt -- changes were clean, cost-free, were only a couple lines, and have been tested. Meanwhile, I'm not familiar with S3 to know the obvious path, so I was going to pass on S3 for now. If y'all want, I can have |
Feel free to open the PRs, it's easier to discuss changes there. (https://github.com/whyrusleeping/sql-datastore/ will also need support) |
ok great |
Can you link the PRs here? (makes it easier for others to follow) |
Oops, yeah good idea. :D 4 PRs submitted. Will do SQL once we're past this step if that werks. The 3 ds impl changes will of course get compile fails until the ds.Query: badger: leveldb: bolt: (I also added support for Query.Prefix) |
Note: S3 calls this "StartAfter" (but SeekPrefix is fine with me). |
Question: How should this interact with sorting by value? There are two reasonable solutions I can think of:
Thoughts? |
if we want to stay in line with leveldb and descendants, it should happen
after the sort. it feels like the semantics of the alternative would almost
be undefined
…On Wed, Feb 6, 2019 at 18:40 Steven Allen ***@***.***> wrote:
Question: How should this interact with sorting by value?
There are two reasonable solutions I can think of:
1. Apply after sorting. This is useful for resuming query.
2. Apply before sorting. This is useful for, e.g., scanning a range of
dates. Maybe we even want to replace this field with a range (Start, Stop)?
This is also likely easier to implement everywhere.
Thoughts?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#116 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AANBWgXTFd3FTp8lT_8BHrSsA7iTTc5eks5vK2gJgaJpZM4agZRr>
.
|
Liking that Start Stop idea, and it maps well to most impls (tho I can't speak for S3). Steb, as you raised in the other thread, there's the bookkeeping factor w/ ascending vs descending order thing, so that's the main PITA factor I see. With a Stop though, that means there'd now be another reason for a query to stop (which is useful b/c even w/ the SeekPrefix I'm using for now, for example, I have to check each result key to see if it's time to stop). So I'm seeing that Start/Stop is a nice alternative to Prefix (using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right). So the bottom line seems to be that Start+Stop seems to be a wealthier alternative to Prefix, but having both may be over-defined-ish. I'd happily help support a switch to Start+Stop if that's what others would be into. I already have test code that tests our ds stuff (independent of the impl used), so I'd be up to make common_test.go that would run core tests for bolt, badger, SQL, leveldb, and other impls that support the whole shootin match. I was surprised it didn't already exist, but who wants to do that -- I get it. Literally the opposite of feeling productive lol. |
Yes.
Almost. Stop would have to be Start+1. E.g., /a -> /b (non-inclusive).
We have a shared test suite in the But yeah, a complete shared test suite would be amazing.
Thoughts @magik6k, @bigs? TL;DR: "seeking" implies seeking through some ordered set while a range is just a range of keys. IMO, this better matches user needs and should be easier to implement correctly. |
yeah, i'm into it. we could even implement a member method, |
Well, my thought was to define the Stop key as inclusive, which of course means either <= or >=, depending on ascending/descending keys. This would appear to fit in everywhere. So maybe instead of Start/Stop, it's Start/Last, First/Last, or Seek/Last? |
i think the |
With start/stop, I wouldn't take order into account (we also support things like ordering by value). That is, start/stop mean "all keys between start and stop" where start <= stop. That's really the main reason I prefer start/stop over seek.
Stop inclusive can't mimic Note: We could also just use a custom range type: type Bound {
Value string
Exclusive bool
}
func (b Bound) Open() bool {
return b == Bound{} // zero value is "open"
}
func (b Bound) Before(key string) bool {
if b.Exclusive {
return bound < key
} else if key == "" {
return true
} else {
return bound <= key
}
}
func (b Bound) After(key string) bool {
if b.Exclusive {
return key < bound
} else if key == "" {
return true // open range
} else {
return key <= bound
}
}
type Range struct {
Start, Stop Bound
}
func (r *Range) Contains(key string) bool {
return r.Start.Before(key) && r.Stop.After(key)
}
You mean get rid of |
my thinking was that if re: removing |
Got it. Yeah, that would also work (although I mildly prefer ranges).
It just feels like an unnecessary breaking change (although we could deprecate it). With prefix and start/stop, I assume you'd turn the prefix into a range and then take the intersection. We can even add helper functions for this. |
Range code: https://gist.github.com/66d63607ee56de9192025c8606f57e3b (needs tests). Really, this is starting to make me like my ranges less... (but at least that code only has to be written once). |
Is there anything notable blocking from this landing in |
It needs to be implemented (correctly) in badger, flatfs, s3, etc. and that's a fair amount of work, especially because it interacts with other parts of the query system.
|
Would it make sense to scope this to another interface, something like Implemented as: type RangeDatastore interface {
Datastore
QueryRange(ctx context.Context, q query.Range)
} Implementation based on your range code gist This way it can be implemented by only those datastore implementations that actually support this kind of Query effectiently, and not worry about the weird edge case of implementing range queries on those that don't make sense to. |
Unfortunately, no. Go has some pretty big holes in its composition story until we get generics. We'd:
1. Have to implement this for, e.g., the "Mount" datastore.
2. We'd then need to deal with some sub-datastores potentially not implementing it.
To be clear, I'd very much like this feature, someone just needs to submit patches to implement and test it.
I'd also be open to _removing_ some features of our query system that we frankly don't use to make this more approachable.
1. The ability to sort by anything but the key. No datastore supports this efficiently.
2. The ability to seek to some offset by _index_. This feature isn't currently used or even all that usable and would be made redundant by the proposed feature.
Together, these two "features" are responsible for most of the complexity in our query system, and are, as far as I know, completely unused.
Are you up for doing that?
|
Would this then need to be split up into two issues?
Or leave it as a single refactor + feature PR? I have a pretty good sense of how a Range might be implemented based on my existing usage and understanding of the package, along with your example gist for ranges. I would need to look a little more into the NaiveQuery implementation stuff to make sure I knew all the various elements to effectively and safely remove points #1 and #2 you suggested from the existing codebase. |
Hi gents!
Currently,
Query.Prefix
is used to do the initial seek when aQuery
begins, and iteration only continues while that prefix holds. There needs to be a flag that basically says, hey, don't usePrefix
once iteration has begun. Otherwise, there's no way to effectively do a seek and do an arbitrary amount of scanning. This seems like a critical feature to leave behind, yikes.Hopefully, I'm missing something!?
For example, in
go-ds-badger
: (line 270+)could be enhanced to:
Adding this would be easy, but all the impls would need to be updated of course. So maybe it's done as a selectively available feature?
This is a dealbreaker for us, but we love Datastore and want to use and support it! If some at PL agrees to accept our pull requests in a timely fashion, we'll help do the work. I'm sad we're now in this position, and would take less than a few mins for dbs like leveldb, badger, and bolt. What do y'all think?
We kindly need resolution on this relatively soon one way or the other so that we know we need to pass on Datastore.
Thanks,
Drew
The text was updated successfully, but these errors were encountered: