Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORM: DeleteRange deletes entries with nil index fields #11980

Closed
4 tasks
technicallyty opened this issue May 17, 2022 · 22 comments · Fixed by #15138
Closed
4 tasks

ORM: DeleteRange deletes entries with nil index fields #11980

technicallyty opened this issue May 17, 2022 · 22 comments · Fixed by #15138

Comments

@technicallyty
Copy link
Contributor

Summary of Bug

Given a table that generates a pointer field, and creating an ORM index on that field:

  • Executing a Get will fetch the object with the field as nil
  • Executing a DeleteRange with a 0'd value index, will delete any entry with the corresponding field as nil

Version

orm/v1.0.0-alpha.12 and all before

Steps to Reproduce

  1. create a table with one field as a separate message (whether user defined, or an imported message like timestamp, coin, etc) and make an index on that table
  2. Insert an object into the table with that field as nil
  3. Execute a DeleteRange on the table with the from as the default/zero value of the generated struct (your user defined message struct, or the corresponding import i.e. timestamppb.Timestamp{}.
  4. attempt to get the entry, should return an error.

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@aaronc
Copy link
Member

aaronc commented May 17, 2022

Is this mainly about time stamps and durations? Not sure I totally understand what's happening here

@technicallyty
Copy link
Contributor Author

Is this mainly about time stamps and durations? Not sure I totally understand what's happening here

Yeah the bug was found with a timestamp index field, but i assumed it extend to any field that generates a pointer to a struct similar to timestamps/durations.

basically, nil timestamps are getting treated as 0 value timestamps. not sure if thats intentional actually 🤷🏻

@aaronc
Copy link
Member

aaronc commented May 19, 2022

Is this mainly about time stamps and durations? Not sure I totally understand what's happening here

Yeah the bug was found with a timestamp index field, but i assumed it extend to any field that generates a pointer to a struct similar to timestamps/durations.

basically, nil timestamps are getting treated as 0 value timestamps. not sure if thats intentional actually 🤷🏻

Yep, nil is treated as zero value. Does that seem incorrect?

@aaronc
Copy link
Member

aaronc commented May 19, 2022

There is no explicit handling of nil values, however. Should we add it for timestamp and duration? I'm not sure what the alternative would be. There is no special encoding of nil timestamps and durations - just 0 currently. We could add an extra bit for that if you think it's a problem but it would be breaking

@technicallyty
Copy link
Contributor Author

There is no explicit handling of nil values, however. Should we add it for timestamp and duration? I'm not sure what the alternative would be. There is no special encoding of nil timestamps and durations - just 0 currently. We could add an extra bit for that if you think it's a problem but it would be breaking

I mostly was just not expecting it. I'd be curious how others feel about it before we try adding the extra. Maybe its just me 🤷🏻 If anything, some docs explaining this behavior would be good

@aaronc
Copy link
Member

aaronc commented May 19, 2022

There is no explicit handling of nil values, however. Should we add it for timestamp and duration? I'm not sure what the alternative would be. There is no special encoding of nil timestamps and durations - just 0 currently. We could add an extra bit for that if you think it's a problem but it would be breaking

I mostly was just not expecting it. I'd be curious how others feel about it before we try adding the extra. Maybe its just me 🤷🏻 If anything, some docs explaining this behavior would be good

Might be useful but we'll need to think about encoding.

Timestamp's already take up 12 bytes in keys so this would mean they take 13 bytes. There probably is a more efficient encoding however since these are the rules:
image

nanos need a max of 30 bits (to hold 999,999,999) and are supposed to be unsigned so we could use the compact unsigned encoding potentially.

seconds would go from -62135579038 to 253402318799. I think we can encode that in 39 bits.

So there's probably an encoding that can use between 5 to 13 bytes (depending on whether nanos are used).

And then we'll fail on any timestamps that fall outside of this range, and allow nils to be a true 0/unspecified value.

Currently treating nil as zero has the effect of causing it to mean 1970-01-01 which is sort of wrong.

What do you think? Should we do this?

@aaronc
Copy link
Member

aaronc commented Jun 14, 2022

@technicallyty I'm thinking we should move forward with this and try to get it in soon. One question is should nils get sorted first or last?

@aaronc
Copy link
Member

aaronc commented Jun 14, 2022

I'm thinking maybe nil time stamps should get sorted last.

Is it safe to treat nil durations as 0?

@technicallyty
Copy link
Contributor Author

technicallyty commented Jun 14, 2022

https://learnsql.com/blog/how-to-order-rows-with-nulls/

looks like theres not really a standard between db's, so either way is prob fine

Is it safe to treat nil durations as 0?

that seems at odds with the proposed timestamp change tbh

@aaronc
Copy link
Member

aaronc commented Jun 14, 2022

My thinking is that timestamps are often used for expiration so nil expiration means forever which would go at the end. Although I'm sure there's some counter-example where it's the opposite.

For durations, I'm thinking that omitting a value is equivalent to saying 0. It would seem strange that nil durations go either at the beginning, before negative durations or at the end after the longest duration. If we were going to pick one order though, I would seem nil's at the beginning which is I guess okay but it would be strange if durations have nil's first and timestamps have nil's last.

Also, we probably need to have special values of from and to in ListRange to mean start end end because nil has its own meaning.

@technicallyty
Copy link
Contributor Author

maybe we could add an option and just let the dev define ordering? something like nil_order: asc or nil_order: desc?

@aaronc
Copy link
Member

aaronc commented Jun 14, 2022

maybe we could add an option and just let the dev define ordering? something like nil_order: asc or nil_order: desc?

I think that would be good to add in the future. I'd like to start with a sane default to begin with. Seems like nil duration as zero could be one of the options and maybe default

@aaronc
Copy link
Member

aaronc commented Jun 14, 2022

Here's a proposed encoding that both handles nils correctly and is significantly more compact than our current Timestamp and Duration encodings which always require 12 bytes. With the encodings below, Timestamps generally will take 6 bytes (unless there are nanos and then may take up to 9 bytes) and Durations will take between 1 to 10 bytes.

Timestamp

  • []byte{0xFF} is a special value indicating nil and comes after any actual timestamps
  • seconds can range from 0001-01-01T00:00:00Z (-62135579038 seconds before epoch) to 9999-12-31T23:59:59Z (253402318799 after epoch) inclusive
    • encoded as 5 fixed bytes with 0 representing 0001-01-01T00:00:00Z. We don’t do any compression of seconds like with compact uint’s because modern timestamps (after the epoch) are fairly large already in the seconds field so compression won’t help much
    • the maximum value for 9999-12-31T23:59:59Z would be 253402318799 + 62135579038 = 0x497786396D (or []byte{0x49, 0x77, 0x86, 0x39, 0x6D}
    • any values over 9999-12-31T23:59:59Z are rejected such that all seconds values are < []byte{0xC0} with bytes ordering
  • nanos can range from 0 to 999,999,999 inclusive which can be represented in 30 bits or less. There is little real world need for compression because small values are uncommon and we can keep the implementation simpler
    • []byte{0x0} represents zero nanos so that timestamps without nano seconds are represented as a fixed 6 bytes
    • non-zero values are represented as 4 fixed bytes with the bit mask 0xC0 applied to the first byte (the first two bits set to 11)

Duration

  • []byte{0xFF} is a special value indicating nil and comes after any actual durations
  • seconds can range from -315,576,000,000 to +315,576,000,000 inclusive which can be represented in 39 bits
  • nanos can range from 0 to +/-999,999,999 depending on the sign of seconds, these can be represented in 30 bits
  • seconds encoded as 5 fixed bytes with 0 representing -315,576,000,000
  • nanos are encoded identically as with Timestamps ([]byte{0x0} for 0, four fixed bytes with the bit mask 0xC0 applied to the first byte for other values)

@aaronc
Copy link
Member

aaronc commented Jun 15, 2022

Started working on a fix in #12273

@aaronc
Copy link
Member

aaronc commented Feb 8, 2023

In discussing this with @testinginprod, an alternative is to simply reject nil values in timestamp and duration indexes. This would require state machines that accept nil values to convert these to either to min or max timestamp or duration depending on the use case. In the case of expiration dates where nil means don't expire, the state machine would need to convert this to the max acceptable value for timestamp and then it would get ordered at the end. This would avoid needing to deal with whether nil values get ordered first or list, however it does place additional burden on app developers and could result in unexpected runtime errors if apps simply assume nil is valid.

@ryanchristo I'm wondering if you're able to give some feedback as you're familiar with the case where this came up in practice.

A variation of the encodings above could still be useful for reducing storage space, but that could be a separate issue.

@ryanchristo
Copy link
Contributor

I'm wondering if you're able to give some feedback as you're familiar with the case where this came up in practice.

The specific issue we came across was with timestamps used for expiration. In our x/ecocredit module, we prune sell orders when they have expired and some sell orders do not have an expiration (i.e. expiration is nil).

The workaround we implemented was to set the minimum range to 1ns rather than nil, which prevents us from deleting sell orders with a nil expiration (see here). Maybe this is ok as is but this is what prompted the issue.

The current workaround might be easier and more natural than expecting app developers to convert to the max acceptable value for timestamp. I don't have a strong preference and would be ok with leaving the workaround we currently have.

@aaronc
Copy link
Member

aaronc commented Feb 8, 2023

Thanks @ryanchristo. Am I correct in assuming that with the current solution in #12273, that nils last (meaning the farthest data in the future) is acceptable for this each case of expiration?

I wonder if there are any cases where we'd want nils first...

@ryanchristo
Copy link
Contributor

ryanchristo commented Feb 8, 2023

I think nil as last makes sense for this case. The only other time nil is allowed for a timestamp in the Regen Ledger codebase is a min_start_date field in the date criteria of a basket (see here).

@aaronc
Copy link
Member

aaronc commented Feb 23, 2023

I'm not sure whether the duration encoding proposed is quite what we want. It proposes that nil durations come after negative durations and before zero and positive durations...

It seems like it might be better to do the same thing we do for timestamps and have nil durations get ordered last after all durations. Or we could reject nil durations altogether, however, I think that given we decided to accept nil timestamps, that behavior would be somewhat strange.

Any thoughts @ryanchristo or @testinginprod ?

@testinginprod
Copy link
Contributor

If the decision for timestamp is to put nil at the end (and not to reject it), then we should apply it to duration too, for consistency.

@ryanchristo
Copy link
Contributor

I agree. We should probably maintain consistency.

@aaronc
Copy link
Member

aaronc commented Feb 23, 2023

I've updated the proposal to order nil values at the end for durations, and also greatly simplify the encoding and make it more similar to timestamps (taking up 1, 6 or 9 bytes). Let me know if that looks okay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants