Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Proper working cursor #2514

Merged
merged 17 commits into from
Aug 28, 2024
Merged

Conversation

another-rex
Copy link
Contributor

The current paging/cursor implementation is slightly hacky and fails at various edge cases, and for many query types does not work at all, e.g. (queries without ecosystem, semver queries...).

When initially implemented, the assumption was almost all API calls resulted in one datastore query. This is no longer the case.

This refactor/rework adds:

  • New cursor type
    • which can represent 3 states (Start, End, InProgress) in a type safe way (through enums, thanks for the ideas @michaelkedar @DonggeLiu!)
    • Also stores the ndb.Cursor inside
    • Also stores which query number the ndb.Cursor is for when there are multiple queries
    • can serialise to a page token, and deserialise in a backward compatible fashion.
  • Reworked logic to centralise cursor handling to make the code more readable and predicable:
    • Cursor state is stored in context, rather than passed through a tuple through return types
    • Query skips are determined next to each query.iter() function, where the datastore queries actually happens, rather than in outer functions.
  • Context now accounts for the number of datastore queries that has been performed
    • This is used to know which query a cursor belongs to
  • Minor refactors to query_generic_helper() to remove unnecessary arguments being passed in
  • Add a skipIf on env variable to allow actually running the pagination tests locally.
  • Modify the tests to run faster (by lowering the sleep duration haha),
  • Modify tests to take in unittest arguments of which tests to run from the second argument onwards (rather than having the gcloud service account always being the last argument)

Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

gcp/api/cursor.py Show resolved Hide resolved
gcp/api/cursor.py Outdated Show resolved Hide resolved
gcp/api/cursor.py Show resolved Hide resolved
gcp/api/cursor.py Show resolved Hide resolved
gcp/api/cursor.py Outdated Show resolved Hide resolved
gcp/api/cursor.py Outdated Show resolved Hide resolved
Copy link
Contributor

@andrewpollock andrewpollock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome

gcp/api/cursor.py Show resolved Hide resolved
gcp/api/cursor.py Show resolved Hide resolved
gcp/api/cursor.py Show resolved Hide resolved
gcp/api/cursor.py Show resolved Hide resolved
gcp/api/server.py Outdated Show resolved Hide resolved
gcp/api/server.py Outdated Show resolved Hide resolved
gcp/api/server.py Outdated Show resolved Hide resolved
gcp/api/server.py Outdated Show resolved Hide resolved
gcp/api/server.py Outdated Show resolved Hide resolved
gcp/api/server.py Outdated Show resolved Hide resolved
gcp/api/server.py Outdated Show resolved Hide resolved
# a token in the response
return None

if self.query_number == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that should be 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually removed this entirely, there is no reason to not return this as part of the page token.

Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some minor remaining comments. Awesome stuff!

_METADATA_SEPARATOR = ':'


class _QueryCursorState(Enum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since it's the only thing that uses it , this could possibly go inside of QueryCursor:

class QueryCursor:
  class _State(Enum):
    ENDED = 0
    # ...
    
  _cursor_state: _State = _State.ENDED

  @property
  def ended(self) -> bool:
    return self._cursor_state == QueryCursor._State.ENDED

That said, I don't think I have a preference either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just a bit clearer to be private but at the top level to avoid having to write the class name repeatedly.

@another-rex another-rex merged commit 12f33e6 into google:master Aug 28, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants