Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sui-tool] dump-packages uses GraphQL #18337

Merged
merged 2 commits into from
Aug 19, 2024
Merged

Conversation

amnn
Copy link
Contributor

@amnn amnn commented Jun 20, 2024

Description

Replace the original implementation of the dump-packages command (which requires access to an indexer database) with an implementation that reads from a GraphQL service. The former is not readily accessible, but the latter should be.

The new tool is also able to run incrementally: Fetching only packages created before a certain checkpoint, or pick up where a previous invocation took off to fetch new packages that were introduced since.

Test plan

Ran a test invocation, on our experimental read replica. With a max page size of 200, I was able to fetch 17000 packages (all the packages at the time the read replica was created) in 3 minutes.

Stack


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:

@amnn amnn self-assigned this Jun 20, 2024
Copy link

vercel bot commented Jun 20, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 19, 2024 11:12am
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Aug 19, 2024 11:12am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Aug 19, 2024 11:12am
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Aug 19, 2024 11:12am


#[derive(cynic::QueryFragment, Debug)]
pub(crate) struct ServiceConfig {
pub(crate) max_page_size: i32,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cynic (the GraphQL client library that I'm using here) maps Int to i32 -- this is technically correct, according to the spec. We use it to represent values that are less than 2^53 because that's what we can generally rely on being transferred faithfully by JSON across languages, and async-graphql lets us do this as well (maps u64 to Int).

This makes me think that we need to introduce a new scalar, like UInt64 (...or UInt53, or UInt?) that maps to a JSON numeric literal but that we can give this meaning. cc @hayes-mysten as our resident expert -- this is not something that we need for JS' sake, but other strongly typed languages are likely to face this issue, and the overhead of this extra scalar seems minimal to accommodate that.

}

/// Query types related to fetching packages.
pub(crate) mod packages {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you hand-roll these or did cynic generate these for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hand-rolled these, although cynic does have a builder UI: https://generator.cynic-rs.dev/

Grouping the types into their own modules, and adding a build function are my own additions, but otherwise I think you'll get something quite similar from the builder.

/// existing empty directory (in which case it will be filled), or an existing directory that has
/// been written to in the past (in which case this invocation will pick back up from where the
/// previous invocation left off).
pub async fn dump(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, in other words, dump all the packages that have been created by before_checkpoint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right -- the reason why it's phrased more "passively" (that we're ensuring they exist) is that we use the last-checkpoint file to avoid refetching packages (e.g. if you ran the tool once and then you run it again in the same directory, it will pick up where it left off).

amnn added a commit that referenced this pull request Jul 7, 2024
## Description

Whilst working on #18337, I noticed that we were over-using the `Int`
scalar -- using it to represent values that could exceed 2^31 - 1 --
when the GraphQL spec states that `Int` must be a 32-bit signed
integer.

We made this decision at the time (a) because `async-graphql` allowed
converting `u64`s to `Int` and we were primarily concerned with the
fact that although JSON doesn't specify a precision for its numeric
types, JS (among other languages), assumes it is an IEEE
double-precision floating point number, so can only represent integral
values precisely below 2^53.

`cynic` (a Rust GraphQL client library) is (correctly) stricter,
however, and maps an `Int` to an `i32`, always. There may be other
similarly strict client libraries for other languages.

This PR introduces a new scalar, `UInt`, that maps to a JSON number
literal, just like `Int`, but allows us to ascribe our own meaning (in
this case, it will be an unsigned number, and it can be as large as
2^53).

This scalar has been used in many cases where we had previously used
`Int`: sequence numbers, counts of objects, checkpoints, transactions,
etc. While other uses continue to use `Int` (pagination limits,
service limits, values bounded by the number of validators).

In some cases, we have switched from `BigInt` to using this
scalar notably:

- the db cost estimate, which was previously a `BigInt` because we
  were unsure of its scale, but in hindsight, after benchmarking, it
  is unlikely that we would want to set a limit greater than 2^31 - 1.

- the number of checkpoints in an epoch, as the number of transactions
  in an epoch (a number that is guaranteed to be greater) is being
  represented using an `Int` at the moment (and soon a `UInt`).

This will be a breaking change, so will only go out with the new major
version. Hopefully, this change will be minimal as the format of this
scalar over the wire is the same as for `Int`, but it will require
existing clients to register a new scalar in most cases.

## Test plan

Existing tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-e2e-tests$ cargo nextest run --features pg_integration
```
@amnn amnn mentioned this pull request Jul 8, 2024
7 tasks
amnn added a commit that referenced this pull request Jul 13, 2024
## Description

Whilst working on #18337, I noticed that we were over-using the `Int`
scalar -- using it to represent values that could exceed 2^31 - 1 --
when the GraphQL spec states that `Int` must be a 32-bit signed
integer.

We made this decision at the time (a) because `async-graphql` allowed
converting `u64`s to `Int` and we were primarily concerned with the
fact that although JSON doesn't specify a precision for its numeric
types, JS (among other languages), assumes it is an IEEE
double-precision floating point number, so can only represent integral
values precisely below 2^53.

`cynic` (a Rust GraphQL client library) is (correctly) stricter,
however, and maps an `Int` to an `i32`, always. There may be other
similarly strict client libraries for other languages.

This PR introduces a new scalar, `UInt`, that maps to a JSON number
literal, just like `Int`, but allows us to ascribe our own meaning (in
this case, it will be an unsigned number, and it can be as large as
2^53).

This scalar has been used in many cases where we had previously used
`Int`: sequence numbers, counts of objects, checkpoints, transactions,
etc. While other uses continue to use `Int` (pagination limits,
service limits, values bounded by the number of validators).

In some cases, we have switched from `BigInt` to using this
scalar notably:

- the db cost estimate, which was previously a `BigInt` because we
  were unsure of its scale, but in hindsight, after benchmarking, it
  is unlikely that we would want to set a limit greater than 2^31 - 1.

- the number of checkpoints in an epoch, as the number of transactions
  in an epoch (a number that is guaranteed to be greater) is being
  represented using an `Int` at the moment (and soon a `UInt`).

This will be a breaking change, so will only go out with the new major
version. Hopefully, this change will be minimal as the format of this
scalar over the wire is the same as for `Int`, but it will require
existing clients to register a new scalar in most cases.

## Test plan

Existing tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-e2e-tests$ cargo nextest run --features pg_integration
```
amnn added a commit that referenced this pull request Jul 13, 2024
## Description

Whilst working on #18337, I noticed that we were over-using the `Int`
scalar -- using it to represent values that could exceed 2^31 - 1 --
when the GraphQL spec states that `Int` must be a 32-bit signed integer.

We made this decision at the time (a) because `async-graphql` allowed
converting `u64`s to `Int` and we were primarily concerned with the fact
that although JSON doesn't specify a precision for its numeric types, JS
(among other languages), assumes it is an IEEE double-precision floating
point number, so can only represent integral values precisely below
2^53.

`cynic` (a Rust GraphQL client library) is (correctly) stricter,
however, and maps an `Int` to an `i32`, always. There may be other
similarly strict client libraries for other languages.

This PR introduces a new scalar, `UInt`, that maps to a JSON number
literal, just like `Int`, but allows us to ascribe our own meaning (in
this case, it will be an unsigned number, and it can be as large as
2^53).

This scalar has been used in many cases where we had previously used
`Int`: sequence numbers, counts of objects, checkpoints, transactions,
etc. While other uses continue to use `Int` (pagination limits, service
limits, values bounded by the number of validators).

In some cases, we have switched from `BigInt` to using this scalar
notably:

- the db cost estimate, which was previously a `BigInt` because we were
unsure of its scale, but in hindsight, after benchmarking, it is
unlikely that we would want to set a limit greater than 2^31 - 1.

- the number of checkpoints in an epoch, as the number of transactions
in an epoch (a number that is guaranteed to be greater) is being
represented using an `Int` at the moment (and soon a `UInt53`).

This will be a breaking change, so will only go out with the new major
version. Hopefully, this change will be minimal as the format of this
scalar over the wire is the same as for `Int`, but it will require
existing clients to register a new scalar in most cases.

## Test plan

Existing tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-e2e-tests$ cargo nextest run --features pg_integration
```

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [x] GraphQL: Introduces a new scalar -- `UInt53` -- to represent
unsigned 53 bit integer values. Some uses of `Int` in the existing
schema have been replaced with `UInt53`. All clients will need to
register the new scalar and clients for statically typed languages will
also need to use a wider (e.g. 64 bit), unsigned type to hold the value.
- [ ] CLI: 
- [ ] Rust SDK:
tx-tomcat pushed a commit to tx-tomcat/sui-network that referenced this pull request Jul 29, 2024
## Description

Whilst working on MystenLabs#18337, I noticed that we were over-using the `Int`
scalar -- using it to represent values that could exceed 2^31 - 1 --
when the GraphQL spec states that `Int` must be a 32-bit signed integer.

We made this decision at the time (a) because `async-graphql` allowed
converting `u64`s to `Int` and we were primarily concerned with the fact
that although JSON doesn't specify a precision for its numeric types, JS
(among other languages), assumes it is an IEEE double-precision floating
point number, so can only represent integral values precisely below
2^53.

`cynic` (a Rust GraphQL client library) is (correctly) stricter,
however, and maps an `Int` to an `i32`, always. There may be other
similarly strict client libraries for other languages.

This PR introduces a new scalar, `UInt`, that maps to a JSON number
literal, just like `Int`, but allows us to ascribe our own meaning (in
this case, it will be an unsigned number, and it can be as large as
2^53).

This scalar has been used in many cases where we had previously used
`Int`: sequence numbers, counts of objects, checkpoints, transactions,
etc. While other uses continue to use `Int` (pagination limits, service
limits, values bounded by the number of validators).

In some cases, we have switched from `BigInt` to using this scalar
notably:

- the db cost estimate, which was previously a `BigInt` because we were
unsure of its scale, but in hindsight, after benchmarking, it is
unlikely that we would want to set a limit greater than 2^31 - 1.

- the number of checkpoints in an epoch, as the number of transactions
in an epoch (a number that is guaranteed to be greater) is being
represented using an `Int` at the moment (and soon a `UInt53`).

This will be a breaking change, so will only go out with the new major
version. Hopefully, this change will be minimal as the format of this
scalar over the wire is the same as for `Int`, but it will require
existing clients to register a new scalar in most cases.

## Test plan

Existing tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-e2e-tests$ cargo nextest run --features pg_integration
```

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [x] GraphQL: Introduces a new scalar -- `UInt53` -- to represent
unsigned 53 bit integer values. Some uses of `Int` in the existing
schema have been replaced with `UInt53`. All clients will need to
register the new scalar and clients for statically typed languages will
also need to use a wider (e.g. 64 bit), unsigned type to hold the value.
- [ ] CLI: 
- [ ] Rust SDK:
@amnn amnn requested review from a team and suiwombat as code owners August 19, 2024 10:33
## Description

Add a command for generating a `config.toml` file for the GraphQL
service with all its parameters set to their default values.

## Test plan

```
cargo run --bin sui-graphql-rpc -- generate-config /tmp/config.toml
```
## Description

Replace the original implementation of the dump-packages
command (which requires access to an indexer database) with an
implementation that reads from a GraphQL service. The former is not
readily accessible, but the latter should be.

The new tool is also able to run incrementally: Fetching only packages
created before a certain checkpoint, or pick up where a previous
invocation took off to fetch new packages that were introduced since.

## Test plan

Ran a test invocation, on our experimental read replica. With a max
page size of 200, I was able to fetch 17000 packages (all the packages
at the time the read replica was created) in 3 minutes.
Base automatically changed from amnn/gql-gen-cfg to amnn/gql-obj-versions August 19, 2024 10:52
@amnn amnn merged commit 7f39fed into amnn/gql-obj-versions Aug 19, 2024
13 of 17 checks passed
@amnn amnn deleted the amnn/gql-pkg-dump branch August 19, 2024 10:53
amnn added a commit that referenced this pull request Aug 20, 2024
## Description

Replace the original implementation of the dump-packages command (which
requires access to an indexer database) with an implementation that
reads from a GraphQL service. The former is not readily accessible, but
the latter should be.

The new tool is also able to run incrementally: Fetching only packages
created before a certain checkpoint, or pick up where a previous
invocation took off to fetch new packages that were introduced since.

## Test plan

Ran a test invocation, on our experimental read replica. With a max page
size of 200, I was able to fetch 17000 packages (all the packages at the
time the read replica was created) in 3 minutes.

## Stack

- #17543
- #17692 
- #17693 
- #17696 
- #18287 
- #18288 
- #18336

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
amnn added a commit that referenced this pull request Aug 20, 2024
## Description

Replace the original implementation of the dump-packages command (which
requires access to an indexer database) with an implementation that
reads from a GraphQL service. The former is not readily accessible, but
the latter should be.

The new tool is also able to run incrementally: Fetching only packages
created before a certain checkpoint, or pick up where a previous
invocation took off to fetch new packages that were introduced since.

## Test plan

Ran a test invocation, on our experimental read replica. With a max page
size of 200, I was able to fetch 17000 packages (all the packages at the
time the read replica was created) in 3 minutes.

## Stack

- #17543
- #17692 
- #17693 
- #17696 
- #18287 
- #18288 
- #18336

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
tx-tomcat pushed a commit to tx-tomcat/sui-network that referenced this pull request Aug 27, 2024
## Description

Replace the original implementation of the dump-packages command (which
requires access to an indexer database) with an implementation that
reads from a GraphQL service. The former is not readily accessible, but
the latter should be.

The new tool is also able to run incrementally: Fetching only packages
created before a certain checkpoint, or pick up where a previous
invocation took off to fetch new packages that were introduced since.

## Test plan

Ran a test invocation, on our experimental read replica. With a max page
size of 200, I was able to fetch 17000 packages (all the packages at the
time the read replica was created) in 3 minutes.

## Stack

- MystenLabs#17543
- MystenLabs#17692 
- MystenLabs#17693 
- MystenLabs#17696 
- MystenLabs#18287 
- MystenLabs#18288 
- MystenLabs#18336

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
suiwombat pushed a commit that referenced this pull request Sep 16, 2024
## Description

Replace the original implementation of the dump-packages command (which
requires access to an indexer database) with an implementation that
reads from a GraphQL service. The former is not readily accessible, but
the latter should be.

The new tool is also able to run incrementally: Fetching only packages
created before a certain checkpoint, or pick up where a previous
invocation took off to fetch new packages that were introduced since.

## Test plan

Ran a test invocation, on our experimental read replica. With a max page
size of 200, I was able to fetch 17000 packages (all the packages at the
time the read replica was created) in 3 minutes.

## Stack

- #17543
- #17692 
- #17693 
- #17696 
- #18287 
- #18288 
- #18336

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants