[GraphQL] Add health endpoint #18277

Merged: 2 commits, Jun 25, 2024

@stefan-mysten (Contributor) commented Jun 15, 2024

Description

Adds a checkpoint_timestamp_ms to the watermark task and uses it in a new health check endpoint function. The health endpoint checks two things:

  • whether there is a DB connection; if not, it returns code 500
  • whether the last known checkpoint timestamp is within an acceptable buffer. It subtracts the checkpoint timestamp from the current timestamp and checks whether the difference is larger than the provided query param max_checkpoint_lag_ms (or a default value when the param is omitted); if so, it returns code 504, GATEWAY TIMEOUT.

How to query this endpoint:
curl -X GET "http://127.0.0.1:8000/health" -i
Set the check for max checkpoint time lag to 10s. If it returns 504, then the checkpoint is behind.
curl -X GET "http://127.0.0.1:8000/health?max_checkpoint_lag_ms=10000" -i
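
The decision logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Health` enum, the `check_health` function, the 200 code for the healthy case, and the `DEFAULT_MAX_CHECKPOINT_LAG_MS` value are all assumptions made for the example, and the real handler lives in the server builder and reads the timestamp from the watermark task. The sketch uses 504, the code the description names as GATEWAY TIMEOUT.

```rust
/// Possible outcomes of the health check (names are hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Health {
    Healthy,
    NoDbConnection,
    CheckpointBehind,
}

impl Health {
    /// HTTP status code for each outcome. 500 and 504 are the codes named in
    /// the description; 200 for the healthy case is an assumption.
    pub fn status_code(&self) -> u16 {
        match self {
            Health::Healthy => 200,
            Health::NoDbConnection => 500,
            Health::CheckpointBehind => 504,
        }
    }
}

/// Hypothetical default; the description does not state the real default value.
pub const DEFAULT_MAX_CHECKPOINT_LAG_MS: u64 = 300_000;

/// Decides the health outcome from the DB state and the checkpoint lag.
pub fn check_health(
    db_connected: bool,
    now_ms: u64,
    checkpoint_timestamp_ms: u64,
    max_checkpoint_lag_ms: Option<u64>,
) -> Health {
    // First check: without a DB connection nothing else can be trusted.
    if !db_connected {
        return Health::NoDbConnection;
    }

    // Use the query param if provided, otherwise fall back to the default.
    let max_lag_ms = max_checkpoint_lag_ms.unwrap_or(DEFAULT_MAX_CHECKPOINT_LAG_MS);

    // Lag = current time minus checkpoint time. saturating_sub avoids an
    // underflow if the checkpoint timestamp is ahead of the local clock.
    let lag_ms = now_ms.saturating_sub(checkpoint_timestamp_ms);

    if lag_ms > max_lag_ms {
        Health::CheckpointBehind
    } else {
        Health::Healthy
    }
}
```

For instance, `check_health(true, 30_001, 20_000, Some(10_000))` yields `Health::CheckpointBehind`, because the lag of 10,001 ms exceeds the 10 s limit used in the second curl example. A lag of exactly 10,000 ms is still treated as healthy in this sketch.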

Test plan

Added a new test.

cargo nextest run --features pg_integration -- test_health_check


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:

@amnn (Contributor) left a comment

Nice, thank you!

crates/sui-graphql-rpc/src/server/builder.rs: 5 review threads (outdated, resolved)
wlmyng added a commit that referenced this pull request Jun 25, 2024
)

## Description 

Unblock #18277 by explicitly
setting up the correct test scenario for the
`test_query_default_page_limit` test.

I think we can follow this up with:
1. Instead of using the same db url, follow the sui-graphql-e2e-tests
pattern and generate unique db urls for each test.
2. This is so that we can run tests in parallel instead of sequentially
(which is the case today because the tests are marked serial).
3. We can leverage ExecutorCluster.cleanup_resources to clean up each
test.
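
One way the first follow-up item could look is sketched below. This is an illustration only, not the sui-graphql-e2e-tests implementation: `unique_db_url`, the `NEXT_DB_ID` counter, and the naming scheme (test name, process id, counter) are hypothetical, and the base URL passed in is whatever the test setup already uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter so that two tests in the same process never share a name.
static NEXT_DB_ID: AtomicU64 = AtomicU64::new(0);

/// Builds a database URL that is unique per call (hypothetical helper).
/// `base_url` is the server part without a database name.
pub fn unique_db_url(base_url: &str, test_name: &str) -> String {
    let id = NEXT_DB_ID.fetch_add(1, Ordering::Relaxed);
    // The process id keeps names distinct across test processes that run
    // concurrently; the counter keeps them distinct within one process.
    format!(
        "{}/{}_{}_{}",
        base_url.trim_end_matches('/'),
        test_name,
        std::process::id(),
        id
    )
}
```

With each test pointed at its own database, the tests no longer contend for shared state, which is what would allow dropping the sequential execution mentioned in item 2.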



wlmyng added a commit that referenced this pull request Jun 25, 2024
…ault_page_limit (#18… (#18394)

…393)

@stefan-mysten stefan-mysten merged commit f645673 into MystenLabs:main Jun 25, 2024
40 of 43 checks passed
@stefan-mysten stefan-mysten deleted the gql_health_endpoint branch June 25, 2024 02:28
stefan-mysten added a commit to stefan-mysten/sui that referenced this pull request Jun 25, 2024
stefan-mysten added a commit that referenced this pull request Jun 25, 2024
…24.4 release (#18399)

wlmyng added a commit that referenced this pull request Jun 25, 2024
…fault_page_limit (#18… (#18395)

…393)

stefan-mysten added a commit to stefan-mysten/sui that referenced this pull request Jul 9, 2024
tx-tomcat pushed a commit to tx-tomcat/sui-network that referenced this pull request Jul 29, 2024
…tenLabs#18393)

tx-tomcat pushed a commit to tx-tomcat/sui-network that referenced this pull request Jul 29, 2024