-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docs best effort responses #3865
base: master
Are you sure you want to change the base?
Conversation
We were missing quite a lot of relevant topics: how to discover canister endpoints, guidance on how to perform calls, attaching cycles, no explanation of why the async/await desugaring was important etc. I also now changed the tag to beginner, as the Rust/Motoko language-specific documents are also rated as "beginner". The previous discussion was not really motivated (why should the user care about desugaring?) and quite language dependent, so I removed most of it - the language dependent stuff should go into the language specific docs. Now added all this and also added the difference between best-effort and guaranteed response calls.
🤖 Here's your preview: https://tai3p-vaaaa-aaaam-abema-cai.icp0.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only looked at message-execution-properties.mdx
for now and added some nitpicks.
However, I can't help thinking that the topic could be explained more clearly. E.g. start with calls/tasks vs. message executions as we already do, but before we list any properties; just a couple of pictures with simplified sequence diagrams. That way, the difference between call/task and message execution would be immediately clear. It would also immediately make it clear why two requests Req1 and Req2 delivered in that order may produce two responses Rep2 and Rep1, in reverse order (even before the protocol may decide to deliver them in an arbitrary order).
Very likely not for this PR, though.
@@ -14,26 +14,26 @@ The [community conversation on security best practices](https://www.youtube.com/ | |||
|
|||
To interact with a canister's methods, there are two primary types of **calls** that can be used: [update](/references/ic-interface-spec.md#http-call) calls and [query](/docs/current/references/ic-interface-spec#http-query) calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not part of your change, but apart from mentioning updates and queries here and in the paragraph below, we never clarify anything about either of them.
I haven't followed the link, but by the looks of the URL (involving HTTP endpoints), it refers to "replicated updates" and "non-replicated queries". Which is even more confusing, as further down we say "Each downstream call that a canister makes, query or update, triggers a message." Which may imply that a replicated message execution can somehow make a non-replicated query call.
Maybe it's necessary to explicitly state that this page is all about replicated execution (which I guess it is, but I'm not 100% sure, because there's also composite queries, which behave somewhat similarly to replicated execution). And briefly explain the difference between replicated updates and replicated queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - I agree with your points.
I think for this document it doesn't matter if we're talking about replicated or non-replicated execution, except for the part where we talk about committing state at the end of a message handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that to the extent that we define updates and queries on this page, updates are replicated and queries are non-replicated. So that later when we say that "a canister makes [a downstream] query call", the uninformed reader would interpret it as "a replicated update call can make a non-replicated query call".
But yeah, it's a nitpick, so feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed all the details about HTTP calls and focused on the endpoints - not sure if you're somehow looking at an old version of the PR...
However, a malicious destination canister could choose to delay the response for arbitrarily long if it is willing to put in the required cycles. Also, the response does not have to be successful, but can also be a reject response. The reject may come from the called canister, but it may also be generated by ICP. Such protocol-generated rejects can occur at any time before the call reaches the callee-canister, as well as once the call does reach the callee-canister, for example if the callee-canister traps while processing the call. Thus, it's important that the calling canister handles reject responses as well. A reject response means that the message hasn't been successfully processed by the receiver but doesn't guarantee that the receiver's state wasn't changed. | ||
|
||
- **Property 8**: If the calling canister made a guaranteed response (as opposed to a best-effort response) inter-canister call, and the call reaches the callee-canister which then produces a reply or reject response, the protocol guarantees that the callee-canister's generated reply or reject response gets back to the caller-canister. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is broad enough. I would put it in terms of "for guaranteed response calls there is always exactly one globally unique response to a call (whether from the callee or from the protocol)".
Or even "there is always one single globally unique response for every canister call (originating from either the callee or the protocol); if the call is a guaranteed response call, then this unique response is guaranteed to be delivered to the caller; if it was a best-effort call, then a SYS_UNKNOWN
response may be delivered instead (and potentially before)". Or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I'm not sure how useful it is to talk about the "real response generated by the system" in case of SYS_UNKNOWN
, as this real response is not something observable by the user. What is observable is that the callee produced a response but it didn't get delivered. So we can introduce the internal requests and responses (including system-generated responses) if it's the simplest mental model for the user of what's happening, but if there's a simple way to explain everything using just what's observable, I'd prefer that.
Not entirely related, but my original idea was to have a property that only applies to guaranteed response calls, and we could just say "best-effort responses lack this property". But thinking about it, it's probably useful to have a separate property about losing responses that we can refer to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that this property (of the caller being guaranteed to learn of the outcome of the call) applies more broadly than just for "reply or reject responses [produced by] the callee-canister". The system itself may also produce a reject response without making the call at all and this is also guaranteed to be delivered to the caller. And, importantly, there is always exactly one outcome for every call. So it's never the case that e.g. the system produces and delivers a reject response to the caller, but then somehow still delivers the request to the callee, which then produces a second response (that the system simply chooses to not deliver).
The difference in the best-effort response case is that the system may at any time, including before the call completes or even begins execution, produce a SYS_UNKNOWN
reject response and deliver it to the caller instead of the actual outcome (which is what would always be delivered for a guaranteed response call).
I.e. the former is always a closed loop, with no branches. Whereas the latter is an interrupted loop, where a SYS_UNKNOWN
reject response can be delivered at any time, while the call is potentially still ongoing or in the future (meaning that the caller can never find out the outcome in this way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The system itself may also produce a reject response without making the call at all and this is also guaranteed to be delivered to the caller. And, importantly, there is always exactly one outcome for every call.
This is covered by Property 7. This property just says that, for guaranteed response calls, if the caller actually produces a response, the user will get that response.
The difference in the best-effort response case is that the system may at any time, including before the call completes or even begins execution, produce a SYS_UNKNOWN reject response
Of course it's also true that for guaranteed response calls, if the system produces some response, only that response will be delivered to the user, which is not true for best-effort responses, and what you write is an entirely accurate description how the system works. What I'm saying is that in general, the user doesn't need to know how the system works. We just have to give them the simplest correct description of the behavior that they can observe. It might well be that the simplest explanation is to tell them how the system really works. I tried providing a simple description without going into the system details but it's quite possible that I failed. I hope my intention is clear now - if it was clear all along and you feel that it's too complicated or not precise enough, I will switch to what you proposed, otherwise take another look and LMK what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe what's missing is to say something like "the call is only deliver at most once" in Property 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found it unnecessarily specific. IMHO there is no need to qualify this with "if the call is delivered to the callee and the callee responds without trapping".
But yeah, if you want to describe strictly the observable properties of the system, then this is more accurate. So it's entirely your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left feedback on everything but message-execution-properties.md
as Alin already made a pass over this one. I'll make a pass once you've addressed Alin's comments.
|
||
## Overview of language runtime for asynchronous code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re "true" response from the callee
: Strictly speaking, it is also not guaranteed in the current messaging model that the response will come from the callee. The response could also be generated by the system if the request couldn't be delivered to the callee.
The property the current messaging model provides is that the response is globally unique, i.e., if the system produces a reject one can conclude that the callee never saw the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re "true" response from the callee: Strictly speaking, it is also not guaranteed in the current messaging model that the response will come from the callee. The response could also be generated by the system if the request couldn't be delivered to the callee.
True, I dropped the "from the callee" part.
The property the current messaging model provides is that the response is globally unique, i.e., if the system produces a reject one can conclude that the callee never saw the request.
Well technically a CANISTER_ERROR
is also produced by the system but it means that the callee has seen the request (whether any mutation was made on the callee side is anyone's guess, though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I dropped the "from the callee" part.
FWIW, I still see
in this mode the "true" response from the callee is not guaranteed to be delivered
Maybe you just didn't push your change yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed the change, I see below:
but in this mode the "true" response is not guaranteed to be delivered, and can be masked by a system generated error response instead.
Not sure what's going on 🤷
|
||
When canister A calls B, a future that represents the result of the call is created. The future calls [the system API](/docs/current/references/ic-interface-spec#system-api-call) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we should already mention that with best effort messaging the cycles are dropped if the message is dropped. It might be too early to mention this in detail, but maybe one can provide a link to the next section where the difference between best-effort and guaranteed response calls is discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I added an info box that cycles may be dropped and a forward pointer to the section where it's discussed in more detail
|
||
ICP supports two kind of inter-canister calls: | ||
|
||
* Guaranteed response calls provide the caller with the guarantee that, if the callee produces a response (which may be unsuccessful, i.e., an error during call processing), that exact response will be delivered to the caller. Furthermore, if the request isn't successfully delivered to the callee (which can happen during high load, callee running out of cycles, and other reasons), the response will notify the caller of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention here that the system ensures that there is only ever one response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
The other way I think of it (and I've mentioned this in an earlier comment) is that guaranteed response calls are closed loops: the request goes out (potentially all the way to the callee, potentially not very far); one response is produced from it (whether by the callee or by the system); that response is always returned to the caller (whether this takes until the heat death of the universe or just before).
Best-effort calls address both of these problems. However, they can require more effort on the part of the caller to handle the additional error case. Here are some guidelines how to choose between the two types of calls: | ||
|
||
* Always prefer best-effort response calls for calls that don't change the state of the callee, i.e., reads. | ||
* For endpoints that change the state of the callee, the best practice is to make such endpoints amenable to [safe retries](/docs/current/developer-docs/smart-contracts/best-practices/idempotency), since this is also required for safely handling external user calls. Use best-effort responses calls for such endpoints, and handle the additional edge cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* For endpoints that change the state of the callee, the best practice is to make such endpoints amenable to [safe retries](/docs/current/developer-docs/smart-contracts/best-practices/idempotency), since this is also required for safely handling external user calls. Use best-effort responses calls for such endpoints, and handle the additional edge cases. | |
* For endpoints that change the state of the callee, the best practice is to make such endpoints amenable to [safe retries](/docs/current/developer-docs/smart-contracts/best-practices/idempotency), similar to the one that is required for safely handling external user calls. Use best-effort responses calls for such endpoints, and handle the additional edge cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded this now more thoroughly to make it clear that we recommend doing that for ingress messages anyways.
|
||
* Guaranteed response calls provide the caller with the guarantee that, if the callee produces a response (which may be unsuccessful, i.e., an error during call processing), that exact response will be delivered to the caller. Furthermore, if the request isn't successfully delivered to the callee (which can happen during high load, callee running out of cycles, and other reasons), the response will notify the caller of this. | ||
|
||
* Best-effort response calls drop this guarantee. They allow the caller to specify a timeout (which is capped from above by the system to some maximum value, such as 5 minutes), after which the system will return an error to the caller. The request may or may not be processed by the callee; the caller must, if necessary, determine whether the call took place or not by some other mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also be very explicit here that dropping a message means dropping the attached cycles as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
|
||
* Always prefer best-effort response calls for calls that don't change the state of the callee, i.e., reads. | ||
* For endpoints that change the state of the callee, the best practice is to make such endpoints amenable to [safe retries](/docs/current/developer-docs/smart-contracts/best-practices/idempotency), since this is also required for safely handling external user calls. Use best-effort responses calls for such endpoints, and handle the additional edge cases. | ||
* Use guaranteed response calls for endpoints that mutate state and do not enable safe retries, or calls that perform larger cycle transfers. Best-effort response calls can currently lose cycles in edge cases. Be aware of the limitations of guaranteed response calls listed above. If safe upgrades are needed, consider using a stateless proxy canister (see the [security best practices](https://internetcomputer.org/docs/current/developer-docs/security/security-best-practices/inter-canister-calls) for more information). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I would not say that best effort calls can lose cycles in edge cases as it makes it sound as if it was a bug in the system. I'd rather use the formulation that if a message is dropped also the attached cycles will be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, changed.
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
In contrast, Motoko takes another approach where every async/await call will be converted to an inter-canister call, more specifically a self-call | ||
in case a local function is called. This means that in Motoko `foo` would be split across 3 message executions -- the first would be until `baz` is | ||
called, the second until canister B is called and the final one until the end of the function body. | ||
A [separate document](https://internetcomputer.org/docs/current/references/message-execution-properties) contains more details on interleaving/commit points and message execution properties. Refer to the [security best practices](/docs/current/developer-docs/security/security-best-practices/inter-canister-calls) for advice on how to handle concurrency and state rollback issues correctly. Finally, refer to the CDK documentation for your language to learn more about desugaring of `async/await` and interleaving/commit points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also point to the security best practices for best-effort responses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a single security best practices doc for all types of inter-canister calls (as most points are shared between both call types), and we already link to it above.
docs/developer-docs/smart-contracts/best-practices/idempotency.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/best-practices/idempotency.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/best-practices/idempotency.mdx
Outdated
Show resolved
Hide resolved
Co-authored-by: Jessie Mongeon <133128541+jessiemongeon1@users.noreply.github.com>
Co-authored-by: Alin Sinpalean <58422065+alin-at-dfinity@users.noreply.github.com>
Co-authored-by: Alin Sinpalean <58422065+alin-at-dfinity@users.noreply.github.com>
Co-authored-by: Alin Sinpalean <58422065+alin-at-dfinity@users.noreply.github.com>
Co-authored-by: Alin Sinpalean <58422065+alin-at-dfinity@users.noreply.github.com>
Co-authored-by: Alin Sinpalean <58422065+alin-at-dfinity@users.noreply.github.com>
@@ -14,26 +14,26 @@ The [community conversation on security best practices](https://www.youtube.com/ | |||
|
|||
To interact with a canister's methods, there are two primary types of **calls** that can be used: [update](/references/ic-interface-spec.md#http-call) calls and [query](/docs/current/references/ic-interface-spec#http-query) calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that to the extent that we define updates and queries on this page, updates are replicated and queries are non-replicated. So that later when we say that "a canister makes [a downstream] query call", the uninformed reader would interpret it as "a replicated update call can make a non-replicated query call".
But yeah, it's a nitpick, so feel free to ignore.
::: | ||
|
||
:::info | ||
Note that if the code does not `await` the response, the code after the callback is executed in the same message, until the next inter-canister call is triggered using `await`. | ||
Note that if the code does not `await` the response, the code after the callback is executed in the same message execution, until the next inter-canister call is triggered using `await`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I just find it awkward that we talk about a call or a callback without an await
. If there's no await
, nothing is called, you've merely created a future even though it's not obvious from the syntax.
IOW, leaving it at "the code after the await
is executed as a separate message execution (callback)" (which the paragraph above already says) is IMHO cleaner than adding "but if there's no await
, there's no callback". I would rather change this paragraph to say "note that it is actually the await
, not the thing that looks like a function call that triggers the canister call and marks the separation point between message executions". Or something to that effect.
However, a malicious destination canister could choose to delay the response for arbitrarily long if it is willing to put in the required cycles. Also, the response does not have to be successful, but can also be a reject response. The reject may come from the called canister, but it may also be generated by ICP. Such protocol-generated rejects can occur at any time before the call reaches the callee-canister, as well as once the call does reach the callee-canister, for example if the callee-canister traps while processing the call. Thus, it's important that the calling canister handles reject responses as well. A reject response means that the message hasn't been successfully processed by the receiver but doesn't guarantee that the receiver's state wasn't changed. | ||
|
||
- **Property 8**: If the calling canister made a guaranteed response (as opposed to a best-effort response) inter-canister call, and the call reaches the callee-canister which then produces a reply or reject response, the protocol guarantees that the callee-canister's generated reply or reject response gets back to the caller-canister. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that this property (of the caller being guaranteed to learn of the outcome of the call) applies more broadly than just for "reply or reject responses [produced by] the callee-canister". The system itself may also produce a reject response without making the call at all and this is also guaranteed to be delivered to the caller. And, importantly, there is always exactly one outcome for every call. So it's never the case that e.g. the system produces and delivers a reject response to the caller, but then somehow still delivers the request to the callee, which then produces a second response (that the system simply chooses to not deliver).
The difference in the best-effort response case is that the system may at any time, including before the call completes or even begins execution, produce a SYS_UNKNOWN
reject response and deliver it to the caller instead of the actual outcome (which is what would always be delivered for a guaranteed response call).
I.e. the former is always a closed loop, with no branches. Whereas the latter is an interrupted loop, where a SYS_UNKNOWN
reject response can be delivered at any time, while the call is potentially still ongoing or in the future (meaning that the caller can never find out the outcome in this way).
docs/developer-docs/security/security-best-practices/inter-canister-calls.mdx
Outdated
Show resolved
Hide resolved
|
||
## Overview of language runtime for asynchronous code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I dropped the "from the callee" part.
FWIW, I still see
in this mode the "true" response from the callee is not guaranteed to be delivered
Maybe you just didn't push your change yet.
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
|
||
ICP supports two kind of inter-canister calls: | ||
|
||
* Guaranteed response calls provide the caller with the guarantee that, if the callee produces a response (which may be unsuccessful, i.e., an error during call processing), that exact response will be delivered to the caller. Furthermore, if the request isn't successfully delivered to the callee (which can happen during high load, callee running out of cycles, and other reasons), the response will notify the caller of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
The other way I think of it (and I've mentioned this in an earlier comment) is that guaranteed response calls are closed loops: the request goes out (potentially all the way to the callee, potentially not very far); one response is produced from it (whether by the callee or by the system); that response is always returned to the caller (whether this takes until the heat death of the universe or just before).
docs/developer-docs/smart-contracts/advanced-features/async-code.mdx
Outdated
Show resolved
Hide resolved
docs/developer-docs/smart-contracts/best-practices/idempotency.mdx
Outdated
Show resolved
Hide resolved
…de.mdx Co-authored-by: Alin Sinpalean <58422065+alin-at-dfinity@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left some comments.
DO NOT MERGE YET - this should be merged only once best-effort responses go live on mainnet.
This PR contains the documentation for the new best-effort response feature. In the process, I noticed that we were lacking a good introductory page on inter-canister calls, addressing topics such as:
how do I find which endpoints to call?
what's the role of Candid in inter-canister calls?
why do I care about the desugaring of async/await
attaching cycles to calls
Follows the developer docs style guide.
Follows the best practices and guidelines.
New documentation pages include document tags.
New documentation pages include SEO keywords.
New documents are in the
.mdx
file format to support the previous two components.New documents are registered in
/sidebars.js
, otherwise, it will not appear in theside navigation bar.
Make sure that the
.github/CODEOWNERS
file isfilled with new documents that you added. This way we can ensure that future Pull Requests are reviewed by the right people.