Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nearcore doesn't return structured errors for most use cases of API #2976

Closed
vgrichina opened this issue Jul 10, 2020 · 31 comments · Fixed by #4399
Closed

nearcore doesn't return structured errors for most use cases of API #2976

vgrichina opened this issue Jul 10, 2020 · 31 comments · Fixed by #4399
Assignees
Labels
A-RPC Area: rpc P-high Priority: High

Comments

@vgrichina
Copy link
Collaborator

Structured errors work (#1839) is not complete and it's hard to know what to expect.

Example of error from nearcore:

    response.error {
      code: -32000,
      message: 'Server error',
      data: 'Other Error: block DtdfWgnt3QDEjoeARUK25vg8qh31kc3Lrg6qREsoJsST is ahead of head block 34eT8dRWfDPhZBiRGYvkHDj65ThaXCBTrnM4LQjn2Zab \n' +
        ' Cause: Unknown \n' +
        ' Backtrace:    0: failure::backtrace::Backtrace::new\n' +
        '   1: <actix::address::envelope::SyncEnvelopeProxy<A,M> as actix::address::envelope::EnvelopeProxy>::handle\n' +
        '   2: <actix::contextimpl::ContextFut<A,C> as core::future::future::Future>::poll\n' +
        '   3: tokio::runtime::task::raw::poll\n' +
        '   4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll\n' +
        '   5: actix_rt::runtime::Runtime::block_on\n' +
        '   6: near::main\n' +
        '   7: std::rt::lang_start_internal::{{closure}}::{{closure}}\n' +
        '             at rustc/a74d1862d4d87a56244958416fd05976c58ca1a8/src/libstd/rt.rs:52\n' +
        '      std::sys_common::backtrace::__rust_begin_short_backtrace\n' +
        '             at rustc/a74d1862d4d87a56244958416fd05976c58ca1a8/src/libstd/sys_common/backtrace.rs:130\n' +
        '   8: main\n' +
        '   9: __libc_start_main\n' +
        '  10: _start\n'
    }

Note that even actual error message Other Error: block DtdfWgnt3QDEjoeARUK25vg8qh31kc3Lrg6qREsoJsST is ahead of head block 34eT8dRWfDPhZBiRGYvkHDj65ThaXCBTrnM4LQjn2Zab doesn't go into proper place in message.

This effectively makes current error message strings part of API, which is very suboptimal.

Here's how we have to handle errors because of this:

near-api-js

src/account.ts
120:                if (!error.message.match(/Transaction \w+ doesn't exist/)) {
205:            if (e.message.includes('does not exist while viewing')) {
src/wallet-account.ts
193:                if (e.message.includes('does not have enough balance')) {

near-wallet

src/actions/account.js
78:            if (error.message.indexOf('does not exist while viewing') !== -1) {
src/reducers/allAccounts.js
16:        if (payload.message.match('does not exist')) {

Note that we also handle errors explicitly only in few cases (degrading user experience) because we want to avoid depending on error messages in this way. This problem also degrades quality of every app built on NEAR, as everybody has same problem.

@vgrichina
Copy link
Collaborator Author

Another example is that Timeout errors are of unclear structure:

#1839 (comment)

@MaksymZavershynskyi
Copy link
Contributor

@evgenykuzyakov Vlad Frolov is currently busy with P0 Rosetta. Would you be able to take a look at it?

@janedegtiareva
Copy link
Contributor

Consistent error-handling is important for devx:

@MaksymZavershynskyi
Copy link
Contributor

MaksymZavershynskyi commented Sep 19, 2020

Assigning 1 day since most of the work is going to be done in #2536

@frol
Copy link
Collaborator

frol commented Oct 22, 2020

It is going to be the follow-up issue after #3517 is finally done.

The plan is to expose the structure errors side-by-side with the currently awkward text errors. Once #3517 is done, it will be just a matter of the design of the structured errors in the RPC response. There has been some discussion happening in #3204, where I initially proposed to expose the very detailed view-client error as is, but on the second thought, we should better minimize the scope of the possible errors exposed through JSON RPC as otherwise, we won't be able to reason about the possible return values just as we cannot reason about them right now with the text messages (see the very first paragraph of this issue description).

$ http post https://rpc.testnet.near.org jsonrpc=2.0 id=dontcare method=block 'params:=[0]'

{
    "error": {
        "code": -32000, // don't touch it
        "data": "DB Not Found Error: BLOCK HEIGHT: 0 \n Cause: Unknown", // don't touch it
        "message": "Server error", // don't touch it
        "cause": {
            "kind": "QUERYING_ERROR",
            "cause": {
                "kind": "NOT_FOUND",
                "details": {
                    "kind": "BLOCK_BY_HEIGHT_QUERY_ERROR",
                    "data": {
                        "block_height": 0
                    }
                }
            }
        }
    },
    "id": "dontcare",
    "jsonrpc": "2.0"
}

Having the view-client error structure exposed to the public leads us into an unfortunate situation of not being able to guarantee that our API is stable as any change in the underlying structure will lead us into breaking changes.

This means that we want to have RPC-specific enums representing possible RPC errors (there is primitives::rpc module where we can put all of that). I would even use a dedicated enum for each endpoint, so the API client can expect that only the relevant error messages can be returned.

Just to include the relevant parts of the conversation here:

@vgrichina suggested (#3204 (comment)):

Personally I'd just prefer following response structure for errors:

{
   "errorCode": "ActionError.FunctionCallError.HostError.GuestPanic",
   "errorDetails":  {
       "panic_msg": "Attempt to call transfer on tokens belonging to another account."
    }
}

Well, I think we need to draw a line between what we want to have flat and what nested. I think it is not helpful to introduce a custom error code schema when we can just use JSON for that. However, if users don't care about whether it was a host error -> guest panic, and only care about it being a function call error with the panic message, that should be enough, I guess.

It is yet to find the exact level of detail we want, but here is my suggestion:

{
    "error": {
        "kind":  "ACTION_ERROR",
        "details": {
            "action_index": 0,
        },
        "cause": {
            "kind": "FUNCTION_CALL_ERROR",
            "details": {
                "panic_msg": "Attempt to call transfer on tokens belonging to another account."
            }
        }
    }
}

Various Concerns

The errors structure should be expressed with the format structure (not a custom encoding format)

I don't see any point in introducing a custom error format (e.g. ActionError.FunctionCallError.HostError.GuestPanic) that you should parse then after you already parsed some well-defined structure.

The errors should not be tightly coupled with the implementation of nearcore

It is very hard for the client to reason about the nearcore internal structure. The public API errors should reflect a reasonable level of detail while being generic enough.

The enumerations should be tagged

It is cumbersome to work with the following structure:

{
    "ACTION_ERROR": {
        "details": {
            "action_index": 0,
        },
        "cause": {
            "FUNCTION_CALL_ERROR": {
                "details": {
                    "panic_msg": "Attempt to call transfer on tokens belonging to another account."
                }
            }
        }
    }
}

This structure is hard to define and also it does not forbid to express something we clearly don't want to express (several types of errors mixed on the same level):

{
    "ACTION_ERROR": {
        "details": {
            "action_index": 0,
        },
        "cause": {
            "QQ_ERROR": {},
            "FUNCTION_CALL_ERROR": {
                "details": {
                    "panic_msg": "Attempt to call transfer on tokens belonging to another account."
                }
            }
        }
    },
    "SERVER_ERROR": {}
}

@vgrichina
Copy link
Collaborator Author

vgrichina commented Oct 27, 2020

Well, I think we need to draw a line between what we want to have flat and what nested. I think it is not helpful to introduce a custom error code schema when we can just use JSON for that. However, if users don't care about whether it was a host error -> guest panic, and only care about it being a function call error with the panic message, that should be enough, I guess.

I think it is not helpful to introduce a custom error code schema when we can just use JSON for that.

@frol no idea what do you mean by introducing custom error schema, may you elaborate

Can you also give some examples of real-world situations when developing frontend (e.g. explorer) where nested structure would be necessary and/or helpful? I've made examples of how it is actively harmful here: #3204 (comment)

I hope this time we don't ignore frontend preferences on error format (for easier DevX/UX) and follow YAGNI as well. As you noticed – we'll have to support all these details as stable API but still need to have flexibility internally. Simpler format likely will be easier to maintain as an API.

@frol
Copy link
Collaborator

frol commented Oct 28, 2020

@vgrichina Let's consider a View Function Call method.

The errors that can happen there end-to-end (not all of them are relevant to the RPC implementation) ordered based on the request processing order:

  • TCP errors (e.g. timeout, DNS, etc)
  • HTTPS errors (e.g. invalid certificate)
  • HTTP errors (e.g. all the nodes are unreachable and the load balancer returns HTTP 502 Bad Gateway)
  • Internal error (we should always keep this in mind) [kind of HTTP 500 Internal Error] - this is the error message we have control over
  • Invalid input (e.g. non-JSON input or missing parameters) [kind of HTTP 400 Bad Request]
  • Invalid block reference (e.g. the block height or hash does not exist) [kind of HTTP 404 Not Found]
  • Non-existing contract [kind of HTTP 404 Not Found, but with different details clearly stating that it was the contract that is missing, not the block]
  • Contract failed to get compiled for various reasons [kind of HTTP 502 Bad Gateway, which is absolutely fine from the RPC point of view]
  • Non-existing contract method [kind of HTTP 404 Not Found, but again, it should be clearly distinguishable from the rest "Not Found" errors]
  • Contract method crash during execution [kind of HTTP 500 Internal Error, but for contract execution which is an absolutely normal return value from RPC point of view, so I would call it HTTP 200 OK wrapping internal HTTP 500 Internal Error]
  • Contract run out of Gas [kind of HTTP 408 Request Timeout, but internal to the contract execution]
  • Contract attempted to transfer more balance than the account has

Transaction execution with a FunctionCall as one of the action will need to have basically the same error, but it should also include the Action index in the error structure to be helpful.

@frol
Copy link
Collaborator

frol commented Jun 1, 2021

We have finished the internal refactorings necessary to finally expose the structured errors in a systematic way.

I suggest we take the client implementation to design the errors. As I mentioned above, there are tons of different levels where things may go wrong.

Let's take broadcast_tx_commit endpoint for this case study, and define the interface for a hypothetical function on the client side:

fn broadcast_tx_commit(Transaction) -> Result<TransactionExecutionDetails, Error>

Error can be one of:

  • Network connectivity error (DNS lookup error, connection error, read/write error)
  • Validation error (invalid payload, e.g. missing parameters or unexpected method) - this is unexpected when the client library implements the API interface correctly
  • Handler error

When a network connectivity error is faced, it makes sense to retry the request, while validation errors won't get auto-resolved, and handler errors require some decision to be made.

I want to focus on validation errors and handler errors since network connectivity errors are not reported via JSON RPC, and they are mostly out of the discussion here.

Currently, we report all the errors the same way through obscure "data" field in the "error" object like:

  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "message": "Server error",
    "data": "Method not found"
  }
}
{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "message": "Server error",
    "data": "Block not found: Block #1 is unknown"
  }
}

NOTE: We are going to keep the old structure no matter what.

If we mix validation and handler errors into the same structure, we will get:

{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "METHOD_NOT_FOUND",
    "error_details": {
      "unknown_method_name": "xxx"
    },
    "message": "Server error",
    "data": "Method not found"
  }
}
{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "UNKNOWN_BLOCK",
    "error_details": {
      "block_height": 1
    },
    "message": "Server error",
    "data": "Block not found: Block #1 is unknown"
  }
}
{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "CONTRACT_EXECUTION_ERROR",
    "error_details": {
      "vm_error": "VM error message goes here",
      "block_height": 1,
      "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
    },
    "message": "Server error",
    "data": "Function call returned an error: VM error message goes here"
  }
}

Alternatively, we may split validation errors and handler errors by introducing 1 layer of nesting:

{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "VALIDATION_ERROR",
    "error_details": {
      "error_kind": "METHOD_NOT_FOUND",
      "error_details": {
        "unknown_method_name": "xxx"
      },
    },
    "message": "Server error",
    "data": "Method not found"
  }
}
{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "HANDLER_ERROR",
    "error_details": {
      "error_kind": "UNKNOWN_BLOCK",
      "error_details": {
        "block_height": 1
      },
    },
    "message": "Server error",
    "data": "Block not found: Block #1 is unknown"
  }
}
{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "HANDLER_ERROR",
    "error_details": {
      "error_kind": "CONTRACT_EXECUTION_ERROR",
      "error_details": {
        "vm_error": "VM error message goes here",
        "block_height": 1,
        "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
      }
    },
    "message": "Server error",
    "data": "Function call returned an error: VM error message goes here"
  }
}

Notice the first level of error can only be VALIDATION_ERROR or HANDLER_ERROR.

VALIDATION_ERRORS (incorrect method name, broken JSON payload, or invalid payload structure) is the type of errors that can happen to any JSON RPC call.

@volovyk-s @vgrichina If we mix validation and handler errors into the same namespace, I believe we mess with the client side handling. I have heard your concerns about nesting structures before. I still strongly believe that it is better to have clear namespaces that are easy to handle than a flat structure that is as useful as a string error message due to mix of various reasons as to why something went wrong. What do you think about this 1 level nesting?

@volovyks
Copy link

volovyks commented Jun 4, 2021

Are we adding new fields to the existing JSON RPC endpoints, or we will introduce new endpoints? I'm wondering if we really need to have old fields data and message there.

If the depth of this nested structure is not going to be changed, then I don't understand why do we need to have it. Why can not we just add one additional field, so it would look like this:

{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_type": "HANDLER_ERROR" // we can think of a better field name
    "error_kind": "CONTRACT_EXECUTION_ERROR",
    "error_details": {
      "vm_error": "VM error message goes here",
      "block_height": 1,
      "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
    },
    "message": "Server error",
    "data": "Function call returned an error: VM error message goes here"
  }
}

I can not come up with an example where 2 levels nested structure will be better. Please, elaborate.

2 level nested structure is not that bad, but in reality, people will always do something like error.error_kind.error_kind.

@frol
Copy link
Collaborator

frol commented Jun 4, 2021

@volovyk-s Nice! I have not thought about flattening the nesting and keeping both "type" and "kind". Let's brainstorm on the naming a bit:

  • error_type + error_kind seems a bit hard to immediately understand their relation
  • error_group + error_kind
  • error_class + error_kind (I don't like using type and class since the terms are quite overloaded)

@volovyks
Copy link

volovyks commented Jun 4, 2021

@frol Agree, let's stay away from type and class words. error_group is ok. Maybe error_category.

@khorolets
Copy link
Member

There was another idea, if we want to avoid 2 levels. Having error_group (error_type whatever) along with error_kind confuses me as a client developer. Like what it what? What is main important?

I propose to mix them. As a developer I'm interested in error and details. So:

...
error:
  error_kind: UNKNOWN_BLOCK,
  error_details:
    block_hash: 111111111...,
...

Looks more easier to me to handle.

At the same time

...
error:
  error_kind: METHOD_NOT_FOUND,
  error_details:
    method_name: "get_me_a_cofee",
...
...
error:
  error_kind: INTERNAL_ERROR,
  error_details:
    error_message: "Node reached its limits",
...

What do you think?

@volovyks
Copy link

volovyks commented Jun 4, 2021

@khorolets do yo want to move error_group to details? Like this:

{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "CONTRACT_EXECUTION_ERROR",
    "error_details": {
      "error_group": "HANDLER_ERROR"
      "vm_error": "VM error message goes here",
      "block_height": 1,
      "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
    },
    "message": "Server error",
    "data": "Function call returned an error: VM error message goes here"
  }
}

@frol
Copy link
Collaborator

frol commented Jun 4, 2021

@khorolets how is that different from the first variant I proposed above?

{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "error_kind": "UNKNOWN_BLOCK",
    "error_details": {
      "block_height": 1
    },
    "message": "Server error",
    "data": "Block not found: Block #1 is unknown"
  }
}

@khorolets
Copy link
Member

@frol no difference, yep.

@volovyk-s I suggest to avoid a group at all and to mix them together. To leave only one kind of errors and not split them. Or to use 2-levels as the most representative way.

@frol
Copy link
Collaborator

frol commented Jun 4, 2021

See my other comment in the thread for some context.

I want to have a summary. So far we have 4 variants:

  1. Simple and flat (a single error_kind and no information about whether it was a validation error or handler error)

    Examples:

    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "METHOD_NOT_FOUND",
        "error_details": {
          "unknown_method_name": "xxx"
        },
        "message": "Server error",
        "data": "Method not found"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "UNKNOWN_BLOCK",
        "error_details": {
          "block_height": 1
        },
        "message": "Server error",
        "data": "Block not found: Block #1 is unknown"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "CONTRACT_EXECUTION_ERROR",
        "error_details": {
          "vm_error": "VM error message goes here",
          "block_height": 1,
          "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
        },
        "message": "Server error",
        "data": "Function call returned an error: VM error message goes here"
      }
    }

  2. Nested (2 levels of error_kind, where the first level captures the high-level group of errors [validation vs handler errors], and the second provides the specific error_kind [e.g. missing account])

    Examples:

    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "VALIDATION_ERROR",
        "error_details": {
          "error_kind": "METHOD_NOT_FOUND",
          "error_details": {
            "unknown_method_name": "xxx"
          },
        },
        "message": "Server error",
        "data": "Method not found"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "HANDLER_ERROR",
        "error_details": {
          "error_kind": "UNKNOWN_BLOCK",
          "error_details": {
            "block_height": 1
          },
        },
        "message": "Server error",
        "data": "Block not found: Block #1 is unknown"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "HANDLER_ERROR",
        "error_details": {
          "error_kind": "CONTRACT_EXECUTION_ERROR",
          "error_details": {
            "vm_error": "VM error message goes here",
            "block_height": 1,
            "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
          }
        },
        "message": "Server error",
        "data": "Function call returned an error: VM error message goes here"
      }
    }

    Notice the first level of error can only be VALIDATION_ERROR or HANDLER_ERROR.

  3. Flattened nested variant (have two fields on the top level: error_group and error_kind)

    Examples:

    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_group": "VALIDATION_ERROR",
        "error_kind": "METHOD_NOT_FOUND",
        "error_details": {
          "unknown_method_name": "xxx"
        },
        "message": "Server error",
        "data": "Method not found"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_group": "HANDLER_ERROR",
        "error_kind": "UNKNOWN_BLOCK",
        "error_details": {
          "block_height": 1
        },
        "message": "Server error",
        "data": "Block not found: Block #1 is unknown"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_group": "HANDLER_ERROR",
        "error_kind": "CONTRACT_EXECUTION_ERROR",
        "error_details": {
          "vm_error": "VM error message goes here",
          "block_height": 1,
          "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
        },
        "message": "Server error",
        "data": "Function call returned an error: VM error message goes here"
      }
    }

  4. Flat with custom-encoded error_kind [e.g. HANDLER_ERROR.UNKNOWN_ACCOUNT]

    Examples:

    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "VALIDATION_ERROR.METHOD_NOT_FOUND",
        "error_details": {
          "unknown_method_name": "xxx"
        },
        "message": "Server error",
        "data": "Method not found"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "HANDLER_ERROR.UNKNOWN_BLOCK",
        "error_details": {
          "block_height": 1
        },
        "message": "Server error",
        "data": "Block not found: Block #1 is unknown"
      }
    }
    {
      "id": "dontcare",
      "jsonrpc": "2.0",
      "error": {
        "error_kind": "HANDLER_ERROR.CONTRACT_EXECUTION_ERROR",
        "error_details": {
          "vm_error": "VM error message goes here",
          "block_height": 1,
          "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
        },
        "message": "Server error",
        "data": "Function call returned an error: VM error message goes here"
      }
    }

Pros and cons

1. Simple and flat

Pros:
➕ Simple

Cons:
➖ Hard to exhaustively handle all the errors as there will be a mix of all sorts of errors, which most probably will lead to ad-hoc errors handling (similar to how it is done now with string errors)

2. Nested

Pros:
➕ Explicit (easy to reason about the error just from reading the error value)
➕ Easy to implement exhaustive error handling (you only need to handle the whole group of validation errors once, and then handle specific business logic errors [if (error_kind === ViewAccountErrors.UNKNOWN_ACCOUNT)])

Cons:
➖ Verbose

NOTE: All the pros might be worked around by the users by just using error.error_details.error_kind without even looking into error.error_kind first, but that could be a good and a bad thing at the same time 🤷‍♂️

3. Flattened nested variant

Pros:
➕ Simple
➕ Somewhat explicit (though, the current naming is said to be unintuitive)

Cons:
➖ Somewhat confusing at first glance

NOTE: Explicitness might be worked around by the users by just using error.error_kind without even looking into error.error_group first, but that could be a good and a bad thing at the same time 🤷‍♂️

4. Flat with custom-encoded error_kind

Pros:
➕ Somewhat simple
➕ It forces users to learn about error groups (validation vs handler errors) since every single error contains the error group prefix (HANDLER_ERROR.UNKNOWN_ACCOUNT)

Cons:
➖ Requires custom serialization
➖ Requires custom deserialization
➖ May lead to error handling based on starts_with/ends_with or split('.')[1], which is error-prone


I would go with the explicit variant and just let client libraries provide the platform-specific behavior, so I vote for (2). What do you think?

I imagine something like this to be implemented on the JS side:

try {
  const account = getAccount("frol.near");
} catch (error) {
  if (error.error_kind === nearApi.API_RESPONSE_ERRORS.NETWORK_ERROR) {
    // retry
  }
  if (error.error_kind !== nearApi.API_RESPONSE_ERRORS.HANDLER_ERROR) {
    throw Error("Unexpected validation error! Your client library constructed an incompatible request")
  }
  const { error_kind, error_details } = error.error_details;
  if (error_kind === nearApi.API_GET_ACCOUNT_ERRORS.UNKNOWN_ACCOUNT) {
    setError(`Account ${error_details.requested_account_id} has not been found`);
  } else {
    setError(`Unhandled error ${error_kind}: ${JSON.stringify(error_details)}`)
  }
}

Rust:

let account = match getAccount("frol.near") {
    Ok(account) => Some(account),
    Err(ResponseError::HandlerError(GetAccountError::UnknownAccount)) => {
        None
    }
    Err(ResponseError::HandlerError(err)) => {
        warn!("Unhandled error {:?}", err);
        None
    }
    Err(ResponseError::ValidationError(err)) => {
        panic!("Unexpected validation error! Your client library constructed an incompatible request: {:?}", err);
    }
    Err(ResponseError::NetworkError(err)) => {
        // retry    
    }
};

@telezhnaya
Copy link
Contributor

@frol thank you for such a great summary!
I like options 2 and 3.

The third option is not that confusing, actually. It could be also improved with better wording if we have an option to change it. After 5 minutes of thinking I have

error_group -> error_type
error_kind -> error_cause

It's still not perfect, but we could think more.

Option 2 looks better if we will ever need exhaustive error handling. If we can create meaningful examples where we need nested error types (general one and specific one, and also the cause itself), I'd go with option 2.

@volovyks
Copy link

volovyks commented Jun 4, 2021

@frol I would vote for #3, it would work the same way on JS side as #2 in your example, but it's flat and clean. I'm not against #2 though. The only thing that I don't like about it is that it has 2 error_kind fields.

@mikedotexe
Copy link

Frol, this is the best GitHub comment ever.
I vote for the third option, and don't think it's confusing to folks who will be "in the weeds" of analyzing a raw RPC result.

@frol
Copy link
Collaborator

frol commented Jun 14, 2021

I am also between (2) and (3). If we go with (3), we need to brainstorm the naming. I came up with the following (I feel neutral about some of them, but I still want to list as many as I can)

  1. error_family + error_kind / error_name / error_code
  2. error_namespace + error_kind
  3. error_layer + error_kind
  4. error_type + error_subtype
  5. error_class + error_handle / error_name / error_code

I am not a huge fan of type and class variants since these terms are too overloaded.

I find error_family + error_kind to be the most compelling to me.

P.S. I checked Ethereum JSON RPC, and it was quite a challenge to find any documentation, and I only found something useful here. They went with numerical error codes, where they encoded special meaning into various ranges of the numbers. I find this approach the least developers-friendly.

@MaximusHaximus
Copy link

I think that to match up with the JSONRPC spec, we should change error_details to data, as it can be structured (ref: https://www.jsonrpc.org/specification#error_object). What I see in the data property on the examples above looks more like what I'd expect to see in message.

From a NodeJS developer perspective for de-serializing these errors, we have name and message properties on native errors. I would make error_kind->name, message->message, and if I was using VError (Joyent node error best practices - ref: https://github.com/joyent/node-verror#deeper-dive) then data would become my error info.

If we have decided that this problem is as simple as "add a field to the error that allows us to classify each error as one of two classes of errors -- either contract error or validation error", then I vote for # 3.

However, if we do see value in having truly nested errors, I would really want to see each nested layer be a complete error, with its own message, name, and data properties. Having error_details contain an error_kind property, and its own error_details property is confusing to me, and would make error_details and error_kind reserved property names for the structured data that can be added to error_details (data). I think combining nested error/causes with error context/data will be confusing.

I think if we want actual nested errors, they should be able to be a chain n levels deep of causes where each one is its own error entirely, with its own cause, and data for each error is context that is specific to that particular error.

e.g.:

{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "name": "VALIDATION_ERROR",
    "message": "Invalid parameter provided",
    "data": {
      "parameter_name": "method_name"
    },
    "cause": {
      "name": "METHOD_NOT_FOUND",
      "message": "Method not found",
      "data": {
        "unknown_method_name": "xxx"
      },
      "cause": {...}
    }
  }
}

If I re-constructed this using a chain of cause errors using VError, then:
message of my error would be: VALIDATION_ERROR: Invalid parameter provided: Method not found
and info (data) of this error would be:

{
  "parameter_name": "method_name",
  "unknown_method_name": "xxx"
}

It's true that I would need to construct the chain of errors from the inside-out, and finding a specific type of error would require doing some work (find any .cause() with a specific name), but truly nested errors are a lot more descriptive and powerful, and this pattern isn't uncommon - VError provides a handy helper for this -- VError.findCauseByName(). We could use the browser-friendly NError (ref: https://github.com/Netflix/nerror) in near-api-js to reconstruct error cause chains and directly throw 'native' JS Errors with causes, and anyone consuming them would use NError/VError helpers to work with them (we don't need to reinvent the wheel here)

If there is only an 'error family' to be added to each error, then I like the idea of it just being a property directly on the error; trying to model it as 'kinda sort of' a nested error feels dangerous, I would expect it to inevitably clash with any API we created for 'real' nested errors/cause chains when/if we build that.

Note that if we don't see value in fully nested errors and we decide to go with # 3 for now, we could still add fully nested errors later by adding a cause property as outlined here -- as long as we are careful that what we come up with here doesn't make implementing actual nested errors/cause chains later awkward IMO :).

@volovyks
Copy link

I am also between (2) and (3). If we go with (3), we need to brainstorm the naming. I came up with the following (I feel neutral about some of them, but I still want to list as many as I can)

1. `error_family` + `error_kind` / `error_name` / `error_code`

2. `error_namespace` + `error_kind`

3. `error_layer` + `error_kind`

4. `error_type` + `error_subtype`

5. `error_class` + `error_handle` / `error_name` / `error_code`

I am not a huge fan of type and class variants since these terms are too overloaded.

I find error_family + error_kind to be the most compelling to me.

P.S. I checked Ethereum JSON RPC, and it was quite a challenge to find any documentation, and I only found something useful here. They went with numerical error codes, where they encoded special meaning into various ranges of the numbers. I find this approach the least developers-friendly.

I would also vote for №1 ('error_family' + error_kind).
Definitely not error_code (I don't like the name and I don't think that numeric error codes will work the best here).

@frol
Copy link
Collaborator

frol commented Jun 15, 2021

@MaximusHaximus

I think that to match up with the JSONRPC spec, we should change error_details to data

For backward compatibility reasons, we don’t want to touch the existing message and data fields.

I like the proposal of having name + cause + info as it is defined in VError, so here is how I can see it fit into our reality (this is an updated version of the (2) nesting variant)

Examples:

{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "name": "VALIDATION_ERROR",
    "cause": {
      "name": "METHOD_NOT_FOUND",
      "info": {
        "unknown_method_name": "xxx"
      },
    },
    "message": "Server error",
    "data": "Method not found"
  }
}
{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "name": "HANDLER_ERROR",
    "cause": {
      "name": "UNKNOWN_BLOCK",
      "info": {
        "block_height": 1
      },
    },
    "message": "Server error",
    "data": "Block not found: Block #1 is unknown"
  }
}
{
  "id": "dontcare",
  "jsonrpc": "2.0",
  "error": {
    "name": "HANDLER_ERROR",
    "cause": {
      "name": "CONTRACT_EXECUTION_ERROR",
      "info": {
        "vm_error": "VM error message goes here",
        "block_height": 1,
        "block_hash": "AMABLOCKHASHHAHAHAHAHAHAHAMABLOCKHASHHAHAHAHAHAHAHHAHAHAHA",
      }
    },
    "message": "Server error",
    "data": "Function call returned an error: VM error message goes here"
  }
}

My personal scoreboard is:

TOP-1: VError-like nested errors (name + cause + info)
TOP-2: Flattened nested variant with error_family + error_kind + error_details

@khorolets
Copy link
Member

@frol's

TOP-1: VError-like nested errors (name + cause + info)

+1

P.S. I'm fine with top-2 as well

@frol
Copy link
Collaborator

frol commented Jun 16, 2021

I set the final comment period until 2021-06-20 (this Sunday).

The current decision is to go with VError style and naming (see the comment above)

@vgrichina
Copy link
Collaborator Author

@frol did you consult with any users of API besides wallet team?

Looks like the proposed grouping is extremely implementation centric. E.g. grouping by the module where error happened doesn’t seem as useful as understanding whether given error is permanent or temporary (i.e. client should retry the same query).

Maybe I’m missing some context though.

@frol
Copy link
Collaborator

frol commented Jun 18, 2021

did you consult with any users of API besides wallet team?

@vgrichina I invited to discuss this problem in public channels and then invited individual people to chip it.

Looks like the proposed grouping is extremely implementation centric. E.g. grouping by the module where error happened doesn’t seem as useful as understanding whether given error is permanent or temporary (i.e. client should retry the same query).

I would treat "validation error" as permanent errors, and "handler error" will require making some judgment on the application layer, as API cannot tell if an application needs to retry fetching the block which does not exist now.

If it is an error during the request processing (in my examples that is "handler error", but I am happy to hear more ideas here as well), you get a limited number of possible errors, such as "unknown block", "contract execution error".

Can you provide an example of how to make them less implementation-centric than that?

@vgrichina
Copy link
Collaborator Author

did you consult with any users of API besides wallet team?

@vgrichina I invited to discuss this problem in public channels and then invited individual people to chip it.

Looks like the proposed grouping is extremely implementation centric. E.g. grouping by the module where error happened doesn’t seem as useful as understanding whether given error is permanent or temporary (i.e. client should retry the same query).

I would treat "validation error" as permanent errors, and "handler error" will require making some judgment on the application layer, as API cannot tell if an application needs to retry fetching the block which does not exist now.

If it is an error during the request processing (in my examples that is "handler error", but I am happy to hear more ideas here as well), you get a limited number of possible errors, such as "unknown block", "contract execution error".

Can you provide an example of how to make them less implementation-centric than that?

sorry, looks I misunderstood the context

examples in #2976 (comment) make sense to me

@volovyks
Copy link

@frol can we consider this discussion as closed in favor of VError pattern?

@frol
Copy link
Collaborator

frol commented Jun 22, 2021

The final decision is to go with VError style and naming (see the comment above)

Thanks everyone for your review and suggestions!

@MaximusHaximus Thanks for suggesting VError!

The next step is to expose the internal structured errors in this VError format. @khorolets is going to keep it rolling.

@frol
Copy link
Collaborator

frol commented Aug 24, 2021

@volovyk-s @vgrichina @mikedotexe @amgando FYI, NEAR documentation covers all possible JSON RPC errors under each individual method, e.g. https://docs.near.org/docs/api/rpc/access-keys#what-could-go-wrong

Kudos to @khorolets for implementing the structured errors and documenting those!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-RPC Area: rpc P-high Priority: High
Projects
None yet
10 participants