Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Safe JSON RPC server and client #9656

Merged
merged 6 commits into from
Nov 1, 2024
Merged

Conversation

spalladino
Copy link
Collaborator

@spalladino spalladino commented Nov 1, 2024

Current situation

What was wrong with our way of exposing our APIs over JSON RPC? First of all, the JsonRpcServer defaulted to exporting every method on the underlying handler, including those flagged as private in ts (though js-private like #method were safe). While we had a blacklist, it's safer to be explicit on what to expose.

Then, argument deserialization was handled based on client inputs exclusively, allowing to target any class registered in the RPC interface. This means a method expecting an instance of class A as an argument, could just be fed an instance of B by its caller.

Deserialization itself was also unchecked. Most fromJSON methods did not validate any of their inputs, just assuming the any input object matched the expected shape. Primitive arguments to functions were also unchecked, so a caller could eg pass a number where a string was expected.

Other unrelated issues included often forgetting to register a class for (de)serialization in the rpc server, or forgetting to type all API return types as async (if they are over an RPC interface, calling any method should be async!).

These issues all affected the client as well. Though security is not so much of an issue on the client, lack of validation could indeed cause bugs by querying a server with a mismatching version.

Proposed fix

We introduce a pair of "safe" JSON rpc client and server abstractions, that consume a schema along with a handler. The schema leverages zod for validating all inputs (in the case of the server) and outputs (in the case of the client) and defining the exact set of methods that are exposed. Schemas are type-checked against the interface, so a change in the interface will trigger tsc to alert us to change the schemas as well.

As a side effect of this change, since each method now knows what it should be deserializing, we can kill much of the custom logic for (de)serialization, such as the string and class converters, and just rely on vanilla json serialization. Each class that we intend to pass through the wire should expose a static zod schema used for both validation and hydration, and a toJSON method that is used for serialization:

export class TestNote {
  constructor(private data: string) {}

  static get schema() {
    return z.object({ data: z.string() }).transform(({ data }) => new TestNote(data));
  }

  toJSON() {
    return { data: this.data };
  }
}

Then the API is defined as a plain interface:

export interface TestStateApi {
  getNote: (index: number) => Promise<TestNote>;
  getNotes: () => Promise<TestNote[]>;
  clear: () => Promise<void>;
  addNotes: (notes: TestNote[]) => Promise<TestNote[]>;
  fail: () => Promise<void>;
  count: () => Promise<number>;
  getStatus: () => Promise<{ status: string; count: bigint }>;
}

With its corresponding schema:

export const TestStateSchema: ApiSchemaFor<TestStateApi> = {
  getNote: z.function().args(z.number()).returns(TestNote.schema),
  getNotes: z.function().returns(z.array(TestNote.schema)),
  clear: z.function().returns(z.void()),
  addNotes: z.function().args(z.array(TestNote.schema)).returns(z.array(TestNote.schema)),
  fail: z.function().returns(z.void()),
  count: z.function().returns(z.number()),
  getStatus: z.function().returns(z.object({ status: z.string(), count: schemas.BigInt })),
};

Scope of this PR

This PR introduces the new safe json rpc client and server abstractions, but does not yet enable them. All start methods still use the old flavors. Upcoming PRs will add schemas for all our public interfaces and make the switch.

if (res.error) {
throw res.error;
}
// TODO: Why check for string null and undefined?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check for string null and undefined?

I don't see any reason to

it('calls an RPC function that throws an error', async () => {
const response = await send({ method: 'fail', params: [] });
expectError(response, 500, 'Test state failed');
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests look nice

await next();
} catch (err: any) {
this.log.error(err);
if (err instanceof SyntaxError) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One wishlist item for me would be a request ID that is used to log this error so that it can be easily correlated from the tx bot e.g. side

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@ludamad ludamad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for doing this, effort definitely appreciated - don't take me nodding along as not appreciating every point you made :)

@spalladino spalladino merged commit e63e219 into master Nov 1, 2024
63 checks passed
@spalladino spalladino deleted the palla/safe-json-rpc branch November 1, 2024 18:56
spalladino added a commit that referenced this pull request Nov 10, 2024
**Adds schemas for every API.** Every API exposed via JSON RPC now
requires a zod schema (see #9656 for more context on the rationale for
this change). All schemas are in `circuit-types/interfaces`, and look
like:


https://github.com/AztecProtocol/aztec-packages/blob/3e78ec721285fcd533cff61329a8e156958e2d65/yarn-project/circuit-types/src/interfaces/prover-node.ts#L33-L45

These schemas are type-checked against the interface via the
`ApiSchemaFor` utility type, so if the interface changes, schemas are
required by the compiler to change as well. Schemas are now used in the
JSON RPC server to 1) identify which methods are exposed (so we no
longer need the method disallowlist) and 2) parse their arguments. The
JSON RPC server, once it has identified the method to be called, grabs
the arguments schema and funnels the result of a vanilla JSON parse
through it.

Every type or struct that is exposed via an interface now has an
associated schema, which is referenced in the API for parsing. Schemas
both validate input and hydrate instances. This means that we no longer
set a `type` property to identify how to hydrate each object in a
request during deserialization, which was a security risk.


https://github.com/AztecProtocol/aztec-packages/blob/3e78ec721285fcd533cff61329a8e156958e2d65/yarn-project/circuit-types/src/l2_block.ts#L24-L32

Schemas are also used in the JSON RPC client for deserializing the
result types. Again, this lets us remove the `type` parameter from all
serialized entities, though this is still present in since it is
required by the `TypeRegistry` (still to be removed) which is only used
in the snapshot manager.

All schemas are tested via mini integration tests. These tests define a
mock implementation for each service, use it for setting up a JSON RPC
server, starting it in a free port, and test calling every method
through JSON RPC.


https://github.com/AztecProtocol/aztec-packages/blob/3e78ec721285fcd533cff61329a8e156958e2d65/yarn-project/circuit-types/src/interfaces/prover-node.test.ts#L12-L31

These changes prompted other changes. For instance, we introduced the
following changes to APIs:

- `ProvingJobSource.rejectProvingJob` now accepts a reason `string`
instead of an `Error` type
- `PXE.getEvents(type)` is removed in favor of `PXE.getEncryptedEvents`
and `PXE.getUnencryptedEvents` since both methods required different
arguments

We also removed service-management methods (ie `stop`) from interfaces.
We were inadvertently calling `stop` on remote instances over http when
we shouldn't have. We also typed some previously untyped interfaces,
such as the TXE's.

Fixes #9455
@benesjan
Copy link
Contributor

benesjan commented Jan 2, 2025

Lovely PR writeup! Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants