
Discussion/Tracking: Adding more structure to Node.js' tests #54796

Closed
jasnell opened this issue Sep 6, 2024 · 25 comments
Labels
discuss Issues opened for discussions and feedbacks. test Issues and PRs related to the tests.


@jasnell
Member

jasnell commented Sep 6, 2024

It's no secret at this point that other runtimes (Deno, Bun, Workers, etc.) are seeking to improve compatibility with Node.js. To accomplish this, those runtimes need the ability to run Node.js' tests, since those are really the closest thing we have to a conformance test suite. The challenge, however, is that the current corpus of tests is largely unstructured blobs of code that mix tests for public APIs with tests for internal APIs, using a bespoke test harness that relies on Node.js-specific characteristics. This makes it difficult for runtimes to partially/selectively implement Node.js compatibility and generally forces them to manually port tests to work in their environments... particularly when the tests depend on specific Node.js internals that are not necessarily part of the public API surface (e.g. any test that relies on --expose-internals).

This issue serves two purposes: 1) as a place where we can discuss and try to come to consensus on a way of improving the structure of our tests, and 2) a tracking issue to track the progress of such activity.

There are four key issues:

  1. Mixing of internal and public APIs in tests without clear delineations between those.
  2. Lack of consistent documentation/comments explaining exactly what is being tested at any given point. This includes lack of documentation around what the expected behaviors being asserted are, etc.
  3. Haphazard organization. Some test files just sprinkle individual tests throughout the file in an organic, ad hoc way.
  4. Use of the bespoke test harness (require('../common')) and its reliance on various internal, undocumented behaviors and features that may be difficult to reconcile with other environments.

To improve overall structure of the tests, I propose:

  1. ... that we begin making use of our own node:test module to impose a more logical structure on the tests. I don't think it matters whether that uses the test('...', () => { ... }) or the BDD describe('...', () => { ... }) pattern. I've seen both in use in a few tests already, and some folks seem to prefer the BDD pattern. I'm good with that.

  2. ... that we reduce dependency on our internal bespoke test harness. For instance, common.mustCall() can be replaced (albeit with a bit more verbosity) with mock.fn() checks (see the sketch after this list). The current ../common.js can be decomposed into individual imports to allow it to be more incrementally applied. The harness can also be made a bit less specific to Node.js and possibly even published as a separate module that can be made to run on multiple runtimes.

  3. ... that we begin asking for tests to include more documentation/comments on what is being tested, what the expected behaviors are, what the edge cases are, etc. With more structure using the BDD model that likely comes more naturally and we might not need more comments in the tests, but there are still nuances that are worthy of improved documentation.

  4. ... that we separate tests verifying internal behaviors from tests verifying external behavior.
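
To illustrate item 2, here is a minimal sketch of what a common.mustCall()-style check could look like using node:test's test-scoped mock.fn(); the emitter and event name are illustrative only:

const { test } = require('node:test');
const assert = require('node:assert');
const { EventEmitter } = require('node:events');

test('listener fires exactly once', (t) => {
  // t.mock.fn() wraps the listener and records every call.
  const listener = t.mock.fn();
  const ee = new EventEmitter();
  ee.once('ping', listener);
  ee.emit('ping');
  // An explicit, test-scoped assertion replaces the implicit
  // process-exit verification performed by common.mustCall().
  assert.strictEqual(listener.mock.calls.length, 1);
});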

Ultimately I'd like us to get to a point where we have a clearly delineated set of public API tests that other runtimes can pick up almost as-is and run in their environments... a lot like we can do with the Web Platform Tests. These would be separated from the internal tests that verify specific aspects of Node.js' internals that aren't considered public API and wouldn't really be expected of other runtimes to implement in exactly the same way. To that end, I would propose eventually splitting the current test/parallel directory into two separate folders, test/parallel/public and test/parallel/internal, with the test/parallel/public being the primary set of tests other runtimes should seek to run and align with. Likewise, we'd have test/sequential/public and test/sequential/internal... or we can flip it around and have test/public/parallel, test/public/sequential, test/internal/parallel, test/internal/sequential. The key differentiator is that nothing in test/public would be allowed to use things like --expose-internals and would be set up to test only the documented public API surface.
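
For concreteness, the first variant would look roughly like this (an illustrative sketch only):

test/
  parallel/
    public/      # no --expose-internals; the set other runtimes run
    internal/
  sequential/
    public/
    internal/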

Adding additional structure here obviously increases dev friction a bit, so all this needs to be carefully considered.

This is no small task. There are many thousands of individual test files that would need to be updated. So before jumping into this whole process in earnest, let's chat about it and get some input from the contributor base.

@nodejs/tsc, @nodejs/collaborators, @nodejs/testing

@cjihrig
Contributor

cjihrig commented Sep 6, 2024

For instance, common.mustCall() can be replaced (albeit with a bit more verbosity) with mock.fn() checks. The current ../common.js can be decomposed into individual imports to allow it to be more incrementally applied.

Not to sidetrack the issue, but I would welcome some of the features like common.mustCall() in node:test.

@RedYetiDev RedYetiDev added test Issues and PRs related to the tests. discuss Issues opened for discussions and feedbacks. labels Sep 6, 2024
@jasnell
Member Author

jasnell commented Sep 6, 2024

common.mustCall() can be covered by use of mock.fn() ... it's a bit verbose right now but possible. The key thing that trips up other runtimes about common.mustCall() is that not every runtime actually supports a mechanism like process.on('exit', ...), which is where those checks run. If the checks could be scoped to the individual test (e.g. test('...', () => { ... })) then it would be a lot easier.
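
For context, a heavily simplified sketch of the pattern common.mustCall() relies on; the real helper in test/common is more elaborate (it records call sites for error messages, supports minimum-call variants, etc.):

const assert = require('node:assert');

const pending = [];

function mustCall(fn = () => {}, exact = 1) {
  const entry = { exact, actual: 0 };
  pending.push(entry);
  return function(...args) {
    entry.actual++;
    return fn.apply(this, args);
  };
}

// The verification only happens in a process-wide exit hook,
// which is exactly the mechanism some runtimes cannot provide.
process.on('exit', () => {
  for (const { exact, actual } of pending)
    assert.strictEqual(actual, exact);
});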

@RedYetiDev
Member

RedYetiDev commented Sep 6, 2024

To that end, I would propose eventually splitting the current test/parallel directory into two separate folders

I've proposed #53640, which would allow the test runner to pick up those subdirectories. I closed (and have now reopened) it because it didn't reach consensus, but this issue may be a reason to reconsider it in a new light.


Overall, I'm excited to see this coming :-)

@cjihrig
Contributor

cjihrig commented Sep 6, 2024

common.mustCall() can be covered by use of mock.fn() ... it's a bit verbose right now but possible. The key thing that trips up other runtimes about common.mustCall() is that not every runtime actually supports a mechanism like process.on('exit', ...), which is where those checks run. If the checks could be scoped to the individual test (e.g. test('...', () => { ... })) then it would be a lot easier.

I think mustCall() would be fairly straightforward to implement. We already have mocks scoped to individual tests that are restored when the test completes. I think we would want to add a new option to mock.fn(). That option would be a callback function that runs when the mock is restored. That callback would receive the mock context as an argument, and could perform assertions. mustCall() would be sugar that leverages that callback.

I believe this is a minimal implementation of `t.assert.mustCall()` if the callback I mentioned above existed.
diff --git a/lib/internal/test_runner/test.js b/lib/internal/test_runner/test.js
index 2672a25b61..9104cad524 100644
--- a/lib/internal/test_runner/test.js
+++ b/lib/internal/test_runner/test.js
@@ -122,6 +122,14 @@ function lazyAssertObject(harness) {
       assertObj.set(methodsToCopy[i], assert[methodsToCopy[i]]);
     }
 
+    assertObj.set('mustCall', function(fn, times) {
+      return this.mock.fn(fn, {
+        onRestore(mock) {
+          assert.strictEqual(mock.calls.length, times);
+        }
+      });
+    });
+
     const { getOptionValue } = require('internal/options');
     if (getOptionValue('--experimental-test-snapshots')) {
       const { SnapshotManager } = require('internal/test_runner/snapshot');
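
Hypothetical usage of that sugar, assuming both the onRestore option and t.assert.mustCall() existed (neither is in node:test today):

const { test } = require('node:test');

test('callback runs the expected number of times', (t) => {
  // Wraps the function in a test-scoped mock; the onRestore hook from
  // the diff above asserts the call count when the mock is restored
  // at the end of the test.
  const cb = t.assert.mustCall(() => {}, 2);
  cb();
  cb();
});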

@cclauss
Contributor

cclauss commented Sep 6, 2024

My comment at #53640 (comment) clearly slowed this process. I apologize for that. That comment was focused on Python tests only. Our modifications to Python tests should remove the custom test discovery code and instead rely on the automated test discovery conventions that are broadly used in other massive projects.

Those concerns are unrelated to non-Python tests.

@targos
Member

targos commented Sep 6, 2024

I am in favor of the general idea, but not really convinced by the separation between public and private APIs in the folder structure.

@lpinca
Member

lpinca commented Sep 6, 2024

  1. ... that we begin making use of our own node:test module to impose a more logical structure on the tests

There are tests with subtests, and in those cases using node:test and more structure might make sense, but there are also tests where using node:test would only add unnecessary complexity. As an example, take the following test (parallel/test-sys.js):

'use strict';
require('../common');
const assert = require('assert');
const sys = require('sys');
const util = require('util');

assert.strictEqual(sys, util);

Rewriting it using node:test would result in something like this:

'use strict';
require('../common');
const test = require('node:test');
const assert = require('assert');
const sys = require('sys');
const util = require('util');

test('sys was renamed to util', (t) => {
  assert.strictEqual(sys, util);
});

What do we gain from this? The test now uses unnecessary modules like async_hooks, runs slower, and it is still not portable as it uses a non-standard test syntax.

I think this whole effort is more beneficial for other runtimes than Node.js itself.

@LiviaMedeiros
Contributor

About splitting tests into subdirectories: are symlinks an option here? Roughly,

test/group/by-subsystem/util/test-util-internal.js -> test/parallel/test-util-internal.js
test/group/by-visibility/uses-internals/test-util-internal.js -> test/parallel/test-util-internal.js

So we can avoid breaking all the scripts and habits that depend on test/parallel/*.{js,mjs}, while allowing folks to work with smaller logical groups at the same time.
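
A rough sketch of generating such a link with a script (the paths are the illustrative ones above; Windows symlink caveats are not handled):

const fs = require('node:fs');
const path = require('node:path');

const target = path.join('test', 'parallel', 'test-util-internal.js');
const link = path.join('test', 'group', 'by-subsystem', 'util', 'test-util-internal.js');

fs.mkdirSync(path.dirname(link), { recursive: true });
// Create a relative symlink so the checkout stays relocatable.
fs.symlinkSync(path.relative(path.dirname(link), target), link);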

@H4ad
Member

H4ad commented Sep 6, 2024

What do we gain from this? The test now uses unnecessary modules like async_hooks, runs slower, and it is still not portable as it uses a non-standard test syntax.

Just by reading the code, I can understand why this test exists with the second example but not with the first, and the added complexity was just +3 lines of code.

To that end, I would propose eventually splitting the current test/parallel directory into two separate folders, test/parallel/public and test/parallel/internal, with the test/parallel/public

I think we can keep the public tests in the same folder as today and only move the internal ones, just so we don't lose the change history of those files.

Also, maybe just renaming the tests to use an .internal or -internal suffix would be better, because of the scripts @LiviaMedeiros mentioned.

@lpinca
Member

lpinca commented Sep 6, 2024

and the added complexity was just +3 lines of code

Requiring node:test is not just +3 lines of code.

@mcollina
Member

mcollina commented Sep 6, 2024

While I favor improving our test suite, I wonder if making it easier for other implementations should be one of our project goals. As stated, providing a clear definition of Node.js would directly benefit Workers, Deno, and Bun.

I'm unsure if defining a "Node.js Compatibility Kit" should be part of our strategy and whether volunteers would contribute/be interested in it. However, we have to talk about it because Node.js is the de facto standard, and Cloudflare, Deno, and Oven (Bun's company) are lifting our tests and parts of our codebases anyway. Therefore, taking control of how another runtime can claim to be "Node.js compatible" is worthwhile, and it should attract funding and contributions from said companies.

This could be seen as a major strategic initiative. Is Cloudflare going to sponsor/fund this work?


I don't see how the technical details matter at this point, because if we decide we want a Node.js Compatibility Kit, any technical roadblocks could be removed. Nevertheless, here is my take: I don't see how to move all tests to node:test. node:test customizes the running environment to run the tests. For example, it adds an AsyncLocalStorage and a few other key changes, likely making it hard to test async_hooks and other public APIs. In other cases, a change in the underlying libraries used by node:test would prevent a test from running, making it really hard to debug what's going on.

@benjamingr
Member

I'm -1 on any changes that make Node tests worse in order to make it easier for other runtimes to be compatible with its APIs.

Node.js APIs are either web standards or they're Node's APIs; we only make guarantees to our users through the stability index and release cycle.

If someone wants a runtime compatible with Node.js in its entirety - they should use Node.js. If they want interoperable code instead they should build on web standards.

That said: anything that generally improves Node's tests I'm +1 on. I just don't think compatibility of our non-standard APIs should be a goal.

@jasnell
Member Author

jasnell commented Sep 6, 2024

@mcollina :

... Is CloudFlare going to sponsor/fund this work?

They are funding my time and @anonrig's, yes. For us, this wouldn't be a volunteer effort. We end up needing to port these tests manually right now, and it's a major pain.

@benjamingr :

... I'm -1 on any changes that make Node tests worse ...

I'm not sure how this effort would be making Node tests worse. We're talking about improving structure, documentation, clarity, modularity, etc. Instead of generalized statements like this, it would be easier to discuss if you called out which of the proposed changes you think would make the tests worse. Or, perhaps better, what changes do you think would make the tests better (putting the other-runtimes bit aside for a moment)?

... If someone wants a runtime compatible with Node.js in its entirety - they should use Node.js. If they want interoperable code instead they should build on web standards.

This doesn't reflect reality. We do build on web standards and the ecosystem wants the other runtimes to be compatible with Node.js. In order to do so, without risking introducing further incompatibilities across the ecosystem, we'd like to have an easier way of ensuring that we're matching Node.js' behavior rather than diverging from it. Building on web standards is not enough.

Also, Workers, at least, is not aiming to be compatible with Node.js "in its entirety". We are working to meet user demand by supporting at least a subset of the ecosystem of modules that developers are actively using and want to keep using. We want to make sure we do so in a way that does not fragment that ecosystem further by introducing incompatible behaviors.

But again, let's put that aside for now... other than dealing with the increasing flakiness of the test suite as a whole (which is entirely unrelated to this effort but still a problem we want to work on), what changes would you propose to make the tests better overall?

@joyeecheung
Member

I think changing the tests directly in Node.js as proposed has a lot of risks:

  • Making backports to LTS harder
  • If the tests are not ported correctly they could get even more flaky (introducing asynchronicity to an otherwise fully synchronous test can be a big problem if not reviewed carefully; there are a lot of flakes caused by subtle timing issues)
  • Git history gets lost due to all the moves. To fix a flaky test, the context of the original test is very important for understanding what it was for. Either that needs to be carefully documented from the git history before the move, or the move will only make code archeology harder

But then, if the goal is compatibility testing, why target the Node.js core test suite? JS engines don't run each other's tests (unless they are forks), and neither do browsers. They use test262 and Web Platform Tests to ensure compatibility. Developing a quality common test suite and gradually porting tests over sounds like a better plan to me. If a test is flaky because it's just problematic, we might even get help fixing it, because that flake might also reproduce on another runtime.

@targos
Member

targos commented Sep 6, 2024

I'd also like to note that many tests done on the internal/private API are crucial for ecosystem compatibility. There are many reasons for that, one of them being that some popular modules abuse internals.

@jasnell
Member Author

jasnell commented Sep 6, 2024

@joyeecheung

If the tests are not ported correctly they could get even more flaky (introducing asynchronicity to an otherwise fully synchronous test can be a big problem if not reviewed carefully; there are a lot of flakes caused by subtle timing issues)

I would think this is obvious and implicit in the process. Any time things are being refactored for any reason these are the kinds of concerns that need to be kept in mind.

They use test262 and Web Platform Tests to ensure compatibility. Developing a quality common test suite and gradually porting tests over sounds like a better plan to me

Ok... but how do we get there? To do this, we need a clearer understanding of what Node.js' behaviors are. test262 and the Web Platform Tests have the benefit of documented standards that clearly delineate the expected behaviors. Node.js does not. Instead, Node.js has thousands of poorly documented, unstructured tests and incomplete documentation.

I absolutely agree that something like test262/WPT is the ideal, but we have to start somewhere. Improving the structure of our existing tests, making sure that what is being tested is clear, and drawing clear distinctions between behaviors that are internal implementation details of Node.js vs. expectations of the public API surface are all essential steps in developing such a suite.

We also have to consider the maintenance burden. I do not want to end up in a situation where we have two distinct, overlapping sets of tests that have to be kept in sync: one for Node.js' own use that is unstructured, undocumented, and convoluted, and another that is more structured. Keeping them both up to date would be a major burden.

@targos

I'd also like to note that many tests done on the internal/private API are crucial for ecosystem compatibility. There are many reasons for that, one of them being that some popular modules abuse internals.

Sadly, yes, we hit that a lot. Having a clear delineation of these in the tests helps. Many of these details are undocumented and hidden amongst other things being tested, so they are easy to miss.

@debadree25
Member

debadree25 commented Sep 6, 2024

Developing a quality common test suite and gradually porting tests over sounds like a better plan to me.

Just a question regarding this point: would this mean establishing something like a common standard for Node.js-specific APIs?

Also, maybe what we could do is have a separate tests directory (say /compat-tests for now) where other runtimes could directly contribute their compat tests (instead of changing our test structure), and we could be careful about not breaking behaviours? That would build up over time, I guess. (test262, for example, has provisions for staging tests and a folder for implementation-contributed tests, IIUC.)

@joyeecheung
Member

joyeecheung commented Sep 6, 2024

I would think this is obvious and implicit in the process. Any time things are being refactored for any reason these are the kinds of concerns that need to be kept in mind.

One could say writing tests with comments and trying not to make them too brittle or rely on GC is also obvious, and yet we are still left with the current state of our test suite, because humans can always be flexible with the rules. Considering that the green rate of our CI, as reported by https://github.com/nodejs/reliability, has dropped to consistently below 10%, how do we make sure it doesn't get even worse because of new bugs or misinterpretations of the old tests introduced in this process? It also feels a bit unrealistic to talk about any major changes to the tests that run for the PR CI when it has been in this state for years. Personally, I wouldn't feel very comfortable with any major structural changes to the test suite until the green rate stops being so miserable (say, at least above 50%?) for some sustained period of time.

@srl295
Member

srl295 commented Sep 6, 2024

Hi all. Following all of the considerations here. Just a comment and hopefully suggestion for moving forward:

It seems like improving CI reliability for Node.js itself benefits everyone. Is it possible that this effort could start with a limited scope (a certain sub-area of tests) with the goal of improving reliability, documentation, etc.? A sub-area would reduce the backport burden.

Then bring those results back here with a proposal to expand the scope to restructuring even more tests, by then with a track record.

@jasnell
Member Author

jasnell commented Sep 6, 2024

Well, ultimately we can't actually make progress on anything here until we get the CI stable again. There are just way too many flaky failures to be productive. I think that's a separate, higher-priority concern. @anonrig and I are already (separately) starting to dig in on that and I hope others jump in and also help investigate and fix those tests. We can't actually do any of the rest of this without that, since the extreme flakiness of the CI is blocking us from landing PRs.

@lpinca
Member

lpinca commented Sep 6, 2024

@anonrig and I are already (separately) starting to dig in on that and I hope others jump in and also help investigate and fix those tests

This is a bit off-topic, but just an observation: the testing area is (fortunately) so wide that a recurring flaky test is often not flaky on the developer's machine. For example, a test may be flaky only when using shared libs, or on ARM, or only in that particular Windows 10 VM with Node.js compiled with that particular MSVC version. It is very hard to investigate in some cases.

@Qard
Member

Qard commented Sep 6, 2024

A few observations:

I feel the current separation of single-purpose tests is a feature that makes development iteration a lot smoother. If merging all the tests for a feature into one file is an intent here, we should be aware that it may be a drag on developer productivity.

Does node:test exist equivalently on all the runtimes, to make it a viable cross-runtime basis? Perhaps the current framework-less structure remains necessary? If we truly want runtime-agnostic tests, then I feel we need to reduce the dependencies required to write such tests as much as possible.

Also, as was already pointed out, certain features become difficult or impossible to test if the test mechanism itself also uses that feature.

Wherever these new or adapted tests end up, this could be a great task for future Code & Learn events to get new folks contributing.

The flakiness does indeed make it really hard to trust the tests we have, and thus hard to trust the correctness of any changes to them. I would lean towards the idea of just having a new, separate location where we clone the tests, with updates, to provide more general usability. As feature coverage expands there, we can gradually remove the older tests that have been duplicated.

A major underlying question also exists here: should Node.js consider itself to be a reference implementation of a pattern, or should there be an external specification which Node.js merely considers itself an implementation of and follows whatever is defined there? There are significant implications to that question on how the project should be run and how other runtimes should coordinate such development.

Similar to when the io.js fork happened, we have a scenario where people could choose the path of handing over control of that definition to a place of shared responsibility between the runtimes, or we could dictate that definition ourselves and expect others to follow. Given the ubiquity of Node.js it may make sense for us to lead and for others to follow, and that is essentially how things operate now. But that doesn't necessarily need to be how things work. We could hand more control over to WinterCG to actually define things and not just port things.

This of course comes back to the question of whether cross-runtime compatibility should actually be a goal of the project. There are clearly some corporate entities depending on various Node.js concepts, and were it more straightforward for them to do their work directly in a shared space, it could benefit us. Though it remains to be seen whether corporate contribution would actually improve were we to cater to their needs more. I feel we may need a firmer commitment from them on what value they bring before we consider significantly altering our current state of things to enable them.

@jasnell
Member Author

jasnell commented Sep 7, 2024

I feel the current separation of single-purpose tests is a feature that makes development iteration a lot smoother. If merging all the tests for a feature into one file is an intent here, we should be aware that it may be a drag on developer productivity.

No, that's not a goal here. Also, we are very inconsistent about having separate single-purpose tests. If our tests were consistently single-purpose it would be a heck of a lot easier to make sense of them ;-)

Does node:test exist equivalently on all the runtimes, to make it a viable cross-runtime basis? Perhaps the current framework-less structure remains necessary? If we truly want runtime-agnostic tests, then I feel we need to reduce the dependencies required to write such tests as much as possible.

Deno uses the BDD pattern along with Deno.test. Bun is implementing node:test. Workers uses a different syntax for structured tests, but I plan to iterate on it so that it's closer to a BDD pattern. I don't particularly care whether we use node:test syntax or not; we need better structure so that the tests can be more easily broken down, ported, or used incrementally. It's the introduction of more structure that is important here, not how that structure is implemented.

... should there be an external specification which Node.js merely considers itself an implementation of and follows whatever is defined there?

I think this is ideal, but it would require Node.js giving up a certain amount of control over its API surface. Is Node.js ready to do that?

@joyeecheung
Member

I think this is ideal, but it would require Node.js giving up a certain amount of control over its API surface. Is Node.js ready to do that?

I feel that Node.js would be ready if the other bigger JS runtimes were also ready to give up this control over their APIs to an external specification (i.e. similar to browsers, but then, as far as I can tell, even browsers don't strictly restrict non-standard innovation, and the ones with more company resources are always putting out non-standard APIs).

@ovflowd
Member

ovflowd commented Sep 10, 2024

I have mixed feelings here; they pretty much cover what some people have already said, but here I go (please bear with me):

I believe having a structured "conformance" test suite is a good direction for us; it allows us to have increased reliability, structured testing, and something we can quickly test against. That must be limited to public APIs and their contracts (expected inputs and outputs). We should not add internal tests there.

The benefits of having structured tests are there (some tests are ancient and super flaky, and having them all use node:test would make sense, but again, only for public APIs), and I believe it's worth investing in.

Having said that, as @anonrig recently pointed out on his Twitter, our test suite also suffers from external issues, such as having few and scattered resources relative to our increased load. Our CI has been suffering a lot, primarily due to our limited resources.

If folks from Cloudflare, like James, want to champion this initiative, then sure. But I am entirely against spending the already scarce resources we have to introduce a conformance suite that, although it benefits us, also enormously benefits third-party runtimes, which are ultimately the parties interested in such a suite.

I would be more grateful if companies such as Deno and Oven spared resources for our own CI and helped implement such a conformance suite; after all, they are the ones who want this pain point addressed. I'm not mentioning Cloudflare here because, realistically, Cloudflare already does a lot for us, and I would feel ashamed asking for more.

Indeed, we are also interested in having such structured tests. Still, we have few resources to spare to rewrite our tests (this takes a lot of time), and I believe it's in the interest of engineers from Oven.sh and DenoLand to implement this on our side. I doubt that, without outside resources, we would be able to deliver this in any viable manner.

This said, I'm definitely +1 on structured tests and this proposal.
