Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETSCRIPT: Update ScriptDeath to extend Error #1685

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

alainbryden
Copy link
Contributor

Summary of changes

  • Rename the "name" property to "hostname" so that it doesn't clash with inherited Error.name property.
  • Extend Error, this way if a user ends up catching it they can:
    • Test the type of error caught (name) and ignore it if desired
    • Get a stack trace and find out where they went wrong
    • Not require special case error printing logic for catching things that are neither strings, nor Error objects.

Motivation

Not sure how many have seen one of these:
image

We throw a new ScriptDeath instance in the netscript checkEnvFlags when we want to (silently) kill code still running on a script that's been killed - whenever they next invoke an ns instance. Problem is users with their own error handling logic can catch these, and it's difficult to make heads or tails of them. If they behaved more like normal Error instances, they would be easier to deal with.

As of #1513 - we no longer even display these errors in the UI when handled, so this is really just for the benefit of the user, who might still catch one of these in their try/catch handlers and not know what it is or where it came from (I was one such user).

Tests

nano test.js

let _ns = null;
/** @param {NS} ns */
export async function main(ns) {
    try {
        _ns ??= ns; // Simulates a bad global cache (usually a result of bad coding)
        ns.tail();
        runHelper();
    } catch (err) {
        ns.print('ERROR: Caught an unexpected error!')
        ns.print('Direct ns.print:')
        ns.print(err)
        ns.print('Class Name: ' + err.constructor?.name)
        ns.print('Name: ' + err.name)
        if (err.toString !== undefined)
            ns.print('toString: ' + err.toString())
        if (err.stack !== undefined)
            ns.print('stack: ' + err.stack)
        ns.print('Stringified: ' + JSON.stringify(err))
        ns.print('Manually serialized: ' + Object.keys(err).map(k => `${k}: ${err[k]}`).join(', '))
    }
}

function runHelper() {
    _ns.print("Now run this script a second time.");
}

This is what it looks like when you run this script twice within a few seconds on the current "stable build"
image

This is what it looks like when you run it on the latest "dev" build (2.6.3):
image

Note that the unused "errorMessage" property was recently removed - other than that, no difference.

This is what it now looks like with the change in place:
image
In release mode, the "internal" part of the stack trace (first 2 lines) would be minified, but the important part is that the player can now see that the line of code in test.js that caused the error was "runHelper.js:24:7":
image
This information, combined with the error message "Attempted to invoke an unsupported ns function from a killed process" hopefully points them in the right direction (that they were using a "bad" ns instance from the previous run)

- Rename the "name" property to "hostname" so that it doesn't clash with inherited Error.name property.
- Extend Error, this way if a user ends up catching it they can:
  - Test the type of error caught (name) and ignore it if desired
  - Get a stack trace and find out where they went wrong
  - Not require special case error printing logic for catching things that are neither strings, nor Error objects.
@d0sboots
Copy link
Collaborator

d0sboots commented Oct 8, 2024

There's one (potential) significant issue here: ScriptDeath objects are created in a variety of circumstances, and the (small) object currently created is much cheaper than a full Error which contains stacktrace information. This might not be an issue, because normal script termination doesn't invoke ScriptDeath (or it can be trivially tweaked to not), but many scenarios that involve kill do. I'm not certain, but I'm wary that there may be a high-performance usecase that currently involves a lot of killing.

Is there a different way the object could be laid out that would give you more of a hint, without requiring it to be a full Error?

@Alpheus
Copy link
Contributor

Alpheus commented Oct 8, 2024

There's one (potential) significant issue here: ScriptDeath objects are created in a variety of circumstances, and the (small) object currently created is much cheaper than a full Error which contains stacktrace information. This might not be an issue, because normal script termination doesn't invoke ScriptDeath (or it can be trivially tweaked to not), but many scenarios that involve kill do. I'm not certain, but I'm wary that there may be a high-performance usecase that currently involves a lot of killing.

Is there a different way the object could be laid out that would give you more of a hint, without requiring it to be a full Error?

You can get a stack trace from new Error().stack but the performance impact will be from what the error does primarily, not how we're passing it.

Though you said it may not be an issue, let's run everyone's batchers on dev after the merge for a quick test or do perf tests (there already are some).

@d0sboots
Copy link
Collaborator

d0sboots commented Oct 8, 2024

The issue is that I'm concerned there's a class of scripts I don't know about, that will be impacted. There may not be, but I vaguely feel like there might be. And there is an overhead from Error, because the stack has to grabbed at the time of creation.

If it would be good enough to implement toString on ScriptDeath so that it has a more friendly representation, that has no overhead (since it's just part of the prototype).

@alainbryden
Copy link
Contributor Author

To help with the decision, here's a performance test of the two I whipped up.
Throwing the new object is 4-10 times more expensive depending on environment, but we can still generally throw ten thousand of these in under 30 milliseconds in a "kill-all-scripts" scenario:

Latest dev run in Firefox:
image

Latest dev run in Chrome:
image

Latest Steam:
image

Test code is below:

/** @param {NS} ns */
export async function main(ns) {
    let lastError;
    const tests = [1, 10, 100, 1000, 10_000, 100_000];
    for (const testSize of tests) {
        const startTime = performance.now();
        for (let i = 0; i < testSize; i++) {
            try { trace(true) } catch (error) { lastError = error; }
        }
        const endTime = performance.now();
        ns.print(`Took ${endTime - startTime} milliseconds to throw ${testSize} NEW ScriptDeath errors.`);
    }
    ns.print(`ERROR: ${lastError}\n${lastError.stack}`);

    for (const testSize of tests) {
        const startTime = performance.now();
        for (let i = 0; i < testSize; i++) {
            try { trace(false) } catch (error) { lastError = error; }
        }
        const endTime = performance.now();
        ns.print(`Took ${endTime - startTime} milliseconds to throw ${testSize} OLD ScriptDeath errors.`);
    }
    ns.print(`ERROR: ${lastError}\n${lastError.stack}`);
}

function trace(useNewScriptDeath) {
    stack(useNewScriptDeath);
}

function stack(useNewScriptDeath) {
    deep(useNewScriptDeath);
}

function deep(useNewScriptDeath) {
    some(useNewScriptDeath);
}

function some(useNewScriptDeath) {
    if (useNewScriptDeath)
        throw new NewScriptDeath(mockProcess);
    else
        throw new OldScriptDeath(mockProcess);
}

const mockProcess = { 'pid': 1234, 'name': 'test.js', 'hostname': 'home' };

export class OldScriptDeath {
    /** Process ID number. */
    pid;
    /** Filename of the script. */
    filename;
    /** IP Address on which the script was running */
    hostname;

    constructor(ws) {
        this.pid = ws.pid;
        this.name = ws.name;
        this.hostname = ws.hostname;

        Object.freeze(this);
    }
}

export class NewScriptDeath extends Error {
    /** Process ID number. */
    pid;
    /** Filename of the script. */
    filename;
    /** IP Address on which the script was running */
    hostname;

    constructor(ws) {
        // Invoke the Error constructor with a meaningful message
        const message = `Attempted to invoke an unsupported ns function from a killed process (${ws.name} running on ${ws.hostname} with pid ${ws.pid})`;
        super(message);
        // Setting the base Error.name property is important to facilitate easy
        // detection, since prototype.constructor.name might be minified for them.
        this.name = "ScriptDeath";

        // Set own properties
        this.pid = ws.pid;
        this.filename = ws.name;
        this.hostname = ws.hostname;

        // Fix prototype, required when extending Error
        Object.setPrototypeOf(this, NewScriptDeath.prototype);
        Object.freeze(this);
    }
}

@Alpheus
Copy link
Contributor

Alpheus commented Oct 8, 2024

Could you check in the tests as well please?

@catloversg
Copy link
Contributor

I think d0sboots's suggestion is the better approach. We can implement 2 functions:

  • toString gives a short but still useful error message.
  • getStack gives a full stack trace if the player wants.

PoC:

export class ScriptDeath {
  /** Process ID number. */
  pid: number;

  /** Filename of the script. */
  name: string;

  /** IP Address on which the script was running */
  hostname: string;

  constructor(ws: WorkerScript) {
    this.pid = ws.pid;
    this.name = ws.name;
    this.hostname = ws.hostname;

    Object.freeze(this);
  }

  toString(): string {
    return `Attempted to invoke an NS function from a killed process (${this.name} running on ${this.hostname} with pid ${this.pid}). Call getStack() to get the stack trace.`;
  }

  getStack(): string | undefined {
    return Error().stack;
  }
}

Capture

@Alpheus
Copy link
Contributor

Alpheus commented Oct 8, 2024

I think d0sboots's suggestion is the better approach. We can implement 2 functions:

  • toString gives a short but still useful error message.
  • getStack gives a full stack trace if the player wants.

PoC:

export class ScriptDeath {
  /** Process ID number. */
  pid: number;

  /** Filename of the script. */
  name: string;

  /** IP Address on which the script was running */
  hostname: string;

  constructor(ws: WorkerScript) {
    this.pid = ws.pid;
    this.name = ws.name;
    this.hostname = ws.hostname;

    Object.freeze(this);
  }

  toString(): string {
    return `Attempted to invoke an NS function from a killed process (${this.name} running on ${this.hostname} with pid ${this.pid}). Call getStack() to get the stack trace.`;
  }

  getStack(): string | undefined {
    return Error().stack;
  }
}

Capture

Won't that generate the wrong trace?

@catloversg
Copy link
Contributor

No.
The only difference is between this:
Capture1
and this:
Capture2
What matters is the stack trace in the player's scripts. That's what they need to troubleshoot their problems.

@Alpheus
Copy link
Contributor

Alpheus commented Oct 8, 2024

No. The only difference is between this: Capture1 and this: Capture2 What matters is the stack trace in the player's scripts. That's what they need to troubleshoot their problems.

It's odd to me that its implied you have to call stack immediately after the object is created. If the snapshot time is important then we're talking about 2 different ScriptDeaths, one with a stack, one without. Why not add a separate constructor for the stack case instead?

@catloversg
Copy link
Contributor

I don't understand your question. When the player catches a ScriptDeath, they can optionally get the stack trace (via calling getStack() on that ScriptDeath instance) to see why their script tries to run an NS function after the script was killed. If they don't care about that, they (and we) don't need to do anything. "On-demand" getStack() is why there is need to worry about the (potential) performance impact of a new Error object.

@alainbryden
Copy link
Contributor Author

Alpheus is right, that's the wrong stack trace - the stack where the exception was handled, not where it was thrown.

Try running the example I gave in my previous description. The stack trace should at least start from "runHelper", but it will only include main.

Throw an error in a function 4 or 5 deep (like in my performance test) and it'll be even more clear.

@d0sboots
Copy link
Collaborator

d0sboots commented Oct 9, 2024

Throwing the new object is 4-10 times more expensive depending on environment, but we can still generally throw ten thousand of these in under 30 milliseconds in a "kill-all-scripts" scenario

This is what I worried about, because IIRC that's approaching the same speed of running scripts in a tight loop, currently. So in performance critical code, this very well could be an issue.

Alpheus is right, that's the wrong stack trace - the stack where the exception was handled, not where it was thrown.

Also agreed. If the stack trace is actually needed, it must be captured at the creation of the object. What is still not clear to me is if/why the stack trace is needed, or if it's good enough to clearly identify the object in other ways.

@alainbryden
Copy link
Contributor Author

alainbryden commented Oct 9, 2024

Throwing the new object is 4-10 times more expensive depending on environment, but we can still generally throw ten thousand of these in under 30 milliseconds in a "kill-all-scripts" scenario

This is what I worried about, because IIRC that's approaching the same speed of running scripts in a tight loop, currently. So in performance critical code, this very well could be an issue.

Is there any scenario we would be throwing on the order of 10,000 script deaths while someone is trying to run their performance-critical (e.g. H/G/W timing) loops? Only time I think this would happen is resetting / ascending, in which case hack timing loops don't matter.

As you said earlier, normal scripts ending don't trigger this, only an explicit kill command trying to prematurely terminate a running script.

Here's my thinking:

Typically scripts are killed only one at a time. Maybe a few hundred in special circumstances (e.g. a user killing a slew of scheduled hack/grow scripts that are going against stock they want to manipulate). In these cases, its still sub-1ms, not even noticeable.

The only time all thousands or tens of thousands get killed, which is still only 30ms, less than a game tick, is when resetting, at which point user-script timings don't matter anymore (they're all being killed). Even then, I would suspect that there are much slower processes being kicked off to reset most aspects of the game (mostly UI stuff), and this is just a drop in the bucket that amounts to having to wait slightly longer for prestigeWorkerScripts to finish up.

@alainbryden
Copy link
Contributor Author

alainbryden commented Oct 9, 2024

What is still not clear to me is if/why the stack trace is needed, or if it's good enough to clearly identify the object in other ways.

This is a tough pitch, you've either been in a situation where stack traces save your bacon, or you haven't. Anecdotally, a bunch of my scripts are upwards of 1000 lines of code. Unless we have a way to say which line of code triggered the ns call that blew up, I'd be hard pressed to track it down.

This is what happened to me the other day. I made a slight change to my daemon.js (1997 lines of code) wherein I cached a list of Server objects (custom helper class) in sorted order, so that I didn't have to pay the O(N^2) price of re-sorting them each loop. This cache was declared at the top of the file with a few others, but whenever I introduced it, I forgot to also re-initialize it at the start of my "function main(ns)" with the other globals...

This is the change that murdered me, after 2 years of barely touching this code:
image
Rather than re-sort targets by their hack-experience potential on each loop, I decided to cache it after the first time it was retrieved, since it doesn't change significantly over time. Unfortunately, this variable which used to be overwritten on each loop was now persisted between script invocations.

The Server objects get this little "convenience" copy of the ns object when they're constructed, which is supposed to die with the class as it's disposed (when the script dies):
image
Innocuous enough, in the name of letting me define some ns-dependent getters, but definitely a recipe for disaster if somehow this "Server" instance doesn't die with the script.

So suddenly after a few days of coding changes (including that addition) my script started throwing encountering these weird throws and I had to start scouring all recently changed code to see what could have caused it. I was this close to wrapping each and every function in my script (about 100) with a try/catch and custom log so I could at least narrow down which function the error was happening in.

I finally solved it on a hunch, after remembering the issues the game had with global properties still having their old values between different runs of the same script. (I know this is a me-problem, and I need to re-write all my old scripts to not use any globals, but it's a big undertaking)

@d0sboots
Copy link
Collaborator

d0sboots commented Oct 9, 2024

You might have misunderstood what I was asking, but that's OK because you also answered the question just now. :) I wasn't sure if the problem was "it was unclear initially what these error objects were" or "I needed the stack traces to figure out what was going wrong in my code." From your description, it was primarily the latter.

I did an audit of the code, to find all the places a new ScriptDeath object can be created. I found these cases:

  • ns.exit()
  • ns.spawn()
  • Calling an ns function after the script is already dead.
  • Finishing a promise-based operation (like hack(), grow(), etc.) after the script is already dead.
  • Killing (for any reason) a script while a promise-based operation is ongoing. (Note that this is a different codepath than the previous bullet, but they essentially cover the same scenario. They are guarding different edge cases.)

Out of all of these, I only have high-frequency concerns about the last, and maybe the first. (But if someone is using ns.exit to kill themselves and running something like a batcher, probably they can restructure their code.) Even in the last case, killing a script while it is executing a long-running operation is not something you should be doing thousands of times per second; that seems like a failure to calculate proper run times. So, on the whole I think I will note my potential concerns, but say they are overridden by the better debuggability here.

src/Netscript/ScriptDeath.ts Outdated Show resolved Hide resolved
src/Netscript/ScriptDeath.ts Outdated Show resolved Hide resolved
src/Netscript/ScriptDeath.ts Outdated Show resolved Hide resolved
@Alpheus
Copy link
Contributor

Alpheus commented Oct 9, 2024

I did an audit of the code, to find all the places a new ScriptDeath object can be created. I found these cases:

  • ns.exit()
  • ns.spawn()
  • Calling an ns function after the script is already dead.
  • Finishing a promise-based operation (like hack(), grow(), etc.) after the script is already dead.
  • Killing (for any reason) a script while a promise-based operation is ongoing. (Note that this is a different codepath than the previous bullet, but they essentially cover the same scenario. They are guarding different edge cases.)

Out of all of these, I only have high-frequency concerns about the last, and maybe the first. (But if someone is using ns.exit to kill themselves and running something like a batcher, probably they can restructure their code.)

Now how do we make sure that we don't accidentally introduce ScriptDeath into one of the high-speed paths in the future? Or at least signal that it's a bad idea?

@alainbryden
Copy link
Contributor Author

alainbryden commented Oct 9, 2024

I see what you're saying. We are creating ScriptDeath objects in a few places where they aren't necessarily being thrown, and in those non-throw situations extending Error is quite frivolous...

Is it worth creating a separate ScriptDeathError object (constructed with a ScriptDeath object) which we construct only as and when we're getting ready to explicitly throw a ScriptDeath? Other cases where we merely return, or reject a promise via a ScriptDeath instance can continue using the simple object.

@Alpheus
Copy link
Contributor

Alpheus commented Oct 9, 2024

I see what you're saying. We are creating ScriptDeath objects in a few places where they aren't necessarily being thrown, and in those non-throw situations extending Error is quite frivolous...

Is it worth creating a separate ScriptDeathError object (constructed with a ScriptDeath object) which we construct only as and when we're getting ready to explicitly throw a ScriptDeath? Other cases where we merely return, or reject a promise via a ScriptDeath instance can continue using the simple object.

The crazy+good UX path would likely entail having a config option. Ie. an interface setting in-game that controls the stack trace generation. ("Enable Stack traces on Script Death")

That way you could be more aggressive on the performance for the positive case without impacting users. Also makes this PR much more trivial to merge if it's controllable.

@alainbryden
Copy link
Contributor Author

alainbryden commented Oct 9, 2024

I'm going off on a bit of a tangent here, but I'm wondering if there are other clever ways we can invalidate the ns object that was previously passed to a script (and which is perhaps living longer than it should).

Like if instead of passing an NS instance when we invoke a script's main we passed a revocable Proxy to an NS object - then when any script is killed we can simply revoke the proxy and any further attempts to call NS functions would naturally generate a TypeError with a stack trace to where they tried to use the "dead" ns instance.

That's probably a bigger refactor, but it seems to have been designed precisely for our use case. To be sure, we'd still need to deal with many existing edge cases such as rejecting promises-in-progress, but it may simplify others.

Edit I guess upon closer inspection NSProxyHandler is already our own home-grown version of the Proxy<T> built-in, and helpers.checkEnvFlags is our "revoke" implementation, where we throw a ScriptDeath object if the instance is revoked.

src/Netscript/ScriptDeath.ts Outdated Show resolved Hide resolved
src/Netscript/ScriptDeath.ts Outdated Show resolved Hide resolved
Co-authored-by: David Walker <d0sboots@gmail.com>
@d0sboots
Copy link
Collaborator

d0sboots commented Oct 10, 2024

I see what you're saying. We are creating ScriptDeath objects in a few places where they aren't necessarily being thrown, and in those non-throw situations extending Error is quite frivolous...

Actually in all the situations I found, we do throw the ScriptDeath. (Or reject a Promise with it, which is equivalent.)

Now how do we make sure that we don't accidentally introduce ScriptDeath into one of the high-speed paths in the future? Or at least signal that it's a bad idea?

A combination of review diligence and general speed testing. (We do check to see how long various operations take, and if certain things, especially the common path of script creation/destruction, get slower, people with batchers will notice.

The "good" answer would involve automated tracking of metrics like scripts/second, but we are nowhere near there. I've done that sort of testing professionally, and it's a pain to set up due to its inherent flakiness, and only worth it if you actually are willing to spend the time to chase down every performance regression.

@d0sboots d0sboots merged commit cd99006 into bitburner-official:dev Oct 10, 2024
5 checks passed
antoinedube pushed a commit to antoinedube/bitburner-source that referenced this pull request Oct 16, 2024
* Update ScriptDeath to extend Error

- Rename the "name" property to "hostname" so that it doesn't clash with inherited Error.name property.
- Extend Error, this way if a user ends up catching it they can:
  - Test the type of error caught (name) and ignore it if desired
  - Get a stack trace and find out where they went wrong
  - Not require special case error printing logic for catching things that are neither strings, nor Error objects.

It is possible (but unlikely) that this could make killing scripts slower in some circumstances.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants