Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Source Maps to support post-hoc debugging #46695

Open
5 tasks done
robpaveza opened this issue Nov 5, 2021 · 11 comments Β· May be fixed by #47152
Open
5 tasks done

Extend Source Maps to support post-hoc debugging #46695

robpaveza opened this issue Nov 5, 2021 · 11 comments Β· May be fixed by #47152
Assignees
Labels
Experimentation Needed Someone needs to try this out to see what happens In Discussion Not yet reached consensus Suggestion An idea for TypeScript

Comments

@robpaveza
Copy link

robpaveza commented Nov 5, 2021

Suggestion

πŸ” Search Terms

Source maps, post hoc debugging

βœ… Viability Checklist

My suggestion meets these guidelines:

  • This wouldn't be a breaking change in existing TypeScript/JavaScript code
  • This wouldn't change the runtime behavior of existing JavaScript code
  • This could be implemented without emitting different JS based on the types of the expressions
  • This isn't a runtime feature (e.g. library functionality, non-ECMAScript syntax with JavaScript output, new syntax sugar for JS, etc.)
  • This feature would agree with the rest of TypeScript's Design Goals.

⭐ Suggestion

I would like to propose that TypeScript create an experimental source map extension (version 4). The purpose of this would be to improve the ability to apply Source Map data to stack traces and to have the Source Map contain everything needed to go from a minified stack trace to an unminified stack trace without using function name guessing.

πŸ“ƒ Motivating Example

Suppose I receive the following information from my application:

TypeError: Cannot read properties of undefined (reading 'x')
    at re.draw (bundle.js:1:16372)
    at re.drawLayer (bundle.js:1:14170)
    at be.draw (bundle.js:1:74592)
    at Te.render (bundle.js:1:114230)
    at Te.reflowCanvas (bundle.js:1:113897)
    at HTMLDocument.anonymous (bundle.js:1:135849)

The source map will resolve the first stack frame to draw(src, sprite, x, y, frame), and will correctly point out that the failure here is that the undefined value is actually frame. However, that is not the name of the function. The function's name is draw, which is a member of the Render class.

If I simply apply the value of the names array to the function, decoding the stack trace is not particularly useful. It would look something like this:

TypeError: Cannot read properties of undefined (reading 'x')
    at frame (file1.ts:302:7)
    at draw (file1.ts:144:5)
    at Render (file2.ts:95:5)
    at drawArray (file3.ts:178:5)
    at render (file3.ts:155:5)
    at game (file4.ts:39:5)

Using the source map to navigate the source code by hand, I can reconstruct the original call stack:

TypeError: Cannot read properties of undefined (reading 'x')
    at Render#draw (file1.ts:302:7)
    at Render#drawLayer (file1.ts:144:5)
    at GameObject#draw (file2.ts:95:5)
    at Game#render (file3.ts:178:5)
    at Game#reflowCanvas (file3.ts:155:5)
    at [anonymous function passed to addEventListener] (file4.ts:39:5)

But doing this required that I dump the mappings and the source files from the source map and manually inspect the source files. In general, this requires that I use a library to parse the mappings field from Source Maps (because as of Source Maps v3, this field is stored in a stateful way).

πŸ’» Use Cases

I want to use this to improve in-production debugging experiences, specifically, to improve stack traces, particularly those that exist after minification.

Presently, to work around this, we need to do one of two things:

  • Manually reconstruct the stack by inspecting the source contents (as mentioned above)
  • Or, by writing code that "guesses" the function name by walking backwards from the location the Mapping produces. (This is the mechanism that stacktrace-gps uses, although it often doesn't work for TypeScript sources because it doesn't recognize type annotations).

Proposed changes to the Source Map spec

Substantively: Only the mappings field would be altered, and would be altered by adding the 6th field. The complete section is included below:

The β€œmappings” data is broken down as follows:

  • each group representing a line in the generated file is separated by a ”;”
  • each segment is separated by a β€œ,”
  • each segment is made up of 1,4 or 5, 5, or 6 variable length fields.

The fields in each segment are:

  1. The zero-based starting column of the line in the generated code that the segment represents. If this is the first field of the first segment, or the first segment following a new generated line (β€œ;”), then this field holds the whole base 64 VLQ. Otherwise, this field contains a base 64 VLQ that is relative to the previous occurrence of this field. Note that this is different than the fields below because the previous value is reset after every generated line.
  2. If present, an zero-based index into the β€œsources” list. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.
  3. If present, the zero-based starting line in the original source represented. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented. Always present if there is a source field.
  4. If present, the zero-based starting column of the line in the source represented. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented. Always present if there is a source field.
  5. If present, the zero-based index into the β€œnames” list associated with this segment. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.
    6. If present, the zero-based index into the "names" list associated with the call stack of this segment. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.

In addition, the version field of the spec should be bumped to 4.

Suggested names

  • If functions are named directly, they should be preserved (function foo() { ... }, const foo = () => { ... }, { foo: function() { ... }`
  • If the function name cannot be deduced from an assignment, it should either be [anonymous function] or [anonymous arrow function]
  • If the function appears to be passed as a callback, and the expression being called can be identified, its name should be used: [anonymous function passed to addEventListener]
  • If the area is not contained within a function, it should be the name Global code
@DanielRosenwasser DanielRosenwasser added Experimentation Needed Someone needs to try this out to see what happens In Discussion Not yet reached consensus Suggestion An idea for TypeScript labels Nov 5, 2021
@DanielRosenwasser
Copy link
Member

Hey Rob, thanks for the proposal here. My thoughts here are that it would be worthwhile to try to get this into the sourcemap spec first; using a branch of TypeScript as a proof of concept implementation might be useful to prove out the proposal for the spec.

I'll point out though that I don't think we even support the mappings field today. From what I recall, we did back in the TS 1.0 compiler, but didn't add it back in the TS 1.1 rewrite. If I'm correct, that means we've gone about 7 years without a ton of feedback around mappings as a field. To get this going, you'd probably need to implement mappings as a whole, but to be honest I don't think we'd want to take on an implementation that adds a lot of complexity or memory overhead to every project build.

@fatcerberus
Copy link

Isn't mappings the field that remaps line numbers? I know for a fact that TS doesn't preserve line numbers in its output but I was pretty sure step-by-step debugging worked with TS sources...

...yeah, I was right, mappings is just the line number mappings, and I wrote the below code specifically to support TypeScript debugging in ssj:
https://github.com/fatcerberus/sphere/blob/master/src/neosphere/source_map.c#L256-L279

@rbuckton
Copy link
Member

rbuckton commented Nov 6, 2021

I'll point out though that I don't think we even support the mappingsnames field today. From what I recall, we did back in the TS 1.0 compiler, but didn't add it back in the TS 1.1 rewrite. If I'm correct, that means we've gone about 7 years without a ton of feedback around mappingsnames as a field. To get this going, you'd probably need to implement mappingsnames as a whole, but to be honest I don't think we'd want to take on an implementation that adds a lot of complexity or memory overhead to every project build.

@DanielRosenwasser, I think you meant names here. The names field definition in the Source Map spec is fairly ambiguous, which is why we dropped support for it.

Several years ago, @andysterland and I did an investigation into other extensions to the source map spec, primarily around identifying scopes and aliased names inside of a scope. Unfortunately, nothing ever came of that endeavor as the solution we were investigating wasn't sufficient for tools like minifiers.

@robpaveza
Copy link
Author

Several years ago, @andysterland and I did an investigation into other extensions to the source map spec, primarily around identifying scopes and aliased names inside of a scope. Unfortunately, nothing ever came of that endeavor as the solution we were investigating wasn't sufficient for tools like minifiers.

I'm not sure what you and Andy discussed back then, but I was discussing with some folks here on DevTools and our tentative belief is that so long as TypeScript emitted a new entry in the VLQ, I believe that entry could simply be preserved ('ish' -- it might need to be offset if a downstream tool inserted new entries into the names table). But for a trivial transpilation such as this:

function foo(bar: number): never {
    throw new Error('Intentional.');
}
foo();

This would likely emit:

function foo(bar) {
    throw new Error('Intentional.');
}
foo();

TypeScript might first emit 2 entries into names: ['foo(bar: number)', 'Global code']. Then the new, proposed "scope" mapping entry would:

  • for lines 1 and 4, emit the equivalent of 2 (because of the relative encoding point of the VLQ values, the output might not be identical)
  • for lines 2 and 3, emit the equivalent of 1

Assuming a minifier like terser or uglify came later, my supposition is that, as long as they respected the original contents of the names field, those values should be pretty much pass-through.

Isn't mappings the field that remaps line numbers?

The source map spec outlines as many as 6 data that are contained when parsing the mappings field: generated line, generated column, source line, source column, index into sources (to find the name of the source file), and index into names which represents the token at the current point of execution. (This is to allow for variable renaming to still function within the live debugger).

My thoughts here are that it would be worthwhile to try to get this into the sourcemap spec first; using a branch of TypeScript as a proof of concept implementation might be useful to prove out the proposal for the spec.

Yeah, that was my thinking as well, and specifically that "it works in this TypeScript branch" might be a compelling reason to revise the spec. :)

@robpalme
Copy link

robpalme commented Nov 6, 2021

Hello friends, this is a valuable topic. I'm pleased to see interest in progressing the sourcemap spec.

We're currently using the pasta-sourcemaps extension (originally created by @ldarbi) to deal with this. Pasta stands for "Pretty (and) Accurate Stack Trace Analysis".

The extension solves the exact use-case outlined above. It permits accurate decoding of function names without guessing and without the need to consult the original source files. The sourcemap tells you everything.

pasta-sourcemaps works by adding an additional field to the sourcemap called "x_com_bloomberg_sourcesFunctionMappings" which contains a series of VLQ-encoded mappings that identify named function spans in the original source. So you first use a pre-existing decoding function (that reads "mappings") to identify a source position (file, line, column), and then use that position to lookup the original function name in the dedicated list of function spans. Here's the spec.

pasta-sourcemaps has been used in production as part of the Bloomberg Terminal's crash stack telemetry for over two years, handling millions of stacks. It supports .js/.jsx/.ts/.tsx source files, is mature, and has a nifty logo. We aim to keep it up to date with the latest TypeScript version - though it is temporarily lagging on TS 4.3.

We had early discussions with @bcoe about getting pasta support into Node but never got around to acting on it. Since then, Node gained a best-effort (guessing) implementation, but would still benefit from something like this to increase reliability.

Please take a look at the approach for inspiration. It would be interesting to compare extending the "mappings" vs adding an extra field. We've not been in control of the tools (like TypeScript) that generate "mappings" and therefore it seemed easiest to add an extra field that can be guaranteed to be a complete record of all the necessary function ranges. Whereas the sourcemap spec itself says nothing about completeness of "mappings" - it's up to the specific encoder (e.g. in TypeScript, or in Terser) to decide when to emit them, so different tools make different decisions on the fidelity of the points.

So thank you @robpaveza for raising this issue! I'd love to see this problem more widely solved. Beyond stack trace decoding, this information could also be used in DevTools, e.g. the VSCode debugger's call stack could use it to show the original function name when debugging minified code.

@bcoe
Copy link

bcoe commented Nov 8, 2021

I would be interested in being looped in to this work, if an effort is made to dust off the Source Map spec. I would love to make sure Istanbul and Node.js both support extensions.

@robpaveza
Copy link
Author

@robpalme - I've been mulling over the options here, and I think that the most interesting possibility here is to leverage what you've put together with pasta-sourcemaps. To that end, I've authored an explainer here - see the direct explainer link - which borrows inspiration from pasta-sourcemaps but splits the function names into a second array, rather than into the names array. I did this primarily so that for the scenario:

file.ts -> file.js -> file.min.js

The minifier doesn't need to know anything about these fields, it just needs to pass them through. For tools like Rollup or Webpack, it's also fairly trivial: if they have access to the original source files anyway, the originally-transpiled values can just move into the scopes field in the same way.

Please feel welcome to leave comments in the explainer. Unfortunately, since it's been 8 years since the previous Source Maps specification, the Mozilla list is gone (not even archived). I've asked a few colleagues if they have contacts who worked on it previously, and I'll also be offering this proposal up to the Chromium DevTools list for discussion. I think it'd be great if we can leverage any of your team's learnings from having worked with pasta-sourcemaps.

@robpalme
Copy link

Thanks @robpaveza - sounds good - I'll do a detailed review on the explainer PR next week.

@robpalme
Copy link

Hey, apologies for the slow turnaround on this. @ldarbi (original author of pasta-sourcemaps ) would like to contribute. So I'll hand the review over to her.

@robpaveza
Copy link
Author

  • I'll have a pull request up, maybe Tuesday, exploring this. Here's an example file:
class Test {
  constructor() {

  }

  testMethod() {

  }

  get foo() {
    return 5;
  }

  public x = () => { console.log('x'); };

  static bar(): void {

  }
}

export function configureExample() {
  if (!('flatten' in Array.prototype)) {
    // @ts-ignore
    Array.prototype.flatten = () => {
      throw new Error('Not implemented.');
    };
  }
}

The source map which is generated by this looks like so:

{
    "version": 4,
    "file": "example.js",
    "sourceRoot": "",
    "sources": [
        "../src/example.ts"
    ],
    "names": [],
    "mappings": "AAAA,MAAM,IAAI;IACR;QAYO,MAAC,GAAG,GAAG,EAAE,GAAG,OAAO,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC;IAVvC,CAAC;IAED,UAAU;IAEV,CAAC;IAED,IAAI,GAAG;QACL,OAAO,CAAC,CAAC;IACX,CAAC;IAID,MAAM,CAAC,GAAG;IAEV,CAAC;CACF;AAED,MAAM,UAAU,gBAAgB;IAC9B,IAAI,CAAC,CAAC,SAAS,IAAI,KAAK,CAAC,SAAS,CAAC,EAAE;QACnC,aAAa;QACb,KAAK,CAAC,SAAS,CAAC,OAAO,GAAG,GAAG,EAAE;YAC7B,MAAM,IAAI,KAAK,CAAC,kBAAkB,CAAC,CAAC;QACtC,CAAC,CAAC;KACH;AACH,CAAC",
    "sourcesContent": [
        "class Test {\r\n  constructor() {\r\n\r\n  }\r\n\r\n  testMethod() {\r\n\r\n  }\r\n\r\n  get foo() {\r\n    return 5;\r\n  }\r\n\r\n  public x = () => { console.log('x'); };\r\n\r\n  static bar(): void {\r\n\r\n  }\r\n}\r\n\r\nexport function configureExample() {\r\n  if (!('flatten' in Array.prototype)) {\r\n    // @ts-ignore\r\n    Array.prototype.flatten = () => {\r\n      throw new Error('Not implemented.');\r\n    };\r\n  }\r\n}\r\n"
    ],
    "scopeNames": [
        "Test.x",
        "constructor for Test",
        "Test.testMethod",
        "Test.get foo",
        "static Test.bar",
        "Array.prototype.flatten",
        "configureExample",
        "Global code"
    ],
    "scopes": "AakBAwCA,AZeEGC,AIcEGC,AIWEGC,AMoBEGC,AQmCEKC,AHkCOCC,ApBA4BAC"
}

The scopes value here can be decoded to the following array:

[0,13,18,0,40,0], [0,-12,15,2,3,1], [0,4,14,2,3,1], [0,4,11,2,3,1], [0,6,20,2,3,1], [0,8,35,2,5,1], [0,-3,34,7,1,1], [0,-20,0,28,0,1]

It would decode as follows:

  • Item 1:
    • Source index 0
    • From/to: 13:18...13:40 (0-based)
    • Scope name: Test.x
  • Item 2:
    • Source index 0
    • From/to: 1:15...3:3 (0-based)
    • Scope name: constructor for Test
  • Item 3:
    • Source index 0
    • From/to: 5:14...7:3 (0-based)
    • Scope name: Test.testMethod
  • Item 4:
    • Source index 0
    • From/to: 9:11...11:3 (0-based)
    • Scope name: Test.get foo
  • Item 5:
    • Source index 0
    • From/to: 15:20...17:3 (0-based)
    • Scope name: static Test.bar
  • Item 6:
    • Source index 0
    • From/to: 23:35...25:5 (0-based)
    • Scope name: Array.prototype.flatten
  • Item 7:
    • Source index 0
    • From/to: 20:34...27:1 (0-based)
    • Scope name: configureExample
  • Item 8:
    • Source index 0
    • From/to: 0:0...28:0 (0-based)
    • Scope name: Global code

So if we had an exception in which the original source location was decoded to be at 24:6 (throw new Error('Not implemented'), we would scan all possible items looking for the candidates. The algorithm should first resolve the possible candidates:

  • Item 8 (Global code)
  • Item 7 (configureExample)
  • Item 6 (Array.prototype.flatten)

We would then analyze these and find that the nearest scope that encloses the area in question is Item 6, and that is the correct one. Without doing this in two passes, we can probably say "largest start line:col in which end line:col is after the location being searched" and do this in a single pass.

@bmeurer
Copy link
Contributor

bmeurer commented Dec 14, 2021

Adding @concavelenz here, who might have an opinion and relevant insights.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Experimentation Needed Someone needs to try this out to see what happens In Discussion Not yet reached consensus Suggestion An idea for TypeScript
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants