Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Index source map lookup #53

Closed
wants to merge 7 commits into from
Closed

Conversation

jridgewell
Copy link
Contributor

@jridgewell jridgewell commented Oct 25, 2022

I found two bugs while using the package.

  1. The last token in a Regular map would be returned for very large lookup line numbers. Deferred till later PR
  2. The Index source map would recurse into section i, i+1, etc, when it should have recursed into only i-1.

The first deals with the "greatest lower bound" that the Mozilla Source Map package used. When searching for a token, we want to find the one closest to (line, col), given (gen_line, gen_col) <= (line, col). This means that looking up (0, 10_000) should return the last token on line 0 (assuming the generated column is less than 10,000). Effectively, the last segment in a line extends to infinity columns. But, if we look for a very large input line, eg (10_000, 0), we accidentally returned the last token in our map (again, assuming there are less than 10,000 generated lines). That's because that token's position is <= our lookup position. This is a break with Mozilla's behavior, where the returned token must be on our lookup line.

Similarly, an Index map had another bug with the greatest lower bound lookup behavior. Each section's offset behaves the same as a token, where a section map applies for all positions after its offset. So we need to again find a (off_line, off_col) <= (line, col). The old code would find only (off_line, off_col) > (line, col) (and then would keep searching beyond!).

Additionally, the lookup column of an Index lookup is only subtracted if the lookup line is the same as the section's offset line. Effectively, the section's first line is shifted by off_col, but all following lines start at generated column 0.

The old code improperly found the section to call recurse into. Sourcemap sections behave very similarly to the mappings, and we need to find the section whose offset is `<=` the desired position.

Additionally, the `col` is only subtracted if it is on the offset's line. This matches [Mozilla's behavior](https://github.com/mozilla/source-map/blob/0.6.1/lib/source-map-consumer.js#L1000-L1003).

And as a final bonus, we can binary search it to speed up searches on larger maps.
Token lookup uses "greatest lower bound" lookups, which means we need to find the token on the line which is `<=` than our desired column. But, the old code accidentally considered the last token to span infinite columns and infinite lines, meaning a lookup of `(10000000, 0)` would return the last token.
Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 on the fix for indexed source maps. We have a couple of places now that are essentially manual reimplementations of binary search. Is it feasible that we can just switch to std binary_search_by_key, or is there some minute differences there?

I’m not too convinced about the restriction that "token must be on same line" though.

Is this documented somewhere properly? Or is it just an implementation detail of the mozilla sourcemap implementation?
I was under the impression that we are just looking for a token match lower to the source position we want, as you said.

@jridgewell
Copy link
Contributor Author

Is it feasible that we can just switch to std binary_search_by_key, or is there some minute differences there?

I don't think it's possible to use Rust's native binary search for our case. Because we need the closest <=, and not a strict equality, the basic search won't work. We could extract into a single helper and that takes a &[T] and |v| v.get_offset() <= (line, pos) to share the code.

Is this documented somewhere properly? Or is it just an implementation detail of the mozilla sourcemap implementation?

It's an implementation detail, originally implemented in mozilla/source-map@699dff1, which points to https://groups.google.com/g/mozilla.dev.js-sourcemap/c/Uo26hB5v5oY as the reasoning.

@Swatinem
Copy link
Member

we need the closest <=, and not a strict equality

You can still use binary search, you just have to match on the result then, like here:

.get(match mapping {
Ok(a) => a,
Err(a) => a.saturating_sub(1),
})?

As for the behavior, thanks for posting the links. I believe this might make sense when you "compress" source code down to a single line.
But what if you have the following usecase:

// # a.js:
export function a() {
  return "a.js";
}
// # b.js:
export function b() {
  return "b.js";
}

You then simply concat these two files and create a sourcemap with 2 mappings:

// v first sourcemap token at "generated" line 1, column 1 points to "a.js" at line 1, column 1
export function a() {
  return "a.js";
}

// v second sourcemap token at "generated" line 5, column 1 points to "b.js" at line 1, column 1
export function b() {
  return "b.js"; // <- we are looking at "generated" line 6, column C
}

In this example, when we look up line 6, there is not mapping on that exact line, but rather one line before.

So we have two choices here:

  • return "b.js" line 1 column 1, which is a bit off when looking at the code, but we get an approximate match, enough to know which file we are in.
  • return nothing, which is also bad.

In this specific example, we would change from returning an approximate location to none at all, which is not really a nice thing to do.

I would not do this filtering in our library code and rather return the approximate token.
An API user can do that filtering later if they really care about this:

let line = 6;
let token = sm.lookup_token(line, 1).filter(|token| token.get_dst_line() == line);
assert_eq!(token, None);

Assuming the same example as above, we can just filter the Option further down to a matching "generated" line.

@jridgewell
Copy link
Contributor Author

jridgewell commented Oct 31, 2022

You can still use binary search, you just have to match on the result then, like here:

Oh! I clearly didn't read the docs closely enough, I didn't notice that the Err case gave you the insertion index. Will update soon.

In this specific example, we would change from returning an approximate location to none at all, which is not really a nice thing to do.

I'll split this into a separate PR, but I'd really suggest that this be implemented as Mozilla's does. We're using this package to implement error stack tracing, and with the current behavior we return results that do not actually exist in user's source code. We insert synthetic runtime code into the output bundles, which is not mapped (!t.has_source()). The current behavior causes us to incorrectly point to user code that is present on prior lines, which is extremely confusing.

function userCode() {// code doesn't throw.
}

(function runtimeCode() {
  // some error is thrown by invalid runtime behavior
  if (Math.random() >= 0.5) {
    throw new Error();
  } // Runtime was intended to run user code


  userCode();
});
//# sourceMappingURL=data:application/json;charset=utf-8;base64,eyJ2ZXJzaW9uIjozLCJuYW1lcyI6WyJ1c2VyQ29kZSIsInJ1bnRpbWVDb2RlIiwicmFuZG9tIl0sInNvdXJjZXMiOlsiaW5wdXQuanMiXSwic291cmNlc0NvbnRlbnQiOlsiZnVuY3Rpb24gdXNlckNvZGUoKSB7XG4gIC8vIGNvZGUgZG9lc24ndCB0aHJvdy5cbn1cblxuKGZ1bmN0aW9uIHJ1bnRpbWVDb2RlKCkge1xuICAvLyBzb21lIGVycm9yIGlzIHRocm93biBieSBpbnZhbGlkIHJ1bnRpbWUgYmVoYXZpb3JcbiAgaWYgKE1hdGgucmFuZG9tKCkgPj0gMC41KSB7XG4gICAgdGhyb3cgbmV3IEVycm9yKCk7XG4gIH1cblxuICAvLyBSdW50aW1lIHdhcyBpbnRlbmRlZCB0byBydW4gdXNlciBjb2RlXG4gIHVzZXJDb2RlKCk7XG59KSJdLCJtYXBwaW5ncyI6IkFBQUEsU0FBU0EsUUFBVCxHQUFvQixDQUNsQjtBQUNEOztBQUVELFVBQVVDLFdBQVY7RUFDRTtFQURGLFNBRVdDLE1BRlg7SUFBQTtFQUFBLEVBTUU7OztFQU5GO0FBQUEifQ==

The code thrown on line 7 ends up finding the token for } on line 2, and we map to } in source on line 3.

What about taking binary_search_by's approach and returning an Ok(exact_match) or Err(closest_match)?

@jridgewell jridgewell changed the title Fix Regular and Index source map lookup Fix Index source map lookup Oct 31, 2022
@Swatinem
Copy link
Member

Swatinem commented Nov 2, 2022

What about taking binary_search_by's approach and returning an Ok(exact_match) or Err(closest_match)?

That is an interesting idea, but would be a breaking change. I believe just chaining .filter(|token| token.get_dst_line() == line) onto the option is simple enough.

I’m wondering from the example you gave, you claim the additional runtime code does not come from a file, but does that mean it still has a sourcemap token? shouldn’t the error find that token then?

src/utils.rs Show resolved Hide resolved
Copy link
Contributor Author

@jridgewell jridgewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m wondering from the example you gave, you claim the additional runtime code does not come from a file, but does that mean it still has a sourcemap token? shouldn’t the error find that token then?

Not for the the runtime code. This is how Babel handles it, if the associated code is injected during the transformer on a new line, the generator doesn't insert any sourceless tokens. But if the code is on the same line, then it'll insert 1 sourceless segment at the start.

@Swatinem
Copy link
Member

Swatinem commented Nov 8, 2022

Last weeks rustup added more lint rules, luckily there is only one thing to fix here ;-)

Sorry for the delay here.

@jridgewell
Copy link
Contributor Author

Done.

@Swatinem Swatinem enabled auto-merge (squash) November 9, 2022 08:11
@Swatinem Swatinem disabled auto-merge November 9, 2022 08:20
@Swatinem Swatinem mentioned this pull request Nov 9, 2022
@Swatinem
Copy link
Member

Swatinem commented Nov 9, 2022

There was a second clippy lint which I fixed in #54 so this is merged now, thank you @jridgewell 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants