Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util.stripVTControlCharacters does not strip hyperlinks #53697

Closed
isaacs opened this issue Jul 2, 2024 · 11 comments
Closed

util.stripVTControlCharacters does not strip hyperlinks #53697

isaacs opened this issue Jul 2, 2024 · 11 comments
Labels
confirmed-bug Issues with confirmed bugs. util Issues and PRs related to the built-in util module.

Comments

@isaacs
Copy link
Contributor

isaacs commented Jul 2, 2024

Version

22.4.0

Platform

Darwin moxy.lan 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:17:33 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6031 arm64

Subsystem

util

What steps will reproduce the bug?

console.log(util.stripVTControlCharacters('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello'))

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

Should output:

This is a link hello

(With "This is a link" not hyperlinked.)

What do you see instead?

ttp://example.comThis is a link;; hello

Additional information

OCS codes are a bit tricky to capture and strip, but this is what I'm using in ansi-to-pre:

/\u001B\]8;;(.*?)(?:\u001B\\|\u0007)(.*?)\u001B]8;;(?\u001B\\|\u0007)/

Adding str.replaceAll(/\u001b\]8;;(.*?)(?:\u001b\\|\u0007)(.*?)\u001b]8;;(?:\u001b\\|\u0007)/g, '$2') should do the right thing, but not sure if there's some better way to do it in context.

@avivkeller avivkeller added util Issues and PRs related to the built-in util module. repro-exists labels Jul 2, 2024
@avivkeller
Copy link
Member

avivkeller commented Jul 2, 2024

+ repro-exists:

└─$ node
Welcome to Node.js v22.3.0.
Type ".help" for more information.
> util.stripVTControlCharacters('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello')
ttp://example.comThis is a link;; hello

@avivkeller
Copy link
Member

avivkeller commented Jul 2, 2024

The current RegEx used is:

/[\u001B\u009B][[\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\d\/#&.:=?%@~_]+)*|[a-zA-Z\d]+(?:;[-a-zA-Z\d\/#&.:=?%@~_]*)*)?\u0007)|(?:(?:\d{1,4}(?:;\d{0,4})*)?[\dA-PR-TZcf-ntqry=><~]))/g

@avivkeller
Copy link
Member

avivkeller commented Jul 2, 2024

I'm currently building and testing @isaacs suggested regex.

@isaacs
Copy link
Contributor Author

isaacs commented Jul 2, 2024

Oh, hey, looks like you can put other parameters between the ; in OCS 8 codes, so this regexp would probably be a bit more complete:

/\u001b\]8;[^;]*?;(.*?)(?:\u001b\\|\u0007)(.*?)\u001b]8;[^;]*?;(?:\u001b\\|\u0007)/g

Note the addition of [^;]*? between the ;; chars.

(I don't know of any terminal emulators or programs that do put parameters there, as it's just "reserved for future use", but that's what the spec says it can be.)

@isaacs
Copy link
Contributor Author

isaacs commented Jul 2, 2024

I was wrong, the id parameter is used in some cases to control link hover behavior, so yes, would need to strip that.

@avivkeller
Copy link
Member

I'm seeing a number of failures with your RegEx, but I do agree the current one may have a bug. (I would CC a util team but there isn't one :/)

@sindresorhus
Copy link

The regex at https://github.com/chalk/ansi-regex/blob/main/index.js supports links. You could maybe find some inspiration there.

@avivkeller
Copy link
Member

Hi @sindresorhus,

IIRC, the regex used is from ansi-regex, but an older version.

Later today, I'll test if a newer version of the regex fixes the issue.

@avivkeller
Copy link
Member

@sindresorhus

// From https://github.com/chalk/strip-ansi/blob/main/index.js
function ansiRegex({onlyFirst = false} = {}) {
	const pattern = [
		'[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]+)*|[a-zA-Z\\d]+(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?\\u0007)',
		'(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-nq-uy=><~]))'
	].join('|');

	return new RegExp(pattern, onlyFirst ? undefined : 'g');
}

console.log('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello'.replace(ansiRegex(), ''));

Outputs:

"ttp://example.com\u001b\\This is a link;;\u001b\\ hello"

https://runkit.com/6689b91eebd0a700080cd8b3/668c3336e85de300088f3f39

@avivkeller
Copy link
Member

I've opened an issue in chalk/ansi-regex#56, which is the source for the RegEx used by Node.js. Once resolved, I'll update the RegEx.

@avivkeller avivkeller added confirmed-bug Issues with confirmed bugs. and removed repro-exists labels Sep 10, 2024
@avivkeller
Copy link
Member

Fixed by 5b3f3c5...9416354

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. util Issues and PRs related to the built-in util module.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants