Revamp wcwidth #1470

Closed
wants to merge 4 commits into from
Changes from 3 commits
103 changes: 98 additions & 5 deletions src/CharWidth.ts
@@ -2,9 +2,69 @@
* Copyright (c) 2016 The xterm.js authors. All rights reserved.
* @license MIT
*/
import { IwcwidthOptions } from './Types';

export const wcwidth = (function(opts: {nul: number, control: number}): (ucs: number) => number {
export function wcwidthFactory(opts: IwcwidthOptions): (num: number) => number {
// extracted from https://www.cl.cam.ac.uk/%7Emgk25/ucs/wcwidth.c
// ambiguous maps taken from https://chromium.googlesource.com/native_client/nacl-newlib/+/master-backup/newlib/libc/string/wcwidth.c
// ambiguous characters
const AMBIGUOUS_BMP = [
[0x00A1, 0x00A1], [0x00A4, 0x00A4], [0x00A7, 0x00A8],
[0x00AA, 0x00AA], [0x00AE, 0x00AE], [0x00B0, 0x00B4],
[0x00B6, 0x00BA], [0x00BC, 0x00BF], [0x00C6, 0x00C6],
[0x00D0, 0x00D0], [0x00D7, 0x00D8], [0x00DE, 0x00E1],
[0x00E6, 0x00E6], [0x00E8, 0x00EA], [0x00EC, 0x00ED],
[0x00F0, 0x00F0], [0x00F2, 0x00F3], [0x00F7, 0x00FA],
[0x00FC, 0x00FC], [0x00FE, 0x00FE], [0x0101, 0x0101],
[0x0111, 0x0111], [0x0113, 0x0113], [0x011B, 0x011B],
[0x0126, 0x0127], [0x012B, 0x012B], [0x0131, 0x0133],
[0x0138, 0x0138], [0x013F, 0x0142], [0x0144, 0x0144],
[0x0148, 0x014B], [0x014D, 0x014D], [0x0152, 0x0153],
[0x0166, 0x0167], [0x016B, 0x016B], [0x01CE, 0x01CE],
[0x01D0, 0x01D0], [0x01D2, 0x01D2], [0x01D4, 0x01D4],
[0x01D6, 0x01D6], [0x01D8, 0x01D8], [0x01DA, 0x01DA],
[0x01DC, 0x01DC], [0x0251, 0x0251], [0x0261, 0x0261],
[0x02C4, 0x02C4], [0x02C7, 0x02C7], [0x02C9, 0x02CB],
[0x02CD, 0x02CD], [0x02D0, 0x02D0], [0x02D8, 0x02DB],
[0x02DD, 0x02DD], [0x02DF, 0x02DF], [0x0391, 0x03A1],
[0x03A3, 0x03A9], [0x03B1, 0x03C1], [0x03C3, 0x03C9],
[0x0401, 0x0401], [0x0410, 0x044F], [0x0451, 0x0451],
[0x2010, 0x2010], [0x2013, 0x2016], [0x2018, 0x2019],
[0x201C, 0x201D], [0x2020, 0x2022], [0x2024, 0x2027],
[0x2030, 0x2030], [0x2032, 0x2033], [0x2035, 0x2035],
[0x203B, 0x203B], [0x203E, 0x203E], [0x2074, 0x2074],
[0x207F, 0x207F], [0x2081, 0x2084], [0x20AC, 0x20AC],
[0x2103, 0x2103], [0x2105, 0x2105], [0x2109, 0x2109],
[0x2113, 0x2113], [0x2116, 0x2116], [0x2121, 0x2122],
[0x2126, 0x2126], [0x212B, 0x212B], [0x2153, 0x2154],
[0x215B, 0x215E], [0x2160, 0x216B], [0x2170, 0x2179],
[0x2190, 0x2199], [0x21B8, 0x21B9], [0x21D2, 0x21D2],
[0x21D4, 0x21D4], [0x21E7, 0x21E7], [0x2200, 0x2200],
[0x2202, 0x2203], [0x2207, 0x2208], [0x220B, 0x220B],
[0x220F, 0x220F], [0x2211, 0x2211], [0x2215, 0x2215],
[0x221A, 0x221A], [0x221D, 0x2220], [0x2223, 0x2223],
[0x2225, 0x2225], [0x2227, 0x222C], [0x222E, 0x222E],
[0x2234, 0x2237], [0x223C, 0x223D], [0x2248, 0x2248],
[0x224C, 0x224C], [0x2252, 0x2252], [0x2260, 0x2261],
[0x2264, 0x2267], [0x226A, 0x226B], [0x226E, 0x226F],
[0x2282, 0x2283], [0x2286, 0x2287], [0x2295, 0x2295],
[0x2299, 0x2299], [0x22A5, 0x22A5], [0x22BF, 0x22BF],
[0x2312, 0x2312], [0x2460, 0x24E9], [0x24EB, 0x254B],
[0x2550, 0x2573], [0x2580, 0x258F], [0x2592, 0x2595],
[0x25A0, 0x25A1], [0x25A3, 0x25A9], [0x25B2, 0x25B3],
[0x25B6, 0x25B7], [0x25BC, 0x25BD], [0x25C0, 0x25C1],
[0x25C6, 0x25C8], [0x25CB, 0x25CB], [0x25CE, 0x25D1],
[0x25E2, 0x25E5], [0x25EF, 0x25EF], [0x2605, 0x2606],
[0x2609, 0x2609], [0x260E, 0x260F], [0x2614, 0x2615],
[0x261C, 0x261C], [0x261E, 0x261E], [0x2640, 0x2640],
[0x2642, 0x2642], [0x2660, 0x2661], [0x2663, 0x2665],
[0x2667, 0x266A], [0x266C, 0x266D], [0x266F, 0x266F],
[0x273D, 0x273D], [0x2776, 0x277F], [0xE000, 0xF8FF],
[0xFFFD, 0xFFFD]
];
const AMBIGUOUS_HIGH = [
[0xF0000, 0xFFFFD], [0x100000, 0x10FFFD]
];
// combining characters
const COMBINING_BMP = [
[0x0300, 0x036F], [0x0483, 0x0486], [0x0488, 0x0489],
@@ -58,6 +118,12 @@ export const wcwidth = (function(opts: {nul: number, control: number}): (ucs: nu
[0x1D242, 0x1D244], [0xE0001, 0xE0001], [0xE0020, 0xE007F],
[0xE0100, 0xE01EF]
];

const nul = opts.nul | 0;
const control = opts.control | 0;
const custom = opts.custom || Object.create(null);
const ambiguous = opts.ambiguous || null;

// binary search
function bisearch(ucs: number, data: number[][]): boolean {
let min = 0;
@@ -81,11 +147,25 @@ export const wcwidth = (function(opts: {nul: number, control: number}): (ucs: nu
function wcwidthBMP(ucs: number): number {
// test for 8-bit control characters
if (ucs === 0) {
return opts.nul;
return nul;
}
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0)) {
return opts.control;
return control;
}

// custom overrides
if (custom && custom[ucs]) {
return custom[ucs];
}

// binary search for ambiguous characters
// only done if ambiguous is explicitly set
if (ambiguous) {
if (bisearch(ucs, AMBIGUOUS_BMP)) {
return ambiguous;
}
}

// binary search in table of non-spacing characters
if (bisearch(ucs, COMBINING_BMP)) {
return 0;
@@ -111,6 +191,18 @@ export const wcwidth = (function(opts: {nul: number, control: number}): (ucs: nu
(ucs >= 0xffe0 && ucs <= 0xffe6)));
}
function wcwidthHigh(ucs: number): 0 | 1 | 2 {
// custom overrides
if (custom && custom[ucs]) {
return custom[ucs];
}

// binary search for ambiguous characters
// only done if ambiguous is explicitly set
if (ambiguous) {
if (bisearch(ucs, AMBIGUOUS_HIGH)) {
return ambiguous;
}
}
if (bisearch(ucs, COMBINING_HIGH)) {
return 0;
}
@@ -119,7 +211,6 @@ export const wcwidth = (function(opts: {nul: number, control: number}): (ucs: nu
}
return 1;
}
const control = opts.control | 0;
let table: number[] | Uint32Array = null;
function init_table(): number[] | Uint32Array {
// lookup table for BMP
@@ -168,4 +259,6 @@ export const wcwidth = (function(opts: {nul: number, control: number}): (ucs: nu
// do a full search for high codepoints
return wcwidthHigh(num);
};
})({nul: 0, control: 0}); // configurable options
}

export const wcwidthDefault = wcwidthFactory({nul: 0, control: 0});
18 changes: 16 additions & 2 deletions src/InputHandler.ts
@@ -9,7 +9,7 @@ import { C0, C1 } from './EscapeSequences';
import { CHARSETS, DEFAULT_CHARSET } from './Charsets';
import { CHAR_DATA_CHAR_INDEX, CHAR_DATA_WIDTH_INDEX, CHAR_DATA_CODE_INDEX } from './Buffer';
import { FLAGS } from './renderer/Types';
import { wcwidth } from './CharWidth';
import { wcwidthFactory, wcwidthDefault } from './CharWidth';
import { EscapeSequenceParser } from './EscapeSequenceParser';

/**
@@ -112,6 +112,7 @@ class DECRQSS implements IDcsHandler {
*/
export class InputHandler implements IInputHandler {
private _surrogateHigh: string;
private _wcwidth: (ucs: number) => number;

constructor(
private _terminal: any, // TODO: reestablish IInputHandlingTerminal here
@@ -282,6 +283,19 @@ export class InputHandler implements IInputHandler {
*/
this._parser.setDcsHandler('$q', new DECRQSS(this._terminal));
this._parser.setDcsHandler('+q', new RequestTerminfo(this._terminal));

/**
* init wcwidth with default version
*/
this._wcwidth = wcwidthDefault;
}

public setWcwidthOptions(opts: {ambiguous?: 0 | 1 | 2, custom?: {[key: number]: 0 | 1 | 2}}): void {
if (opts.ambiguous === undefined && opts.custom === undefined) {
this._wcwidth = wcwidthDefault;
} else {
this._wcwidth = wcwidthFactory({nul: 0, control: 0, ambiguous: opts.ambiguous, custom: opts.custom});
}
}

public parse(data: string): void {
@@ -345,7 +359,7 @@ export class InputHandler implements IInputHandler {

// calculate print space
// expensive call, therefore we save width in line buffer
chWidth = wcwidth(code);
chWidth = this._wcwidth(code);

// get charset replacement character
// FIXME: Should code be replaced as well?
11 changes: 11 additions & 0 deletions src/Types.ts
@@ -110,6 +110,7 @@ export interface ICompositionHelper {
export interface IInputHandler {
parse(data: string): void;
print(data: string, start: number, end: number): void;
setWcwidthOptions(opts: {ambiguous?: 0 | 1 | 2, custom?: {[key: number]: 0 | 1 | 2}}): void;
Member

Is the idea to expose this as a setting? What other terminals have you checked that also do this?

Member Author

@jerch jerch May 25, 2018

Well, the ambiguous setting is comparable to the setting in iTerm2, although they only offer switching full width on or off (here 2 or 1; we still support the current "default" by omitting the setting). It might be useful to expose this so people can hot patch the width of ambiguous chars if they run into issues. I think VTE also supports this through an env variable. There is even a test file that shows the difference (just open the attachment 00example.txt from mintty/mintty#88 (comment) in an editor in xterm.js with ambiguous unset, set to 1, and set to 2). I'm also going to add some tests to show/test the difference.
The custom setting might be helpful internally if the system's wcwidth does not agree with our implementation: we could simply override it with the system's values to avoid cursor and line ending problems. I would not expose this to the user since it might break more than it helps. Still, we could offer an addon or something similar to do the nasty low-level stuff.

Local terminals are normally not affected by this since they can use the system's wcwidth (and automatically get the same settings). We can't do that in a browser component since we have no access to C land. In VSCode (and similar "local" terminals) this could be achieved with an additional node package that accesses the system wcwidth (some C++ binding); in the demo it gets trickier to load that data, maybe through some addon with a server-side part.

See also my comment here #1467 (comment) which kinda addresses this problem.
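
To illustrate the effect of the ambiguous setting described above, here is a minimal sketch. It is not the code from this PR: `makeWidth`, `inRanges`, and `AMBIGUOUS_SAMPLE` are hypothetical names, the range table is only a small excerpt of the ambiguous list in the diff, and every non-ambiguous code point is simply treated as width 1.

```typescript
// Hypothetical, trimmed-down stand-in for the factory in this PR.
type Width = 0 | 1 | 2;

// Small excerpt of the ambiguous ranges from the diff (sorted, inclusive).
const AMBIGUOUS_SAMPLE: Array<[number, number]> = [
  [0x00A1, 0x00A1], [0x00A4, 0x00A4], [0x00A7, 0x00A8],
  [0x0391, 0x03A1], [0x2460, 0x24E9]
];

// Binary search over sorted, non-overlapping inclusive ranges.
function inRanges(ucs: number, ranges: Array<[number, number]>): boolean {
  let min = 0;
  let max = ranges.length - 1;
  while (min <= max) {
    const mid = (min + max) >> 1;
    if (ucs < ranges[mid][0]) {
      max = mid - 1;
    } else if (ucs > ranges[mid][1]) {
      min = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}

function makeWidth(ambiguous?: Width): (ucs: number) => Width {
  return (ucs: number): Width => {
    // Mirrors the diff: the ambiguous table is only consulted when the
    // option is set to a truthy value (1 or 2).
    if (ambiguous && inRanges(ucs, AMBIGUOUS_SAMPLE)) {
      return ambiguous;
    }
    return 1; // simplification: everything else is narrow in this sketch
  };
}

const narrow = makeWidth(1);
const wide = makeWidth(2);
const unset = makeWidth();

// U+0391 (Greek capital alpha) falls inside the ambiguous excerpt.
console.log(narrow(0x0391), wide(0x0391), unset(0x0391));
```

With the option unset, the ambiguous table is skipped entirely, so the lookup costs nothing extra for users who keep the current default.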

Member

While it might be useful to allow someone to pass a custom dictionary, in the case where we want to match with the system's wcwidth, it might be better if we can instead pass a function, because that matches the API POSIX gives us.

So, I'd recommend that the option be supplied as custom: (key: number) => 0 | 1 | 2 | undefined, where undefined causes fall-through behavior. If someone wants to use an object, then they can implement it as a function stub (e.g. (key) => ({123: 2})[key]).

Member Author

@bgw Yup, that's better. I'm going to change it.
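
A rough sketch of what the function-based custom option suggested above could look like. The names `withCustom`, `builtinWidth`, and `overrides` are hypothetical and not part of this PR, and the fallback is a deliberately tiny stand-in for the table-based lookup. Note that a lookup testing the result for truthiness would skip an override of 0, so the sketch compares against undefined instead.

```typescript
type Width = 0 | 1 | 2;
type CustomWidth = (ucs: number) => Width | undefined;

// Hypothetical fallback standing in for the table-based lookup:
// Hangul Jamo initial consonants are wide, everything else is narrow here.
function builtinWidth(ucs: number): Width {
  return ucs >= 0x1100 && ucs <= 0x115F ? 2 : 1;
}

// Wrap a fallback with an optional custom function; a result of
// undefined falls through to the fallback.
function withCustom(
  custom: CustomWidth | undefined,
  fallback: (ucs: number) => Width
): (ucs: number) => Width {
  if (!custom) {
    return fallback;
  }
  return (ucs: number): Width => {
    const w = custom(ucs);
    // Compare against undefined so that an override of 0 is honored.
    return w !== undefined ? w : fallback(ucs);
  };
}

// A plain object can still be used by wrapping it in a function stub.
const overrides: {[key: number]: Width} = {0x1F600: 2, 0x200B: 0};
const width = withCustom((ucs) => overrides[ucs], builtinWidth);

console.log(width(0x1F600), width(0x200B), width(0x41));
```

A function also matches the shape of the POSIX wcwidth call, so a binding to the system implementation could be passed in directly without first building a dictionary.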


/** C0 BEL */ bell(): void;
/** C0 LF */ lineFeed(): void;
@@ -509,3 +510,13 @@ export interface IEscapeSequenceParser {
setErrorHandler(callback: (state: IParsingState) => IParsingState): void;
clearErrorHandler(): void;
}

/**
* Configure options for wcwidth
*/
export interface IwcwidthOptions {
nul: 0 | 1 | 2;
control: 0 | 1 | 2;
ambiguous?: 0 | 1 | 2;
custom?: {[key: number]: 0 | 1 | 2};
}