-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional character joiner #1460
Conversation
@princjef Nice, this would also be a good entry point for grapheme support in the future, maybe as an addon. Can you tell if the ligature detection will be able to spot most grapheme clusters too (since most end up being rendered differently) or if we have to build those blocks beforehand? Some notes on grapheme support: |
@jerch I'm not very familiar with grapheme clusters but from what i can tell the principles are similar in certain cases. If the graphemes are rendered as ligatures in the font it's likely that they would use very similar logic to programming ligatures and could potentially be covered with the same logic/addon. However, the link you sent also mentions this:
So in that regard, I think that this feature will work for the purposes of rendering grapheme clusters that have special rendering as a unit within the current font. However, the implementation in this PR does not treat such constructs as single units for purposes of selection or deletion as described in the link you provided. In the case of programming ligatures (or font ligatures of other types), the character joining is purely a visual treatment, so such behavior doesn't make sense. Perhaps there would need to be a companion function to handle the selection/deletion properties of the grapheme clusters, since it sounds like that behavior doesn't necessarily map 1-1 to a rendered ligature |
@jerch I was thinking we would leverage the eventual ligature support for these languages as well. Could we do that based on unicode ranges? For example all adjacent characters from U+0E00-U+0E7F (Thai) are drawn consecutively. If this is the approach we take maybe we should allow multiple character joiners and have a built in one to handle i18n? This would also mirror how link handlers work.
I think selection will probably be fine still, it might look a little off (for only RTL languages?) but we can handle that later. We don't have to worry about deletion as it's handled by the shell. Here's the closest issue tracking that atm #701, we would probably want to support grapheme clusters for LTR languages first. |
@princjef I'm in the process of adding a way to swap the renderer out #1432, these character joiners only apply to the canvas renderer which makes the API a little confusing. Perhaps calling it out that it only applies to the canvas renderer in the jsdoc is enough? @bgw any thoughts on how it integrates with |
Yes multiple character joiners should do the trick. Grapheme support should also cover all mandatory ligatures (even the emoji color joiners are defined as graphemes), but not the "nice to have" alternatives. Not sure yet if we need the full algo or can get away with some cheaper range assumptions. The algo is kinda expensive since every char needs to be evaluate twice (as Edit: |
@Tyriar I think that makes sense. I almost implemented it in the first pass just so it aligned with the link matchers. Having a second use case for the joiner seems like sufficient reason to add the extra logic.
Because of how different the canvas and DOM are from a technology perspective, I would expect that more features for only one or the other will pop up over time. Maybe we can segment these kinds of capabilities in a way that more strongly indicates which renderer they're used for. Perhaps namespacing it with something like
@jerch can you elaborate on what you mean by this? |
Well the DOM renderer is intentionally barebones. I'm thinking we just add a disclaimer in the jsdoc of rendererType (added in my PR) that ligatures are not supported in the DOM renderer. |
This is just a little rant into my own direction for this reason: at an early stage the terminal splits the chars into the cell model - just to join them later back together. Can be optimized if done right at the early stage. Its partly my fault that the early stage does that. |
27d099d
to
eabb25c
Compare
Updated with support for mulitple character joiners |
@princjef sorry for the delay in looking at this as I've been a bit busy. I just wanted a status update on this, would you say this PR is good to go from your perspective and just needs to react to feedback? Trying to budget time for it 😄 |
@Tyriar no worries I know how it goes 😄 I'm happy with it in its current state. The main place that I'd love some extra attention for feedback is the treatment of double width/zero width/etc. characters when determining the width of the replacement. I'm not familiar with all of the ins and outs there so I'm sure there are some bugs. Let me know if it would be helpful to see the usage in the |
@princjef great, I'll try get some time to look at this over the next couple of weeks (depending on my other priorities). |
src/renderer/Renderer.ts
Outdated
}; | ||
|
||
this._renderLayers.forEach(l => { | ||
if (l.registerCharacterJoiner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving registerCharacterJoiner
into BaseRenderLayer
with a default no-op implementation means you don't need this if
plus you'll get strong typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a result of moving things into a CharacterJoinerRegistry
, there is no longer a need to register the joiners directly with the layers. The registry is just passed to the constructor of the TextRenderLayer
now
src/renderer/Renderer.ts
Outdated
public registerCharacterJoiner(handler: CharacterJoinerHandler): number { | ||
const joiner: ICharacterJoiner = { | ||
id: this._nextJoinerId++, | ||
handler: handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can simply be handler
src/renderer/TextRenderLayer.ts
Outdated
@@ -52,6 +54,7 @@ export class TextRenderLayer extends BaseRenderLayer { | |||
terminal: ITerminal, | |||
firstRow: number, | |||
lastRow: number, | |||
foreground: boolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about if joiners get passed in instead of marking as foreground? That way it can be passed in when foreground is drawn and null for background?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like that. feels cleaner
src/renderer/TextRenderLayer.ts
Outdated
@@ -190,7 +242,7 @@ export class TextRenderLayer extends BaseRenderLayer { | |||
} else { | |||
this._ctx.fillStyle = this._colors.foreground.css; | |||
} | |||
this.fillBottomLineAtCells(x, y); | |||
this.fillBottomLineAtCells(x, y, width); | |||
this._ctx.restore(); | |||
} | |||
this.drawChar( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think code
can be infinite here if characters are being joined? I'm not totally sure how that will affect drawing/caching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the trouble with the joined characters is that there isn't one code that defines the whole sequence. I did a pass through that logic and the code didn't seem to have any bearing on uncached drawing. Since the code seems to be primarily used for determining caching, I made it infinity to avoid clashing with any existing characters and to steer clear of the range that is actually cached at the moment. If the dynamic character atlas eventually expands to try to cache all character codes, we could definitely end up with a problem down the line.
I think the most reasonable alternative is to pass -1 for the code, as no valid character will ever have that code. We can then use that for cache control and essentially ignore any negative codes. I didn't use -1 initially because I saw at least one or two checks that didn't handle negative character codes and wanted to keep my changes as localized as possible at least initially.
Caching of ligatures is definitely possible but would require something more expressive than a single number to identify the characters drawn. I think we can punt on that part of it for the time being.
src/renderer/TextRenderLayer.ts
Outdated
currentIndex++; | ||
} | ||
|
||
// Process any trailing ranges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a high level explanation of how this ranges/sub-ranges thing works? It's not totally clear yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm confused about the use of range vs subrange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They mean the same thing relative to the text for the line being processed, though the naming I chose is definitely confusing as I read it back.
A range/subrange is a consecutive sequence of characters in the input text represented as a start and end index. In the context of this method (this._getJoinedCharacters
), the range that I'm talking about with "trailing ranges" and so on is the start and end index of a sequence of characters within the input text that have the same foreground color and attributes (background is specifically excluded because it doesn't affect joining characters).
The ranges returned by this._getSubRanges()
are zero or more start/end index pairs contained within the range mentioned above that represent the locations of the ligatures that were found.
For example, lets say I'm processing a line "a -> b -> c -> d", where "->" forms a ligature in the font and "a -> b -> " is a different color than "c -> d".
- The initial ranges identified as same-styled (via
rangeStartIndex
andcurrentIndex
) would be 0-10 and 10-16 (corresponding with "a -> b -> " and "c -> d") - We call
this._getSubRanges("a -> b -> c -> d", 0, 10)
, and find ligatures at subranges 2-4 and 7-9, returned as[[2, 4], [7, 9]]
- We then call
this._getSubRanges("a -> b -> c -> d", 10, 16)
, and find one ligature at the subrange 12-14, returned as[[12, 14]]
- The 'subranges' returned are all combined into the final array of ranges of characters that we should join
[[2, 4], [7, 9], [12, 14]]
, which is returned bythis._getJoinedCharacters()
The principal reason for breaking the line into chunks when passing to the joiner is to cache more effectively, as the subsequences are more likely to be present for multiple renders than a full line. It also has the side benefit that at the end of this._getJoinedCharacters()
we know any ligatures we find are valid because the ranges passed to the joiner have the same style.
Hopefully that clears things up (albeit in a long-winded way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subranges concept was confusing me as I refactored so I renamed it to joinedRanges
, which better describes the concept
@Tyriar let me know what you think about the character code stuff and if you think any changes should be made to the range/subrange parts. once everything has been resolved I'll rebase and push a new version |
I rebased it myself as some pretty significant changes happened (there is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments, it looks like this should be relatively safe to merge in and iterate on once these comments are resolved since joinedRanges.length
is checked for the majority of new code paths.
* (exclusive) indexes of ranges that should be rendered as a single unit. | ||
* @return The ID of the new joiner, this can be used to deregister | ||
*/ | ||
registerCharacterJoiner(handler: (text: string) => [number, number][]): number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the API looks good overall and just have some suggestions on the documenation.
- It should mention that performance is extremely important as this will be run every single time a line is rendered as I understand it.
- It should mention how multiple registered character joiners interact, graphemes will eventually be built in using this same API in a similar way to how web links are supported out of the box so this is an important detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
src/renderer/TextRenderLayer.ts
Outdated
@@ -79,6 +84,41 @@ export class TextRenderLayer extends BaseRenderLayer { | |||
continue; | |||
} | |||
|
|||
// Just in case we ended up in the middle of a range, lop off any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this happen only when the character joiner returned invalid results? Could we move this validation into _getJoinedCharacters
if it's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way to end up in here is if a joined range somehow starts in the middle of a sequence of characters that is handled as a group (basically if it starts with a zero-width character). This seems exceedingly unlikely to happen but I included it just in case. If we're not worried about that case I can just remove it from here.
The alternative would be to pass the understanding of widths and overlaps into the underlying character joiner logic, but I'm pretty sure I'd have to do another pass over the full sequence of characters to compute it there in what is already a pretty hot code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to translate the ranges to cell ranges, which guarantees validity and remove a bunch of logic here so this is no longer needed. removed
src/renderer/TextRenderLayer.ts
Outdated
const code: number = <number>charData[CHAR_DATA_CODE_INDEX]; | ||
const char: string = charData[CHAR_DATA_CHAR_INDEX]; | ||
let char: string = charData[CHAR_DATA_CHAR_INDEX]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename this to chars
and make a note that it can include a single character for a single cell, or multiple cells worth of characters when a character joiner says to
src/renderer/TextRenderLayer.ts
Outdated
callback( | ||
char.length === 1 ? code : Infinity, | ||
char, | ||
char.length + width - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think char.length
will break for emojis as they often have more than 1 character for a single emoji.
It could also lead to weird behavior if there are multiple double width characters throughout the joined range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now have the joiner registry convert the string ranges back to cell ranges so that things like double width characters and emojis are accounted for. I've added tests for each and it seems to be working as expected.
I'm admittedly still a bit confused by the way emojis work. It appears that you end up with a >1 length string for the emoji itself (with width 1) that is always followed by a space with width 1, since the emoji character is treated as single width. Is this accurate?
I tested emojis/fullwidth characters with my actual xterm-ligature-support
code and things seem to be fine for fullwidth characters in all cases I tested. There are no ligatures in any fonts I'm aware of that contain such characters so I just tested ligatures before/after/in between. Emojis always seem to render fine, but if a ligature comes after the emoji, it is not rendered properly. This appears to be because the parsing logic of opentype.js treats the emoji + space as a single character rather than using javascript string lengths. It's a fringe case with low impact, so I'll look into it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm admittedly still a bit confused by the way emojis work. It appears that you end up with a >1 length string for the emoji itself (with width 1) that is always followed by a space with width 1, since the emoji character is treated as single width. Is this accurate?
Yes this is correct, emojis can be multiple "characters" (char.length > 1
). We also render them as single width (width===1
), they will probably be double width at some point (width===2
) but there's another issue for that, so whenever emoji stuff comes up I want to make sure we're not making a mistake assuming that width is always 1.
src/renderer/TextRenderLayer.ts
Outdated
@@ -260,6 +325,78 @@ export class TextRenderLayer extends BaseRenderLayer { | |||
return overlaps; | |||
} | |||
|
|||
private _getJoinedCharacters(terminal: ITerminal, row: number): [number, number][] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep some CharacterJoinerRegistry
or something which encapsulates as much complexity around character joining as possible. I'm concerned about all this extra code being added to TextRenderLayer
as it's already quite long/complex. We could also move MergeRanges.ts into this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion. I've made the change which allows this to get cleaned up a lot and makes the joining code much more testable. It's also led to a couple of other structural changes as a result of the fact that the joiners are tracked by the registry rather than the individual render layers.
src/renderer/TextRenderLayer.ts
Outdated
@@ -116,7 +156,19 @@ export class TextRenderLayer extends BaseRenderLayer { | |||
} | |||
} | |||
|
|||
callback(code, char, width, x, y, fg, bg, flags); | |||
callback( | |||
char.length === 1 ? code : Infinity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will change the code that emojis pass in when no character joiners are registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now set the code to infinity directly in the logic where i detect a range so this problem goes away.
While writing tests I discovered that zero-width cells are given a code of null
, so that is another potential option for joined ranges
0adf426
to
63b3a6f
Compare
Made all of the recommended changes and also cleaned up the logic to convert the string ranges returned by the joiners to cell ranges so that there is proper support for fullwidth characters and such. This should resolve most of the residual edge cases around character widths and string lengths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, I say let's merge after 3.5 goes out.
🎉 Thanks @princjef! Let me know how progress on the addon goes and if you have feedback on the API. |
Sounds good. The addon is all written and tested with the latest code from this PR. I'll clean the repo files up and put together a PR over there so you can take a look. |
The xterm.js renderer currently renders all text cell by cell, which prevents font ligatures from being rendered in fonts like Fira Code. This PR addresses part of #958 by modifying the interface of the xterm.js renderer to allow a "character joiner" to be present during the rendering of foreground text.
When present, the joiner is called with each input text sequence that has styles which can be joined (i.e. has the same foreground color and flags). The function returns an array of tuples, where each tuple holds a start and end index of a subrange that should be rendered together. Here's an example for a function which joins
->
sequences:Any ranges returned by the joiner will be rendered together by the renderer. While this is not terribly useful all by itself, it enables font ligature support when paired with a joiner that understands ligatures for the current font (which will be provided as part of a separate addon).
The character joiner can be added and removed by calling
registerCharacterJoiner()
andderegisterCharacterJoiner()
on the terminal instance, respectively.Some questions/things to consider:
registerCharacterJoiner()
be called afterterminal.open()
so that the renderer instance is present. No error is thrown currently if the renderer is not present. Is there an alternative approach that is preferred for this kind of setup?registerCharacterJoiner()
will override the previously registered joiner. While additional joiners could theoretically be supported by combining any joined ranges from the various joiners, it seemed like more complexity than it was worth, especially since I can't see a scenario where someone would need such a setup.Infinity
as the code to the character drawing function to avoid caching. I also considered passing -1 to avoid any future interpretation as an actual character, but I see several places where comparisons likecode < 256
are made, so it seems like the rest of that codebase makes an assumption that the code is always 0 or greater. I'm open to changing it and updating the necessary places if people want me to