Use ICU for text search (#15858)
The ultimate goal of this PR was to use ICU for text search in order to:
* Improve Unicode support
  Previously we used `towlower` and only supported BMP glyphs.
* Improve search performance (10-100x)
  This allows us to search for all results in the entire text buffer
  at once without having to do so asynchronously.

Unfortunately, this required some significant changes too:
* ICU's search facilities operate on text positions, which need to be
  mapped back to buffer coordinates. This required the introduction of
  `CharToColumnMapper`, which implements a sort of reverse-`_charOffsets`
  mapping: it turns text (character) positions back into coordinates.
* Previously search restarted every time you clicked the search button.
  It used the current selection as the starting position for the new
  search. But since ICU's `uregex` cannot search backwards we're
  required to accumulate all results in a vector first and so we
  need to cache that vector in between searches.
* We need to know when the cached vector became invalid and so we have
  to track any changes made to `TextBuffer`. The way this commit solves
  it is by splitting `GetRowByOffset` into `GetRowByOffset` for
  `const ROW` access and `GetMutableRowByOffset` which increments a
  mutation counter on each call. The `Search` instance can then compare
  its cached mutation count against the buffer's current mutation count.
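
The result caching and mutation counting described above can be sketched roughly as follows. `Buffer`, `Search`, and all of their members are hypothetical single-row stand-ins rather than the real `TextBuffer` and `Search` classes, and `std::wstring::find` stands in for ICU's `uregex` as a forward-only scanner.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, single-row stand-in for TextBuffer.
class Buffer
{
public:
    explicit Buffer(std::wstring text) : _text(std::move(text)) {}

    // Read-only access leaves the mutation counter alone.
    const std::wstring& GetText() const { return _text; }

    // Handing out mutable access counts as a change, even if nothing is written.
    std::wstring& GetMutableText()
    {
        ++_mutations;
        return _text;
    }

    std::uint64_t Mutations() const { return _mutations; }

private:
    std::wstring _text;
    std::uint64_t _mutations = 0;
};

// Hypothetical stand-in for Search: collects every hit up front and
// rescans only when the needle or the buffer has changed.
class Search
{
public:
    // Returns true if the cached results had to be rebuilt.
    bool Refresh(const Buffer& buffer, const std::wstring& needle)
    {
        if (_primed && needle == _needle && buffer.Mutations() == _seenMutations)
        {
            return false;
        }

        _hits.clear();
        const auto& text = buffer.GetText();
        if (!needle.empty())
        {
            // Forward-only scan over the whole text.
            for (auto pos = text.find(needle); pos != std::wstring::npos; pos = text.find(needle, pos + 1))
            {
                _hits.push_back(pos);
            }
        }

        _needle = needle;
        _seenMutations = buffer.Mutations();
        _primed = true;
        _index = 0;
        return true;
    }

    // With all hits cached, stepping backwards is plain index arithmetic.
    void MoveNext() { if (!_hits.empty()) _index = (_index + 1) % _hits.size(); }
    void MovePrevious() { if (!_hits.empty()) _index = (_index + _hits.size() - 1) % _hits.size(); }

    const std::vector<std::size_t>& Hits() const { return _hits; }
    std::size_t Current() const { return _hits.empty() ? std::wstring::npos : _hits[_index]; }

private:
    std::vector<std::size_t> _hits;
    std::wstring _needle;
    std::uint64_t _seenMutations = 0;
    std::size_t _index = 0;
    bool _primed = false;
};
```

In this sketch, calling `Refresh` twice with the same needle on an untouched buffer reuses the cached vector, while any call to `GetMutableText` in between forces a rescan. The counter is deliberately conservative: it cannot tell whether a write actually happened, only that one could have.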

Finally, this commit makes 2 semi-unrelated changes:
* URL search now also uses ICU, since it's closely related to regular
  text search anyways. This significantly improves performance at
  large window sizes.
* A few minor issues in `UiaTracing` were fixed. In particular
  2 functions which passed strings as `wstring` by copy are now
  using `wstring_view` and `TraceLoggingCountedWideString`.
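
As a rough illustration of the `UiaTracing` change, the sketch below contrasts a by-copy `wstring` parameter with a `wstring_view` parameter that forwards a pointer and an explicit length. `LogCounted`, `TraceByCopy`, `TraceByView`, and `g_loggedChars` are hypothetical names. `LogCounted` only stands in for a counted-string logging API such as `TraceLoggingCountedWideString`, whose real macro signature is not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sink standing in for a counted-string logging API: it takes a
// pointer plus an explicit length, so the text need not be null-terminated.
std::size_t g_loggedChars = 0;

void LogCounted(const wchar_t* data, std::size_t length)
{
    (void)data; // a real sink would write the characters somewhere
    g_loggedChars += length;
}

// Before: the parameter is a full std::wstring, so passing an existing
// string copies it (and may allocate) on every call.
void TraceByCopy(std::wstring text)
{
    LogCounted(text.data(), text.size());
}

// After: the view only borrows the caller's characters, and its explicit
// length pairs naturally with a counted-string sink.
void TraceByView(std::wstring_view text)
{
    LogCounted(text.data(), text.size());
}
```

Because a view carries its own length, `TraceByView` can also log a substring of a larger string without first copying it into a null-terminated buffer.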

Related to #6319 and #8000

## Validation Steps Performed
* Search upward/downward in conhost ✅
* Search upward/downward in WT ✅
* Searching for any of ß, ẞ, ss or SS matches any of the other ✅
* Searching for any of Σ, σ, or ς matches any of the other ✅
lhecker authored Aug 24, 2023
1 parent 5651f08 commit cd80f3c
Showing 42 changed files with 1,035 additions and 952 deletions.
2 changes: 2 additions & 0 deletions .github/actions/spelling/allow/apis.txt
@@ -52,6 +52,7 @@ futex
GETDESKWALLPAPER
GETHIGHCONTRAST
GETMOUSEHOVERTIME
GETTEXTLENGTH
Hashtable
HIGHCONTRASTON
HIGHCONTRASTW
@@ -186,6 +187,7 @@ snprintf
spsc
sregex
SRWLOC
srwlock
SRWLOCK
STDCPP
STDMETHOD
56 changes: 12 additions & 44 deletions .github/actions/spelling/expect/expect.txt
@@ -7,7 +7,6 @@ ABCF
abgr
abi
ABORTIFHUNG
ACCESSTOKEN
acidev
ACIOSS
ACover
@@ -117,10 +116,8 @@ binplace
binplaced
bitcoin
bitcrazed
bitflag
bitmask
BITOPERATION
bitsets
BKCOLOR
BKGND
Bksp
@@ -149,12 +146,12 @@ bufferout
buffersize
buflen
buildtransitive
BUILDURI
burriter
BValue
bytebuffer
cac
cacafire
CALLCONV
capslock
CARETBLINKINGENABLED
CARRIAGERETURN
@@ -198,7 +195,6 @@ CHT
Cic
cielab
Cielab
Clcompile
CLE
cleartype
CLICKACTIVE
@@ -229,7 +225,6 @@ codepage
codepath
codepoints
coinit
COLLECTIONURI
colorizing
COLORMATRIX
COLORREFs
@@ -307,15 +302,13 @@ coordnew
COPYCOLOR
CORESYSTEM
cotaskmem
countof
CPG
cpinfo
CPINFOEX
CPLINFO
cplusplus
CPPCORECHECK
cppcorecheckrules
cpprest
cpprestsdk
cppwinrt
CProc
@@ -382,7 +375,6 @@ dai
DATABLOCK
DBatch
dbcs
DBCSCHAR
DBCSFONT
dbg
DBGALL
@@ -504,7 +496,6 @@ devicecode
Dext
DFactory
DFF
dhandler
dialogbox
directio
DIRECTX
@@ -522,7 +513,6 @@ dllmain
DLLVERSIONINFO
DLOAD
DLOOK
dmp
DONTCARE
doskey
dotnet
@@ -600,7 +590,6 @@ eplace
EPres
EQU
ERASEBKGND
etcoreapp
ETW
EUDC
EVENTID
@@ -642,7 +631,6 @@ FGs
FILEDESCRIPTION
FILESUBTYPE
FILESYSPATH
fileurl
FILEW
FILLATTR
FILLCONSOLEOUTPUT
@@ -824,15 +812,13 @@ HIWORD
HKCU
hkey
hkl
HKLM
hlocal
hlsl
HMB
HMK
hmod
hmodule
hmon
homeglyphs
homoglyph
HORZ
hostable
@@ -920,6 +906,7 @@ INSERTMODE
INTERACTIVITYBASE
INTERCEPTCOPYPASTE
INTERNALNAME
Interner
intsafe
INVALIDARG
INVALIDATERECT
@@ -941,7 +928,6 @@ IUI
IUnknown
ivalid
IWIC
IXMP
IXP
jconcpp
JOBOBJECT
@@ -965,7 +951,6 @@ kernelbasestaging
KEYBDINPUT
keychord
keydown
keyevent
KEYFIRST
KEYLAST
Keymapping
@@ -1012,7 +997,6 @@ LINEWRAP
LINKERRCAP
LINKERROR
linputfile
listproperties
listptr
listptrsize
lld
@@ -1131,7 +1115,6 @@ MIIM
milli
mincore
mindbogglingly
minimizeall
minkernel
MINMAXINFO
minwin
@@ -1318,7 +1301,7 @@ onecoreuuid
ONECOREWINDOWS
onehalf
oneseq
ONLCR
OOM
openbash
opencode
opencon
@@ -1328,13 +1311,6 @@ openps
openvt
ORIGINALFILENAME
osc
OSCBG
OSCCT
OSCFG
OSCRCC
OSCSCB
OSCSCC
OSCWT
OSDEPENDSROOT
OSG
OSGENG
@@ -1453,7 +1429,6 @@ PPEB
ppf
ppguid
ppidl
pplx
PPROC
ppropvar
ppsi
@@ -1467,8 +1442,8 @@ prc
prealigned
prect
prefast
preflighting
prefs
preinstalled
prepopulated
presorted
PREVENTPINNING
@@ -1481,7 +1456,6 @@ prioritization
processenv
processhost
PROCESSINFOCLASS
procs
PROPERTYID
PROPERTYKEY
PROPERTYVAL
@@ -1496,7 +1470,6 @@ propvariant
propvarutil
psa
PSECURITY
pseudocode
pseudoconsole
pseudoterminal
psh
@@ -1776,7 +1749,6 @@ SND
SOLIDBOX
Solutiondir
somefile
SOURCEBRANCH
sourced
spammy
SRCCODEPAGE
@@ -1828,7 +1800,6 @@ SUBLANG
subresource
subsystemconsole
subsystemwindows
suiteless
swapchain
swapchainpanel
swappable
@@ -1873,7 +1844,6 @@ tcommands
Tdd
TDelegated
TDP
TEAMPROJECT
tearoff
Teb
Techo
@@ -1885,23 +1855,18 @@ terminalrenderdata
TERMINALSCROLLING
terminfo
TEs
testbuildplatform
testcon
testd
testdlls
testenv
testlab
testlist
testmd
testmode
testname
testnameprefix
TESTNULL
testpass
testpasses
testtestabc
testtesttesttesttest
testtimeout
TEXCOORD
texel
TExpected
@@ -1929,7 +1894,6 @@ TJson
TLambda
TLDP
TLEN
Tlgdata
TMAE
TMPF
TMult
@@ -1989,11 +1953,14 @@ UAC
uap
uapadmin
UAX
UBool
ucd
uch
UChars
udk
UDM
uer
UError
uget
uia
UIACCESS
@@ -2023,13 +1990,14 @@ unknwn
UNORM
unparseable
unregistering
untests
untextured
untimes
UPDATEDISPLAY
UPDOWN
UPKEY
UPSS
uregex
URegular
usebackq
USECALLBACK
USECOLOR
@@ -2051,6 +2019,9 @@ USESIZE
USESTDHANDLES
usp
USRDLL
utext
UText
UTEXT
utr
UVWX
UVWXY
@@ -2134,7 +2105,6 @@ WDDMCONSOLECONTEXT
wdm
webpage
websites
websockets
wekyb
wex
wextest
@@ -2162,7 +2132,6 @@ windbg
WINDEF
windll
WINDOWALPHA
Windowbuffer
windowdpiapi
WINDOWEDGE
windowext
@@ -2306,7 +2275,6 @@ xunit
xutr
XVIRTUALSCREEN
XWalk
xxyyzz
yact
YCast
YCENTER