Slowness/choppy with specific Python syntax #1688

Closed
Tracked by #1751
scop opened this issue Jun 14, 2021 · 3 comments · Fixed by #1783
@scop
Contributor

scop commented Jun 14, 2021

Describe the bug you encountered:

bat consumes 100% of one CPU core and produces choppy output (~3 lines per second on my box) for a specific Python sample. Shell reproducer:

{ echo "foo={"; for i in {1..100}; do echo "                   #'a': '$i',"; done; } | bat --language python

This is sensitive to the amount of whitespace inside the echo string. Adding or removing a few spaces may speed things up somewhat, but the output is still choppy at other widths.
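
For anyone who wants to vary the indentation without editing the shell loop, here is a minimal Python sketch that generates the same kind of input. The function name `make_sample` and the default width are my own choices (the width approximates the reproducer's indentation and is not a verified count), so treat both as assumptions.

```python
import sys


def make_sample(indent_width: int = 19, entries: int = 100) -> str:
    """Return 'foo={' followed by `entries` commented-out dict items.

    indent_width is a parameter because the report says the slowdown
    depends on the amount of leading whitespace. The default of 19 is an
    approximation of the shell reproducer, not a verified count.
    """
    indent = " " * indent_width
    lines = ["foo={"]
    for i in range(1, entries + 1):
        lines.append(f"{indent}#'a': '{i}',")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the sample to stdout so it can be piped into bat.
    sys.stdout.write(make_sample())
```

Saved under a hypothetical name such as `make_sample.py`, the output can be piped to the same command as in the reproducer: `python make_sample.py | bat --language python`.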

What did you expect to happen instead?

Normal smooth output.

How did you install bat?

git master


bat version and environment

Software version

bat 0.18.1 (35f3127-modified)

Operating system

Linux 5.8.0-55-generic

Command-line

bat --diagnostic 

Environment variables

SHELL=/bin/bash
PAGER=less
LESS=-iMR
BAT_PAGER='less -R'
BAT_CACHE_PATH=<not set>
BAT_CONFIG_PATH=<not set>
BAT_OPTS=<not set>
BAT_STYLE=<not set>
BAT_TABS=<not set>
BAT_THEME=Dracula
XDG_CONFIG_HOME=<not set>
XDG_CACHE_HOME=<not set>
COLORTERM=truecolor
NO_COLOR=<not set>
MANPAGER=<not set>

Config file

Could not read contents of '/home/scop/.config/bat/config': No such file or directory (os error 2).

Compile time information

  • Profile: release
  • Target triple: x86_64-unknown-linux-gnu
  • Family: unix
  • OS: linux
  • Architecture: x86_64
  • Pointer width: 64
  • Endian: little
  • CPU features: fxsr,sse,sse2
  • Host: x86_64-unknown-linux-gnu

Less version

> less --version 
less 551 (GNU regular expressions)
Copyright (C) 1984-2019  Mark Nudelman

less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Home page: http://www.greenwoodsoftware.com/less
scop added the "bug" label (Something isn't working) Jun 14, 2021
@Enselic
Collaborator

Enselic commented Jun 15, 2021

Thanks a lot for the bug report!

I can confirm that it is slow. I also tried with the optional fancy-regex backend in syntect, but it is still slow. A tip is to add --pager=never to the bat command in your reproducer to see the slowness live. Interestingly, while this is slow:

foo={
                   #'a': '1',
                   #'a': '2',
                   #'a': '3',
                   #'a': '4',
                   #'a': '5',
                   ...

this (just removing the first #) makes the whole file fast:

foo={
                   'a': '1',
                   #'a': '2',
                   #'a': '3',
                   #'a': '4',
                   #'a': '5',
                   ...
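
To compare the two inputs above with numbers rather than by eye, a small timing harness can help. This is only a sketch: `make_variants` and `time_bat` are hypothetical helper names, the indent width is an approximation of the sample, and `--color=always` is not part of the report. It is added on the assumption that bat skips highlighting when its output is not shown in a terminal.

```python
import shutil
import subprocess
import time


def make_variants(indent_width: int = 19, entries: int = 100):
    """Return (all_commented, first_uncommented) sample texts."""
    indent = " " * indent_width
    commented = [f"{indent}#'a': '{i}'," for i in range(1, entries + 1)]
    slow = "\n".join(["foo={", *commented]) + "\n"
    # Same text, but the first entry loses its leading '#'.
    fast = "\n".join(["foo={", f"{indent}'a': '1',", *commented[1:]]) + "\n"
    return slow, fast


def time_bat(text: str) -> float:
    """Run bat on `text` and return the elapsed wall-clock seconds."""
    cmd = [
        "bat", "--language", "python", "--pager=never",
        # Assumption, not in the report: force highlighting even though
        # the output is discarded instead of shown in a terminal.
        "--color=always",
    ]
    start = time.perf_counter()
    subprocess.run(cmd, input=text.encode(), stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    if shutil.which("bat") is None:
        print("bat not found on PATH; nothing to time")
    else:
        slow, fast = make_variants()
        print(f"all commented:     {time_bat(slow):.2f}s")
        print(f"first uncommented: {time_bat(fast):.2f}s")
```

No timings are listed here because they depend on the machine and the bat build.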

@scop
Contributor Author

scop commented Jun 15, 2021

Yep, I also noticed that various tiny changes to the input change the extent of the problem or remove it altogether.

@keith-hall
Collaborator

I instrumented syntect with some debug printing of regex pattern timing, and found:

regex "(?=(?x:\n  \\s+                      # whitespace\n  | [urfb]*\"(?:\\\\.|[^\"])*\" # strings\n  | [urfb]*\'(?:\\\\.|[^\'])*\' # ^\n  | [\\d.ej]+               # numerics\n  | [+*/%@-] | // | and | or # operators\n  | (\\b[\\p{L}_][\\p{L}\\p{N}_]*\\b[ ]*\\.[ ]*)*\\b[\\p{L}_][\\p{L}\\p{N}_]*\\b               # a path\n)*:|\\s*\\*\\*)" took 131.704239ms to find a match
regex "(?=(?x:\n  \\s+                      # whitespace\n  | [urfb]*\"(?:\\\\.|[^\"])*\" # strings\n  | [urfb]*\'(?:\\\\.|[^\'])*\' # ^\n  | [\\d.ej]+               # numerics\n  | [+*/%@-] | // | and | or # operators\n  | (\\b[\\p{L}_][\\p{L}\\p{N}_]*\\b[ ]*\\.[ ]*)*\\b[\\p{L}_][\\p{L}\\p{N}_]*\\b               # a path\n)*[,}]|\\s*\\*)" took 134.816448ms to find a match

This corresponds to https://github.com/sublimehq/Packages/blob/09cb8000b383c2f32de6473f44fce7d43cb8772f/Python/Python.sublime-syntax#L57-L65, specifically https://github.com/sublimehq/Packages/blob/09cb8000b383c2f32de6473f44fce7d43cb8772f/Python/Python.sublime-syntax#L1013-L1025
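
The shape of the logged pattern can also be explored outside syntect. The sketch below is a hand translation of the first regex to Python's `re` module, which is a different engine, so absolute timings will not match the numbers above. Two things are my own assumptions: `[^\W\d]` and `\w` stand in for `[\p{L}_]` and `[\p{L}\p{N}_]` (Python's `re` has no `\p{...}`), and the guess that the `\s+` alternative inside the repeated group is what ties the cost to the amount of leading whitespace.

```python
import re
import time

# Hand translation of the first logged pattern (the dictionary lookahead).
DICT_LOOKAHEAD = re.compile(r"""
(?=(?:
    \s+                                          # whitespace
  | [urfb]*"(?:\\.|[^"])*"                       # double-quoted strings
  | [urfb]*'(?:\\.|[^'])*'                       # single-quoted strings
  | [\d.ej]+                                     # numerics
  | [+*/%@-] | // | and | or                     # operators
  | (?:\b[^\W\d]\w*\b[ ]*\.[ ]*)*\b[^\W\d]\w*\b  # a path
)*:|\s*\*\*)
""", re.VERBOSE)


def looks_like_dict(line: str) -> bool:
    """True if the lookahead succeeds at the start of `line`."""
    return DICT_LOOKAHEAD.match(line) is not None


def time_lookahead(indent_width: int) -> float:
    """Seconds spent on one failing lookahead for a commented-out entry."""
    line = " " * indent_width + "#'a': '1',"
    start = time.perf_counter()
    looks_like_dict(line)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Widths are kept small on purpose; larger ones may take very long.
    for width in (4, 8, 12, 16, 18):
        print(f"indent {width:2d}: {time_lookahead(width) * 1000:.3f} ms")
```

If the guess holds, each extra space should make the failing case noticeably slower, which would fit the whitespace sensitivity described in the report.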

I wouldn't be surprised to see this replaced with ST's new branch point functionality - which syntect doesn't support yet.
I think we may want to patch our Python.sublime-syntax file with a regex that performs better, or just always scope/parse it as a "dictionary" instead of trying to determine whether it is a "set", as it likely makes little difference to highlighting.
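
As an illustration of what "a regex which performs better" could mean, the sketch below compares the lookahead with a variant whose whitespace alternative is `\s` instead of `\s+`. Because the surrounding group is already repeated, both accept the same text, but the variant leaves a backtracking engine only one way to split a run of spaces. This is not the patch that was eventually merged. The pattern is a hand translation to Python's `re` (with `[^\W\d]` and `\w` standing in for the `\p{...}` classes), and `build_lookahead` is a hypothetical helper.

```python
import re


def build_lookahead(whitespace_alt: str) -> re.Pattern:
    """Compile the dictionary lookahead with the given whitespace alternative."""
    return re.compile(r"(?=(?:" + whitespace_alt + r"""
      | [urfb]*"(?:\\.|[^"])*"                       # double-quoted strings
      | [urfb]*'(?:\\.|[^'])*'                       # single-quoted strings
      | [\d.ej]+                                     # numerics
      | [+*/%@-] | // | and | or                     # operators
      | (?:\b[^\W\d]\w*\b[ ]*\.[ ]*)*\b[^\W\d]\w*\b  # a path
    )*:|\s*\*\*)""", re.VERBOSE)


ORIGINAL = build_lookahead(r"\s+")  # as logged: repetition nested in repetition
VARIANT = build_lookahead(r"\s")    # assumption: one whitespace char per iteration


def agree(line: str) -> bool:
    """True if both patterns give the same answer at the start of `line`."""
    return (ORIGINAL.match(line) is None) == (VARIANT.match(line) is None)


if __name__ == "__main__":
    for width in range(0, 13):
        assert agree(" " * width + "#'a': '1',")
        assert agree(" " * width + "'a': '1',")
    # A width that would be impractical to try with the nested form.
    print(VARIANT.match(" " * 500 + "#'a': '1',"))
```

Other alternatives in the group may overlap as well, so this only addresses the whitespace case from the reproducer.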
