Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework std.Progress #20059

Merged
merged 60 commits into from
May 28, 2024
Merged

rework std.Progress #20059

merged 60 commits into from
May 28, 2024

Conversation

andrewrk
Copy link
Member

@andrewrk andrewrk commented May 24, 2024

Demo

simple asciinema demo source

demo: building a music player with zig build source

Motivation

The previous implementation of std.Progress had the design limitation that it could not assume ownership of the terminal. This meant that it had to play nicely with sub-processes purely via what was printed to the terminal, and it had to play nicely with progress-unaware stderr writes to the terminal. It also was forbidden from installing a SIGWINCH handler, or running ioctl to find out the rows and cols of the terminal.

This implementation is designed around the idea that a single process will be the sole owner of the terminal, and all other progress reports will be communicated back to that process. With this change in the requirements, it becomes possible to make a much more useful progress bar.

Strategy

This creates a standard "Zig Progress Protocol" and uses it so that the same std.Progress API works both when an application is the main owner of a terminal, and when an application is a child process. In the latter case, progress information is communicated semantically over a pipe to the parent process.

The file descriptor is given in the ZIG_PROGRESS environment variable. std.process.Child integrates with this, so attaching a child's progress subtree in a parent process is as easy as setting the child.progress_node field before calling spawn.

In order to avoid performance penalty for using this API, the Node.start and Node.end APIs are thread-safe, lock-free, infallible, and do minimal amount of memory loads and stores. In order to accomplish this, a statically allocated buffer of Node storage is used - one array for parents, and one array for the rest of the data. Children are not stored. The statically allocated buffer is used for a bespoke Node allocator implementation. A static buffer is sufficient because we can set an upper bound on supported terminal width and height. If the terminal size exceeds this, the progress bar output will be truncated regardless.

A separate thread periodically refreshes the terminal on a timer. This progress update thread iterates over the entire preallocated parents array, looking for used nodes. This is efficient because the parents array is only 200 8-bit integers, or about 4 cache lines. When iterating, this thread "serializes" the data into a separate preallocated array by atomically loading from the shared data into data that is only touched by a single thread - the progress update thread. It then looks for nodes that are marked with a file descriptor that is a pipe to a child process. Such nodes are replaced during the serialization process with the data from reading from the pipe. The data can be memcpy'd into place except for the parents array which needs to be relocated. Once this serialization process is complete, there are two paths, one for a child process, and one for the root process that owns the terminal.

The root process that owns the terminal scans the serialized data, computing children and sibling pointers. The canonical data only stores parents, so this is where the tree structure is computed. Then the tree is walked, appending to a static buffer that will be sent to the terminal with a single write() syscall. During this process, the detected rows and cols of the terminal are respected. If the user resizes the terminal, it will cause a SIGWINCH which signals the update thread to wake up and redraw with the new rows and cols.

A child process, instead of drawing to the terminal, takes the same serialized data and sends it across a pipe. The pipe is in non-blocking mode, so if it fills up, the child drops the message; a future update will contain the new progress information. Likewise when the parent reads from the pipe, it discards all messages in the buffer except for the last one. If there are no messages in the pipe, the parent uses the data from the last update.

Performance Data Points

Building the self-hosted compiler as a sub-process. This measures the cost of the new API implementations, primarily Node.start, Node.end, and the disappearance of Node.activate. In this case, standard error is not a terminal, and thus an update thread is never spawned.

Benchmark 1 (3 runs): master branch zig build-exe self-hosted compiler
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          51.8s  ±  138ms    51.6s  … 51.9s           0 ( 0%)        0%
  peak_rss           4.57GB ±  286KB    4.57GB … 4.57GB          0 ( 0%)        0%
  cpu_cycles          273G  ±  360M      273G  …  274G           0 ( 0%)        0%
  instructions        487G  ±  105M      487G  …  487G           0 ( 0%)        0%
  cache_references   19.0G  ± 31.8M     19.0G  … 19.1G           0 ( 0%)        0%
  cache_misses       3.87G  ± 12.0M     3.86G  … 3.88G           0 ( 0%)        0%
  branch_misses      2.22G  ± 3.44M     2.21G  … 2.22G           0 ( 0%)        0%
Benchmark 2 (3 runs): this branch zig build-exe self-hosted compiler
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          51.5s  ±  115ms    51.4s  … 51.7s           0 ( 0%)          -  0.4% ±  0.6%
  peak_rss           4.58GB ±  190KB    4.58GB … 4.58GB          0 ( 0%)          +  0.1% ±  0.0%
  cpu_cycles          272G  ±  494M      272G  …  273G           0 ( 0%)          -  0.4% ±  0.4%
  instructions        487G  ± 63.5M      487G  …  487G           0 ( 0%)          -  0.1% ±  0.0%
  cache_references   19.1G  ± 16.9M     19.1G  … 19.1G           0 ( 0%)          +  0.3% ±  0.3%
  cache_misses       3.86G  ± 17.1M     3.84G  … 3.88G           0 ( 0%)          -  0.2% ±  0.9%
  branch_misses      2.23G  ± 5.82M     2.22G  … 2.23G           0 ( 0%)          +  0.4% ±  0.5%

Building the self-hosted compiler, with time zig build-exe -fno-emit-bin ... so that the progress is updating the terminal on a regular interval.

  • Old:
    • 0m4.115s
    • 0m4.216s
    • 0m4.221s
    • 0m4.227s
    • 0m4.234s
  • New (1% slower):
    • 0m4.231s
    • 0m4.240s
    • 0m4.271s
    • 0m4.339s
    • 0m4.340s

Building my music player application with zig build, with the project-local cache cleared. This displays a lot of progress information to the terminal. This data point accounts for many sub processes sending progress information over a pipe to the parent process for aggregation.

  • Old
    • 65.74s
    • 66.39s
    • 71.09s
  • New (1% faster)
    • 65.51s
    • 65.88s
    • 66.09s

Upgrade Guide

  • Pass around std.Progress.Node instead of *std.Progress.Node (no longer a pointer).
  • Remove calls to node.activate(). Those are not needed anymore.
  • It's now safe to pass short-lived strings to Node.start since the data will be copied.
  • It is illegal to initialize std.Progress more than once. Do that in main() and nowhere else.
  • Use std.debug.lockStdErr and std.debug.unlockStdErr before writing to stderr to integrate properly with std.Progress (std.debug.print already does this for you).
-    var progress = std.Progress{
-        ...
-    };
-    const root_node = progress.start("Test", test_fn_list.len);
+    const root_node = std.Progress.start(.{
+        .root_name = "Test",
+        .estimated_total_items = test_fn_list.len,
+    });

All the options to start are optional.

Finally, when spawning a child process, populate the progress_node field first:

child.progress_node = node;
try child.spawn();

Follow Up Work

closes #9633
closes #17821

@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library. release notes This PR should be mentioned in the release notes. labels May 24, 2024
@andrewrk andrewrk removed the request for review from thejoshwolfe May 25, 2024 20:19
@VisenDev
Copy link

Very nice! This would probably close this issue
#17821

@andrewrk
Copy link
Member Author

@squeek502 do you have any interest in helping out on the Windows side of things here? There are two challenges:

  1. The logic in computeRedraw works when the terminal supports ANSI escape codes, but it does not work for the case when we need to use kernel32 console function calls. The task here is to add a Windows-specific implementation for rendering this tree data structure to the terminal, as well as erasing the previous render.
  2. Child process integration. This implementation sets up a child with file descriptor 3 as a pipe for communicating progress to the parent on POSIX-like systems. On Windows we need to do some kind of equivalent.

I think the second one is the more difficult one, but even with only (1) solved, it would still be useful. There would not be visibility into child processes but it would be a good start.

@squeek502
Copy link
Collaborator

Sure, but it might be a few days before I can give it proper attention.

@squeek502
Copy link
Collaborator

squeek502 commented May 26, 2024

it might be a few days before I can give it proper attention

I lied; I worked on the Windows console API stuff and got a proof-of-concept working but it's a bit weird. The implementation is here:

squeek502@78a1093

(all the IPC stuff is commented out as a quick fix to get it to compile on Windows)

When the console supports ANSI escape codes, then ANSI escape codes are used and it works as expected:

progress_ansi.mp4

And when the console does not support ANSI escape codes, it falls back to the Windows console API:

(the flickering only happened while I was recording with OBS; there's very minimal flickering when not recording)

progress_consoleapi.mp4

Note that there is a ► character at the start of the progress when using the Windows API. This is because AFAICT there's no way to go to/find the start of a line if the line is being wrapped. That is, carriage returns only bring you to the start of the row, not the start of the line. So, the workaround is to write out a special character at the start of the progress and then take advantage of the ReadConsoleOutputCharacter API to find the marker and clear the screen starting from there. This makes it so that resizing/wrapping doesn't break the rendering.

The important part of the implementation is here:

https://github.com/squeek502/zig/blob/78a10937192376647b126216a41868f796823ebb/lib/std/Progress.zig#L524-L577

with some commented out alternate implementations below it that worked somewhat but had issues.

Also, when falling back to the Windows console API it currently assumes code page 437 (the default code page for consoles) is the active code page and uses that for the box drawing characters. Some TODOs about that are here:

https://github.com/squeek502/zig/blob/78a10937192376647b126216a41868f796823ebb/lib/std/Progress.zig#L471-L481

@andrewrk
Copy link
Member Author

Looks great!

New design ideas:
* One global instance, don't try to play nicely with other instances
  except via IPC.
* One process owns the terminal and the other processes communicate via
  IPC.
* Clear the whole terminal and use multiple lines.

What's implemented so far:
* Query the terminal for size.
* Register a SIGWINCH handler.
* Use a thread for redraws.

To be done:
* IPC
* Handling single threaded targets
* Porting to Windows
* More intelligent display of the progress tree rather than only using
  one line.
Move the mutex into the nodes

Track the whole tree instead of only recently activated node
This time, we preallocate a fixed set of nodes and have the user-visible
Node only be an index into them. This allows for lock-free management of
the node storage.

Only the parent indexes are stored, and the update thread makes a
serialized copy of the state before trying to compute children lists.

The update thread then walks the tree and outputs an entire tree of
progress rather than only one line.

There is a problem with clearing from the cursor to the end of the
screen when the cursor is at the bottom of the terminal.
the clear, save, restore thing doesn't work when the terminal is at the
bottom
so that the previous message can be used when the pipe is empty.

prevents flickering
and properly dup2 the file descriptor to make it handle the case when
other files are already open
also fix handling of BrokenPipe

also fix continuing wrong loop in error conditions
We cannot rely on host endianness because the parent or child process
may be executing inside QEMU.
The update thread was sometimes reading the special state and then
incorrectly getting 0 for the file descriptor, making it hang since it
tried to read from stdin.
`--color off` now disables the CLI progress bar both in the parent
process and the build runner process.
Before this fix, the dup2 of the progress pipe was clobbering the cwd
fd, causing the fchdir to return ENOTDIR in between fork() and exec().
7281cc1 did not solve the problem
because even when Node.index is none, it still counts as initializing
the global Progress object. Just use a normal zig optional, and all is
good.
Generates better machine code, particularly on ARM
lib/std/Progress.zig Outdated Show resolved Hide resolved
@squeek502
Copy link
Collaborator

Fixed up and rebased my Windows console API stuff on top of the latest commits here: squeek502@5b49900

(note: it's a different branch than the one in my previous comment)

Feel free to cherry pick the commit, or use it as reference and adapt it however you'd like.

@andrewrk
Copy link
Member Author

andrewrk commented May 28, 2024

Hmm, the windows implementation has the undesirable property that it clears the screen, including all scrollback. Ideally it would not do that. In fact I think it's better to show no progress than to destroy scrollback history.

Maybe I named that function poorly - it does not clear the full terminal, it only clears the progress data that has been written.

It's also not good to do the clear inside computeRedraw. That function is only supposed to create a buffer, which is then written later if a lock is obtained. Obtaining the lock may fail, in which case the buffer is silently dropped.

I think it would be better to spawn an entirely different update thread function for the case of windows console API.

* Merge a bunch of related state together into TerminalMode. Windows
  sometimes follows the same path as posix via ansi_escape_codes,
  sometimes not.
* Use a different thread entry point for Windows API but share the same
  entry point on Windows when the terminal is in ansi_escape_codes mode.
* Only clear the terminal when the stderr lock is held.
* Don't try to clear the terminal when nothing has been written yet.
* Don't try to clear the terminal in IPC mode.
* Fix size detection logic bug under error conditions.
@andrewrk
Copy link
Member Author

Hmm, the windows implementation has the undesirable property that it clears the screen, including all scrollback. Ideally it would not do that. In fact I think it's better to show no progress than to destroy scrollback history.

Might be an artifact from using wine. Haven't tried on my Windows laptop yet

@squeek502
Copy link
Collaborator

squeek502 commented May 28, 2024

Hmm, the windows implementation has the undesirable property that it clears the screen, including all scrollback

Can't reproduce this behavior on Windows. Not sure what Wine would be doing for this to be the behavior there, either, since the "clear" is just FillConsoleOutputCharacterW is being used to set some console cells to the space character.

In testing this, though, I have noticed that what I said in #15206 (comment) / #18692 (comment) appears to be wrong. With a togglevt.zig like this:

const std = @import("std");
const windows = std.os.windows;

pub fn main() void {
    const stderr = std.io.getStdErr();
    var mode: windows.DWORD = undefined;
    std.debug.assert(std.os.windows.kernel32.GetConsoleMode(stderr.handle, &mode) != 0);

    const new_mode = mode ^ windows.ENABLE_VIRTUAL_TERMINAL_PROCESSING;
    std.debug.assert(std.os.windows.kernel32.SetConsoleMode(stderr.handle, new_mode) != 0);

    const now_enabled = new_mode & windows.ENABLE_VIRTUAL_TERMINAL_PROCESSING != 0;
    std.debug.print("{s} virtual terminal processing for stderr: {}\n", .{ if (now_enabled) "enabled" else "disabled", stderr.handle });
}

then running it in Windows Terminal results in:

C:\Users\Ryan\Programming\Zig\tmp\progress> togglevt
disabled virtual terminal processing for stderr: anyopaque@290

C:\Users\Ryan\Programming\Zig\tmp\progress> togglevt
disabled virtual terminal processing for stderr: anyopaque@290

C:\Users\Ryan\Programming\Zig\tmp\progress> togglevt
disabled virtual terminal processing for stderr: anyopaque@290

(that is, the terminal is resetting the console mode to the default for each run; the mode is not preserved between runs which was what I thought happened [and is what happens with the code page])

With cmd.exe, the same thing happens except ENABLE_VIRTUAL_TERMINAL_PROCESSING is disabled by default:

C:\Users\Ryan\Programming\Zig\tmp\progress>togglevt
enabled virtual terminal processing for stderr: anyopaque@58

C:\Users\Ryan\Programming\Zig\tmp\progress>togglevt
enabled virtual terminal processing for stderr: anyopaque@58

So another option here might be to revisit those PRs (I like #18692 best) and try setting ENABLE_VIRTUAL_TERMINAL_PROCESSING within File.supportsAnsiEscapeCodes. If we go that route, we might be able to drop Windows console API support entirely since all the console API docs have a "use ANSI escape codes instead" notice on them and ENABLE_VIRTUAL_TERMINAL_PROCESSING was added in Windows 10 RS1 from June 2016.

@andrewrk andrewrk merged commit 963ffe9 into master May 28, 2024
10 checks passed
@andrewrk andrewrk deleted the progress branch May 28, 2024 23:27
chrboesch pushed a commit to ziglings-org/exercises that referenced this pull request May 29, 2024
divinetouch pushed a commit to divinetouch/zigling_practice that referenced this pull request Jun 26, 2024
jesseb34r pushed a commit to jesseb34r/ziglings that referenced this pull request Jul 16, 2024
hindenbug added a commit to hindenbug/ziglings-sol that referenced this pull request Jul 24, 2024
* origin/main: (88 commits)
  document -Drandom
  Update .woodpecker/eowyn.yaml
  Rename 'std.rand' to 'std.Random'
  attempt at implementing #113 "Add a way to do one random exercise"
  Add patch.
  046: Show usage of `.?` and hint towards new solution.
  Fixes because of a new Zig version, which changes some functions.
  Calling `split` is deprecated
  English fixes for 106_files.zig
  English fixes for 107_files2.zig
  fixed typo
  New Zig version
  Ignore .zig-cache for new zig versions
  Fixed the changes from reworking std.Progress. For details: ziglang/zig#20059
  Verbs agree with the head of the noun phrase, not the closest noun
  the Earth
  Update exercises/105_threading2.zig
  Nobody wants the long version of finding out if a variable is set. So switched to the short version with 'orelse'. ;)
  Add devcontainer.json
  Fix some typos
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. release notes This PR should be mentioned in the release notes. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancement: Better build-system compilation logging option to disable std.Progress.dont_print_on_dumb
5 participants