Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminal Bug: Race condition looses terminal data in shell integration (AsyncIterable consumer with POC) #237208

Open
KJ7LNW opened this issue Jan 3, 2025 · 8 comments
Assignees
Labels
api bug Issue identified by VS Code Team member as probable bug terminal-shell-integration Shell integration infrastructure, command decorations, etc.
Milestone

Comments

@KJ7LNW
Copy link

KJ7LNW commented Jan 3, 2025

Terminal Shell Integration Race Condition

Overview

There is a race condition in VSCode's terminal shell integration where the command finished barrier is released before AsyncIterable consumers receive the sequence data. The sequence of events causing this race is:

  1. Process data arrives containing OSC 633 ; D
  2. Data is written to xterm.js
  3. xterm.js parser immediately processes the sequence
  4. Command finished handler is called which releases the barrier
  5. Write callback completes and fires onData event
  6. AsyncIterable consumers receive the data

The key issue is that steps 3-4 happen before step 5-6. This means:

  • The barrier is released in step 4
  • But consumers don't receive the data until step 6
  • There is no guarantee consumers will see the sequence data before the barrier releases

The "633;D found" case explained below (correct behavior) only occurs when the write callback happens to complete before the barrier releases, which is not guaranteed by the current implementation.

Does this issue occur when all extensions are disabled?

This is a bug related to an extension API so at least the bug-test plugin below is needed to inspect. See proof of concept bug test below.

Proposed Solution

The race condition can be fixed by delaying the barrier release:

  • Wait for write callback completion before releasing the barrier
  • Add a promise to track write completion
  • Only release after both parse and write are done

This ensures consumers will always see the sequence before the barrier releases, making the behavior deterministic rather than depending on timing.

Environment

  • VS Code Version: Tested on both stable 1.96.2 and insiders 1.97.0
Version: 1.97.0-insider
Commit: 89f808979a5151bd91324e65d4f7ab1b62896983
Date: 2024-12-20T05:04:19.167Z
Electron: 32.2.6
ElectronBuildId: 10629634
Chromium: 128.0.6613.186
Node.js: 20.18.1
V8: 12.8.374.38-electron.0
OS: Linux x64 6.6.67
  • OS Version: Oracle Linux 9
  • Extensions: This issue requires an extension to test, as it relates to the terminal shell integration API

Steps to Reproduce

  1. Clone and run the test extension: https://github.com/KJ7LNW/vsce-test-terminal-integration
  2. Load the folder in VSCode and press F5
  3. Press "Run Command" repeatedly and quickly to watch the counts change
  4. Observe the counts - in a correct implementation, only the "found" count should increment

Test Results

Running the test extension shows:

OSC ] 633;D
    found:     3     // GOOD: Consumer saw sequence before barrier released (expected behavior)
    not found: 12    // BUG: Barrier released before consumer saw sequence (race condition)
    neither:   1     // Test error or timeout
(Run 15)

Example of a "found" case (GOOD - expected behavior where consumer sees sequence before barrier, contains \x1B]633;D):

'echo a\r\n' +
'\x1B[?2004l\r\x1B]777;preexec\x1B\\\x1B]633;E;echo a;23d6ea60-059f-428c-9424-a497b533ee7a\x07\x1B]633;C\x07a\r\n' +
'\x1B]777;notify;Command completed;echo a\x1B\\\x1B]777;precmd\x1B\\\x1B]0;ewheeler@edesktop:~\x1B\\\x1B]7;file://edesktop.ewi/home/ewheeler\x1B\\\x1B]633;D;0\x07\x1B]633;P;Cwd=/home/ewheeler\x07'

Example of a "not found" case (BUG - race condition where barrier releases too early, does NOT contain \x1B]633;D):

'echo a\r\n' +
'\x1B[?2004l\r\x1B]777;preexec\x1B\\\x1B]633;E;echo a;8680055d-1545-4c64-859d-3c4ff3a0c412\x07\x1B]633;C\x07a\r\n'

Expected Behavior

The barrier should only be released after AsyncIterable consumers have received and processed the command finished sequence. The "found" count should increment consistently, indicating consumers reliably see the sequence before the barrier releases. The "not found" count should always be zero in a correct implementation.

Actual Behavior

The barrier is frequently released before consumers receive the sequence data:

  • Only 20% of runs (3/15) had the correct behavior (found case)
  • 80% of runs (12/15) exhibited the race condition (not found case)
  • The high rate of "not found" cases demonstrates this is a real race condition affecting most command executions

Technical Details

The race condition can also be verified by:

  1. Adding logging to the write callback in TerminalInstance._writeProcessData
  2. Adding logging to the command finished handler in ShellIntegrationAddon
  3. Observing that the command finished handler logs appear before write callback logs

This confirms the sequence is parsed and handled before the write completes and reaches consumers.

Alternative Solutions

Event Ordering:

  • Modify xterm.js to guarantee write callback executes before parser
  • Would require changes to xterm.js event handling architecture
  • Most invasive but provides strongest guarantees

However, the delay barrier release approach is simpler and more robust, especially given the empirical evidence showing the high frequency of the race condition in practice.

See the full technical analysis at: https://github.com/KJ7LNW/vsce-test-terminal-integration

@Tyriar
Copy link
Member

Tyriar commented Jan 6, 2025

To clarify, this is about 633;D not reaching it to the extension sometimes? I wouldn't have thought this would matter to a consumer of the API, since:

  • 633;D is an internal detail, hard coding the sequence in an extension is not ideal.
  • TerminalShellExecution.read() closes when we see that sequence, an extension shouldn't care and should generally ignore any unrecognized sequences.
  • The exit code is included in TerminalShellExecutionEndEvent:
    /**
    * The exit code reported by the shell.
    *
    * Note that `undefined` means the shell either did not report an exit code (ie. the shell
    * integration script is misbehaving) or the shell reported a command started before the command
    * finished (eg. a sub-shell was opened). Generally this should not happen, depending on the use
    * case, it may be best to treat this as a failure.
    *
    * @example
    * const execution = shellIntegration.executeCommand({
    * command: 'echo',
    * args: ['Hello world']
    * });
    * window.onDidEndTerminalShellExecution(event => {
    * if (event.execution === execution) {
    * if (event.exitCode === undefined) {
    * console.log('Command finished but exit code is unknown');
    * } else if (event.exitCode === 0) {
    * console.log('Command succeeded');
    * } else {
    * console.log('Command failed');
    * }
    * }
    * });
    */
    readonly exitCode: number | undefined;

Does this actually cause a problem in cline or is it just an unexpected inconsistency?

@Tyriar Tyriar added the info-needed Issue requires more information from poster label Jan 6, 2025
@KJ7LNW
Copy link
Author

KJ7LNW commented Jan 7, 2025

To clarify, this is about 633;D not reaching it to the extension sometimes?

yes

I wouldn't have thought this would matter to a consumer of the API, since:
...
Does this actually cause a problem in cline

It has caused problems in production, see below.

or is it just an unexpected inconsistency?

It is definitely an unexpected inconsistency; IMHO, extensions should either always get it or never get it.

As implemented in the currently released version of Cline, 633;D is not handled and library tools like stripAnsi do not strip them. That means sometimes the output contains a stray \x1b]633;D;.... When I first sat down to clean up Cline's execute_command output, initial testing showed that 633;D was always included. After testing for a while I was confused as to why sometimes the regex pattern would match and sometimes it not, which caused this investigation.

Given the concern for consistency, it would be nice to guarantee that either extensions always get ;D, or never get it. The consistency also raises concerns, as follows:

  1. Does vscode currently guarantee that the command output stream is flushed and the only thing that could possibly be lost is the \x1b]633;D closing sequence?
  2. Is it possible that the command could lose the tail of its output in the last chunk because of the race?
    1. For example is it possible for a chunk to contain <foo>\x1b]633;D such that <foo> will be lost, especially when <foo> is it very large, a single line, and unbuffered?
    2. We should also consider that, while it is very likely that the output of the closing sequence will happen in a single read() system call, this may not be guaranteed and may be necessary to support the possibility of terminal escape sequences split across multiple chunks. eg: <large-unbuffered-one-line-data>\x1b]633 followed by ;D.
    3. While stdout/err is probably flushed on program exit, and since __vsc_command_complete calls printf '\e]633;D\a' after the command completes, the system call to read() from the terminal's file descriptor could possibly receive the data has chunks in the same read, separate reads, or even split at an unexpected buffer boundary: if the terminal integration code is already handling partial command output that may appear in same read() that contained ;633D, then my concern in this bullet item is moot, but I would like to verify that since we are discussing the topic.

Cline's execute_command output has been unreliable for many releases, and I am trying to clean up Cline's exec code with something solid. For the moment, the workaround for both escape sequences in the testing code that has not yet released looks like this:

let match1 = output.match(/\x1b\]633;C\x07(.*?)\x1b\]633;D(?:;(\d+))?/s)?.[1]
let match2 = output.match(/.*\x1b\]633;C\x07(.*)$/s)?.[1]

let match = match1 !== undefined ? match1 : 
           (match2 !== undefined ? match2 : undefined)

There are probably more efficient ways to match this, but implementation aside, you can see that we have some special cases implementing workarounds for whether ;D exists or not. The special cases raised concern for other possible command output loss and led me to make an entire test plugin just to investigate the issue and open this ticket.

TL;DR:

Ultimately I do not mind the special casing, as we are going to need it anyway to be compatible with existing versions of vscode that are already deployed:

My biggest concerns are those listed above in the interest of guaranteeing that no command output text is lost.

@Tyriar
Copy link
Member

Tyriar commented Jan 7, 2025

I suppose it's better to be consistent. Here are some code pointers if anyone wants to take a stab at fixing it:

Where read is created (exthost process):

class ShellExecutionDataStream extends Disposable {
private _barrier: Barrier | undefined;
private _emitters: AsyncIterableEmitter<string>[] = [];
createIterable(): AsyncIterable<string> {
if (!this._barrier) {
this._barrier = new Barrier();
}
const barrier = this._barrier;
const iterable = new AsyncIterableObject<string>(async emitter => {
this._emitters.push(emitter);
await barrier.wait();
});
return iterable;
}
emitData(data: string): void {
for (const emitter of this._emitters) {
emitter.emitOne(data);
}
}
endExecution(): void {
this._barrier?.open();
this._barrier = undefined;
}
}

Where data is send to read (renderer process):

instanceDataListeners.set(instanceId, Event.accumulate(e.instance.onData, 50, this._store)(events => this._proxy.$shellExecutionData(instanceId, events.join(''))));

I'm guessing the problem is coming from the Event.accumulate here which is essentially a 50ms debounce.

@Tyriar Tyriar added bug Issue identified by VS Code Team member as probable bug help wanted Issues identified as good community contribution opportunities api terminal-shell-integration Shell integration infrastructure, command decorations, etc. labels Jan 7, 2025
@Tyriar Tyriar added this to the Backlog milestone Jan 7, 2025
@KJ7LNW
Copy link
Author

KJ7LNW commented Jan 7, 2025

In the current vscode implementation, does the implementation guarantee that the command output stream is flushed and the only thing that could possibly be lost is the \x1b]633;D closing sequence, or is there a risk of losing output data also?

@Tyriar
Copy link
Member

Tyriar commented Jan 8, 2025

@KJ7LNW can't say without investigating further, this is fairly low priority for me

@vs-code-engineering vs-code-engineering bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 16, 2025
@microsoft microsoft deleted a comment from vs-code-engineering bot Jan 16, 2025
@Tyriar Tyriar reopened this Jan 16, 2025
@Tyriar Tyriar removed the info-needed Issue requires more information from poster label Jan 16, 2025
@KJ7LNW
Copy link
Author

KJ7LNW commented Jan 16, 2025

I was able to confirm that data is lost. You can verify that using this test plugin with the command set to echo a and initial prompt command set to #: the a is never captured in the data stream. However, if you set prompt command to sleep 0.050 then succeeds because \x1b]633;D is delayed long enough that the terminal output read completes before the barrier is released:

https://github.com/KJ7LNW/vsce-test-terminal-integration

Without sleep (fails)

Fails with or without the shell integration box checked

Image

With 50ms sleep (succeeds)

Succeeds with or without the shell integration box checked

Image

PROMPT_COMMAND is set here (abbreviated excerpts):

            const env: { [key: string]: string } = {};
            if (promptCommand.trim()) {
                env.PROMPT_COMMAND = promptCommand;
            }
            this.terminal = vscode.window.createTerminal({
                name: 'Command Runner',
                env
            });

            // stream is picked up in the first line of onDidStartTerminalShellExecution as:
            // const stream = e.execution.read();

            // and then output is displayed in the UI with and without parsing
            let output = '';
            for await (const data of stream) {
                output += data;
            }

See https://github.com/KJ7LNW/vsce-test-terminal-integration/blob/380715847c334a6acab9c9043512d1fc16fb2533/src/terminal/terminal-handler.ts#L198 .

@KJ7LNW KJ7LNW changed the title Terminal: Race condition between shell integration sequence parsing and AsyncIterable consumers with POC Terminal Bug: Race condition looses terminal data in shell integration (AsyncIterable consumer with POC) Jan 16, 2025
@KJ7LNW
Copy link
Author

KJ7LNW commented Jan 16, 2025

Maybe this issue should be escalated now that we know it is losing data in some cases.

Note that data is lost in both terminal integration via terminal.shellIntegration.executeCommand(command) and also via terminal.sendText(command) by toggling the "Use shell integration" check box in the proof of concept.

@KJ7LNW
Copy link
Author

KJ7LNW commented Jan 17, 2025

Here is more information based on the testing I have done using the test extension linked above.

Note that the following test cases demonstrate behaviors that are reliably reproducible on my system, but due to the timing-sensitive nature of these race conditions, actual behavior may vary slightly on different systems or under different load conditions.

  • Race Condition Test Cases
    1. Testing with 'echo a' and varying PROMPT_COMMAND:
      • Command: echo a with PROMPT_COMMAND=true
        • Fails: output is lost
        • Bash builtin executes too quickly
      • Command: echo a with PROMPT_COMMAND=/bin/true
        • Works: output is preserved
        • Binary provides sufficient timing delay
    2. Testing with 'echo a' and varying command path:
      • Command: echo a with PROMPT_COMMAND=#
        • Fails: output is lost
      • Command: /bin/echo a with PROMPT_COMMAND=#
        • Works: output is preserved
        • Binary provides sufficient timing delay
    3. Testing with empty string and varying command path:
      • Command: echo "" with PROMPT_COMMAND=#
        • Fails: CR/LF output is lost
        • Bash builtin executes too quickly
      • Command: /bin/echo "" with PROMPT_COMMAND=#
        • Works: CR/LF is preserved
        • Binary provides sufficient timing delay

@Tyriar Tyriar modified the milestones: Backlog, February 2025 Jan 27, 2025
@Tyriar Tyriar removed the help wanted Issues identified as good community contribution opportunities label Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api bug Issue identified by VS Code Team member as probable bug terminal-shell-integration Shell integration infrastructure, command decorations, etc.
Projects
None yet
Development

No branches or pull requests

3 participants
@Tyriar @KJ7LNW and others