UI: Log streaming bug fix medley #7820

DingoEatingFuzz · 2020-04-28T15:53:38Z

This fixes a couple bugs @Gurpartap reported in #7770 as well as a couple more I found along the way.

Bug 1: Switching from stdout to stderr before the stdout request fails or times out causes an error on the stderr log stream despite successfully streaming the logs.

This was a fun race condition. Logs first attempt to connect to the client an allocation is running on before falling back on the server which should always work. When the client request fails, a flag is set to failover to the server. When that server failover flag is already set and another request fails, connecting to logs is deemed a total loss and the "Cannot fetch logs" error is shown.

If a user clicks stderr before stdout triggers the first failure flag then the stderr request will attempt to connect to the client. This causes both stdout and stderr to fail if the client can't be reached. The first failure flips the fallback flag and the second failure flips the total loss flag. However, when the fallback flag is flipped, it still causes a server request to be made, which is why log streaming continued to work.

The fix here was to capture the failover flag state at request time rather than response time. This way both racing client requests flip the same flag (or rather the first one flips it and the second is a no-op).

Bug 2: The "Cannot fetch log" error persists when switching between stdout/stderr

This one technically isn't a bug. If the error message is shown, it means no connection can be established. It only seems like a bug when Bug 1 occurs. However, I too dislike error messages that can't be dismissed, so I made it dismissable anyway.

Bug 3: Failing over to a server request after a client request takes longer than the client timeout threshold but still eventually responds causes two parallel stream requests, one which never closes.

This one was always known about but AbortControllers where still a relatively new thing when this was first written.

The task logger wisely has its own timeout for client requests to prevent a hanging request from triggering the fallback to making a server request. 9/10 (maybe even 99/100?) times this request will eventually timeout because the client can't be reached, but in the event the client does respond, that connection would remain open indefinitely eating up one of the precious few parallel connections a browser gets to a single host.

The fix here is to use an AbortController as mentioned. They now have broad support. The only browser we are concerned with that does not have them is IE11 which is ok in this instance since IE11 also doesn't support response streaming so it will use the poll logger anyway.

Bug 4: Clicking stdout/stderr when already on stdout/stderr stops the log streamer.

This one wasn't tricky by any stretch of the imagination. Just a small detail I overlooked. The fix was to return early if the mode being switched to was already the current mode.

Fixes #7770

backspace

Nice collection of concise fixes for some difficult problems; I have a couple of comments on tests but nothing major.

backspace · 2020-04-28T21:33:52Z

ui/tests/integration/task-log-test.js

@@ -255,5 +279,54 @@ module('Integration | Component | task log', function(hooks) {
      'Log request was later made to the server'
    );
    assert.ok(find('[data-test-connection-error]'), 'An error message is shown');
+
+    await click('[data-test-connection-error-dismiss]');
+    // await settled();


I guess this could be removed?

backspace · 2020-04-28T21:35:24Z

ui/tests/integration/task-log-test.js

+
+    run.later(() => {
+      click('[data-test-log-action="stdout"]');
+      run.later(run, run.cancelTimers, commonProps.interval * 6);


Might as well use just interval here too?

backspace · 2020-04-28T21:47:51Z

ui/tests/integration/task-log-test.js

+      click('[data-test-log-action="stderr"]');
+      run.later(run, run.cancelTimers, commonProps.interval * 5);
+    }, allowedConnectionTime / 2);
+


Nice test setup!

backspace · 2020-04-28T21:48:19Z

ui/tests/integration/task-log-test.js

+      'Server request for stderr'
+    );
+
+    assert.notOk(find('[data-test-connection-error]'), 'An error message is shown');


The explanation should include “not” I think, but maybe it’s not necessary at all?

You caught me copy/pasting. The description probably isn't necessary, but I was following the convention of the file.

…onably Now options can be provided without also having to remember to pass credentials. This is convenient for abort controller signals.

Typically a failover means that the client can't be reached. However, if the client does eventually return after the timeout period, the log will stream indefinitely. This fixes that using an API that wasn't broadly available at the time this was first written.

…nnection error This would happen because a no connection error happens after the second request fails, but that's because it's assumed the second request is to a server node. However, if a user clicks stderr fast enough, the first and second requests are both to the client node. This changes the logic to check if the request is to the server before deeming log streaming a total failure.

pySilver · 2020-06-04T21:44:54Z

Quick question – it this change is available in 0.11.2 ? Somehow I've started having "Cannot fetch logs" all over again after upgrading from 0.11.1 :(

DingoEatingFuzz · 2020-06-04T22:46:53Z

Yes, this is in 0.11.2.

Since you're still seeing log funkiness, can you open a new issue with more details?

github-actions · 2023-01-03T02:14:31Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

DingoEatingFuzz added the theme/ui label Apr 28, 2020

DingoEatingFuzz requested review from backspace and a team April 28, 2020 15:53

backspace approved these changes Apr 28, 2020

View reviewed changes

DingoEatingFuzz added 5 commits April 29, 2020 17:34

Always pass credential in fetch requests, but also treat options reas…

6af31fe

…onably Now options can be provided without also having to remember to pass credentials. This is convenient for abort controller signals.

Clicking stdout/stderr when already on that tab is now a noop

e186554

Make the no connection error on the logs page dismissable

9557475

DingoEatingFuzz force-pushed the b-ui/ui-log-races branch from 30a12a9 to 9557475 Compare April 30, 2020 00:38

DingoEatingFuzz merged commit 25a74ec into master Apr 30, 2020

DingoEatingFuzz deleted the b-ui/ui-log-races branch April 30, 2020 01:06

github-actions bot locked as resolved and limited conversation to collaborators Jan 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UI: Log streaming bug fix medley #7820

UI: Log streaming bug fix medley #7820

DingoEatingFuzz commented Apr 28, 2020

backspace left a comment

backspace Apr 28, 2020

backspace Apr 28, 2020

backspace Apr 28, 2020

backspace Apr 28, 2020

DingoEatingFuzz Apr 30, 2020

pySilver commented Jun 4, 2020

DingoEatingFuzz commented Jun 4, 2020

github-actions bot commented Jan 3, 2023

UI: Log streaming bug fix medley #7820

UI: Log streaming bug fix medley #7820

Conversation

DingoEatingFuzz commented Apr 28, 2020

Bug 1: Switching from stdout to stderr before the stdout request fails or times out causes an error on the stderr log stream despite successfully streaming the logs.

Bug 2: The "Cannot fetch log" error persists when switching between stdout/stderr

Bug 3: Failing over to a server request after a client request takes longer than the client timeout threshold but still eventually responds causes two parallel stream requests, one which never closes.

Bug 4: Clicking stdout/stderr when already on stdout/stderr stops the log streamer.

backspace left a comment

Choose a reason for hiding this comment

backspace Apr 28, 2020

Choose a reason for hiding this comment

backspace Apr 28, 2020

Choose a reason for hiding this comment

backspace Apr 28, 2020

Choose a reason for hiding this comment

backspace Apr 28, 2020

Choose a reason for hiding this comment

DingoEatingFuzz Apr 30, 2020

Choose a reason for hiding this comment

pySilver commented Jun 4, 2020

DingoEatingFuzz commented Jun 4, 2020

github-actions bot commented Jan 3, 2023