Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIFO order is not respected in DefaultEndpoint#notifyChannelActive() #2598

Closed
leisurelyrcxf opened this issue Jan 17, 2024 · 5 comments · Fixed by #2597
Closed

FIFO order is not respected in DefaultEndpoint#notifyChannelActive() #2598

leisurelyrcxf opened this issue Jan 17, 2024 · 5 comments · Fixed by #2597
Labels
type: bug A general bug
Milestone

Comments

@leisurelyrcxf
Copy link

leisurelyrcxf commented Jan 17, 2024

Bug Report

FIFO order is not respected in DefaultEndpoint#notifyChannelActive()

Current Behavior

A later write from the same user thread may be sent first

@Test
    void writeShouldGuaranteeFIFOOrder() {
        sut.write(Collections.singletonList(new Command<>(CommandType.SELECT, new StatusOutput<>(StringCodec.UTF8))));

        sut.registerConnectionWatchdog(connectionWatchdog);
        doAnswer(i -> sut.write(new Command<>(CommandType.AUTH, new StatusOutput<>(StringCodec.UTF8)))).when(connectionWatchdog)
                .arm();
        when(channel.isActive()).thenReturn(true);

        sut.notifyChannelActive(channel);

        DefaultChannelPromise promise = new DefaultChannelPromise(channel, ImmediateEventExecutor.INSTANCE);

        when(channel.writeAndFlush(any())).thenAnswer(invocation -> {
            if (invocation.getArguments()[0] instanceof RedisCommand) {
                queue.add((RedisCommand) invocation.getArguments()[0]);
            }

            if (invocation.getArguments()[0] instanceof Collection) {
                queue.addAll((Collection) invocation.getArguments()[0]);
            }
            return promise;
        });

        assertThat(queue).hasSize(2).first().hasFieldOrPropertyWithValue("type", CommandType.SELECT);
        assertThat(queue).hasSize(2).last().hasFieldOrPropertyWithValue("type", CommandType.AUTH);
    }
// your stack trace here;

Input Code

Input Code
// your code here;

Expected behavior/code

assertThat(queue).hasSize(2).first().hasFieldOrPropertyWithValue("type", CommandType.SELECT);
assertThat(queue).hasSize(2).last().hasFieldOrPropertyWithValue("type", CommandType.AUTH);

Environment

  • Lettuce version(s): [e.g. 5.0.0.RELEASE, 4.2.2.Final]
  • Redis version: [e.g. 4.0.9]

Possible Solution

Additional context

@okg-cxf
Copy link
Contributor

okg-cxf commented Jan 19, 2024

propose a fix: #2597

@mp911de
Copy link
Collaborator

mp911de commented Jan 22, 2024

Good catch. How did you encounter the issue?

Looking at the PR, it generally makes sense to hold a volatile Channel in a local variable to avoid any races. Generally speaking, while all notification methods should be invoked sequentially, there's nothing that would prevent the code from being called concurrently.

@okg-cxf
Copy link
Contributor

okg-cxf commented Jan 23, 2024

Good catch. How did you encounter the issue?

Looking at the PR, it generally makes sense to hold a volatile Channel in a local variable to avoid any races. Generally speaking, while all notification methods should be invoked sequentially, there's nothing that would prevent the code from being called concurrently.

Hi @mp911de thank you for the reply! I found this issue by reading the source code.

The public <K, V, T> RedisCommand<K, V, T> write(RedisCommand<K, V, T> command) method is called in user threads and can be called concurrently with the netty eventloop io thread. The out-of-order could happen for the following execution order (supposing initially channel = null).

  1. User Thread: write(command1), will be queued to disconnectedBuffer since channel == null;
  2. Netty IO thread: notifyChannelActive()# this.channel = channel; (line 440);
  3. User Thread: write(command2), now he sees the channel is set, and it grabs the shared lock and uses the channel to send command2 immediately;
  4. Netty IO thread: notifyChannelActive()# flushCommands() (line 477), this will flush command1.

Also the notify event methods may be also called concurrently in the scenario where a connection disconnects and a new connection is established since netty by default uses round-robin to determine which event loop thread a connection should bind. But in lettuce, this should never happen since the connecting action is triggered by the channelInactive event, if I understand correctly.

@okg-cxf
Copy link
Contributor

okg-cxf commented Feb 21, 2024

@mp911de any updates?

@tishun tishun added for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Jul 17, 2024
@tishun tishun removed for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Oct 15, 2024
@tishun
Copy link
Collaborator

tishun commented Oct 15, 2024

I've spent some time looking at the test case you've prepared.
I think it is a valid (albeit corner case) scenario, that might result in commands being executed in the wrong order.

Let's continue the discussion in #2597

@tishun tishun added the type: bug A general bug label Oct 15, 2024
@tishun tishun added this to the 7.0.0.RELEASE milestone Oct 15, 2024
@tishun tishun modified the milestones: 7.0.0.RELEASE, 6.6.0.RELEASE Dec 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants