
Track replication upstream.idle when calling on replica #453

Closed
grafin opened this issue Dec 6, 2023 · 9 comments
@grafin
Member

grafin commented Dec 6, 2023

Currently any read call routed to a replica is executed regardless of whether the replica is connected to any other instance. In my case:

  1. I have a shard with 3 instances: one (A) being the master (box.info.ro == false), the others (B) and (C) being replicas (box.info.ro == true).
  2. For a reason unrelated to vshard, all appliers on replica (B) broke down and could not recover without a manual instance restart. All box.info.replication[n].upstream.status values are 'disconnected'. (N.B.: don't use DML in triggers on ro instances if the row is received via the applier.)
  3. All reads routed to replica (B) returned very old data (instance (B) had been broken for several hours).

We could prevent such dirty reads using already available data, as stated in our docs:

Therefore, in a healthy replication setup, idle should never exceed replication_timeout: if it does, either the replication is lagging seriously behind, because the master is running ahead of the replica, or the network link between the instances is down.

It would be great if we could prevent those dirty reads by not routing any requests to replicas (ro instances) which are disconnected from everyone, or maybe even give the ability to configure some max_replication_idle or max_replication_lag and return errors when trying to process a request on a replica whose current max(replication[n].upstream.idle/lag) exceeds the configured value.
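A minimal sketch of the proposed check, assuming hypothetical max_replication_idle / max_replication_lag values; the function name and thresholds are made up and this is not vshard API, only the box.info fields are real:

    -- Hypothetical freshness check on a storage; not vshard code.
    local function replica_is_fresh(max_idle, max_lag)
        if not box.info.ro then
            return true -- the master serves its own data
        end
        for _, r in pairs(box.info.replication) do
            local u = r.upstream
            if u ~= nil then
                if u.status ~= 'follow' then
                    return false, 'upstream to ' .. r.uuid .. ' is ' .. tostring(u.status)
                end
                if u.idle > max_idle or (u.lag or 0) > max_lag then
                    return false, 'upstream to ' .. r.uuid .. ' exceeds the idle/lag limit'
                end
            end
        end
        return true
    end

    -- Usage before serving a read on a replica (thresholds are examples):
    -- local ok, err = replica_is_fresh(30, 30)
    -- if not ok then error(err) end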

@grafin grafin added the feature A new functionality label Dec 6, 2023
@Totktonada
Member

Can't we lean just on box.info.status?

NB: My old findings regarding replication monitoring: tarantool/doc#2604.

@sergepetrenko
Contributor

Can't we lean just on box.info.status?

I assume you meant upstream.status?
Still, no. upstream.status may be 'follow', but the replica may still have a huge lag to this upstream, for example when the master performs a batch of changes which the replica can't apply as fast as the master did.

@grafin
Member Author

grafin commented Dec 8, 2023

While fixing the initial problem, I got the following upstream info in my case:

id: 1
    uuid: bbbbbbbb-bbbb-4000-b000-000000000001
    lsn: 60719
    upstream:
      peer: admin@localhost:48809
      lag: 0.0002281665802002
      status: stopped
      idle: 169.65149075794
      message: Can't modify data on a read-only instance - box.cfg.read_only is true

So monitoring upstream.lag doesn't solve the issue, and as @sergepetrenko mentioned, upstream.status is not very effective either. It looks like it is best to check upstream.idle.
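For monitoring, the relevant number is the worst-case idle across all upstreams; in the 'stopped' state above the lag stays tiny while idle keeps growing. A sketch, not vshard code:

    -- Largest upstream.idle on this instance; grows while replication is broken.
    local function max_upstream_idle()
        local worst = 0
        for _, r in pairs(box.info.replication) do
            if r.upstream ~= nil and r.upstream.idle > worst then
                worst = r.upstream.idle
            end
        end
        return worst
    end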

@grafin grafin changed the title from "Track replication status/lag when calling on replica" to "Track replication upstream.idle when calling on replica" Dec 8, 2023
@Totktonada
Member

Totktonada commented Dec 8, 2023

Can't we lean just on box.info.status?

I assume you meant upstream.status?

No, but I got the answer :)

In my imagination, tarantool should have some max_data_lag option to go into a 'stale data' state in such cases and broadcast the new state to clients.

For now, we are going to solve the same monitoring task in each client at the application level.

@sergepetrenko
Contributor

sergepetrenko commented Dec 11, 2023

Looks like it is best to check upstream.idle.

It should be a combination of both lag and idle, whichever is greater. Or, as @Totktonada suggests, we may look at box.info.status == 'orphan', but then we have to make replicas go orphan if they see they stopped catching up with the master (this is a breaking change).

Or it can be box.info.status == 'stale_data'.
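The "whichever is greater" idea can be expressed as follows (illustrative only; the threshold value is an example):

    -- Consider an upstream stale if either metric exceeds the limit.
    local function upstream_delay(u)
        return math.max(u.lag or 0, u.idle or 0)
    end
    -- e.g. upstream_delay(box.info.replication[1].upstream) > 30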

@Serpentian
Contributor

After discussing with @sergepetrenko, it looks like we should go with the following condition for an unhealthy replica: if the status of the upstream to the master is not 'follow', or upstream.lag is > N (where N will be a constant for now), then we consider the replica unhealthy. If somebody asks, we may introduce an additional callback to mark a replica as unhealthy, e.g. not to route requests to a replica if the cartridge config has not been properly applied.

I see this feature as an automatic (temporary) lowering of the replica's priority on the router if the replica is not healthy. We'll raise the priority back once the node considers itself healthy again. No additional cfg parameters for now. @Gerold103, your opinion on this?
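A sketch of that condition as it could be evaluated on a storage; the master_id lookup and the constant N are assumptions, not the actual vshard implementation:

    local N = 30 -- seconds, a constant for now
    -- Unhealthy if the upstream to the master is not following or lags too much.
    local function replica_is_unhealthy(master_id)
        local r = box.info.replication[master_id]
        local u = r and r.upstream
        if u == nil then
            return true -- no upstream to the master at all
        end
        return u.status ~= 'follow' or (u.lag or 0) > N
    end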

@R-omk
Contributor

R-omk commented Sep 20, 2024

What is "priority"? If it's something like the probability of sending a request to this host, then I would prefer that probability to be zero if we agreed that too-laggy replicas are not suitable for processing requests at all. Otherwise we will receive "flashing" data depending on which replica the request lands on.

@Serpentian
Contributor

Serpentian commented Sep 20, 2024

what is "priority"?

First we try to make a request to the most prioritized replica and then to the other ones, if the most prioritized one failed. If you don't want a replica to serve any requests at all, you can just disable the storage. But most users would prefer the request not to fail, so we go to the unhealthy replica if requests to the other instances fail.

Here is the conflict between consistency and availability.

@Gerold103
Collaborator

Sounds all good to me.

@Serpentian Serpentian self-assigned this Sep 24, 2024
@sergepetrenko sergepetrenko added bug Something isn't working and removed feature A new functionality labels Dec 3, 2024
Serpentian added a commit to Serpentian/vshard that referenced this issue Dec 3, 2024
Before this patch the router didn't take the state of the storage's
box.info.replication into account when routing requests to it.
From now on the router automatically lowers the priority of a replica
when it supposes that the connection from the master to the replica
is dead (bad upstream status or idle > 30 sec) or too slow (lag > 30 sec).

We also change REPLICA_NOACTIVITY_TIMEOUT from 5 minutes to 30 seconds.
This is needed to speed up how quickly a replica notices the master's
change. Before the patch a non-master never knew where the master
currently is. Now, since we try to check the status of the master's
upstream, we need to find this master in service_info via conn_manager.
Since the replica doesn't make any further requests to the master, the
connection is collected by conn_manager in collect_idle_conns after
30 seconds. Then the router's failover calls service_info one more time
and the non-master locates the master, which may have already changed.

This patch increases the consistency of read requests and decreases
the probability of reading stale data.

Closes tarantool#453
Closes tarantool#487

NO_DOC=<bugfix>
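The router-side effect can be pictured roughly as the following ordering; this is purely illustrative and not vshard's actual failover code:

    -- Pick the replica for a read: healthy replicas first, then by configured
    -- priority (a lower number means a higher priority).
    local function pick_replica(replicas)
        table.sort(replicas, function(a, b)
            if a.is_healthy ~= b.is_healthy then
                return a.is_healthy
            end
            return a.priority < b.priority
        end)
        return replicas[1]
    end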
Serpentian added a commit to Serpentian/vshard that referenced this issue Dec 3, 2024
Gerold103 pushed a commit that referenced this issue Dec 3, 2024