Overview of the Issue
VTOrc sometimes emits spurious log lines such as:
DiscoverInstance(10.10.10.10:3307) instance is nil in 0.002s (Backend: 0.002s, Instance: 0.000s), error=tablet alias is nil
I have looked at the code and traced how this happens. Suppose you initially have a vttablet with hostname h1, port p1, and alias a1. The VTOrc backend then has one row in vitess_tablet for this tablet holding all three values (h1, p1, a1), and one row in database_instance for this tablet holding h1 and p1.
Now suppose Kubernetes evicts this tablet and it restarts on a different machine. The tablet's alias stays the same, but its host and port change, say to h2 and p2.
When VTOrc refreshes its information from the topo-server, it sees the new record for the vttablet and tries to insert a row into vitess_tablet with the values h2, p2, and a1. Because there is a uniqueness constraint on alias, the insert replaces the existing row, so the original (h1, p1, a1) row is removed automatically. VTOrc also loads the MySQL information for this tablet and populates database_instance with the values h2 and p2. That table does not store the alias, so no uniqueness constraint is violated, and both rows, (h1, p1) and (h2, p2), are now in the table.
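The replace-versus-accumulate behavior described above can be reproduced with a small sketch. This is not VTOrc's code: it uses an in-memory SQLite database as a stand-in for the backend, the two tables are cut down to the columns mentioned in this issue, the key choices (alias for vitess_tablet, hostname and port for database_instance) are assumptions based on the description, and refresh_tablet and the port numbers 3306/3307 are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the VTOrc backend (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified schemas: only the columns discussed in this issue.
    CREATE TABLE vitess_tablet (
        alias    TEXT PRIMARY KEY,      -- uniqueness constraint on alias
        hostname TEXT NOT NULL,
        port     INTEGER NOT NULL
    );
    CREATE TABLE database_instance (
        hostname TEXT NOT NULL,
        port     INTEGER NOT NULL,
        PRIMARY KEY (hostname, port)    -- no alias column at all
    );
""")


def refresh_tablet(alias, hostname, port):
    """Hypothetical helper: record a tablet as seen in the topo-server."""
    conn.execute(
        "REPLACE INTO vitess_tablet (alias, hostname, port) VALUES (?, ?, ?)",
        (alias, hostname, port),
    )
    conn.execute(
        "REPLACE INTO database_instance (hostname, port) VALUES (?, ?)",
        (hostname, port),
    )


refresh_tablet("a1", "h1", 3306)  # tablet first seen on h1:p1
refresh_tablet("a1", "h2", 3307)  # same alias after rescheduling to h2:p2

tablets = conn.execute(
    "SELECT alias, hostname, port FROM vitess_tablet"
).fetchall()
instances = conn.execute(
    "SELECT hostname, port FROM database_instance ORDER BY hostname"
).fetchall()

print(tablets)    # [('a1', 'h2', 3307)]
print(instances)  # [('h1', 3306), ('h2', 3307)]
```

vitess_tablet ends up with a single row because the alias key forces a replace, while database_instance keeps the old (h1, 3306) row alongside the new one.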
Next, VTOrc runs the check that decides which tablets need to be forgotten. This check compares tablet aliases only, and since the alias of this tablet did not change, it concludes that there is nothing to forget.
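A minimal sketch of why an alias-only comparison misses the stale row. The data structures and function names here (aliases_to_forget, forget_missing_tablets) are hypothetical and only model the behavior described in this issue; they are not VTOrc's actual implementation.

```python
# Simplified in-memory stand-ins for the two backend tables (assumed shapes),
# in the state they are in after the tablet moved from h1:3306 to h2:3307.
vitess_tablet = {"a1": ("h2", 3307)}
database_instance = {("h1", 3306), ("h2", 3307)}


def aliases_to_forget(stored_aliases, topo_aliases):
    """Aliases known to the backend that no longer exist in the topo-server."""
    return set(stored_aliases) - set(topo_aliases)


def forget_missing_tablets(topo_aliases):
    """Hypothetical forget pass that is driven purely by tablet alias."""
    forgotten = aliases_to_forget(vitess_tablet.keys(), topo_aliases)
    for alias in forgotten:
        host_port = vitess_tablet.pop(alias)
        database_instance.discard(host_port)
    return forgotten


# The topo-server still lists alias a1, so nothing is forgotten.
forgotten = forget_missing_tablets({"a1"})

print(forgotten)                  # set()
print(sorted(database_instance))  # [('h1', 3306), ('h2', 3307)]
```

Since the comparison never looks at hostname or port, the (h1, 3306) entry survives the forget pass.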
Overall, this sequence of steps leaves a row in the database_instance table that should have been removed and that has no corresponding row in vitess_tablet. ReadOutdatedInstanceKeys picks up this record and tries to refresh its information, but the refresh fails with:
DiscoverInstance(10.10.10.10:3307) instance is nil in 0.002s (Backend: 0.002s, Instance: 0.000s), error=tablet alias is nil
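To make the orphaned row concrete, here is a diagnostic sketch that finds database_instance rows with no matching vitess_tablet row, and shows that an alias lookup for such a row comes back empty. It again uses in-memory SQLite with simplified, assumed schemas; find_orphaned_instances, lookup_alias, and the port numbers are hypothetical, and this is not a proposed fix or VTOrc's own query.

```python
import sqlite3

# In-memory SQLite with simplified tables, pre-populated with the state
# described above: one tablet row for a1 and two instance rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vitess_tablet (
        alias    TEXT PRIMARY KEY,
        hostname TEXT NOT NULL,
        port     INTEGER NOT NULL
    );
    CREATE TABLE database_instance (
        hostname TEXT NOT NULL,
        port     INTEGER NOT NULL,
        PRIMARY KEY (hostname, port)
    );
    INSERT INTO vitess_tablet VALUES ('a1', 'h2', 3307);
    INSERT INTO database_instance VALUES ('h1', 3306), ('h2', 3307);
""")


def find_orphaned_instances(conn):
    """Instances whose (hostname, port) matches no row in vitess_tablet."""
    return conn.execute("""
        SELECT di.hostname, di.port
        FROM database_instance AS di
        LEFT JOIN vitess_tablet AS vt
          ON vt.hostname = di.hostname AND vt.port = di.port
        WHERE vt.alias IS NULL
        ORDER BY di.hostname, di.port
    """).fetchall()


def lookup_alias(conn, hostname, port):
    """Return the tablet alias for an instance, or None if there is none."""
    row = conn.execute(
        "SELECT alias FROM vitess_tablet WHERE hostname = ? AND port = ?",
        (hostname, port),
    ).fetchone()
    return row[0] if row else None


orphans = find_orphaned_instances(conn)
print(orphans)                          # [('h1', 3306)]
print(lookup_alias(conn, "h1", 3306))   # None
print(lookup_alias(conn, "h2", 3307))   # a1
```

The empty alias lookup for the orphaned row corresponds to the "tablet alias is nil" condition in the log line.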
Reproduction Steps
Described in the description.
Binary Version
main
Operating System and Environment details
all
Log Fragments
No response