clickhouse-keeper + DROP TABLE ... ON CLUSTER ... SYNC, doesn't clean replica as expected #416
Comments
Which clickhouse-backup version do you use? Could you share it?

Atomic databases keep dropped tables for 8 minutes.
Backup Version and print config (buckets redacted)
For info about how my cluster is set up:

The only major difference is that we are using clickhouse-keeper instead of regular ZooKeeper.
Did you run clickhouse-keeper as a standalone process or embedded inside clickhouse-server?
We use DROP TABLE ... SYNC for atomic databases to avoid this type of waiting.
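For context, a minimal sketch of the difference (`mydb.mytable` is a placeholder, not a table from this issue):

```bash
# Plain DROP on an Atomic database defers the actual removal of the table's
# data directory (~8 minutes by default, database_atomic_delay_before_drop_table_sec)
clickhouse-client -q "DROP TABLE IF EXISTS mydb.mytable"

# DROP ... SYNC waits until the table is fully removed, so nothing lingers
clickhouse-client -q "DROP TABLE IF EXISTS mydb.mytable SYNC"
```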
It is embedded currently. Also, the node I "took down" was not one of the 3 with ZooKeeper (only my first 3 nodes have clickhouse-keeper enabled). Again, not sure if that matters here, but I wanted to make sure it was noted.
When I run …
We don't use … We didn't test …
So, you should use https://github.com/outbrain/zookeepercli to clean /clickhouse/tables/c73a3570-0a95-4c50-b769-ba46d21df393/1/replicas/ip-10-22-180-149.ec2.internal; after that, run it again and share the results.
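As a sketch of that cleanup: the system.zookeeper query below uses the path from this thread, while the ZooKeeper address and the zookeepercli delete invocation are assumptions on my part — check `zookeepercli -help` for the exact command set in your build.

```bash
# List which replicas are still registered under the table's Keeper path
clickhouse-client -q "SELECT name FROM system.zookeeper
  WHERE path = '/clickhouse/tables/c73a3570-0a95-4c50-b769-ba46d21df393/1/replicas'"

# Remove the stale replica znode (server address and delete syntax assumed;
# the replica znode has children, so a recursive delete may be needed)
zookeepercli --servers 127.0.0.1:2181 -c delete \
  "/clickhouse/tables/c73a3570-0a95-4c50-b769-ba46d21df393/1/replicas/ip-10-22-180-149.ec2.internal"
```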
So obviously the issue is these lines:
It must clean the ZooKeeper / ClickHouse Keeper path. Could you run clickhouse-client -q "SYSTEM DROP REPLICA 'ip-10-22-180-149.ec2.internal'" and then LOG_LEVEL=debug clickhouse-backup restore --rm 2022-03-23T12-00-43 and share the results?
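Spelled out as a shell sequence (both commands are taken from the comment above; nothing new is assumed beyond running them on a surviving node):

```bash
# Drop the dead replica's metadata from (Zoo)Keeper
clickhouse-client -q "SYSTEM DROP REPLICA 'ip-10-22-180-149.ec2.internal'"

# Re-run the restore with debug logging, dropping existing tables first
LOG_LEVEL=debug clickhouse-backup restore --rm 2022-03-23T12-00-43
```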
I ran it from both the 'current' node I'm trying to restore and another one; neither seemed to do anything, and I still have the replica in ZooKeeper.
I think I might be approaching this all wrong, so please bear with me while I explain my use case. I'm basically trying to replicate total loss of a node. Say I lost an entire instance (EC2 in my case); I'm trying to figure out how to 'recover' that node and get it happy again, ideally with clickhouse-backup. I simulated this by logging into my node and wiping it clean. By doing this, there are no existing databases; it's as if it is brand new. The other 2 replicas are up and happy.

The first step, as I can see it, is to get my schema correct. When I restore using …

Once the schema is up, data restoration seems like the easiest part. Either I can stream from an existing node or do the data restore from clickhouse-backup. However, I think it's the schema generation that's throwing me off. From what I can see, I have the following options:

Am I correct here? Is there something obvious that I'm missing?
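For what it's worth, a minimal sketch of that node-rebuild flow (schema first, then data); the --schema/--data flags of `clickhouse-backup restore` are assumed here, and the backup name is the one mentioned earlier in the thread:

```bash
# 1. On the rebuilt (empty) node, restore only the table definitions
clickhouse-backup restore --schema 2022-03-23T12-00-43

# 2a. Restore the data from the backup ...
clickhouse-backup restore --data 2022-03-23T12-00-43

# 2b. ... or skip this and let the Replicated* tables re-sync their data
#     from the two healthy replicas once the schema exists
```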
Why do you do it? Instead of just running that, try to use SYSTEM DROP REPLICA 'ip-10-22-180-149.ec2.internal' FROM ZKPATH '/clickhouse/tables/0ab5ed25-2c59-4158-a316-c4b93c1c40d1/2/replicas';
then check with SELECT * FROM system.zookeeper WHERE path = '/clickhouse/tables/0d933126-89c8-4c6b-af2b-5fcdf27e4419/2/replicas'; and after it …
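Put together as a check → drop → re-check sequence, using the statements quoted above verbatim (the two UUID paths are the ones from this thread):

```bash
# See which replica znodes currently exist under the table's Keeper path
clickhouse-client -q "SELECT * FROM system.zookeeper
  WHERE path = '/clickhouse/tables/0d933126-89c8-4c6b-af2b-5fcdf27e4419/2/replicas'"

# Drop the dead replica by explicit ZooKeeper path
clickhouse-client -q "SYSTEM DROP REPLICA 'ip-10-22-180-149.ec2.internal'
  FROM ZKPATH '/clickhouse/tables/0ab5ed25-2c59-4158-a316-c4b93c1c40d1/2/replicas'"

# Re-run the SELECT above; the dropped replica should no longer be listed
```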
Ah! This finally worked. My mistake yesterday when I went to run it was that I did it from a node on a different shard (not one in the same shard). Running SYSTEM DROP REPLICA now properly saw and cleaned up all the entries in ZooKeeper. Example:
And the restore worked beautifully. Thanks for the trouble and for helping me sort this all out!
OK. We still need to figure out why DROP TABLE IF EXISTS tutorial.hits_local ON CLUSTER 'prod_cluster' NO DELAY didn't clean replicas in clickhouse-keeper.
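One way to verify that, as a sketch (the zookeeper_path lookup via system.replicas is an addition of mine, not from the thread; substitute the captured path into the last query):

```bash
# Note the table's Keeper path before dropping it
clickhouse-client -q "SELECT zookeeper_path FROM system.replicas
  WHERE database = 'tutorial' AND table = 'hits_local'"

# Drop the table across the cluster and wait for completion
clickhouse-client -q "DROP TABLE IF EXISTS tutorial.hits_local ON CLUSTER 'prod_cluster' NO DELAY"

# Afterwards the replicas node under that path should be empty (or gone);
# replace <zookeeper_path> with the value captured above
clickhouse-client -q "SELECT name FROM system.zookeeper WHERE path = '<zookeeper_path>/replicas'"
```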
That I can't answer. I can certainly test some stuff out if you can point me in the right direction on where to look. I have an easy setup that I've been using to create/bootstrap clusters. Maybe it's just a timing/race condition?
Is the issue still reproduced with clickhouse-keeper:22.8?
I've been trying to test backup/restore on a cluster of (2 shards x 3 replicas) = 6 nodes.
I created the cluster, then imported the hits/visits database on the entire cluster: https://clickhouse.com/docs/en/getting-started/tutorial/
I created a remote backup in S3, then destroyed a node and brought it back up 'blank' with the same IP and configuration.
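For reference, a minimal sketch of that backup/restore cycle with clickhouse-backup; the create_remote/restore_remote subcommands and the backup name are assumptions on my part, not details from the report:

```bash
# On a healthy node: create a backup and upload it to the configured S3 bucket
clickhouse-backup create_remote tutorial-backup

# On the rebuilt 'blank' node: download that backup from S3 and restore it
clickhouse-backup restore_remote --rm tutorial-backup
```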
I then attempted to restore the database, and got the following:
So the schema for the local table failed, and the distributed seemed to work:
This is an issue; however, the bigger concern to me is that I went to check my other nodes, and now hits_local and visits_local are gone on ALL nodes. One node has the schema, but the rows are all gone: