Fails to add ON CLUSTER clause when SHOW CREATE TABLE contains \n #642

Closed
NickStepanov opened this issue Apr 21, 2023 · 4 comments

@NickStepanov

Hi,

Description.

When using the restore command with the restore_schema_on_cluster option set,
the DROP TABLE statement works fine and the correct ON CLUSTER clause is inserted,
but the subsequent CREATE TABLE statement is missing the ON CLUSTER clause.
If I look at backup_name/metadata/db/table.json, I can see that the table definition in the "query" section contains \n escapes, for example:

"query": "CREATE TABLE db.table\n(\n col1 Date, .....

I suspect that the regex is not working properly when adding the ON CLUSTER clause because of these newline symbols. I haven't noticed this behaviour when the table definition doesn't contain newline symbols.
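If that suspicion is right, the bug reduces to a pattern that assumes the whole CREATE TABLE header sits on one line. A minimal Python sketch of the failure mode (clickhouse-backup itself is written in Go; the patterns and the add_on_cluster helper below are illustrative, not the tool's actual code):

```python
import re

# Two forms of the same DDL: the compact one, and the formatted form that
# SHOW CREATE TABLE / table.json actually contains (with real newlines).
single_line = "CREATE TABLE db.table (`col1` Date) ENGINE = MergeTree ORDER BY col1"
multi_line = "CREATE TABLE db.table\n(\n    `col1` Date\n)\nENGINE = MergeTree\nORDER BY col1"

# A pattern expecting a literal space between table name and column list
# matches the compact form but not the formatted output:
naive = re.compile(r"^CREATE TABLE (\S+) \(")
assert naive.search(single_line) is not None
assert naive.search(multi_line) is None  # '\n' before '(' breaks the match

# Anchoring only on the table-name prefix (no assumption about what follows)
# inserts the clause regardless of formatting:
def add_on_cluster(ddl: str, cluster: str) -> str:
    return re.sub(r"^(CREATE TABLE\s+\S+)", rf"\1 ON CLUSTER '{cluster}'", ddl, count=1)

print(add_on_cluster(multi_line, "qa_cluster").splitlines()[0])
# CREATE TABLE db.table ON CLUSTER 'qa_cluster'
```

The same distinction exists in Go's regexp package: `.` does not match `\n` unless the `(?s)` flag is set, so a pattern built for single-line DDL silently stops matching formatted DDL.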

Versions used.

ClickHouse server 22.8.8.3
clickhouse-backup 2.2.0-2.2.2.

Additional comments.

I don't quite understand how this happens, but the SHOW CREATE TABLE output contains \n symbols. I am guessing this is due to how ClickHouse formats queries?

Nik.


Slach commented Apr 21, 2023

Could you share:

LOG_LEVEL=debug clickhouse-backup restore your-backup-name

?

@Slach Slach self-assigned this Apr 21, 2023
@NickStepanov

Here is the log of the restore:

LOG_LEVEL=debug /usr/local/bin/build/linux/amd64/clickhouse-backup restore_remote --schema --tables=db.table shard01_db_table
2023/04/20 13:09:52.099271  info clickhouse connection prepared: tcp://clickhousehost:9000 run ping logger=clickhouse
2023/04/20 13:09:52.129436  info clickhouse connection open: tcp://clickhousehost:9000 logger=clickhouse
2023/04/20 13:09:52.129762  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2023/04/20 13:09:52.135646  info SELECT * FROM system.disks; logger=clickhouse
2023/04/20 13:09:52.143735  info SELECT max(toInt64(bytes_on_disk * 1.02)) AS max_file_size FROM system.parts logger=clickhouse
2023/04/20 13:09:52.164388  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros' logger=clickhouse
2023/04/20 13:09:52.174551  info SELECT * FROM system.macros logger=clickhouse
2023/04/20 13:09:52.189795 debug /tmp/.clickhouse-backup-metadata.cache.S3 load 24 elements logger=s3
2023/04/20 13:09:53.632286 debug /tmp/.clickhouse-backup-metadata.cache.S3 save 24 elements logger=s3
2023/04/20 13:09:53.635557 debug prepare table METADATA concurrent semaphore with concurrency=2 len(tablesForDownload)=1 backup=shard01_db_table logger=backuper operation=download
2023/04/20 13:09:53.652838  info done                      backup=shard01_db_table duration=17ms logger=backuper operation=download size=734B table_metadata=db.table
2023/04/20 13:09:53.654778 debug shard01_db_table/access.tar not exists on remote storage, skip download logger=downloadBackupRelatedDir
2023/04/20 13:09:53.656536 debug shard01_db_table/configs.tar not exists on remote storage, skip download logger=downloadBackupRelatedDir
2023/04/20 13:09:53.656838  info done                      backup=shard01_db_table duration=1.513s logger=backuper operation=download size=734B
2023/04/20 13:09:53.656900  info clickhouse connection closed logger=clickhouse
2023/04/20 13:09:53.657029  info clickhouse connection prepared: tcp://clickhousehost:9000 run ping logger=clickhouse
2023/04/20 13:09:53.663802  info clickhouse connection open: tcp://clickhousehost:9000 logger=clickhouse
2023/04/20 13:09:53.663854  info SELECT * FROM system.disks; logger=clickhouse
2023/04/20 13:09:53.673893  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros' logger=clickhouse
2023/04/20 13:09:53.684385  info SELECT * FROM system.macros logger=clickhouse
2023/04/20 13:09:53.692420  info SELECT engine FROM system.databases WHERE name = 'db' logger=clickhouse
2023/04/20 13:09:53.700042  info DROP TABLE IF EXISTS `db`.`table` ON CLUSTER 'qa_cluster'  logger=clickhouse
2023/04/20 13:09:53.819206  info CREATE DATABASE IF NOT EXISTS `db` ON CLUSTER 'qa_cluster' logger=clickhouse
2023/04/20 13:09:53.934663  info CREATE TABLE db.table ( ... ) SETTINGS index_granularity = 8192 logger=clickhouse
2023/04/20 13:09:53.968300  info done                      backup=shard01_db_table operation=restore
2023/04/20 13:09:53.968492  info clickhouse connection closed logger=clickhouse

The backup was done like this:

clickhouse-backup create_remote --tables=db.table shard01_db_table

And here is the config on the restore side:

general:
  remote_storage: s3
  log_level: info 
  restore_schema_on_cluster: qa_cluster
s3:
  ....
clickhouse:
  username: ...                                 # CLICKHOUSE_USERNAME
  password: ...          # CLICKHOUSE_PASSWORD
  host: hostname    # CLICKHOUSE_HOST
  port: 9000
  secure: true
  freeze_by_part: True
  check_replicas_before_attach: False
  skip_tables: 
     ...


Slach commented Apr 22, 2023

Some comments about your backup config:

# slower freeze; the default value is `False`
# change this to False
  freeze_by_part: True

# unsafe when restoring ReplicatedMergeTree: it can duplicate data,
# because no part deduplication is applied during ATTACH PART
# change this to True;
# use False only when you attach backup data to an existing table where
# duplication is possible, or when you attach only partitions that don't exist yet
  check_replicas_before_attach: False
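Taken together, both recommendations amount to this fragment of the clickhouse section (only these two keys change from the config shown above):

```yaml
clickhouse:
  freeze_by_part: False              # default; per-part freeze is slower
  check_replicas_before_attach: True # avoid duplicated data when restoring ReplicatedMergeTree
```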

Could you share backup_name/metadata/db/table.json as is?


NickStepanov commented Apr 22, 2023

Noted the comments on the freeze_by_part parameter, thank you.

Here is backup_name/metadata/db/table.json; I've redacted some column names and shortened the partition list for readability:

cat /backup_name/metadata/db/table.json 
{
	"files": {
		"default": [
			"default_20200309_211_247_5.tar",
			"default_20200309_248_250_1.tar",
			.....
			"default_20230419_248_250_1.tar"
		]
	},
	"table": "table",
	"database": "db",
	"parts": {
		"default": [
			{
				"name": "20200309_211_247_5"
			},
			{
				"name": "20200309_248_250_1"
			},
			.....
			{
				"name": "20230419_248_250_1"
			}
		]
	},
	"query": "CREATE TABLE db.table\n(\n    `col1` Date,\n    `col2` Int32 COMMENT,\n    `col3` LowCardinality(Nullable(String)),\n    .....    `colN` UInt8\n)\nENGINE = ReplicatedMergeTree('/tables/{cluster}/{shard}/db.table', '{replica}')\nPARTITION BY toYYYYMMDD(col1)\nORDER BY (col1, col2)\nSETTINGS index_granularity = 8192",
	"size": {
		"default": ...
	},
	"total_bytes": ....,
	"metadata_only": false
}
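As an aside on the \n question earlier in the thread: the DDL is stored as a JSON string, so the \n escapes become real newlines the moment the metadata file is parsed. A small stand-in sketch (the raw string below is a shortened, hypothetical version of the file above, with MergeTree standing in for the real ReplicatedMergeTree engine):

```python
import json

# Trimmed-down stand-in for backup_name/metadata/db/table.json:
raw = ('{"table": "table", "database": "db", '
       '"query": "CREATE TABLE db.table\\n(\\n    `col1` Date\\n)\\n'
       'ENGINE = MergeTree\\nORDER BY col1"}')

meta = json.loads(raw)
ddl = meta["query"]

# After parsing, the \n escapes are genuine newlines, so any regex applied
# to this string must be newline-aware:
assert "\n" in ddl
print(ddl.splitlines()[0])  # CREATE TABLE db.table
```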

@Slach Slach added this to the 2.2.5 milestone Apr 24, 2023
@Slach Slach closed this as completed in 6021fe0 Apr 24, 2023