Skip to content

Latest commit

 

History

History
471 lines (310 loc) · 21.7 KB

配置参数详解-Ⅲ.md

File metadata and controls

471 lines (310 loc) · 21.7 KB

配置参数详解-Ⅲ

ApplyMySQLPromotionAfterMasterFailover

类型: bool

默认值: true

当为true 时, orchestrator 将在选举出的新主库上执行reset slave allset read_only=0 . 默认: true . 当该参数为true 时, 将覆盖MasterFailoverDetachSlaveMasterHost .

Should orchestrator take upon itself to apply MySQL master promotion: set read_only=0, detach replication, etc.

另见Configuration: Recovery

PreventCrossDataCenterMasterFailover

类型: bool

默认值: false

默认false . 当为true 时, orchestrator将只用与故障集群主库位于同一DC的从库替换故障的主库. 它将尽最大努力从同一DC中找到一个替代者, 如果找不到, 将中止(失败)故障转移. 另请参阅DetectDataCenterQueryDataCenterPattern配置变量.

When true (default: false), cross-DC master failover are not allowed, orchestrator will do all it can to only fail over within same DC, or else not fail over at all.

PreventCrossRegionMasterFailover

类型: bool

默认值: false

默认false . 当为true 时, orchestrator将只用与故障集群主库位于同一region的从库替换故障的主库. 它将尽最大努力找到同一region的替代者, 如果找不到, 将中止(失败)故障转移. 另请参阅DetectRegionQueryRegionPattern配置变量.

When true (default: false), cross-region master failover are not allowed, orchestrator will do all it can to only fail over within same region, or else not fail over at all.

MasterFailoverLostInstancesDowntimeMinutes

类型: uint

默认值: 0

个人理解为一个"置为下线"的时间. 一个节点发生或failover后, 不应该立即再次进行恢复(他这里包括主库又挂了, 和从库丢失).

MHA也有类似机制, 默认8小时内不能再次failover

Number of minutes to downtime any server that was lost after a master failover (including failed master & lost replicas). 0 to disable

这个参数在源码里已经没有地方使用了.. commit记录, 但是没找到什么时候删掉的

MasterFailoverDetachSlaveMasterHost

类型: bool

默认值: false

当该参数为true 时, orchestrator将对被选举为新主库的节点执行detach-replica-master-host(这确保了即使旧主库"复活了", 新主库也不会试图从旧主库复制数据). 默认值: false. 如果ApplyMySQLPromotionAfterMasterFailover为真,这个参数将失去意义. MasterFailoverDetachReplicaMasterHost是它的一个别名.

MasterFailoverDetachReplicaMasterHost

类型: bool

默认值: false

当该参数为true 时, orchestrator将对被选举为新主库的节点执行detach-replica-master-host(这确保了即使旧主库"复活了", 新主库也不会试图从旧主库复制数据). 默认值: false. 如果ApplyMySQLPromotionAfterMasterFailover为真,这个参数将失去意义.

Should orchestrator issue a detach-replica-master-host on newly promoted master (this makes sure the new master will not attempt to replicate old master if that comes back to life). Defaults 'false'. Meaningless if ApplyMySQLPromotionAfterMasterFailover is 'true'.

FailMasterPromotionOnLagMinutes

类型: uint

默认值: 0

默认0 (not failing promotion). 如果候选replica落后太多, 该参数可以用于将选举置为失败状态. 例如: 由于种种原因从库(候选主库)故障了5个小时(数据存在5小时延迟), 随后, 主库出现故障. 这种情况下, 我们可能希望阻止failover, 以便恢复那5个小时的复制延迟. 要使用这个参数, 你必须设置ReplicationLagQuery 并使用类似于pt-heatbeat 的心跳机制. 因为当复制故障时, SHOW SLAVE STATUS 中的Seconds_behind_master 不会显示延迟(会显示为Null).

when > 0, fail a master promotion if the candidate replica is lagging >= configured number of minutes.

FailMasterPromotionIfSQLThreadNotUpToDate

类型: bool

默认值: false

如果在故障发生时, 所有的从库都是滞后的, 即使是拥有最新数据的、被选举为新主库的候选节点也可能有未应用的中继日志. 在这样的节点上执行reset slave all会丢失中继日志数据. 当这个参数为true是, promotion is aborted with error

when true, and a master failover takes place, if candidate master has not consumed all relay logs, promotion is aborted with error

DelayMasterPromotionIfSQLThreadNotUpToDate

类型: bool

默认值: false

如果在故障发生时, 所有的从库都是滞后的, 即使是拥有最新数据的、被选举为新主库的候选节点也可能有未应用的中继日志. 当该参数为true 时, orchestrator 将等待SQL thread应用完所有relay log, 然后再将候选从库提升为新主库. FailMasterPromotionIfSQLThreadNotUpToDateDelayMasterPromotionIfSQLThreadNotUpToDate是相互排斥的.

when true, and a master failover takes place, if candidate master has not consumed all relay logs, delay promotion until the sql thread has caught up

PostponeSlaveRecoveryOnLagMinutes

类型: uint

默认值: 0

在崩溃恢复时, 延迟超过给定分钟的副本只会在恢复过程的后期恢复, 在 master/intermediate master 被选择并执行进程之后. 值 0 禁用此功能. 默认值: 0. 是PostponeReplicaRecoveryOnLagMinutes的别名.

PostponeReplicaRecoveryOnLagMinutes

类型: uint

默认值: 0

在崩溃恢复时, 延迟超过给定分钟的副本只会在恢复过程的后期恢复, 在 master/intermediate master 被选择并执行进程之后. 值 0 禁用此功能. 默认值: 0. PostponeSlaveRecoveryOnLagMinutes 是它的别名.

On crash recovery, replicas that are lagging more than given minutes are only resurrected late in the recovery process, after master/IM has been elected and processes executed. Value of 0 disables this feature

OSCIgnoreHostnameFilters

类型: []string

默认值: []string{}

直译: OSC 副本建议将忽略与给定模式匹配的副本主机名

OSC replicas recommendation will ignore replica hostnames matching given patterns

官方文档没有介绍此参数.

源码go/inst/instance_dao.go

// filterOSCInstances will filter the given list such that only replicas fit for OSC control remain.
// filterOSCInstances将过滤给定的列表,以便只保留适合OSC控制的副本。
func filterOSCInstances(instances [](*Instance)) [](*Instance) {
	result := [](*Instance){}
	for _, instance := range instances {
		if FiltersMatchInstanceKey(&instance.Key, config.Config.OSCIgnoreHostnameFilters) {
                  // FiltersMatchInstanceKey 这个代码之前看过, 他会先迭代filters, 判断每个元素值是否为ip地址
                  // 如果是, 精确匹配, 如果不是, 则将其视为正则表达式进行匹配
                  // 匹配, 丢弃
			continue
		}
		if instance.IsBinlogServer() {
                  // 是binlogserver, 丢弃
			continue
		}
		if !instance.IsLastCheckValid {
                  // 最后一次check is not valid, 丢弃
			continue
		}
		result = append(result, instance)
	}
	return result
}

filterOSCInstances被GetClusterOSCReplicas调用

// GetClusterOSCReplicas returns a heuristic list of replicas which are fit as controll replicas for an OSC operation.
// These would be intermediate masters
func GetClusterOSCReplicas(clusterName string) ([](*Instance), error) {

调用GetClusterOSCReplicas的有三处

  • which-cluster-osc-replicas 命令
	case registerCliCommand("which-cluster-osc-replicas", "Information", `Output a list of replicas in a cluster, that could serve as a pt-online-schema-change operation control replicas`):
		{
			clusterName := getClusterName(clusterAlias, instanceKey)
			instances, err := inst.GetClusterOSCReplicas(clusterName)
			if err != nil {
				log.Fatale(err)
			}
			for _, clusterInstance := range instances {
				fmt.Println(clusterInstance.Key.DisplayString())
			}
		}
  • ClusterOSCReplicas函数
// ClusterOSCReplicas returns heuristic list of OSC replicas
func (this *HttpAPI) ClusterOSCReplicas(params martini.Params, r render.Render, req *http.Request) {
	clusterName, err := figureClusterName(getClusterHint(params))
	if err != nil {
		Respond(r, &APIResponse{Code: ERROR, Message: fmt.Sprintf("%+v", err)})
		return
	}

	instances, err := inst.GetClusterOSCReplicas(clusterName)
	if err != nil {
		Respond(r, &APIResponse{Code: ERROR, Message: fmt.Sprintf("%+v", err)})
		return
	}

	r.JSON(http.StatusOK, instances)
}
  • GetClusterHeuristicLag函数
// GetClusterHeuristicLag returns a heuristic lag for a cluster, based on its OSC replicas
func GetClusterHeuristicLag(clusterName string) (int64, error) {
	instances, err := GetClusterOSCReplicas(clusterName)
	if err != nil {
		return 0, err
	}
	return GetInstancesMaxLag(instances)
}

先看which-cluster-osc-replicas 的作用

  Output a list of replicas in same cluster as given instance, that would server as good candidates as control replicas
  for a pt-online-schema-change operation.
  Those replicas would be used for replication delay so as to throtthe osc operation. Selected replicas will include,
  where possible: intermediate masters, their replicas, 3rd level replicas, direct non-intermediate-master replicas.

  orchestrator -c which-cluster-osc-replicas -i instance.to.check.com

  orchestrator -c which-cluster-osc-replicas
      -i not given, implicitly assumed local hostname

  orchestrator -c which-cluster-osc-replicas -alias some_alias
      assuming some_alias is a known cluster alias (see ClusterNameToAlias or DetectClusterAliasQuery configuration)

输出与给定实例相同的集群中的副本列表, 该列表将作为 pt-online-schema-change 操作的控制副本作为良好的候选服务器.

这些副本将用于复制延迟, 以限制 osc 操作. 在可能的情况下, 选定的副本将包括: 中间主副本、它们的副本、第 3 级副本、直接非中间主副本.

真相大白了, 原来是给pt-osc用的. 就是给pt-osc一个从库列表, 用来监控复制延迟呗

那应该还有给gh-ost用的

ClusterOSCReplicas

this.registerAPIRequest(m, "cluster-osc-slaves/:clusterHint", this.ClusterOSCReplicas)

这就是个接口. 和上面那个命令一样.

GetClusterHeuristicLag

	case registerCliCommand("get-cluster-heuristic-lag", "Information", `For a given cluster (indicated by an instance or alias), output a heuristic "representative" lag of that cluster`):
		{
			clusterName := getClusterName(clusterAlias, instanceKey)
			lag, err := inst.GetClusterHeuristicLag(clusterName)
			if err != nil {
				log.Fatale(err)
			}
			fmt.Println(lag)
		}

get-cluster-heuristic-lag命令作用

  For a given cluster (indicated by an instance or alias), output a heuristic "representative" lag of that cluster.
  The output is obtained by examining the replicas that are member of "which-cluster-osc-replicas"-command, and
  getting the maximum replica lag of those replicas. Recall that those replicas are a subset of the entire cluster,
  and that they are ebing polled periodically. Hence the output of this command is not necessarily up-to-date
  and does not represent all replicas in cluster. Examples:

  orchestrator -c get-cluster-heuristic-lag -i instance.that.is.part.of.cluster.com

  orchestrator -c get-cluster-heuristic-lag
      -i not given, implicitly assumed local host, cluster implied

  orchestrator -c get-cluster-heuristic-lag -alias some_alias
      assuming some_alias is a known cluster alias (see ClusterNameToAlias or DetectClusterAliasQuery configuration)

  对于一个给定的集群(由实例或别名表示), 输出一个该集群的启发式 "代表 "滞后.

  该输出是通过检查作为 "which-cluster-osc-replicas "命令成员的副本来获得的, 并获得这些副本的最大延迟.

  得到这些副本的最大延迟. 回顾一下, 这些副本是整个集群的一个子集.

  而且它们是定期轮询的. 因此, 这个命令的输出不一定是最新的, 并不代表集群中的所有副本

GraphiteAddr

类型: string

默认值: ""

指的是Graphite监控 地址. 如果配置了这个参数, orch的metrics会自动写到这个地址

Optional; address of graphite port. If supplied, metrics will be written here

GraphitePath

类型: string

默认值: ""

graphite路径的前缀. 可以包括{hostname}魔法占位符

Prefix for graphite path. May include {hostname} magic placeholder

GraphiteConvertHostnameDotsToUnderscores

类型: bool

默认值: true

如果为真, 在用于graphite path前, hostname中的. 将被替换为_

If true, then hostname's dots are converted to underscores before being used in graphite path

GraphitePollSeconds

类型: int 为啥不是uint

默认值: 60

监控数据写入的间隔. 0禁用

Graphite writes interval. 0 disables.

URLPrefix

类型: string

默认值: ""

URL prefix to run orchestrator on non-root web path, e.g. /orchestrator to put it behind nginx.

代码中会确保前缀以"/"开头, 且没有后缀.

    if this.URLPrefix != "" {
        // Ensure the prefix starts with "/" and has no trailing one.
        this.URLPrefix = strings.TrimLeft(this.URLPrefix, "/")
        this.URLPrefix = strings.TrimRight(this.URLPrefix, "/")
        this.URLPrefix = "/" + this.URLPrefix
    }

DiscoveryIgnoreReplicaHostnameFilters

类型: []string

默认值: []string{}

Discovery从库时会忽略改该表中正则匹配的主机. 比如orch和列表中的从库网络不通, 导致无法完成发现, 或者类似DM, canal等伪装成slave的组件, 它们也不需要被"发现"

Regexp filters to apply to prevent auto-discovering new replicas. Usage: unreachable servers due to firewalls, applications which trigger binlog dumps

DiscoveryIgnoreMasterHostnameFilters

类型: []string

默认值: []string{}

Discovery主库时会忽略该列表中正则匹配的主机. 列入, 你的主库需要暂时从其他机器复制一些数据, 那么可以将该主库临时添加到这个列表

Regexp filters to apply to prevent auto-discovering a master. Usage: pointing your master temporarily to replicate some data from external host

DiscoveryIgnoreHostnameFilters

类型: []string

默认值: []string{}

Discovery时会忽略该列表中正则匹配的主机.

Regexp filters to apply to prevent discovering instances of any kind

ConsulAddress

类型: string

默认值: ""

Consul地址

Address where Consul HTTP api is found. Example: 127.0.0.1:8500

ConsulScheme

类型: string

默认值: http

Scheme (http or https) for Consul

ConsulAclToken

类型: string

默认值: ""

ACL token used to write to Consul KV

ConsulCrossDataCenterDistribution

类型: bool

默认值: false

详见Consul specific

should orchestrator automatically auto-deduce all consul DCs and write KVs in all DCs

ConsulKVStoreProvider

类型: string

默认值: consul

详见Consul Transaction support

Consul KV store provider (consul or consul-txn), default: "consul"

ConsulMaxKVsPerTransaction

类型: int

默认值: 5

详见Consul Transaction support

Maximum number of KV operations to perform in a single Consul Transaction. Requires the "consul-txn" ConsulKVStoreProvider

ZkAddress

类型: string

默认值: ""

zookeeper地址. 注释写着还不支持. 但是看着已经有一些代码了. 不清楚

UNSUPPERTED YET. Address where (single or multiple) ZooKeeper servers are found, in `srv1[:port1][,srv2[:port2]...]` format. Default port is 2181. Example: srv-a,srv-b:12181,srv-c

KVClusterMasterPrefix

类型: string

默认值: mysql/master

KV store中集群主库前缀

Prefix to use for clusters' masters entries in KV stores (internal, consul, ZK), default: "mysql/master"

WebMessage

类型: string

默认值: ""

如果提供, 将显示在所有网页的标题栏下方

If provided, will be shown on all web pages below the title bar

MaxConcurrentReplicaOperations

类型: int

默认值: 5

replica上的最大并发操作数

Maximum number of concurrent operations on replicas

PR openark/orchestrator#1100

image

image

EnforceExactSemiSyncReplicas

类型: bool

默认值: false

决定是否执行严格的半同步复制拓扑结构的标志. 如果启用, LockedSemiSyncMasterMasterWithTooManyReplicas的恢复将在副本上启用和禁用半同步, 以根据优先级顺序完全匹配所需的拓扑结构.

If true, semi-sync replicas will be enabled/disabled to match the wait count in the desired priority order; this applies to LockedSemiSyncMaster and MasterWithTooManySemiSyncReplicas

另见Semi-sync replicas (rpl_semi_sync_slave_enabled)Failure detection

RecoverLockedSemiSyncMaster

类型: bool

默认值: false

决定是否从LockedSemiSyncMaster情况下恢复的标志. 如果启用, LockedSemiSyncMaster的恢复将按照优先级顺序在副本上启用(但绝不会禁用)半同步, 以匹配主库的等待计数(rpl_semi_sync_master_wait_for_slave_count). 如果EnforceExactSemiSyncReplicas被设置了, 这个选项就没有效果. 如果你想只处理半同步复制太少的情况, 而不是太多的话, 这个选项很有用.

If true, orchestrator will recover from a LockedSemiSync state by enabling semi-sync on replicas to match the wait count; this behavior can be overridden by EnforceExactSemiSyncReplicas

另见Semi-sync replicas (rpl_semi_sync_slave_enabled)Failure detection

ReasonableLockedSemiSyncMasterSeconds

类型: uint

默认值: 与ReasonableReplicationLagSeconds相同

触发LockedSemiSyncMaster条件的秒数; 如果没有设置, 则退回到ReasonableReplicationLagSeconds .

Time to evaluate the LockedSemiSyncHypothesis before triggering the LockedSemiSync analysis; falls back to ReasonableReplicationLagSeconds if not set

    if this.ReasonableLockedSemiSyncMasterSeconds == 0 {
        this.ReasonableLockedSemiSyncMasterSeconds = uint(this.ReasonableReplicationLagSeconds)
    }

另见Semi-sync replicas (rpl_semi_sync_slave_enabled)Failure detection