tests: refactor fault trigger #896

Merged
merged 32 commits into from
Oct 12, 2019

Contributor

@xiaojingchen xiaojingchen commented Sep 10, 2019

What problem does this PR solve?

  • Support qm as a VM manager.
  • Simplify the fault trigger.

What is changed and how does it work?

Add a new VM manager to support qm.
Change the stability config: add a map from VM name to VM IP, so that the fault-trigger agent no longer has to acquire IPs dynamically.

New stability config file:

    block_writer:
      concurrency: 12
    nodes:
      - physical_node: 172.16.5.11
        nodes:
        - ip: 172.16.4.247
          name: 105
      - physical_node: 172.16.5.26
        nodes:
        - ip: 172.16.4.133
          name: 200
      - physical_node: 172.16.5.27
        nodes:
        - ip: 172.16.4.121
          name: 203
      - physical_node: 172.16.5.28
        nodes:
        - ip: 172.16.4.139
          name: 204
      - physical_node: 172.16.5.29
        nodes:
        - ip: 172.16.5.147
          name: 137
        - ip: 172.16.5.148
          name: 138
    etcds:
      - physical_node: 172.16.5.11
        nodes:
        - ip: 172.16.4.247
          name: 105
      - physical_node: 172.16.5.26
        nodes:
        - ip: 172.16.4.133
          name: 200
      - physical_node: 172.16.5.27
        nodes:
        - ip: 172.16.4.121
          name: 203
    apiservers:
      - physical_node: 172.16.5.11
        nodes:
        - ip: 172.16.4.247
          name: 105
      - physical_node: 172.16.5.26
        nodes:
        - ip: 172.16.4.133
          name: 200
      - physical_node: 172.16.5.27
        nodes:
        - ip: 172.16.4.121
          name: 203
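
The static name-to-IP map in the config above could be consumed roughly as follows. This is a minimal sketch, not the code from this PR: the type names `VMNode` and `PhysicalNode` and the helper `findVM` are hypothetical, and the data is hard-coded instead of being parsed from YAML.

```go
package main

import "fmt"

// VMNode mirrors one entry under a physical node's `nodes` list:
// a VM name and its fixed IP. The type name is hypothetical.
type VMNode struct {
	IP   string
	Name string
}

// PhysicalNode groups the VMs hosted on one physical machine.
// The type name is hypothetical.
type PhysicalNode struct {
	PhysicalNode string
	Nodes        []VMNode
}

// findVM scans the static map and returns the physical host and VM name
// for a VM IP, so nothing has to be queried from the hypervisor at run time.
func findVM(hosts []PhysicalNode, ip string) (host, name string, ok bool) {
	for _, h := range hosts {
		for _, vm := range h.Nodes {
			if vm.IP == ip {
				return h.PhysicalNode, vm.Name, true
			}
		}
	}
	return "", "", false
}

func main() {
	// Two entries copied from the config above.
	hosts := []PhysicalNode{
		{PhysicalNode: "172.16.5.11", Nodes: []VMNode{{IP: "172.16.4.247", Name: "105"}}},
		{PhysicalNode: "172.16.5.29", Nodes: []VMNode{
			{IP: "172.16.5.147", Name: "137"},
			{IP: "172.16.5.148", Name: "138"},
		}},
	}
	if host, name, ok := findVM(hosts, "172.16.5.148"); ok {
		fmt.Printf("VM %s runs on %s\n", name, host)
	}
}
```

With a fixed map like this, a fault case that targets a node by IP can resolve the VM name to stop or start without asking the agent to discover it.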

Check List

Tests

  • Unit test
  • E2E test
  • Stability test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

  • Has Helm charts change
  • Has Go code change

Side effects

  • Breaking backward compatibility

Related changes

NONE

Does this PR introduce a user-facing change?:

NONE

StartVM(*VM) error
}

type VirshVMManager struct {
Contributor

Add a new file for VirshVMManager, e.g. vm_virsh.go, and a new file for QMVMManager, e.g. vm_qm.go. What do you think?

@@ -225,3 +129,81 @@ func stripEmpty(data string) string {
}
return strings.Join(stripLines, "\n")
}

type QMVMManager struct {
*sync.RWMutex
Contributor

this RWMutex is not used.

@xiaojingchen xiaojingchen changed the title from "[WIP]tests: refactor fault trigger" to "tests: refactor fault trigger" Sep 11, 2019
@xiaojingchen
Contributor Author

@@ -43,7 +45,7 @@ func main() {
logs.InitLogs()
defer logs.FlushLogs()

- mgr := manager.NewManager()
+ mgr := manager.NewManager(vmManager)
Contributor

@cofyc cofyc Sep 11, 2019

It is better not to allow an invalid --vm-manager value. We can initialize the VM manager and handle errors here, e.g.

var vmMgr VMManager
if vmManager == "qm" {
   vmMgr = NewQMManager()
} else if vmManager == "virsh" {
   vmMgr = NewVirshManager()
} else {
   // fatal error
}

mgr := manager.NewManager(vmMgr)

If a user configures an invalid value but our program still works, this will confuse people: they cannot tell from the command-line flags which VM manager is in use unless they know the implementation details.
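
The suggestion above could be sketched as a small constructor that returns an error for unknown values, leaving the caller to exit. This is only an illustration under assumptions: the one-method `VMManager` interface, the `qmManager` and `virshManager` types, and `newVMManager` are hypothetical stand-ins, not the types from this PR.

```go
package main

import (
	"fmt"
	"os"
)

// VMManager is a simplified, hypothetical stand-in for the real interface.
type VMManager interface {
	Name() string
}

type qmManager struct{}

func (qmManager) Name() string { return "qm" }

type virshManager struct{}

func (virshManager) Name() string { return "virsh" }

// newVMManager maps the --vm-manager value to an implementation and
// rejects anything else instead of silently falling back to a default.
func newVMManager(name string) (VMManager, error) {
	switch name {
	case "qm":
		return qmManager{}, nil
	case "virsh":
		return virshManager{}, nil
	default:
		return nil, fmt.Errorf("unsupported vm manager %q: choose \"qm\" or \"virsh\"", name)
	}
}

func main() {
	name := "qm" // in the real binary this would come from the --vm-manager flag
	mgr, err := newVMManager(name)
	if err != nil {
		// A plain fatal exit; no external notification is involved.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using vm manager:", mgr.Name())
}
```

Returning an error from the constructor keeps the exit policy in main, so the same function can be reused where a fatal exit is not wanted.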

@xiaojingchen
Contributor Author

/run-e2e-in-kind

} else if vmManagerName == "virsh" {
vmManager = &VirshVMManager{}
} else {
slack.NotifyAndPanic(fmt.Errorf("stability test have not supported the vm manager:[%s],please choose [qm] or [virsh].", vmManagerName))
Contributor

In fault-trigger, the slack.Webhook endpoint is not provided. IMO a simple fatal error is enough.

@xiaojingchen
Contributor Author

/run-e2e-in-kind

@cofyc
Contributor

cofyc commented Sep 18, 2019

This LGTM, but I cannot test it locally.
What's the progress? Have you tested it successfully in the stability environment?

@xiaojingchen
Contributor Author

xiaojingchen commented Sep 24, 2019

@cofyc this PR has passed testing in the stability env, and it is still running there.

cofyc previously approved these changes Sep 24, 2019
Contributor

@cofyc cofyc left a comment

LGTM

@xiaojingchen
Contributor Author

@cofyc @weekface @aylei PTAL again

@weekface
Contributor

Have you run this PR on the Virsh manager cluster?

@xiaojingchen
Contributor Author

xiaojingchen commented Sep 27, 2019

It has run in the stability env for several days, except for the apiserver fault trigger case.
The apiserver fault trigger case has passed separately, though.

@xiaojingchen
Contributor Author

xiaojingchen commented Sep 27, 2019

@weekface it has not run in a stability env that uses virsh to manage VMs, and right now I don't have a virsh-based stability env.

@weekface
Contributor

You can run it in the 149 env.

@tennix tennix added the test/stability stability tests label Oct 11, 2019
weekface previously approved these changes Oct 11, 2019
Contributor

@weekface weekface left a comment

LGTM

Contributor

@cofyc cofyc left a comment

LGTM

@cofyc
Contributor

cofyc commented Oct 12, 2019

/run-e2e-in-kind

@cofyc
Contributor

cofyc commented Oct 12, 2019

@xiaojingchen CI failed

@xiaojingchen
Contributor Author

/run-e2e-in-kind

cofyc previously approved these changes Oct 12, 2019
@xiaojingchen xiaojingchen dismissed stale reviews from cofyc and weekface via 1f4bd01 October 12, 2019 03:02
@xiaojingchen
Contributor Author

/run-e2e-in-kind

Contributor

@weekface weekface left a comment

LGTM

@xiaojingchen
Contributor Author

@cofyc PTAL again

@cofyc cofyc merged commit 4d2dd0c into pingcap:master Oct 12, 2019
weekface pushed a commit to weekface/tidb-operator that referenced this pull request Oct 14, 2019
weekface added a commit that referenced this pull request Oct 14, 2019
Labels
test/stability stability tests

4 participants