expose vtbackup stats at --port /metrics #11388

Merged
GuptaManan100 merged 3 commits into vitessio:main from maxeng-vtbackup-prom-stats on Oct 3, 2022

Conversation

Collaborator

@maxenglander maxenglander commented Sep 28, 2022

Description

As far as I can tell vtbackup does not currently expose metrics. It would be awesome if it did. In particular it would be great to have the following timings:

  • How long it takes to download the last backup.
  • How long it takes to start and stop MySQL (MySQL startup can be slow sometimes, e.g. during InnoDB initialization).
  • How long it takes to connect to the primary.
  • How long it takes to download the binary log.
  • How long it takes to apply the binlog.
  • How long it takes to perform and upload the new backup.

This PR modifies the vtbackup command so that a server is started on --port. The server is similar to the one launched by other VT components and includes a /metrics route. With this PR, the route includes two useful metrics that are managed by the mysqlctl package:

  • vtbackup_restore_duration_seconds
  • vtbackup_backup_duration_seconds

Keeping this PR small to lay the groundwork. The other metrics can come in a later PR if we're OK with this overall approach.

Use cases

PlanetScale makes ongoing internal and public efforts to improve backup and restore performance. It would be great to have detailed metrics on current performance so that we can make informed decisions on where to put our energy.

Contributor

vitess-bot bot commented Sep 28, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@maxenglander maxenglander changed the title from "Maxeng vtbackup prom stats" to "expose vtbackup stats at --port /metrics" Sep 28, 2022
mysqlTimeout = 5 * time.Minute
initDBSQLFile string
detachedMode bool
keepAliveTimeout = 0 * time.Second
Collaborator Author

@maxenglander maxenglander Sep 28, 2022

This line is new; the rest is just formatting changes.

}

func init() {
mathrand.Seed(time.Now().UnixNano())
Collaborator Author

Not sure if this is needed. Copy-pasted from another cmd package.

Contributor

I don't think it's needed here.

Member

I don't think this is needed

Collaborator Author

Removed

@@ -191,6 +208,15 @@ func main() {
log.Errorf("Couldn't prune old backups: %v", err)
exit.Return(1)
}

if keepAliveTimeout > 0 {
Collaborator Author

@maxenglander maxenglander Sep 28, 2022

Added for local testing, but I think it could be useful in a K8s context to keep the process alive long enough for a Prometheus scrape interval.

// Catch SIGTERM and SIGINT so we get a chance to clean up.
ctx, cancel := context.WithCancel(context.Background())
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
Collaborator Author

Signal handling is now the responsibility of servenv.

@maxenglander maxenglander force-pushed the maxeng-vtbackup-prom-stats branch from f7bf347 to 5f17ded Compare September 29, 2022 14:11
@maxenglander maxenglander marked this pull request as ready for review September 29, 2022 14:16
Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander force-pushed the maxeng-vtbackup-prom-stats branch from 5f17ded to bb164b4 Compare September 29, 2022 16:28
}

func init() {
mathrand.Seed(time.Now().UnixNano())
servenv.RegisterDefaultFlags()
dbconfigs.RegisterFlags(dbconfigs.All...)
Collaborator Author

Moved from main

servenv.OnParse(registerFlags)
}

func main() {
defer exit.Recover()
dbconfigs.RegisterFlags(dbconfigs.All...)
Collaborator Author

Moved to init

Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander added the "Component: Backup and Restore" and "Type: Enhancement" (logical improvement, somewhere between a bug and feature) labels Sep 30, 2022
@deepthi deepthi added this to the v15.0 milestone Sep 30, 2022
Member

@GuptaManan100 GuptaManan100 left a comment

All the other changes LGTM!

go/cmd/vtbackup/vtbackup.go (resolved)
Signed-off-by: Max Englander <max@planetscale.com>
Member

@GuptaManan100 GuptaManan100 left a comment

Looks good to me!

@GuptaManan100 GuptaManan100 merged commit 55695d1 into vitessio:main Oct 3, 2022
@GuptaManan100 GuptaManan100 deleted the maxeng-vtbackup-prom-stats branch October 3, 2022 14:25