Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: hot ranges api #74377

Merged
merged 4 commits into from
Mar 2, 2022
Merged

server: hot ranges api #74377

merged 4 commits into from
Mar 2, 2022

Conversation

koorosh
Copy link
Collaborator

@koorosh koorosh commented Jan 2, 2022

This change implements second version of hot ranges api that's required
for UI to represent enhanced Hot Ranges view.

It reuses former implementation but doesn't expose sensitive internal data.
Instead, it provides flat structure of fields required for ui.

New HotRangeV2 endpoint is established to:

  • maintain backward compatibility
  • do not expose internal data as it's done in HotRange endpoint

Release note: None

Release justification: bug fixes and low-risk updates to new functionality

@koorosh koorosh added the do-not-merge bors won't merge a PR with this label. label Jan 2, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@koorosh koorosh changed the title WIP. server: hot ranges api server: hot ranges api Jan 4, 2022
@koorosh koorosh marked this pull request as ready for review January 5, 2022 15:54
@koorosh koorosh requested a review from a team as a code owner January 5, 2022 15:54
@koorosh koorosh removed the do-not-merge bors won't merge a PR with this label. label Jan 5, 2022
@zachlite
Copy link
Contributor

zachlite commented Jan 5, 2022

cc @abarganier

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @koorosh)


pkg/server/serverpb/status.proto, line 1721 at r1 (raw file):

  }

  rpc HotRanges2(HotRangesRequest) returns (HotRangesResponseV2) {

Can we reconsider the placement of this API call? We already have an /api/v2/ server that new stuff should go under and I'd prefer to use that if we can.

See https://github.com/cockroachdb/cockroach/blob/master/pkg/server/api_v2.go

Copy link
Collaborator Author

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/server/serverpb/status.proto, line 1721 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Can we reconsider the placement of this API call? We already have an /api/v2/ server that new stuff should go under and I'd prefer to use that if we can.

See https://github.com/cockroachdb/cockroach/blob/master/pkg/server/api_v2.go

Done. I updated /api/v2/ranges/hot api to provide updated version of Hot ranges, but I would prefer to keep /_status/v2/hotranges as well until /api/v2/ranges/hot handles authorization validation (for secure and insecure modes).

@koorosh koorosh force-pushed the hot-ranges-api-v2 branch 2 times, most recently from 5cce471 to 76d43e6 Compare January 10, 2022 11:58
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! It's great to see this coming together.

One general question I wanted to pose here is have we considered how this might work in a serverless context? With the upcoming project to improve tenant-level observability, we'd ideally like to be able to generate a hot ranges report for a specific serverless tenant. We're hoping to build some level of debug.zip functionality at the tenant level, and one of the pieces that we think will be useful for such a bundle is a hot ranges report, since this is relevant within the SQL domain.

This is not necessarily in scope for this PR, but we should think about it now instead of later to avoid having to build a HotRangesV3 endpoint a few months from now 😅. What would we need to do to make this work at the tenant-level?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @koorosh)


pkg/server/status.go, line 2073 at r4 (raw file):

	}

	dbNames := make(map[uint32]string)

Can we possibly make a struct type to more clearly keep track of all this information?

It seems like the key for all these maps is the catalog.Descriptor ID, so instead of having multiple maps, can we have a single struct type that encompasses all these attributes (db names, table names, index names, and parent ID), and then just use a single map? Something like

type HotRangeReportMeta struct {
    dbName string
    tableName string
    indexNames map[uint32]string
    parentId int32
}
rangeReportMetas := make(map[uint32]HotRangeReportMeta)
...
for _, desc := range descrs {
    rangeReportMetas[uint32(desc.GetId())] = HotRangeReportMeta{
        ...
    }
} 

Then the maintainer can get all the info they need with a single map access. Just an idea, let me know what you think. We could also potentially define functions on this type to encapsulate some of the more complex access patterns of this data, although maybe that's overkill :D

Code quote:

	dbNames := make(map[uint32]string)
	tableNames := make(map[uint32]string)
	indexNames := make(map[uint32]map[uint32]string)
	parents := make(map[uint32]uint32)

pkg/server/status.go, line 2106 at r4 (raw file):

	var ranges []*serverpb.HotRangesResponseV2_HotRange
	// TODO (koorosh): how to flatten triple nested loop?

Nothing stands out to me that would allow us to avoid this, I don't see it as too big of an issue, personally :)

Code quote:

// TODO (koorosh): how to flatten triple nested loop?

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1, 8 of 8 files at r4, 5 of 5 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @koorosh)


-- commits, line 8 at r4:
nit: can you reword this a bit? It seems there are two changes being made with v2: not exposing sensitive data and creating a flat structure. Please make that a bit more clear.


-- commits, line 16 at r5:
nit: use backticks for fields


-- commits, line 21 at r5:
nit: can you add a note about why this would lead to additional store scans?


-- commits, line 28 at r6:
nit: can you clarify what the implementation was before and after the change. It's unclear from this sentence.


docs/generated/swagger/spec.json, line 1779 at r6 (raw file):

    },
    "hotRangeInfo": {
      "description": "Hot range details",

This should be more detailed.


pkg/server/status.go, line 2065 at r4 (raw file):

}

func (s *statusServer) HotRanges2(

nit: docstring


pkg/server/serverpb/status.proto, line 1721 at r1 (raw file):

Previously, koorosh (Andrii Vorobiov) wrote…

Done. I updated /api/v2/ranges/hot api to provide updated version of Hot ranges, but I would prefer to keep /_status/v2/hotranges as well until /api/v2/ranges/hot handles authorization validation (for secure and insecure modes).

SGTM


pkg/server/serverpb/status.proto, line 1126 at r5 (raw file):

message HotRangesResponseV2 {
  message HotRange {

nit: docstring

Copy link
Collaborator Author

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


-- commits, line 8 at r4:

Previously, dhartunian (David Hartunian) wrote…

nit: can you reword this a bit? It seems there are two changes being made with v2: not exposing sensitive data and creating a flat structure. Please make that a bit more clear.

Done.


-- commits, line 16 at r5:

Previously, dhartunian (David Hartunian) wrote…

nit: use backticks for fields

Done.


-- commits, line 21 at r5:

Previously, dhartunian (David Hartunian) wrote…

nit: can you add a note about why this would lead to additional store scans?

Done.


-- commits, line 28 at r6:

Previously, dhartunian (David Hartunian) wrote…

nit: can you clarify what the implementation was before and after the change. It's unclear from this sentence.

Done.


docs/generated/swagger/spec.json, line 1779 at r6 (raw file):

Previously, dhartunian (David Hartunian) wrote…

This should be more detailed.

Done.


pkg/server/status.go, line 2065 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: docstring

Done.


pkg/server/status.go, line 2073 at r4 (raw file):

Previously, abarganier (Alex Barganier) wrote…

Can we possibly make a struct type to more clearly keep track of all this information?

It seems like the key for all these maps is the catalog.Descriptor ID, so instead of having multiple maps, can we have a single struct type that encompasses all these attributes (db names, table names, index names, and parent ID), and then just use a single map? Something like

type HotRangeReportMeta struct {
    dbName string
    tableName string
    indexNames map[uint32]string
    parentId int32
}
rangeReportMetas := make(map[uint32]HotRangeReportMeta)
...
for _, desc := range descrs {
    rangeReportMetas[uint32(desc.GetId())] = HotRangeReportMeta{
        ...
    }
} 

Then the maintainer can get all the info they need with a single map access. Just an idea, let me know what you think. We could also potentially define functions on this type to encapsulate some of the more complex access patterns of this data, although maybe that's overkill :D

Done.


pkg/server/status.go, line 2106 at r4 (raw file):

Previously, abarganier (Alex Barganier) wrote…

Nothing stands out to me that would allow us to avoid this, I don't see it as too big of an issue, personally :)

Ok.

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general question I wanted to pose here is have we considered how this might work in a serverless context?

Just noting - we (myself, @koorosh, @rimadeodhar) had a chat about this today. The effort involved to make this work in a serverless context will begin with our work to get a minimal version of debug.zip working for serverless tenants. Since debug.zip uses the original HotRanges endpoint, which is also used by this new version, the work we do on the original endpoint should carry over capabilities to the new version as well.

The effort involved should be somewhat minimal - we just need to filter down the ranges/replicas reported on to only those with the appropriate tenant key prefix.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)

@koorosh koorosh force-pushed the hot-ranges-api-v2 branch 4 times, most recently from 91e7800 to 2149fcf Compare January 18, 2022 09:50
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I think this is almost ready. I just have a few docstring nits and an error condition that I think we should probably log.

There are also some test failures in the CI build - have you tried running make testrace or make stressrace locally? I've re-triggered them in case it was a spurious failure, but if not, let me know if I can help debug 👍

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @koorosh)


pkg/server/status.go, line 2101 at r26 (raw file):

				_, tableID, err := s.sqlServer.execCfg.Codec.DecodeTablePrefix(r.Desc.StartKey.AsRawKey())
				if err != nil {
					continue

nit: should we consider at least logging the error here?

Code quote:

					continue

pkg/server/serverpb/status.proto, line 1071 at r26 (raw file):

    // API: PUBLIC ALPHA
    double queries_per_second = 2;
    // LeaseholderNodeID indicates on Node ID that contains replica that is leaseholder

nit: // LeaseholderNodeID indicates the Node ID that is the current leaseholder for the given range.

Code quote:

// LeaseholderNodeID indicates on Node ID that contains replica that is leaseholder

pkg/server/serverpb/status.proto, line 1129 at r26 (raw file):

  // HotRange message describes a single hot range, ie its QPS, node ID it belongs to, etc.
  message HotRange {
    // range_id indicates Range ID that's identified as hot range

nit: for this docstring and those below, let's make sure the sentences end with . periods to stay consistent with the rest of the file


pkg/server/serverpb/status.proto, line 1134 at r26 (raw file):

      (gogoproto.customname) = "RangeID"
    ];
    // node_id indicates on node that contains current hot range

nit: // node_id indicates the node that contains the current hot range.

Code quote:

// node_id indicates on node that contains current hot range

pkg/server/serverpb/status.proto, line 1144 at r26 (raw file):

      (gogoproto.customname) = "QPS"
    ];
    // table_name indicates table which data is stored in this hot range

nit: // table_name indicates the SQL table that the range belongs to.

Code quote:

// table_name indicates table which data is stored in this hot range

pkg/server/serverpb/status.proto, line 1155 at r26 (raw file):

        "github.com/cockroachdb/cockroach/pkg/roachpb.NodeID"
    ];
    // leaseholder_node_id indicates on Node ID that contains replica that is a leaseholder

nit: // leaseholder_node_id indicates the Node ID that is the current leaseholder for the given range.

Code quote:

/ leaseholder_node_id indicates on Node ID that contains replica that is a leaseholder

Copy link
Collaborator Author

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out on this, I didn't run them. Will run locally and to check what's wrong

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/server/status.go, line 2101 at r26 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: should we consider at least logging the error here?

Done.


pkg/server/serverpb/status.proto, line 1071 at r26 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: // LeaseholderNodeID indicates the Node ID that is the current leaseholder for the given range.

Done.


pkg/server/serverpb/status.proto, line 1129 at r26 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: for this docstring and those below, let's make sure the sentences end with . periods to stay consistent with the rest of the file

Done.


pkg/server/serverpb/status.proto, line 1134 at r26 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: // node_id indicates the node that contains the current hot range.

Done.


pkg/server/serverpb/status.proto, line 1144 at r26 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: // table_name indicates the SQL table that the range belongs to.

Done.


pkg/server/serverpb/status.proto, line 1155 at r26 (raw file):

// leaseholder_node_id indicates the Node ID that is the current leaseholder for the given range.
Done.

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we squash some of the commits in this PR? We shouldn't have any "nit" commits in history. I'd prefer if we just made this PR a single commit but I can understand if you want to break it up into 2 or 3. Right now it's too many.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)

@koorosh
Copy link
Collaborator Author

koorosh commented Feb 28, 2022

bors r+

@knz
Copy link
Contributor

knz commented Feb 28, 2022

LGTM

@craig
Copy link
Contributor

craig bot commented Feb 28, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 28, 2022

Build failed:

@koorosh
Copy link
Collaborator Author

koorosh commented Feb 28, 2022

bors retry

@craig
Copy link
Contributor

craig bot commented Feb 28, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 1, 2022

Build failed (retrying...):

@ajwerner
Copy link
Contributor

ajwerner commented Mar 1, 2022

bors r-

This checks in a .pb.gw.go file which we usually don't check in and it leads to skew with #76592

@craig
Copy link
Contributor

craig bot commented Mar 1, 2022

Canceled.

@koorosh koorosh force-pushed the hot-ranges-api-v2 branch 3 times, most recently from 05d7a1a to 82808af Compare March 1, 2022 14:16
This change implements second version of hot ranges api that's required
for UI to represent enhanced Hot Ranges page.

Before, Hot ranges (under Advanced debugging page) used former version of
HotRanges api that provided information about hot ranges with internal/
sensitive data (see: cockroachdb#53212)
that should not be exposed to users.

Now, in addition to existing endpoint, additional one is implemented that is
based on a former version and provides hot ranges information that is only
needed for Hot Ranges page (range id, qps, table, db, and index names for
particular range).

The list of hot ranges and their QPS is provided by `HotRanges` service and
then information like DB, table and index names are retrieved from range's
`StartKey` that might include this info or not (in case if it's meta range,
or range that stores index itself for instance).

`HotRange` and `HotRangeV2` services expect the same request type as an
argument but return different responses.
`HotRangeV2` service returns a flat list of hot ranges instead of grouped
ranges per node/store.

Release note: None

server: add leaseholder node id to hot ranges api

Current change extends `statuspb.HotRangesResponse` to
include `LeaseholderNodeID` field to indicate the
node id that contains leaseholder replica for
current hot range.

This change was made in `localHotRanges` function (that is
used by `HotRanges` that in turn used by `HotRangeV2` service)
to reuse existing logic of iteration over the stores and querying
hot ranges. It extends its response by `LeaseholderNodeID` field.
Otherwise, the same logic should be implemented in `HotRangeV2`
service by calling `VisitStores` iterator.

Release note: None

Release justification: bug fixes and low-risk updates
to new functionality
Before, `listHotRanges` request handler of `apiV2Server` relied
on `HotRanges` service that is now should be replaced by new
`HotRangesV2` implementation.

Current change reuses HotRangeV2 service in `api/v2/ranges/hot`
api. It allows to share the same logic between REST and gRPC
endpoints and gradually migrate to new version of API.

Release note: None

Release justification: bug fixes and low-risk updates
to new functionality
Initially, `HotRangesV2` service in status server was defined
to use GET method to handle HTTP request. It was convenient way
to display response data in Advanced debugging page since it allowed
to render response body right on to page.

But now, hot ranges will be used on user facing page and request
dispatching for hot ranges api should follow generic workflow:
initialize `HotRangesRequest` protobuf message and dispatch request
with `src/util/api` service. This restriction forces to use POST
method (since GET method doesn't allow to provide request body).

In addition, Hot Ranges debugging page is refactored to use `api`
service.

Release note: None

Release justification: bug fixes and low-risk updates
to new functionality
This change extends hot ranges API to support pagination
of responses. It is also possible to avoid pagination when
page size is set to 0.

Release note: None

Release justification: bug fixes and low-risk updates
to new functionality
@koorosh
Copy link
Collaborator Author

koorosh commented Mar 2, 2022

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 2, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants