Merge branch 'release-2.1' of https://github.com/pingcap/pd into release-2.1
Connor1996 committed Apr 10, 2019
2 parents 0a1937e + 66aeaa7 commit f3ba448
Showing 27 changed files with 683 additions and 682 deletions.
18 changes: 12 additions & 6 deletions CHANGELOG.md
@@ -1,20 +1,26 @@
# PD Change Log

## v2.1.7
- Fix the issue that the transfer-leader step cannot be created by `balance-region` when the number of replicas is one [#1462](https://github.com/pingcap/pd/pull/1462)

## v2.1.5
- Provide the `ExcludeTombstoneStores` option in the `GetAllStores` interface to remove the Tombstone store from the returned result [#1444](https://github.com/pingcap/pd/pull/1444)

## v2.1.3
- Fix the `Watch` issue about leader election [#1404](https://github.com/pingcap/pd/pull/1404)

## v2.1.2
- Fix the Region information update issue about Region merge [#1377](https://github.com/pingcap/pd/pull/1377)

## v2.1.1
- Fix the issue that some configuration items cannot be set to `0` in the configuration file [#1334](https://github.com/pingcap/pd/pull/1334)
- Check the undefined configuration when starting PD [#1362](https://github.com/pingcap/pd/pull/1362)
- Avoid transferring the leader to a newly created peer, to reduce potential delay [#1339](https://github.com/pingcap/pd/pull/1339)
- Fix the issue that `RaftCluster` cannot stop caused by deadlock [#1370](https://github.com/pingcap/pd/pull/1370)

## v2.1.0
+ Optimize availability
- Introduce the version control mechanism and support compatible rolling updates of the cluster
- [Enable `Raft PreVote`](https://github.com/pingcap/pd/blob/5c7b18cf3af91098f07cf46df0b59fbf8c7c5462/conf/config.toml#L22) among PD nodes to avoid leader reelection when network recovers after network isolation
- Enable `raft learner` by default to lower the risk of unavailable data caused by machine failure during scheduling
- TSO allocation is no longer affected by the system clock going backwards
@@ -29,7 +35,7 @@
- [Add more commands to control the scheduling policy](https://pingcap.com/docs/tools/pd-control/#config-show--set-option-value)
- Improve [PD simulator](https://github.com/pingcap/pd/tree/release-2.1/tools/pd-simulator) to simulate the scheduling scenarios

+ API and operation tools
- Add the [`GetPrevRegion` interface](https://github.com/pingcap/kvproto/blob/8e3f33ac49297d7c93b61a955531191084a2f685/proto/pdpb.proto#L40) to support the `TiDB reverse scan` feature
- Add the [`BatchSplitRegion` interface](https://github.com/pingcap/kvproto/blob/8e3f33ac49297d7c93b61a955531191084a2f685/proto/pdpb.proto#L54) to speed up TiKV Region splitting
- Add the [`GCSafePoint` interface](https://github.com/pingcap/kvproto/blob/8e3f33ac49297d7c93b61a955531191084a2f685/proto/pdpb.proto#L64-L66) to support distributed GC in TiDB
@@ -119,7 +125,7 @@
* Enable Raft PreVote between PD nodes to avoid leader reelection when network recovers after network isolation
* Optimize the issue that Balance Scheduler schedules small Regions frequently
* Optimize the hotspot scheduler to improve its adaptability when traffic statistics jitter
* Skip the Regions with a large number of rows when scheduling `region merge`
* Enable `raft learner` by default to lower the risk of unavailable data caused by machine failure during scheduling
* Remove `max-replica` from `pd-recover`
* Add `Filter` metrics
@@ -128,7 +134,7 @@
* Fix the issue that TiKV disk space is used up by replica migration in some scenarios
### Compatibility notes
* Do not support rolling back to v2.0.x or earlier due to update of the new version storage engine
* Enable `raft learner` by default in the new version of PD. If the cluster is upgraded from 1.x to 2.1, either stop the machines before the upgrade or first apply a rolling update to TiKV and then to PD

## v2.0.4
### Improvement
6 changes: 3 additions & 3 deletions Gopkg.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Gopkg.toml
@@ -26,7 +26,7 @@

[[constraint]]
name = "github.com/pingcap/kvproto"
branch = "master"
branch = "release-2.1"

[[constraint]]
name = "github.com/google/btree"
26 changes: 23 additions & 3 deletions client/client.go
@@ -58,7 +58,7 @@ type Client interface {
// GetAllStores gets all stores from pd.
// The store may expire later. Caller is responsible for caching and taking care
// of store change.
GetAllStores(ctx context.Context) ([]*metapb.Store, error)
GetAllStores(ctx context.Context, opts ...GetStoreOption) ([]*metapb.Store, error)
// Update GC safe point. TiKV will check it and do GC themselves if necessary.
// If the given safePoint is less than the current one, it will not be updated.
// Returns the new safePoint after updating.
@@ -67,6 +67,19 @@ type Client interface {
Close()
}

// GetStoreOp represents available options when getting stores.
type GetStoreOp struct {
excludeTombstone bool
}

// GetStoreOption configures GetStoreOp.
type GetStoreOption func(*GetStoreOp)

// WithExcludeTombstone excludes tombstone stores from the result.
func WithExcludeTombstone() GetStoreOption {
return func(op *GetStoreOp) { op.excludeTombstone = true }
}

type tsoRequest struct {
start time.Time
ctx context.Context
@@ -682,7 +695,13 @@ func (c *client) GetStore(ctx context.Context, storeID uint64) (*metapb.Store, e
return store, nil
}

func (c *client) GetAllStores(ctx context.Context) ([]*metapb.Store, error) {
func (c *client) GetAllStores(ctx context.Context, opts ...GetStoreOption) ([]*metapb.Store, error) {
// Applies options
options := &GetStoreOp{}
for _, opt := range opts {
opt(options)
}

if span := opentracing.SpanFromContext(ctx); span != nil {
span = opentracing.StartSpan("pdclient.GetAllStores", opentracing.ChildOf(span.Context()))
defer span.Finish()
@@ -692,7 +711,8 @@ func (c *client) GetAllStores(ctx context.Context) ([]*metapb.Store, error) {

ctx, cancel := context.WithTimeout(ctx, pdTimeout)
resp, err := c.leaderClient().GetAllStores(ctx, &pdpb.GetAllStoresRequest{
Header:                 c.requestHeader(),
ExcludeTombstoneStores: options.excludeTombstone,
})
cancel()

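As a usage note for the new client option above, the sketch below shows how a caller would exclude tombstone stores. It assumes the PD Go client package (import path github.com/pingcap/pd/client, imported as pd) with its NewClient constructor and SecurityOption; the endpoint address is a placeholder.

package main

import (
	"context"
	"log"

	pd "github.com/pingcap/pd/client" // assumed import path of the PD client package
)

func main() {
	// Assumed constructor for the PD client; point it at your PD endpoints.
	cli, err := pd.NewClient([]string{"127.0.0.1:2379"}, pd.SecurityOption{})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// Default behavior: the result still contains tombstone stores.
	all, err := cli.GetAllStores(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// With the new option, tombstone stores are filtered out by the server.
	live, err := cli.GetAllStores(ctx, pd.WithExcludeTombstone())
	if err != nil {
		log.Fatal(err)
	}

	log.Printf("stores: %d total, %d excluding tombstones", len(all), len(live))
}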
32 changes: 24 additions & 8 deletions client/client_test.go
@@ -21,6 +21,7 @@ import (
"testing"
"time"

"github.com/gogo/protobuf/proto"
. "github.com/pingcap/check"
"github.com/pingcap/kvproto/pkg/metapb"
"github.com/pingcap/kvproto/pkg/pdpb"
@@ -263,31 +264,46 @@ func (s *testClientSuite) TestGetStore(c *C) {
c.Assert(err, IsNil)
c.Assert(n, DeepEquals, store)

// Get a removed store should return error.
stores, err := s.client.GetAllStores(context.Background())
c.Assert(err, IsNil)
c.Assert(stores, DeepEquals, []*metapb.Store{store})

// Mark the store as offline.
err = cluster.RemoveStore(store.GetId())
c.Assert(err, IsNil)
offlineStore := proto.Clone(store).(*metapb.Store)
offlineStore.State = metapb.StoreState_Offline

// Get an offline store should be OK.
n, err = s.client.GetStore(context.Background(), store.GetId())
c.Assert(err, IsNil)
c.Assert(n.GetState(), Equals, metapb.StoreState_Offline)
c.Assert(n, DeepEquals, offlineStore)

// Should return offline stores.
stores, err = s.client.GetAllStores(context.Background())
c.Assert(err, IsNil)
c.Assert(stores, DeepEquals, []*metapb.Store{offlineStore})

// Mark the store as tombstone.
err = cluster.BuryStore(store.GetId(), true)
c.Assert(err, IsNil)
tombstoneStore := proto.Clone(store).(*metapb.Store)
tombstoneStore.State = metapb.StoreState_Tombstone

// Get a tombstone store should fail.
n, err = s.client.GetStore(context.Background(), store.GetId())
c.Assert(err, IsNil)
c.Assert(n, IsNil)
}

func (s *testClientSuite) TestGetAllStores(c *C) {
cluster := s.srv.GetRaftCluster()
c.Assert(cluster, NotNil)
// Should return tombstone stores.
stores, err = s.client.GetAllStores(context.Background())
c.Assert(err, IsNil)
c.Assert(stores, DeepEquals, []*metapb.Store{tombstoneStore})

stores, err := s.client.GetAllStores(context.Background())
// Should not return tombstone stores.
stores, err = s.client.GetAllStores(context.Background(), WithExcludeTombstone())
c.Assert(err, IsNil)
c.Assert(stores, DeepEquals, []*metapb.Store{store})
c.Assert(stores, IsNil)
}

func (s *testClientSuite) checkGCSafePoint(c *C, expectedSafePoint uint64) {
2 changes: 1 addition & 1 deletion pkg/integration_test/version_upgrade_test.go
@@ -26,7 +26,7 @@ func (s *integrationTestSuite) bootstrapCluster(server *testServer, c *C) {
bootstrapReq := &pdpb.BootstrapRequest{
Header: &pdpb.RequestHeader{ClusterId: server.GetClusterID()},
Store: &metapb.Store{Id: 1, Address: "mock://1"},
Region: &metapb.Region{Id: 2, Peers: []*metapb.Peer{{3, 1, false}}},
Region: &metapb.Region{Id: 2, Peers: []*metapb.Peer{{Id: 3, StoreId: 1, IsLearner: false}}},
}
_, err := server.server.Bootstrap(context.Background(), bootstrapReq)
c.Assert(err, IsNil)
2 changes: 1 addition & 1 deletion server/api/hot_status.go
@@ -51,7 +51,7 @@ func (h *hotStatusHandler) GetHotStores(w http.ResponseWriter, r *http.Request)
bytesWriteStats := h.GetHotBytesWriteStores()
bytesReadStats := h.GetHotBytesReadStores()
keysWriteStats := h.GetHotKeysWriteStores()
keysReadStats := h.GetHotKeysWriteStores()
keysReadStats := h.GetHotKeysReadStores()

stats := hotStoreStats{
BytesWriteStats: bytesWriteStats,
22 changes: 20 additions & 2 deletions server/grpc_service.go
@@ -226,9 +226,21 @@ func (s *Server) GetAllStores(ctx context.Context, request *pdpb.GetAllStoresReq
return &pdpb.GetAllStoresResponse{Header: s.notBootstrappedHeader()}, nil
}

// Don't return tombstone stores.
var stores []*metapb.Store
if request.GetExcludeTombstoneStores() {
for _, store := range cluster.GetStores() {
if store.GetState() != metapb.StoreState_Tombstone {
stores = append(stores, store)
}
}
} else {
stores = cluster.GetStores()
}

return &pdpb.GetAllStoresResponse{
Header: s.header(),
Stores: cluster.GetStores(),
Stores: stores,
}, nil
}

@@ -578,8 +590,14 @@ func (s *Server) ScatterRegion(ctx context.Context, request *pdpb.ScatterRegionR
region = core.NewRegionInfo(request.GetRegion(), request.GetLeader())
}

cluster.RLock()
defer cluster.RUnlock()
co := cluster.coordinator
if op := co.regionScatterer.Scatter(region); op != nil {
op, err := co.regionScatterer.Scatter(region)
if err != nil {
return nil, err
}
if op != nil {
co.addOperator(op)
}

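As a sketch of the server-side contract added above, the new ExcludeTombstoneStores field can also be exercised directly over gRPC. pdpb.NewPDClient is the generated kvproto client; the endpoint, insecure dial option, and ClusterId below are placeholders.

package main

import (
	"context"
	"log"

	"github.com/pingcap/kvproto/pkg/pdpb"
	"google.golang.org/grpc"
)

func main() {
	// Placeholder endpoint; in practice connect to the current PD leader.
	conn, err := grpc.Dial("127.0.0.1:2379", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := pdpb.NewPDClient(conn)

	// ClusterId must match the bootstrapped cluster; 0 is a placeholder here.
	resp, err := client.GetAllStores(context.Background(), &pdpb.GetAllStoresRequest{
		Header:                 &pdpb.RequestHeader{ClusterId: 0},
		ExcludeTombstoneStores: true, // ask PD to filter out tombstone stores
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("got %d non-tombstone stores", len(resp.GetStores()))
}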
16 changes: 13 additions & 3 deletions server/handler.go
@@ -418,7 +418,10 @@ func (h *Handler) AddTransferPeerOperator(regionID uint64, fromStoreID, toStoreI
return err
}

op := schedule.CreateMovePeerOperator("adminMovePeer", c.cluster, region, schedule.OpAdmin, fromStoreID, toStoreID, newPeer.GetId())
op, err := schedule.CreateMovePeerOperator("adminMovePeer", c.cluster, region, schedule.OpAdmin, fromStoreID, toStoreID, newPeer.GetId())
if err != nil {
return err
}
if ok := c.addOperator(op); !ok {
return errors.WithStack(errAddOperator)
}
@@ -483,7 +486,10 @@ func (h *Handler) AddRemovePeerOperator(regionID uint64, fromStoreID uint64) err
return errors.Errorf("region has no peer in store %v", fromStoreID)
}

op := schedule.CreateRemovePeerOperator("adminRemovePeer", c.cluster, schedule.OpAdmin, region, fromStoreID)
op, err := schedule.CreateRemovePeerOperator("adminRemovePeer", c.cluster, schedule.OpAdmin, region, fromStoreID)
if err != nil {
return err
}
if ok := c.addOperator(op); !ok {
return errors.WithStack(errAddOperator)
}
@@ -569,7 +575,11 @@ func (h *Handler) AddScatterRegionOperator(regionID uint64) error {
return ErrRegionNotFound(regionID)
}

op := c.regionScatterer.Scatter(region)
op, err := c.regionScatterer.Scatter(region)
if err != nil {
return err
}

if op == nil {
return nil
}
3 changes: 3 additions & 0 deletions server/schedule/mockcluster.go
@@ -338,6 +338,9 @@ func (mc *MockCluster) ApplyOperator(op *Operator) {
if region.GetStorePeer(s.FromStore) == nil {
panic("Remove peer that doesn't exist")
}
if region.GetLeader().GetStoreId() == s.FromStore {
panic("Cannot remove the leader peer")
}
region = region.Clone(core.WithRemoveStorePeer(s.FromStore))
case AddLearner:
if region.GetStorePeer(s.ToStore) != nil {
7 changes: 6 additions & 1 deletion server/schedule/namespace_checker.go
@@ -69,8 +69,13 @@ func (n *NamespaceChecker) Check(region *core.RegionInfo) *Operator {
checkerCounter.WithLabelValues("namespace_checker", "no_target_peer").Inc()
return nil
}
op, err := CreateMovePeerOperator("makeNamespaceRelocation", n.cluster, region, OpReplica, peer.GetStoreId(), newPeer.GetStoreId(), newPeer.GetId())
if err != nil {
checkerCounter.WithLabelValues("namespace_checker", "create_operator_fail").Inc()
return nil
}
checkerCounter.WithLabelValues("namespace_checker", "new_operator").Inc()
return CreateMovePeerOperator("makeNamespaceRelocation", n.cluster, region, OpReplica, peer.GetStoreId(), newPeer.GetStoreId(), newPeer.GetId())
return op
}

checkerCounter.WithLabelValues("namespace_checker", "all_right").Inc()