-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce current state in BlobStore to help validate requests #1315
Conversation
This PR introduces current state in BlobStore that is updated by state transition (Helix controller initiates the transition). Store's current state is used to validate requests and reject some of them if frontends have old view of the store. For example, store has transitted to INACTIVE state but due to Helix notification delay, some frontends may still believe store is in STANDBY. To avoid such inconsistent view, we use current state (which is always up-to-date) to validate coming requests.
Codecov Report
@@ Coverage Diff @@
## master #1315 +/- ##
=========================================
Coverage ? 72.06%
Complexity ? 6439
=========================================
Files ? 468
Lines ? 36961
Branches ? 4650
=========================================
Hits ? 26635
Misses ? 9088
Partials ? 1238
Continue to review full report at Codecov.
|
@@ -76,6 +77,7 @@ | |||
private BlobStoreStats blobStoreStats; | |||
private boolean started; | |||
private FileLock fileLock; | |||
private ReplicaState currentState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably be volatile since it will be set from a different thread than where it is read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair point, will fix
Store store = storeManager.getStore(id); | ||
if (requestType == RequestOrResponseType.PutRequest && !POST_ALLOWED_STORE_STATES.contains( | ||
store.getCurrentState())) { | ||
logger.error("{} is not allowed because current state of store {} is {}", requestType, id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is error the correct log level for this? How often will this occur during normal operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be infrequent. It happens in two cases:
- remove replica: when local replica's state transition has completed but frontends still have old view and still issue PUT requests to the store. Once Helix state change notification arrives at frontend, this should die out quickly.
- shutdown server: when server is shut down, local replica is directly set to OFFLINE and some frontends may still attempt to write to this store due to Helix notification delay.
@@ -171,6 +173,7 @@ private BlobStore(ReplicaId replicaId, String storeId, StoreConfig config, Sched | |||
this.thresholdBytesLow = (long) (capacityInBytes * ((threshold - delta) / 100.0)); | |||
ttlUpdateBufferTimeMs = TimeUnit.SECONDS.toMillis(config.storeTtlUpdateBufferTimeSeconds); | |||
errorCount = new AtomicInteger(0); | |||
currentState = ReplicaState.OFFLINE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any callers to setCurrentState()
in the production code. Will this be added in a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this will be called in storage manager when state transition occurs. I was planning to make the changes in this PR but then decided to split it into a separate PR.
private final StatsManager statsManager; | ||
private final ConcurrentHashMap<RequestOrResponseType, Set<PartitionId>> requestsDisableInfo = | ||
new ConcurrentHashMap<>(); | ||
private final ConcurrentHashMap<PartitionId, ReplicaId> localPartitionToReplicaMap; | ||
// POST requests are allowed on stores states: { LEADER, STANDBY } | ||
static final Set<ReplicaState> POST_ALLOWED_STORE_STATES = EnumSet.of(ReplicaState.LEADER, ReplicaState.STANDBY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be PUT instead of Post?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
ambry-server/src/main/java/com.github.ambry.server/AmbryServerRequests.java
Show resolved
Hide resolved
private final ConcurrentHashMap<PartitionId, ReplicaId> localPartitionToReplicaMap; | ||
// POST requests are allowed on stores states: { LEADER, STANDBY } | ||
static final Set<ReplicaState> POST_ALLOWED_STORE_STATES = EnumSet.of(ReplicaState.LEADER, ReplicaState.STANDBY); | ||
// UPDATE requests (including DELETE, TTLUpdate) are allowed on stores states: { LEADER, STANDBY, INACTIVE, BOOTSTRAP } | ||
static final Set<ReplicaState> UPDATE_ALLOWED_STORE_STATES = | ||
EnumSet.of(ReplicaState.LEADER, ReplicaState.STANDBY, ReplicaState.INACTIVE, ReplicaState.BOOTSTRAP); | ||
static final Set<RequestOrResponseType> UPDATE_REQUEST_TYPES = | ||
EnumSet.of(RequestOrResponseType.DeleteRequest, RequestOrResponseType.TtlUpdateRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can just be a Map from RequestOrResponseType to EnumSet of ReplicaState. So in isRequestEnabled method, you don't have to do if-else statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a good idea but we only have two categories of request type to account for, I feel like we don't have to initialize a map and query it. The logic is already simple in isRequestEnabled
method.
Map<ReplicaState, ReplicaId> stateToReplica = new HashMap<>(); | ||
int cnt = 0; | ||
for (ReplicaState state : EnumSet.complementOf(EnumSet.of(ReplicaState.ERROR))) { | ||
stateToReplica.put(state, localReplicas.get(cnt++)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would discourage using ++ in the same place where you use cnt. This would cause confusion. Can you do
stateToReplica.put(state, localReplicas.get(cnt));
cnt++;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
System.out.println("replica state = " + entry.getKey() + ", store state = " + storageManager.getStore( | ||
entry.getValue().getPartitionId()).getCurrentState()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove System.out.println?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
This PR introduces current state in BlobStore that is updated by state
transition (Helix controller initiates the transition). Store's current
state is used to validate requests and reject some of them if frontends
have old view of the store. For example, store has transitted to
INACTIVE state but due to Helix notification delay, some frontends may
still believe store is in STANDBY. To avoid such inconsistent view, we
use current state (which is always up-to-date) to validate coming
requests.