
EUREKA-1120 Move write.lock higher in stack... #1126

Merged: 3 commits into Netflix:master on Oct 14, 2018

Conversation

mgtriffid
Contributor

so that payload serialization is performed while the lock is held

@mgtriffid
Contributor Author

Fixes #1120

@mattnelson
Contributor

Maybe I'm interpreting this wrong, but are the locks inverted? Could that be the root cause of this issue?

Mutative (write) actions are acquiring a read lock:

public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
    try {
        read.lock();
        // ...

public boolean statusUpdate(String appName, String id,
                            InstanceStatus newStatus, String lastDirtyTimestamp,
                            boolean isReplication) {
    try {
        read.lock();
        // ...

Read operations are acquiring a write lock:

public Applications getApplicationDeltas() {
    GET_ALL_CACHE_MISS_DELTA.increment();
    Applications apps = new Applications();
    apps.setVersion(responseCache.getVersionDelta().get());
    Map<String, Application> applicationInstancesMap = new HashMap<String, Application>();
    try {
        write.lock();
        // ...

@mgtriffid
Contributor Author

mgtriffid commented Oct 3, 2018

I don't think so, @mattnelson. RRWL's read lock can be held by multiple threads at once, but while the write lock is held, other threads cannot acquire even the read lock. I think the intention here was to let many threads modify different registered instances (statuses, metadata) concurrently, but when the response cache needs to be loaded, those updates pause while the response is prepared. But let's wait for @qiangdavidliu to respond to your concern.
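For readers less familiar with ReentrantReadWriteLock, here is a minimal standalone sketch (not Eureka code) of the pattern being described: many mutators share the read lock and run in parallel, while the cache-rebuild path takes the write lock and excludes all of them. The names below are illustrative only.

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone illustration of the locking pattern discussed above (not Eureka code):
// many register/statusUpdate-style mutators share the read lock and may run in
// parallel, while the delta/cache path takes the write lock and excludes all of them.
public class InvertedLockSketch {

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock read = rwLock.readLock();
    private final Lock write = rwLock.writeLock();

    // Mutates a single instance; the shared read lock lets many of these proceed at once.
    public void registerOneInstance() {
        read.lock();
        try {
            // update one entry of the registry map here
        } finally {
            read.unlock();
        }
    }

    // Builds the delta snapshot; the exclusive write lock blocks all mutators until done.
    public void buildDeltaSnapshot() {
        write.lock();
        try {
            // walk the recently-changed queue and assemble the response here
        } finally {
            write.unlock();
        }
    }
}

In other words, the read/write names describe how the lock is shared, not whether the guarded operation reads or writes the registry.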

@qiangdavidliu
Contributor

@mgtriffid thanks for the PR. I checked some statistics, and the data serialization (in extreme p99.9+ cases) can be pretty heavy, beyond even a second. With a write lock held around it, instance registrations and updates would be unable to proceed for that duration, which can be problematic.

It seems the root issue here is that the InstanceInfo objects being serialized are mutable. It might be worth a quick experiment to see which is cheaper: locking around the serialization, or ensuring that copies of the data are returned by the getApplicationDelta* methods.

@mgtriffid
Contributor Author

mgtriffid commented Oct 5, 2018

Thanks for pointing this out. I was actually concerned about performance too, but for some reason assumed serialization was quick. Good that you have stats from production at your scale.
A naive benchmark shows that, on average, calling the copy constructors is about 11 times faster:

Encoding: 84819
Copying constructors: 6561
Now actual test:
Encoding: 81785
Copying constructors: 6803
Now without metadata
Encoding: 82318
Copying constructors: 7140

Will update PR.
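A rough sketch of the kind of micro-benchmark described above, using Jackson for the encoding side and a hand-written copy constructor on a stand-in payload class. The class and field names below are illustrative; the real comparison used Eureka's InstanceInfo and its own codecs.

import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.HashMap;
import java.util.Map;

// Compares encoding a mutable object to JSON against copying it with a copy constructor.
// The Payload class is a hypothetical stand-in for InstanceInfo.
public class CopyVsEncodeBenchmark {

    static class Payload {
        public String appName;
        public String status;
        public Map<String, String> metadata;

        Payload(String appName, String status, Map<String, String> metadata) {
            this.appName = appName;
            this.status = status;
            this.metadata = metadata;
        }

        // Copy constructor: the cheaper alternative measured above.
        Payload(Payload other) {
            this(other.appName, other.status, new HashMap<>(other.metadata));
        }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        Map<String, String> meta = new HashMap<>();
        meta.put("zone", "us-east-1a");
        Payload payload = new Payload("MY-APP", "UP", meta);
        int iterations = 100_000;

        long bytes = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            bytes += mapper.writeValueAsString(payload).length();   // keep result live
        }
        System.out.println("Encoding: " + (System.nanoTime() - t0) / 1_000_000 + " ms (" + bytes + ")");

        long copies = 0;
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            copies += new Payload(payload).metadata.size();   // keep result live
        }
        System.out.println("Copying constructors: " + (System.nanoTime() - t1) / 1_000_000 + " ms (" + copies + ")");
    }
}

A single-pass loop like this only indicates relative cost; absolute numbers depend on payload size and JIT warm-up, which is presumably why the results above repeat the measurement.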

Mikhail Gromov added 2 commits October 9, 2018 16:51
@mgtriffid
Contributor Author

Moved the locks back to their original place and added copy constructors.
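A minimal standalone sketch of the copy-under-lock idea that the updated PR applies, using stand-in types; the actual change to getApplicationDeltas is in the commits above and differs in detail. The point is that each mutable entry is copied while the lock is held, so serialization of the snapshot afterwards never observes later mutations.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone sketch (stand-in types, not the actual diff): copy each mutable entry
// under the lock, then serialize the copies after the lock is released.
public class CopyUnderLockSketch {

    // Stand-in for a mutable InstanceInfo.
    static class Entry {
        volatile String status;
        Entry(String status) { this.status = status; }
        Entry(Entry other) { this.status = other.status; }   // copy constructor
    }

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final List<Entry> recentlyChanged = new ArrayList<>();

    public List<Entry> snapshotDeltas() {
        List<Entry> snapshot = new ArrayList<>();
        rwLock.writeLock().lock();
        try {
            for (Entry e : recentlyChanged) {
                snapshot.add(new Entry(e));   // defensive copy, taken under the lock
            }
        } finally {
            rwLock.writeLock().unlock();
        }
        return snapshot;   // safe to serialize outside the lock
    }
}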

@qiangdavidliu
Contributor

Looks good, thanks @mgtriffid. Just want to verify with you: does this also solve your original issue?

@mgtriffid
Contributor Author

@qiangdavidliu, well, at least with this fix I cannot reproduce the issue, while without it I can reproduce it easily.

@pparth

pparth commented Oct 12, 2018

@qiangdavidliu Well, in our case we consider frequent full registry fetches to be an issue. Especially when multiple clients execute full registry fetches at the same time, we have seen resource starvation problems. What is your opinion on that?

@qiangdavidliu
Contributor

@pparth we run with a read-only tier internally (same REST API -> cache -> EurekaClient -> write servers), which by its nature mitigates the original issue referenced in this PR.

@qiangdavidliu
Contributor

The PR looks good, thank you @mgtriffid for the fix.

@qiangdavidliu qiangdavidliu merged commit 67c28d4 into Netflix:master Oct 14, 2018