This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

avoid double creation of same volume, serialize with mutex #106

Merged
merged 3 commits into intel:devel from use-published-registry on Jan 7, 2019

Conversation

okartau
Contributor

@okartau okartau commented Dec 12, 2018

Sometimes the CSI master sends a second publish request right after the first one, either because of a timeout or for some other reason. The node driver should withstand repeated requests without side effects. In ControllerPublishVolume on the Node side, if we receive a second request with the same volumeID, we attempt to create another physical volume, because we do not check whether we have already created and published one. There is a local registry where we store the published ID, so let's check there and avoid the double operation.
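For illustration, a minimal Go sketch of the idempotency check described above, using a hypothetical publish helper and a publishVolumeInfo map in the role of the local registry; the real names in controllerserver.go may differ:

    // Hypothetical sketch of the idempotency check; not the actual driver code.
    package main

    import "fmt"

    type nodeServer struct {
        // publishVolumeInfo plays the role of the "local registry" of
        // already-published volumes: volumeID -> device path.
        publishVolumeInfo map[string]string
    }

    func (ns *nodeServer) publish(volumeID string) string {
        if dev, ok := ns.publishVolumeInfo[volumeID]; ok {
            // Repeated request for the same volumeID: answer OK with the
            // existing device instead of creating another physical volume.
            return dev
        }
        dev := "/dev/pmem-" + volumeID // placeholder for the real creation work
        ns.publishVolumeInfo[volumeID] = dev
        return dev
    }

    func main() {
        ns := &nodeServer{publishVolumeInfo: map[string]string{}}
        first := ns.publish("vol-1")
        second := ns.publish("vol-1") // no second creation
        fmt.Println(first == second)  // prints: true
    }

As discussed further down in the thread, the lookup alone does not protect against two concurrent requests; that is what the mutex added later in this PR is for.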

@okartau
Contributor Author

okartau commented Dec 12, 2018

The diff may look big, but the change itself is small; the diff gets confused by indentation changes. I use the indentation changes only to reuse the OK-response return path, as we have to respond with OK in that case.

@okartau okartau force-pushed the use-published-registry branch from dffad7c to 1b72c0c Compare December 12, 2018 12:26
@avalluri
Contributor

@olev, the complete (real) fix for this issue would be to avoid repeated calls to the node controller. That is, from the master controller we are blindly passing all attach/publish calls to the node controller without checking cs.pmemVolumes[req.VolumeId].Status to see whether it is already 'Attached' or not.

Can you please check whether that is also part of this change?

@pohly
Contributor

pohly commented Dec 12, 2018

The fix must include protection against concurrent calls. As it stands now, a second call that arrives before the first one has updated cs.publishVolumeInfo would still create a new volume, wouldn't it?

The simplest solution is to lock a single mutex at the beginning of each call.
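For illustration, a minimal sketch of that suggestion with hypothetical controllerServer fields; the point is only that every state-changing handler takes the same mutex before doing anything else:

    // Hypothetical sketch of serializing handlers with one mutex; not the actual PR code.
    package main

    import (
        "fmt"
        "sync"
    )

    type controllerServer struct {
        mutex             sync.Mutex
        publishVolumeInfo map[string]string
    }

    // The lock is taken before anything else, so a second call for the same
    // volume cannot run until the first one has updated publishVolumeInfo.
    func (cs *controllerServer) publishVolume(volumeID string) string {
        cs.mutex.Lock()
        defer cs.mutex.Unlock()

        if dev, ok := cs.publishVolumeInfo[volumeID]; ok {
            return dev // already published: idempotent OK response
        }
        dev := "/dev/pmem-" + volumeID // placeholder for the real creation work
        cs.publishVolumeInfo[volumeID] = dev
        return dev
    }

    func main() {
        cs := &controllerServer{publishVolumeInfo: map[string]string{}}
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                fmt.Println(cs.publishVolume("vol-1")) // both goroutines get the same device
            }()
        }
        wg.Wait()
    }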

@okartau
Contributor Author

okartau commented Dec 12, 2018

OK, I will add a mutex in the Node part.
But what about the Controller part of the same code, i.e. just above, if cs.mode == Controller, where we initiate that RPC call to the Node?
Is the Controller part also able to run multiple instances in parallel?
Or is a simple check there for the already-Attached state enough?

@pohly
Contributor

pohly commented Dec 12, 2018 via email

@okartau okartau force-pushed the use-published-registry branch 2 times, most recently from 82cbc5c to 4b5f4b0 Compare December 19, 2018 13:07
@okartau
Contributor Author

okartau commented Dec 19, 2018

I added a check for a double publish request on the Controller side, as that would catch the condition already before the RPC to the Node, if the Controller had reached the point of setting the volume state to Attached. The open question here is what to return on the Controller; nil is likely not that good. But as the real return structure is not coming from the Node, should we make up a fake one locally and return it?
The use of a mutex is still to be added in more commits to come.

@pohly
Contributor

pohly commented Dec 20, 2018 via email

@okartau okartau force-pushed the use-published-registry branch from 4b5f4b0 to 11a434f Compare December 31, 2018 10:25
@okartau
Contributor Author

okartau commented Dec 31, 2018

I gave up on the plan of checking for repeated creation on the Controller side, as it has to return a valid structure and it does not feel right to synthesize a response structure from scratch. Let's pass the request to the Node, which will respond with the real response.
Also, I added the use of a mutex to protect the Add and Delete operations.
The added mutex import forced a run of "dep ensure", which introduced many changes on the vendor/ side; there is a separate commit with those.

@okartau
Contributor Author

okartau commented Dec 31, 2018

Tested in unified mode with the scripts in util/, and also in a cluster with the system under test/.

@pohly
Contributor

pohly commented Dec 31, 2018

Which version of dep did you use? Please try with 0.5; the vendor commit should be smaller then.

@avalluri
Contributor

avalluri commented Jan 2, 2019

I gave up on the plan of checking for repeated creation on the Controller side, as it has to return a valid structure and it does not feel right to synthesize a response structure from scratch. Let's pass the request to the Node, which will respond with the real response.

So you don't feel that the code below on the Controller side avoids some of the repeated calls to the Node:

                if !ok {
                        return nil, status.Error(codes.NotFound, fmt.Sprintf("No volume with id '%s' found", req.VolumeId))
                }
+               if vol.Status == Attached && vol.NodeID == req.NodeId  {
+                       return &csi.ControllerPublishVolumeResponse{
+                               PublishInfo: map[string]string{
+                                       "name": vol.Name,
+                                       "size": strconv.FormatUint(vol.Size, 10),
+                               },
+                       }, nil
+               }
                node, err := cs.rs.GetNodeController(req.NodeId)
                if err != nil {
                        return nil, status.Error(codes.NotFound, err.Error())


@okartau
Contributor Author

okartau commented Jan 2, 2019

I agree this code may avoid some repeated RPC calls, but my thinking is that the overhead of one RPC call is not significant enough to justify adding a second place that creates a PublishVolumeResponse. Creating the response on the Controller feels like a layering violation, as the response is normally created on the Node. My idea thus is to let all responses be created in one place in the code.
But this is just one proposal.
If we reach agreement that such an optimization/check is useful, I'm okay with adding such code.
@pohly, do you want to give your vote?

@okartau
Contributor Author

okartau commented Jan 2, 2019

Another point with the mutex code: is it OK to have it in the controllerserver.go file as in the current commit, or should we create a separate file as in the oim example I used? The current approach kind of limits mutex use to one file only, right?

@avalluri
Contributor

avalluri commented Jan 2, 2019

I agree this code may avoid some repeated RPC calls, but my thinking is that the overhead of one RPC call is not significant enough to justify adding a second place that creates a PublishVolumeResponse. Creating the response on the Controller feels like a layering violation, as the response is normally created on the Node.

When I was writing this code, I used Controller{Publish,Unpublish}Volume at the Node just for convenience, but more suitable calls would be {Create,Delete}Volume (at some point I think we should move to using those). So I never felt that PublishVolumeResponse was the Node's job.

@pohly
Contributor

pohly commented Jan 2, 2019

I would not try to avoid a second gRPC call. If it is the node which does the work and creates the response, then it is that code which should detect an idempotent call and return the same response as before.

Phrasing it differently, all gRPC calls should be idempotent, even if they are only used internally. Then callers can always rely on that instead of having to implement additional logic.

@pohly
Contributor

pohly commented Jan 2, 2019 via email

@okartau
Contributor Author

okartau commented Jan 2, 2019

Which version of dep did you use?

dep version:
dep:
 version     : devel

Not very helpful. By package version it is 0.3.2 as provided on Ubuntu 18.04; it has not been updated and likely will not be, so an 8-month-old mainstream distro is hopelessly out of date for development. But I guess we have to adapt to that in the Go world. It seems the whole dep tool is about to be overtaken by the next wave.
Does dep, by the way, record its own version somewhere in the structure it creates? I haven't found it yet.
If different developers use whatever tools and versions they happen to have, what about a controlled and managed development toolchain?
That leads to the question: should we document the dep version requirement in the README?

@avalluri
Contributor

avalluri commented Jan 2, 2019

      I would not try to avoid a second gRPC call. If it is the node which does the work and creates the response, then it is that code which should detect an idempotent call and return the same response as before.

Phrasing it differently, all gRPC calls should be idempotent, even if they are only used internally. Then callers can always rely on that instead of having to implement additional logic.

I am not against making the Node calls idempotent, but my point is: why can't the caller (the Controller) detect duplicates when it already has enough information to make the decision?

@pohly
Contributor

pohly commented Jan 3, 2019 via email

@pohly
Contributor

pohly commented Jan 3, 2019 via email

@okartau okartau force-pushed the use-published-registry branch from 11a434f to a4107d7 Compare January 3, 2019 07:52
@okartau
Contributor Author

okartau commented Jan 3, 2019

I replaced the 3rd commit so that "dep ensure" was applied using dep 0.5.0, and indeed the change size drops drastically (to a few thousand lines from over a million).

@okartau okartau force-pushed the use-published-registry branch 2 times, most recently from 1699ab5 to dca2872 Compare January 3, 2019 09:00
@okartau okartau changed the title from "controllerserver.go: avoid double creation of same volume" to "avoid double creation of same volume, serialize with mutex" Jan 3, 2019
@okartau
Contributor Author

okartau commented Jan 3, 2019

I added more mutex protection in the 2nd commit to also cover state-changing operations on the Node side, like mkfs, mount, umount.

@okartau okartau force-pushed the use-published-registry branch from dca2872 to 090b48d Compare January 3, 2019 09:53
@avalluri
Contributor

avalluri commented Jan 3, 2019

Amarnath Valluri notifications@github.com writes:
I am not against making the Node calls idempotent, but my point is: why can't the caller (the Controller) detect duplicates when it already has enough information to make the decision?
Can it be 100% sure that this information is up-to-date? If both controller and node have the same information, how is it kept in sync?

It should be in sync, as all changes go via Controller -> Node, and there is no way to sync back a state change, if any, from the Node side.

Whenever the Controller gets a PublishVolume call it sends that request to the Node and, on success, updates its cache so that the volume state is 'Attached'; in the same way, on an UnpublishVolume call it updates the cache to 'Detached'. So I believe it should be up to date and can be trusted.

@okartau okartau force-pushed the use-published-registry branch 2 times, most recently from db4ce7d to 7e49d03 Compare January 3, 2019 11:32
Sometimes the CSI master sends a second publish request right after the first one, either because of a timeout or for some other reason. The Node should withstand repeated requests without side effects. In ControllerPublishVolume on the Node side, if we receive a second request with the same volumeID, we attempt to create another physical volume, because we do not check whether we have already created and published one. There is a local registry where we store the published ID, so let's check there and avoid the double operation.
As gRPC requests may get served in arbitrary order and also in parallel, and the master can send repeated requests, we use a mutex with key=VolumeID to protect operations that change state, like adding/deleting volumes/devices, mkfs, mount.
Importing the mutex forced a run of "dep ensure", which created this set of changes. dep v0.5.0 was used.
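For illustration, a minimal sketch of the per-VolumeID locking described in the commit message above, with a hypothetical volumeLocks helper; the actual implementation in the PR may be structured differently:

    // Hypothetical sketch of a mutex keyed by VolumeID; not the actual PR code.
    package main

    import (
        "fmt"
        "sync"
    )

    // volumeLocks hands out one mutex per volume ID, so operations on
    // different volumes can run in parallel while repeated or concurrent
    // requests for the same volume are serialized.
    type volumeLocks struct {
        mutex sync.Mutex
        locks map[string]*sync.Mutex
    }

    func newVolumeLocks() *volumeLocks {
        return &volumeLocks{locks: map[string]*sync.Mutex{}}
    }

    func (vl *volumeLocks) get(volumeID string) *sync.Mutex {
        vl.mutex.Lock()
        defer vl.mutex.Unlock()
        if m, ok := vl.locks[volumeID]; ok {
            return m
        }
        m := &sync.Mutex{}
        vl.locks[volumeID] = m
        return m
    }

    func main() {
        vl := newVolumeLocks()
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                l := vl.get("vol-1")
                l.Lock()
                defer l.Unlock()
                // State-changing work (create device, mkfs, mount) would go here.
                fmt.Println("request", n, "holds the vol-1 lock")
            }(i)
        }
        wg.Wait()
    }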
@okartau okartau force-pushed the use-published-registry branch from 7e49d03 to 6b2b9d1 Compare January 7, 2019 09:02
@okartau
Contributor Author

okartau commented Jan 7, 2019

So what is the decision now: should we merge this as is, or add another check on the Controller side? To get this moving, I propose that we merge it now, fixing the actual issue, and treat the need for the additional check in a separate thread.

@avalluri
Contributor

avalluri commented Jan 7, 2019

I am OK to merge this; let's deal with the other issue later.

@okartau okartau merged commit 0189627 into intel:devel Jan 7, 2019
@okartau okartau deleted the use-published-registry branch January 25, 2019 08:31