Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout #38

Merged
merged 23 commits into from
Mar 18, 2020
Merged

Timeout #38

merged 23 commits into from
Mar 18, 2020

Conversation

JoaoAndreSa
Copy link
Contributor

Closes #30

I added the timeouts for all channels. I've added a default 10min value that can be easily changed at runtime. Is this enough?

Copy link
Contributor

@cgrigis cgrigis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Minor note: the break for no-op branches in select/switch is not necessary (see e.g. https://play.golang.org/p/3kj1htOkqNK), but it does not hurt. :)

Copy link
Contributor

@ineiti ineiti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions

Makefile Outdated
@@ -27,7 +27,7 @@ test_lint:
}

test_local:
go test -v -race -short -p=1 ./...
go test -v -short -p=1 ./...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a dangerous change! You should be more concerned to change the tests so that they don't time-out. The -race saved us on numerous occasions, so if you decide to really go without it, I'd make sure that the other members of the team are OK with it.

Copy link
Contributor Author

@JoaoAndreSa JoaoAndreSa Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I removed the race because while I was updating onet I started having this error for the simulations that only disappears after removing the -race flag. I'm not sure where is this coming from :/ Trying to debug what is the problem so I can re-add the flag.

Screenshot 2020-03-11 at 10 04 45

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plot thickens... apparently this only happens locally

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is bbolt's error in conjunction with the latest go. Are you already using go 1.14?

@tharvik - what is the state of bbolt and the latest go?

Copy link
Contributor Author

@JoaoAndreSa JoaoAndreSa Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. If it's the case I can go back to a previous version (which I did and it works)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ineiti bbolt is still broken, we need etcd-io/bbolt#201 before updating onet/cothority (but containerd and kubernetes both waits on that, it will be fixed in a matter of days)

@@ -116,6 +118,10 @@ func NewShufflingProtocol(n *onet.TreeNodeInstance) (onet.ProtocolInstance, erro
break
}
}

// default timeout
dsp.Timeout = 10 * time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How has this timeout been chosen?

Why are there different timeouts?

Shouldn't it be possible to change the timeout, e.g., with an environment variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no rule to chose the timeout. I just chose enough time for it not to fail in normal query scenarios. As for the timeouts I added one for each different protocol or service. This way we can pass it when calling the protocol or service. You think an environment variable would be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends (as always):

  • do you trust the leader to not send a malicious timeout that will keep your resources? (I suppose yes)
  • do you want to have each node decide on his own timeouts? (I suppose no)

So with a yes - no answer, I think you're doing the right thing ;)

But instead of hardcoding the timeout, I'd use https://golang.org/pkg/os/#Getenv to read the MEDCO_TIMEOUT environment variable, and, if it is not set, use a globally defined timeout.

Timeouts are always hard to get right (even @bford said "good luck" when I had to do it ;), so having them easily adjustable also from the outside is a plus!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ineiti I followed your suggestion and changed the timeouts 3dcc5b4
Would this be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, looks good to me.

@JoaoAndreSa
Copy link
Contributor Author

Looks good to me!

Minor note: the break for no-op branches in select/switch is not necessary (see e.g. https://play.golang.org/p/3kj1htOkqNK), but it does not hurt. :)

Removed break

@JoaoAndreSa JoaoAndreSa requested a review from ineiti March 11, 2020 11:38
@@ -201,14 +197,19 @@ func (p *CollectiveAggregationProtocol) Dispatch() error {

// Announce forwarding down the tree.
func (p *CollectiveAggregationProtocol) aggregationAnnouncementPhase() error {
timeout, err := time.ParseDuration(os.Getenv("MEDCO_TIMEOUT"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be enough to put this in afunc init() at the beginning of the file constants.go. Then you can remove it from all the methods you're calling it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done abbe277

lib/constants.go Outdated
func init() {
tmp, err := time.ParseDuration(os.Getenv("MEDCO_TIMEOUT"))
if err != nil {
TIMEOUT = tmp
Copy link
Contributor

@ineiti ineiti Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comment:

if len(os.Getenv("MEDCO_TIMEOUT") > 0 {
  log.Warn("Couldn't parse MEDCO_TIMEOUT")
}

This will save you hours later on ;)

Copy link
Contributor Author

@JoaoAndreSa JoaoAndreSa Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it in 5564657
However now that I updated to the new onet version 3.1.1 the error with the "E : overlay.go:613 (v3.(*Overlay).nodeDelete) - Error while closing node: Can't shutdown empty ProtocolInstance:" pops way more often (TestServiceClearAttr). Do you know what changed with this new 3.1 version of Onet?

Screenshot 2020-03-11 at 14 42 22

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It disappears if I remove the error condition when calling tn.SetConfig()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a simple test-case that I can run to trigger it?

And what do you mean by "removing the error condition when calling tn.SetConfig()"?

Copy link
Contributor Author

@JoaoAndreSa JoaoAndreSa Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ineiti see here #38 (comment)

^TestServiceClearAttr$ in github.com/ldsec/unlynx/services

I mean this

tn.SetConfig(conf)

instead of

if err := tn.SetConfig(conf); err != nil {
    return nil, xerrors.Errorf("couldn't set config: %+v", err)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return errors.New("Error sending <DataReferenceMessage>:" + err.Error())
select {
case dataReferenceMessage := <-p.DataReferenceChannel:
if !p.IsLeaf() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unneded if, SendToChildren when having none should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arf, I misexplained it; the if !p.IsLeaf() is unnecessary, as a Leaf sending to it's children is a NOP.
And you're droping the error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I did not get it :P
Ok it's fixed: f3812c3

case dataReferenceMessage := <-p.DataReferenceChannel:
if !p.IsLeaf() {
if err := p.SendToChildren(&dataReferenceMessage.DataReferenceMessage); err != nil {
return errors.New("Error sending <DataReferenceMessage>:" + err.Error())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the new fmt.Errorf when returning error (it will also keep the stacktrace).

Suggested change
return errors.New("Error sending <DataReferenceMessage>:" + err.Error())
return fmt.Errorf("Error sending <DataReferenceMessage>: %v", err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched everywhere to fmt.Errorf: 8861342

select {
case tmp = <-p.PreviousNodeInPathChannel:
case <-time.After(libunlynx.TIMEOUT):
return errors.New(p.ServerIdentity().String() + " didn't get the <tmp> on time.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better name than tmp?

Copy link
Contributor Author

@JoaoAndreSa JoaoAndreSa Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to clean a bit these tmp variables. I changed this one to spDDTbs (referring to the type of the object shufflingPlusDDTBytesStruct) even though IMHO is not much better :P

a4d36eb

if err != nil {
return nil, err
}
tn.SetConfig(conf)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't drop the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't drop the error 7f37465 some tests stop working
e.g. - ^TestServiceClearAttr$ in github.com/ldsec/unlynx/services

 - couldn't set config: Can't set config twice:
        go.dedis.ch/onet/v3.(*TreeNodeInstance).SetConfig
            /Users/jagomes/Go/pkg/mod/go.dedis.ch/onet/v3@v3.1.1/treenode.go:879:

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might indicate a bug :P

Service.NewProtocol is handled specially by onet, to start new protocol on the children of the root node of the tree. To fix it, move the SetConfig in StartProtocol, just before calling NewProtocol, this way, only root will actually set the config for the whole tree, ensuring that every node get's it before starting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is perfect it works like a charm 6340084

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might indicate a bug :P

Service.NewProtocol is handled specially by onet, to start new protocol on the children of the root node of the tree. To fix it, move the SetConfig in StartProtocol, just before calling NewProtocol, this way, only root will actually set the config for the whole tree, ensuring that every node get's it before starting.

wow - @tharvik gets an onet expert! I know whom I'll ask for my next bug ;)


log.Lvl1(len(results), " proofs verified")

if results[0] == false {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if results[0] == false {
if !results[0] {

Copy link
Contributor Author

@JoaoAndreSa JoaoAndreSa Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed this everywhere e5509ba

@mickmis
Copy link
Contributor

mickmis commented Mar 12, 2020

I agree with @ineiti comment about having it configurable as environment variable. We can that way set it easily in MedCo through the Docker environment.
It should also be documented somewhere, like in the README.md.

Copy link
Contributor

@mickmis mickmis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See:

Looks OK otherwise (modulo other's comments), thanks!

@JoaoAndreSa
Copy link
Contributor Author

JoaoAndreSa commented Mar 12, 2020

Shouldn't be called DEFAULT_TIMEOUT rather than TIMEOUT ?

So I can't use DEFAULT_TIMEOUT because it goes against golang naming conventions so I kept the same name.

Lint errors:
lib/constants.go:34:5: don't use ALL_CAPS in Go names; use CamelCase

8563eee

@JoaoAndreSa
Copy link
Contributor Author

JoaoAndreSa commented Mar 12, 2020

I agree with @ineiti comment about having it configurable as environment variable. We can that way set it easily in MedCo through the Docker environment.
It should also be documented somewhere, like in the README.md.

It's already done. There is this function to set it up based on the environment variable MEDCO_TIMEOUT

func init() {
	tmp, err := time.ParseDuration(os.Getenv("MEDCO_TIMEOUT"))
	if err == nil {
		DEFAULT_TIMEOUT = tmp
	} else {
		log.Warn("Couldn't parse MEDCO_TIMEOUT, using default value: ", DEFAULT_TIMEOUT.String())
	}
}

app/client.go Outdated
@@ -98,7 +98,7 @@ func parseQuery(el *onet.Roster, sum string, count bool, where, predicate, group
whereRegex := "{(w[0-9]+(,\\s*[0-9]+))*(,\\s*w[0-9]+(,\\s*[0-9]+))*}"
groupByRegex := "{g[0-9]+(,\\s*g[0-9]+)*}"

if !checkRegex(sum, sumRegex) {
if checkRegex(sum, sumRegex) == false {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? I thought @tharvik 's suggestion was to change from == false to !, no? Anyway, that would be my preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right I didn't see it correctly :(

Copy link
Contributor Author

@JoaoAndreSa JoaoAndreSa Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed e5509ba

@ineiti
Copy link
Contributor

ineiti commented Mar 12, 2020

Reality check: CoVid-19, people stay at home, no coffee talk, and everybody gathers around @JoaoAndreSa PR to tell him how to code... Sorry guy!

@JoaoAndreSa
Copy link
Contributor Author

I think I addressed all the reviews/suggestions. :) Let me know (@ineiti @tharvik @mickmis @cgrigis) if I forgot something.

@JoaoAndreSa JoaoAndreSa merged commit 6372d0e into dev Mar 18, 2020
@JoaoAndreSa JoaoAndreSa deleted the timeout branch March 18, 2020 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants