-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timeout #38
Timeout #38
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Minor note: the break
for no-op branches in select
/switch
is not necessary (see e.g. https://play.golang.org/p/3kj1htOkqNK), but it does not hurt. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions
Makefile
Outdated
@@ -27,7 +27,7 @@ test_lint: | |||
} | |||
|
|||
test_local: | |||
go test -v -race -short -p=1 ./... | |||
go test -v -short -p=1 ./... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a dangerous change! You should be more concerned to change the tests so that they don't time-out. The -race
saved us on numerous occasions, so if you decide to really go without it, I'd make sure that the other members of the team are OK with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plot thickens... apparently this only happens locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is bbolt's error in conjunction with the latest go. Are you already using go 1.14?
@tharvik - what is the state of bbolt and the latest go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. If it's the case I can go back to a previous version (which I did and it works)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ineiti bbolt is still broken, we need etcd-io/bbolt#201 before updating onet/cothority (but containerd and kubernetes both waits on that, it will be fixed in a matter of days)
protocols/shuffling_protocol.go
Outdated
@@ -116,6 +118,10 @@ func NewShufflingProtocol(n *onet.TreeNodeInstance) (onet.ProtocolInstance, erro | |||
break | |||
} | |||
} | |||
|
|||
// default timeout | |||
dsp.Timeout = 10 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How has this timeout been chosen?
Why are there different timeouts?
Shouldn't it be possible to change the timeout, e.g., with an environment variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was no rule to chose the timeout. I just chose enough time for it not to fail in normal query scenarios. As for the timeouts I added one for each different protocol or service. This way we can pass it when calling the protocol or service. You think an environment variable would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends (as always):
- do you trust the leader to not send a malicious timeout that will keep your resources? (I suppose yes)
- do you want to have each node decide on his own timeouts? (I suppose no)
So with a yes - no answer, I think you're doing the right thing ;)
But instead of hardcoding the timeout, I'd use https://golang.org/pkg/os/#Getenv to read the MEDCO_TIMEOUT
environment variable, and, if it is not set, use a globally defined timeout.
Timeouts are always hard to get right (even @bford said "good luck" when I had to do it ;), so having them easily adjustable also from the outside is a plus!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, looks good to me.
Removed |
@@ -201,14 +197,19 @@ func (p *CollectiveAggregationProtocol) Dispatch() error { | |||
|
|||
// Announce forwarding down the tree. | |||
func (p *CollectiveAggregationProtocol) aggregationAnnouncementPhase() error { | |||
timeout, err := time.ParseDuration(os.Getenv("MEDCO_TIMEOUT")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be enough to put this in afunc init()
at the beginning of the file constants.go
. Then you can remove it from all the methods you're calling it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done abbe277
lib/constants.go
Outdated
func init() { | ||
tmp, err := time.ParseDuration(os.Getenv("MEDCO_TIMEOUT")) | ||
if err != nil { | ||
TIMEOUT = tmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last comment:
if len(os.Getenv("MEDCO_TIMEOUT") > 0 {
log.Warn("Couldn't parse MEDCO_TIMEOUT")
}
This will save you hours later on ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added it in 5564657
However now that I updated to the new onet version 3.1.1 the error with the "E : overlay.go:613 (v3.(*Overlay).nodeDelete) - Error while closing node: Can't shutdown empty ProtocolInstance:"
pops way more often (TestServiceClearAttr). Do you know what changed with this new 3.1 version of Onet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It disappears if I remove the error condition when calling tn.SetConfig()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a simple test-case that I can run to trigger it?
And what do you mean by "removing the error condition when calling tn.SetConfig()
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ineiti see here #38 (comment)
^TestServiceClearAttr$
in github.com/ldsec/unlynx/services
I mean this
tn.SetConfig(conf)
instead of
if err := tn.SetConfig(conf); err != nil {
return nil, xerrors.Errorf("couldn't set config: %+v", err)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed #38 (comment)
return errors.New("Error sending <DataReferenceMessage>:" + err.Error()) | ||
select { | ||
case dataReferenceMessage := <-p.DataReferenceChannel: | ||
if !p.IsLeaf() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unneded if, SendToChildren
when having none should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arf, I misexplained it; the if !p.IsLeaf()
is unnecessary, as a Leaf sending to it's children is a NOP.
And you're droping the error
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh I did not get it :P
Ok it's fixed: f3812c3
case dataReferenceMessage := <-p.DataReferenceChannel: | ||
if !p.IsLeaf() { | ||
if err := p.SendToChildren(&dataReferenceMessage.DataReferenceMessage); err != nil { | ||
return errors.New("Error sending <DataReferenceMessage>:" + err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the new fmt.Errorf
when returning error (it will also keep the stacktrace).
return errors.New("Error sending <DataReferenceMessage>:" + err.Error()) | |
return fmt.Errorf("Error sending <DataReferenceMessage>: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switched everywhere to fmt.Errorf: 8861342
protocols/shuffling+ddt_protocol.go
Outdated
select { | ||
case tmp = <-p.PreviousNodeInPathChannel: | ||
case <-time.After(libunlynx.TIMEOUT): | ||
return errors.New(p.ServerIdentity().String() + " didn't get the <tmp> on time.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better name than tmp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to clean a bit these tmp
variables. I changed this one to spDDTbs
(referring to the type of the object shufflingPlusDDTBytesStruct
) even though IMHO is not much better :P
services/service.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
tn.SetConfig(conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't drop the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I don't drop the error 7f37465 some tests stop working
e.g. - ^TestServiceClearAttr$
in github.com/ldsec/unlynx/services
- couldn't set config: Can't set config twice:
go.dedis.ch/onet/v3.(*TreeNodeInstance).SetConfig
/Users/jagomes/Go/pkg/mod/go.dedis.ch/onet/v3@v3.1.1/treenode.go:879:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might indicate a bug :P
Service.NewProtocol
is handled specially by onet, to start new protocol on the children of the root node of the tree. To fix it, move the SetConfig
in StartProtocol
, just before calling NewProtocol
, this way, only root will actually set the config for the whole tree, ensuring that every node get's it before starting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this is perfect it works like a charm 6340084
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might indicate a bug :P
Service.NewProtocol
is handled specially by onet, to start new protocol on the children of the root node of the tree. To fix it, move theSetConfig
inStartProtocol
, just before callingNewProtocol
, this way, only root will actually set the config for the whole tree, ensuring that every node get's it before starting.
wow - @tharvik gets an onet expert! I know whom I'll ask for my next bug ;)
simul/proofs_verification_simul.go
Outdated
|
||
log.Lvl1(len(results), " proofs verified") | ||
|
||
if results[0] == false { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if results[0] == false { | |
if !results[0] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed this everywhere e5509ba
I agree with @ineiti comment about having it configurable as environment variable. We can that way set it easily in MedCo through the Docker environment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See:
Looks OK otherwise (modulo other's comments), thanks!
So I can't use
|
It's already done. There is this function to set it up based on the environment variable func init() {
tmp, err := time.ParseDuration(os.Getenv("MEDCO_TIMEOUT"))
if err == nil {
DEFAULT_TIMEOUT = tmp
} else {
log.Warn("Couldn't parse MEDCO_TIMEOUT, using default value: ", DEFAULT_TIMEOUT.String())
}
} |
app/client.go
Outdated
@@ -98,7 +98,7 @@ func parseQuery(el *onet.Roster, sum string, count bool, where, predicate, group | |||
whereRegex := "{(w[0-9]+(,\\s*[0-9]+))*(,\\s*w[0-9]+(,\\s*[0-9]+))*}" | |||
groupByRegex := "{g[0-9]+(,\\s*g[0-9]+)*}" | |||
|
|||
if !checkRegex(sum, sumRegex) { | |||
if checkRegex(sum, sumRegex) == false { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? I thought @tharvik 's suggestion was to change from == false
to !
, no? Anyway, that would be my preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right I didn't see it correctly :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed e5509ba
Reality check: CoVid-19, people stay at home, no coffee talk, and everybody gathers around @JoaoAndreSa PR to tell him how to code... Sorry guy! |
Closes #30
I added the timeouts for all channels. I've added a default 10min value that can be easily changed at runtime. Is this enough?