Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync blockchain before running e2e tests on public testnet #568 #612

Merged
merged 14 commits into from
Feb 7, 2018

Conversation

canercidam
Copy link
Contributor

@canercidam canercidam commented Feb 4, 2018

Option to sync blockchain and exit in statusd, for synchronizing before e2e tests.

Changes:

  • -exit-when-synced flag is added to statusd.
  • NodeManager is extended with Sync.
  • Sync reuses the code in "testing/testing.go", EnsureNodeSync.
  • .travis.yml is modified to sync before e2e tests on public network.

Closes #568

Copy link
Contributor

@adambabik adambabik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Overall, it's the way to go but there are a few things that could be improved.

@@ -62,6 +62,8 @@ var (
// Push Notification
enablePN = flag.Bool("shh.notify", false, "Node is capable of sending Push Notifications")
firebaseAuth = flag.String("shh.firebaseauth", "", "FCM Authorization Key used for sending Push Notifications")

exitWhenSynced = flag.Bool("exit-when-synced", false, "Exit after blockchain sync")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make it an integer and default to -1? If it's larger than -1 then this option is active and its value tells how long to wait for a synchronization to finish? 0 could mean wait infinitely long until it's synced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, shall we expect minutes for values larger than 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, it's already clear in the comment below.

// Sync blockchain and stop.
if *exitWhenSynced {
syncStopErr := backend.NodeManager().SyncAndStopNode(time.Minute * 50)
defer handleSyncStopErr(syncStopErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure why defer was chosen. I would suggest making SyncAndStopNode synchronous and actually, change this method a bit (more about that below).

err := backend.NodeManager().SyncAndStopNode(*exitWhenSynced * time.Minute)
if err != nil {
  log.Fatalf("Failed to sync: %v", err)
}
return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted it to play along with the existing flow and make it such as:

  1. Start node.
  2. Launch sync asynchronously.
  3. Wait until node is stopped.
  4. Handle aggregated errors from sync (max 2).

So I used defer there to handle aggregated errors before exiting the program and it makes the least amount of modification to main function thus more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed the fact that reused code just tracks the progress, sorry about that. :)


// SyncAndStopNode waits until node synchronization and exits.
// Reports errors to returned channel.
func (m *NodeManager) SyncAndStopNode(timeout time.Duration) (errChan <-chan error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any benefits in having a single method that does all this stuff. There is a method to stop a node so I guess only Sync() would be needed. In main.go, it can be called like this:

if err := m.Sync(); err != nil {
  // exit
}
if err := m.StopNode(); err != nil {
  // exit
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it should have a synchronous interface. If the caller wants to call it asynchronously, then it's its trouble.

defer ticker.Stop()
for {
select {
case <-timer.C:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use <-time.After(timeout) as it's used just once in contrast to ticker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@canercidam
Copy link
Contributor Author

@adambabik I think you're right about reducing it to Sync and making it synchronous. It's cleaner and also yields much less trouble. Also makes more sense in the interface as an extra function.

@@ -62,6 +62,8 @@ var (
// Push Notification
enablePN = flag.Bool("shh.notify", false, "Node is capable of sending Push Notifications")
firebaseAuth = flag.String("shh.firebaseauth", "", "FCM Authorization Key used for sending Push Notifications")

exitWhenSynced = flag.Int("exit-when-synced", -1, "Timeout for blockchain sync and exit in minutes, zero is infinite")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name is a bit too long, could it be sync-only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, changing.

Copy link
Contributor

@mandrigin mandrigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your submission!

In general, it looks good! It would be nice to see some small changes, mostly related to documentation.

@@ -62,6 +62,8 @@ var (
// Push Notification
enablePN = flag.Bool("shh.notify", false, "Node is capable of sending Push Notifications")
firebaseAuth = flag.String("shh.firebaseauth", "", "FCM Authorization Key used for sending Push Notifications")

exitWhenSynced = flag.Int("exit-when-synced", -1, "Timeout for blockchain sync and exit in minutes, zero is infinite")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a separate mode of an app. Maybe we should add this to the documentation.
"Special mode of the app to sync blockchain and exit when synced. Timeout is N seconds. If N is zero, timeout is infinite"

The flag name was probably specified in the task, but I'd rather call it -sync-and-exit.

> statusd -sync-and-exit

is more readable to my eyes, IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please note that the waiting time is in minutes, that is not obvious without reading code.

Copy link
Contributor Author

@canercidam canercidam Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, -sync-and-exit looks better to me than -sync-only. Just because -sync-only is rather vague.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the waiting time: statusd -h should print the usage line which says "in minutes". But I can also add your description to somewhere else as well if you like. Where else do you use for denoting such things in the project?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change "Timeout for blockchain sync and exit in minutes, zero is infinite" to "Timeout in minutes for blockchain sync and exit, zero is infinite" which is more intelligible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reasons not to use flag.Duration ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could but I think we rely on only minutes here.

@@ -106,6 +108,21 @@ func main() {
go startCollectingStats(interruptCh, backend.NodeManager())
}

// Sync blockchain and stop.
if *exitWhenSynced > -1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a log entry there that tells that the node will just sync blockchain and then exit. I'd also probably extract this to a separate function to keep main() simpler.

if *exitWhenSynced >= 0 {
     syncAndStopNode(backend)
     return
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sync and exit stuff look like a fairly big block of code in main(). I'll do both.

@@ -53,6 +53,9 @@ type NodeManager interface {
// Stopped node cannot be resumed, one starts a new node instead.
StopNode() (<-chan struct{}, error)

// Sync waits until blockchain is synchronized.
Sync(timeout time.Duration) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to mention that 0 means infinite timeout and not "no timeout"?
Maybe also add that it should be > 2 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't only because it's not an implementation of the function. I'll instead add it to ./geth/node/manager.go if that works for you as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but that means some terra incognita between a contract (an interface) and an implementation of the interface.
Maybe we could make a custom type to make the contract more explicit?
Currently, there are 2 cases that 1 parameter tries to match, using some implicit logic.
Can we "explicify" this logic using a custom type on top of time.Duration, so we can call

m.Sync(common.SyncTimeout(1 * time.Minute))
m.Sync(common.SyncTimeoutInfinite)

Or just extract all the logic about that to the main.go.

One "go" way to do that would be to actually provide a context inside Sync function, so we can cancel it. Then we can just use select in main.go to provide the timeout functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adambabik @mandrigin

One "go" way to do that would be to actually provide a context inside Sync function, so we can cancel it.

I think this is by far the best option here in terms of usage and contract definition. One could simply pass in context.Background() or context.WithTimeout(). That way, handling the timeout in Sync would be simplified and we could use the context timing logic in main().

timeout = time.Hour * 8765
}

ticker := time.NewTicker(1 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please extract 1 * Second as a constant (tickerResolution) and re-use it in the timeout check in Sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't know why it's 1 * time.Second. time.Second should already suit well. 😄

And consider it done.

@@ -62,6 +62,8 @@ var (
// Push Notification
enablePN = flag.Bool("shh.notify", false, "Node is capable of sending Push Notifications")
firebaseAuth = flag.String("shh.firebaseauth", "", "FCM Authorization Key used for sending Push Notifications")

syncAndExit = flag.Int("sync-and-exit", -1, "Timeout in minutes for blockchain sync and exit, zero is infinite")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there was even a better suggestion from @mandrigin to replace zero is infinite with zero means no timeout unless sync is finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, changing.

@@ -106,6 +108,12 @@ func main() {
go startCollectingStats(interruptCh, backend.NodeManager())
}

// Sync blockchain and stop.
if *syncAndExit >= 0 {
syncAndStopNode(backend.NodeManager())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to reduce the number of functions that do os.Exit(). So maybe we could have syncAndStopNode to simply return error if there was any and do log.Fatalf() here in one place.

// We need to have a larger timeout than ticker delta here
// unless we use zero which means infinite timeout.
if timeout < tickerResolution*2 && timeout != 0 {
return errors.New("Sync timeout can only be zero (infinite) or at least two seconds")
Copy link
Contributor

@adambabik adambabik Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, errors start with a lower case letter for easier joining (log.Fatalf("sync error: %v", err)) and as we talked in a different thread, maybe we can rephrase it to: invalid timeout that can be zero (meaning no timeout) or greater than two seconds.

However, I would also get rid of this at least two seconds. If, for instance, the chain is already synced, I should not wait at all. It can be easily achieved by running a check before for loop. It may require running some checks on downloader in a separate method as well.

defer ticker.Stop()
for {
select {
case <-time.After(timeout):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, I remember I suggested changing it to this form but in this case, it may be better to do it differently:

var timeoutCh <-chan time.Time

if timeout == 0 {
  timeoutCh = make(chan time.Time)
} else {
  timeoutCh = time.After(timeout)
}

// below
select {
case <-timeoutCh:
  // ...
case <-ticker.C:
  // ...

And in this way you can avoid a hackish one year timeout :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I was actually looking for something like this before but couldn't imagine 😄

Copy link
Contributor

@mandrigin mandrigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! Just a few tiny changes.

log.Println("Node will synchronize and exit")
err := nodeManager.Sync((time.Duration)(*syncAndExit) * time.Minute)
if err != nil {
log.Fatalf("Failed while waiting for sync: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as @adambabik suggested, use errors.Wrap here and in line 179 and return error from the function.

@@ -53,6 +53,9 @@ type NodeManager interface {
// Stopped node cannot be resumed, one starts a new node instead.
StopNode() (<-chan struct{}, error)

// Sync waits until blockchain is synchronized.
Sync(timeout time.Duration) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but that means some terra incognita between a contract (an interface) and an implementation of the interface.
Maybe we could make a custom type to make the contract more explicit?
Currently, there are 2 cases that 1 parameter tries to match, using some implicit logic.
Can we "explicify" this logic using a custom type on top of time.Duration, so we can call

m.Sync(common.SyncTimeout(1 * time.Minute))
m.Sync(common.SyncTimeoutInfinite)

Or just extract all the logic about that to the main.go.

One "go" way to do that would be to actually provide a context inside Sync function, so we can cancel it. Then we can just use select in main.go to provide the timeout functionality.

@canercidam
Copy link
Contributor Author

@adambabik @mandrigin Would you agree renaming Sync to EnsureSync as it just tracks the progress?

Copy link
Contributor

@mandrigin mandrigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only 1 tiny thing left and it should be approved, I think :)

if *syncAndExit >= 0 {
if err = syncAndStopNode(backend.NodeManager()); err != nil {
log.Fatalf("Node synchronization failed: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be a return, right? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.Fatalf calls os.Exit(1) but I would use err := as there is no point in overriding previous err value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but if there is no error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mandrigin Thanks for pointing out. :)

I think it should return if there is no error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adambabik

but I would use err := ...

I can reflect this change in the next push.

Copy link
Contributor

@adambabik adambabik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

if *syncAndExit >= 0 {
if err = syncAndStopNode(backend.NodeManager()); err != nil {
log.Fatalf("Node synchronization failed: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.Fatalf calls os.Exit(1) but I would use err := as there is no point in overriding previous err value.

@@ -158,6 +168,28 @@ func startCollectingStats(interruptCh <-chan struct{}, nodeManager common.NodeMa
}
}

func syncAndStopNode(nodeManager common.NodeManager) (err error) {
log.Println("Node will synchronize and exit")
v := *syncAndExit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why v? :P value or timeout should be easier to reason about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd even move it as a param of the function to make it stateless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v is just short for value. I can make the change if you like but I don't think it would make a big difference. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Basically, I just thought there's no need for a meaningful name for it in that scope.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just not a fan of one character variable names except for method receiver names. I don't see any value in shortening them but I see disadvantages :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, will change this as well :)

err = nodeManager.EnsureSync(ctx)
defer cancel()
}
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can skip this error check. It's ok to properly stop a node even if it didn't manage to sync the chain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then I definitely need to use (github.com/pkg/errors).Wrap here to return combined errors. Shall I dep ensure --add that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't errors.Wrap a standard library function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it isn't. I also got confused for a moment: https://golang.org/pkg/errors/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adambabik @mandrigin Would you mind adding this as a dependency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it makes sense to add it. It is useful in many places. @adambabik what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is moving stopping node to main.go. Wrapping both errors together is also not really correct because the sync error has nothing to do with stop node error.

Maybe something like this:

func ensureSync(nodeManager common.NodeManager, timeout int) error {
  // it's not possible to pass negative timeout from the condition in main.go

  if timeout == 0 {
    return nodeManager.EnsureSync(context.Background())
  }

  ctx, cancel := context.WithTimeout(context.Background(), (time.Duration)(timeout)*time.Minute)
  defer cancel()

  return nodeManager.EnsureSync(ctx)
}

// in main.go
if *syncAndExit >= 0 {
  var exitCode int

  if err := ensureSync(backend.NodeManager(), *syncAndExit); err != nil {
    log.Printf("Failed to sync the chain: %v", err)
    exitCode = 1
  }

  done, err := backend.NodeManager().StopNode()
  if err != nil {
    log.Fatalf("Failed to stop node: %v", err)
  }
  <-done

  os.Exit(exitCode)
}

Ugh, I am not happy with this neither :(

return
}
var done <-chan struct{}
done, err = nodeManager.StopNode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, err := nodeManager.StopNode()
<-done

should work just fine.

Technically, it should also have a timeout but I think we can skip it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns (err error) so return is compulsory there. And I wanted it to write the error to existing var (return var).

Copy link
Contributor

@mandrigin mandrigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dshulyak
Copy link
Contributor

dshulyak commented Feb 5, 2018

@canercidam do you see any reasons why EnsureSync can't be reused for tests helper?

@canercidam
Copy link
Contributor Author

@dshulyak Now that NodeManager has an implementation of it, it doesn't seem to make sense to have that helper I guess.

Copy link
Contributor

@adambabik adambabik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice solution with returning exitCode from syncAndStopNode!

Minor comments but please make sure <-done is called after err check as it may be nil and it will blow up.

}
var done <-chan struct{}
done, err = nodeManager.StopNode()
<-done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<-done should be after checking error if err != nil {.

defer cancel()
}
if err != nil {
log.Printf("Sync and stop error: %v\n", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wording: Sync error: %v (no need for a new line when using log).

done, err = nodeManager.StopNode()
<-done
if err != nil {
log.Printf("Sync and stop err: %v\n", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wording: Stop node err: %v.

@adambabik
Copy link
Contributor

@canercidam @dshulyak if the test helper is not needed we can create a follow-up issue to clean it up and fix tests after this one is merged.

@canercidam
Copy link
Contributor Author

@dshulyak @adambabik I'm happy to submit a PR for it.

log.Printf("Stop node err: %v", err)
exitCode = 1
}
<-done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I was precise enough. If err occurs during StopNode() call, we should return from the if statement. This still can blow up if err occurred as done may be nil.

var done <-chan struct{}
done, err = nodeManager.StopNode()
if err != nil {
  log.Printf("Stop node err: %v", err)
  return 1
}
<-done
return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, sorry.

@mandrigin mandrigin merged commit ca719af into status-im:develop Feb 7, 2018
@mandrigin
Copy link
Contributor

@canercidam thanks for making Status better! 🥇

@canercidam
Copy link
Contributor Author

@mandrigin A pleasure! And thanks for your time and efforts. @adambabik @mandrigin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants