[BUG] - Network stuck at epoch boundary #2616

Closed
LaurenceIO opened this issue Apr 17, 2021 · 12 comments
Labels: bug (Something isn't working)

@LaurenceIO
Contributor

Internal

Node

At the boundary between epochs 259 and 260, on Thursday 15th April, block production stopped for over 20 minutes.

@LaurenceIO LaurenceIO added the bug Something isn't working label Apr 17, 2021
@chimaeraa

[two images attached]
At the boundary between epochs 261 and 262, relay connections were lost for 10 minutes and 235 blocks were missed. This happens at every epoch boundary.
System info:
OS: Ubuntu 20.04
Node version: 1.26.2
System config: 6 × 3 GHz CPU, 10 GB RAM

@erikd
Contributor

erikd commented Apr 26, 2021

This was definitely an issue with 1.26.1 but should have been fixed by 1.26.2.

Are you sure you are actually running the version you think you are running? What does cardano-node --version say?
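
One way to make that check repeatable is to parse the first line of the `cardano-node --version` output and compare it with the expected release. The following is a minimal sketch in Python, assuming the first line starts with `cardano-node X.Y.Z`; the helper names and the sample string are illustrative and not taken from this thread. Note that it inspects the binary found on `PATH`, which may differ from the binary an already running service was started from.

```python
import re
import subprocess


def parse_node_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from `cardano-node --version` output.

    Assumption: the first line starts with `cardano-node X.Y.Z`.
    """
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    match = re.match(r"cardano-node\s+(\d+)\.(\d+)\.(\d+)", lines[0])
    if match is None:
        raise ValueError(f"unrecognised version line: {lines[0]!r}")
    return tuple(int(part) for part in match.groups())


def installed_node_version(binary: str = "cardano-node") -> tuple:
    """Run the binary found on PATH and return its parsed version."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return parse_node_version(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output, so the sketch runs without a node installed.
    sample = "cardano-node 1.26.2 - linux-x86_64 - ghc-8.10\ngit rev 0000000"
    print(parse_node_version(sample) >= (1, 26, 2))
```

On a machine with the node installed, `installed_node_version()` can be compared against `(1, 26, 2)` in the same way.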

@chimaeraa

chimaeraa commented Apr 26, 2021

Yes, I'm sure. I built it from source. The same thing happened in this epoch and the previous one.
[three images attached]
As you can see, it happened in the last two epochs.

@seriru

seriru commented Apr 26, 2021

We have the same issue at XORN.

@kevinhammond
Contributor

Thanks for the report. 1.26.2 significantly reduces execution costs at the epoch boundary, which has a major positive impact on both block producers and relay nodes.

Relays do behave differently from block producers - the epoch boundary load for relays is based on the number of connected peers. We're investigating this specifically and will get back to you when we have some information to share. You may find that reducing the number of connected peers (even just over the boundary) gives a smoother transition.
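
As a rough illustration of the peer-reduction suggestion, the outbound peers of a relay could be trimmed by shortening the `Producers` list in its topology file ahead of the boundary and restoring the full list afterwards. The sketch below assumes the JSON topology layout with a top-level `Producers` array of `addr`/`port`/`valency` entries. The function name, the sample addresses and the choice to keep the first entries are hypothetical, and this only limits outbound connections, not peers that connect inbound.

```python
import json


def limit_producers(topology: dict, max_peers: int) -> dict:
    """Return a copy of the topology keeping at most `max_peers` producers.

    Assumption: outbound peers are listed under the top-level "Producers" key.
    The input dictionary is left unchanged.
    """
    if max_peers < 0:
        raise ValueError("max_peers must be non-negative")
    trimmed = dict(topology)
    trimmed["Producers"] = list(topology.get("Producers", []))[:max_peers]
    return trimmed


if __name__ == "__main__":
    # Hypothetical topology; the addresses are documentation placeholders.
    topology = {
        "Producers": [
            {"addr": "192.0.2.10", "port": 3001, "valency": 1},
            {"addr": "192.0.2.11", "port": 3001, "valency": 1},
            {"addr": "192.0.2.12", "port": 3001, "valency": 1},
        ]
    }
    print(json.dumps(limit_producers(topology, 2), indent=2))
```

Whether a changed topology file takes effect without restarting the node is not covered here and should be checked against the node documentation for the version in use.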

@chimaeraa

Thank you. I have 5 relays: 3 for the BP and 2 for other purposes. Those 2 relays have 15 peer connections, but they have the same problem.
At every epoch boundary my relays lose their connections to other relays.

@seriru

seriru commented Apr 26, 2021

At the epoch boundary we see a 25-30% drop in peer connections from our relays to other relays.

@gabacode

gabacode commented May 6, 2021

Same thing here: it gets stuck around epoch 259. The socket file also disappears, and the only way to make it reappear seems to be to delete the db folder and start all over again. I have been doing that for the last week, so it would be nice to find a fix!
On cardano-node 1.26.2 linux-x86_64 ghc-8.10, relay.
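
To notice a disappearing socket early, rather than when a command fails, the socket path can be polled. This is a minimal sketch in Python, assuming the path is exported in the `CARDANO_NODE_SOCKET_PATH` environment variable; the fallback name `node.socket` and the helper name are illustrative. It only reports whether the file exists and is a Unix domain socket, not whether the node behind it is responsive.

```python
import os
import stat


def socket_is_present(path: str) -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file, missing parent directory or no permission.
        return False


if __name__ == "__main__":
    # Assumption: the socket path is taken from the environment if set.
    path = os.environ.get("CARDANO_NODE_SOCKET_PATH", "node.socket")
    state = "present" if socket_is_present(path) else "missing"
    print(f"{path}: {state}")
```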

[screenshot attached]

@AlterX76

AlterX76 commented May 7, 2021

Same problem here: stuck on 130 for 5 days.
By the way, why don't you provide a bootstrap DB for a fast sync?

@kingli-crypto

kingli-crypto commented May 10, 2021

We are experiencing a similar issue with our relay nodes. We normally see the node out of sync with the network for 20 minutes.

@tigrpoolcom

tigrpoolcom commented Aug 26, 2021

I also encountered this issue on the v1.26.2 nodes, which I upgraded over the last few days from v1.25.1 to v1.26.1.
My block-producing nodes notified me that they were not able to talk to the relays.
Creating the KES key on the node also failed, because node.socket was not available and the relay was not responding on its assigned port!

-> Yes, I built with the latest cabal and ghc versions.

[image attached]

Is this fixed in the newer version, v1.27, or might I just run into further issues if I install that instead?

@chimaeraa: does that mean I have to delete the whole db after each epoch to keep my relays running?
@kingli-crypto: do you also delete the db, or does the problem resolve itself after 20 minutes? I have more than 20 peers connected to my relay nodes; does that mean my nodes need more time to resolve the issue?

@Jimbo4350
Contributor

Closing this. If this is still relevant please reopen.
