Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hydra node crashed after a fork #560

Closed
pgrange opened this issue Oct 17, 2022 · 11 comments · Fixed by #796 or #1053
Closed

Hydra node crashed after a fork #560

pgrange opened this issue Oct 17, 2022 · 11 comments · Fixed by #796 or #1053
Assignees
Labels
bug 🐛 Something isn't working
Milestone

Comments

@pgrange
Copy link
Contributor

pgrange commented Oct 17, 2022

Context & versions

My hydra node was running a version around commit 979a561

Steps to reproduce

I can't reproduce right now. Here are the steps that I followed:

  1. Start my node
  2. let it run for several days
  3. observer that it has crashed

Actual behavior

The node crashed with the following message:

hydra-node: QueryException "AcquireFailurePointNotOnChain"

Expected behavior

To not crash

Extra Information

Looking at previous logs, it might be the case that the layer 1 switch to a fork after hydra has taken some block into consideration.
Then, querying layer 1 led to an exception. Here are some, probably significant logs:

# new tip taken into account by hydra-node:
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: [bc6522f8:cardano.node.ChainDB:Notice:66] [2022-10-14 15:49:41.10 UTC] Chain extended, new tip: 787b2e7bb45f5a77740c6679ca2a35332420eb14a795a1bedcff237126aa84ea at slot 5759381
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: {"timestamp":"2022-10-14T15:49:41.143911009Z","threadId":49,"namespace":"HydraNode-1","message":{"directChain":{"contents":{"newUTxO":{"3b54e9e04dd621d99b1b5c40d32d9d2c50f19c520ca37b788629d6bfce27a0f6#TxIx 0":{"address":"60edc8831194f67c170d90474a067347b167817ce04e8b6be991e828b5","datum":null,"referenceScript":null,"value":{"lovelace":100000000,"policies":{}}},"e9c45dd07cdafd96a399b52c9c73f5d9886bbd283b09b7eae054092e5b926b58#TxIx 1":{"address":"60edc8831194f67c170d90474a067347b167817ce04e8b6be991e828b5","datum":"a654fb60d21c1fed48db2c320aa6df9737ec0204c0ba53b9b94a09fb40e757f3","referenceScript":null,"value":{"lovelace":3654400,"policies":{}}}},"tag":"ApplyBlock"},"tag":"Wallet"},"tag":"DirectChain"}}
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: {"timestamp":"2022-10-14T15:49:41.143915519Z","threadId":49,"namespace":"HydraNode-1","message":{"directChain":{"point":"At (Block {blockPointSlot = SlotNo 5759381, blockPointHash = 787b2e7bb45f5a77740c6679ca2a35332420eb14a795a1bedcff237126aa84ea})","tag":"RolledForward"},"tag":"DirectChain"}}

# layer 1 swithes to a fork
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: [bc6522f8:cardano.node.ChainDB:Info:66] [2022-10-14 15:49:41.22 UTC] Block fits onto some fork: 8adda8ee035fb15aac00c0adbec662c2197cc9c1fbd4282486f6f6513c7700f5 at slot 5759381
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: [bc6522f8:cardano.node.ChainDB:Notice:66] [2022-10-14 15:49:41.22 UTC] Switched to a fork, new tip: 8adda8ee035fb15aac00c0adbec662c2197cc9c1fbd4282486f6f6513c7700f5 at slot 5759381
# ignoring some "closing connections" logs
# hydra node crashes with an exception:
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: hydra-node: QueryException "AcquireFailurePointNotOnChain"

The exception thrown, I think, would correspond to CardanoClient:295 in the current version of code.

@pgrange pgrange added the bug 🐛 Something isn't working label Oct 17, 2022
@pgrange
Copy link
Contributor Author

pgrange commented Oct 17, 2022

Find attached more logs for the day in case it's useful

2022-10-14.logs.last.txt

@ch1bo
Copy link
Collaborator

ch1bo commented Oct 21, 2022

@Quantumplation this might be similar to what you have seen? Any logs would be helpful.

@pgrange
Copy link
Contributor Author

pgrange commented Nov 7, 2022

Had a new crash last Friday. Was running hydra version 0.8.0. Not sure what happened and if it was really related to this same problem. I attach the logs here in case it could help.

2022-11-07.logs.txt

@pgrange
Copy link
Contributor Author

pgrange commented Nov 9, 2022

Had a new crash last nigh. Was still running version 0.8.0. I attache the logs here in case it could help. I've noticed this message from the hydra node:

hydra-node: QueryAcquireException AcquireFailurePointNotOnChain

2022-11-08.logs.txt

@pgrange
Copy link
Contributor Author

pgrange commented Nov 10, 2022

Happened again last night. I attach the logs here.

2022-11-10.logs.txt

@pgrange
Copy link
Contributor Author

pgrange commented Nov 14, 2022

Happened again the same day.

2022-11-10-2.logs.txt

@pgrange
Copy link
Contributor Author

pgrange commented Nov 14, 2022

Again today.
2022-11-14.logs.txt

@ch1bo
Copy link
Collaborator

ch1bo commented Dec 1, 2022

Might have been solved by #621, closing this until it re-appears.

@ch1bo ch1bo closed this as completed Dec 1, 2022
@pgrange pgrange reopened this Dec 7, 2022
@ffakenz
Copy link
Contributor

ffakenz commented Dec 7, 2022

Nodes are still crashing because of hydra-node: QueryAcquireException AFPointNotOnChain

@ch1bo ch1bo mentioned this issue Mar 14, 2023
4 tasks
@ch1bo
Copy link
Collaborator

ch1bo commented Mar 14, 2023

Maybe the queryTimeHandle could be the issue here. We are (still) requesting and rebuilding a time handle on each RollForward, which could fail by querying at a specific point .. if we happen to switch the change just in between:

modified   hydra-node/src/Hydra/Chain/Direct/TimeHandle.hs
@@ -16,7 +16,7 @@ import Hydra.Cardano.Api (
  )
 import Hydra.Cardano.Api.Prelude (ChainPoint (ChainPoint, ChainPointAtGenesis))
 import Hydra.Chain.CardanoClient (
-  QueryPoint (QueryAt),
+  QueryPoint (..),
   queryEraHistory,
   querySystemStart,
   queryTip,
@@ -107,8 +107,8 @@ mkTimeHandle currentSlotNo systemStart eraHistory = do
 queryTimeHandle :: NetworkId -> FilePath -> IO TimeHandle
 queryTimeHandle networkId socketPath = do
   tip <- queryTip networkId socketPath
-  systemStart <- querySystemStart networkId socketPath (QueryAt tip)
-  eraHistory <- queryEraHistory networkId socketPath (QueryAt tip)
+  systemStart <- querySystemStart networkId socketPath QueryTip
+  eraHistory <- queryEraHistory networkId socketPath QueryTip
   currentTipSlot <-
     case tip of
       ChainPointAtGenesis -> pure $ SlotNo 0

@ch1bo ch1bo added this to the 0.10.0 milestone May 11, 2023
@locallycompact
Copy link
Contributor

{"timestamp":"2023-08-29T19:38:43.440470208Z","threadId":95,"namespace":"HydraNode-\"Dan\"","message":{"node":{"by":{"vkey":"0f23a3124d401d89e8f6cfb724eb00ac073c74edc93a4db9b0889c999f2415fe"},"event":{"chainEvent":{"chainSlot":37654723,"chainTime":"2023-08-29T19:38:43Z","tag":"Tick"},"tag":"OnChainEvent"},"eventId":622,"tag":"BeginEvent"},"tag":"Node"}}
{"timestamp":"2023-08-29T19:38:43.440489848Z","threadId":95,"namespace":"HydraNode-\"Dan\"","message":{"node":{"by":{"vkey":"0f23a3124d401d89e8f6cfb724eb00ac073c74edc93a4db9b0889c999f2415fe"},"outcome":{"contents":{"chainSlot":37654723,"tag":"TickObserved"},"tag":"StateChanged"},"tag":"LogicOutcome"},"tag":"Node"}}
{"timestamp":"2023-08-29T19:38:43.440635265Z","threadId":95,"namespace":"HydraNode-\"Dan\"","message":{"node":{"by":{"vkey":"0f23a3124d401d89e8f6cfb724eb00ac073c74edc93a4db9b0889c999f2415fe"},"eventId":622,"tag":"EndEvent"},"tag":"Node"}}
{"timestamp":"2023-08-29T19:38:43.769747589Z","threadId":67,"namespace":"HydraNode-\"Dan\"","message":{"directChain":{"contents":{"tag":"BeginInitialize"},"tag":"Wallet"},"tag":"DirectChain"}}
{"timestamp":"2023-08-29T19:38:44.462112583Z","threadId":101,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Application Exception: 13.39.121.84:5001 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = \"SubscriptionWorker exiting\", seStack = []}","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.462252522Z","threadId":101,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Closed socket to 13.39.121.84:5001","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.462682122Z","threadId":102,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Application Exception: 13.37.150.125:5001 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = \"SubscriptionWorker exiting\", seStack = []}","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.462786868Z","threadId":102,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Closed socket to 13.37.150.125:5001","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.463957252Z","threadId":107,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Application Exception: 13.37.15.211:5001 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = \"SubscriptionWorker exiting\", seStack = []}","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.464034863Z","threadId":107,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Closed socket to 13.37.15.211:5001","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
hydra-node: QueryAcquireException AFPointNotOnChain

running on 9754dcc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working
Projects
None yet
5 participants