Nonce calculation is broken for private networks #1999

Closed
iwooden opened this issue Aug 31, 2017 · 66 comments

Comments

@iwooden

iwooden commented Aug 31, 2017

Currently, nonce calculation is broken for private networks. The main problem is that transaction history is retained for an account even after switching the RPC endpoint to a new network. This can cause the nonce on a new network to be too high, resulting in Metamask transactions never going through.

To reproduce:

  1. Get access to two different private network RPC endpoints.
  2. Create a new account/address in Metamask, with no previous transaction history.
  3. Connect Metamask to the first private endpoint (select "Custom RPC" from the network selection dropdown, enter the RPC URL).
  4. Submit and sign an Ethereum transaction. Note the transaction entry in the transaction history section.
  5. Connect Metamask to the second private endpoint. Note that the transaction entry is still in the transaction history section.
  6. Submit and sign an Ethereum transaction. Note that the transaction hangs in a "pending" state in the Metamask UI.
  7. If you have access to the geth console for one of the nodes servicing the second RPC endpoint, check the txpool. Note that the transaction submitted was submitted with a nonce of 0x1, when the transaction count for the address in question is 0. The transaction will never go through.

I can see in app/scripts/lib/nonce-tracker.js that you're taking the max value between the locally calculated nonce and the transaction count for the address in question. This works for public/test networks where the expected nonce for an account can never go down, but that assumption doesn't hold for private networks when you can switch from a chain where an account's transaction count is 8 to one where the transaction count is 0.
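To make the failure concrete, here is a minimal sketch of a max-based nonce selection like the one described above. It is not the actual code from nonce-tracker.js; the function name `nextNonce` and the shape of the history entries are assumptions made for illustration.

```javascript
// Hypothetical sketch of a max-based nonce choice (not MetaMask's real code).
// `networkTxCount` is what the node reports for the account;
// `localTxs` is the locally stored history for this account and network.
function nextNonce(networkTxCount, localTxs) {
  // Highest nonce seen locally, plus one; 0 when there is no local history.
  const localNext = localTxs.reduce((max, tx) => Math.max(max, tx.nonce + 1), 0);
  return Math.max(networkTxCount, localNext);
}

// History recorded on the first private chain: one transaction with nonce 0.
const history = [{ nonce: 0 }];

// Same chain: the node has also counted that transaction, so both sides agree.
const sameChain = nextNonce(1, history);

// Second private chain: the account has sent nothing there, yet the stale
// local history still wins the max and pushes the nonce to 1.
const freshChain = nextNonce(0, history);
```

With an empty local history the same call returns the node's count, which is why clearing the history restores the correct nonce.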

One possible fix is clearing the "Private Network" transaction history when switching RPC endpoints. This isn't totally ideal, but does result in the correct nonce being calculated, and doesn't require you to redo any of the current nonce calculation functions.

  • Expected Behavior: Transactions go through on a new private network
  • Actual Behavior: Transactions are submitted with nonces that are too high
  • Browser Used: Chrome, Metamask version 3.9.11
  • Operating System Used: Windows 10
@szerintedmi

szerintedmi commented Sep 3, 2017

I'm struggling with the same issue while testing.
Have you found a workaround? Even restarting the browser doesn't help sometimes.

@danfinlay
Contributor

Sorry for neglecting this over the long weekend, I'll be working to fix this soon.

@iwooden
Author

iwooden commented Sep 5, 2017

No problem Dan, thanks for taking the time. I investigated a little further, and it looks like this issue has to do with network ID specification for private networks.

Account history is tied to network ID. So, when you switch to a private network with a different network ID, you will get a fresh account with the correct nonce. However, if you switch to a private network with the same network ID, the account history is the same and the nonce mismatch will occur.
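A rough sketch of why the network ID matters, assuming history is looked up by network ID and address only. The store, the helper names, and the address are all made up for illustration and are not MetaMask's real storage layout.

```javascript
// Hypothetical store: the RPC endpoint plays no part in the lookup key.
const historyStore = new Map();

function historyKey(networkId, address) {
  return `${networkId}:${address.toLowerCase()}`;
}

function recordTx(networkId, address, tx) {
  const key = historyKey(networkId, address);
  const txs = historyStore.get(key) || [];
  historyStore.set(key, [...txs, tx]);
}

function getHistory(networkId, address) {
  return historyStore.get(historyKey(networkId, address)) || [];
}

const account = '0x00000000000000000000000000000000000000aa'; // made-up address

// One transaction sent on the first private chain, which uses network ID 999.
recordTx('999', account, { nonce: 0 });

// A second chain with a different ID starts with empty history.
const onOtherId = getHistory('888', account).length;

// A second chain that reuses ID 999 inherits the first chain's history.
const onSameId = getHistory('999', account).length;
```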

So one workaround for right now is to ensure that the various private networks you interact with have different network IDs. However, it would still be really nice to have the option to delete the account's history from the UI, forcing the correct nonce calculation.

danfinlay added a commit that referenced this issue Sep 5, 2017
@ghost ghost assigned danfinlay Sep 5, 2017
@ghost ghost added the in progress label Sep 5, 2017
@danfinlay
Contributor

Oh, I may have misunderstood the nature of this problem. I think private networks should be responsible for ensuring they have unique IDs, and we shouldn't work too hard to ensure identically-identified networks both work with MetaMask.

@szerintedmi

My guess is that in my case the problem is not with the network id but with the fact that I use the same accounts for different networks while testing.

  1. I execute a tx with account 0x1.. on privatechain with networkid 999 - works fine
  2. I stop private chain and launch testrpc with networkid 888
  3. execution of tx with same account 0x1.. fails on testrpc: invalid nonce. The usual MetaMask voodoo (changing network back and forth) doesn't help, and neither does a browser restart.

@danfinlay danfinlay removed their assignment Sep 11, 2017
@danfinlay
Contributor

My guess is that in my case the problem is not with the network id but with the fact that I use the same accounts for different networks while testing.

Lots of people do that, so it seems unlikely that if this were the bug, it would affect so few people.

I followed your reproduction steps (albeit with testrpc on both instances), and had no problem.

To clarify: You need to specify a distinct networkId and distinct chainId for MetaMask to identify a distinct network. Could you try that and let me know if it works, @szerintedmi?

@szerintedmi

I retried and I still had the issue with one of my accounts. All other accs work.
(on MetaMask v3.9.13)

I played a bit and I have a workaround.

That's what I receive:

popup.js:83532 [ethjs-rpc] rpc error with payload {"id":9271578721185,"jsonrpc":"2.0","params":["0xf893198504a817c800831e848094f99564a5786fedef72ad45a7c85c3e7c9394522a880b1eb7ea25a00000a41509c8cf00000000000000000000000000000000000000000000000000000000000000008207f2a0666e7d3e499ee4a7cec79b1d112bcb6ded924a2dc2b35d246275f50030b3c9cda0445ebcf4e72e1f37280d3521cc4b22c1c93dcdcb9562afdcbde6bc38b8a45b57"],"method":"eth_sendRawTransaction"} Error: Error: the tx doesn't have the correct nonce. account has nonce of: 24 tx has nonce of: 25
    at runCall (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:69351:10)
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:11327:24
    at replenish (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:8420:17)
    at iterateeCallback (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:8405:17)
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:8380:16
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:11332:13
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64434:16
    at replenish (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64381:25)
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64390:9
    at eachLimit (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64314:36)
(anonymous) @ popup.js:83532

When I send a tx without MetaMask it works.

It seems the nonce increases with every tx I send from the non MetaMask window:

Error: the tx doesn't have the correct nonce. account has nonce of: 24 tx has nonce of: 29
Error: the tx doesn't have the correct nonce. account has nonce of: 26 tx has nonce of: 29

Once the nonce reaches the MetaMask nonce I can send tx again through MetaMask.
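The gap can be read straight from the error text. The sketch below parses the two messages quoted above; the regular expression assumes the exact testrpc wording shown in this thread, and `parseNonceGap` is a hypothetical helper name.

```javascript
// Parses the testrpc error text quoted in this thread. Other clients may word
// the message differently, in which case this returns null.
function parseNonceGap(message) {
  const match = /account has nonce of: (\d+) tx has nonce of: (\d+)/.exec(message);
  if (!match) return null;
  const accountNonce = Number(match[1]);
  const txNonce = Number(match[2]);
  // `gap` is how many more transactions the account must send outside
  // MetaMask before MetaMask's nonce becomes the one the chain expects.
  return { accountNonce, txNonce, gap: txNonce - accountNonce };
}

const first = parseNonceGap(
  "Error: the tx doesn't have the correct nonce. account has nonce of: 24 tx has nonce of: 29"
);
const later = parseNonceGap(
  "Error: the tx doesn't have the correct nonce. account has nonce of: 26 tx has nonce of: 29"
);
```

Here `first.gap` is 5 and `later.gap` is 3, which matches the observation that each transaction sent outside MetaMask closes the gap by one.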

@danfinlay
Contributor

I'm very tempted to think that the one affected account is having problems because it specifically had transactions sent on a private chain with identical IDs, just because that's the simplest explanation I can think of.

If that's not it, I'd love to cook up an edge case that our nonce-tracker is failing on. You may have guessed that when we compose a transaction, we try to account for locally pending transactions, so it's possible that on a new private chain with the same ID as another, we take the highest nonce between the network-provided one and the one derived from locally pending txs, and increment from there.

Let me know if you come up with any theories on this, I'm chipping away on another issue that I understand at the moment, but would be happy to fix any incorrect behavior here if we could identify it.

@atomical

atomical commented Oct 8, 2017

This is happening to me too. What's the alternative for using Truffle?

@okwme

okwme commented Oct 13, 2017

happens to me from time to time. no idea why. i uninstall and reinstall metamask to fix it.

TripleSpeeder added a commit to TripleSpeeder/StandingOrderDapp that referenced this issue Oct 18, 2017
…t get nonce tracking right with a fresh running testrpc when previous transactions had been performed with the same account and networkid. See discussion at MetaMask/metamask-extension#1999.
@aktary

aktary commented Oct 24, 2017

@danfinlay you said

To clarify: You need to specify a distinct networkId and distinct chainId for MetaMask to identify a distinct network.

How does one set a new chain id?

I'm having this problem too. Using the same snapshot and account locally and on a dev server, the nonces get out of sync and I can't execute transactions on the local testrpc now.

@danfinlay
Contributor

How does one set a new chain id?

It is dependent on the client you are using. I believe the flag is --chainId on geth, although they haven't documented this feature.

@onetom

onetom commented Nov 13, 2017

I just got the same error message:

Error: the tx doesn't have the correct nonce. account has nonce of: 0 tx has nonce of: 1
   at runCall (/usr/local/lib/node_modules/truffle/build/chain.bundled.js:60148:10)
   at /usr/local/lib/node_modules/truffle/build/chain.bundled.js:12311:24
   at replenish (/usr/local/lib/node_modules/truffle/build/chain.bundled.js:9404:17)

I'm using

  • MetaMask 3.12.0
  • Chrome 64.0.3265.0 (Official Build) canary (64-bit)
  • truffle develop chain v4.0.1 with the hardwired "candy maple cake ..." mnemonic

If I understand correctly making a couple of transactions against an ephemeral chain, then restarting such chain will lead to this situation.

Reinstalling the MetaMask extension solved the issue "of course".

@danfinlay
Contributor

There are two different issues people are reporting here:

  1. They are using private blockchains, and need to use the chainId parameter, so that MetaMask's EIP 155 compatibility works with it correctly.

  2. People are developing against a local blockchain, and they reset the blockchain while MetaMask is pointing at it. In this case, you can work around the problem by switching to another network and back again, no reinstallation needed.

@rheaplex

To move this off of Twitter. :-) Switching between networks isn't working for me with Truffle 4's truffle develop (which uses the same mnemonic each time) and latest Metamask in either Chromium 62 or Firefox 57 on Debian Stretch. I may be doing it wrong but I can't work out how.

An example of a failed transaction in Firefox 57 with MetaMask 3.12.0 is:

Error: [ethjs-rpc] rpc error with payload {"id":9794961516034,"jsonrpc":"2.0","params":["0xf8ac088504a817c80083012c7e94345ca3e014aaf5dca488057592ee47305d9b3e1080b844a9059cbb000000000000000000000000f17f52151ebef6c7334fad080c5704d77216b73200000000000000000000000000000000000000000000000000000000000001f48222e2a06dee046890a2f3476238691be9bced035939f1c2f3e9d71dc585719412818d08a05a3c71c9227723b4321ac44e3a013a3d6a6907712e63dfa81d98739bf604a145"],"method":"eth_sendRawTransaction"} Error: Error: the tx doesn't have the correct nonce. account has nonce of: 4 tx has nonce of: 8
    at runCall (/usr/lib/node_modules/truffle/build/chain.bundled.js:60148:10)
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:12311:24
    at replenish (/usr/lib/node_modules/truffle/build/chain.bundled.js:9404:17)
    at iterateeCallback (/usr/lib/node_modules/truffle/build/chain.bundled.js:9389:17)
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:9364:16
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:12316:13
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:55231:16
    at replenish (/usr/lib/node_modules/truffle/build/chain.bundled.js:55178:25)
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:55187:9
    at eachLimit (/usr/lib/node_modules/truffle/build/chain.bundled.js:55111:36)

@danfinlay
Contributor

danfinlay commented Nov 17, 2017

Reviewing the comments in here, this issue is actually a little bigger than what I was suggesting for switching networks and back again. Sorry about the skim.

The problem here is that MetaMask calculates nonces locally on a per-network basis. That means if you connect it to "the same network" (by ID) on two different endpoints, MetaMask will currently assume these are the same network, reuse its history of successful transactions, check the nonce everywhere it can, assume the current node is behind (since MetaMask is aware of a newer tx), and use the latest tx it knows of to calculate the nonce.

Current Causes

What we need here is a way of detecting a new network, even when all the signs indicate it's the same network. Some of the signs that prevent MetaMask from noticing this today include:

  1. When the new network is added to the same address without notice.
  2. When the two networks share the same network ID.

Possible Solutions

  • Some kind of detection logic that works across clients.
  • Some specialized indication built into testrpc.

The second one should only be used if no good solution can be found for the first one.

Some detection strategies:

Periodically check network and chain IDs

This can only partially alleviate cause #1, because if the networks share IDs, it would go undetected.

Detect when checking nonce

Since people usually experience this when sending a TX, nonce calculation time would seem like a good time, but I'm scratching for a good, definite set of indications that the chain is different.

Tracking known blocks

A method for identifying a new network on demand could be fairly reliably implemented (as long as new chains were not identical to previous ones):

  • We could track both the genesis block and a recent block (say, 10 blocks back, to avoid forking issues)
  • When calculating nonce, we could ask the node if those two blocks have the same hashes.
  • If different hashes are returned, and node is not syncing => NEW NETWORK.

Since this method requires re-checking known blocks on the node, the question arises of when we do this. Nonce calculation is a nice failsafe, since this is when the problem first burns people, but it would be better if we could detect it immediately, so we don't show the wrong balance, incorrect tx history, etc.
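As a rough illustration of the block-tracking check, the sketch below compares remembered hashes with what the node currently reports. The function name and object shapes are hypothetical; a caller would fill `observed` from the node (for example with eth_getBlockByNumber and eth_syncing) before calling it.

```javascript
// Hypothetical decision function for the "tracking known blocks" idea.
// `known` holds hashes remembered from earlier; `observed` holds what the
// node reports now for the same two block numbers, plus its syncing state.
function looksLikeNewNetwork(known, observed) {
  // A syncing node may simply not have the recent block yet, so no verdict.
  if (observed.syncing) return false;
  return (
    observed.genesisHash !== known.genesisHash ||
    observed.recentHash !== known.recentHash
  );
}

// Placeholder hashes, not real block hashes.
const known = { genesisHash: '0xaaa', recentHash: '0xbbb' };

const unchanged = looksLikeNewNetwork(known, {
  genesisHash: '0xaaa', recentHash: '0xbbb', syncing: false,
});

// A reset chain: new genesis, and the remembered recent block does not exist.
const resetChain = looksLikeNewNetwork(known, {
  genesisHash: '0xccc', recentHash: null, syncing: false,
});

// Same answers from a node that is still syncing are not treated as a change.
const stillSyncing = looksLikeNewNetwork(known, {
  genesisHash: '0xaaa', recentHash: null, syncing: true,
});
```

As the comment notes, this cannot tell apart two chains whose genesis and tracked blocks are identical.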

Re-Checking known successful transactions

There's a tricky bit around "confirmed transactions". Part of why we use locally confirmed transactions, is because MetaMask is sometimes pointed at RPC endpoints that might not be fully synchronized, and in these cases, we still want to generate valid nonces.

We could re-check our oldest known successful transaction by hash, and if it's unknown, that could also be used to signal a remote network change.
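A small sketch of that re-check, under the assumption that each stored transaction carries `hash`, `status`, and `blockNumber` fields (hypothetical names). The lookup result stands for the node's answer to eth_getTransactionByHash, which is null when the hash is unknown.

```javascript
// Picks the oldest locally confirmed transaction to use as a probe.
function oldestConfirmedTx(history) {
  const confirmed = history.filter((tx) => tx.status === 'confirmed');
  if (confirmed.length === 0) return null;
  return confirmed.reduce((oldest, tx) =>
    tx.blockNumber < oldest.blockNumber ? tx : oldest
  );
}

// `lookup` is whatever the node returned for the probe's hash.
function chainForgotTx(lookup) {
  return lookup === null;
}

// Made-up local history.
const localHistory = [
  { hash: '0x02', status: 'confirmed', blockNumber: 12 },
  { hash: '0x01', status: 'confirmed', blockNumber: 5 },
  { hash: '0x03', status: 'submitted', blockNumber: null },
];

const probe = oldestConfirmedTx(localHistory);

// Stand-ins for two nodes: one that still knows the probe, one that was reset.
const originalNode = new Map([['0x01', { hash: '0x01', blockNumber: 5 }]]);
const resetNode = new Map();

const stillKnown = chainForgotTx(originalNode.get(probe.hash) || null);
const forgotten = chainForgotTx(resetNode.get(probe.hash) || null);
```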

Times we could perform this procedure

  • Periodically
  • On switching connection to a provider.
  • After a period of non connection, on re-connection. (Need to determine how long)

Conclusion

I think I have a fairly actionable solution here; please add any improvements you can think of. It seems we've had a big bump in developer experimentation recently: this has been the behavior for MetaMask's whole lifetime, but we've only now had a huge spike in actual complaints, so this feature is clearly important to some segment of users.

@danfinlay
Contributor

danfinlay commented Nov 17, 2017

When detecting a new network with an identical ID, some special things will need to be done:

  • Deal w/ tx History
  • Restart provider & RPC cache (automatically does normal provider-switching things)

Dealing with TX history

Either:

  • Transaction history will be trashed for chain with that ID
  • We migrate to storing transaction history with a different identifier (maybe the blockchain://${genesisBlock}/${laterBlock} format, although deciding which laterBlock to use might be ambiguous).

Short term easy solution is to trash that chain's tx history.
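Both options can be sketched in a few lines. The store shape (an object keyed by network ID) and the helper names are assumptions for illustration only, and the key simply follows the template written above.

```javascript
// Option 1 (short term): drop the history stored under a given network ID.
// Returns a new store and leaves the input untouched.
function trashHistory(store, networkId) {
  const { [networkId]: _dropped, ...rest } = store;
  return rest;
}

// Option 2: key history by block hashes instead of by network ID.
function chainKey(genesisBlock, laterBlock) {
  return `blockchain://${genesisBlock}/${laterBlock}`;
}

// Made-up store with history on two networks.
const store = {
  '999': [{ nonce: 0 }, { nonce: 1 }],
  '3': [{ nonce: 0 }],
};

const afterTrash = trashHistory(store, '999');
const remainingNetworks = Object.keys(afterTrash);

// Placeholder hashes, not real block hashes.
const key = chainKey('0xaaa', '0xbbb');
```

The first option loses history for every chain that shares the ID, which is why it only makes sense as a short-term fix.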

@stefanhuber

My workaround for now is:

  • Change network id in ganache ui and restart (sometimes ganache hangs up and i have to close and open it again)
  • truffle migrate changed contracts
  • go to browser and change to another network and afterwards back to the private network

It takes around 30-60 seconds each time i do that...

Wouldn't it be easiest to create a button clear history for accounts?

@danfinlay
Contributor

We've published a new version that should auto-update soon, v3.14.1, which includes a history-clearing button in Settings that can be used as a workaround for this issue:

http://metamask.helpscoutdocs.com/article/36-resetting-an-account

reset account button

@rheaplex

rheaplex commented Feb 2, 2018

That's awesome @danfinlay ! Thank you!

@elie222

elie222 commented Feb 2, 2018 via email

@danfinlay
Contributor

Reset account sounds scary though.

That's right, we don't want normal people doing it. It's scary on purpose.

Thanks to @brunobar79 for writing the change PR!

@danfinlay
Contributor

a user could lose some important data if he clicks it which doesn't seem to be the case?

The problem is that clearing a history with pending transactions can cause the user to submit transactions with an identical nonce, which can cause all kinds of confusion. It's better if users don't. We estimate nonces under normal conditions very well.

@wbt
Contributor

wbt commented Feb 5, 2018

We estimate nonces under normal conditions very well.

I'm not quite sure of that. I am getting an error in which I use Metamask to attempt a transaction (against Ganache 1.0.1) that is (properly) reverted. The transaction count goes up one in Ganache but not in Metamask, and then the next time I attempt to send a transaction from MetaMask I get "Error: the tx doesn't have the correct nonce" where the account nonce is one larger than the transaction nonce.

The workaround is to reset the network, rerun all transactions, and rerun useless transactions to get the count up to where Metamask thinks it should be (but not higher). I can then run transactions again, but I notice that any transaction replaces the one before it at the top of the list of history for that account, instead of adding a new element to the list as it used to. I also notice that now, even without reverted transactions, Metamask no longer increases the nonce when the account has had other non-Metamask transactions, resulting in nonce mismatch errors like this:
Screenshot
Notice that the transaction number jumps quite a bit from the first to second row shown; that's because all the (successful and unsuccessful) transactions in between (done via MetaMask) were also in the first position and overwritten in the UI by the one that came after.

I suspect this is a separate issue but when looking for possible duplicates, “Nonce calculation is broken for private networks” [this issue] seems to be a pretty strong candidate. All of this testing is without the newly contributed “Reset Account” button (thanks @danfinlay), so that patch is not the cause.

@benjamincburns

The transaction count goes up one in Ganache but not in Metamask

@wbt - to clarify, you observe this without restarting Ganache?

@wbt
Contributor

wbt commented Feb 14, 2018

@benjamincburns Yes, that's without restarting Ganache.

@benjamincburns

benjamincburns commented Feb 14, 2018

@wbt can you please try this with the current beta of ganache-cli (npm install -g ganache-cli@beta) run with the --noVMErrorsOnRPCResponse flag? (e.g. ganache-cli --noVMErrorsOnRPCResponse).

If that works, then I assume you'll like running the next beta of the Ganache UI much better than the current beta or stable release.

My suspicion is that in Ganache we report the transaction failure as an RPC error when the transaction is submitted, and MetaMask (correctly) interprets that as the transaction being rejected prior to it entering the transaction pool. If that's the case, I'd regard that as a separate issue from this one, and I wouldn't discredit MetaMask's nonce tracking because of it. We're the ones breaking the "standard," there.

@benjamincburns

benjamincburns commented Feb 14, 2018

@danfinlay incidentally when that happens we break the JSON-RPC spec and return both an error and a result field. If you wanted to make MetaMask more tolerant to our way of doing things there, you could check whether there's a result field and increment the nonce if so.
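A minimal sketch of that tolerance check, assuming the client sees the raw JSON-RPC response object. The function name and the sample responses are made up; the only behavior taken from this thread is that Ganache may return both an error and a result for a transaction that was mined but reverted.

```javascript
// Hypothetical check: treat any response carrying a result (the tx hash) as
// "the transaction was accepted", even if an error field is also present.
function txWasAccepted(response) {
  return response.result !== undefined && response.result !== null;
}

// Spec-compliant success: result only.
const plainSuccess = txWasAccepted({ jsonrpc: '2.0', id: 1, result: '0xabc' });

// Spec-compliant rejection: error only, so the nonce was not consumed.
const plainError = txWasAccepted({
  jsonrpc: '2.0', id: 2, error: { code: -32000, message: 'rejected' },
});

// The non-standard case described above: both fields are present, and the
// transaction did consume a nonce on chain.
const errorWithResult = txWasAccepted({
  jsonrpc: '2.0', id: 3, result: '0xdef',
  error: { code: -32000, message: 'revert' },
});
```

A client would increment its local nonce whenever this returns true.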

I wouldn't blame you if you don't want to do this, however.

@danfinlay
Contributor

I wouldn't blame you if you don't want to do this, however.

It's not so much that I don't want to, but that we are already stretched so thin, we greatly appreciate each non-breaking change. I would encourage Ganache to return success on submitting a tx w/ on-chain error, and allow clients to identify failure the usual way, by querying tx by hash.

@benjamincburns

@danfinlay it's my intention to make that the default behavior in the next major release.

@adamskrodzki

Hi,
I faced the same issue today when connecting to

xDai chain (web3.eth.getNetwork() == 100) (https://dai.poa.network)

I got the nonce from the Ethereum Main Network.

Uninstalling MetaMask and installing it again helped.

@wbt
Contributor

wbt commented Jan 31, 2019

I'm also getting this error ("[ethjs-rpc] rpc error with payload...") again today on a long-running private network that I can't reset. Resetting the account no longer helps.

@Gudahtt
Member

Gudahtt commented Nov 27, 2020

I'm going to close this in favour of the numerous other open tickets about improvements to this workflow (#4254, #5067, #6559, #8081). I think the "Reset Account" button does serve as a workaround for the time being.

@wbt: The issue you're describing sounds distinct from the OP, so feel free to create a new ticket if you're still encountering it! Thanks
