Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Fix segmentation fault during syncing #5834

Merged
merged 2 commits into from
Nov 20, 2019
Merged

Fix segmentation fault during syncing #5834

merged 2 commits into from
Nov 20, 2019

Conversation

halfalicious
Copy link
Contributor

@halfalicious halfalicious commented Nov 19, 2019

Fix #5832, #5633

I suspect the segmentation fault is occurring when a peer reconnects while its disconnect is
still pending (see here for a detailed description), which results in a live Session but no entry in
EthereumCapability::m_peers.

EthereumCapability::interpretCapabilityPacket directly accesses m_peers via the [] operator rather than using a lookup function, which can cause a segfault if the entry with the node ID has already been removed.

This fix will trigger an exception to be thrown when the peer being looked up is not present in EthereumCapability::m_peers. This isn't ideal, but is good enough for now until we can design a proper fix for the "reconnect during pending disconnect" issue (#5835)

@halfalicious halfalicious requested review from gumb0 and chfast November 19, 2019 05:25
@codecov-io
Copy link

codecov-io commented Nov 19, 2019

Codecov Report

Merging #5834 into master will decrease coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5834      +/-   ##
==========================================
- Coverage    64.1%   64.09%   -0.01%     
==========================================
  Files         362      362              
  Lines       30903    30902       -1     
  Branches     3429     3429              
==========================================
- Hits        19811    19808       -3     
+ Misses       9865     9863       -2     
- Partials     1227     1231       +4

@gumb0
Copy link
Member

gumb0 commented Nov 19, 2019

Do I understand correctly that the reason for crash is that new EthereumPeer is default-constructed by operator [] and then it tries to acccess null EthereumPeer::m_host pointer and crashes?

Then could we remove default constructor of EthereumPeer ?

@halfalicious
Copy link
Contributor Author

Ran a sync overnight on Linux for ~9 hours and didn't hit the AV

CHANGELOG.md Outdated Show resolved Hide resolved
@halfalicious
Copy link
Contributor Author

Do I understand correctly that the reason for crash is that new EthereumPeer is default-constructed by operator [] and then it tries to acccess null EthereumPeer::m_host pointer and crashes?

Then could we remove default constructor of EthereumPeer ?

@gumb0: Yes, that's correct - attempting to retrieve an element from the unordered map which doesn't exist (via the [] operator) results in it being default constructed and inserted, then returned. When we try to deref m_host we get the av.

Good idea about removing the default ctor, I'll do that.

@halfalicious halfalicious force-pushed the fix-sync-segfault branch 3 times, most recently from 5e789d7 to df96f20 Compare November 20, 2019 04:52
@halfalicious
Copy link
Contributor Author

Rebased and updated changelog

Segfault is occurring when a peer reconnects while its disconnect is
still pending, which results in a live Session but no entry in
EthereumCapability::m_peers.
EthereumCapability::interpretCapabilityPacket directly accesses m_peers
via node ID rather than using a lookup function, which is causing the
segfault.
@gumb0 gumb0 merged commit b148754 into master Nov 20, 2019
@gumb0 gumb0 deleted the fix-sync-segfault branch November 20, 2019 13:59
@gumb0 gumb0 restored the fix-sync-segfault branch November 20, 2019 18:01
@gumb0 gumb0 deleted the fix-sync-segfault branch November 20, 2019 18:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aleth 1.7.0 crash on Ubuntu 18.04 LTS
4 participants