-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rendezvous Certificates #7517
Rendezvous Certificates #7517
Conversation
This reverts commit 91d360a.
@@ -22,13 +24,16 @@ async def run(self): | |||
metadata_store_component = await self.require_component(MetadataStoreComponent) | |||
torrent_checker_component = await self.require_component(TorrentCheckerComponent) | |||
|
|||
rendezvous_db = RendezvousDatabase(db_path=self.session.config.state_dir / STATEDIR_DB_DIR / PopularityCommunity.RENDEZVOUS_DB_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this require a separate Component
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's perfectly acceptable to create a database here. However, don't forget to close the database if it's necessary.
tribler/src/tribler/core/components/knowledge/knowledge_component.py
Lines 57 to 58 in 3b760ab
if self.knowledge_db: | |
self.knowledge_db.shutdown() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, congratulations on your first contribution to the Tribler codebase!
Regarding the PR — it looks great. I've only reviewed the community part so far and we've already discussed some points offline. I assume that addressing these points could lead to changes in the code, so I will return to the review process later (please request a review when you are ready).
Besides that, I have a comment about defining the database. You've used the old "metadata"-style approach to define separate entities, which is as follows:
self.MiscData = misc.define_binding(self.database)
self.Certificate = certificate.define_binding(self.database)
I could be mistaken, but it seems clear that this approach may not be convenient for the end-developer. We have transitioned to a different approach, as utilized in KnowledgeDB. Please see the following:
self.instance.bind('sqlite', filename or ':memory:', create_db=True) |
@kozlovsky am I right regarding the DB definition?
|
||
def __init__(self, *args, torrent_checker=None, **kwargs): | ||
community_id = unhexlify('9aca62f878969c437da9844cba29a134917e1649') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you certain that altering the community ID is necessary? As I understand it, this change results in a fork in the Popularity Community.
self.register_task("ping_rendezvous", self.ping_rendezvous, | ||
interval=PopularityCommunity.PING_INTERVAL_RENDEZVOUS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: I believe it might be slightly better to use 'self' here, as well as in the line above (which I understand isn't your change). Using 'self' offers more flexibility for testing (this is preferable because it allows you to alter its value without meddling with the class namespace, which could potentially affect other tests.):
self.register_task("ping_rendezvous", self.ping_rendezvous, | |
interval=PopularityCommunity.PING_INTERVAL_RENDEZVOUS) | |
self.register_task("ping_rendezvous", self.ping_rendezvous, | |
interval=self.PING_INTERVAL_RENDEZVOUS) |
An example:
class A:
NUMBER = 1
def __init__(self):
print(A.NUMBER)
class ChildA(A):
NUMBER = 2
class B:
NUMBER = 1
def __init__(self):
print(self.NUMBER)
class ChildB(B):
NUMBER = 2
A() # prints 1
ChildA() # prints 1
B() # prints 1
ChildB() # prints 2
While the current design is functional, I believe it's not the most user-friendly as it doesn't provide an intuitive means for customizing the class's behavior through variables. I propose a slightly modified version which might be a bit more optimal:
class PopularityCommunity(RemoteQueryCommunity, VersionCommunityMixin):
...
PING_INTERVAL_RENDEZVOUS = 60 # seconds
def __init__(self, *args, torrent_checker=None, rendezvous_db=None,
ping_rendezvous_interval: float = PING_INTERVAL_RENDEZVOUS,
**kwargs):
...
self.register_task("ping_rendezvous", self.ping_rendezvous,
interval=ping_rendezvous_interval)
For instance, in the tests, you could easily pass a custom value to the class as follows:
community = PopularityCommunity(
self._ipv8_component.peer,
self._ipv8_component.ipv8.endpoint,
Network(),
ping_rendezvous_interval=0.1
)
@@ -35,22 +42,98 @@ class PopularityCommunity(RemoteQueryCommunity, VersionCommunityMixin): | |||
GOSSIP_POPULAR_TORRENT_COUNT = 10 | |||
GOSSIP_RANDOM_TORRENT_COUNT = 10 | |||
|
|||
community_id = unhexlify('9aca62f878969c437da9844cba29a134917e1648') | |||
PING_INTERVAL_RENDEZVOUS = 60 # seconds | |||
RENDEZVOUS_DB_NAME = 'rendezvous.db' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: it might be better to declare this variable within the component to align with the current methodology of separating responsibilities.
Components are more specific; they can interact with particular databases, specify particular file names, etc. On the other hand, communities are more abstract, and databases and other instances should ideally be passed to them.
Consequently, we would have two instances where communities are created:
- During runtime, they are created by
Components
. - During test execution, they are created by the
TestBase
provided by ipv8.
|
||
# Init version community message handlers | ||
self.init_version_community() | ||
self.rendezvous_cache = RendezvousCache() | ||
|
||
def send_introduction_request(self, peer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the logic for rendezvous_request
could be relocated to the on_introduction_response
function. This could simplify interactions by eliminating the need to extend introduction_request
. As a bonus, this change might allow us to retain the previous community ID.
def on_introduction_response(self, peer, dist, payload):
super().on_introduction_response(peer, dist, payload)
... # preform rendezvous_request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach is still compatible! Older peers will just ignore the extra bytes. I could drop this entire extra logic though. We can get it to work through only separate payloads.
else: | ||
# This nonce has been burned. | ||
self.rendezvous_cache.clear_peer_challenge(peer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codacy is right :)
else: | |
# This nonce has been burned. | |
self.rendezvous_cache.clear_peer_challenge(peer) | |
# This nonce has been burned. | |
self.rendezvous_cache.clear_peer_challenge(peer) |
if not certificate: | ||
certificate = self.rdb.Certificate(public_key=peer.mid, counter=0) | ||
certificate.counter += 1 | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary return :)
return |
@@ -22,13 +24,16 @@ async def run(self): | |||
metadata_store_component = await self.require_component(MetadataStoreComponent) | |||
torrent_checker_component = await self.require_component(TorrentCheckerComponent) | |||
|
|||
rendezvous_db = RendezvousDatabase(db_path=self.session.config.state_dir / STATEDIR_DB_DIR / PopularityCommunity.RENDEZVOUS_DB_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's perfectly acceptable to create a database here. However, don't forget to close the database if it's necessary.
tribler/src/tribler/core/components/knowledge/knowledge_component.py
Lines 57 to 58 in 3b760ab
if self.knowledge_db: | |
self.knowledge_db.shutdown() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, congratulations also from my side on one of those rare phd code contributions to Tribler! Much appreciated 🚀 🥇 🚀
Goals is said to be: determining the online time of fellow peers
. Reading through the code I got inspiration for an "algorithm 1" type of innovation we require for solid publications. return RendezvousCertificate.get(public_key == pk).count()
this code calculates the count of rendezvous certificates. In future, use an "algorithm 1" type of approach to calculate the probability of this identity being a Sybil, given the volume, age, and IPv4 diversity of the "rendezvous DAG". This goes a bit beyond MeritRank or is equal to meritRank? Should be N log N complexity.
Great idea! This first version is just datacollection. This scoring will go beyond MeritRank as it will require multiple dimensions. The idea for this is as follows: run MeritRank on the metrics, individually, to achieve a score for each metric. Next we introduce implementation specific weights to converge all scores to a single score. |
I think the current approach used in this PR has some benefits; it allows PyCharm IDE to understand the type of expressions like In the If you want to combine the benefits of both approaches, you can define all entities in a single class RendezvousDatabase:
def __init__(self, db_path: Union[Path, type(MEMORY_DB)]):
self.database = Database()
self.Certificate, self.MiscData = self.define_binding()
...
def define_binding(self):
class Certificate(self.database.Entity):
...
class MiscData(self.database.Entity):
...
return Certificate, MiscData Then, all entities can be defined in a single file (if it is considered beneficial), and PyCharm understands the types of expressions like |
def get_count(cls, pk: bytes) -> int: | ||
return RendezvousCertificate.get(public_key == pk).count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the method is currently not used. It can probably be deleted.
If you want to keep the method, it should be fixed, as the code is incorrect: RendezvousCertificate.get(...)
call returns a single certificate object, and a single certificate object does not have the count()
method.
The correct code (if it is necessary) should probably looks like:
def get_count(cls, pk: bytes) -> int:
certificate = RendezvousCertificate.get(public_key=pk)
return 0 if certificate is None else certificate.counter
with db_session: | ||
certificate = self.rdb.Certificate.get(public_key=peer.mid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid possible database locking, it is better to either use here db_session(immediate=True)
or Certificate.get_for_update(public_key=peer.mid)
(the result is the same)
@InvictusRMC Can you address the comments and requested changes? {you will find out I guess that "production code polishing" is like eating oatmeal, brushing teeth, etc. 🤣 } |
btw Advise of @qstokkink is to test the impact on performance using minimal test Gumby experiment. A single UDP message triggering a database write/commit is scary 😨 Something we had in 2013 Dispersy times |
Comments addressed! Thank you for the reminder. |
Discussed with @qstokkink and @kozlovsky and they advised me to implement batching logic, as a large number of transactions would slow things down considerably. |
After getting a small lecture about MeritRank, my advice is as follows.
|
This task is now taking over 4 months. @qstokkink indicated it is possible to re-factor the Related work. This would make Tribler the first academically self-organising system with Sybil protection. See IPFS attack in a USENIX paper, DHT repair blog and DHT health reporting Let's make the work by @InvictusRMC plus @grimadas the key feature of the upcoming 7.14 release. Preparing for MeritRank Production usage! |
@qstokkink as you pointed out today: no ORM in IPv8 ❌ No database storage. Can you comment here a possible new API which would provide the signed certificates to the IPv8 community. Thus how can we request the rendezvous certificates (Pub-key-Them,Pub-key-OURS,nonce,signature-Them) from IPv8? |
@qstokkink rebased this into the ipv8 module of Tribler: #7630. Thank you for picking up the slack 🙏. Closing for now. |
This PR adds rendezvous certificates to Tribler. This serves as a method for determining the online time of fellow peers. The initial certificate piggybacks on the introduction message of the
PopularityCommunity
. Subsequent rendezvous pings are served through separate payloads. The implementation does not use a separate community in order to reduce overhead.The design works as follows: Peer
A
sends a ping message including a nonce serving as a challenge. PeerB
returns the signed nonce. PeerA
then increases its counter for PeerB
. Online time can be estimated by multiplying the number of pings by the interval between pings.