Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Rework GenesisNetworkInfo to use the genesis address book #16759

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

mhess-swl
Copy link
Member

In addition to replacing the roster with the genesis address book in GenesisNetworkInfo, this PR also makes a further attempt to standardize how the code creates/retrieves rosters.

Closes #16643

Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
@mhess-swl mhess-swl added this to the v0.57 milestone Nov 22, 2024
@mhess-swl mhess-swl self-assigned this Nov 22, 2024
@mhess-swl mhess-swl requested review from a team as code owners November 22, 2024 23:53
@mhess-swl mhess-swl changed the title Rework GenesisNetworkInfo to use the genesis address book fix: Rework GenesisNetworkInfo to use the genesis address book Dec 2, 2024
Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
@mhess-swl mhess-swl requested a review from a team as a code owner December 2, 2024 15:28
Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
@mhess-swl
Copy link
Member Author

We're determining if this work is still relevant

Copy link

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 96.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 63.82%. Comparing base (6897a54) to head (80f7adb).
Report is 46 commits behind head on develop.

Files with missing lines Patch % Lines
...a/com/hedera/node/app/info/GenesisNetworkInfo.java 87.50% 1 Missing ⚠️
...a/com/swirlds/platform/roster/RosterRetriever.java 95.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             develop   #16759      +/-   ##
=============================================
+ Coverage      63.45%   63.82%   +0.36%     
- Complexity     20319    20558     +239     
=============================================
  Files           2545     2540       -5     
  Lines          94605    95131     +526     
  Branches        9878     9941      +63     
=============================================
+ Hits           60035    60714     +679     
+ Misses         30954    30800     -154     
- Partials        3616     3617       +1     
Files with missing lines Coverage Δ
...ervice/addressbook/impl/ReadableNodeStoreImpl.java 100.00% <100.00%> (ø)
...ddressbook/impl/schemas/V053AddressBookSchema.java 93.87% <100.00%> (-2.68%) ⬇️
...-app/src/main/java/com/hedera/node/app/Hedera.java 56.67% <ø> (-0.03%) ⬇️
.../com/hedera/node/app/info/DiskStartupNetworks.java 86.50% <100.00%> (-0.62%) ⬇️
...dera/node/app/roster/schemas/V057RosterSchema.java 97.87% <100.00%> (-0.09%) ⬇️
.../java/com/swirlds/platform/roster/RosterUtils.java 67.67% <100.00%> (+4.88%) ⬆️
...state/service/schemas/V057PlatformStateSchema.java 64.10% <100.00%> (-1.76%) ⬇️
...rlds/platform/system/address/AddressBookUtils.java 66.91% <100.00%> (+3.20%) ⬆️
...a/com/hedera/node/app/info/GenesisNetworkInfo.java 68.42% <87.50%> (-0.81%) ⬇️
...a/com/swirlds/platform/roster/RosterRetriever.java 93.33% <95.00%> (+1.78%) ⬆️

... and 165 files with indirect coverage changes

Impacted file tree graph

Copy link

codacy-production bot commented Dec 2, 2024

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.32% (target: -1.00%) 98.33%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (6897a54) 97370 63462 65.18%
Head commit (80f7adb) 97678 (+308) 63972 (+510) 65.49% (+0.32%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#16759) 60 59 98.33%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more

Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
Copy link
Member

@Neeharika-Sompalli Neeharika-Sompalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @mhess-swl

Copy link
Collaborator

@MiroslavGatsanoga MiroslavGatsanoga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
Signed-off-by: Matt Hess <matt.hess@swirldslabs.com>
@@ -219,4 +237,9 @@ private static Roster retrieveInternal(@NonNull final State state, @Nullable fin
}
return null;
}

private static boolean isValidServiceEndpoint(@NonNull final ServiceEndpoint endpoint) {
return endpoint.ipAddressV4().length() > 0
Copy link
Contributor

@edward-swirldslabs edward-swirldslabs Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would hope this kind of input validation is happening in the DAB transactions and that it is unnecessary here.

Copy link
Member Author

@mhess-swl mhess-swl Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my attempt to remember the following (current) code:

                    .gossipEndpoint(List.of(
                                    Pair.of(address.getHostnameExternal(), address.getPortExternal()),
                                    Pair.of(address.getHostnameInternal(), address.getPortInternal()))
                            .stream()
                            .filter(pair -> pair.left() != null && !pair.left().isBlank() && pair.right() != 0) // <----- this
                            .distinct()

So the code I wrote here is slightly different. I can replace this with the original filtering if needed.

final List<ServiceEndpoint> rosterOrdered = endpoints.size() > 1 ? endpoints.reversed() : endpoints;
// Filter out any invalid endpoints
final List<ServiceEndpoint> filtered = endpoints.stream()
.filter(e -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any risk of filtering this down to 0 valid endpoints?

Copy link
Member Author

@mhess-swl mhess-swl Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. What is the expected behavior in that case? I didn't see any code to handle that

Copy link
Contributor

@edward-swirldslabs edward-swirldslabs Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, there is no code to handle that. Can't connect to anyone that has 0 gossip endpoints. Its fundamentally bad input to be able to reach this state. Validation and error throwing should be happening at time of input, first time the data hits our system.

Copy link
Contributor

@edward-swirldslabs edward-swirldslabs Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthony-swirldslabs wrote the original. If we have proper input validation, this filter should never catch anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both NodeCreateHandler and NodeUpdateHandler call AddressBookValidator#validateServiceEndpoint to verify a valid endpoint (here and here), so we should be good 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the validation should happen upstream somewhere. At the same time, I think that re-validating here shouldn't hurt.

However, ending up with zero endpoints is no good because such a RosterEntry would be unusable by the platform. In fact, it's likely to start throwing exceptions because the Roster specification states that there must be at least a single entry. And platform rightfully assumes so.

This code needs to throw an exception and fail hard if it ends up with zero endpoints.

Copy link
Member Author

@mhess-swl mhess-swl Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code needs to throw an exception and fail hard if it ends up with zero endpoints.

Again, if this is true, then why isn't the current code throwing a clear, explicit exception?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely, "we just missed it back then". Less likely, but not impossible, some unit tests might use bogus AddressBooks w/o any endpoints at all, in which case this code would fail to support them. If the latter is true, well, then we're good. If not though, I think it wouldn't hurt adding the exception while we're refactoring this code anyway.

Copy link
Member

@Neeharika-Sompalli Neeharika-Sompalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @mhess-swl

} catch (CertificateEncodingException e) {
throw new InvalidAddressBookException(e);
}

final List<ServiceEndpoint> endpoints = AddressBookUtils.endpointsFor(address);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpointsFor method returns endpoints in an order that is opposite to what the Roster expects. In the Roster, the external endpoint goes at index 0, followed by an optional internal endpoint. The internal endpoint may be skipped if it's identical to the external one.

The entire platform code relies on this particular order of endpoints in the Roster and will be broken if it's reversed. This is because we're going to deprecate internal endpoints from the Roster eventually and replace them with the node configuration. So we choose to use the entry at index 0 from the start to refer to an external endpoint that the platform normally uses.

It would be best to retain the existing conversion logic in the RosterRetriever class as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to retain the existing conversion logic in the RosterRetriever class as is.

The existing conversion logic is why we have had problems and confusion in functional testing to begin with. To be more specific, not having explicit and clear safeguards to maintain said ordering in all the platform code has led to this ordering logic leaking into various places. Now this leak has to be accounted for in disparate locations sprawled across both platform and services code (hence the functional testing issues).

I attempted to standardize/unify all the existing platform and services use cases while maintaining current functionality. Maybe I didn't do a great job, but these APIs and their usages should be consistent and cohesive across the entire codebase, instead of being evaluated solely from the viewpoint of platform alone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully support the intent of this change per your description. And if this conversion logic must be re-written, that's okay, too.

Speaking of this particular line of code though, since this call occurs in a Roster-related class and we know that the AddressBookUtils would return the endpoints in an order that is opposite to the roster specification, it would be a lot more appropriate to flip the entries right here, instead of burying the flipping logic down the call stack in a completely different method. This would help ensure that the rest of the code in this class doesn't need to deal with flipping at all.

However, per a comment below in this PR, it does seem that there's an extra place which requires the flipping logic - in the RosterUtils. It would be ideal to find a way to have this flipping implemented literally once in a single place somehow, to follow the spirit of the intent of this fix.

final List<ServiceEndpoint> rosterOrdered = endpoints.size() > 1 ? endpoints.reversed() : endpoints;
// Filter out any invalid endpoints
final List<ServiceEndpoint> filtered = endpoints.stream()
.filter(e -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the validation should happen upstream somewhere. At the same time, I think that re-validating here shouldn't hurt.

However, ending up with zero endpoints is no good because such a RosterEntry would be unusable by the platform. In fact, it's likely to start throwing exceptions because the Roster specification states that there must be at least a single entry. And platform rightfully assumes so.

This code needs to throw an exception and fail hard if it ends up with zero endpoints.

Comment on lines +176 to +177
// Order the endpoints in the expected roster order
final List<ServiceEndpoint> rosterOrdered = filtered.size() > 1 ? filtered.reversed() : filtered;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw this after posting my original comment. I understand how you work around the issue. However, I don't find this particularly elegant or easy to maintain. I'd still prefer to retain the original code because the logic there is quite straightforward.

Also, this workaround can in fact still end up filtering out the external endpoint which the platform really needs, and only keeping the internal endpoint. This is an error condition from the platform perspective.

Again, I'd recommend keeping the existing conversion logic in this class as it was originally written.

Copy link
Member Author

@mhess-swl mhess-swl Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw this after posting my original comment. I understand how you work around the issue. However, I don't find this particularly elegant or easy to maintain. I'd still prefer to retain the original code because the logic there is quite straightforward.

I have to disagree. As a newcomer to this code, reading the current implementation–or changing it–were not straightforward. Converting into Pair objects and doing inline validation–essentially duplicating existing validation code elsewhere, I might add–is not only clunky, but error-prone. Furthermore, the roster was designed inside the boundaries of the platform code, and yet the roster itself isn't even consistent with platform's own address book ordering. How is unwieldy, error-prone, and inconsistent code straightforward (or elegant)?

Also, this workaround can in fact still end up filtering out the external endpoint which the platform really needs, and only keeping the internal endpoint. This is an error condition from the platform perspective.

If this is true, then why doesn't the current conversion code account for these potential errors with a clear, explicit exception either?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested above, it may make sense to move the flipping logic to where we obtain the list of endpoints in the first place, so that they're always in the correct order everywhere else in this roster-specific class.

As for the exception, that's a good question to which there's not a good answer other than "we missed it when we wrote it originally." Would adding the exception break any unit tests? If not, is there a good reason not to include it with this conversion logic overhaul while we're at it?

Comment on lines +378 to +380
// If we're reading metadata, the endpoint order SHOULD match the address book, which always has the
// internal endpoint at index 0
node.serviceEndpoint());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you cannot put it into the roster w/o flipping. A RosterEntry should always have the external endpoint at index 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this needs to be flipped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only proves my earlier point: passing around two service endpoints in a list, with no designation of internal vs. external, is at best difficult to account for. I recommend we change these APIs from an arbitrary list of ServiceEndpoint to something more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree with the sentiment fully. However, as of today, the plan is to use the endpoints in the RosterEntry to represent various external endpoints, such as IP-address based, FQDN-based, and in the future, perhaps IPv6-based. While currently it's unclear how this array of endpoints is going to be actually used, it is certain that we don't have plans to include any internal endpoints in this list because they're generally not useful in the roster as a public data structure.

But as long as we do use this list, we need to have the external endpoint at index zero, so the quoted code needs to flip the endpoints here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GenesisNetworkInfo should use AddressBook, not Roster.
5 participants