Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exploratory refactor-2 of libp2p + HTTP #103

Closed
wants to merge 2 commits into from
Closed

Conversation

gammazero
Copy link
Collaborator

This is another version of the libp2p + HTTP refactor in #102. The key difference is that a new NamespacedClient, having the publisher's address, is created for each sync within the per-sync Syncer.

This is another version of the libp2p + HTTP refactor in #102. The key difference is that a new `NamespacedClient`, having the publisher's address, is created for each sync within the per-sync Syncer.
@codecov-commenter
Copy link

codecov-commenter commented Aug 9, 2023

Codecov Report

Patch coverage: 85.93% and project coverage change: +0.15% 🎉

Comparison is base (52d79f6) 54.54% compared to head (3843e8b) 54.69%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
+ Coverage   54.54%   54.69%   +0.15%     
==========================================
  Files          65       65              
  Lines        5152     5209      +57     
==========================================
+ Hits         2810     2849      +39     
- Misses       2039     2054      +15     
- Partials      303      306       +3     
Files Changed Coverage Δ
dagsync/httpsync/publisher.go 66.19% <62.50%> (-0.47%) ⬇️
dagsync/httpsync/sync.go 78.43% <93.75%> (+3.58%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@MarcoPolo MarcoPolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I agree with the idea of putting the httpHost in the sync and creating the http.Client internally with NewSyncer.

I added some notes (some for myself). But I have one API design question I'd like feedback on around server auth.

// libp2p streams, and multiaddrs to create listeners on.
publisherHost, err := libp2phttp.New(
libp2phttp.StreamHost(publisherStreamHost),
libp2phttp.ListenAddrs([]multiaddr.Multiaddr{multiaddr.StringCast("/ip4/127.0.0.1/tcp/0/http")}),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be is obvious, but I forgot to note this here: In production this MUST be an https endpoint. It's not here to make testing a bit easier.

for i := range peerAddrs {
//
// TODO: Replace arguments with peer.AddrInfo
func (s *Sync) NewSyncer(peerID peer.ID, addrs []multiaddr.Multiaddr) (*Syncer, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave a comment that peer.ID is optional for the HTTP transport. In general, auth is optional on HTTP.

sync: s,
}

if peerInfo.ID != "" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peerInfo.ID may be nil to this argument. It means we won't cache the peer metadata in the http host.

I'm thinking I need to change this. We should be able to cache this based on the endpoint (scheme+host+port+sni).

dagsync/httpsync/sync.go Outdated Show resolved Hide resolved
}

func (s *Sync) newLibp2pSyncer(peerInfo peer.AddrInfo) (*Syncer, error) {
httpClient, err := s.clientHost.NamespacedClient(s.protoID, peerInfo)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the best go-libp2p api to expose optional server authentication to users?

My original idea was to determine if a user wanted to authenticate the server if peerInfo.ID was set. The problem is that this can be very annoying to users because:



* A user might not know if the multiaddr they are passing is an HTTP transport or Stream transport.



* A user might be passing multiple multiaddrs (both an HTTP transport and a stream transport), and stream transports require a server ID (libp2p streams MUST be authenticated). In the case that a user doesn't need server peer id auth, they have no way of expressing that. They'll pay for HTTP server auth because they passed it in (since it is required for the stream transports).









Now I'm thinking we introduce a new option to the RoundTripper/http.Client functions. Something like ServerMustAuthenticate. Users would pass this option to tell the http host that it MUST authenticate the server (either via HTTP Peer ID Auth or stream auth). If this is not passed in, we can assume the client doesn't care if the server's peer id has been authenticated and we can ignore the passed in peerInfo.ID.









Wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having an option to allow authentication to be bypassed works. This is also not an unfamailiar thing to do, as it is similar to tls.Config.InsecureSkipVerify

if err != nil {
return nil, err
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, if we see an httpath component we can tell the http host about this mapping via httpHost.AddPeerMetadata. That would save us a round trip for .well-known/libp2p if we have it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoPolo This would work for the /head path since it does not change (i.e. is well-known), but will not work for all the advertisement and entries paths, since they all end with a CID that is almost always different. It would be awesome if there were a way to match well-known path prefixes (e.g. /ipni/v1/ad/) to a protocol ID. Then the cache could do a longest prefix match to lookup the metadata.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding something. In the current version of master we have the notion of a syncer's rootURL. This represents the prefix that the IPNI protocol is mounted at. If we want to GET the head we do rootURL + "/head" and if we want a cid we do rootURL + cid.String().

I'm noting here that we could save a round trip to .well-known/libp2p telling us the prefix by using the rootURL. We would do something like clientHost.AddPeerMetadata(peer.id, libp2phttp.WellKnownProtoMap{"/ipni/v1": rootURL}.

You probably need to do this for backwards compatibility as well as an optimization, since existing deployed publishers won't have set up a /.well-known/libp2p endpoint yet.

I don't quite understand why you'd want to do a prefix match, what am I missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably not full understanding the correct use. The code immediately following this comment is my attempt to use this feature. Maybe I need to omit the "head" or any other thing following the root prefix?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay. I think I see. You don't actually need to do any of the below. The path in WellKnownProtocolMeta is the path prefix for an application protocol. So for IPNI it would be something like /ipni/1 (or just /ipni). It's the path prefix that defines where this application protocol is mounted on the server. Let me create a patch of what I mean.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/MarcoPolo/30c5c138a126473a949ab1879c16c37a

We learn about this prefix out-of-band by using the httpath in the multiaddr if it's present.


// libp2phttp
clientHost *libp2phttp.HTTPHost
protoID protocol.ID

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be a constant value here. It is simply some identifier for this protocol. Maybe /ipni-sync/1 ?

@rvagg
Copy link
Contributor

rvagg commented Aug 11, 2023

Currently I think I might be the only user of httpath, in Frisbii. With the new /ipni/v1/ad thing I'm much less concerned with having the flexibility to define a path for advertisements—my original reason for using it is that I'm using the http server for other things, not just IPNI advertisements, so needed to namespace it.

So with this change we would have have 3 tools for namespacing:

  • The fixed /ipni/v1/ad namespacing, freeing up /.
  • httpath for arbitrary pathing (I'm currently using /_ipni/).
  • /.well-known/libp2p mapping, requiring an additional round-trip.

Which is nice, and I'm less attached to httpath with these options.

But, with my use-case I reuse the http server, so using this libp2p solution means I have to reengineer my server to do everything through libp2p, right? @MarcoPolo, perhaps you could consider how https://github.com/ipld/frisbii is using this if you want to explore the usability of this (not asking for a PR, just that frisbii is very particular about its use of go-libipni in a way that's currently (probably) not compatible with this new API).

@MarcoPolo
Copy link

But, with my use-case I reuse the http server, so using this libp2p solution means I have to reengineer my server to do everything through libp2p, right? @MarcoPolo, perhaps you could consider how https://github.com/ipld/frisbii is using this if you want to explore the usability of this (not asking for a PR, just that frisbii is very particular about its use of go-libipni in a way that's currently (probably) not compatible with this new API).

My goal is that you don't have to re-engineer anything if you're already building on top of HTTP semantics. You don't even have to use the go-libp2p package to be compatible with clients who are using it. Using the go-libp2p package is mostly useful if:

  1. You need to work on top of other transports besides an HTTP native one (http/1.1, h2, h3).
  2. You need NAT hole-punching.
  3. You need to authenticate a peer by their peer id.
  4. You want to learn about a peer from their peer id.

(these are copied from the WIP HTTP spec)

I took a look at Frisbii, I think there's only one thing to add if you want to support the .well-known/libp2p endpoint. Which you probably should unless you expect every client to be able to learn about your protocols and their prefixes out-of-band (which is fine! just more work). The change is small since you already have go-libp2p as a dependency, only 4 lines of code. Here's the patch: https://gist.github.com/MarcoPolo/3fb6b479b25d8a8e43eaadc516c60c29.

@gammazero
Copy link
Collaborator Author

Closing in favor of #113

@gammazero gammazero closed this Aug 24, 2023
@gammazero gammazero deleted the lipp2phttp-gammazero branch September 20, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants