Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add an offline hint to contexts #4009

Closed
Stebalien opened this issue Jun 26, 2017 · 16 comments
Closed

[RFC] Add an offline hint to contexts #4009

Stebalien opened this issue Jun 26, 2017 · 16 comments

Comments

@Stebalien
Copy link
Member

Instead of manually constructing services without online components, it would be useful to have a context hint that tells services to avoid network requests. That is:

func Offline(ctx context.Context) context.Context {
	return context.WithValue(ctx, Offline, true)
}

func IsOffline(ctx context.Context) bool {
	value, ok := ctx.Value(Offline).(bool)
	return ok && value
}

One could then define offline service adapters as follows:

func OfflineDAGService(ds DAGService) DAGService {
	&offlineDAGService{ds}
}

type offlineDAGService struct {
	online DAGService
}

func (ds *offlineDAGService) Get(ctx context.Context, n Node) (Node, error) {
	return ds.online.Get(Offline(ctx), n)
}

func (ds *offlineDAGService) Add(n Node) (*cid.Cid, error) {
	return ds.online.Add(n)
}

func (ds *offlineDAGService) Remove(n Node) error {
	return ds.online.Remove(n)
}

// etc...
@whyrusleeping
Copy link
Member

I like using contexts this way. 👍 here, though we will have to clearly define expectations; what exactly is meant by a 'network call' and how do we behave in contexts where everything is a networked call? the hard part here is bitswap, how would bitswap behave when given an 'offline' context?

@whyrusleeping
Copy link
Member

cc @Kubuxu @kevina @hsanjuan @lgierth and anyone else who has opinions here

@Stebalien
Copy link
Member Author

the hard part here is bitswap, how would bitswap behave when given an 'offline' context?

I assume it would refuse to work.


For context, I want to be able to easily construct arbitrary offline variants of services but don't want to add more functions like GetOfflineLinkService to every service.

@whyrusleeping
Copy link
Member

For context, I want to be able to easily construct arbitrary offline variants of services but don't want to add more functions like GetOfflineLinkService to every service.

Yeah, agreed. We just need to be careful and make sure we don't miss adding the offline checking logic to any service that needs it.

@kevina
Copy link
Contributor

kevina commented Jun 27, 2017

I am not sure I like the idea of a "hint". For somethings (like maybe the GC) I greatly prefer a hard guarantee that the service will not make network calls because there is no code for it in the service used, rather than a hint that could be ignored.

@kevina
Copy link
Contributor

kevina commented Jun 27, 2017

I do agree that GetOfflineLinkService is a hack, but I think there should be a more uniform way to get at a service that does not use the network rather than ask an existing service to avoid network requests.

In addition to the my dislike of a "hint" that can be ignored, it will add complexity to existing services as they now will need a check for this "hint" and take appropriate action.

@Stebalien
Copy link
Member Author

but I think there should be a more uniform way to get at a service that does not use the network rather than ask an existing service to avoid network requests.

Unfortunately, I think this would require a GetOffline$ServiceName in every service interface. We could do that, it just get noisy (which is why I initially discarded the idea).

I am not sure I like the idea of a "hint". For somethings (like maybe the GC) I greatly prefer a hard guarantee that the service will not make network calls because there is no code for it in the service used, rather than a hint that could be ignored.

I agree. Hints like this do make me feel a bit uneasy. Ideally libp2p interfaces would take a context so we could enforce this at the networking level (we could even use contexts to help with bitswap accounting) but the reader/writer interfaces don't accept contexts.

Actually, on further inspection, this might run into problems where not all functions on networked interfaces accept contexts (e.g., exchange.Interface.HasBlock(block)). However, on the other hand, this hint isn't about enforcing no network activity, it's more about avoiding waiting on network requests (or at least that's my take on it).


We can actually enforce the "is-online" check using the type system but that would add a lot of complexity and is very ugly:

import "context"

type OnlineContext interface {
	context.Context
	_OnlineContext()
}

type onlineContext struct {
	context.Context
}

func (c onlineContext) _OnlineContext() {}

func Offline(ctx context.Context) context.Context {
	return context.WithValue(ctx, "offline", true)
}

func WithOnline(ctx context.Context) *OnlineContext {
	if value, ok := ctx.Value("offline").(bool); ok && value {
		return nil
	} else {
		var c OnlineContext
		c = onlineContext{ctx}
		return &c
	}
}

func needsContext(ctx OnlineContext) {}

func main() {
	ctx := context.Background()
	ctx = Offline(ctx)
	onlineCtx := WithOnline(ctx)
	if onlineCtx == nil {
		panic("need online context")
	}
	needsContext(*onlineCtx) // If you forget the check, this panics.
}

We can avoid the panic with a helper function:

ifOnline(ctx, func(ctx OnlineContext) {
    // do something online.
})

@kevina
Copy link
Contributor

kevina commented Aug 19, 2017

Perhaps we are taking the wrong approach here, although it would be useful it is not really necessary in order to eliminate the hackish GetOffline method.

The only reason GetOfflineLinkService exists is because it replaced this code in 721df36 (p.r. #3255):

 func GC(ctx context.Context, bs bstore.GCBlockstore, ls dag.LinkService, pn pin.Pinner, bestEffortRoots []*cid.Cid) (<-chan key.Key, error) {
 	unlocker := bs.GCLock()
 
-	bsrv := bserv.New(bs, offline.Exchange(bs))
-	ds := dag.NewDAGService(bsrv)
-	ds.LinkService = ls
+	ls = ls.GetOfflineLinkService()

that is to replace code that manually constructed an offline service. I addedGetOfflineLinkService since it seamed like the best solution at the time. Perhaps it would be better that the GC method just takes an Offline link service to start with. P.R. #4152 does exactly this.

[Edits for clarity and to remove quote from @whyrusleeping since it may of been out of context or misinterpreted]

@Stebalien
Copy link
Member Author

@kevina's version has now been implemented. The next step is to decide if we do, in fact want the context hint.

@kevina
Copy link
Contributor

kevina commented Jan 31, 2018

Actually my fix was not implemented. What was implemented was basically how things where before I created the LinkService. I did not create the LinkService as a means of catching or an optimization but rather as a means for being able to perform garbage collection when some of the leaf nodes are not available.

More to the point, in the filestore data for leaves may become inaccessible at any time if the backing file has changed or is removed. If the node is pinned, then this could brake garbage collection. Fortunately, the filestore does not need to read the backing file to determine what the links are, hence the LinkService was created. This original filestore code at https://github.com/ipfs-filestore/go-ipfs was able to store both Protobuf and Raw leaves. For protobuf leaves it also stored the non-data part in the database so it could tell that it was a protobuf leaf and hence had no links without having to open the backing file. With Raw leaves you can tell from just the Cid that it is a leaf so this case does not need to be handled by the filestore. The basic filestore implementation that got merged can only handle raw leaves so our existing optimization, of not consulting the blockstore if the Cid indicated a raw leaf, is sufficient to prevent problems with the GC if the backing file is changed or removed.

So for now the change in #4610 will not cause a problem, however in the future it could cause a problem, if we get a more functional filestore implementation that can also store protobuf leaves, or even if there are just cases when only the links are available and the nodes may not be.

@Stebalien
Copy link
Member Author

@kevina sorry, I misread your post above as the fix (well, a fix).

I did not create the LinkService as a means of catching or an optimization but rather as a means for being able to perform garbage collection when some of the leaf nodes are not available.

Got it. I was informed it was an for future optimizations.

So for now the change in #4610 will nor cause a problem, however in the future it could cause a problem, if we get a more functional filestore implementation that can also store protobuf leaves, or even if there are just cases when only the links are available and the nodes may not be.

There was already going to be a problem as I just pulled this code out of the current dagService's implementation of GetOfflineLinkService. Regardless, I still prefer this hack over the hack in that PR. They're both hacks to get this moving forward but at least this one doesn't allow us to accidentally start depending on other methods exposed by MerkleDAGService.

a more functional filestore implementation that can also store protobuf leaves

Kind of off topic, but I don't really see why one might want to do this. Isn't the that purpose of the filestore to deduplicate between the repo and the user's filesystem?

@kevina
Copy link
Contributor

kevina commented Jan 31, 2018

Kind of off topic, but I don't really see why one might want to do this. Isn't the that purpose of the filestore to deduplicate between the repo and the user's filesystem?

This is off topic so I don't want to spent too much time on it but consider that a leaf that is not raw contains more than just data from the backing file as it is protocol buffer. Or put in simplified terms (and not 100% accurate) non-raw leafs contain a header to indicate what they are then the data itself. The header is stored in the database as well as how to retrieve the data component from a file. When a request is made for the node the data is retrieved from the file and then recombined with header in the database to reconstruct the original protocol buffer node. With Raw nodes there is no header so all that needs to be done is retrieve the data from the file.

@Stebalien
Copy link
Member Author

I see. That's really cool!

@kevina
Copy link
Contributor

kevina commented Feb 1, 2018

Thanks. And since the header is stored in the database it is possible for the filestore to know that it is a protobuf leaf node without having to touch the disk, but with the current abstraction before LinkService there was no way for it to communicate this to the GC. That is reason I created the LinkService.

Getting back on topic, note that I am not against adding an offline hint to context, I just wish there was a better way, but so far I have not been able to think of one.

@magik6k
Copy link
Member

magik6k commented May 14, 2019

With #5654 this is probably not going to be useful. Feel free to reopen if you disagree.

@magik6k magik6k closed this as completed May 14, 2019
@Stebalien
Copy link
Member Author

I've still wanted it a few times internally but it's likely more trouble than it's worth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants