Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In datastote.NewKey calling path.Clean is dangerous #2601

Closed
kevina opened this issue Apr 25, 2016 · 15 comments
Closed

In datastote.NewKey calling path.Clean is dangerous #2601

kevina opened this issue Apr 25, 2016 · 15 comments
Labels
backwards incompatible: repo kind/bug A bug in existing code (including security flaws) status/in-progress In progress

Comments

@kevina
Copy link
Contributor

kevina commented Apr 25, 2016

[I am not sure the best place to post this as this issue is related to both go-ipfs and go-datastore, I can move this to this to go-datastore if necessary.]

I may be completely missing something here but it seams datastore keys are represented via binary []byte strings, possible prefixed with a "/" and are converted to a datastore.Key by calling datastore.NewKey().

However, datastore.NewKey() also calls path.Clean(). This is a dangerous thing to do with binary strings. For example the binary []byte representation of the hash might just happen to contain a "//" which path.Clean will convert to a "/". In fact it seams that all of datastore.Key is written with the assumption the keys are text and not binary blobs.

Here is the relevant code in go-datastore/key.go:

// NewKey constructs a key from string. it will clean the value.
func NewKey(s string) Key {
    k := Key{s}
    k.Clean()
    return k
}

// Clean up a Key, using path.Clean.
func (k *Key) Clean() {
    k.string = path.Clean("/" + k.string)
}

The probability of a []byte string for a multihash containing something that path.Clean() will remove or change is low, which is why I image this issue has not come up yet.

I am not sure the best way to fix this. Maybe it might be better to keep the mutihash in its b58 encoding to mesh better with how datastore.Key was (supposedly) designed. Or as a quick fix, maybe it would be better to remove the call to path.Clean in the datastore.Key.Clean() method.

@whyrusleeping
Copy link
Member

@kevina yeah, this is a known issue: #994

Its the cause of the following failure (and a few others i'm not finding right now).
https://github.com/ipfs/go-ipfs/blob/master/test/sharness/t0080-repo.sh#L112

I'd say we can keep this issue open as it will add more pressure towards us getting this fixed.

@whyrusleeping
Copy link
Member

I should also note that currently this is much less of an issue than you would think. Since the mangling of keys is consistent, we can always access the keys we like we expect. The only noticeable issue this causes is that when running a gc, any key affected by this bug won't be removed. (also you won't be able to see keys affected by this in an ipfs refs local)

@kevina
Copy link
Contributor Author

kevina commented Apr 26, 2016

There is also the small possibility that this bug will create a collision when there normally would not be one, although the probability will be low. For example lets's assume we have these two (8-byte) keys:

key1: 0x34 0x7b '/' '/' 0x7e 0xf3 '/' 0x23
key2: 0x34 0x7b '/' 0x7e 0xf3 '/' '/' 0x23

Both keys will get mangled to 0x34 0x7b '/' 0x7e 0xf3 '/' 0x23 and thus create a collusion even though the keys are different. I am not sure how IPFS will behave in this situation.

The probability of this happening with 32 byte keys is extremely low and I doubt I could create a collision. But the possible still exists and it is certainly higher than the probability of a SHA-256 collision.

@kevina
Copy link
Contributor Author

kevina commented Jun 21, 2016

@whyrusleeping would there be any performance degradation to just using the base58 encoding for datastore keys? This will mesh much better with how go-datastore was designed.

The flatfs datastore might need some special case code to decode the base58 encoding and re-encoding as a hex string.

This will of course introduce a repo change so I image that we should coordinate the change with a major release.

I am against just remove the call to path.Clean because mixing ASCII paths with binary keys is just a bad idea in my view. It could be made to work if we are very careful, but it not something I would be happy with at all.

@whyrusleeping
Copy link
Member

@kevina there is a huge performance degradation using base58 encoding for the datastore keys.
I think the solution should be to just disallow binary paths in the datastore code in the first place. Using something like hex or base64 (both of which would not incur a perf hit) to encoded keys.

The question is at what layer do we do the encoding? Do we force users to sanitize their inputs? or do we force something on all inputs?

I'm personally in favor of requiring the user to sanitize all inputs, otherwise, if we always encode inputs, we won't be able to easily see paths in the keys, /blocks/ABC/ becomes something weird like f7a6db728c9eff5d

@kevina
Copy link
Contributor Author

kevina commented Jun 21, 2016

I am in favor of using base64 encoding then as it will be shorter and can be used directly in the leveldb (and filestore) datastore. It will only be the flatfs datastore that really would need to concern itself with how the keys are encoded.

I am not 100% sure I am following you but I would think that datastore.NewKey() should fail if it sees binary data (anything in the range 0x00 to 0x31). Is this what you mean by force users to sanitize their inputs?

@whyrusleeping
Copy link
Member

@kevina Yeah, that would be one way to force users to sanitize input. Although that has the annoying side effect of making NewKey return an error (requiring a lot of code churn). We could also enforce this in places where it will hurt us (like flatfs) and document it clearly elsewhere

@kevina
Copy link
Contributor Author

kevina commented Jun 21, 2016

@whyrusleeping Binary keys are used all over the place, not just in the /blocks prefix. It is leading to clear bugs in flatfs, but it could lead to problems elsewhere. Here is some of the prefixed used with binary data:

/ipns/<binary>
/local/pins/indirect/keys/<binary>
/pk/<binary>

As a temporary measure we could panic instead to catch all possible binary encoding in addition to having flatfs failing when it sees a key not in base64 encoding.

@whyrusleeping
Copy link
Member

@kevina all of those are places where we are passing a hash value into the key name. Those should be pretty easy to find and eliminate. I think that is a fairly easy step forward, but it will require a repo version bump and another migration.

@kevina
Copy link
Contributor Author

kevina commented Jun 21, 2016

@whyrusleeping I agree. Its the migration step that would be the biggest blocker.

Do you think base64 is better than hex? The only place hex would be better would be that the flatfs datastore won't need to reencode, but it would still need to check the keys are in hex.

For the filestore I would prefer base64. We can use the encoding/base64 package.

I can likely help with this once we decide on what to do.

@whyrusleeping
Copy link
Member

Unless we encounter significant opposition, heres the plan moving forward:

  • We keep the path.Clean in datastore.NewKey, path based keys are nice to work with and reason about, and also map nicely to bucket oriented backends.
  • We require users to use valid 'path' characters in the strings they use as keys. This can be enforced in one of two ways:
    • in NewKey. I don't like this as much, but it might be cleaner overall. This will require changing all current invocations to also return an error.
    • In datastore implementations that care about this via some sort of key.Validate() method. This is the lower effort and quickest integration route (quick and dirty?)
  • Change all places in go-ipfs where we create binary keys to first base64 encode the keys before passing into NewKey
  • write a migration for the new on disk pathing format. This will be a little difficult because due to the path.Clean bug, we cant directly just 'move' values.
    • for /blocks/ We will have to read each block off disk, check its hash, and then write it to the new datastore.
    • the /pk/ namespace can be done similarly.
    • For the /ipns/ namespace, it gets a little trickier. I don't remember if we have an easy way to check the correct key from the stored record. We may have to just toss values that are affected by the bug (this is okay, they expire every so often anyways)
    • for pins, its going to get interesting. We may have to go through every bug affected key in the datastore and build a mapping between the actual key and its base58 mangled counterpart and use that to correctly transfer pins.
  • The new migration will need to be built and pushed up to dist.ipfs.io

Very nice to have:

  • set up ipfs to automatically download and run migrations for the user. Its inconvenient to have to require our users to go download some binary from our webpage and run it on their own.
  • If 'fully automatic' isnt the way to go, we should at the very least have a yes/no prompt with detailed documentation on whats going on.

@whyrusleeping
Copy link
Member

cc @kevina @Kubuxu @jbenet

@kevina
Copy link
Contributor Author

kevina commented Jun 21, 2016

Note: As an implementation detail we should use base64.RawURLEncoding for the base64 encoding. (1) The Standard encoding (see https://tools.ietf.org/html/rfc4648#section-4) includes '+' and '/' in the alphabet, the URLEncoding (often called the "base64url" encoding, see https://tools.ietf.org/html/rfc4648#section-5) uses '-' and '_' instead. (2) The Raw form does not include the unnecessary '=' padding character at the end. This is implied as optional in section 5 of the RFC, also goggling "base64url" will show that the padding is not used.

@whyrusleeping
Copy link
Member

@kevina good point. I just noticed that i used the standard encoding in my other PR just now.

@whyrusleeping whyrusleeping added this to the Ipfs 0.4.3 milestone Jun 21, 2016
@Kubuxu Kubuxu added kind/bug A bug in existing code (including security flaws) backwards incompatible: repo labels Jun 21, 2016
@whyrusleeping
Copy link
Member

whyrusleeping commented Jun 22, 2016

wow. The key encoding change was literally a one-liner.

EDIT: okay, maybe there were two lines we needed. But still, way simpler than i expected

Kubuxu added a commit that referenced this issue Jun 22, 2016
It causes AllKeysChan not to return all keys, which breaks bloom filter.

License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protonmail.ch>
Kubuxu added a commit that referenced this issue Jun 23, 2016
It causes AllKeysChan not to return all keys, which breaks bloom filter.

License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protonmail.ch>
@Kubuxu Kubuxu added the status/in-progress In progress label Jun 24, 2016
@kevina kevina mentioned this issue Jun 30, 2016
2 tasks
ariescodescream pushed a commit to ariescodescream/go-ipfs that referenced this issue Oct 23, 2021
Fixes ipfs#2601

Also bump version to 0.4.3-dev

License: MIT
Signed-off-by: Jeromy <why@ipfs.io>
laurentsenta pushed a commit to laurentsenta/kubo that referenced this issue Feb 25, 2022
Fixes ipfs#2601

Also bump version to 0.4.3-dev

License: MIT
Signed-off-by: Jeromy <why@ipfs.io>
laurentsenta pushed a commit to laurentsenta/kubo that referenced this issue Feb 25, 2022
Fixes ipfs#2601

Also bump version to 0.4.3-dev

License: MIT
Signed-off-by: Jeromy <why@ipfs.io>
ariescodescream pushed a commit to ariescodescream/go-ipfs that referenced this issue Apr 7, 2022
Fixes ipfs#2601

Also bump version to 0.4.3-dev

License: MIT
Signed-off-by: Jeromy <why@ipfs.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards incompatible: repo kind/bug A bug in existing code (including security flaws) status/in-progress In progress
Projects
None yet
Development

No branches or pull requests

3 participants