In datastote.NewKey calling path.Clean is dangerous #2601

kevina · 2016-04-25T00:48:23Z

[I am not sure the best place to post this as this issue is related to both go-ipfs and go-datastore, I can move this to this to go-datastore if necessary.]

I may be completely missing something here but it seams datastore keys are represented via binary []byte strings, possible prefixed with a "/" and are converted to a datastore.Key by calling datastore.NewKey().

However, datastore.NewKey() also calls path.Clean(). This is a dangerous thing to do with binary strings. For example the binary []byte representation of the hash might just happen to contain a "//" which path.Clean will convert to a "/". In fact it seams that all of datastore.Key is written with the assumption the keys are text and not binary blobs.

Here is the relevant code in go-datastore/key.go:

// NewKey constructs a key from string. it will clean the value.
func NewKey(s string) Key {
    k := Key{s}
    k.Clean()
    return k
}

// Clean up a Key, using path.Clean.
func (k *Key) Clean() {
    k.string = path.Clean("/" + k.string)
}

The probability of a []byte string for a multihash containing something that path.Clean() will remove or change is low, which is why I image this issue has not come up yet.

I am not sure the best way to fix this. Maybe it might be better to keep the mutihash in its b58 encoding to mesh better with how datastore.Key was (supposedly) designed. Or as a quick fix, maybe it would be better to remove the call to path.Clean in the datastore.Key.Clean() method.

The text was updated successfully, but these errors were encountered:

whyrusleeping · 2016-04-25T16:24:54Z

@kevina yeah, this is a known issue: #994

Its the cause of the following failure (and a few others i'm not finding right now).
https://github.com/ipfs/go-ipfs/blob/master/test/sharness/t0080-repo.sh#L112

I'd say we can keep this issue open as it will add more pressure towards us getting this fixed.

whyrusleeping · 2016-04-25T17:02:06Z

I should also note that currently this is much less of an issue than you would think. Since the mangling of keys is consistent, we can always access the keys we like we expect. The only noticeable issue this causes is that when running a gc, any key affected by this bug won't be removed. (also you won't be able to see keys affected by this in an ipfs refs local)

kevina · 2016-04-26T22:13:01Z

There is also the small possibility that this bug will create a collision when there normally would not be one, although the probability will be low. For example lets's assume we have these two (8-byte) keys:

key1: 0x34 0x7b '/' '/' 0x7e 0xf3 '/' 0x23
key2: 0x34 0x7b '/' 0x7e 0xf3 '/' '/' 0x23

Both keys will get mangled to 0x34 0x7b '/' 0x7e 0xf3 '/' 0x23 and thus create a collusion even though the keys are different. I am not sure how IPFS will behave in this situation.

The probability of this happening with 32 byte keys is extremely low and I doubt I could create a collision. But the possible still exists and it is certainly higher than the probability of a SHA-256 collision.

kevina · 2016-06-21T02:04:57Z

@whyrusleeping would there be any performance degradation to just using the base58 encoding for datastore keys? This will mesh much better with how go-datastore was designed.

The flatfs datastore might need some special case code to decode the base58 encoding and re-encoding as a hex string.

This will of course introduce a repo change so I image that we should coordinate the change with a major release.

I am against just remove the call to path.Clean because mixing ASCII paths with binary keys is just a bad idea in my view. It could be made to work if we are very careful, but it not something I would be happy with at all.

whyrusleeping · 2016-06-21T02:16:56Z

@kevina there is a huge performance degradation using base58 encoding for the datastore keys.
I think the solution should be to just disallow binary paths in the datastore code in the first place. Using something like hex or base64 (both of which would not incur a perf hit) to encoded keys.

The question is at what layer do we do the encoding? Do we force users to sanitize their inputs? or do we force something on all inputs?

I'm personally in favor of requiring the user to sanitize all inputs, otherwise, if we always encode inputs, we won't be able to easily see paths in the keys, /blocks/ABC/ becomes something weird like f7a6db728c9eff5d

kevina · 2016-06-21T02:28:33Z

I am in favor of using base64 encoding then as it will be shorter and can be used directly in the leveldb (and filestore) datastore. It will only be the flatfs datastore that really would need to concern itself with how the keys are encoded.

I am not 100% sure I am following you but I would think that datastore.NewKey() should fail if it sees binary data (anything in the range 0x00 to 0x31). Is this what you mean by force users to sanitize their inputs?

whyrusleeping · 2016-06-21T02:32:54Z

@kevina Yeah, that would be one way to force users to sanitize input. Although that has the annoying side effect of making NewKey return an error (requiring a lot of code churn). We could also enforce this in places where it will hurt us (like flatfs) and document it clearly elsewhere

kevina · 2016-06-21T02:41:05Z

@whyrusleeping Binary keys are used all over the place, not just in the /blocks prefix. It is leading to clear bugs in flatfs, but it could lead to problems elsewhere. Here is some of the prefixed used with binary data:

/ipns/<binary>
/local/pins/indirect/keys/<binary>
/pk/<binary>

As a temporary measure we could panic instead to catch all possible binary encoding in addition to having flatfs failing when it sees a key not in base64 encoding.

whyrusleeping · 2016-06-21T03:34:58Z

@kevina all of those are places where we are passing a hash value into the key name. Those should be pretty easy to find and eliminate. I think that is a fairly easy step forward, but it will require a repo version bump and another migration.

kevina · 2016-06-21T03:45:17Z

@whyrusleeping I agree. Its the migration step that would be the biggest blocker.

Do you think base64 is better than hex? The only place hex would be better would be that the flatfs datastore won't need to reencode, but it would still need to check the keys are in hex.

For the filestore I would prefer base64. We can use the encoding/base64 package.

I can likely help with this once we decide on what to do.

whyrusleeping · 2016-06-21T04:56:56Z

Unless we encounter significant opposition, heres the plan moving forward:

We keep the path.Clean in datastore.NewKey, path based keys are nice to work with and reason about, and also map nicely to bucket oriented backends.
We require users to use valid 'path' characters in the strings they use as keys. This can be enforced in one of two ways:
- in NewKey. I don't like this as much, but it might be cleaner overall. This will require changing all current invocations to also return an error.
- In datastore implementations that care about this via some sort of key.Validate() method. This is the lower effort and quickest integration route (quick and dirty?)
Change all places in go-ipfs where we create binary keys to first base64 encode the keys before passing into NewKey
write a migration for the new on disk pathing format. This will be a little difficult because due to the path.Clean bug, we cant directly just 'move' values.
- for /blocks/ We will have to read each block off disk, check its hash, and then write it to the new datastore.
- the /pk/ namespace can be done similarly.
- For the /ipns/ namespace, it gets a little trickier. I don't remember if we have an easy way to check the correct key from the stored record. We may have to just toss values that are affected by the bug (this is okay, they expire every so often anyways)
- for pins, its going to get interesting. We may have to go through every bug affected key in the datastore and build a mapping between the actual key and its base58 mangled counterpart and use that to correctly transfer pins.
The new migration will need to be built and pushed up to dist.ipfs.io

Very nice to have:

set up ipfs to automatically download and run migrations for the user. Its inconvenient to have to require our users to go download some binary from our webpage and run it on their own.
If 'fully automatic' isnt the way to go, we should at the very least have a yes/no prompt with detailed documentation on whats going on.

whyrusleeping · 2016-06-21T04:58:24Z

cc @kevina @Kubuxu @jbenet

kevina · 2016-06-21T06:26:17Z

Note: As an implementation detail we should use base64.RawURLEncoding for the base64 encoding. (1) The Standard encoding (see https://tools.ietf.org/html/rfc4648#section-4) includes '+' and '/' in the alphabet, the URLEncoding (often called the "base64url" encoding, see https://tools.ietf.org/html/rfc4648#section-5) uses '-' and '_' instead. (2) The Raw form does not include the unnecessary '=' padding character at the end. This is implied as optional in section 5 of the RFC, also goggling "base64url" will show that the padding is not used.

whyrusleeping · 2016-06-21T06:29:31Z

@kevina good point. I just noticed that i used the standard encoding in my other PR just now.

whyrusleeping · 2016-06-22T20:03:37Z

wow. The key encoding change was literally a one-liner.

EDIT: okay, maybe there were two lines we needed. But still, way simpler than i expected

It causes AllKeysChan not to return all keys, which breaks bloom filter. License: MIT Signed-off-by: Jakub Sztandera <kubuxu@protonmail.ch>

Fixes ipfs#2601 Also bump version to 0.4.3-dev License: MIT Signed-off-by: Jeromy <why@ipfs.io>

kevina mentioned this issue May 1, 2016

[WIP] Towards Issue #875, avoid duplicating files added to ipfs #2600

Closed

whyrusleeping mentioned this issue Jun 21, 2016

Write providers to disk to avoid memory leaks #2860

Merged

whyrusleeping added this to the Ipfs 0.4.3 milestone Jun 21, 2016

Kubuxu added kind/bug A bug in existing code (including security flaws) backwards incompatible: repo labels Jun 21, 2016

Kubuxu added a commit that referenced this issue Jun 22, 2016

Disable Bloom Filter as for #2601

17876d9

It causes AllKeysChan not to return all keys, which breaks bloom filter. License: MIT Signed-off-by: Jakub Sztandera <kubuxu@protonmail.ch>

whyrusleeping mentioned this issue Jun 23, 2016

Add ARC caching and bloom filter for blockstorage #2885

Merged

Kubuxu added a commit that referenced this issue Jun 23, 2016

Disable Bloom Filter as for #2601

e3d9380

It causes AllKeysChan not to return all keys, which breaks bloom filter. License: MIT Signed-off-by: Jakub Sztandera <kubuxu@protonmail.ch>

Kubuxu added the status/in-progress In progress label Jun 24, 2016

whyrusleeping mentioned this issue Jun 24, 2016

encode keys to datastore with base32 standard encoding #2903

Merged

4 tasks

kevina mentioned this issue Jun 30, 2016

pinning is broken #1051

Closed

2 tasks

whyrusleeping closed this as completed in 0782c4d Jul 1, 2016

jbenet mentioned this issue Aug 28, 2016

Code Review 0.4.2..0.4.3-rc3 #3138

Closed

58 tasks

kevina mentioned this issue Aug 31, 2017

Provide Way of Representing CidV0 objects in an alternative base ipfs/go-cid#34

Closed

kevina mentioned this issue May 8, 2018

User-friendly keys ipfs/go-ds-flatfs#40

Closed

ariescodescream pushed a commit to ariescodescream/go-ipfs that referenced this issue Oct 23, 2021

encode keys to datastore with base32 standard encoding

7c91e7b

Fixes ipfs#2601 Also bump version to 0.4.3-dev License: MIT Signed-off-by: Jeromy <why@ipfs.io>

laurentsenta pushed a commit to laurentsenta/kubo that referenced this issue Feb 25, 2022

encode keys to datastore with base32 standard encoding

c2b2970

Fixes ipfs#2601 Also bump version to 0.4.3-dev License: MIT Signed-off-by: Jeromy <why@ipfs.io>

laurentsenta pushed a commit to laurentsenta/kubo that referenced this issue Feb 25, 2022

go-ipfs-config: encode keys to datastore with base32 standard encoding

d9791e8

Fixes ipfs#2601 Also bump version to 0.4.3-dev License: MIT Signed-off-by: Jeromy <why@ipfs.io>

ariescodescream pushed a commit to ariescodescream/go-ipfs that referenced this issue Apr 7, 2022

encode keys to datastore with base32 standard encoding

0c87745

Fixes ipfs#2601 Also bump version to 0.4.3-dev License: MIT Signed-off-by: Jeromy <why@ipfs.io>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

In datastote.NewKey calling path.Clean is dangerous #2601

In datastote.NewKey calling path.Clean is dangerous #2601

kevina commented Apr 25, 2016 •

edited

Loading

whyrusleeping commented Apr 25, 2016

whyrusleeping commented Apr 25, 2016

kevina commented Apr 26, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016 •

edited

Loading

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

whyrusleeping commented Jun 22, 2016 •

edited

Loading

In datastote.NewKey calling path.Clean is dangerous #2601

In datastote.NewKey calling path.Clean is dangerous #2601

Comments

kevina commented Apr 25, 2016 • edited Loading

whyrusleeping commented Apr 25, 2016

whyrusleeping commented Apr 25, 2016

kevina commented Apr 26, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016 • edited Loading

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

kevina commented Jun 21, 2016

whyrusleeping commented Jun 21, 2016

whyrusleeping commented Jun 22, 2016 • edited Loading

kevina commented Apr 25, 2016 •

edited

Loading

kevina commented Jun 21, 2016 •

edited

Loading

whyrusleeping commented Jun 22, 2016 •

edited

Loading