Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove hex encoding from flatfs #39

Merged
merged 3 commits into from
Jun 27, 2016
Merged

remove hex encoding from flatfs #39

merged 3 commits into from
Jun 27, 2016

Conversation

whyrusleeping
Copy link
Member

with the changes being made to go-ipfs to base64 encode all keys (previously they were binary). It is safe to assume that inputs to flatfs are path safe.

This PR isnt strictly necessary, but it avoids an extra layer of encoding

cc @Kubuxu @kevina

@kevina
Copy link
Contributor

kevina commented Jun 23, 2016

@whyrusleeping base64url encoding will unfortunately not work on case insensitive filesystems. You could just hex encode the raw key but that will lead to an overly long file name. This is the one place in the datastore that I believe should be special cased. It should decode the base64url key and reencode it to hex. The end result should be that the filenames will remain unchanged unless the block was effected by the path.Clean bug.

@Kubuxu
Copy link
Member

Kubuxu commented Jun 23, 2016

Then we can use base32 encoding in go-ipfs to make sure that paths are safe for case insensitive file systems or do hex in go-ipfs.

base32 saves 20% over hex (hex gives increase of 200% and base32 gives 160%).

@kevina
Copy link
Contributor

kevina commented Jun 23, 2016

I would prefer we special case it here, rather than increasing the key size everywhere with base32. (base64 only increases the size by 133%) This could have an impact if we use lots of keys in the datastore such as with ipfs/kubo#2860. The datastore key size could also impact the size of the filestore database.

The flatfs filesystem is already special cased in a way to expect block keys, so I don't see it as a major problem if we special case it to expect base64url encoded block store keys. This will also have a side effect of enforcing that the keys are are in the correct format.

@Kubuxu
Copy link
Member

Kubuxu commented Jun 23, 2016

Yeah, but we could encode them in base32 in go-ipfs and save them straight to disk in that form. No re-encoding.

If we do base64 to hex on disk we have 160% * 200% = 320% increase, if we do base64 decode re-encode to base32 or hex we are encoding two times.

I can't see how little bit of RAM usage, while saving the key (+20%, which gives 20 bytes more on stack on average) outweighs fact that we would have to do encode, decode, and encode again.

@kevina
Copy link
Contributor

kevina commented Jun 23, 2016

@whyrusleeping what do you think? Your pull ipfs/kubo#2860 request will be most likely effected by the key size increase. I have stated my preference and so has @Kubuxu.

The wiki article (https://en.wikipedia.org/wiki/Base32) nicely sums up the advantages and disadvantages over Base32 vs Base64 or hex.

@whyrusleeping
Copy link
Member Author

@Kubuxu @kevina. Do you both agree to the changes here? where flatfs will not apply any extra encoding to the passed in keys?

@whyrusleeping
Copy link
Member Author

This PR is blocking pretty much everything else at this point ( blocking the migrations code which is blocking the encoding fixing which is blocking a bunch of other PRs) So lets make a decision ASAP

@Kubuxu
Copy link
Member

Kubuxu commented Jun 26, 2016

I think it is good idea. We might want to have it check that path contains only valid characters if Debug mode is active.

@kevina
Copy link
Contributor

kevina commented Jun 27, 2016

I don't particular agree that base32 is the best way to go. But I don't strongly object either. I can understand the simplicity of avoiding the re-encoding in the flatfs datastore; however, you still have to manipulate the key to get ride of the leading '/'.

My preference would of been to avoid a massive renaming of flatfs files, but it is not that big of a deal.

I also don't particular like the inclusion of the pad character, but it seams the standard go library doesn't not provide an option to encode it without. Also I looked it up and '=' is a valid character in windows filename (but someone should double test this). The '=' might cause problems if unquoted in Unix.

For reference here are the key sizes of a sha256 multihash in various encoding:

base16 = 68 bytes
base32 = 55 + 1(pad) = 56
base64 = 46 + 2 (pad) = 48
raw = 34 bytes

@Kubuxu
Copy link
Member

Kubuxu commented Jun 27, 2016

@kevina I also don't like fact that key are bigger but we have to encode them to something. It can be base32 or Hex, in memory most keys are stored in binary, only on the datastore side they will be stored in base32.

In FAT* filesystems = is reserved. We should just remove the padding.

@daviddias
Copy link
Member

subscribing to this issue, so that I can migrate js-ipfs-repo to do the same.

@kevina
Copy link
Contributor

kevina commented Jun 27, 2016

@Kubuxu like I said, I don't strongly object to just using base32 (as oppose to base64url).

To get rid of the padding we will either need to write our own encode/decoder for base32 or remove and add back the '=' manually. If we don't add the '=' back we will get a "error: illegal base32 data at input byte 32".

@kevina
Copy link
Contributor

kevina commented Jun 27, 2016

@whyrusleeping if we decide we want to remove the '=' in the base32 encoding, I can handle that later today. I already spent some time thinking about what needs to be done.

@whyrusleeping
Copy link
Member Author

The failing tests are because some of the datastore impls dont have everything vendored.

@Kubuxu
Copy link
Member

Kubuxu commented Jun 27, 2016

I didn't do vendoring for packages for testing so we would have to install packages from go for testing.

@whyrusleeping
Copy link
Member Author

the flatfs code is vendored properly

@Kubuxu
Copy link
Member

Kubuxu commented Jun 27, 2016

Tests pass locally.

@whyrusleeping
Copy link
Member Author

RFM?

@Kubuxu
Copy link
Member

Kubuxu commented Jun 27, 2016

RFM, I will look into fixing tests in separate PR.

@whyrusleeping whyrusleeping merged commit ba62308 into master Jun 27, 2016
@whyrusleeping whyrusleeping deleted the feat/unhex-flatfs branch June 27, 2016 19:33

func (fs *Datastore) encode(key datastore.Key) (dir, file string) {
safe := hex.EncodeToString(key.Bytes()[1:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping, note that the point of key.Bytes()[1:] this was to remove the '/' prefix'...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants