Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for identity hashes. #4910

Closed
wants to merge 5 commits into from
Closed

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Apr 3, 2018

Still a work in progress.

Tested by hand.

Needs shareness tests.

Closes #4697

I am breaking this p.r. into two parts. For part one see #5117, part two is pending.

@kevina kevina requested a review from Kubuxu as a code owner April 3, 2018 08:19
@ghost ghost assigned kevina Apr 3, 2018
@ghost ghost added the status/in-progress In progress label Apr 3, 2018
@@ -189,14 +190,16 @@ func setupNode(ctx context.Context, n *IpfsNode, cfg *BuildCfg) error {
opts.HasBloomFilterSize = 0
}

cbs, err := bstore.CachedBlockstore(ctx, bs, opts)
wbs, err := bstore.CachedBlockstore(ctx, bs, opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does w in wbs stand for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"wrapped"

}

func (b *idstore) PutMany(bs []blocks.Block) error {
var good []blocks.Block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd preallocate this to be the size of len(bs) as in most cases most blocks will be in this array. I'd also call it toPut or something like that, good sounds weird to me in this context

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
// idstore wraps a BlockStore to add support for identity hashes
type idstore struct {
bs bls.Blockstore
}

func IdStore(bs bls.Blockstore) bls.Blockstore {
func IdStore(bs bls.Blockstore) *idstore {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for exporting the implementation structure instead of the interface? It is generally better to export the interface as then it is clear what contract it follows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Changed back.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
@kevina
Copy link
Contributor Author

kevina commented Apr 5, 2018

Note: ipfs add --hash=id is problematic due the implicit creation of directory entry:

$ ipfs add --hash=id 64bytes
zB7EoQcn4hmgrzo7SKs4Dn9JxnqRnzMqopdjakoog5EQPgziUnDKQrKCFEkytjoN8TgZnXGt8ndvMwkmzQEXJxq8bgDkQ
Error: identity hash of 87 bytes longer than maximum size of 64 bytes

The error is for the wrapping directory entry, not the hash for the 64byte file. This works:

$ ipfs get zB7EoQcn4hmgrzo7SKs4Dn9JxnqRnzMqopdjakoog5EQPgziUnDKQrKCFEkytjoN8TgZnXGt8ndvMwkmzQEXJxq8bgDkQ > out

In particular for the ability to use the identity hash when the data
is less than a certain threshold.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
@kevina
Copy link
Contributor Author

kevina commented Apr 13, 2018

@Stebalien @whyrusleeping since id-hashes are primary useful as an optimization to inline tiny files I want ahead and added an option to add. This change touched a lot of files since it involved changing passing around the cid Prefix with a new CidOpts that included an option to switch to use the id-hash if the contents was less than a certain size. Then instead of using Prefix.sum() CidOpts.Sum() was used.

The add option now has a --id-hash-limit option to control when the id hash is used, if you like it I can add it to ipfs files command also.

If you are okay with the Prefix to CidOpts part I would like to get that in ASAP to avoid conflicts, or worse, before some of that code gets factored out into a separate package.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through this. The CidOpts struct really rubs me the wrong way but I can't think of a better way to thread this through while keeping it configurable. Maybe push it down lower (into go-block-format)?

Does this only apply to raw blocks? Ideally, it would apply to all IPLD objects.

@@ -14,6 +14,7 @@ import (
dag "github.com/ipfs/go-ipfs/merkledag"
dagtest "github.com/ipfs/go-ipfs/merkledag/test"
mfs "github.com/ipfs/go-ipfs/mfs"
cide "github.com/ipfs/go-ipfs/thirdparty/cidextra"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is temporary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More or less, I would like to get something working first, merge this p.r. and then worry about finding a better place for the two new packages in thirdparty.

@@ -119,6 +121,7 @@ You can now check what blocks have been created by:
cmdkit.BoolOption(fstoreCacheOptionName, "Check the filestore for pre-existing blocks. (experimental)"),
cmdkit.IntOption(cidVersionOptionName, "CID version. Defaults to 0 unless an option that depends on CIDv1 is passed. (experimental)"),
cmdkit.StringOption(hashOptionName, "Hash function to use. Implies CIDv1 if not sha2-256. (experimental)").WithDefault("sha2-256"),
cmdkit.IntOption(idHashLimitOptionName, "Id hash maxium size. -1 disables. (experimental)").WithDefault(-1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I wouldn't make this this configurable. I'd have a fixed option and set the size to 64 bytes. I doubt there are any valid reasons for picking a non-standard value here (other than, maybe choosing between 32 bytes and 64 bytes). I would not allow more than 64 bytes as CIDs are supposed to be small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will add an "--inline" option and keep this option, but set the default to 64.

Note there is an internal check that prevents this value from being set to the maximum id-hash size which is currently 64.


type Opts struct {
cid.Prefix
idHashThres int // 0 disables, otherwise 1 + limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the +1? Zero initialize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to allow for zero initialization. As the type is exported there is nothing preventing the user from direct initialization.

return &idstore{bs}
}

func extractContents(k *cid.Cid) (bool, []byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably want to move this (and maybe a few other things) to, e.g., go-block. We'll want this stuff usable outside of go-ipfs.

Copy link
Contributor Author

@kevina kevina Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my earlier comment, I want to first worry about getting something working and then we can move the my two new thirdparty packages to a better location.

@kevina
Copy link
Contributor Author

kevina commented Apr 16, 2018

Does this only apply to raw blocks? Ideally, it would apply to all IPLD objects.

It also applies to protobuf objects.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
@Kubuxu
Copy link
Member

Kubuxu commented Apr 17, 2018

We are cutting down thirdparty packages. The CID security was a huge exception from that because we needed to have a possible security hole fixed before the release and libp2p dependency tree was causing problems.

I need to object introduction of code to thirdparty.

@kevina
Copy link
Contributor Author

kevina commented Apr 17, 2018

@Stebalien @Kubuxu the intent of both thirdparty packages where for them to be short lived. Both of them I intent to merge in another package outside of go-ipfs, but I thought it would be easier to both code and review if all the relevent changes are in one p.r. rather than spread out over several. If you will not accept or even review this p.r. since it introduces thirdparty packages please let me know and I will first get those changes into an appropriate external package.

@mib-kd743naq
Copy link
Contributor

@kevina in the meantime should it be possible for me to checkout f0a4f29 and build it locally to verify whether the web gateway will read my 3rd-party blocks with identity data mixed in?

Basically asking whether I should expect that specific commit to just work as far as reading identity-containing blocks is concerned?

@kevina
Copy link
Contributor Author

kevina commented Apr 17, 2018

@mib-kd743naq yes it should be. Before this commit id-hashes where actually supported, they just where not optimized.

@kevina
Copy link
Contributor Author

kevina commented Apr 17, 2018

@Kubuxu @whyrusleeping @Stebalien since id hashes are possible to use without this p.r. there is the very real possibility that blockes with an id-hash could be stored in the datastore and never removed. I made the Delete always pass though the request so the g.c. should in theory remove them if they are not pinned, but this needs testing. For simplicity I propose we add a config option DisableIdHashOpt that is intended to only be used for testing, I can then use this config option to write a sharness test that inject an id hash and then enable the option and see if the GC can remove it. The alternative is to write a unit test that somehow bypasses the id-store and then calls the GC to see if it is somehow removed, this will likely be more work and I imagine that the config option would be useful in other contexts.

@magik6k
Copy link
Member

magik6k commented Apr 18, 2018

Why not just add that option on the idstore struct and write a small go test for it?

@kevina
Copy link
Contributor Author

kevina commented Apr 25, 2018

@magik6k I could do that, but I think I a better way is to just remove all id hashes from the datastore as part of a migration, see #4766.

@mib-kd743naq
Copy link
Contributor

@kevina in the meantime should it be possible for me to checkout f0a4f29 and build it locally to verify whether the web gateway will read my 3rd-party blocks with identity data mixed in?

@mib-kd743naq yes it should be. Before this commit id-hashes where actually supported, they just where not optimized.

Great stuff! You are indeed correct that identity hashes already work. E.g. this is a single block:
https://ipfs.io/ipfs/zdjA8o9D3n7NhdXGP5QZRftEyATq8pJamtFVQhQN4VHriUgR8

There is a bit of work to be done around the CLI abort however. Observe:

~/inlines$ ipfs ls zdjA8e3nEZNdas7TMDHEJuQB9r1RDQt9NstfH8AtYmetNzQCd
zdjA8dEswZqPMYnazeqZfj679FAH5dSti2R4zMq54AxHXT4k7 122 IllegalOverlargeInline/
zdjA8o9D3n7NhdXGP5QZRftEyATq8pJamtFVQhQN4VHriUgR8 32  LegitInlines/
~/inlines$ ipfs get zdjA8e3nEZNdas7TMDHEJuQB9r1RDQt9NstfH8AtYmetNzQCd/LegitInlines; echo "ExitCode: $?"
Saving file(s) to LegitInlines
 122 B / 122 B [=========================================================================================================================================================================================] 100.00% 0s
ExitCode: 0
~/inlines$ ipfs get zdjA8e3nEZNdas7TMDHEJuQB9r1RDQt9NstfH8AtYmetNzQCd/IllegalOverlargeInline; echo "ExitCode: $?"
Saving file(s) to IllegalOverlargeInline
 279 B / 279 B [=========================================================================================================================================================================================] 100.00% 0s
ExitCode: 0
~/inlines$ ls -la
total 0
drwxr-xr-x  3 ipfs ipfs  26 Apr 25 12:04 .
drwxr-xr-x 18 ipfs ipfs 435 Apr 25 12:03 ..
drwxr-xr-x  2 ipfs ipfs  50 Apr 25 12:04 LegitInlines

The ipfs get did not exit with non-zero, yet it ( correctly ) did not create a local copy due to an invalid 122byte inline.

Thoughts?

@kevina
Copy link
Contributor Author

kevina commented Jun 1, 2018

@Stebalien @Kubuxu

We'll probably want to move this (and maybe a few other things) to, e.g., go-block. We'll want this stuff usable outside of go-ipfs.

go-blocks doesn't exist. I will likely combine the two thirdparty into a single package go-ipfs-idstore.

@Stebalien
Copy link
Member

@kevina sorry, I meant go-block-format. However, a separate package is fine.

@kevina
Copy link
Contributor Author

kevina commented Jun 7, 2018

@Stebalien I voted against a separate package and instead added the idstore to go-ipfs-blockstore (see ipfs/go-ipfs-blockstore#4). The parts needed for supported id hashes automatically I think best belong in go-cid although I can be convinced otherwise. I am not sure either part belong in go-block-format.

@Stebalien
Copy link
Member

Fair enough (given that I've reviewed that, I should have remembered this...).

The parts needed for supported id hashes automatically I think best belong in go-cid although I can be convinced otherwise.

A ExtractInlineBlock(cid) Block function would have to go in go-block-format but a simple ExtractInlineData(cid) []byte could go in go-cid. Up to you.

@kevina
Copy link
Contributor Author

kevina commented Jun 7, 2018

I didn't think about creating those functions because they are not really needed (but can be useful). What I think should go in go-cid is a renamed Opts (see https://github.com/ipfs/go-ipfs/blob/f0a4f297fc095b14aef58ded32864e7fe89f3974/thirdparty/cidextra/cidextra.go) to automatically create inline identity-hashes when below the threshold. This relates to by comments in ipfs/go-cid#23 (comment). I am now think that maybe this should be called Format.

@Stebalien
Copy link
Member

Ah. Yeah, good point. And making this a part of Format or whatever we end up calling that removes my one reservation with this patch.

@kevina
Copy link
Contributor Author

kevina commented Jun 14, 2018

Closing in favor of #5117.

@kevina kevina closed this Jun 14, 2018
@ghost ghost removed the status/in-progress In progress label Jun 14, 2018
@Stebalien Stebalien deleted the kevina/idhash branch February 28, 2019 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants