[WIP] Filestore Implementation #2634

kevina · 2016-05-06T05:36:05Z

Closes Issue #875, avoid duplicating files added to ipfs

NOT READY FOR MERGE

Rebased #2600 on master.

Quicklinks: Code, README

TODO to get this merged:

- Rebase on master, turning the chain of commits into a reasonable changeset
- Agree on major infrastructure changes
  - Multi-blockstore (Discussion: Support for multiple blockstores. #3119)
- Separate out non-filestore bits of infrastructure change into their own pull requests
  - Multi-blockstore: WIP: Add support for multiple blockstores #3257
  - DAGService.GetLinks: ~~Add DAGService.GetLinks() method and use it in the GC and elsewhere. #3255~~
  - Others: see milestone Filestore implementation
- Code review
- Merge

Note: The filestore is very basic right now, but it is functional. I will likely continue to improve it and submit new pull requests for the enhanced functionality, but right now I feel it is important that a basic implementation gets in so that it will get used. It can be labeled as an experimental feature and disabled by default, but available for those who want to use it. I consider the code production ready.

Resolves #875

whyrusleeping · 2016-05-06T16:16:37Z

blocks/blocks.go

@@ -19,48 +20,61 @@ type Block interface {
 	Loggable() map[string]interface{}
 }

-// Block is a singular block of data in ipfs
-type RawBlock struct {
+type BasicBlock struct {


aww, renaming this from RawBlock to BasicBlock makes the diff so noisy... I was hoping to avoid that by getting that change merged separately

Sorry, I missed that change as it was in a separate commit. Let me know if you want me to do anything about it, like submitting this change separately.

@whyrusleeping, this rename is only in one file. It is exported because I might eventually refactor my Filestore-specific changes to blocks/blocks.go into another file.

kevina · 2016-05-10T04:56:03Z

I imagine that this is going to take a while before it gets merged. So for now I think it will be best to maintain this as a separate fork, with the goal of eventually merging everything in.

@whyrusleeping I would still like to work with you on API issues and work to get major API changes merged first to avoid stagnation.

I have created a README for the new filestore, available here: https://github.com/ipfs-filestore/go-ipfs/blob/kevina/filestore/filestore/README.md. Some notes on my fork are available here: https://github.com/ipfs-filestore/go-ipfs/wiki.

@jefft0 if you are interested, I could use some serious testing now. Others are of course welcome to test it out.

whyrusleeping · 2016-05-10T22:09:42Z

commands/files/readerfile.go

 }

-func NewReaderFile(filename, path string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {
-	return &ReaderFile{filename, path, reader, stat}
+func NewReaderFile(filename, path, abspath string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {


This can probably be changed so that it only accepts the absolute path instead of a filename, path and absolute path.

(and a similar change can be made throughout this part of the codebase)

@whyrusleeping I added the absolute path because I was unsure how FullPath was used in the code base. Is it acceptable for FullPath to be an absolute path? I am unsure how FullPath and "add -r" interact.

To be honest, I'm not entirely sure either. That said, our tests in that area are quite thorough. If you change something, go into test/sharness, run make deps and then run ./t0040-add.sh, and maybe some of the other t004* tests too, to make sure nothing weird happened.

All right, I will start by trying to change FullPath to an absolute path and see what happens when used with "add -r". I will make sure the contents of the directory objects are correct.

It turns out that when adding directories the directory is part of the filename. For example, after adding some debug output, here is the result of one of the tests:

expecting success:
        mkdir mountdir/planets &&
        echo "Hello Mars!" >mountdir/planets/mars.txt &&
        echo "Hello Venus!" >mountdir/planets/venus.txt &&
        ipfs add -r mountdir/planets >actual

Filename: planets/mars.txt
Path: mountdir/planets/mars.txt
Filename: planets/venus.txt
Path: mountdir/planets/venus.txt
ok 42 - 'ipfs add -r' succeeds

Thus we still need to pass in both a filename and a path. It seems okay to just make the path absolute, so I will do that and remove the change that adds an absolute path to the various parts in the files package.

We could extract the filename from the passed-in path, though.

I tried that and it is not going to work. This test will fail. The filename needs to be "planets/mars.txt"; if you try to extract it from the path, it will be just "mars.txt". You will get this:

expecting success:
        mkdir mountdir/planets &&
        echo "Hello Mars!" >mountdir/planets/mars.txt &&
        echo "Hello Venus!" >mountdir/planets/venus.txt &&
        ipfs add -r mountdir/planets >actual

ok 42 - 'ipfs add -r' succeeds

expecting success:
        PLANETS="QmWSgS32xQEcXMeqd3YPJLrNBLSdsfYCep2U7CFkyrjXwY" &&
        MARS="QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN" &&
        VENUS="QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4" &&
        echo "added $MARS planets/mars.txt" >expected &&
        echo "added $VENUS planets/venus.txt" >>expected &&
        echo "added $PLANETS planets" >>expected &&
        test_cmp expected actual

> diff -u expected actual
--- expected 2016-05-11 03:20:12.742743658 +0000
+++ actual 2016-05-11 03:20:12.466738156 +0000
@@ -1,3 +1,2 @@
-added QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN planets/mars.txt
-added QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4 planets/venus.txt
-added QmWSgS32xQEcXMeqd3YPJLrNBLSdsfYCep2U7CFkyrjXwY planets
+added QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN mars.txt
+added QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4 venus.txt
not ok 43 - 'ipfs add -r' output looks good
#
#       PLANETS="QmWSgS32xQEcXMeqd3YPJLrNBLSdsfYCep2U7CFkyrjXwY" &&
#       MARS="QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN" &&
#       VENUS="QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4" &&
#       echo "added $MARS planets/mars.txt" >expected &&
#       echo "added $VENUS planets/venus.txt" >>expected &&
#       echo "added $PLANETS planets" >>expected &&
#       test_cmp expected actual
#

Here is a quick patch to try. It might not apply cleanly for you, though.

diff --git a/commands/files/readerfile.go b/commands/files/readerfile.go
index 2508fe1..d7f5443 100644
--- a/commands/files/readerfile.go
+++ b/commands/files/readerfile.go
@@ -4,6 +4,7 @@ import (
 	"errors"
 	"io"
 	"os"
+	gopath "path"
 )

 // ReaderFile is a implementation of File created from an `io.Reader`.
@@ -17,7 +18,11 @@ type ReaderFile struct {
 	baseInfo ExtraInfo
 }

-func NewReaderFile(filename, path string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {
+func NewReaderFile(_, path string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {
+	filename := path
+	if path != "" {
+		filename = gopath.Base(path)
+	}
 	return &ReaderFile{filename, path, reader, stat, 0, PosInfo{0, path}}
 }
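The difference between the two names can be reproduced outside the test suite. The following is a minimal standalone sketch, not code from this PR: `baseName` mirrors what the quick patch does, while `nameUnderRoot` is a hypothetical helper that keeps the added directory as part of the name. It assumes slash-separated paths, as in the test output above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// baseName mirrors the quick patch: keep only the last path element.
func baseName(p string) string {
	if p == "" {
		return p
	}
	return path.Base(p)
}

// nameUnderRoot (hypothetical helper) returns p relative to the parent of
// the directory given to "add -r", so the directory name itself is kept.
func nameUnderRoot(root, p string) string {
	parent := path.Dir(path.Clean(root))
	if parent == "." {
		// root has no parent component, so p is already relative to it.
		return path.Clean(p)
	}
	prefix := strings.TrimSuffix(parent, "/") + "/"
	return strings.TrimPrefix(path.Clean(p), prefix)
}

func main() {
	root := "mountdir/planets"
	file := "mountdir/planets/mars.txt"
	fmt.Println(baseName(file))            // mars.txt
	fmt.Println(nameUnderRoot(root, file)) // planets/mars.txt
}
```

The base name drops the `planets/` component that the expected output of test 43 contains, which is why the filename cannot be derived from the path alone without also knowing the root that was added.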

whyrusleeping · 2016-05-10T22:31:30Z

@kevina Yeah, there's a LOT here, so it's going to take some time. I don't disagree with you maintaining a fork in the meantime. That said, as I find time, my plan is to pick out parts of the code that can be merged up easily. I think the next candidate for that is the commands/files code refactor to use absolute paths.

kevina · 2016-05-19T22:28:15Z

@whyrusleeping, one idea I had which might avoid some special casing is to replicate what is done with add --only-hash:

                if hash {
                        nilnode, err := core.NewNode(n.Context(), &core.BuildCfg{
                                //TODO: need this to be true or all files
                                // hashed will be stored in memory!
                                NilRepo: true,
                        })
                        ...
                        n = nilnode
                }

That is, create a special "node" for adding files to the filestore. This could help avoid special case code in at least a few places. At a minimum it will avoid the special case code in the blockstore package, as I can use my own implementation.

Do you think this is an idea worth pursuing? I imagine the special node will share many of the data structures from the real node to avoid any data race or related issues.

whyrusleeping · 2016-05-20T01:12:08Z

@kevina I think that might be a good idea. My only concern: in order to read the blocks out of the filestore, would you still need that custom-created node?

kevina · 2016-05-20T01:34:25Z

@whyrusleeping, no, the custom node will be for adding only. We will still need some sort of multi-datastore for reading blocks. See my comments for #2747.

kevina · 2016-05-23T03:28:56Z

@whyrusleeping there is a lot going on in an IPFS node. What I need is not a mock node like the one --only-hash uses, but rather a new node that acts like a view into the original node, except that adds are handled differently. I was not confident I could pull this off with a full IPFS node.

With that in mind, I refactored the adder code a little to bundle all the needed services into the DataServices struct; see commit c41a47e. A proper interface that exports just the methods needed by the adder would be nice, but I think this is a good start. This new interface could be used to simplify a lot of code elsewhere, including the code for just computing the hashes, but I don't fully understand what is going on there, so I left the code for add --only-hash alone.

I use this new interface to create a DataServices view in commit e6bfbfa that uses an alternative Blockstore and leaves the existing Blockstore and caching Blockstore wrapper alone. In commit b54f2f4 I factor out the filestore specific code from the DagServices and instead install small hooks.
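The "view" idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Blockstore` interface, the fields of `DataServices`, and the `WithBlockstore` method are hypothetical and may not match what commits c41a47e and e6bfbfa actually contain.

```go
package main

import "fmt"

// Blockstore is a hypothetical, minimal stand-in for the real interface.
type Blockstore interface {
	Put(key string, data []byte)
	Has(key string) bool
}

// memStore is a toy in-memory Blockstore used only in this sketch.
type memStore map[string][]byte

func (m memStore) Put(key string, data []byte) { m[key] = data }
func (m memStore) Has(key string) bool         { _, ok := m[key]; return ok }

// DataServices bundles what the adder needs (the field set is assumed).
type DataServices struct {
	Blockstore Blockstore
	Pins       map[string]bool // stands in for the other shared services
}

// WithBlockstore returns a shallow copy that shares every service with
// the original except the Blockstore used for adds.
func (d DataServices) WithBlockstore(bs Blockstore) DataServices {
	d.Blockstore = bs // d is already a copy because of the value receiver
	return d
}

func main() {
	node := DataServices{Blockstore: memStore{}, Pins: map[string]bool{}}
	view := node.WithBlockstore(memStore{})

	view.Blockstore.Put("k", []byte("data"))
	view.Pins["k"] = true

	// The block only lands in the view's store; the pin set is shared.
	fmt.Println(view.Blockstore.Has("k"), node.Blockstore.Has("k"), node.Pins["k"])
}
```

The original Blockstore is never touched by adds made through the view, which is the property the comment describes.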

In commit f9a51b1 I completely eliminated the AdvReader code you didn't like and instead computed the offsets in the DAG builder code (the Balanced builder to be specific; the Trickle builder doesn't support the filestore just yet).

All in all I cleaned up a lot of the special case code.

If you are okay with the changes in c41a47e, I can separate that out into its own pull request.

whyrusleeping · 2016-05-31T04:43:31Z

@kevina from IRC: we can move towards making the NewAdder signature be NewAdder(ctx context.Context, bs bstore.Blockstore, pin pinning.Pinner, ds dag.DAGService), which eliminates the need for the partial initialization stuff.

kevina · 2016-05-31T14:43:29Z

@whyrusleeping f5fb3c9 basically does what you ask for.

None of the other methods in the measure package return this error; instead they only call RecordValue() when the value is []byte. This change makes batch Put consistent with the other methods and allows non-[]byte data to be passed through the measure datastore. License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
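The behaviour described in this commit message (record a size only for []byte values, pass everything else through without an error) can be sketched with a type assertion. The `sizeRecorder` type and `recordIfBytes` function are hypothetical stand-ins, not the measure package's API.

```go
package main

import "fmt"

// sizeRecorder is a hypothetical stand-in for a metrics histogram.
type sizeRecorder struct{ sizes []int64 }

func (r *sizeRecorder) RecordValue(v int64) { r.sizes = append(r.sizes, v) }

// recordIfBytes records the size only when value is a []byte and reports
// whether it did. Other value types are ignored rather than rejected.
func recordIfBytes(r *sizeRecorder, value interface{}) bool {
	b, ok := value.([]byte)
	if ok {
		r.RecordValue(int64(len(b)))
	}
	return ok
}

func main() {
	r := &sizeRecorder{}
	fmt.Println(recordIfBytes(r, []byte("abc")))       // true
	fmt.Println(recordIfBytes(r, struct{ N int }{42})) // false
	fmt.Println(r.sizes)                               // [3]
}
```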

The datastore has an optional "advanced" datastore that handles Put requests for non-[]byte values, a "normal" datastore that handles all other Put requests, and then any number of other datastores, some of which can be designated read-only. Delete requests are passed on to all datastores not designated read-only. For now, querying will only work on a "normal" datastore. Note: Only tested in the case of just a "normal" datastore and the case of an "advanced" and "normal" datastore. License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
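The routing rules in this commit message can be sketched as follows. The `Datastore` interface, the `Multi` type and its fields are hypothetical and simplified. In particular, sending non-[]byte values to the "normal" datastore when no "advanced" one is configured is an assumption based on "all other Put requests".

```go
package main

import "fmt"

// Datastore is a hypothetical, simplified interface for this sketch.
type Datastore interface {
	Put(key string, value interface{}) error
	Delete(key string) error
}

// mapStore is a toy in-memory Datastore.
type mapStore map[string]interface{}

func (m mapStore) Put(k string, v interface{}) error { m[k] = v; return nil }
func (m mapStore) Delete(k string) error             { delete(m, k); return nil }

// entry is an additional datastore that may be marked read-only.
type entry struct {
	ds       Datastore
	readOnly bool
}

// Multi routes requests as described in the commit message above.
type Multi struct {
	advanced Datastore // optional, may be nil
	normal   Datastore
	others   []entry
}

// Put sends non-[]byte values to the advanced datastore when there is
// one, and everything else to the normal datastore.
func (m *Multi) Put(key string, value interface{}) error {
	if _, isBytes := value.([]byte); !isBytes && m.advanced != nil {
		return m.advanced.Put(key, value)
	}
	return m.normal.Put(key, value)
}

// Delete is passed on to every datastore not designated read-only.
func (m *Multi) Delete(key string) error {
	stores := []Datastore{m.normal}
	if m.advanced != nil {
		stores = append(stores, m.advanced)
	}
	for _, e := range m.others {
		if !e.readOnly {
			stores = append(stores, e.ds)
		}
	}
	for _, s := range stores {
		if err := s.Delete(key); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	adv, norm, ro := mapStore{}, mapStore{}, mapStore{"k": []byte("old")}
	m := &Multi{advanced: adv, normal: norm, others: []entry{{ro, true}}}
	_ = m.Put("k", []byte("new"))
	_ = m.Put("obj", 42)
	_ = m.Delete("k")
	fmt.Println(len(norm), len(adv), len(ro))
}
```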

Create a new AdvReader interface that returns an ExtraInfo object that contains extra information about the filehandle. This includes the current offset and the absolute path of a file. Also modify chunk.Splitter to return a Bytes struct in the NextBytes() method that, in addition to the raw bytes, returns the ExtraInfo object from the AdvReader that contains the current offset in the file (among other information). License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>


rht · 2017-01-30T18:30:33Z

This sure is a feat, pulled off singlehandedly on filestore, 👏 👏, @kevina! If this PR is carefully split into 10 PRs consisting of 10 commits each (if 50% have already been merged), this should be doable within a week (or two at the longest).

whyrusleeping · 2017-03-06T19:46:29Z

🙌 🎆 #3629 has been merged, basic filestore functionality is in master. Go try it out :)

This was referenced May 6, 2016

[WIP] Towards Issue #875, avoid duplicating files added to ipfs #2600

Closed

avoid duplicating files added to ipfs #875

Closed

kevina force-pushed the kevina/filestore branch 2 times, most recently from 7edf207 to da43103 Compare May 6, 2016 09:28

whyrusleeping reviewed May 6, 2016

kevina force-pushed the kevina/filestore branch from da43103 to 9df24d3 Compare May 9, 2016 06:24

whyrusleeping reviewed May 10, 2016

whyrusleeping added the kind/enhancement and status/in-progress labels May 13, 2016

kevina force-pushed the kevina/filestore branch 2 times, most recently from e8c4a1b to 2d09e9f Compare May 18, 2016 02:56

This was referenced May 18, 2016

Migrating away from Godeps #2722

Closed

Discussion: Support for Multiple Datastores #2747

Closed

kevina mentioned this pull request May 24, 2016

Cannot restart daemon after files API ops and repo gc #2698

Closed

kevina force-pushed the kevina/filestore branch from f5fb3c9 to 506bc95 Compare June 1, 2016 21:25

kevina mentioned this pull request Jun 1, 2016

Stable bits from Pull Request #2634 #2792

Merged

kevina force-pushed the kevina/filestore branch from 506bc95 to b0ce621 Compare June 1, 2016 22:11

kevina added 3 commits June 1, 2016 22:17

kevina added 16 commits November 4, 2016 01:41

Filestore: Move custom version of filepath.Clean() into its own package.

77bf0c3

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Use the CidToDsKey form ds-help package.

f3d5399

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Implement Key struct for future multi DataObj per hash sup…

3a2b2a0

…port. License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Move snapshot related code into its own file.

b028088

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Remove support for old format and Upgrade command.

c69cd67

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Prep work for multi DataObj per hash support.

cdd0b69

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Enhance Key structure.

b0e2d0d

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Basic support for multiple DataObjs per hash

4bd48f8

Some Filestore clean operations are broken. Needs more tests. License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Fix maintenance commands.

4da34e4

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Remove seperate veirfy-post-orphan command.

6d292b8

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: enhance "ls" and "verify" command

dfc5342

Add --full-key option to "filestore ls" command. Add "ls" format option to "filestore verify". License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Enhance "rm" command.

4bd3d40

Also fix bug discovered in "verify". License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Documentation updates and improvements.

12d62f4

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore "ls": Enhance "w/type" format.

c64c0e6

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Filestore: Improve shareness test descriptions.

ae0f4a9

License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

Revert "Filestore: Disable config.Filestore.APIServerSidePaths for now."

9215a1e

This reverts commit 241d10d. License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>

kevina mentioned this pull request Nov 18, 2016

WIP: Basic Filestore implementation. #3368

Closed


This was referenced Jan 17, 2017

Story: Register a filesystem Directory in IPFS without Duplicating Content ipfs-inactive/archives#92

Closed

Sprint: January 16-23 ipfs/team-mgmt#309

Closed

Story: [ipfs-pack] Use a Single Pack to Track Many Files Across an OS ipfs-inactive/archives#129

Open
