Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I would like a s3 datastore for this #135

Closed
justinmchase opened this issue May 25, 2017 · 29 comments
Closed

I would like a s3 datastore for this #135

justinmchase opened this issue May 25, 2017 · 29 comments
Assignees
Labels
exp/expert Having worked on the specific codebase is important help wanted Seeking public contribution on this issue P1 High: Likely tackled by core team if no one steps up

Comments

@justinmchase
Copy link
Contributor

justinmchase commented May 25, 2017

I would like to use the js-ipfs library on my backend but I would like to swap out the repo location from filesystem to s3.

I've been digging through the code and I saw some examples of how to maybe use a different repo than the filesytem but none of them seemed to be relevant anymore. Based on digging around in the current code it looks like I would need to create a module that implemented the same interface as the ipfs/js-datastore-fs module but worked with s3 api's instead.

Then if I did that I could bind it together like this:

import Ipfs from 'ipfs'
import IPFSRepo from 'ipfs-repo'
import S3Store from 'datastore-s3'
import { S3 } from 'aws-sdk'

const accessKeyId = 'AKID'
const secretAccessKey = 'SECRET'
const client = new S3({ accessKeyId, secretAccessKey })
const fs = S3Store
const fsOptions = { client }
const repo = new IPFSRepo('.ipfs', { fs, fsOptions })
const node = new Ipfs({ repo })
// ...

My first question is, does that sound right? I didn't see anything in the js-datastore-fs code that couldn't probably be implemented 1:1 in s3.

Second question has to do with api stability and your likliness to accept patches to this project? Also, would you be interested in pulling in the repo for datastore-s3 into the ipfs org? I'm a little worried about creating this module and then have the js-ipfs-repo change its api without also updating the related datastores. What would be the right way for me to proceed with this?

@dignifiedquire
Copy link
Member

Hey thanks for posting. It would be very nice to have an s3 backend for this. So we are definitely open to pulling in a repo into this org.

Then general idea you proposed looks about right, but there might be an easier solution. Using https://github.com/loune/s3leveldown together with https://github.com/ipfs/js-datastore-level could accomplish what you want without needing to reimplement anything.

If you look at the options for js-datastore-level, it accepts any leveldown compatible backend as db option. So you can pass that through in a similar way we pass level-js in the browser here

@justinmchase
Copy link
Contributor Author

Ok, I'll check that out. My understanding from looking in the code though was that level was for the browser while the fs was for the server, is that not so?

@dignifiedquire
Copy link
Member

No, in node, both level and fs are used, (level for the storage of key - value pairs that ipfs needs for operation, like pinsets etc, and fs is used for storing the actual blocks of data on disks). In the browser indexeddb is used to store all of this, but abstracted away through level.
The interfaces that datastore-level and datastore-fs provide are identical though, so you can switch in whatever you want. So you can use datastore-level for all storage with the s3 backend

@justinmchase
Copy link
Contributor Author

justinmchase commented May 26, 2017 via email

@justinmchase
Copy link
Contributor Author

It seems like the s3leveldown repo is out of date perhaps. Their example code out of the box doesn't work with latest version of leveldown and the datastore-level repo seems to be compatible with level-js while s3leveldown expects leveldown.

It seems like just implementing datastore-s3 may be the least complex and most direct way to get the functionality.

Also, if both options.fs and options.level are expected to jsut be datastores with the same api but storing different kinds of data in them perhaps it would make more sense to rename them on the options object to be named after what kind of data they store rather than where they are stored? For example:

let options = {
  blocks: s3,
  datastore: s3
}

Also based on your description there I'm wondering if it wouldn't make more sense to use s3 for blocks and Dynamo for the key value pairs instead? How does that sound?

@daviddias
Copy link
Member

@justinmchase just merged your PR #136, thanks for that.

it wouldn't make more sense to use s3 for blocks and Dynamo for the key value pairs instead? How does that sound?

It depends, DynamoDB is a powerhouse and our level usage is simply to store ephemeral stuff like DHT records and something more persistent such as the pinset root hash. I would argue that storing all of these in S3 would be advantageous, but putting a LRU cache in front of the datastore one would give you the most bang for the buck, otherwise you will be doing tons of tiny reads and writes.

@daviddias
Copy link
Member

Before releasing, I double checked and test js-ipfs against js-ipfs-repo master and I started getting errors when using the daemon through the HTTP-API

On tests:

  1) HTTP API "before all" hook:
     Error: Uninitalized repo
      at maybeOpenRepo (src/core/boot.js:101:21)
      at series (src/core/boot.js:47:18)
      at node_modules/async/internal/parallel.js:39:9
      at node_modules/async/internal/once.js:12:16
      at iterateeCallback (node_modules/async/internal/eachOfLimit.js:44:17)
      at node_modules/async/internal/onlyOnce.js:12:16
      at node_modules/async/internal/parallel.js:36:13
      at waterfall (/Users/koruza/code/js-ipfs-repo/src/index.js:153:7)
      at /Users/koruza/code/js-ipfs-repo/node_modules/async/internal/once.js:12:16
      at next (/Users/koruza/code/js-ipfs-repo/node_modules/async/waterfall.js:21:29)
      at /Users/koruza/code/js-ipfs-repo/node_modules/async/internal/onlyOnce.js:12:16
      at parallel (/Users/koruza/code/js-ipfs-repo/src/index.js:172:16)
      at /Users/koruza/code/js-ipfs-repo/node_modules/async/internal/parallel.js:39:9
      at /Users/koruza/code/js-ipfs-repo/node_modules/async/internal/once.js:12:16
      at iteratorCallback (/Users/koruza/code/js-ipfs-repo/node_modules/async/eachOf.js:58:13)
      at /Users/koruza/code/js-ipfs-repo/node_modules/async/internal/onlyOnce.js:12:16
      at /Users/koruza/code/js-ipfs-repo/node_modules/async/internal/parallel.js:36:13
      at get (/Users/koruza/code/js-ipfs-repo/src/version.js:53:18)
      at ReadFileContext.store.get (/Users/koruza/code/js-ipfs-repo/src/version.js:29:18)
      at ReadFileContext.callback (/Users/koruza/code/js-ipfs-repo/node_modules/graceful-fs/graceful-fs.js:78:16)
      at FSReqWrap.readFileAfterOpen [as oncomplete] (fs.js:359:13)

  2) HTTP API "after all" hook:
     TypeError: Cannot read property 'stop' of undefined
      at series (src/http-api/index.js:128:26)
      at node_modules/async/internal/parallel.js:31:39
      at replenish (node_modules/async/internal/eachOfLimit.js:64:17)
      at node_modules/async/internal/eachOfLimit.js:68:9
      at eachOfLimit (node_modules/async/eachOfLimit.js:39:36)
      at node_modules/async/internal/doLimit.js:9:16
      at _parallel (node_modules/async/internal/parallel.js:30:5)
      at series (node_modules/async/series.js:83:26)
      at HttpApi.stop (src/http-api/index.js:127:5)
      at Context.after (test/http-api/index.js:32:14)

On regular spawn daemon

> jsipfs daemon & 
> jsipfs swarm peers

events.js:163
      throw er; // Unhandled 'error' event
      ^
OpenError: IO error: lock /Users/koruza/.jsipfs/datastore/LOCK: Resource temporarily unavailable
    at /Users/koruza/code/js-ipfs-repo/node_modules/levelup/lib/levelup.js:125:34
    at /Users/koruza/code/js-ipfs-repo/node_modules/leveldown/node_modules/abstract-leveldown/abstract-leveldown.js:39:16

#136 doesn't look like it was the cause of the problem, it might be a dependency that got updated and it is causing this.

Investigating why this happens now.

@justinmchase
Copy link
Contributor Author

I didn't try that, I'll give it a shot locally and let you know what happens.

@daviddias
Copy link
Member

Got it! Now the new scheme is adding .data to all the files in the root level:

image

And so the detection is failing.

@justinmchase
Copy link
Contributor Author

justinmchase commented Jun 4, 2017

I get the same but interestingly it seems to work fine that way with a newly initialized repo. Something in my PR caused the file names to change?

@daviddias
Copy link
Member

got a fix: #137

@daviddias
Copy link
Member

+ ipfs-repo@0.13.2
[19:05:23] Published to npm.

Thanks @justinmchase ! :)

@dignifiedquire
Copy link
Member

@diasdavid I don't believe that fix will be enough, please revert these changes for now, also this should have been a minor version bump as this is a very breaking change in the options interface #138

@justinmchase
Copy link
Contributor Author

Hi @dignifiedquire, sorry about this. What steps should I take to test it fully? I agree with the version bump comment also.

@daviddias
Copy link
Member

@justinmchase yeah, just unpublished it. I had a false positive for no breaking changes as explained here: #138 (comment)

@daviddias
Copy link
Member

Any update on the S3 Datastore, @justinmchase ?

@justinmchase
Copy link
Contributor Author

Oh yeah I fixed my fork. Should I open a new PR?

@daviddias
Copy link
Member

yes please :)

Do note that the S3 DataStore should not be a default, but just something that is pluggable

@daviddias daviddias added the status/deferred Conscious decision to pause or backlog label Jul 3, 2017
@daviddias
Copy link
Member

Hi @justinmchase! Now that all the ipfs-repo has been refactored (thank you for the CR btw!), do you think you can look back into that s3 datastore?

@justinmchase
Copy link
Contributor Author

justinmchase commented Jul 17, 2017

I've been working on it, one issue I'm encountering is that the Key class in interface-datastore is making some assumptions about keys being file paths. I was trying to do it on windows at first also but there is quite a bit of work to fix it up for that.

I'm trying to work with it as is without any sweeping changes but right now my thinking would be, if there were to be sweeping changes, that the key should be changed to accept uri's instead of make assumptions about file paths. For example all current systems would be changed to be file://... based for file systems. Then for what I'm working on it would accept s3://<bucket>/<key>

Each of the repos needs to be fixed up also to remove hard coded paths (e.g. hello/world.data -> path.join('hello', 'world.data')) and the assumption that keys should start with a / is not right. Right now I'm thinking the key should be modified to basically wrap a urijs module which would be used to parse strings into objects and vice versa. The actual contents of the key are up to the user so long as they are valid uris.

Here is my repo so far:
https://github.com/justinmchase/js-datastore-s3

I cloned the js-datastore-fs repo then re-pushed it to this new repo and am in the process of changing it to pass the existing tests but be implemented as s3. I'm using a local s3 server running in docker for tests. My thought was that at some point you could fork it and keep the fork as the origin. The main ipfs repo would have a peerDependency on that module and optionally allow it to be loaded via configuration, if installed.

Thoughts welcome.

@daviddias daviddias added status/ready Ready to be worked exp/expert Having worked on the specific codebase is important help wanted Seeking public contribution on this issue P1 High: Likely tackled by core team if no one steps up and removed status/deferred Conscious decision to pause or backlog labels Oct 17, 2017
@jacobheun jacobheun self-assigned this Mar 30, 2018
@jacobheun
Copy link
Contributor

jacobheun commented Apr 4, 2018

I have an initial implementation of this ready for feedback. All methods from the Spec are functioning.

Right now it's expecting an S3 instance to be passed to it via the storageBackendOptions. The other thought was to have it take the options and create the instance internally. The thought behind having that as a passed in value was to keep the S3 instance creation decoupled from the store.

Update: The repo has been moved to the ipfs org: https://github.com/ipfs/js-datastore-s3

@jacobheun jacobheun added status/in-progress In progress and removed status/ready Ready to be worked labels Apr 4, 2018
@jacobheun
Copy link
Contributor

I just fixed a bug and added an examples folder. It currently only contains the ipfs 101 example with a repo fully backed by s3 (not that it makes sense to do each layer on s3). I've run this against my own s3 bucket as well.

@ghost ghost removed the status/in-progress In progress label Apr 6, 2018
@daviddias
Copy link
Member

@jacobheun where is that example?

@jacobheun
Copy link
Contributor

@daviddias
Copy link
Member

@jacobheun I'm thinking of having an example in js-ipfs repo https://github.com/ipfs/js-ipfs/tree/master/examples that shows users how to plug different repo configurations to js-ipfs

@jacobheun
Copy link
Contributor

@diasdavid I think that would be very helpful for people who are interested in playing around more in depth with custom repo configuration

@daviddias
Copy link
Member

@jacobheun would you mind tackling that example?

@jacobheun
Copy link
Contributor

@diasdavid I can do that. I think I will get the locker config/option #161 working and then add that into an example in js-ipfs.

@jacobheun
Copy link
Contributor

@diasdavid opened a new PR for the example ipfs/js-ipfs#1303

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exp/expert Having worked on the specific codebase is important help wanted Seeking public contribution on this issue P1 High: Likely tackled by core team if no one steps up
Projects
None yet
Development

No branches or pull requests

4 participants