Experiment: Setting up an Ubuntu mirror on IPFS #18
Comments
Success, total time: 12 hours
Although when trying to publish the name, I get an error:
|
(the last one is because your ipfs node is offline) This is super cool! @magik6k @Stebalien @Kubuxu - what could we do differently to make this ...less slow. =] |
What filesystem do you use? Are the disks in raid 1 or 0 or independent? 12h for 1.18TB seems to be around 28MB/s, which probably isn't anywhere near to what the drives can get to. I'd recommend initializing the node with badger datastore - |
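As a quick check of the throughput estimate above, the arithmetic can be reproduced in a few lines. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), and the function name is made up for illustration.

```python
def throughput_mb_per_s(terabytes: float, hours: float) -> float:
    """Average throughput in MB/s, assuming decimal units (1 TB = 1e12 bytes)."""
    total_bytes = terabytes * 1e12
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


# 1.18 TB added in 12 hours (figures quoted in this thread)
print(round(throughput_mb_per_s(1.18, 12), 1))
```

With binary units (TiB) the figure comes out a little higher, so "around 28MB/s" is a fair summary either way.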
It also looks like the CPU usage is pretty low. We may need to finally implement parallel hashing. |
Name successfully published, thanks for the tip @momack2
@magik6k looks like the setup defaulted to raid 1, will try with badger next |
Re-ran against the same
Same CID produced, this time it took 18 hours:
/data/.ipfs/ -> 1.8G (previous run without badger resulted in 2.4G) |
Now updating the mirror about 60 hours after the first rsync:
Have started another add, this time with the existing badgerds .ipfs directory to see how long it takes to check and add the ~7,438 changed files. |
Failed after 3 hours 30 mins:
|
Interestingly, adding the updated non-badger .ipfs directory also failed, on a similar file in a different path:
|
@warpfork suggested doing a run with |
The file was probably updated. Filestore expects files to be immutable once added, which can be problematic in this case. I'm not sure what the best workaround is here, but you need to somehow make ipfs not add those files to filestore. We could change filestore to only add read-only files (possibly with a flag, and add rw content to normal blockstore) |
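One way to see which files would trip over the immutability expectation is to record each file's size and modification time when it is added, then compare after the next rsync. The following is a minimal sketch of that idea, not something filestore does itself; `snapshot` and `changed_files` are hypothetical helper names.

```python
import os


def snapshot(root: str) -> dict:
    """Map each file path under root (relative to root) to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so broken symlinks do not raise
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(before: dict, after: dict) -> list:
    """Paths present in both snapshots whose size or mtime differs."""
    return sorted(p for p in before if p in after and before[p] != after[p])
```

Taking one snapshot before an rsync run and one after would list the files that were modified in place, which are the ones a filestore-backed add could no longer serve correctly.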
@magik6k ideally this would happen transparently to the users, so they don't have to declare which files may change, then keeping an IPFS mirror up to date with rsync would be easily scriptable without needing to know exactly the expected behaviour of future updates from rsync |
I thought I'd try again with a slightly smaller dataset, the https://clojars.org/ maven repository, which has an rsync server and is about 60GB. Kicking off the same set of commands with a fresh I guess it's due to the differing folder structure, clojars has many more folders at the top level? |
I'd say many small files too. Can you run: find . -type d | wc -l
find . -type f | wc -l on both datasets? |
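For anyone without `find` to hand, the same two counts can be approximated in Python. This is a sketch under the assumption that symlinks should be skipped, as `find -type d` and `find -type f` do; the function name is hypothetical.

```python
import os


def count_dirs_and_files(root: str) -> tuple:
    """Return (directories, regular files) under root, root itself included."""
    n_dirs = 0
    n_files = 0
    # followlinks defaults to False, so symlinked directories are not visited
    for dirpath, _dirnames, filenames in os.walk(root):
        n_dirs += 1  # each visited directory is counted once
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                n_files += 1
    return n_dirs, n_files
```

Running it against both mirror directories and dividing the results gives the folder and file ratios between the two datasets.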
clojars has 2.7x the folder count and 1.8x the file count of apt |
I've documented some of the blockers found here in this PR: https://github.com/protocol/package-managers/pull/21/files |
In the meantime, I've kicked off a fresh If that successfully completes, I'll put together a blog post for https://blog.ipfs.io detailing how to use apt-transport-ipfs and how to set up your own IPFS mirror of an ubuntu/debian mirror. |
Update on the current attempt at not using the filestore to mirror apt, the initial offline import took around 36 hours, completed successfully and as expected After an rsync run to update I've now run a third rsync update (pulling in only 12 hours worth of changes), I also started the ipfs daemon this time, although still passing the That seems like it's possibly successful enough to actually be usable as an apt mirror 🎉 Also going to test to see how much slower it is without the --offline flag after this last run. |
Third offline ipfs add (with daemon running) completed in 3 hours 20 minutes. Another rsync run done (minimal updates since 3 hours ago), going to attempt an ipfs add without |
In the meantime, you can browse the mirror here: https://cloudflare-ipfs.com/ipfs/QmThJ4k554iT3B7SZonmVMwspHALiHGThhqaem7nij1wQD n.b. using cloudflare as I'm getting |
Updating and adding without Have published an ipns for it: https://cloudflare-ipfs.com/ipns/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B Now working on a docker file for a consumer using https://github.com/JaquerEspeis/apt-transport-ipfs and a script to keep the ipfs mirror up to date with regular rsyncs. |
Just published a docker image that uses the IPFS mirror I've got running to install things: https://github.com/andrew/apt-on-ipfs 🎉 Example output from It's not very quick, and the IPFS daemon is very chatty in the logs but it's working! |
Moved apt-on-ipfs into the ipfs-shipyard org: https://github.com/ipfs-shipyard/apt-on-ipfs Also added a mirror.sh script that is currently running every 4 hours via cron to keep the mirror up to date: https://github.com/ipfs-shipyard/apt-on-ipfs/blob/master/mirror.sh |
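For readers who want the shape of such an update loop without opening the repo, here is a rough sketch of the three steps described in this thread (rsync, `ipfs add`, `ipfs name publish`). It is not the contents of mirror.sh: the function names and the rsync options are assumptions for illustration, and the source URL and mirror directory are left as parameters.

```python
import subprocess


def sync_command(rsync_source: str, mirror_dir: str) -> list:
    """rsync call that pulls upstream changes into the local mirror (assumed options)."""
    return ["rsync", "-a", "--delete", rsync_source, mirror_dir]


def add_command(mirror_dir: str) -> list:
    """Recursive ipfs add; -Q prints only the final root hash."""
    return ["ipfs", "add", "-r", "-Q", mirror_dir]


def publish_command(cid: str) -> list:
    """Point the node's IPNS name at the newly added root."""
    return ["ipfs", "name", "publish", "/ipfs/" + cid]


def update_mirror(rsync_source: str, mirror_dir: str) -> str:
    """Run sync, add and publish in order; returns the root CID that was published."""
    subprocess.run(sync_command(rsync_source, mirror_dir), check=True)
    added = subprocess.run(add_command(mirror_dir), check=True,
                           capture_output=True, text=True)
    cid = added.stdout.strip()
    subprocess.run(publish_command(cid), check=True)
    return cid
```

Building the commands as lists keeps them easy to inspect or log before anything is executed, which is handy when the job runs unattended from cron.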
Does anyone have data on client bandwidth, for installing a single package over IPFS versus the same package over apt? |
Strange, I'm also getting As for the IPNS hash, I'm personally getting I also used this tool to check for the IPNS availability, but it seems like no one's able to get it. |
@NatoBoram I've been off sick for a couple weeks and it looks like the ipfs daemon fell over on the mirror server at some point, I've restarted it now, that ipns name should start resolving again after the next successful cron job run within a few hours. |
FYI I'm going to be turning off the machine that was running the experimental mirror on 2019-07-01 |
@andrew what's the status of this experiment? Could we (literal 'we', not royal 'we', lol) summarize, share, and archive, or is there more here to do? |
"summarize, share, and archive" sounds like a good next step, this will form the basis for a lot of benchmarking and testing of file-system based package managers this quarter. |
I thought it'd be interesting to try and replicate what was attempted in this thread just over a year ago to see if similar performance problems exist when adding a large file-system based package manager to IPFS.
Steps outlined below:
Spun up a "Store-1-S" Online.net dedicated server in France with Ubuntu 18.04:
Followed https://wiki.ubuntu.com/Mirrors/Scripts to rsync a mirror into /data/apt:
Output from rsync:
At this point /data/apt is about 1.2TB.
Then installed ipfs:
Based on notes from ipfs/notes#212, made the following config changes:
Then ran the following command to add /data/apt to IPFS:
I then took the dog for a walk 🐩 🚶
Output from dstat at various times over the next few hours:
(sorry about the colours here)
It got "stuck" here around 30% (~345GB), progress bar didn't show any changes, slowly writing to disk.
/data/.ipfs dir was 441MB for 345GB uploaded
Had some lunch 🍜
Came back to life whilst I was at lunch
/data/.ipfs dir was 862MB for 532GB uploaded after about 3 hours.
Status as of 3pm:
/data/.ipfs: 1.3GB
3:15pm progress slowed again, lots of writing, no reading
4:15pm back to full speed again:
/data/.ipfs: 1.6GB
5:30pm progress slowed again, lots of writing, no reading, seems to be happening every hour
/data/.ipfs: 1.9GB
6:20pm back to full speed again:
/data/.ipfs: 2.1GB
7:20pm stuck at 100%, similar disk writing pattern, no CID returned yet:
/data/.ipfs: 2.3GB
8:15pm: still stuck at 100%, similar disk writing pattern, no CID returned yet
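To put the /data/.ipfs sizes in context, the two readings that were taken alongside a progress figure can be expressed as a percentage of the data added so far. This is a rough sketch that assumes the sizes noted above are megabytes and gigabytes in decimal units; the function name is made up for illustration.

```python
def overhead_percent(datastore_mb: float, added_gb: float) -> float:
    """Datastore size as a percentage of data added (1 GB taken as 1000 MB)."""
    return datastore_mb / (added_gb * 1000) * 100


# (datastore size in MB, data added in GB), readings from this post
readings = [(441, 345), (862, 532)]
for datastore_mb, added_gb in readings:
    pct = overhead_percent(datastore_mb, added_gb)
    print(f"{added_gb} GB added: datastore is {pct:.2f}% of that")
```

Both readings come out well under 1% of the data added.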
Will update this issue as it continues.
Drop a comment with any commands you'd like to see the output for during or after.