Data Mirroring #2
S3 for storage with CloudFront as the CDN? A long expiration in the CDN should be very effective. Having the data at rest in S3 reduces HTTP requests to the backend servers and should prove much simpler than mirrors.
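For illustration, the long expiration could be set at upload time so CloudFront (and browsers) keep the files cached; a minimal sketch using the AWS CLI, with placeholder bucket and file names:

```sh
# Upload a mod archive with a long-lived Cache-Control header
# (bucket and file names are placeholders)
aws s3 cp SomeMod-1.0.zip s3://spacedock-files-example/SomeMod-1.0.zip \
    --cache-control "public, max-age=31536000, immutable"
```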
The upside of using mirrors, though, is that we can defray hosting costs. Our budget so far is $0.
I can provide rudimentary mirrors through DigitalOcean in all regions at no cost.
I can throw several TB/month into the pool.
Linux distros use rsync for mirror synchronization - here is how Arch Linux does it. Something similar to their process could enable just about anybody with an HTTP server running somewhere to donate bandwidth and storage, which at times is easier than donating money.
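As a sketch of what a mirror pull might look like under that model (the rsync host, module name, and local path are placeholders):

```sh
# Periodically pull the full file tree from the master rsync server
rsync -rtlvH --delete-after --delay-updates --safe-links \
    rsync://rsync.spacedock-master.example/spacedock/ /srv/http/spacedock-mirror/
```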
I have to run and give a talk, but the Internet Archive is happy to host freely distributable content on their servers, which includes all FOSS/CC-licensed KSP mods. They have an S3-like API that's described here. Yes, the Internet Archive is super-awesome. <3
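Roughly, an upload through that S3-like API could look like this, assuming the item already exists (item name, file name, and keys are placeholders):

```sh
# Upload a mod archive to an existing Internet Archive item via the S3-like endpoint
curl --location \
    --header "authorization: LOW $IA_ACCESS_KEY:$IA_SECRET_KEY" \
    --upload-file SomeMod-1.0.zip \
    "https://s3.us.archive.org/spacedock-mods-example/SomeMod-1.0.zip"
```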
I can donate some of my bandwidth for a mirror. I can't promise much speed, but I am willing to help.
Internet Archive might be really great for this. For our own mirroring options, I can spare some bandwidth too. Not on the order of what KerbalStuff was using by itself, but I have unused quota each month on my hosting, since my own websites use less than 5% of what I'm permitted, last I looked. If we get enough mirrors involved, each one's bandwidth requirement would be fairly small.
GitHub Pages can store it well.
I took a look at my Linode account, and I'd have to increase my plan to have enough disk storage for all of the current file data (since there's 62 GB of it). In terms of monthly bandwidth allowance, I have tons of room to spare. It's the disk that's really tight. I'll hold off from doing anything until we know whether we need the mirrors.
I'm offering up part of my unlimited 250 Mbit dedicated server in Europe (it's in Roubaix, France).
I have 500 GB of hard drive space with a 1 Gbps uplink on a dedicated server (with CloudFlare as a CDN in front of it). Would love to help out as well.
I'll be working on the best solution for this case, as it seems that we have lots of people willing to offer mirrors + the Internet Archive. @phmayo have you come to any conclusions?
Using the IA requires work on the backend, either in the website or in an uploading cron job. @ThomasKerman and VITAS have plenty to do as it is for the moment. So, much as I want this, we need to focus on getting an easy way for mirrors to be activated and made available first, preferably without any impact on CKAN at all. If that isn't possible, well, we'll deal with that when it's time. Making SD resilient to failure is our top priority, so we don't get another outage like on Monday.
For a mirror, 100 GB of storage and 10 TB of transfer should be plenty to get us started.
@phmayo I just had a conversation with VITAS and arrived at a pretty nice design for the solution, plus the conclusion that it's not a priority. I'll put together a workflow over the next few days. Will look into SyncThing, thanks for sharing!
rsync is pretty easy, of course. Another possibility is to roll our own process that pushes out whole files when new ones are added, plus a nightly (or every 48 hours, or whatever) rsync to catch anything that was missed or dropped.
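The catch-up pass could just be a cron entry on each mirror; a sketch (schedule, host, and paths are placeholders):

```sh
# Nightly catch-up sync at 03:00, on top of whatever push mechanism adds new files
0 3 * * * rsync -rt --delete rsync://rsync.spacedock-master.example/spacedock/ /srv/http/spacedock-mirror/
```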
Hi, you might wish to check https://about.maniacdn.net/ - it is a community-driven CDN built for another game, and the sources are public. It uses rsync to sync the files. Anyone can contribute with their server.
How about web caching? It's a solution that requires no control over the mirror server and no cron jobs or daemons.
The main concern here is the following: anyone can poison files with ease, as these are not signed. @dries007 I don't believe that would be a solution, as it wouldn't help with transfer bandwidth, nor with having distributed data in case of failure.
Well, you are setting up a deliberate man-in-the-middle structure, but how hard would it be to add hashes to the page so that at least the people who want to can check them? That is, if you are only serving the download files out of the CDN and not also the main page.
@sikian I disagree. I'll be using nginx as an example here, but I'm pretty sure you'd be able to apply this to most web server software:
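A minimal sketch of such a caching reverse proxy (the cache path, sizes, origin URL, and mirror hostname are placeholders; proxy_cache_path belongs in the http {} block):

```nginx
proxy_cache_path /var/cache/nginx/spacedock levels=1:2 keys_zone=spacedock_cache:10m
                 max_size=50g inactive=30d use_temp_path=off;

server {
    listen 80;
    server_name mirror.example.org;                 # placeholder mirror hostname

    location /content/ {
        proxy_pass https://spacedock.info;          # origin server (assumed URL)
        proxy_cache spacedock_cache;
        proxy_cache_valid 200 30d;                  # keep successful responses for a month
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```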
@oliverde8 If multiple people have caches running, you can distribute the load, and the main server would only have to supply the user-specific data and any new files the caches don't have yet.
@dries007 FYI, you can configure caching rules in CloudFlare to tell it what to store short-term or long-term (it will listen for standard Cache-Control headers).
@brandonwamboldt I thought that was premium only, good to know. |
@dries007 There is a limit for the free account (although it will always follow cache headers, so you can just set it up via Nginx/Apache). However, if SpaceDock goes with CF, I've volunteered to sponsor the premium plan.
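For reference, setting those headers at the origin could be as simple as this (a sketch; the nginx location pattern and max-age are assumptions):

```nginx
# Long-lived cache headers on mod downloads so CloudFlare (or any CDN) keeps them
location ~* \.zip$ {
    add_header Cache-Control "public, max-age=2592000";
}
```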