This repo is a demonstration of using the R {targets} framework along with the min.io S3-compatible server to have linked version control between code, data, and code-generated objects.
{targets} can use AWS S3 storage to store the objects it generates, and with S3 bucket versioning enabled, it can store all versions of targets. If you then version the _targets/meta/meta file with your code, targets will fetch object versions matching your code.
However, for cost, security, or connectivity reasons, it may not make sense to store your data on AWS. min.io is an open-source, free server that serves files over an S3-compatible REST interface. You can run it on your own server within an organization network or locally on your machine.
This repo demonstrates using min.io for versioning on your local machine.
- Install minio server. Follow instructions for your OS at https://docs.min.io/minio/baremetal/quickstart/quickstart.html. It's just a brew, apt, or binary download.
- Clone this repository
- Bootstrap packages and data. Start a session and run:
renv::restore() # Install relevant packages
piggyback::pb_download("minio_storage.zip") # Fetch the min.io data
unzip("minio_storage.zip")
This creates a minio_storage folder (gitignore'd) which will contain the contents of your local S3 bucket.
Now, start a min.io server:
mserver <- processx::process$new(
command = "minio", args = c("server", "minio_storage", "--console-address", "localhost:9090"),
stdout = "", stderr = "2>&1"
)
This serves an S3 endpoint at local port 9000. You can then visit the web interface for your min.io server at http://localhost:9090/. The default login and password are both minioadmin. In this case, the server already has a bucket with versioning turned on ("targets-versioned") and a set of credentials ("testcreds"/"testcreds"), because the server configuration was downloaded along with the data.
If you were starting your own new project, you could set admin credentials with the 'MINIO_ROOT_USER' and 'MINIO_ROOT_PASSWORD' environment variables, create your own bucket, and create a more secure set of credentials under Identity > Service Accounts.
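For a fresh project, the admin credentials can be passed to the server process when it is launched. The following is only a sketch: the user name and password are placeholder values, and it assumes a processx version that accepts "current" in env to keep the existing environment.

```r
# Sketch: start min.io with custom admin credentials (placeholder values).
library(processx)

mserver <- process$new(
  command = "minio",
  args = c("server", "minio_storage", "--console-address", "localhost:9090"),
  # "current" keeps the parent environment and appends the named variables
  env = c(
    "current",
    MINIO_ROOT_USER = "my-admin-user",              # hypothetical value
    MINIO_ROOT_PASSWORD = "my-long-admin-password"  # hypothetical value
  ),
  stdout = "", stderr = "2>&1"
)
```

Keeping the credentials out of the script itself (for example in an untracked .Renviron file) is probably the safer choice for anything beyond a local test.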
In the _targets.R file, which defines the project workflow, we use tar_option_set() to use the local min.io S3 endpoint and the "targets-versioned" bucket to store all our objects.
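As a rough illustration of that configuration, a _targets.R excerpt might look like the sketch below. It is not copied from this repository's file: it assumes a recent {targets} release that provides the repository argument and the endpoint argument of tar_resources_aws(), and it assumes the credentials are supplied through the standard AWS environment variables.

```r
# _targets.R (excerpt): a sketch, not the repository's actual file.
library(targets)

# Credentials that ship with the example data, passed as AWS-style variables
Sys.setenv(
  AWS_ACCESS_KEY_ID = "testcreds",
  AWS_SECRET_ACCESS_KEY = "testcreds"
)

tar_option_set(
  repository = "aws",  # store targets in S3-compatible storage
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "targets-versioned",       # versioned bucket on the local server
      endpoint = "http://localhost:9000"  # local min.io S3 endpoint
    )
  )
)
```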
Now, build your project in the R console:
targets::tar_make()
All targets should be skipped and the pipeline should be complete.
We can do the same moving to a different commit. In the shell, go to an old version of the code:
git checkout f9d9
Now run targets::tar_make() again. The targets should still skip!
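One way to confirm that nothing needs rebuilding at a given commit is to ask {targets} which targets are outdated. This is a small sketch; it assumes the min.io server is running and that you are in the project directory.

```r
# Sketch: list outdated targets without building anything.
outdated <- targets::tar_outdated()

if (length(outdated) == 0) {
  message("All targets are up to date for this commit.")
} else {
  message("Outdated targets: ", paste(outdated, collapse = ", "))
}
```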
Return to the branch you were on:
git checkout -
Now you can try modifying the _targets.R pipeline, building, and committing _targets/meta/meta. That file stores references to the S3 objects and versions served by the min.io server.
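To see what is being committed, you can look at the metadata file directly, since it is a plain-text, pipe-delimited file with one record per target. The helper below is a hypothetical convenience function written for this illustration, not part of {targets}.

```r
# Sketch: preview the metadata file that ties code to stored object versions.
meta_path <- file.path("_targets", "meta", "meta")

# preview_meta() is a hypothetical helper defined here for illustration
preview_meta <- function(path, n = 5) {
  if (!file.exists(path)) {
    message("No metadata file at ", path, "; run targets::tar_make() first.")
    return(invisible(character(0)))
  }
  # First line is the header; each following line describes one target
  readLines(path, n = n)
}

preview_meta(meta_path)
```

Running git diff on this file after a rebuild shows which records changed along with your code.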
Stop your server when you are done:
mserver$kill()
If you also want to share your targets data via piggyback, you can push it like so (pulling works as in the bootstrap step above). You'll need your own repository to create releases to attach data to.
zip("minio_storage.zip", list.files("minio_storage", recursive=TRUE, all.files = TRUE, full.names = TRUE))
piggyback::pb_new_release(repo = "YOUR_NAMESPACE/YOUR_REPO", "YOUR_TAG") # Only need to do this once
piggyback::pb_upload("minio_storage.zip", repo = "YOUR_NAMESPACE/YOUR_REPO")
piggyback attaches the data to GitHub releases, which limit each file to 2GB. As the storage directory stores every version of your objects, it can get quite large and it may not be practical to share it this way. You can prune your versions via the min.io web interface.
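Before uploading, it may help to check how large the storage folder and archive have become. The sketch below uses only base R; dir_size_bytes() is a hypothetical helper, and the 2 GB threshold is the release size limit mentioned above.

```r
# Sketch: compare storage sizes against the 2 GB release size limit.
archive <- "minio_storage.zip"
limit_bytes <- 2 * 1024^3

# Hypothetical helper: total size in bytes of all files under a directory
dir_size_bytes <- function(dir) {
  files <- list.files(dir, recursive = TRUE, all.files = TRUE, full.names = TRUE)
  sum(file.size(files), na.rm = TRUE)
}

message(sprintf("minio_storage: %.1f MB on disk",
                dir_size_bytes("minio_storage") / 1024^2))

if (file.exists(archive)) {
  size <- file.size(archive)
  message(sprintf("%s: %.1f MB", archive, size / 1024^2))
  if (size >= limit_bytes) {
    message("Archive is too large to attach to a GitHub release.")
  }
}
```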
However, if you have a shared server, you can use min.io on it so your team can share object versions without pushing or pulling or connecting externally to AWS.