easy Docker Compose setup for SPARQL endpoint and RDF browser for a small knowledge base #96

Closed
KonradHoeffner opened this issue Sep 23, 2022 · 8 comments · Fixed by #97

Comments

@KonradHoeffner
Contributor

KonradHoeffner commented Sep 23, 2022

Docker and Docker Compose are well suited to provide an easy and repeatable way to set up the infrastructure for a typical Semantic Web project, where you have an ontology and a small knowledge base and want to share them over a SPARQL endpoint and an RDF browser.

I propose a Docker Compose setup with the following properties:

  • No additional effort required: "docker compose up --build" does everything. No going into the Conductor and manually uploading files or calling virt_load scripts. This means that the container can be removed and rebuilt as often as one likes, with no loss of data and little effort for migration to another server. Just git clone ... and docker compose up --build and that's it. Killing and starting the container is also the method to
  • Small ontology and knowledge base living in a Git repository as the source of truth; everywhere else the data is read-only.
  • Low memory and CPU utilization, as many (virtual) servers have very little hardware and this saves money.
  • CORS enabled by default so that web applications have no problems using the SPARQL endpoint.
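To make the intended use concrete, here is a minimal sketch of how a client might query such an endpoint over the standard SPARQL protocol. The address `http://localhost:8890/sparql` is an assumption (Virtuoso's usual default HTTP port, published on the local machine), and `build_sparql_request` and `bindings` are hypothetical helper names, not part of the proposed setup. CORS only matters for browser-based clients; a script like this one is not affected by it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the container publishes Virtuoso's default HTTP port 8890 locally.
ENDPOINT = "http://localhost:8890/sparql"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build a GET request for a SPARQL query that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings(result):
    """Flatten a SPARQL JSON result document into {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in result["results"]["bindings"]
    ]
```

With the containers running, `json.load(urllib.request.urlopen(build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")))` passed to `bindings` should yield one dictionary per result row.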

Explicit non-goals:

  • large knowledge base
  • frequent data changes
  • SPARQL endpoint as source of truth (then you need persistence and backups, which makes it more complicated)

Now is a good time to do that because the official OpenLink Virtuoso Docker image recently gained the functionality and documentation needed to achieve it. Previously, one would typically use an alternate Docker image together with a wrapper script.

I will provide a pull request; however, I'm not sure where to put the files (docker-compose.yml and example knowledge base). Would a "docker" folder be acceptable?

@madnificent

Hi, good idea!

We've been running https://github.com/mu-semtech/mu-project for some time as a starting point for semantic.works applications. Part of this is a Docker Virtuoso setup using redpencilio/virtuoso (which the creators of tenforce/virtuoso are maintaining) for specifically such purposes.

The SPARQL endpoint can be configured for various memory constraints: https://github.com/mu-semtech/mu-project/blob/master/config/virtuoso/virtuoso.ini . Although persistence is not your concern, we have experienced that users find it frustrating if data is accidentally removed. Mounted volumes make that easy to understand. Aside from hard-killing the Virtuoso instance, data should be persisted in this setup.

This image does not need to be built. We have runners for this when new releases are made.

Data can be loaded into the endpoint by placing files in the data/db/toLoad folder at first start (they will not be loaded again on a later run, as you may have updated the data in the meantime). Later updates to the store can be managed through https://github.com/mu-semtech/mu-migrations-service/ , which is something users will likely expect afterwards.

Perhaps it makes most sense to strip the unneeded services from mu-project and use that as a starting point?

@KonradHoeffner
Contributor Author

KonradHoeffner commented Sep 23, 2022

Thanks! I used tenforce/virtuoso in the past as a base as well, but the developers of Virtuoso advised against it, which caused me to switch to the official image as soon as their simpler import mechanism, which doesn't need a wrapper script, became documented.

Pull request #97 implements this with just three files: docker-compose.yaml, initdb.d/setup.sql and rdf/example.ttl.
It also uses environment variables instead of mounting virtuoso.ini, which I think fits the goal of EasierRDF better and seems to be generally recommended. Mounting the configuration file would probably be better suited to a more complex use case where many of the values are overwritten.
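As an illustration of the trade-off, the sketch below shows how environment variables could be folded into an INI-style configuration. It assumes a `VIRT_<Section>_<Key>` naming pattern with case-insensitive matching, which is how the official image's documentation describes its variables as far as I know. The function `apply_env_overrides` is hypothetical and is not the image's actual implementation, and section names containing spaces or underscores are not handled.

```python
import configparser
import io


def apply_env_overrides(ini_text, env):
    """Return ini_text with VIRT_<Section>_<Key> variables from env applied."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original case of keys
    parser.read_string(ini_text)
    for name, value in env.items():
        parts = name.split("_", 2)  # prefix, section, key (key may contain "_")
        if len(parts) != 3 or parts[0].upper() != "VIRT":
            continue  # unrelated environment variable
        # Reuse an existing section or key when it matches case-insensitively.
        section = next(
            (s for s in parser.sections() if s.lower() == parts[1].lower()),
            parts[1],
        )
        if not parser.has_section(section):
            parser.add_section(section)
        key = next(
            (k for k in parser[section] if k.lower() == parts[2].lower()),
            parts[2],
        )
        parser[section][key] = value
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

With only a handful of overrides, a few variables in the Compose file stay readable; once most of the file is being rewritten this way, mounting a complete virtuoso.ini is likely easier to review.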

You are right, there should be some mechanism to prevent accidental removal of data.

@dbooth-boston
Collaborator

I like this idea! Does the software included in the PR meet the RD-LAMP criteria for inclusion -- FOSS, etc.?

@KonradHoeffner
Contributor Author

KonradHoeffner commented Sep 23, 2022

@dbooth-boston: The software included in the PR consists of three parts:

  1. Docker: as far as I know open source, included in https://github.com/moby/moby (anyway, it is just the container to run the other two)
  2. OpenLink Virtuoso in its Open Source edition
  3. RickView: open source under MIT license.

All three are used in real world applications.
While 2 and 3 run in a Linux container, Docker itself also runs under Windows and macOS. To be honest, I have never used a server that isn't some form of Linux, but I think it should work the same way under Windows and macOS because that is the goal of Docker; it would probably just use more disk space under Windows to emulate Linux. I don't know if that conforms to the criteria of EasierRDF or if containerization counts as "cheating". If so, I could compile RickView under Windows and look for a Windows version of Virtuoso.

However, as the author of RickView, I must admit that 3 is absolutely not the most popular RDF browser and is not actively supported or used in the RDF community at all, as it is still only used by our own institute right now. I chose it because it was developed for exactly this use case (optimized for small knowledge bases, easy setup, high performance, low resource utilization), as I couldn't find an existing RDF browser that fits all those criteria.

RickView uses the design from LodView, which is much more popular, so if you want I could switch it out for that. However, LodView uses a lot more RAM and is a bit trickier to set up. Pubby would be possible as well, but its design is a bit too basic for my taste. I am open to other suggestions for the RDF browser as well.

@dbooth-boston
Collaborator

Very good. If RickView is not actively supported, then that alone disqualifies it. So yes, if you could please switch that to a different browser, that would be good. Also, what's the best way to incorporate @TallTed's proposed changes to PR #97?

@KonradHoeffner
Contributor Author

Well, RickView is under active development and support, but not by the community, so I will switch it out. This way the setup will also be able to handle large knowledge bases. Do you have a preference for which other RDF browser I should choose?

I completely agree with the proposed changes from @TallTed and approved all of them.

@dbooth-boston
Collaborator

Ideally we should choose the RDF browser that best meets the inclusion criteria. I'd say make your best guess, and if others think a different choice would be better, hopefully they'll speak up and explain why.

dbooth-boston pushed a commit that referenced this issue Sep 26, 2022
* Add Docker Compose template for SPARQL endpoint and RDF browser. Resolve #96.

* Update docker/README.md

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

* Improve formatting of docker/rdf/example.ttl

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

* Update docker/README.md

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

* Use LodView instead of RickView.

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>