
add okh JSON schema #1

Merged (1 commit into iop-alliance:main, Jan 15, 2024)

Conversation

@devhawk (Contributor) commented Dec 19, 2023

No description provided.

@Jbutler-helpful

Please take a look at the $comments in this schema. As we put this together we had a number of questions and used the comments as an opportunity to point them out.

@touchthesun merged commit 6fac437 into iop-alliance:main on Jan 15, 2024
@hoijui (Collaborator) commented Jan 19, 2024

I had a look at it (mostly at the $comment parts).

History

I personally also think that having a JSON Schema (a widely used industry standard) for the OKH spec is the most sensible thing to have, and that it should be the single/original source of truth of the spec, as long as the manifest file is in YAML, TOML or JSON format.
I created such a JSON Schema about 2 years ago and proposed it, but it was rejected. The reason was that there is already a schema for the spec in a custom format that a member of the consortium invented, which only the custom web interface for OKH v1 manifests understands. I also find various issues with that custom schema spec, as in: parts where it does not match the human-readable spec, which is considered the source of truth.
So in OKH-LOSH, we created a JSON Schema for each of OKH v1 and OKH LOSH, both to be found in the respective repo.
We used our version of the OKH v1 JSON Schema to verify all publicly known OKH v1 manifests at the time (~2 years ago), which were ~2000 manifests. After filing a lot of PRs on the manifests, and later also applying on-the-fly fixes to some of them (after downloading them) where they did not conform to the spec as is, we were able to successfully validate all of them.
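For illustration, this kind of validation boils down to just a few lines; here is a minimal sketch using Python's jsonschema and PyYAML (the file names are placeholders, not the actual paths in our repo):

```python
# Minimal sketch: validate an OKH manifest (YAML) against a JSON Schema.
# File names below are placeholders, not the actual paths from the LOSH repo.
import json
import yaml                      # PyYAML
from jsonschema import validate, ValidationError

with open("okh-v1.schema.json") as f:
    schema = json.load(f)

with open("okh-manifest.yml") as f:
    manifest = yaml.safe_load(f)  # YAML parses into the same dict/list structure as JSON

try:
    validate(instance=manifest, schema=schema)
    print("manifest conforms to the schema")
except ValidationError as err:
    print(f"validation failed: {err.message}")
```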

At the end of September this year, I had a chat with (mostly) @Jbutler-helpful about how to work together and unite the different OKH versions. In that meeting I also introduced him to JSON Schema, as I saw that Helpful also uses their own schema format. As he suggested, I also created an issue on one of their projects regarding this whole unification topic:
helpfulengineering/project-data-platform#60
In this chat and in this issue I referred to our version of the JSON Schema for OKH v1, but apparently that info got lost, and Helpful created their own from scratch, which forms the basis of this PR here.

Feedback

Unlike our version, this Helpful version does not represent OKH v1, but includes various adjustments as they saw fit, as explained in the $comments in the (Helpful) schema itself. Thanks a lot for that! It makes things much easier; good idea!
Some of those changes make total sense to me (like the Contact ones), while others less so, or rather, would need discussion.
Disregarding the details of the changes though, I think it makes no sense to introduce a new spec format and at the same time include changes to the spec within it.

My Suggestion

From the IoPA perspective, I think the best course of action would be to start hosting the spec on a platform where everyone can easily suggest changes (like GitHub or GitLab), and then switch to a single source format which is machine- and (at least somewhat) human-readable, and from which all other necessary and desired formats can and will be generated.
At that point, we would have an optimal point of reference on which to suggest and discuss changes.
In my eyes, part of the process of choosing that single source format would have to be the RDF discussion. It is hard to understand the real-world benefits of RDF without using it (at least for tech people), but everyone within the OKH spectrum who went through that learning process clearly understands the benefits, and the necessity of a format with such benefits for a distributed data standard like OKH. So far, speaking only of the tech people, that is only Max Wardeth and me, to my knowledge.
In my dream scenario, we would use an RDF Ontology as the single source of truth, and generate from it:

  1. a human-oriented Markdown version
    (mostly trivial)
  2. a JSON-Schema for verifying data provided in JSON, YAML, TOML or similar
    (this will need some more fiddling in the beginning)
  3. a SHACL/SHex spec for verifying RDF data, whether it conforms to the ontology
    (am working on a software for that, also required in other projects)
  4. different RDF formats
    (trivial, software for this already exists; see the sketch after this list)
  5. Visual, graph-based representations
    (very basic (bad) ones already exist; better ones would be very good to have, but would need to be made)
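To illustrate point 4, converting between RDF serializations is only a few lines with Python's rdflib (the file names here are placeholders, not the actual ontology files):

```python
# Minimal sketch: convert an RDF file between serialization formats with rdflib.
# "okh.ttl" and the output file names are placeholders.
from rdflib import Graph

g = Graph()
g.parse("okh.ttl", format="turtle")                        # load the Turtle source

g.serialize(destination="okh.rdf.xml", format="xml")       # RDF/XML
g.serialize(destination="okh.jsonld", format="json-ld")    # JSON-LD
g.serialize(destination="okh.nt", format="nt")             # N-Triples
```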

This whole system should prove invaluable for other standards as well (like OKW, Valueflows, OSH-Ontology and many more).
I can say from my experience with both OKH v1 and OKH LOSH that we will never manage to keep different formats of the spec in sync without a single source plus auto-conversion. Using that approach does not mean that everyone necessarily has to suggest changes in the source version, mind you! We should encourage that, but it is also trivial for anyone to suggest changes on the MD version, for example, or even in a visual representation, and for someone else to then port them to the source version. That is still much more comfortable and less work for the maintainers than trying to keep different source formats in sync. It might even be possible to auto-convert changes from the generated formats back into the source format, which would make this work even simpler.

If we get there, we can finally start to move forward together, bring in Helpful's changes, LOSH changes, and anyone else's, and have a clean process for handling these change requests.
I think this work is the most important that IoPA can do; more important than OKH and OKW themselves: to come up with a good process for handling an open, distributed data standard, available in the optimal formats (open, human- & machine-readable, industry standards, editable by non-tech people).

@hoijui (Collaborator) commented Jan 19, 2024

I would recommend undoing this merge and linking to our JSON Schema for OKH v1 instead, which, unlike this schema, conforms to the current state of OKH v1.

https://github.com/OPEN-NEXT/LOSH-OKH-JSON-Schemas/

@MakerNetwork

@hoijui thanks for all the detail and context on this, it helps quite a bit to have the background you've shared here, and I value your opinions on this matter. You make a lot of very good points regarding the necessity of a single source of truth if we are ever to have any hope of keeping our versions in sync.

I have no personal experience working with RDF, so I can't speak to any of the existing tooling available for such things, but I understand that there are python libraries like rdflib, and we can probably build most of what we need with that if there isn't something already built. I would love to discuss this with you in more detail, to put together a more concrete roadmap and figure out what kind of technical resources would be needed to make something like your vision become a reality.

My main concern with this plan is that it will make the perfect the enemy of the good, and require several more years of effort before we have something that enables design sharing, supply chain mapping and all the rest. So if we can work out what a minimal prototype of this proposed system would require, that should answer most of the questions I have.

Regarding the proposal of undoing this merge and instead linking to the LOSH schema, I have no particular problem with doing that, but I would like to have a plan for what to do with the Helpful Engineering version and their contributions. If, for example, I undo this merge and instead link to the LOSH schema, what should then be done with the Helpful version to capture their contributions?

@hoijui (Collaborator) commented Feb 13, 2024

Woohoo! :-)
Thanks Nathan, I am very happy to read all of this!

I will answer in random order.

JSON Schemas

We have both the OKH v1 schema and the LOSH schema in one repo. They are kept as similar as possible, so the diff is minimal. So at first, I would recommend linking to the OKH v1 schema, or even copying that file into your repo (i.e. overwriting the Helpful schema with it). With that, you would be in a sane state again: not defining different standard versions/states in different formats within a single repo.
That is where I would leave it, for now.
In theory, we could then discuss including LOSH changes (using the diff), one by one or all at once, and each time adapt the other versions of the spec accordingly - e.g. the human-readable one. And/or Helpful could rebuild their schema on top of the OKH v1 schema, and then we could do the same with their changes. This would be a lot of work (including lots of manual, error-prone work), which I would rather do once we get to a single source of truth, to save us all time, errors and nerves.

RDF

Yes, there are RDF libraries for all the big languages (I would guess, the 50 most used ones). It is by no means a new thing or niche tech.
For most people it is very different from what they are used to; it was so for me, and without having references, understanding it takes a while, but once one does, one understands the huge potential and benefits. Max Wardeth also knows it, and he would surely advocate for it too. I would compare it to git, as it requires a similar level of rethinking things one knows, and has similar benefits and potential to change the world.
The big hurt with RDF is that, apart from the basics (the format, parser libraries and DBs, which are all we really need), most software is written by academics, and is thus very likely long dead by the time you find it.
Comparing RDF with JSON/YAML/TOML/XML - which are almost the same thing, just using different syntax:
JSON (from now on I will use this term to include the other, similar formats) is basically just a lump of data. One can use JSON Schema to define it, allow data validation, and describe the fields to some extent, but it is very limited; it shows in many places that this spec was developed after the fact (of JSON), and because a JSON Schema is written in JSON itself (and for other reasons), it will always stay very limited. It is also not well known (compared to JSON).
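For reference, here is a hypothetical fragment (field names made up, not taken from the Helpful schema) showing roughly what JSON Schema does offer: type constraints, description texts, and $comment annotations. It is written as a Python dict so it can be fed straight to the jsonschema library:

```python
# Hypothetical JSON Schema fragment; field names are illustrative, not from the Helpful schema.
from jsonschema import validate

schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$comment": "Open question: should 'license' be restricted to SPDX identifiers?",
    "type": "object",
    "required": ["title", "license"],
    "properties": {
        "title": {"type": "string", "description": "Human-readable project name"},
        "license": {"type": "string", "$comment": "Currently a free-form string"},
    },
}

validate(instance={"title": "Example Project", "license": "CERN-OHL-S-2.0"}, schema=schema)
```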
In RDF (or rather OWL, a higher-level language within RDF), things are quite different. For people like us, dealing with data (vs. logical systems), instead of thinking of RDF as lumps of data, it makes more sense to think of it as a distributed database. That database has a DB schema, which is called an ontology (sometimes also a vocabulary). One can then store data that fits this schema in a query-optimized RDF DB, or in files. Similar to JSON & co. - but fully convertible into each other - one can choose between different RDF syntaxes for these files: XML or JSON, but also RDF-native formats like the more human-readable Turtle format.
Anyone can take a number of such files, store them in an RDF DB, and run queries on them.
In RDF (unlike JSON), interlinking "any" data is a first-class/native feature. It is also called a "Linked Data format", and in our case that means it would be trivial for anyone to extend any schema we come up with and annotate projects with that data, without causing any issues for us or anyone else, while at the same time there is an intrinsic pull towards uniting schemas (= ontologies), or say, combining efforts and even proposing changes upstream, vs. having parts duplicated all over.
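As a tiny, hypothetical illustration of the above (the namespace and the project IRI are made up, not from any real ontology), a few triples about a project built with Python's rdflib and written out as Turtle:

```python
# Tiny, hypothetical example: a few RDF triples about a project, built with rdflib.
# The namespace and the project IRI are made up for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://example.org/okh#")

g = Graph()
g.bind("ex", EX)

project = URIRef("https://example.org/projects/example-osh-project")
g.add((project, RDF.type, EX.Module))
g.add((project, RDFS.label, Literal("Example OSH Project")))
g.add((project, EX.repo, URIRef("https://example.org/repos/example-osh-project")))

print(g.serialize(format="turtle"))  # the same triples could also be written as RDF/XML or JSON-LD
```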

... I could go on and on, but really, you would have to dive into it. As with git, it takes time and brain-remodeling to get it. The inventor of RDF is the same guy who previously invented HTTP. He obviously has a very social, interconnected dream, and RDF gets us much closer to it than HTTP does. Once you get it, it will be as important to you as git (vs. SVN).

I warmly recommend the video on this page as an intro to RDF, by its originator; it is good for both tech and non-tech people, and it conveys all the cool things about RDF in 10 minutes:
https://inkdroid.org/2010/06/04/the-5-stars-of-open-linked-data/

Perfection vs good

That is a very good point, thank you!
It made me rethink things a bit, and I think it leads to a much better plan:
One thing we did totally wrong in OKH-LOSH was that we focused on the software and never considered the hosting infrastructure. That was left as an afterthought, as in: "Once we have all the software and have gathered the data, we will set up a server and host it, with a nice GUI on top."
I propose to set up a server first, and to have a plan for how it can stay up for a long time. Then, many of us (let's say, all the tech folks involved) should have full access to it, and everyone else should have read access. I would propose to have at least these services running:

  1. an RDF triple store (== queryable RDF DB server)
  2. a simple web-server for hosting different versions of the ontology
  3. (optional at first) a SOLID server

Do you have an idea where/how we could host that?

This would then be our playground, and we can put data there. The cool thing with RDF is that the data always references its ontology (~= schema), and it does so through the hosting URL of the ontology, and that URL contains the version (often the release date, but it can be a semver or commit SHA). That means we can store data following different ontology versions in the same DB. When querying, we can choose a single version, or combine data from different versions. That allows for fast and stress-free iteration: no need to coordinate all team members and external groups when changing the ontology!
Of course, we cannot store all the data of all projects on the web for each little change to the ontology. I rather imagine a comprehensive set of a few representative projects that we use all the time, and crawling/storing a lot of projects for major & (some?) minor releases only.
Keeping the data for old releases is crucial, as it gives any consumer of the data peace of mind that their front-end will keep working even when there is a new ontology release. Of course, this sounds a lot like a centralized system again, and in fact it is, but this is just the proof-of-concept/development system. We need that to make it easy for everyone to do actual work, without having to set up their own thing and reconfigure and troubleshoot all the time when working with each other. Apart from the hosting of the ontology itself, which is only limited by the fact that it has to be available under a stable URL, everything can be run distributed without problems.
If we go this route, we could start off with the OKH-LOSH ontology right away, or we could have an over-simplistic ontology that only stores a project's name and repo URL. Both of these approaches would be very easy to do, and nowhere near perfect, but pretty good.
People also need a fast, trouble-free place to try out the whole system before considering setting up their own node. We saw that need over and over again with OKH-LOSH, where to this day we do not have the data hosted, and while there are many who want to browse the data, they can't without setting up their own system, and nobody does that just to try it out.
Once the data is hosted, it can be queried directly with SPARQL (RDF's equivalent to SQL, very similar) by generic data-browsing clients, native or web-based, or by any ontology-specific web UI we come up with that uses SPARQL in the background.
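For illustration, a minimal sketch of such a SPARQL query from Python, using the SPARQLWrapper library; the endpoint URL and the okh: class/property IRIs are placeholders, not our actual ontology:

```python
# Hypothetical sketch: query a hosted RDF triple store with SPARQL from Python.
# The endpoint URL and the okh: IRIs below are placeholders, not the real ontology.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/sparql")  # placeholder endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX okh: <https://example.org/okh/1.0.0#>
    SELECT ?project ?name WHERE {
        ?project a okh:Module ;
                 okh:name ?name .
    }
    LIMIT 10
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["project"]["value"], row["name"]["value"])
```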

Sync

TODO (I'll write about this in another comment)

@hoijui (Collaborator) commented Feb 16, 2024

(This is my proposal only)

Sync

In theory, with the above setup, we could keep working in parallel on OKH v1, OKH LOSH and OKH Helpful, on the same infrastructure, using the same base technology, and of course we could come up with a completely new schema (= ontology) for this project.
Of course, that would be bad, and we should merge. The question is how best to do that.
In the best case, we could first decide on the base technology, which could be made up of these:

  1. An RDF/OWL or AtomicData Ontology as the single source of truth (the Spec+Schema),
    which includes all machine- and human-readable data.
  2. Auto-derived formats like a JSON Schema, a human-readable Markdown specification document, a SHACL/ShEx validation schema (for RDF validation), ...
  3. Data directly in RDF, or in JSON & co., which could be auto-converted to RDF (see the sketch after this list)
  4. A crawler that outputs either RDF directly or JSON
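To illustrate point 3, a minimal, hypothetical sketch of auto-converting a JSON manifest into RDF with Python's rdflib; the namespace and the manifest field names are placeholders, not the real OKH ontology:

```python
# Hypothetical sketch: turn a JSON manifest into RDF triples with rdflib.
# The namespace URL and the manifest field names are placeholders, not the real OKH ontology.
import json
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

OKH = Namespace("https://example.org/okh#")

with open("okh-manifest.json") as f:
    manifest = json.load(f)

g = Graph()
g.bind("okh", OKH)

project = URIRef(manifest["repo"])  # use the repo URL as the subject IRI
g.add((project, RDF.type, OKH.Module))
g.add((project, OKH.name, Literal(manifest["title"])))
g.add((project, OKH.license, Literal(manifest["license"])))

print(g.serialize(format="turtle"))
```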

And in a second step, we would choose a starting point (on which later to propose changes).
Politically, the first thing that comes to mind as a starting point within IoPA would be OKH v1.
Technically, OKH LOSH makes more sense, because it already uses RDF and has a crawler, and because much more work and refining has gone into it (not least, lots of issues reported on OKH v1 - both technical bugs and schema logic issues - have specifically been addressed within OKH LOSH). The only issue I see with OKH LOSH is that it has been worked on by a smaller group of people, even though that group (or say, Martin Haeuer as part of it) was in close contact with many stakeholders while doing so.
I personally have no big preference for the LOSH schema over the v1 schema, other than those fixes, but I know it to be vastly better, technology- and bug-wise, simply for all the hours of work that have gone into it, without making it more complex overall.

From then on, we can propose changes. If we start off with OKH v1, I would first go through the OKH LOSH changes. If we start off with OKH LOSH, I would go straight to the "atomization" of the BoM, as proposed in helpfulengineering/project-data-platform#60, probably by first creating a simple, open BoM spec (<- see this repo, where we could already start working on it now).
In the end, we will want something more sophisticated than BoMs, yet it makes a lot of sense to start with those anyway, if we are on the path of including most OSH that is out there, vs. providing the most sophisticated and complete solution that could ever possibly be required by any OSH project. Having well-structured, standardized BoMs will already bring us a long way towards that "end goal", and allow future refinement processes in that domain to go much more smoothly.

Summary

  1. Decide on base technology
    (e.g. RDF + derived formats)
  2. Choose starting point
    (e.g. OKH v1, OKH LOSH, something else)
  3. Go through changes
    (propose, discuss, adjust, accept/deny)
