Working on paper #406

michael-kotliar · 2021-06-22T14:14:50Z

A place to collect all the material required for the paper, leave comments, etc.

codecov · 2021-06-22T14:25:22Z

Codecov Report

Merging #406 (a947776) into main (2cc8a93) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #406   +/-   ##
=======================================
  Coverage   78.98%   78.98%           
=======================================
  Files          18       18           
  Lines        3231     3231           
  Branches      872      872           
=======================================
  Hits         2552     2552           
  Misses        441      441           
  Partials      238      238


mr-c · 2021-06-22T15:14:25Z

paper/draft.md

+Schema Salad is designed to address this gap. It provides a schema language and processing rules for describing structured JSON content permitting URI resolution and strict document validation. The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of JSON-LD context and RDF schema, and production of RDF triples by applying the JSON-LD context. The schema language also provides for robust support of inline documentation.
+
+### Mentions
+> Put the recently submitted CWL paper here. The title of this section should be changed to something meaningful.


https://arxiv.org/abs/2105.07028

mr-c · 2021-06-22T15:15:03Z

paper/draft.md

+> Put the recently submitted CWL paper here. The title of this section should be changed to something meaningful.
+
+### Examples
+> I think it would be great to put some of the schema-salad examples here (if we are still within the 1000-word limit)


@pjotrp can you say a few words about the use of schema-salad in PubSeq?

Yes. Let me have a look.

I think the reader would be best served by a few examples. Show a JSON record, a mapping, and the resulting RDF or JSON-LD output. The strength of schema salad is a simple translation with error checking. To be honest, I only understood what schema salad was about when I ran something. That is how you can help the interested user. Does that make sense? We can use examples from PubSeq for sure. If you include the images as figures they don't count as words.
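
As a rough illustration of the kind of example suggested here (a JSON record, a mapping, and the resulting RDF), the sketch below uses only the Python standard library. It is not schema-salad's API or output. The record, the `MAPPING` table, the `example.org` URIs, and the `to_ntriples` helper are all hypothetical and only show the shape of the translation, including a basic error check for unmapped fields.

```python
import json

# Hypothetical input record; the field names and values are illustrative only.
RECORD = json.loads('{"id": "sample1", "host": "human", "location": "example-city"}')

# Hypothetical mapping from record fields to predicate URIs.
MAPPING = {
    "host": "http://example.org/vocab#host",
    "location": "http://example.org/vocab#location",
}

# Hypothetical base URI used to mint a subject from the record id.
BASE = "http://example.org/record/"


def to_ntriples(record, mapping, base=BASE):
    """Translate a flat record into N-Triples-style lines.

    Raises ValueError for any field without a mapping, which mimics the
    "translation with error checking" idea in a very reduced form.
    """
    subject = f"<{base}{record['id']}>"
    lines = []
    for field, value in record.items():
        if field == "id":
            continue  # the id becomes the subject, not a triple
        if field not in mapping:
            raise ValueError(f"no mapping for field {field!r}")
        literal = json.dumps(str(value))  # double-quoted, escaped string
        lines.append(f"{subject} <{mapping[field]}> {literal} .")
    return lines


if __name__ == "__main__":
    for line in to_ntriples(RECORD, MAPPING):
        print(line)
```

A record with a field that is missing from `MAPPING` fails with a `ValueError` instead of being silently dropped, which is the part a reader is most likely to appreciate once they run it.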

It would be nice to have PubSeq examples in this paper. We can also mention PubSeq in the Mentions section (this section will be renamed later; it is meant for a representative set of past or ongoing research projects using the software and recent scholarly publications enabled by it). As for the figures, I don't think they are counted as words. Here is an example of how figures are added to papers:

Figures

Figures can be included like this:
![Caption for example figure.\label{fig:example}](figure.png)
and referenced from text using \autoref{fig:example}.

Figure sizes can be customized by adding an optional second parameter:
![Caption for example figure.](figure.png){ width=20% }

I am proposing to turn examples (YAML/JSON/RDF) into figures, so they don't count as words.

michael-kotliar · 2021-06-27T18:57:56Z

paper/draft.md

+## Schema Salad: A bridge between document and record oriented data modeling and the Semantic Web
+
+### Summary
+Salad is a schema language for describing structured linked data documents in JSON or YAML documents. A Salad schema provides rules for preprocessing, structural validation, and link checking for documents described by a Salad schema. Salad builds on JSON-LD and the Apache Avro data serialization system and extends Avro with features for rich data modeling such as inheritance, template specialization, object identifiers, and object references. Salad was developed to provide a bridge between the record oriented data modeling supported by Apache Avro and the Semantic Web.


Just some thoughts, so we don't forget them.

Suppose I'm reading this for the first time. When I see the sentence

Salad was developed to provide a bridge between the record-oriented data modeling supported by Apache Avro and the Semantic Web

I have questions: "What exactly is that bridge, and why do I need it? Where do I use record-oriented data, and how will that bridge make my life easier?" We need to make sure we answer these questions somewhere in the paper.

Also, based on this, the Record-Based Data Model has three subtypes:

Hierarchical Data Model - this looks like the best fit for JSON

Network Data Model

Relational Data Model

When we say record-oriented data modeling in the paper, do we mean all three subtypes or only the Hierarchical Data Model?

Why do people like the Hierarchical Data Model? Simplicity, data integrity, and readily available expertise. Why do people dislike it? Lack of standards (and here we are with Schema Salad), lack of a querying facility, and inflexibility.

In my opinion the README is not great. It does not explain in layman's terms why we have schema salad and why we need it.

You can argue the README is for experts. But I think the JOSS paper is an opportunity to explain things well.

michael-kotliar · 2021-06-27T19:05:35Z

paper/draft.md

+### Statement of need
+The JSON data model is a popular way to represent structured data. It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages. However, this simplicity comes at the cost that basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces.
+
+JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context". JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs but is not itself a schema language. Without a schema providing a well-defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct.


It seems like we should add some final sentence at the end giving the idea "And that's why we developed schema-salad". Otherwise, the thoughts go from JSON to JSON-LD and its limitations and then break off.
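
To make the "logically equivalent but structurally distinct" point concrete, here is a minimal sketch in plain Python. It is not a JSON-LD processor and does not use schema-salad. The `toy_expand` helper is hypothetical and only handles a flat `@context` with string term definitions and scalar values, and the `example.org` identifier and the `name` term are made up for illustration.

```python
# Two JSON-LD-style documents describing the same statement.
COMPACT = {
    "@context": {"name": "http://schema.org/name"},
    "@id": "http://example.org/people/alice",
    "name": "Alice",
}

EXPANDED = {
    "@id": "http://example.org/people/alice",
    "http://schema.org/name": [{"@value": "Alice"}],
}


def toy_expand(doc):
    """Expand term keys via a flat @context and wrap plain values.

    A deliberately tiny stand-in for JSON-LD expansion: it handles only
    string-to-IRI term definitions and scalar (or already wrapped) values.
    """
    context = doc.get("@context", {})
    out = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed, not copied
        if key.startswith("@"):
            out[key] = value  # keywords such as @id pass through
            continue
        iri = context.get(key, key)  # unknown keys are assumed to be IRIs
        values = value if isinstance(value, list) else [value]
        out[iri] = [v if isinstance(v, dict) else {"@value": v} for v in values]
    return out


if __name__ == "__main__":
    # Structurally different when compared as plain JSON...
    print(COMPACT == EXPANDED)
    # ...but identical once both are normalized to one form.
    print(toy_expand(COMPACT) == toy_expand(EXPANDED))
```

Code that reads `doc["name"]` directly works on the first form but not on the second. That is the difficulty with processing arbitrary JSON-LD as idiomatic JSON, and a schema that fixes one structure removes it.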

mr-c · 2021-09-15T16:08:00Z

Let's also mention the codegen feature, used to create https://github.com/common-workflow-lab/cwljava and https://github.com/common-workflow-language/cwl-utils/blob/main/cwl_utils/parser_v1_2.py

Star to collect materials for paper

a947776

michael-kotliar marked this pull request as draft June 22, 2021 14:14

michael-kotliar mentioned this pull request Jun 22, 2021

publish about schema_salad in JOSS #390

Open

mr-c requested a review from tetron June 22, 2021 15:15
