Unicode Separated Values support #31

Open
domel opened this issue Mar 13, 2024 · 4 comments
Labels
spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial

Comments

domel commented Mar 13, 2024

I propose an enhancement to how quoted triples/triple terms are currently handled in our data representation, specifically regarding their integration into the CSV and TSV formats. As you are aware, both CSV and TSV are flat data formats that are poorly suited to the nested nature of quoted triples/triple terms. This limitation makes it hard to represent hierarchical data structures in tabular form, which is a crucial requirement for various data exchange and processing scenarios.

Quoted triples/triple terms, given their nested character, require a more flexible and inherently hierarchical format to be represented efficiently in a tabular manner. To address this, I propose the adoption of Unicode Separated Values (USV) as a new result format for handling such cases.

Unicode Separated Values (USV) is a data format (the IETF is currently working on the appropriate RFC) designed for exchanging and converting data between various spreadsheet programs, databases, and streaming data services. The key advantage of USV over traditional flat formats like CSV and TSV is its ability to define groups, which can more naturally represent the structure of quoted triples/triple terms within a tabular context. This enhancement would facilitate a more intuitive and effective method of data representation and exchange, especially in applications involving complex graph-based data structures.
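To illustrate the grouping idea, here is a minimal Python sketch of reading units, records, and groups. It assumes, based on my reading of the draft, that USV marks them with the Unicode control-picture characters U+241F (unit), U+241E (record), and U+241D (group), and that each marker terminates the item before it. Escape handling is left out, and the function names are hypothetical.

```python
# Assumed USV markers (control-picture symbols); check against the current draft.
US = "\u241F"  # unit separator symbol
RS = "\u241E"  # record separator symbol
GS = "\u241D"  # group separator symbol


def _split_terminated(text, mark):
    """Split on a marker and drop the empty piece left by a trailing marker."""
    parts = text.split(mark)
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def parse_usv_groups(text):
    """Return a list of groups; each group is a list of records (lists of units)."""
    return [
        [_split_terminated(record, US) for record in _split_terminated(group, RS)]
        for group in _split_terminated(text, GS)
    ]


# Two groups: the first holds two records, the second holds one.
sample = f"a{US}b{US}{RS}c{US}d{US}{RS}{GS}e{US}f{US}{RS}{GS}"
groups = parse_usv_groups(sample)
```

A nested value such as a triple term could then occupy a record or group of its own instead of being squeezed into a single quoted cell. How the nesting should map onto the markers is exactly what a result-format definition would have to specify.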

See also:

domel added the spec:enhancement label Mar 13, 2024
Author

domel commented Mar 13, 2024

For example:

x<US>quoted<ESC>
"Alice"<US><RS>http://example/alice<US>http://example/knows<US>http://example/bob<RS><ESC>
"Bob"<US><RS>http://example/bob<US>http://example/knows<US>http://example/alice<RS><ESC>
"Carol"<US><RS>http://example/carol<US>http://example/says<US>""Hello world, my name is """"Alice"""".""<RS><ESC>

Contributor

afs commented Mar 15, 2024

It looks like a useful format for machine-to-machine transmission of results where significant literal strings are involved. However, it's still a draft. What is the uptake?

An approach could be to define an abstraction of resultset-vars-rows-cells, then map this to CSV, TSV, USV and other realisation formats (not affecting JSON and XML, because they already exist and compatibility matters).
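A rough Python sketch of what such an abstraction might look like: one `ResultSet` of variables and rows, with separate functions that realise it as CSV, TSV, or USV. All names here are hypothetical. Cells are treated as already-serialised strings, the CSV/TSV quoting is whatever Python's `csv` module does rather than the SPARQL result-format rules, and the USV markers (U+241F, U+241E, used as terminators) are an assumption taken from the draft.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ResultSet:
    """Abstract result set: variable names plus rows of cells."""
    variables: list[str]
    rows: list[list[str]]


def to_delimited(rs, delimiter):
    """Realise the result set as CSV (',') or TSV ('\\t') via the csv module."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(rs.variables)
    writer.writerows(rs.rows)
    return buf.getvalue()


def to_usv(rs):
    """Realise the result set with the assumed USV unit/record markers."""
    unit_mark, record_mark = "\u241F", "\u241E"

    def record(cells):
        return "".join(cell + unit_mark for cell in cells) + record_mark

    return record(rs.variables) + "".join(record(row) for row in rs.rows)


rs = ResultSet(["x", "name"], [["http://example/alice", "Alice"]])
as_csv = to_delimited(rs, ",")
as_tsv = to_delimited(rs, "\t")
as_usv = to_usv(rs)
```

The point of the split is that a new realisation format only needs one more function over the same `ResultSet`, while the existing JSON and XML formats stay untouched.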

This would also be a framework for binary forms, e.g. protobuf, which is significantly faster to process due to the length indicators on strings.
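To make the length-indicator point concrete, here is a small Python sketch of a length-prefixed cell encoding. It is not protobuf's wire format (protobuf uses variable-length integers for lengths); it assumes a fixed 4-byte big-endian length before each UTF-8 string, and the function names are hypothetical.

```python
import struct


def encode_cells(cells):
    """Encode each cell as a 4-byte big-endian length followed by UTF-8 bytes."""
    out = bytearray()
    for cell in cells:
        data = cell.encode("utf-8")
        out += struct.pack(">I", len(data))
        out += data
    return bytes(out)


def decode_cells(payload):
    """Decode by reading a length, then slicing exactly that many bytes."""
    cells = []
    offset = 0
    while offset < len(payload):
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        cells.append(payload[offset:offset + length].decode("utf-8"))
        offset += length
    return cells


row = ["http://example/carol", 'Hello world, my name is "Alice".\nSecond line']
round_tripped = decode_cells(encode_cells(row))
```

The decoder never scans the string content for delimiters or escape sequences, so quotes, commas, and newlines inside literals need no special handling.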

Author

domel commented Mar 16, 2024

The concept you've described, particularly focusing on the abstraction of resultset-vars-rows-cells for mapping to various realization formats like CSV, TSV, and USV, and its potential application to binary forms such as protobuf, presents a forward-thinking approach to data serialization and interchange. The idea of establishing a high-level abstraction for data representation that could seamlessly adapt to multiple formats is indeed promising.

Author

domel commented Mar 16, 2024

USV seems to be quite popular (for its age), as do its other variants such as ASCII Separated Values (ASV), a.k.a. DEL (Delimited ASCII).

2 participants