Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion on DataTypes #89

Closed
barbieri opened this issue Jul 14, 2021 · 15 comments
Closed

Discussion on DataTypes #89

barbieri opened this issue Jul 14, 2021 · 15 comments

Comments

@barbieri
Copy link
Contributor

Introduction

Currently the specification and tools support the traditional C-like datatypes such as uint8 and also some common complex types such as string and byteBuffer (blob), as well as array of these by adding [] suffix.

Demand/Problem

The vss-tools introduces a new, unspecified data type UNIX Timestamp: https://github.com/GENIVI/vss-tools/blob/master/vspec/model/constants.py#L117-L143 which defines a point in time since Jan 1st 1970 UTC. The PR #79 aims to add two more data-types common in a vehicle: DateTime (full time and date with timezone, as per ISO8601) and HSVColor (color defined in Hue-Saturation-Value, but could be in any format).

While primitive data types can be used to build any kind of data structure, it's common in every language to define higher-level types based on them, defining a formal structure and operations. For instance we could define DateTime as a simple string and document somewhere it's ISO8601, but that would leave users with the manual task to convert that to the native programming Date object or equivalent. Likewise we could define the color as 3-double and specify somewhere that it's a hue-saturation-value tuple in order, from 0.0-1.0

List of Data Types

The following table showcases possible Data Types in the VSS Context:

DataType Purpose in a Vehicle
Color ambient lights, cluster elements and many IVI customizations. Could be RGB, HSV, Lab...
DateTime with timezone time synchronization, alarms/scheduling, calendar, key element of logging. Time zone is important, particularly in communications since remote peers may be in another zone
GPS Latitude, Longitude and Altitude, eventually Heading - Used by navigation, location synchronization and even part of logging "where did it happen?"

These are all multi-dimensional data, some are not a simple array of the same primitive data type (ie: DateTime).

➡️ NOTE: This list is an initial set of data types we've reached in our use cases. Please list more in the comments

Approaches

There are different approaches to allow these:

  1. allow a name to be defined and used, which its specification being done elsewhere. This is the case with GraphQL Scalars where the schema only define its name scalar SomeName, the actual specification is done elsewhere. Since it's just a symbol registry, it's easy to implement, yet it is an improvement over the simple documentation, which would require manual mapping of the basic type to something else.
  2. introduce some structure definition, this is a superset of the name registry as it still keeps the valid symbols, but formally defines how to parse the contents. This is done by every data serialization platform such as Protobuf, FrancaIDL - 5.1.5 Structures and others.

Discussion

Please leave a comment with your point and rationale. These will be discussed in the VSS weekly meeting.

Core Question: Should VSS allow for custom/private data types or they all must be specified and present in the vspec/model/constants.py?

If to allow custom/private data types:

  • Is a simple name registry enough (approach 1 above, similar to GraphQL scalars)?
  • Or do we need a structure definition infrastructure similar to Franca/Protobuf (approach 2)?
  • Or any other approach?

If they all must be specified and in the vspec/model/constants.py: Will complex data types be ever allowed or we should stick with primitive data types only?

  • If not allowed, just primitive data types: what's the recommendation to "specify" them? (Ab)Use unit? Just write in the description?
  • If allowed, let's define a process documenting the rules (self-check, submission, approval rules)
@barbieri
Copy link
Contributor Author

My position on the matter: they should be all specified in the VSS, allowing complex data types given they are of general use.

The process could be based on a PR adding the new type to the enumeration and documentation (until it's auto-generated), the proposal should be clear on the usage, the data type should be general and point to at least 2-3 widely know languages that provide such data type to showcase this.

For instance, instead of DateTime without timezone, we use the version with that. So it covers all use cases and we can avoid introducing types for just that (ie: local datetime, utc date time, datetime with timezone).

Same for Color, we may have various colorspaces, all with each details. We should pick one and block others from being merged. However this requires the types to be well-though, say we pick RGB, instead of a simple triad of uint8, go with double from 0.0-1.0 as it will not be bound to 24 bits precision (which may be low in some rare circumstances). OR leave the type as an open concept and each platform defines its own needs and encoding.

@Floix
Copy link

Floix commented Jul 14, 2021

Thanks for this great explanation.

I wonder why we need to make very specific datatypes (GPS, Color) first class citizens of our datatype system. So far this is mostly primitives (except for UNIX Timestamp which we should maybe migrate to Uint64?). The beauty of the VSS model is, that you can bascially create an reuse any sort of type yourself by introducing sub trees, which you can reuse on certain points.

Taking the example of the color, which may consist of Hue, Saturation and Value, you could just create a file color.vspec, with a color branch on top, and three leaf nodes (Hue, Sat, Val) each being a Uint8 datatype, right?

I would keep the meta-model (type, datatypes, etc.) as simple as possible and try to introduce complex structures within the spec. Just my five cents.

@barbieri
Copy link
Contributor Author

@Floix I believe that's also @gunnarx PoV (from the calls, but let's wait for his comment here).

We started just like you said, for some (ie: GPS) we still have that approach. However for others it became really cumbersome to use, it becomes clear when you write the apps and server. In particular the date with all of its components, but also, the color components are basically meaningless on their own.

It's also the reason why GraphQL offered the custom scalars in addition to their own basic types (they also have branches -- called types -- and primitive types -- called scalars), but still every public schema I've seen define DateTime and many define types such as URL (which is a string, however in a particular format and use) and those dealing with User Interface often provide a Color (usually a string in #rrggbb format).

In our case, for instance, given how cumbersome it was to use the split components, shall these data types not be accepted, we'll likely encode them as strings and document that, ie datatype: string, description: ISO8601 encoded date & time; datatype: string, description: hsv(...) (or just RGB).

@SebastianSchildt
Copy link
Collaborator

I strongly believe we should keep a small set of core datatypes.

We should not

  • allow any kind of compound datatypes
  • adding some very specific datapoints
  • add an "easy mechanism" to add your own

Rationale: Every datatype means some special handling in almost all tooling (Code generators, VISS servers), and we would quickly move away form being "simple". (You can always model via branches,if you really need to, e.g. see GPS)

That being said, we might discuss careful extensions of the datamodel, where I feel "URL" or "timestamps" might be discussed (event though I do not think those are needed), but HSVColor would really be out of scope unhelpful and way too specific.

I am not sure (this might be not clever), whether there should be "best" (worst?) practices where we say: Hey a "string" can contain any json object if you like, or alternatively whether we should introduce some kind of "binary"/"blob" datatype (like base64 in XML, in those CDATA sections). (We do have in our documentation a byteBuffer thingy, I do not think it has ever been used?)
While that would not be expressive in VSS directly, at least it would give the requirements to tools, such as VISS servers, Code generators to "deal with it", so it would become more obvious how to get "arbitrarily structured" data form a single VSS sensors/actor across the VSS landscape into an application.

@gunnarx
Copy link

gunnarx commented Jul 20, 2021

A few replies:

In particular the date with all of its components

A perfect example. Even with the features discussed below (aggregate) I propose to make "intelligent" definitions in any standard catalog. DateTime seems like it should perhaps not be split into components/leaves because it is a clearly aggregate type, and it seems very rare to need only one part of it.

(except for UNIX Timestamp which we should maybe migrate to Uint64?).

I think dropping the UNIX timestamp data type from VSS make sense. The main question is then if it's worth replacing with a better DateTime data type or simply propose it to have the basic type "string" (+ description field, which should specify the format of the string).

Today we had discussion on the "aggregate" keyword (which is already documented). It should suggest to "writers" that an update should be atomic, and suggests to readers they may want to fetch this whole branch at once. For both cases this is to avoid any resulting race condition by unsynchronized updates to different components.
(For example you cannot reliably write or read seconds and minutes separately in a time stamp).

Aggregate suggests that the branch can be interpreted as a structure containing its leaves (and/or subbranches).

Example (possible solution for HSVColor, but in each case consider if breaking it down to components really is worth it).

x.y.z.color:
     type: branch
-->  aggregate: true  <------

x.y.z.color.hue:
     type: double
     description:  Hue value in degrees, 0.0 <= n < 360.0
     
x.y.z.color.saturation:
     type: double
     description: Saturation in percent from 0.0 to 100.0
     
x.y.z.color.value:
     type: double
     description: Brightness value in percent from 0.0 to 100.0 

Another option is to keep it "untyped" and use a fixed-size array:

x.y.z.color:
     type: double[]
     arraysize: 3
     description:  Color in HSV format.   First component is Hue (0.0 <= n < 360.0), second is Saturation (0.0 <= n <= 100.0), third is Value/Brightness (0.0 <= n <= 100.0)

Next up is for everyone to consider if aggregate is an acceptable solution to the fundamental problem, and if we say yes, update its documentation, and make sure it is supported in the tools.

@erikbosch
Copy link
Collaborator

Concerning unix timestamp -we actually do not use it in VSS even if it is an allowed type. The only signal referring to timestamp (Vehicle.Powertrain.Battery.Charging.Timer.Time) actually use a uint32 as type.

@barbieri
Copy link
Contributor Author

I wasn't aware of aggregate: true, it may indeed help.

As said in the call, it would be interesting if I can get from vss-tools if the branch is being reused... so if I declare a color.vspec and use it in 3 different places, I can get that information and avoid creating 3 different types on my side... right now we just expand the types names based on their paths (ie: Vehicle_Cabin_...._Color_Hue), in that case I could declare a shared/color.vspec and detect that, generating Shared_Color instead

@crea7or
Copy link
Contributor

crea7or commented Sep 7, 2021

While moving our structures to VSS/VISS we have faced the same problem - we need some kind of aggregate types. Currently the both ways have drawbacks.

x.y.z.color:
  type: branch
  aggregate: true
 
x.y.z.color.hue:
     type: double
     description:  Hue value in degrees, 0.0 <= n < 360.0
      
x.y.z.color.saturation:
     type: double
     description: Saturation in percent from 0.0 to 100.0
      
x.y.z.color.value:
     type: double
     description: Brightness value in percent from 0.0 to 100.0 

This is good variant, but how to handle subscription? Client should receive one message when components saturation & hue will be updated, not the two messages when first updated saturation and hue in the next message. One possible solution is multi Set that allow to set different values with different types in the one message and subscribe notificaiton should be deffered until all valuse is updated by some agregate flag.

x.y.z.color:
     type: double[]
     arraysize: 3
     description:  Color in HSV format.   First component is Hue (0.0 <= n < 360.0), second is Saturation (0.0 <= n <= 100.0), third is Value/Brightness (0.0 <= n <= 100.0)

This variand can be done with a few easy modifications, but syntaxt to set an array should be described in the documentation in this case and this case does not handle valus with different types i.e. named set.

@barbieri
Copy link
Contributor Author

barbieri commented Sep 8, 2021

@crea7or the array version is only usable when all the components are of the same data type, which may not be the case for every thing (ie: if you have the GPS position + timestamp, the timestamp would have to be forced as a double)

@crea7or
Copy link
Contributor

crea7or commented Sep 9, 2021

@barbieri yep and this is why the first idea is better. Right now I have done "CAN frame" processing and noticed that I should send all signals(from frame) to VISS one-by-one since set doesn't allow to set different values in multipath request. Agregate in a manner of message can be useful when we'll be able to set all signals in the one message. Not only for CAN signals, but any other structs like health data that can be complex and require to set all containing values before triggering subscription notification.

@danielwilms
Copy link
Contributor

@danielwilms, summarise and create new issue, if necessary

@Mack-y
Copy link

Mack-y commented Sep 23, 2021

hello,I define a node with the datatype of bytebuffer. But when I parse the vspec file by vspec2yaml.py,it shows the error ---》KeyError: 'bytebuffer'
I checked the ‘vspec/model/constants.py’ that there’s no datatype of ‘bytebuffer’ or ‘byteBuffer’...
But byteBuffer indeed exists in the specification.
How could use the vss tool to parse the vspec file?

@crea7or
Copy link
Contributor

crea7or commented Sep 23, 2021

hello,I define a node with the datatype of bytebuffer. But when I parse the vspec file by vspec2yaml.py,it shows the error ---》KeyError: 'bytebuffer'
I checked the ‘vspec/model/constants.py’ that there’s no datatype of ‘bytebuffer’ or ‘byteBuffer’...
But byteBuffer indeed exists in the specification.
How could use the vss tool to parse the vspec file?

It's better to create a separate issue.

@Mack-y
Copy link

Mack-y commented Sep 23, 2021

hello,I define a node with the datatype of bytebuffer. But when I parse the vspec file by vspec2yaml.py,it shows the error ---》KeyError: 'bytebuffer'
I checked the ‘vspec/model/constants.py’ that there’s no datatype of ‘bytebuffer’ or ‘byteBuffer’...
But byteBuffer indeed exists in the specification.
How could use the vss tool to parse the vspec file?

It's better to create a separate issue.

ok,I‘ve created a new issue

@erikbosch
Copy link
Collaborator

I am doing some cleanup closing all issues created 2021 or earlier where there have been no activity 2023 or later. If you still consider this issue relevant feel free to reopen it add and add a comment on how you would like to see progress on this one.

@erikbosch erikbosch closed this as not planned Won't fix, can't repro, duplicate, stale Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants