[Spec] Token definitions #70

hexabits · 2018-05-15T14:00:28Z

Tokens could be defined directly in nif.xml. This would simplify writing and parsing expressions and composing strings in various attributes, while improving readability and adding context.

Right now the logical and arithmetic operators are used in nif.xml without any description. Also, the expressions used in vercond can be very complex to write and even harder to decipher. They are also often repeated, as in the case of custom versions (Bethesda and others). Tokens defined by the XML itself would give these expressions context and readability. Tokenizing repeated expressions would also improve maintainability, since each expression string would live in only one location.

Tokenizing operators would also alleviate issues with parsing, reading, and writing (typing) characters that are written as XML entities, such as &, >, and <. It would also remove the ambiguity between &/&& and |/||, which parsers currently have to handle specially.

Example:

<operator token="#ADD#" string="+" />
<operator token="#SUB#" string="-" />
<operator token="#MUL#" string="*" />
<operator token="#DIV#" string="/" />
<operator token="#AND#" string="&amp;&amp;" />
<operator token="#OR#" string="||" />
<operator token="#LT#" string="&lt;" />
<operator token="#GT#" string="&gt;" />
<operator token="#LTE#" string="&lt;=" />
<operator token="#GTE#" string="&gt;=" />
<operator token="#EQ#" string="==" />
<operator token="#NEQ#" string="!=" />
<operator token="#BITAND#" string="&amp;" />
<operator token="#BITOR#" string="|" />

<!-- Note: The strings for expression tokens may need to use the operator tokens. -->
<expression token="#BS202#" string="((Version == 20.2.0.7) &amp;&amp; (User Version 2 &gt; 0))" />
<expression token="#BSSTREAM#" string="(User Version 2 &gt; 0)" />
<expression token="#NISTREAM#" string="(User Version 2 == 0)" />
<expression token="#DIVINITY2#" string="((User Version == 0x20000) || (User Version == 0x30000))" />
<expression token="#SSE#" string="(User Version 2 == 100)" />
<expression token="#FO4#" string="(User Version 2 == 130)" />

Parsers would have two main ways of dealing with tokens: as first-class entities (ignoring the string attribute and handling the tokens directly), or as second-class entities (replacing each token with its string contents and handling the resulting strings as before).

Operators are best dealt with as first-class; otherwise you eliminate the benefits of tokens for both regex and non-regex parsing. Other token types might be best dealt with as basic string replacement.
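
To make the two approaches concrete, here is a minimal Python sketch. The token tables follow the examples above, except that the expression strings are rewritten to use operator tokens, which is an assumption about how they might eventually be written. The helper names `expand` and `split_tokens`, and the `#[A-Z0-9_]+#` token pattern, are hypothetical and not taken from any existing parser.

```python
import re

# Assumed token tables, modeled on the <operator> and <expression> examples.
OPERATORS = {"#AND#": "&&", "#OR#": "||", "#GT#": ">", "#EQ#": "=="}
EXPRESSIONS = {
    "#BSSTREAM#": "(User Version 2 #GT# 0)",
    "#SSE#": "(User Version 2 #EQ# 100)",
}

def expand(text, *tables):
    """Second-class handling: replace tokens with their strings, table by table."""
    for table in tables:
        for token, string in table.items():
            text = text.replace(token, string)
    return text

def split_tokens(text):
    """First-class handling: keep tokens as units instead of expanding them."""
    parts = re.split(r"(#[A-Z0-9_]+#)", text)
    return [part.strip() for part in parts if part.strip()]

vercond = "#BSSTREAM# #AND# #SSE#"
as_string = expand(vercond, EXPRESSIONS, OPERATORS)
as_tokens = split_tokens(vercond)
```

With these tables, `as_string` is the familiar `(User Version 2 > 0) && (User Version 2 == 100)`, while `as_tokens` is a list of three token names that a parser could map straight to its own operator and condition objects.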

Other potential usages:

Commonly used default values

<default token="#FLT_MAX#" string="3.402823466e+38" />
<default token="#FLT_MIN#" string="-3.402823466e+38" />
<default token="#INV_FLT#" string="-3.402823466e+38" />
<default token="#INV_VEC3#" string="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
<default token="#INV_VEC4#" string="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />

<!-- BEFORE -->
<niobject name="NiConstColorEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Color4" default="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
</niobject>

<niobject name="NiConstFloatEvaluator" inherit="NiEvaluator">
    <add name="Value" type="float" default="-3.402823466e+38" />
</niobject>

<niobject name="NiConstPoint3Evaluator" inherit="NiEvaluator">
    <add name="Value" type="Vector3" default="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
</niobject>

<niobject name="NiConstQuaternionEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Quaternion" default="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
</niobject>

<niobject name="NiBSplineEvaluator" inherit="NiEvaluator">
    <add name="Start Time" type="float" default="3.402823466e+38" />
    <add name="End Time" type="float" default="-3.402823466e+38" />
    <add name="Data" type="Ref" template="NiBSplineData" />
    <add name="Basis Data" type="Ref" template="NiBSplineBasisData" />
</niobject>

<!-- AFTER -->
<niobject name="NiConstColorEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Color4" default="#INV_VEC4#" />
</niobject>

<niobject name="NiConstFloatEvaluator" inherit="NiEvaluator">
    <add name="Value" type="float" default="#INV_FLT#" />
</niobject>

<niobject name="NiConstPoint3Evaluator" inherit="NiEvaluator">
    <add name="Value" type="Vector3" default="#INV_VEC3#" />
</niobject>

<niobject name="NiConstQuaternionEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Quaternion" default="#INV_VEC4#" />
</niobject>

<niobject name="NiBSplineEvaluator" inherit="NiEvaluator">
    <add name="Start Time" type="float" default="#FLT_MAX#" />
    <add name="End Time" type="float" default="#FLT_MIN#" />
    <add name="Data" type="Ref" template="NiBSplineData" />
    <add name="Basis Data" type="Ref" template="NiBSplineBasisData" />
</niobject>

Note: #FLT_MIN# and #INV_FLT# have the same string value but provide different context for the usage of the value. One is the negation of #FLT_MAX# regarding start/end time in a sequence, and the other is denoting an invalid uninitialized value.

Real world example: These defaults caused a compiler error in niflib when I accidentally included a trailing comma in one of them. This would not have happened if I had been using tokens.
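
As an illustration of why a single definition helps, the following Python sketch resolves a default through a token table and validates the components in one place. The table is a subset of the `<default>` examples above; `parse_default` is a hypothetical helper, and rejecting empty components is an assumed policy rather than anything niflib does.

```python
# Assumed subset of the <default> tokens defined above.
DEFAULT_TOKENS = {
    "#FLT_MAX#": "3.402823466e+38",
    "#INV_FLT#": "-3.402823466e+38",
    "#INV_VEC3#": "-3.402823466e+38, -3.402823466e+38, -3.402823466e+38",
}

def parse_default(attr):
    """Resolve a default attribute (token or literal) into a list of floats."""
    text = DEFAULT_TOKENS.get(attr, attr)
    parts = [part.strip() for part in text.split(",")]
    if any(part == "" for part in parts):
        # A trailing or doubled comma leaves an empty component behind.
        raise ValueError(f"empty component in default {attr!r}")
    return [float(part) for part in parts]

vec3 = parse_default("#INV_VEC3#")
```

A hand-typed literal such as `"1.0, 2.0,"` raises here, whereas a token only has to be typed correctly once.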

Forthcoming block versioning (will get separate ticket)

<!-- Note: Space-separated is the correct way to make lists in XML -->
<!-- Note: VXX_X_X_X is a unique ID for <version> being added to the spec, will receive ticket. -->
<versionset token="#BETHESDA#" string="V10_0_1_2 V10_1_0_101 V10_1_0_106 V10_2_0_0__10 V20_0_0_4__10 V20_0_0_4__11 V20_0_0_5_OBL V20_2_0_7__11_1 V20_2_0_7__11_2 V20_2_0_7__11_3 V20_2_0_7__11_4 V20_2_0_7__11_5 V20_2_0_7__11_6 V20_2_0_7__11_7 V20_2_0_7__11_8 V20_2_0_7_FO3 V20_2_0_7_SKY V20_2_0_7_SSE V20_2_0_7_FO4" />
<versionset token="#FO3#" string="V20_0_0_4__11 V20_2_0_7__11_1 V20_2_0_7__11_2 V20_2_0_7__11_3 V20_2_0_7__11_4 V20_2_0_7__11_5 V20_2_0_7__11_6 V20_2_0_7__11_7 V20_2_0_7__11_8 V20_2_0_7_FO3" />
<versionset token="#SSE#" string="V20_2_0_7_SSE" />
<versionset token="#FO4#" string="V20_2_0_7_FO4" />

<niobject name="bhkRefObject" versions="#BETHESDA#">
<niobject name="bhkSerializable" versions="#BETHESDA#">
<niobject name="bhkWorldObject" versions="#BETHESDA#">
<niobject name="bhkEntity" versions="#BETHESDA#">

<niobject name="bhkPoseArray" versions="#FO3#" />
<niobject name="bhkPhysicsSystem" versions="#FO4#" />

In brief, since it will be discussed in its own ticket: reintroducing block versioning has run into issues with the granularity of since/until. For custom blocks they are not precise enough and would require vercond expressions, which should be avoided at all costs. Instead, each version can be listed (<version> now being every distinct format, including user and Bethesda versions), but this gets extremely repetitious and needs a system of abbreviation.
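
A rough Python sketch of how a parser might expand a `versions` attribute is shown below. It assumes the attribute is a space-separated list that may mix versionset tokens with literal version IDs, which this ticket does not settle; `expand_versions` is a hypothetical name.

```python
# Assumed subset of the <versionset> tokens defined above.
VERSION_SETS = {
    "#SSE#": "V20_2_0_7_SSE",
    "#FO4#": "V20_2_0_7_FO4",
    "#FO3#": "V20_0_0_4__11 V20_2_0_7__11_1 V20_2_0_7_FO3",  # shortened for the sketch
}

def expand_versions(attr, version_sets):
    """Expand a space-separated versions attribute into a flat list of IDs."""
    ids = []
    for item in attr.split():
        # Tokens expand to their own space-separated list; literals pass through.
        ids.extend(version_sets.get(item, item).split())
    return ids

fo4_only = expand_versions("#FO4#", VERSION_SETS)
mixed = expand_versions("#SSE# #FO4# V20_2_0_7_SKY", VERSION_SETS)
```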

For Discussion

Delimiter

Is # the best delimiter? There is also @. Neither has any use in logic/arithmetic, and neither occurs commonly in the strings we use in nif.xml. This is even less of an issue since the delimiter goes at each end, which I think is required to keep one token from containing another; e.g. with only a leading delimiter, #LT and #LTE collide.
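
The collision can be demonstrated with a naive replacement loop. This Python sketch is only an illustration; `naive_replace` and the two token tables are made up for the comparison.

```python
def naive_replace(text, tokens):
    """Replace every token with its string, in table order."""
    for token, string in tokens.items():
        text = text.replace(token, string)
    return text

# Leading delimiter only: "#LT" is a prefix of "#LTE".
prefix_only = {"#LT": "<", "#LTE": "<="}
# Delimiter at both ends: neither token contains the other.
both_ends = {"#LT#": "<", "#LTE#": "<="}

broken = naive_replace("A #LTE B", prefix_only)
correct = naive_replace("A #LTE# B", both_ends)
```

In the first case `#LT` matches inside `#LTE` and leaves a stray `E` behind. The closing delimiter removes the dependence on replacement order.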

Organization

Do we nest all token elements under a <tokens> element, or do we leave them flat alongside <version>, <basic>, etc.? I feel grouping them is important, as it ensures an XML parser can grab every token at once without needing to know tag names.
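
As a sketch of the grouped layout, the following Python uses the standard library's `xml.etree.ElementTree` to collect every token under a `<tokens>` element without naming any tag. The sample document, including the `niftoolsxml` root and the `id` attribute on `<version>`, is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout with all token elements nested under <tokens>.
SAMPLE = """
<niftoolsxml>
    <tokens>
        <operator token="#LT#" string="&lt;" />
        <default token="#FLT_MAX#" string="3.402823466e+38" />
        <versionset token="#FO4#" string="V20_2_0_7_FO4" />
    </tokens>
    <version id="V20_2_0_7_FO4" />
</niftoolsxml>
"""

root = ET.fromstring(SAMPLE)
tokens = {}
for elem in root.find("tokens"):
    # The tag name becomes the context key; no tag names are hard-coded.
    tokens.setdefault(elem.tag, {})[elem.get("token")] = elem.get("string")
```

With a flat layout, the same loop would need a list of known token tag names to tell tokens apart from `<version>`, `<basic>`, and the rest.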

Element tag names

Do we use multiple tag names to specify/limit the token usage, i.e. contexts?

Contexts

Separate tag names and contexts would require separate maps/dictionaries on the parser side. There are already 4 example contexts (operator, expression, default, versionset). Currently the example specification does not include any built-in way of specifying which tags and attributes a token's context is limited to, so the association from tag name to the attributes parsed with it would have to be manual.

Reusing token identifiers in differing contexts

Note that I use #FO4# and #SSE# in both <versionset> and <expression>. Given that these tags are used in completely different domains, is this OK? Reuse or not, contexts mean that technically a parser can no longer do a naive search/replace on the entire file in memory before reading the XML in, because regardless of identifier collisions, a search/replace violates the restriction of a token tag to specific element attributes.
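
A minimal Python sketch of context-limited replacement follows. The per-context tables reuse `#FO4#` as in the examples above; the mapping from attribute names to contexts is the manual association mentioned under Contexts, and both it and `expand_attr` are assumptions for illustration.

```python
# One token table per context; "#FO4#" is deliberately reused.
TOKENS_BY_CONTEXT = {
    "expression": {"#FO4#": "(User Version 2 == 130)"},
    "versionset": {"#FO4#": "V20_2_0_7_FO4"},
}

# Manual, assumed association of attribute name -> contexts allowed in it.
CONTEXTS_BY_ATTR = {
    "vercond": ["expression"],
    "versions": ["versionset"],
}

def expand_attr(attr_name, value):
    """Expand tokens only from the contexts permitted for this attribute."""
    for context in CONTEXTS_BY_ATTR.get(attr_name, []):
        for token, string in TOKENS_BY_CONTEXT[context].items():
            value = value.replace(token, string)
    return value

as_vercond = expand_attr("vercond", "#FO4#")
as_versions = expand_attr("versions", "#FO4#")
untouched = expand_attr("name", "#FO4#")
```

The same identifier resolves differently per attribute, and attributes with no associated context are left alone, which a whole-file search/replace cannot do.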

Token string attributes

Expression token strings will likely need to use the operator tokens, because first-class token parsers would no longer know about &&, et al.

Also, if a parser is treating tokens as second-class, order matters: all <operator> tags need to be read in first. Then, as each <expression> tag is read in, the parser replaces the operator tokens in its string with the operator strings.
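
The read-order dependency can be sketched in Python as follows. The sample XML assumes the expression string is already written with operator tokens, and `load_tokens` is a hypothetical loader that expands each expression as it is read.

```python
import xml.etree.ElementTree as ET

# Assumed sample: operators are declared before the expression that uses them.
SAMPLE = """
<tokens>
    <operator token="#AND#" string="&amp;&amp;" />
    <operator token="#GT#" string="&gt;" />
    <operator token="#EQ#" string="==" />
    <expression token="#BS202#" string="((Version #EQ# 20.2.0.7) #AND# (User Version 2 #GT# 0))" />
</tokens>
"""

def load_tokens(xml_text):
    """Read tokens in document order, expanding expressions as they appear."""
    operators, expressions = {}, {}
    for elem in ET.fromstring(xml_text):
        token, string = elem.get("token"), elem.get("string")
        if elem.tag == "operator":
            operators[token] = string
        elif elem.tag == "expression":
            # Only operators read so far can be substituted here.
            for op_token, op_string in operators.items():
                string = string.replace(op_token, op_string)
            expressions[token] = string
    return operators, expressions

operators, expressions = load_tokens(SAMPLE)
```

If an `<expression>` appeared before the `<operator>` elements it depends on, this loader would leave its operator tokens unexpanded.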

Tokens are grouped by name and associated with a list of attributes. Only listed attributes should be processed for tokens of that name. Order of token groups matters. Token strings can themselves include tokens, so any token group used in other tokens needs to come after.

For niftools#70. Have not yet decided on the non-escaped entities such as `==`, `|`, `||`, etc., as at least `==` is much more readable than `#EQ#`. Replaced all Version, User Version, etc. with their global tokens. Also cleaned up NiTexturingProperty in the process.

For niftools#70 and niftools#76. By adopting the syntax of the other tokens, TEMPLATE and ARG can now be treated the same as other tokens and differentiated from regular identifiers in a generic manner.

For niftools#76. `calc` does not need to be supported and is only used for pre-serialization preparation of the data. This can also be done by hand. Some revisions also need to be made for niftools#70 and niftools#73 to account for another attribute with its own expression grammar and tokens. However, the tokens introduced for `calc` will not be added to the tokens in the XML and will need to be explicitly supported if supporting `calc`. They are:

- 1D array size (uses regex): `#LEN[(.*?)]#`
- 2D array size (uses regex): `#LEN2[(.*?)]#`
- Ternary `?`: `#THEN#`
- Ternary `:`: `#ELSE#`

Also removes any unnecessary `calculated` without replacement. Additionally, `calc` is used to limit array size for the time being, though it's possible this should be done with its own attribute, such as `maxlen`.
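
A parser that chooses to support `calc` could recognize these tokens along the following lines. This Python sketch assumes the square brackets in the patterns above are meant literally, so they are escaped in the regular expressions. The field names, the `len`/`len2` output spelling, and the `translate_calc` helper are hypothetical.

```python
import re

# Literal brackets must be escaped, otherwise [...] is read as a character class.
LEN_1D = re.compile(r"#LEN\[(.*?)\]#")
LEN_2D = re.compile(r"#LEN2\[(.*?)\]#")

def translate_calc(calc):
    """Rewrite calc tokens into a C-like expression string (illustrative only)."""
    calc = LEN_2D.sub(r"len2(\1)", calc)
    calc = LEN_1D.sub(r"len(\1)", calc)
    return calc.replace("#THEN#", "?").replace("#ELSE#", ":")

names_1d = LEN_1D.findall("#LEN[Strings]#")
names_2d = LEN_2D.findall("#LEN2[Weights]#")
ternary = translate_calc("#LEN[Strings]# #THEN# 1 #ELSE# 0")
```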

For niftools#70 and niftools#76. Added tokens for range attribute strings. Did an initial pass of ranges, many taken from Bethesda's official export scripts for FO4, which limit the values in the UI. Have also begun using data analysis to find implied min/max ranges (as well as sane defaults).

This was referenced May 15, 2018

[Spec] Block Versioning #71

Open

[Spec] Expression grammar formalization #73

Open

This was referenced May 31, 2018

[Spec] nifxml linter #74

Open

nifxml spec formalization #75

Merged

hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018

Bethesda block versioning

fd66c97

Using tokens from niftools#70 it is now feasible to version the Bethesda blocks.

hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018

Use tokens for defaults

30010df

For niftools#70

hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018

Use tokens for verconds

d886631

For niftools#70

hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018

Add versions to a few Bethesda-only compounds

f023ad3

Follow-up to fd66c97 for niftools#70

hexabits mentioned this issue Jun 7, 2018

[Spec] Additions, Removals, and Deprecations #76

Open

18 tasks

hexabits added a commit to hexabits/nifxml that referenced this issue Jun 14, 2018

RSH and LSH operators

331d2c7

For niftools#70 Their use was discussed as being useful for shifting and masking values before sending as an ARG.

hexabits added a commit to hexabits/nifxml that referenced this issue Aug 13, 2018

Add missing token for aa05717

a8a425b

For niftools#70 and niftools#76
