[Spec] Token definitions #70

hexabits opened this issue May 15, 2018 · 0 comments

Tokens could be defined directly in nif.xml. This would simplify writing and parsing expressions and composing strings in various attributes, while improving readability and adding context.

Right now the logical and arithmetic operators are used in nif.xml without any description. Also, various expressions used in vercond can be very complex to write and hard to decipher. They are also used repeatedly, as with the custom versions (Bethesda and others). Tokens defined by the XML itself would give these expressions context and readability. Tokenizing the repeated expressions would also improve maintainability, as each expression string would live in only one location.

Tokenizing operators would also alleviate issues with parsing, reading, and writing (typing) XML entities such as &amp;, &gt;, and &lt;. It would also remove the ambiguity between &/&& and |/||, which parsers currently have to handle as a special case.

Example:

<operator token="#ADD#" string="+" />
<operator token="#SUB#" string="-" />
<operator token="#MUL#" string="*" />
<operator token="#DIV#" string="/" />
<operator token="#AND#" string="&amp;&amp;" />
<operator token="#OR#" string="||" />
<operator token="#LT#" string="&lt;" />
<operator token="#GT#" string="&gt;" />
<operator token="#LTE#" string="&lt;=" />
<operator token="#GTE#" string="&gt;=" />
<operator token="#EQ#" string="==" />
<operator token="#NEQ#" string="!=" />
<operator token="#BITAND#" string="&amp;" />
<operator token="#BITOR#" string="|" />

<!-- Note: The strings for expression tokens may need to use the operator tokens. -->
<expression token="#BS202#" string="((Version == 20.2.0.7) &amp;&amp; (User Version 2 &gt; 0))" />
<expression token="#BSSTREAM#" string="(User Version 2 &gt; 0)" />
<expression token="#NISTREAM#" string="(User Version 2 == 0)" />
<expression token="#DIVINITY2#" string="((User Version == 0x20000) || (User Version == 0x30000))" />
<expression token="#SSE#" string="(User Version 2 == 100)" />
<expression token="#FO4#" string="(User Version 2 == 130)" />

Parsers would have two main ways of dealing with tokens: as first-class entities (ignoring the string attribute and dealing with the tokens directly), or as second-class entities (replacing each token with its string contents and dealing with the strings as before).

Operators are best dealt with as first-class, otherwise you eliminate the benefits of tokens for both regex and non-regex parsing. Other token types might be best dealt with as basic string replacement.
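To make the two approaches concrete, here is a minimal Python sketch. It is only an illustration, not part of the proposal: the wrapping `<tokens>` root, the `FIRST_CLASS` table, and the helper names are assumptions, and only a subset of the operators is shown.

```python
import operator
import xml.etree.ElementTree as ET

# Subset of the operator definitions from the example above.
# The <tokens> wrapper is an assumption made so the snippet parses on its own.
TOKENS_XML = """
<tokens>
    <operator token="#ADD#" string="+" />
    <operator token="#AND#" string="&amp;&amp;" />
    <operator token="#LT#" string="&lt;" />
    <operator token="#LTE#" string="&lt;=" />
    <operator token="#BITAND#" string="&amp;" />
</tokens>
"""

# First-class: the parser maps each token straight to an operation and
# never looks at the string attribute. This table is hand-written.
FIRST_CLASS = {
    "#ADD#": operator.add,
    "#AND#": lambda a, b: bool(a) and bool(b),
    "#LT#": operator.lt,
    "#LTE#": operator.le,
    "#BITAND#": operator.and_,
}


def load_operator_strings(xml_text):
    """Second-class: collect token -> string for plain text substitution."""
    root = ET.fromstring(xml_text)
    return {el.get("token"): el.get("string") for el in root.iter("operator")}


def substitute(expr, table):
    """Replace every token in expr with its string."""
    for token, string in table.items():
        expr = expr.replace(token, string)
    return expr


strings = load_operator_strings(TOKENS_XML)
legacy = substitute("(A #LTE# 5) #AND# (B #BITAND# 1)", strings)
print(legacy)  # (A <= 5) && (B & 1)
```

With the first-class table, `#AND#` and `#BITAND#` are distinct keys, so the parser never has to guess whether `&` means logical or bitwise AND. The second-class path hands the old ambiguity back to whatever parses the substituted string.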

Other potential usages:

Commonly used default values

<default token="#FLT_MAX#" string="3.402823466e+38" />
<default token="#FLT_MIN#" string="-3.402823466e+38" />
<default token="#INV_FLT#" string="-3.402823466e+38" />
<default token="#INV_VEC3#" string="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
<default token="#INV_VEC4#" string="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />

<!-- BEFORE -->
<niobject name="NiConstColorEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Color4" default="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
</niobject>

<niobject name="NiConstFloatEvaluator" inherit="NiEvaluator">
    <add name="Value" type="float" default="-3.402823466e+38" />
</niobject>

<niobject name="NiConstPoint3Evaluator" inherit="NiEvaluator">
    <add name="Value" type="Vector3" default="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
</niobject>

<niobject name="NiConstQuaternionEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Quaternion" default="-3.402823466e+38, -3.402823466e+38, -3.402823466e+38, -3.402823466e+38" />
</niobject>

<niobject name="NiBSplineEvaluator" inherit="NiEvaluator">
    <add name="Start Time" type="float" default="3.402823466e+38" />
    <add name="End Time" type="float" default="-3.402823466e+38" />
    <add name="Data" type="Ref" template="NiBSplineData" />
    <add name="Basis Data" type="Ref" template="NiBSplineBasisData" />
</niobject>

<!-- AFTER -->
<niobject name="NiConstColorEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Color4" default="#INV_VEC4#" />
</niobject>

<niobject name="NiConstFloatEvaluator" inherit="NiEvaluator">
    <add name="Value" type="float" default="#INV_FLT#" />
</niobject>

<niobject name="NiConstPoint3Evaluator" inherit="NiEvaluator">
    <add name="Value" type="Vector3" default="#INV_VEC3#" />
</niobject>

<niobject name="NiConstQuaternionEvaluator" inherit="NiEvaluator">
    <add name="Value" type="Quaternion" default="#INV_VEC4#" />
</niobject>

<niobject name="NiBSplineEvaluator" inherit="NiEvaluator">
    <add name="Start Time" type="float" default="#FLT_MAX#" />
    <add name="End Time" type="float" default="#FLT_MIN#" />
    <add name="Data" type="Ref" template="NiBSplineData" />
    <add name="Basis Data" type="Ref" template="NiBSplineBasisData" />
</niobject>

Note: #FLT_MIN# and #INV_FLT# have the same string value but provide different context for the usage of the value. One is the negation of #FLT_MAX# regarding start/end time in a sequence, and the other is denoting an invalid uninitialized value.

Real world example: These defaults caused a compiler error in niflib when I accidentally included a trailing comma in one of them. This would not have happened if I had been using tokens.
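As a rough sketch of how a parser might consume default tokens, the Python below looks a default up in a table and parses it into floats. The table is copied by hand from the example above, and `resolve_default` is a hypothetical helper name. Validating the token strings once at load time is one way a stray trailing comma could be caught early.

```python
INV = "-3.402823466e+38"

# Token table copied from the <default> examples above.
DEFAULT_TOKENS = {
    "#FLT_MAX#": "3.402823466e+38",
    "#FLT_MIN#": INV,
    "#INV_FLT#": INV,
    "#INV_VEC3#": ", ".join([INV] * 3),
    "#INV_VEC4#": ", ".join([INV] * 4),
}


def resolve_default(value):
    """Expand a default attribute (token or literal) into a list of floats.

    Raises ValueError on malformed input such as a trailing comma.
    """
    text = DEFAULT_TOKENS.get(value, value)
    return [float(part) for part in text.split(",")]


# Check every token string once, so a typo surfaces in one place only.
for token in DEFAULT_TOKENS:
    resolve_default(token)

vec3 = resolve_default("#INV_VEC3#")
start_time = resolve_default("#FLT_MAX#")
```

A literal default still passes through unchanged, so `resolve_default("1.0, 2.0,")` raises `ValueError` instead of silently producing a bad initializer.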

Forthcoming block versioning (will get separate ticket)

<!-- Note: Space-separated is the correct way to make lists in XML -->
<!-- Note: VXX_X_X_X is a unique ID for <version> being added to the spec, will receive ticket. -->
<versionset token="#BETHESDA#" string="V10_0_1_2 V10_1_0_101 V10_1_0_106 V10_2_0_0__10 V20_0_0_4__10 V20_0_0_4__11 V20_0_0_5_OBL V20_2_0_7__11_1 V20_2_0_7__11_2 V20_2_0_7__11_3 V20_2_0_7__11_4 V20_2_0_7__11_5 V20_2_0_7__11_6 V20_2_0_7__11_7 V20_2_0_7__11_8 V20_2_0_7_FO3 V20_2_0_7_SKY V20_2_0_7_SSE V20_2_0_7_FO4" />
<versionset token="#FO3#" string="V20_0_0_4__11 V20_2_0_7__11_1 V20_2_0_7__11_2 V20_2_0_7__11_3 V20_2_0_7__11_4 V20_2_0_7__11_5 V20_2_0_7__11_6 V20_2_0_7__11_7 V20_2_0_7__11_8 V20_2_0_7_FO3" />
<versionset token="#SSE#" string="V20_2_0_7_SSE" />
<versionset token="#FO4#" string="V20_2_0_7_FO4" />

<niobject name="bhkRefObject" versions="#BETHESDA#">
<niobject name="bhkSerializable" versions="#BETHESDA#">
<niobject name="bhkWorldObject" versions="#BETHESDA#">
<niobject name="bhkEntity" versions="#BETHESDA#">

<niobject name="bhkPoseArray" versions="#FO3#" />
<niobject name="bhkPhysicsSystem" versions="#FO4#" />

In brief, since it will be discussed in its own ticket: reintroducing block versioning has run into issues with the granularity of since/until. For custom blocks it is not good enough and would require vercond expressions, which should be avoided at all costs. Instead, each version can be listed (<version> now being every distinct format, including user and Bethesda versions), but this gets extremely repetitious and needs a system of abbreviation.
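A small Python sketch of how a `versions` attribute might be expanded into a set of version IDs follows. The `#FO3#` list is shortened for brevity, and allowing tokens and literal IDs to be mixed in one attribute is an assumption of this sketch, not something the proposal states.

```python
# Version sets from the example above (the #FO3# list is shortened here).
VERSIONSETS = {
    "#FO3#": "V20_0_0_4__11 V20_2_0_7__11_1 V20_2_0_7_FO3",
    "#SSE#": "V20_2_0_7_SSE",
    "#FO4#": "V20_2_0_7_FO4",
}


def expand_versions(attr):
    """Turn a space-separated versions attribute into a set of version IDs.

    Items that are not known tokens are kept as literal IDs (an assumption).
    """
    ids = set()
    for item in attr.split():
        ids.update(VERSIONSETS.get(item, item).split())
    return ids


def block_applies(versions_attr, version_id):
    """Return True if a block with this versions attribute exists in version_id."""
    return version_id in expand_versions(versions_attr)


fo4_only = block_applies("#FO4#", "V20_2_0_7_FO4")
fo3_in_fo4 = block_applies("#FO3#", "V20_2_0_7_FO4")
```

Membership in a set replaces the since/until range check, so no vercond expression is needed for custom blocks.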

For Discussion

Delimiter

Is # the best delimiter? There is also @. Neither is used in logic/arithmetic, and neither occurs commonly in the strings we use in nif.xml. Collisions are even less of an issue since the delimiter goes at each end, which I think is required to prevent one token from containing another, e.g. #LT and #LTE would collide.
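The collision is easy to reproduce. The sketch below (Python, with made-up table and function names) runs the same naive replacement once with a leading delimiter only and once with a delimiter at each end.

```python
# Hypothetical token tables: one with a leading delimiter only,
# one with the delimiter at both ends as proposed.
OPEN_ONLY = {"#LT": "<", "#LTE": "<="}
BOTH_ENDS = {"#LT#": "<", "#LTE#": "<="}


def naive_replace(expr, table):
    """Replace tokens in table order, with no awareness of overlaps."""
    for token, string in table.items():
        expr = expr.replace(token, string)
    return expr


broken = naive_replace("A #LTE B", OPEN_ONLY)   # "#LT" matches inside "#LTE"
intact = naive_replace("A #LTE# B", BOTH_ENDS)  # "#LT#" cannot match "#LTE#"
print(broken)  # A <E B
print(intact)  # A <= B
```

With only a leading delimiter, a parser would have to sort tokens longest-first or tokenize properly. The closing delimiter removes that requirement.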

Organization

Do we put all token elements nested under a <tokens>? Or do we leave them flat alongside <version>, <basic>, etc.? I feel grouping them is important, as it makes sure an XML parser can grab every token at once without needing to know tag names.

Element tag names

Do we use multiple tag names to specify/limit the token usage, i.e. contexts?

Contexts

Separate tag names and contexts would require separate maps/dictionaries on the parser side. There are already 4 example contexts (operator, expression, default, versionset). Currently the example specification does not include any built-in way of specifying which tags and attributes a token's context is limited to, so the association of a tag name with the attributes to parse using it would have to be made manually.

Reusing token identifiers in differing contexts

Note that I use #FO4# and #SSE# in both <versionset> and <expression>. Given that these tags are used in completely different domains, is this OK? Reuse or not, contexts mean that technically a parser can no longer do a naive search/replace on the entire file in memory before reading the XML in, because regardless of identifier collisions, a search/replace violates the limitation that associates a token tag with a specific element attribute.
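One possible shape for context-limited replacement is sketched below in Python. The `CONTEXT_ATTRS` association, the helper name, and the `Example` field are all hypothetical, since the example specification does not define the association itself. The point is only that the same identifier resolves differently per attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical association of token tag -> attributes it may appear in.
CONTEXT_ATTRS = {
    "expression": ["vercond"],
    "versionset": ["versions"],
}

# One dictionary per context, so #FO4# can exist in both without colliding.
TOKENS = {
    "expression": {"#FO4#": "(User Version 2 == 130)"},
    "versionset": {"#FO4#": "V20_2_0_7_FO4"},
}


def expand_element(el):
    """Expand tokens only in the attributes tied to each token context."""
    for tag, attrs in CONTEXT_ATTRS.items():
        for attr in attrs:
            value = el.get(attr)
            if value is None:
                continue
            for token, string in TOKENS[tag].items():
                value = value.replace(token, string)
            el.set(attr, value)


block = ET.fromstring('<niobject name="bhkPhysicsSystem" versions="#FO4#" />')
# "Example" is a made-up field used only to exercise the vercond context.
field = ET.fromstring('<add name="Example" type="uint" vercond="#FO4#" />')
expand_element(block)
expand_element(field)
```

After the calls, `block.get("versions")` holds the version ID while `field.get("vercond")` holds the expression, which a whole-file search/replace could not achieve.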

Token string attributes

Expression token strings will likely need to use the operator tokens, because first-class token parsers would no longer know about &amp;&amp;, et al.

Also, if a parser is treating tokens as second-class, order matters: all <operator> tags need to be read in first. Then, as each <expression> tag is read in, the parser replaces the operator tokens in its string with the operator strings.
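The ordering constraint can be sketched as follows, in Python. This assumes a `<tokens>` wrapper and expression strings rewritten to use operator tokens. Both are illustrative choices, and `load_second_class` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OPERATORS = """
    <operator token="#AND#" string="&amp;&amp;" />
    <operator token="#GT#" string="&gt;" />
    <operator token="#EQ#" string="==" />
"""
EXPRESSION = """
    <expression token="#BS202#"
        string="((Version #EQ# 20.2.0.7) #AND# (User Version 2 #GT# 0))" />
"""


def load_second_class(xml_text):
    """Read tokens in document order, expanding operators inside expressions."""
    operators, expressions = {}, {}
    for el in ET.fromstring(xml_text):
        token, string = el.get("token"), el.get("string")
        if el.tag == "operator":
            operators[token] = string
        elif el.tag == "expression":
            # Only operators already read can be substituted here.
            for op_token, op_string in operators.items():
                string = string.replace(op_token, op_string)
            expressions[token] = string
    return expressions


good = load_second_class("<tokens>" + OPERATORS + EXPRESSION + "</tokens>")
bad = load_second_class("<tokens>" + EXPRESSION + OPERATORS + "</tokens>")
```

With operators first, `good["#BS202#"]` matches the original expression string. With the order reversed, `bad["#BS202#"]` still contains the raw `#EQ#`, `#AND#`, and `#GT#` tokens.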

This was referenced May 31, 2018
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018
Tokens are grouped by name and associated with a list of attributes.  Only listed attributes should be processed for tokens of that name.

Order of token groups matters.  Token strings can themselves include tokens, so any token group used in other tokens needs to come after.
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018
Using tokens from niftools#70 it is now feasible to version the Bethesda blocks.
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 5, 2018
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 6, 2018
For niftools#70

Have not decided on the non-escaped entities such as `==`, `|`, `||`, etc. yet, as at least `==` is much more readable than `#EQ#`.

Replaced all Version, User Version, etc. with their global tokens.

Also cleaned up NiTexturingProperty in the process.
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 7, 2018
For niftools#70 and niftools#76

Adopting the token syntax of the other tokens, TEMPLATE and ARG can now be treated the same as other tokens and differentiated from regular identifiers in a generic manner.
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 14, 2018
For niftools#70

Their use was discussed as being useful for shifting and masking values before sending as an ARG.
hexabits added a commit to hexabits/nifxml that referenced this issue Jun 25, 2018
For niftools#76.  `calc` does not need to be supported and is only used for pre-serialization preparation of the data.  This can also be done by hand.

Some revisions also need to be made for niftools#70 and niftools#73 to account for another attribute with its own expression grammar and tokens.

However, the tokens introduced for `calc` will not be added to the tokens in the XML and will need to be explicitly supported if supporting `calc`.  They are:

1D array size (uses regex): `#LEN[(.*?)]#`
2D array size (uses regex): `#LEN2[(.*?)]#`

Ternary `?`: `#THEN#`
Ternary `:`: `#ELSE#`

Also removes any unnecessary `calculated` without replacement.

Additionally, `calc` is used to limit array size for the time being, though it's possible this should be done with its own attribute, such as `maxlen`.
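For anyone choosing to support `calc`, the tokens listed above could be handled roughly as in this Python sketch. It assumes the square brackets in the patterns are meant literally, so they are escaped here. The `len(...)`/`len2(...)` output syntax and the field names are invented for illustration.

```python
import re

# The brackets are escaped so they match literally rather than
# forming a regex character class (an assumption about the intent).
LEN_1D = re.compile(r"#LEN\[(.*?)\]#")
LEN_2D = re.compile(r"#LEN2\[(.*?)\]#")


def expand_calc(expr):
    """Rewrite calc-only tokens into a hypothetical target syntax."""
    expr = LEN_2D.sub(r"len2(\1)", expr)
    expr = LEN_1D.sub(r"len(\1)", expr)
    return expr.replace("#THEN#", "?").replace("#ELSE#", ":")


one_d = expand_calc("#LEN[Vertices]#")
two_d = expand_calc("#LEN2[Weights]#")
ternary = expand_calc("A #THEN# B #ELSE# C")
```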
hexabits added a commit to hexabits/nifxml that referenced this issue Aug 13, 2018
For niftools#70 and niftools#76

Added tokens for range attribute strings.

Did initial pass of ranges, many from Bethesda's official export scripts for FO4, which limit the values in the UI.  Also have begun using data analysis to find implied min/max ranges (as well as sane defaults).