IE-0011: New data structures #11

Draft: wants to merge 3 commits into base: main

curiousdannii
Collaborator

@curiousdannii curiousdannii commented Aug 1, 2022

  • Proposal: IE-0011
  • Authors: Dannii Willis
  • Language feature name: None
  • Status: Draft
  • Related proposals: None
  • Implementation: None

Summary

Add some new data structure kinds into BasicInformKit.

Motivation

Inform has some data structures already, notably the List and the Table, but there are others that would be very useful to authors for various purposes. These data structures were first prototyped in the extension Data Structures by Dannii Willis.

It is proposed that these kinds be brought into the core of Inform for these reasons:

  1. They are of general usefulness, and installing an extension/kit like this can be complicated. (Though hopefully much less so once IE-0001: Directory format for extensions with resources #1 is implemented.)
  2. At least in 6M62, it was not possible to specify how constant-compilation-method:special should be implemented for a new kind. This meant that most of these new kinds needed to be constructed with long blocks, when they could possibly be more efficient as short-block-only kinds.
  3. A function like ANY_TY_Print_Kind_Name can be constructed by the compiler rather than manually written.
  4. The Data Structures extension had to misuse loops in order to construct if statements with block scoped variables:
    [ We declare this as a loop, even though it isn't, because nonexisting variables don't seem to be unassigned at the end of conditionals. ]
    To if (O - value of kind K option) is some let (V - nonexisting K variable) be the value begin -- end loop:
    	(- if (BlkValueRead({-by-reference:O}, OPTION_TY_KOV) && (
    		(KOVIsBlockValue({-strong-kind:K})
    			&& BlkValueCopy({-lvalue-by-reference:V}, OPTION_TY_Get({-by-reference:O}))
    			|| ({-lvalue-by-reference:V} = OPTION_TY_Get({-by-reference:O}))
    		)
    	, 1)) -).
    
    If these kinds were brought into core then we could implement them properly, so hacks like this wouldn't be needed.

Components affected

  • Minor changes to the natural-language syntax (see point 4 above.)
  • Major changes to inbuild.
  • Change to inform7.
  • No change to inter.
  • No change to the Inter specification.
  • Change to runtime kits.
  • Changes to the Standard Rules and Basic Inform.
  • Changes to documentation.
  • No change to the GUI apps, when downloading or installing extensions.

Impact on existing projects

This could cause issues for any projects that already use the new kind names for other things; for example, many games might have a "map" thing or even a map kind. It should therefore probably be part of a semver-major update.

Proposal

The following new kinds would be added to Inform in the BasicInformKit:

Anys

An any stores a value and its kind; the kind cannot be determined at compile time, but can be read at run time. These are useful for when you want to store multiple kinds of values in one list or map, or for when you don't know what kind some data might be.

When play begins:
  let apple be "Royal Gala" as an any;
  if kind of apple is a text:
    say "[apple] is a text[line break]";
  if apple is a text let apple name be the value:
    say "Apple variety: [apple name][line break]";
  let year be apple as a number or 2022;
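
For readers coming from general-purpose languages, the behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not how the extension or BasicInformKit implements anys; the class `Any` and its methods `wrap`, `is_kind` and `as_kind_or` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Any:
    """A value bundled with its kind, which is only inspected at run time."""
    kind: type
    value: object

    @staticmethod
    def wrap(value):
        # Record the concrete kind at the moment of wrapping.
        return Any(type(value), value)

    def is_kind(self, kind):
        return self.kind is kind

    def as_kind_or(self, kind, backup):
        # Mirrors "apple as a number or 2022": fall back when the kind differs.
        return self.value if self.kind is kind else backup


apple = Any.wrap("Royal Gala")
if apple.is_kind(str):
    print(f"Apple variety: {apple.value}")
year = apple.as_kind_or(int, 2022)
```

Because the kind travels with the value, a list or map of such wrappers can hold mixed contents while each read still has to state which kind it expects.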

Couples/Tuples

A couple is a 2-tuple, grouping two values of any kind. Couples are useful for when you need to return two values of different kinds from a phrase.

To decide what couple of person and number is the person evaluation:
  decide on yourself and 1234 as a couple;
When play begins:
  let result be the person evaluation;
  say "Person: [first value of result][line break]Evaluation: [second value of result][line break]";

I couldn't work out how to implement a generic N-tuple type, but maybe it could be done if incorporated into the core. Also, there does exist a hidden TUPLE_ENTRY_TY which I don't know anything about.

Maps

Maps store key-value pairs. Each map has a set kind for its keys and another set kind for its values, but if you need to store heterogeneous keys or values you can make a map using anys.

When play begins:
  let data be a map of text to any;
  set key "player" of data to yourself;
  set key "score" of data to 0;
  set key "action" of data to the jumping action;
  if get key "score" of data is some let score be the value:
    say "Starting score: [score][line break]";
  let temperature be get key "temp" of data or 23 as an any;

Nulls

Nulls do not represent any value, but the positive affirmation of a lack of a value. They could be used, for example, when deserialising JSON.

Nulls could possibly be unified with the internal NIL_TY kind.

Options

An optional value, either nothing, or a value of a specific kind.

When play begins:
  let O1 be "Hello" as an option;
  let O2 be a text none option;
  if O1 is some let message be the value:
    say "Message: [message][line break]";
  let second message be value of O2 or "Goodbye";
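
As a rough analogy for readers who know other languages, an option can be sketched in Python as below. The names `Option`, `some`, `none`, `has_value` and `value_or` are assumptions made for this sketch and are not taken from the extension.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Option(Generic[T]):
    """Either 'some' wrapped value or 'none' (hypothetical sketch)."""
    has_value: bool
    _value: Optional[T] = None

    @staticmethod
    def some(value):
        return Option(True, value)

    @staticmethod
    def none():
        return Option(False)

    def value_or(self, backup):
        # Mirrors "value of O2 or ...": the backup is used for a none option.
        return self._value if self.has_value else backup


o1 = Option.some("Hello")
o2 = Option.none()
if o1.has_value:
    print(f"Message: {o1.value_or('')}")
second_message = o2.value_or("Goodbye")
```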

Results

A result contains either a wrapped value or an error message text.

When play begins:
  let R1 be 1234 as a result;
  if R1 is okay let score be the value:
    say "Score: [score][line break]";
  let R2 be a number error result with message "Oops!";
  if R2 is okay let score be the value:
    say "Score: [score][line break]";
  otherwise if R2 is an error let message be the error message:
    say "Error! [message][line break]";
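
The same two-case shape can be sketched in Python. This is an illustration only; the class names `Ok` and `Err` and the helper `describe` are hypothetical and do not come from the extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ok:
    value: object


@dataclass(frozen=True)
class Err:
    message: str


def describe(result):
    # The caller must distinguish the two cases before touching the payload.
    if isinstance(result, Ok):
        return f"Score: {result.value}"
    return f"Error! {result.message}"


r1 = Ok(1234)
r2 = Err("Oops!")
print(describe(r1))
print(describe(r2))
```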

Questions

  1. For each of these there is the question of what the best representation is: a short and long block? Short block only? I also never looked at hashing and how that would work for these new kinds. (I'm not sure when hashing is actually used by Inform.)

  2. How should Maps be implemented for efficient searching? Particularly string searching.

  3. Would it be feasible to have general purpose sum types added to Inform, in which case Options and Results would not need any special handling, but could just be kinds supported by default? And Results could return errors other than texts if that was the case.

  4. Should the unchecked phrases actually be included, or should only the safe phrases be supported? (If someone needed unsafe phrases they could perhaps be provided by a separate extension.)

@curiousdannii curiousdannii changed the title from "IE-0011: Proposal for some new data structures" to "IE-0011: New data structures" Aug 1, 2022
@curiousdannii
Collaborator Author

curiousdannii commented Aug 2, 2022

I've been having a think more about how Maps should store their keys and realised there are several categories of kinds, with behaviours that affect more than just Map keys.

| Category | Example | Copy | Modify | Equality | Map keys |
| --- | --- | --- | --- | --- | --- |
| Simple values | Number, truth state, objects, etc | Copy values | Modify in place | Compare values | Use directly |
| Block value | Text | Reference counting | Mutate BV | Compare contents | Hash + compare contents |
| Short block value | Stored actions? And possibly Anys/Options/Results | Copy contents | Modify in place | Compare contents | Hash + compare contents |
| Container | List | Reference counting | Mutate BV | Compare address | Use address |
| Container with interior mutability | Map | Reference counting / explicit manual cloning | Mutate interior BV | Compare address | Use address |

So there are two types of what I'm calling "container" values: those that behave like Lists, and those that behave like Maps. Lots of people have surely been tripped up by the fact that if you pass a list to a phrase which modifies the list, the original phrase won't see the modifications. Would it be better to make Lists act like Maps and modify them internally, with explicit cloning when needed?
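
The difference between the two container behaviours can be illustrated with a small Python sketch. Python lists are always shared references, so the List-like behaviour is emulated here with an explicit copy inside the function; both function names are hypothetical.

```python
import copy


def add_entry_by_value(items, entry):
    # Emulates the List behaviour described above: the phrase works on
    # its own copy, so the caller sees no change.
    items = copy.copy(items)
    items.append(entry)
    return items


def add_entry_in_place(items, entry):
    # Emulates the Map-like behaviour: the shared value itself is mutated.
    items.append(entry)


original = [1, 2, 3]
add_entry_by_value(original, 4)
after_by_value = list(original)

add_entry_in_place(original, 4)
after_in_place = list(original)

# With interior mutability, a caller who wants an independent value clones explicitly.
explicit_clone = copy.copy(original)
explicit_clone.append(5)
```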

For the block values, I think it might be best to store them in maps as hashed keys. I'm not sure if the existing I7 hashing is good for this purpose or not. Either way, the maps would actually then need to have three lists: hashed keys, actual keys, values. We need to store the actual keys for two reasons: so that they won't be reference counted and deleted, and to allow for the possibility of hash collisions because 32 bits is not a lot at all. We can use the Glulx search opcodes to do a first fast search for a matching hash, and then do a slower direct comparison of the contents. If the contents don't match then use the Glulx search opcodes to look for a second matching hash. By storing hashes then we don't need to compare the contents for every key, only some of them.
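
A minimal Python sketch of the three-list layout and the two-stage lookup follows. It is an assumption-laden illustration: `HashedMap` and its methods are hypothetical names, Python's built-in `hash` masked to 32 bits stands in for whatever hash Inform would use, and `list.index` stands in for the Glulx search opcodes.

```python
class HashedMap:
    """Three parallel lists: hashes, keys, values (hypothetical sketch)."""

    def __init__(self):
        self.hashes = []
        self.keys = []
        self.values = []

    @staticmethod
    def _hash32(key):
        # Stand-in for a 32-bit hash of the key's contents.
        return hash(key) & 0xFFFFFFFF

    def _find(self, key):
        h = self._hash32(key)
        start = 0
        while True:
            try:
                # Fast pass: search the hash list only.
                i = self.hashes.index(h, start)
            except ValueError:
                return -1
            # Slow pass: compare contents, in case of a hash collision.
            if self.keys[i] == key:
                return i
            # Collision: resume the hash search after this entry.
            start = i + 1

    def set(self, key, value):
        i = self._find(key)
        if i >= 0:
            self.values[i] = value
        else:
            self.hashes.append(self._hash32(key))
            self.keys.append(key)
            self.values.append(value)

    def get(self, key, backup=None):
        i = self._find(key)
        return self.values[i] if i >= 0 else backup


data = HashedMap()
data.set("score", 0)
data.set("score", 10)
data.set("player", "yourself")
```

Only entries whose hash matches are ever compared by contents, which is the saving the paragraph above describes.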

Perhaps we might even want to change the List kind to have an optional hidden second list for hashes, so that all list searches could be sped up?

If these categories were formalised then they could be included in the kind definitions, and we could maybe reduce how much special case code there is as the various template functions could decide what to do based on the kind category.

@erkyrath
Collaborator

erkyrath commented Aug 2, 2022

I think more games will have a "map" object than will want to use a hash-map. Same with objects named "couple of Xs". And it's worth looking through the existing choice-based extensions to see if any of them define an "Option" kind.

As for "any", this is used as a descriptor word in all game code!

9.x gets away with "list of T" as a type name, and a quick test defining "any" as a kind doesn't seem to cause ambiguities either. (In 9.x, I mean.) So maybe this isn't a problem. But we definitely want to test this with test games that have treasure maps, etc.

@erkyrath
Collaborator

erkyrath commented Aug 2, 2022

Now that I think about it, "an any" is on the jargon-y side; I don't like that in Inform. "Dynamic value" maybe?

Similarly, "optional T" instead of just "option".

@erkyrath
Collaborator

erkyrath commented Aug 2, 2022

It sounds like Null is only useful in an Any. (Which is how JSON would use it.)

The only other use I can think of is a "get" operation on Map which returns a value or Null. But that's better expressed as returning an Option.

Is "Null" the singleton value or the kind? My Python experience says that you only ever refer to the value. That means "Null" will clash with a game object named "null" (unlike "map" or "list"). More stringent testing needed and maybe a different name.

It's possible that this isn't generally useful enough to be in BasicInformKit, and should instead be defined in the JSON extension.

@curiousdannii
Collaborator Author

curiousdannii commented Aug 2, 2022

Yeah we'd definitely want to be careful about what to call all of these. These are the names used in the existing extension, so they compile without issue, but I didn't try using the names inside a real game with lots of other things using potentially conflicting names.

Same with objects named "couple of Xs".

I think that won't conflict as couples must be "couple of X and X" not "couple of Xs"? Now there could be an object using the first form, but it would probably be a little less common than the second.

Now that I think about it, "an any" is on the jargon-y side; I don't like that in Inform. "Dynamic value" maybe?

It is kind of jargony, but it's just so clean sounding in phrases like "map of any to text".

"Dynamic value" doesn't sound right to me because Inform has basically no immutability so all values are dynamic. (Also, from the perspective of anys as block values, they aren't dynamic - if you change the value of an any variable you replace it with a new any.) I tried calling it a "box" originally, but that's equally jargony, and even in other languages doesn't always mean that the type can be anything (it doesn't in Rust for example.)

I landed with "any" as that's what's used in C++, C#, Swift, TypeScript, etc.

I'm not sure if it would be valid, but maybe something like "any-kind" or "any kind" could work? Or "anykind" would definitely be valid, and kind of sounds somewhat natural in "map of anykind to text".

Similarly, "optional T" instead of just "option".

That sort of change would be fine with me. I think I used the nominal form for consistency so that all the added kinds would have nominal names rather than being mostly nominals with one adjectival.

It sounds like Null is only useful in an Any. (Which is how JSON would use it.)

The only other use I can think of is a "get" operation on Map which returns a value or Null. But that's better expressed as returning an Option.

Is "Null" the singleton value or the kind? My Python experience says that you only ever refer to the value. That means "Null" will clash with a game object named "null" (unlike "map" or "list"). More stringent testing needed and maybe a different name.

It's possible that this isn't generally useful enough to be in BasicInformKit, and should instead be defined in the JSON extension.

I've found three uses so far of a Null:

  1. The default kind/value of an Any.
  2. JSON nulls.
  3. To be used in other kinds such as a Result or Promise.

The first could perhaps be dealt with by changing the default value to Number/0.

If it was decided to not include Nulls in Basic Inform, then yes they could be defined in a JSON extension instead.

But it's the final one that I think shows their most use and why they should be included in Basic Inform. A null is useful for null results and null promises, the first being for a phrase which has no return value but could raise an error, the second for a long running process which again has no return value but for which people may want to await. In both cases you could just return a dummy number instead, so it's really for conceptual clarity that a null would be better.

Null isn't a singleton; it's a kind, but all nulls equal each other.
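
In Python terms, "a kind whose values all equal each other" might look like the sketch below. The class name `Null` is used only for illustration and says nothing about how the kind would be represented in Inform.

```python
class Null:
    """A kind with only one possible state: every null equals every other."""

    def __eq__(self, other):
        return isinstance(other, Null)

    def __hash__(self):
        # Equal values must hash equally, so all nulls share one hash.
        return 0

    def __repr__(self):
        return "null"


a, b = Null(), Null()
```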

@curiousdannii
Collaborator Author

I think all of these kinds would work in Z-Code. It would even be possible to port the Glulx search opcodes to Z-Code, though obviously they'd run much slower.

@curiousdannii
Collaborator Author

curiousdannii commented Aug 3, 2022

A couple comments from Graham:

For example, how are "Maps" different from "relations"?

There are a few ways. Firstly, maps are block values, so you can embed them in other kinds, use them as object properties, clone them, nest them, etc, whereas there's only one of each relation. The other big difference is that relations were limited to an arity of 2 (pre-v10). See this recent forum discussion. And relations really wouldn't be suitable for reading in a JSON file, which may have an unknown number of entries and may nest maps and lists to any depth.

This:

let O1 be "Hello" as an option;
let O2 be a text none option;
if O1 is some let message be the value:
  say "Message: [message][line break]";
let second message be value of O2 or "Goodbye";

looks frankly a bit clumsy and non-Inform-like to me. Surely:

let O1 be an optional text;
let O1 be "Hello";
let O2 be an optional text;
let O2 be null;
if O1 is not null:
    let message be the value of O1;
    say "Message: [message][line break]";
let second message be value of O2 or "Goodbye";

This was my attempt at trying to make if let work in natural language. I think some of the if lets sound nicer than others; the Anys, Maps and Results are all really natural sounding IMHO:

if MyAny is a text let MyAnyTextValue be the value:

if MyMap has key "hat" let hat score be the value:

if MyResult is okay let result be the value:

if MyResult is an error let message be the error message:

The options are more clumsy because I've stuck with the some/none terminology that other languages use. Maybe it would be better to say set/unset?

if MyOption is set let result be the value:

if MyOption is unset:

But a downside of this is that "unset" sounds more like an uninitialised option rather than an option that is initialised to a null/none value. Maybe we can think of another pair of terms that's even better.

We need if lets in order to prevent unsafe code. Languages like Rust and TypeScript would raise a compiler error if you tried the following, but Inform 7 (currently) wouldn't be able to:

if O1 is null:
    let message be the value of O1;

By combining both of them in one phrase we ensure that the author can't accidentally access data in an invalid manner. And for maps it's also more performant to search and extract the value in one pass rather than test whether the map has a key and then extract the value for that key.
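
The point about combining the test and the extraction can be sketched in Python. The function names and the `_MISSING` sentinel are hypothetical; the sketch only contrasts a two-step lookup with a one-step lookup that exposes the value solely on success.

```python
_MISSING = object()  # sentinel distinct from every real value


def lookup_two_pass(mapping, key):
    # Test, then extract: the key is searched for twice.
    if key in mapping:
        return mapping[key]
    return None


def lookup_one_pass(mapping, key, on_found):
    # "if let" style: one search, and the value is only handed over on success.
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        return False
    on_found(value)
    return True


scores = {"hat": 5}
seen = []
found_hat = lookup_one_pass(scores, "hat", seen.append)
found_cat = lookup_one_pass(scores, "cat", seen.append)
```

In the one-pass form there is no code path on which the body runs without a valid value, which is the safety property the if let phrases aim for.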

The huge advantage of Results in particular is that they force users of a function to account for the possibility of a failure; they can't just assume that the function was a success. Inform phrases that currently can cause run time errors might be better if they were modified to return results, for example:

  • read (external file) into (table name)
  • choose a/the/-- row with (table column) of (value) in/from (table name)

And hey! Those are perfect examples of where null results would be useful. Null results are very common in Rust, though there it's called the unit type instead of null. It's similar to a void in C. The null kind could be called "void"; I called it "null" mainly because of JSON. (Also, isn't it technically incorrect to try to store a void in C? It might be misleading to call it a void kind then.)

let O1 be an optional text;
let O1 be "Hello";
let O2 be an optional text;
let O2 be null;

I'd be fine with these, but I wouldn't know how to implement them currently. Would that be a use for block value casting? I don't really know how that works, and the only current use of it that I saw was snippets to text. I like explicit casting phrases (X as an option) too though.

@erkyrath
Collaborator

erkyrath commented Aug 3, 2022

if MyAny is a text let MyAnyTextValue be the value:

Surely the Inform idiom would be

if MyAny is a text (called MyAnyTextValue):

@curiousdannii
Collaborator Author

Oh, yeah maybe. But do phrases with parentheses like that even work?

@erkyrath
Collaborator

erkyrath commented Aug 3, 2022

I can't make that work nicely for Map lookups, though.

if "Key" maps to a value (called Result) in MyMap: ...

(Reads okay but naming MyMap at the end is poor.)

@erkyrath
Collaborator

erkyrath commented Aug 3, 2022

I expect this would require the compiler to expose its "...(called foo)" magic in some way.

@curiousdannii
Collaborator Author

One pro for "K option" over "optional K" is that it makes nesting clearer.

Is "optional number result" a result containing an optional number or an optional containing a number result? Whereas "number result option" is unambiguous.

Nesting results and options is something people will want/need to use. For example, I used it for Promises:

To decide what K result option is value of (P - a value of kind K promise):

This returns an option containing a K result. If the promise is not yet resolved then it returns a none option. If the promise has been resolved then it returns a some option containing the promise's result.
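
The nesting can be pictured with a small Python sketch. Here `None` plays the role of the none option and a `(tag, payload)` tuple plays the role of the result; the class `Promise` and its methods are hypothetical and are not the extension's implementation.

```python
class Promise:
    """Hypothetical promise whose value is an option wrapping a result."""

    def __init__(self):
        self._settled = False
        self._result = None

    def resolve(self, value):
        self._settled, self._result = True, ("ok", value)

    def fail(self, message):
        self._settled, self._result = True, ("error", message)

    def value(self):
        # Outer layer is the option: None while the promise is pending.
        # Inner layer is the result: ("ok", value) or ("error", message).
        return self._result if self._settled else None


pending = Promise()
done = Promise()
done.resolve(42)
failed = Promise()
failed.fail("Oops!")
```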

@curiousdannii
Collaborator Author

Inform phrases that currently can cause run time errors might be better if they were modified to return results, for example:

  • read (external file) into (table name)
  • choose a/the/-- row with (table column) of (value) in/from (table name)

I've changed my mind a little - probably for the average Inform author, having these phrases just print an error is okay, and having to deal with results might be overly complicated.

So instead of changing those phrases, there could be alternate forms that return results. Then extension authors can call those new phrases and be able to intercept errors their own way, whether that's printing the error or handling it some other way. Results would be most useful for extension authors who are able to silently handle errors.
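
One way to picture the paired forms is the Python sketch below, in which the everyday form is built on top of a checked form. Both function names, the table layout (a list of dicts) and the error wording are assumptions made purely for illustration.

```python
def choose_row_checked(table, column, wanted):
    # Alternate form: returns ("ok", row) or ("error", message) instead of printing.
    for row in table:
        if row.get(column) == wanted:
            return ("ok", row)
    return ("error", f"no row with {column} of {wanted!r}")


def choose_row(table, column, wanted):
    # Everyday form: built on the checked form, and prints the error itself.
    status, payload = choose_row_checked(table, column, wanted)
    if status == "error":
        print(f"Error: {payload}")
        return None
    return payload


fruit = [{"name": "apple", "price": 3}]
hit = choose_row_checked(fruit, "name", "apple")
miss = choose_row_checked(fruit, "name", "pear")
```

An extension author would call the checked form and decide what to do with the error, while ordinary story code keeps the simpler behaviour.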

@uecasm

uecasm commented Dec 20, 2022

Possibly of interest is that Inform (both 9.3 and 10) already defines a Combination kind that is an n-tuple. It was just never given any I7 syntax so you couldn't really make any use of them. They appear to be used somewhere in the guts of Relations, but I haven't looked too closely at it.

@curiousdannii
Collaborator Author

I started trying to port the extension for Inform 10, but it appears that the constructor kinds are now hardcoded in the compiler, so it's not possible to add new ones in extensions/kits.

https://github.com/ganelson/inform/blob/a79b56f24129b49c7c7c597ddf1fdd23a805e9c6/inform7/runtime-module/Chapter%202/Kind%20Declarations.w#L163-L174

@ganelson
Owner

This is a point of the Inter design that I never felt very happy with, and I think it ought to be possible to change it; I hadn't realised the implications. There's no reason constructor kinds need to be hardwired.

@curiousdannii curiousdannii marked this pull request as draft July 28, 2023 05:17
@CelticMinstrel

I hope it's not unwelcome for me to chime in over a year later…?

I just wanted to comment on the chosen syntax for each of the proposed types. (Though I've ignored promises and mostly ignored closures on the basis of them being both more advanced and more experimental.)

Any

If I'm not mistaken, the word "any" already has a meaning of sorts to Inform, so I think it might be bad to use it as the name of a kind. Maybe this is actually unambiguous with the other meaning though.

As to what the other meaning is:

To something or other:
	if any man (called the slouch) is on the couch:
		say "Look at that lazy [slouch]!";

I think that makes it basically one of the "noise words" alongside things like "a", "an", "some", etc. As far as I can tell it's only allowed in conditions though, not in assertions.

The only other possibility I can think of for a name though is "any value".

I'm also not happy with that "value as an any" syntax, but I don't have a better suggestion.

For these two:

To say kind/type of (A - any):
To decide if kind/type of (A - any) is (name of kind of value K):

I would say articles should be allowed here - "the" before "kind/type" (though maybe that already works since it's at the beginning of the phrase?) and "a/an/--" before the name of the kind.

Similarly, in these ones:

To decide what K result is (A - any) as a/an (name of kind of value K):
To decide what K is (A - any) as a/an (name of kind of value K) or (backup - K):

Is there some reason for the article to be required instead of optional?

It might be worth supporting some as an alternate to a/an in all these cases, too.

Couples

I don't actually have a comment on couples, but seeing the => syntax for accessing the values made me think it would be nice to have that work on lists too.

Maps

Is the new syntax actually necessary? As with lists, can't you already say "let M be a map of K to V"?

The clone syntax is probably useful for lists as well.

For checking keys, I think allowing "contains" as an alternative to "has" would be helpful for programmers coming from other languages, where that function is commonly called "contains".

I think it would be nicer if get were an optional keyword when looking up keys. Does that cause ambiguity problems or was it just not thought of?

The || option for the backup feels too programmer-ish for Inform. If it is included I think I'd prefer a single pipe instead of a double pipe (and if it works for maps it should probably also work for all the other backup cases).

It's slightly incongruous that getting and setting keys has a shorthand syntax (map => key [= value | or value]) but deleting and checking them does not. For checking, maybe something like "if map => key exists", and for deleting, maybe something like "map => key = none" or "del/delete map => key"?

Closures

I don't have much to say about these, but the term "closure" strikes me as a little strange here. Based on the example in the documentation, they seem more like what I'd call a "coroutine". What I'd call a "closure" does have some similar semantics, admittedly, in that it captures local variables from a stack somewhere, but from the way I understand things, it would usually be local variables from a function that has already terminated.

I was going to add an example in Inform-like pseudo-code, but I'm not even sure how to express what I'm trying to say in an Inform-like way. However, it's basically what you'd get from the (x, y) => expression syntax in JavaScript or the lambda x, y: expression syntax in Python when the expression references variables local to the current function.

Maybe that turns out to just be a special case of what you've implemented here as closures – I'm not sure. Since I'm not even sure how I'd express it in an Inform-like way, I can't even tell if your closures offer a way to express it.

Optionals

I second not being a fan of the "K option" syntax. I'm not sure if I can think of any good alternatives, but here are a few random ideas I thought of.

  • K or blank/empty/none – but this doesn't really suggest a type name…
  • K ? – might be a bit too short for Inform, though it's precedented in other languages
  • K optional – doesn't really change much but does at least avoid the reading of "a user setting of kind K"… though I have no idea how likely it would be for someone to have that misunderstanding
  • maybe K – probably runs into the same ambiguity issue discussed above
  • K maybe – sounds weird

I also really dislike the is some syntax. I think at least is something should be permitted, maybe also forms like is filled; and for the inverse, maybe is nothing, is empty, is blank, etc.

Also, though it works a bit against the list of forms I mentioned above, I notice that there's perhaps a missed opportunity here in that these phrases could instead be defined as adjectives. And maybe even writable adjectives, something like the below pseudo-code, though kind variables don't actually work in adjectives as far as I'm aware (and I don't know if rather than is even supported for adjectives defined in I6).

Definition: a value of kind K option is empty rather than full if I6 routine "OPTION_TY_Empty" makes it so (it does not contain a value).

Which has the neat feature that you can now write now my-optional is empty… although it does leave open the question of what happens if you write now my-optional is not empty. (I think filling in the default value of that kind would be fine.)

If this was already considered and rejected, perhaps on the basis of "what does now my-optional is full mean", then that's fine (I wouldn't say I'm especially attached to the idea in any case); but in case it was not considered at all, I thought it would be worth mentioning.

That said, defining them as read-only adjectives still remains an option even if one cares about the ambiguity of "what does now my-optional is full mean".

Results

Some of the comments about optionals also apply here – they're literally the same as an optional except that the "none" state is replaced with an error message that can be arbitrary text. Especially relevant is the comment about defining adjectives rather than phrases for the basic states of the type.

I don't see why a/an can't be optional when casting to a result. Also I think it should be allowed before error when creating a failed result, and before the name of the kind as well (maybe that one already works).

@curiousdannii
Collaborator Author

curiousdannii commented Apr 15, 2024

Thanks for your comments @CelticMinstrel. I won't respond in detail because at this stage of the proposal the full details don't need to be worked out.

However in regards to naming things:

  1. It's one of the hard things of computer science 😉
  2. We want to avoid spending time bikeshedding
  3. Everything here has multiple names in other various languages. We can't choose things that are familiar to absolutely everyone, but we do want to try to choose names that will be familiar to at least some Inform users.
  4. Avoiding words already used in Inform can be helpful, but it isn't essential, as long as the syntax isn't ambiguous. Sometimes the best name will already be in use, and it won't be too confusing.
  5. Inform is meant to be primarily natural language inspired. Whether any phrases using punctuation instead of words actually get into the final implementation is yet to be decided. Note that authors can always add these shortcut phrases themselves (or they might get added to Zed's Code extension), just as they could add other phrases if their preferred wording isn't used (like contains vs has).
  6. It would be nice if articles didn't matter in phrases, but the problem is that in some phrases it does matter whether there's an article or not. Maybe they should only matter in "to say" phrases? This may already be partially the case, I don't remember.

(Re maps) Is the new syntax actually necessary? As with lists, can't you already say "let M be a map of K to V"?

When I tried that, all variables would get the same map. Ideally it won't be needed in the final version, but it will likely require compiler changes. Ideally we'll also fully work out and codify the details of interior vs exterior mutability, as in #11 (comment). I'd like that to be part of the type system rather than just implementation details.
