don't mess with symbol identity in symbols-as-weakmap-keys proposal #3

Open
wants to merge 6 commits into
base: symbols-as-weakmap-keys

@michaelficarra michaelficarra commented Feb 28, 2023

This should explain what makes symbols registered within the GlobalSymbolRegistry unsuitable for use as a weak reference without complicating their identity. For discussion among the 262 editor group.

Important points:

  • identity is a local concept: whether a value has identity may be determined without any other context
  • presence or absence of identity cannot change over time

acutmore and others added 6 commits February 6, 2023 11:11
- Proposal: https://github.com/tc39/proposal-symbols-as-weakmap-keys
- Also allows Symbols in WeakSet, WeakRef, and FinalizationRegistry
- Adds new AO 'CanBeHeldWeakly'
- Registered Symbols can not be held weakly

Closes #1194

Co-authored-by: Daniel Ehrenberg <dehrenberg@bloomberg.net>
Co-authored-by: Leo Balter <leonardo.balter@gmail.com>
Co-authored-by: Mathieu Hofman <86499+mhofman@users.noreply.github.com>
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Co-authored-by: Jordan Harband <ljharb@gmail.com>
Co-authored-by: Shu-yu Guo <syg@chromium.org>
Co-authored-by: Michael Dyck <jmdyck@ibiblio.org>
Co-authored-by: Michael Ficarra <mficarra@shapesecurity.com>
Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
Co-authored-by: Michael Dyck <jmdyck@ibiblio.org>
Co-authored-by: Michael Dyck <jmdyck@ibiblio.org>
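The behaviour this commit message describes can be probed from a script. The following is a minimal sketch, assuming an engine that may or may not already implement the proposal; `acceptsAsWeakKey` is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: reports whether the running engine accepts `key`
// as a WeakMap key. WeakMap.prototype.set throws a TypeError otherwise.
function acceptsAsWeakKey(key) {
  try {
    new WeakMap().set(key, "value");
    return true;
  } catch (err) {
    if (err instanceof TypeError) return false;
    throw err; // anything else is unexpected, so surface it
  }
}

console.log(acceptsAsWeakKey({}));                // objects are always accepted
console.log(acceptsAsWeakKey(Symbol.for("app"))); // registered symbol: rejected
console.log(acceptsAsWeakKey(Symbol("local")));   // accepted only where the proposal is implemented
```

The result for an unregistered symbol depends on the engine, which is why the sketch reports it rather than asserting it.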
<p><dfn variants="values without identity,value without identity">Values without identity</dfn> are equal to other values without identity if all of their innate characteristics are the same — characteristics such as the magnitude of an integer or the length of a sequence. Because of this, values without identity may be manifest anywhere simply by fully describing their characteristics. It is not meaningful to change the characteristics of a value that does not have identity. Examples of values without identity include, but are not limited to: <emu-xref href="#sec-ecmascript-language-types-boolean-type">Booleans</emu-xref>; mathematical values and extended mathematical values; <emu-xref href="#sec-ecmascript-language-types-number-type">Numbers</emu-xref>; <emu-xref href="#sec-ecmascript-language-types-bigint-type">BigInts</emu-xref>; *null*; *undefined*; sequences, including <emu-xref href="#sec-ecmascript-language-types-string-type">Strings</emu-xref>, ECMAScript source text, surrogate pairs, Directive Prologues, etc; UTF-16 code units; Unicode code points; <emu-xref href="#sec-ecmascript-language-types-symbol-type">Symbols</emu-xref> in the <emu-xref href="#sec-symbol.for">GlobalSymbolRegistry</emu-xref>; enums; abstract operations, including syntax-directed operations, host hooks, etc; and ordered pairs. The preceding list is exhaustive for ECMAScript language values.</p>
<p>In contrast, each <dfn variants="values with identity">value with identity</dfn> is unique and therefore only equal to itself. Values with identity are like values without identity but with an additional unguessable, unchangeable, universally-unique characteristic called <em>identity</em>. References to existing values with identity cannot be manifest simply by describing them, as the identity itself is indescribable; instead, references to these values must be explicitly passed from one place to another. Some values with identity are mutable and therefore can have their characteristics (except their identity) changed in-place, causing all holders of the value to observe the new characteristics. Examples of values with identity include, but are not limited to: <emu-xref href="#sec-object-type">Objects</emu-xref>, including function objects, exotic objects, etc; any kind of Records, including Property Descriptors, PrivateElements, etc; <emu-xref href="#sec-ecmascript-language-types-symbol-type">Symbols</emu-xref> not in the <emu-xref href="#sec-symbol.for">GlobalSymbolRegistry</emu-xref>; Parse Nodes; Lists; <emu-xref href="#sec-set-and-relation-specification-type">Sets</emu-xref> and Relations; Abstract Closures; Data Blocks; Private Names; execution contexts and execution context stacks; agent signifiers; and WaiterLists. The preceding list is exhaustive for ECMAScript language values.</p>
<p><dfn variants="values without identity,value without identity">Values without identity</dfn> are equal to other values without identity if all of their innate characteristics are the same — characteristics such as the magnitude of an integer or the length of a sequence. Because of this, values without identity may be manifest anywhere simply by fully describing their characteristics. It is not meaningful to change the characteristics of a value that does not have identity. Examples of values without identity include, but are not limited to: <emu-xref href="#sec-ecmascript-language-types-boolean-type">Booleans</emu-xref>; mathematical values and extended mathematical values; <emu-xref href="#sec-ecmascript-language-types-number-type">Numbers</emu-xref>; <emu-xref href="#sec-ecmascript-language-types-bigint-type">BigInts</emu-xref>; *null*; *undefined*; sequences, including <emu-xref href="#sec-ecmascript-language-types-string-type">Strings</emu-xref>, ECMAScript source text, surrogate pairs, Directive Prologues, etc; UTF-16 code units; Unicode code points; enums; abstract operations, including syntax-directed operations, host hooks, etc; and ordered pairs. The preceding list is exhaustive for ECMAScript language values.</p>
<p>In contrast, each <dfn variants="values with identity">value with identity</dfn> is unique and therefore only equal to itself. Values with identity are like values without identity but with an additional unguessable, unchangeable, universally-unique characteristic called <em>identity</em>. References to existing values with identity cannot be manifest simply by describing them, as the identity itself is indescribable; instead, references to these values must be explicitly passed from one place to another. Some values with identity are mutable and therefore can have their characteristics (except their identity) changed in-place, causing all holders of the value to observe the new characteristics. Examples of values with identity include, but are not limited to: <emu-xref href="#sec-object-type">Objects</emu-xref>, including function objects, exotic objects, etc; any kind of Records, including Property Descriptors, PrivateElements, etc; <emu-xref href="#sec-ecmascript-language-types-symbol-type">Symbols</emu-xref>; Parse Nodes; Lists; <emu-xref href="#sec-set-and-relation-specification-type">Sets</emu-xref> and Relations; Abstract Closures; Data Blocks; Private Names; execution contexts and execution context stacks; agent signifiers; and WaiterLists. The preceding list is exhaustive for ECMAScript language values.</p>
Author

FYI (since GitHub makes these diffs impossible to read), this just reverts the changes to the identity section.

</dl>
<emu-alg>
1. If _v_ is an Object, return *true*.
1. If _v_ is a Symbol and KeyForSymbol(_v_) is *undefined*, return *true*.
1. Return *false*.
</emu-alg>
<emu-note>
<p>Symbols in the GlobalSymbolRegistry (<emu-xref href="#sec-symbol.for"></emu-xref>) are not considered suitable for use as a WeakMap key because they can be described by the String used to register them, and therefore do not have <emu-xref href="#sec-identity">identity</emu-xref>. Well-known symbols (<emu-xref href="#sec-well-known-symbols"></emu-xref>) are likely to never be collected, but are nonetheless treated as suitable for weak reference because they are limited in number and therefore manageable by a variety of implementation approaches. However, any values associated to a well-known symbol in a live WeakMap is unlikely to be collected and could leak memory resources in many implementations.</p>
<p>Values without identity are not suitable for use as a weak reference because they may be manifest at any point without a prior reference. Symbols in the GlobalSymbolRegistry (<emu-xref href="#sec-symbol.for"></emu-xref>) are not suitable for use as a weak reference because they may be retrieved again at any later point using the String with which they were registered, making them ineligible for collection and therefore eternal. Well-known symbols (<emu-xref href="#sec-well-known-symbols"></emu-xref>) are likely to never be collected, but are nonetheless suitable for weak reference because they are limited in number and therefore manageable by a variety of implementation approaches. However, any value associated to a well-known symbol in a live WeakMap is unlikely to be collected and could "leak" memory resources in some implementations.</p>
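The three CanBeHeldWeakly steps in this hunk can be approximated in plain JavaScript. This is only a sketch: `canBeHeldWeakly` is an illustrative name (the abstract operation is not exposed to programs), and it assumes that `Symbol.keyFor` gives the same answer as KeyForSymbol for symbol inputs.

```javascript
// Sketch of the CanBeHeldWeakly steps in plain JavaScript.
function canBeHeldWeakly(v) {
  // Step 1: objects (including functions) can be held weakly.
  if ((typeof v === "object" && v !== null) || typeof v === "function") {
    return true;
  }
  // Step 2: a symbol qualifies only if it has no registry key.
  // Symbol.keyFor returns undefined for symbols not in the registry.
  if (typeof v === "symbol" && Symbol.keyFor(v) === undefined) {
    return true;
  }
  // Step 3: everything else cannot be held weakly.
  return false;
}

console.log(canBeHeldWeakly({}));                 // true
console.log(canBeHeldWeakly(Symbol("local")));    // true
console.log(canBeHeldWeakly(Symbol.iterator));    // true (well-known, not registered)
console.log(canBeHeldWeakly(Symbol.for("app")));  // false
console.log(canBeHeldWeakly("a string"));         // false
```

Note that the well-known symbol passes the check, matching the note's statement that such symbols are treated as suitable even though they are unlikely ever to be collected.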

Michael, I understand that you prefer all Symbols to have identity but I don't think "may be retrieved again at any later point using the String" is helpful as a separate reason from "may be manifest at any point without prior reference". "May be manifest at any point without prior reference" already fully explains Symbol.for. The specification mechanism of manifestation is the String-keyed global registry, but the reason that neither Symbol.for symbols nor values without identity are suitable for weak references is exactly the same.

I feel that explaining that singular reason as two separate reasons muddles the explanation.

I don't see clarity being lost if we say that Symbol.for symbols don't have identity but other Symbols do. It fits very well with the existing definition of identity as "manifestable without prior reference". It means Symbols can't be thought of neatly as one category of values wrt identity, but that is a fact of the language today because we designed a crappy feature.

Here's a thought experiment. Suppose the spec specified all String values to be interned with a cross-Realm registry, as is the case for Symbol.for symbols. There would be no observable difference in the language; it would be purely a specification detail. Would you then say Strings have identity? I wouldn't.
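The "manifest without prior reference" property discussed in this comment is observable from a script. A small sketch, in which the function names are hypothetical stand-ins for unrelated pieces of code that never pass a reference to one another:

```javascript
// Two pieces of code that share nothing but knowledge of a string.
function moduleA() { return Symbol.for("shared.key"); }
function moduleB() { return Symbol.for("shared.key"); }
console.log(moduleA() === moduleB()); // true: same symbol, no reference passed

// The same description does not help with unregistered symbols.
function moduleC() { return Symbol("shared.key"); }
function moduleD() { return Symbol("shared.key"); }
console.log(moduleC() === moduleD()); // false: each call makes a distinct symbol

// Strings behave like the registered case: describing the value is enough.
console.log("a" + "b" === "ab"); // true
```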

Author

> I feel that explaining that singular reason as two separate reasons muddles the explanation.

I'm fine with not elaborating on values without identity here. I added it for completeness and because I thought it might help with clarity. If you think it hurts clarity, I don't mind removing it.

> I don't see clarity being lost if we say that Symbol.for symbols don't have identity but other Symbols do.

The thing is, there's no such thing as Symbol.for symbols. There's just symbols. Sometimes they are in the GlobalSymbolRegistry, sometimes not. No symbols are in the GlobalSymbolRegistry at their inception. And there's no invariant described in the spec at the moment that, once added, they are never removed. This intersects with my desire for whether something has identity to be an unchanging property of that thing.
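The point that no symbol is in the registry at its inception concerns the specification's internal view, which a program cannot observe through the real `Symbol.for`. The following is a hypothetical model only, not the spec text: `modelRegistry`, `modelKeyFor`, `modelSymbolFor`, and the `observe` callback are all invented names used to expose the intermediate state.

```javascript
// Hypothetical model of a string-keyed symbol registry (not the spec text).
const modelRegistry = [];

// Returns the key a symbol was registered under, or undefined.
function modelKeyFor(sym) {
  const entry = modelRegistry.find((e) => e.symbol === sym);
  return entry ? entry.key : undefined;
}

// Looks up or creates a symbol for `key`. `observe` is called with the
// symbol's registry key after creation but before registration.
function modelSymbolFor(key, observe = () => {}) {
  const stringKey = `${key}`;
  const existing = modelRegistry.find((e) => e.key === stringKey);
  if (existing) return existing.symbol;
  const newSymbol = Symbol(stringKey);      // created, not yet registered
  observe(modelKeyFor(newSymbol));
  modelRegistry.push({ key: stringKey, symbol: newSymbol });
  return newSymbol;                         // same symbol, now registered
}

let keyAtCreation = "unset";
const s = modelSymbolFor("demo", (k) => { keyAtCreation = k; });
console.log(keyAtCreation);                // undefined
console.log(modelKeyFor(s));               // "demo"
console.log(modelSymbolFor("demo") === s); // true
```

In this model the same symbol value is first unregistered and then registered, which is the transition the comment objects to if identity were tied to registry membership.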

> but that is a fact of the language today because we designed a crappy feature

It's not. The formalisation I've proposed here does not rely on separation of these concepts or some exotic definition of identity.

> Would you then say Strings have identity?

As long as the interning was used for spec uses of strings as well, it wouldn't matter whether strings had identity or not in this scenario. It's only important for strings to not have identity in the spec currently because we want "a" is "a" to be true. I don't know what this was supposed to illustrate.

@syg
syg commented Feb 28, 2023

> identity is a local concept: whether a value has identity may be determined without any other context

To address this directly, can you expand on why the meaning of identity should have this property? More to the point, what is gained by separating the notion of "can be used as weak reference" from "identity", if we're not actually disagreeing that identity means "can be manifested without prior reference"?

@michaelficarra
Author

> what is gained by separating the notion of "can be used as weak reference" from "identity"

"can be used as weak reference" is circumstantial whereas "identity" should be innate. Separating them allows identity to remain innate. I've never seen any formal system where identity is not innate, and I'd really hate for ours to be unique in this way. It's just strange, and as I've shown, it doesn't buy us anything because we can formalise it the way I've proposed here.

@syg
syg commented Mar 1, 2023

> "can be used as weak reference" is circumstantial whereas "identity" should be innate.

It really isn't circumstantial. I'm not saying identity is defined by what is usable as a weak reference. I'm saying identity is the property that wholly determines usability.

The way the original PR was isn't any less innate. A "Symbol value" is an arbitrary category we made up. Some Symbols have identity and some don't. Your desire to align it is forcing something unnatural into the formalism. I really do not think it is a useful argument to argue about formal systems here. In our chats it is clear that your notion of a formal system is not one that is common in PL formalisms.

@michaelficarra
Author

> The way the original PR was isn't any less innate.

This isn't true. With the identity change,

[image not preserved]

in steps 4 and 5, newSymbol has identity. In step 6, it is the same symbol but it no longer has identity. That is not how an innate quality would work, nor how identity works in any other system.

> Some Symbols have identity and some don't.

That would be fine if all values remained in one category or the other. I am not comfortable with them changing between these categories.

> In our chats it is clear that your notion of a formal system is not one that is common in PL formalisms.

Name a formalism that uses the word identity in the way you describe.

@syg
syg commented Mar 1, 2023

> in steps 4 and 5, newSymbol has identity. In step 6, it is the same symbol but it no longer has identity. That is not how an innate quality would work, nor how identity works in any other system.

I think I see the reason for our different views. There are 2 definitions of "identity".

  1. Identity of JS language values from the perspective of a JS programmer.
  2. Identity of spec-meta things from the perspective of spec steps.

For (2), it is true my preferred definition breaks innateness. But the salient definition here is (1). The identity between steps 4 and 5 is not observable from a JS program. Where observable from a running JS program, Symbol.for symbols don't have identity and other Symbols do.

This points to a lack of editorial clarity in our current definition of identity: it does not distinguish between the perspective of JS and the perspective of intra-spec things. Do you agree that innateness of (1) isn't broken?

For (1), I strongly disagree with saying all Symbols have identity, because it means the "manifestable without prior reference" property is no longer predictive and the definition just becomes a stipulation: we would be saying that Symbol.for symbols are manifestable without prior reference, yet have identity.

> That would be fine if all values remained in one category or the other. I am not comfortable with them changing between these categories.

They don't, from the perspective of what's observable by a JS program.

> Name a formalism that uses the word identity in the way you describe.

You go first; you were the one who appealed to its meaning one thing in formalisms.

3 participants