From 38ec620f6b2971524f4679f3463df263f9a1c108 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Fri, 8 Oct 2021 14:35:14 -0700 Subject: [PATCH 01/13] Filling out template with PR 875 --- proposals/p0875.md | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 proposals/p0875.md diff --git a/proposals/p0875.md b/proposals/p0875.md new file mode 100644 index 0000000000000..180d132c4736e --- /dev/null +++ b/proposals/p0875.md @@ -0,0 +1,62 @@ +# Principle: information accumulation + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/875) + + + +## Table of contents + +- [Problem](#problem) +- [Background](#background) +- [Proposal](#proposal) +- [Details](#details) +- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) +- [Alternatives considered](#alternatives-considered) + + + +## Problem + +TODO: What problem are you trying to solve? How important is that problem? Who +is impacted by it? + +## Background + +TODO: Is there any background that readers should consider to fully understand +this problem and your approach to solving it? + +## Proposal + +TODO: Briefly and at a high level, how do you propose to solve the problem? Why +will that in fact solve it? + +## Details + +TODO: Fully explain the details of the proposed solution. + +## Rationale based on Carbon's goals + +TODO: How does this proposal effectively advance Carbon's goals? Rather than +re-stating the full motivation, this should connect that motivation back to +Carbon's stated goals for the project or language. This may evolve during +review. 
Use links to appropriate goals, for example: + +- [Community and culture](/docs/project/goals.md#community-and-culture) +- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem) +- [Performance-critical software](/docs/project/goals.md#performance-critical-software) +- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) +- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) +- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms) +- [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development) +- [Modern OS platforms, hardware architectures, and environments](/docs/project/goals.md#modern-os-platforms-hardware-architectures-and-environments) +- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) + +## Alternatives considered + +TODO: What alternative solutions have you considered? 
From 235cb641274604b94574404c1ad29f6bf6131b55 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Fri, 8 Oct 2021 17:40:07 -0700 Subject: [PATCH 02/13] Initial incomplete sketch --- proposals/p0875.md | 366 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 349 insertions(+), 17 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index 180d132c4736e..abd779072b870 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -14,49 +14,381 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Problem](#problem) - [Background](#background) + - [Single-pass "splat" compilation](#single-pass-splat-compilation) + - [Global consistency](#global-consistency) + - [The C++ compromise](#the-c-compromise) + - [Separate declarations and definitions](#separate-declarations-and-definitions) - [Proposal](#proposal) - [Details](#details) + - [Goals](#goals) - [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) - [Alternatives considered](#alternatives-considered) + - [Strict top-down information flow](#strict-top-down-information-flow) + - [Strict global consistency](#strict-global-consistency) + - [Top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) + - [Consistent classes, top-down for everything else](#consistent-classes-top-down-for-everything-else) + - [Context-sensitive local consistency](#context-sensitive-local-consistency) ## Problem -TODO: What problem are you trying to solve? How important is that problem? Who -is impacted by it? +We should have consistent rules describing what information about a program is +visible where. ## Background -TODO: Is there any background that readers should consider to fully understand -this problem and your approach to solving it? +Information in a source file is provided incrementally, with each source +utterance providing a small piece of the overall picture. Different languages +have different rules for which information is available where. 
+ +### Single-pass "splat" compilation + +In C and other languages of a similar age, single-pass compilation was highly +desirable, due to resource limits and performance concerns. In these languages: + +- Information is accumulated top-down, and can only be used lexically after it + appears. +- Most information can be discarded soon after it is provided: function bodies + don't need to be kept around once they've been converted to the output + format, and no information on local variable or parameter names needs to + persist past the end of the variable's scope. However, the types of globals + and the contents of type definitions must be retained. +- The behavior of an entity can be different at different places in the same + source file. An early use may fail if it depends on information that's + provided later, and in some cases a later use may fail when an earlier use + succeeded because the use i's invalid in a way that was not visible at the + point of an earlier use. + +### Global consistency + +In more modern languages such as C#, Rust, Java, and Swift, there is no lexical +information ordering. In these languages: + +- Information is effectively accumulated and processed in separate passes. +- The language design and implementation ensure that the behavior of an entity + is the same everywhere: both before its definition, after its definition, + within its definition, and in any other source file in which it was made + visible. +- Dependency cycles between program properties are carefully avoided by the + language designers. + +### The C++ compromise + +In C++, a hybrid approach is taken. There is a C-like lexical information +ordering rule, but this rule is subverted within classes by -- effectively -- +reordering certain parts of a class that appear within the class definition so +that they are processed after the class definition. This primarily applies to +the bodies of member functions. 
Here: + +- Information is mostly accumulated top-down, and is accumulated fully + top-down after the reordering step. +- The behavior of a class is the same within member function bodies that are + defined inside the class as it is within member function bodies defined + lexically after the class. +- The language designers need to ensure that the bounds of the member function + bodies and similar constructs can be determined without parsing them, so + that the late-parsed portions can be separated from the early-parsed + portions. In C++, this was not done successfully, and there are constructs + for which this determination is very hard or impossible. + +### Separate declarations and definitions + +Somewhat separate from the direction of information flow is the ability to +separate the information about an entity into multiple distinct regions of +source files. In C and C++, entities can be separately declared and defined. As +a consequence, these languages need rules to determine whether two declarations +declare the same entity. + +In C++, especially for templated declarations, these rules can be incredibly +complex, and even now, more than 30 years after the introduction of templates in +C++, [basic questions are not fully answered](https://wg21.link/cwg2), and +implementations disagree about which declarations declare the same entity in +fairly simple examples. + +One key benefit of this separation is in reduction of _physical dependencies_: +in order to validate a usage of an entity, we need only see a source file +containing a declaration of that entity, and need never consider the source file +containing its definition. This both reduces the number of steps required for an +incremental rebuild and reduces the input information and processing required +for each individual step. + +The ability to break physical dependencies is limited to the cases where +information can actually be hidden from the users of the entity. 
For example, if +the user actually needs a function body, either because they will evaluate a +call to the function during compilation or because they will inline it prior to +linking, it cannot be physically isolated from the user of that information. As +a consequence, in C++, a programmer must carefully manage which information they +put in the source files that are exposed to client code and which information is +kept separate. + +Another key benefit is that the exported interface of a source file can become +more readable, by presenting an interface that contains only the facts that are +salient to a user and not the implementation details. ## Proposal -TODO: Briefly and at a high level, how do you propose to solve the problem? Why -will that in fact solve it? +TODO: Decide between the alternatives listed below. ## Details TODO: Fully explain the details of the proposed solution. -## Rationale based on Carbon's goals +### Goals -TODO: How does this proposal effectively advance Carbon's goals? Rather than -re-stating the full motivation, this should connect that motivation back to -Carbon's stated goals for the project or language. This may evolve during -review. Use links to appropriate goals, for example: + +For this proposal, we have the following goals as refinements of the overall +Carbon goals: + +- _Comprehensibility._ Our rules should be understandable, and should minimize + surprise and gotchas. Our behavior should be self-consistent, and + explainable in only a few sentences. +- _Ergonomics._ It should be easy to express common developer desires, without + a lot of boilerplate or repetitive code. +- _Readability._ Code written using our rules should be as straightforward as + possible for Carbon developers to read and reason about. +- _Efficient and simple compilation._ It should be relatively straightforward + to implement our semantic rules. Implementation heroics shouldn't be + required, and the number of special cases required should be minimized. 
+- _Diagnosability._ An implementation should be able to explain coding errors + in ways that are easy to understand and are well-correlated with the error + and its remedy. Diagnostics should appear in an order and style that guides + the developer through logical steps to fix their mistakes. +- _Toolability._ Relatively simple tools should be able to understand simpler + properties of Carbon code. It should ideally be possible to identify which + names can be used in a particular context and what those names mean without + full processing. It should ideally be possible to gather useful and mostly + complete information about a potentially-invalid source file that is + currently being edited, for which it may be desirable to assume there is a + "hole" in the source file at the cursor position that will be filled by + unknown code. + +## Rationale based on Carbon's goals -- [Community and culture](/docs/project/goals.md#community-and-culture) - [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem) -- [Performance-critical software](/docs/project/goals.md#performance-critical-software) + - See "Toolability" goal. - [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) + - TODO: Order-independence improves the ability to evolve code on a small + scale. - [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) -- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms) + - See "Readability", "Ergonomics", and "Comprehensibility" goals. 
- [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development) -- [Modern OS platforms, hardware architectures, and environments](/docs/project/goals.md#modern-os-platforms-hardware-architectures-and-environments) -- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) + - See "Efficient and simple compilation" and "Diagnosability" goals. ## Alternatives considered -TODO: What alternative solutions have you considered? +Below, various alternatives are presented and rated according to the +[goals](#goals) for this proposal. + +### Strict top-down information flow + +Carbon could accumulate information top-down. We could require that each program +utterance is type-checked and fully validated before any later code is +considered. + +In order to support this and still permit cyclic references between entities, we +would need to permit separate declaration and definition. + +_Comprehensibility:_ This rule is simple to explain, and has no special cases. +However, the inability to look at information from later in the source file is +likely to result in gotchas: + +``` +class Base { + var n: i32; +} +class Derived extends Base { + // Returns me.(Base.n), not me.(Derived.n), because the latter has not + // been declared yet. + fn Get[me: Self]() -> i32 { return me.n; } + var n: i32; +} +``` + +It might be possible to require a diagnostic in such cases, when we find a +declaration that would change the meaning of prior code if it had appeared +earlier, but that would result in implementation complexity, and the fact that +such cases are rejected would still be a surprise. + +_Ergonomics:_ The developer is required to topologically sort their source files +in dependency order, manually breaking cycles with forward declarations. Common +refactoring tasks such as reorganizing code may require effort or tooling +assistance in order to preserve a topological order. 
+ +In practice, we would expect developers to react to this ruleset by beginning +each source file with a collection of forward declarations. This mitigates the +need to produce a topological ordering, except within those forward declarations +themselves, and other declarations required to provide those forward +declarations. For example, a forward declaration of a class member will likely +only be possible within a class definition, and the order in which that class +definition is given can be relevant to the validity of other class definitions. + +_Readability:_ Developers wishing to understand code have the advantage that +they need only consider prior code, and there is no possibility that a later +source utterance could change the meaning of the code they're reading. However, +it is rare to read code top-down, so the effect of this advantage may be modest. + +This advantage leads to a significant disadvantage: the behavior of an entity +can be different at different places within a source file. For example, a type +can be incomplete in one place and complete in another, or can fail to implement +an interface when inspected early and then found to implement that interface +later. This can lead to very subtle incoherent behavior. + +In practice, the topological ordering constraint tends to lead to good locality +of information: helpers for particular functionality are often located near to +the functionality. However, this is not a universal advantage, and the +topological constraint sometimes leads to internal helpers being ordered +immediately before their first use instead of in a more logical position near +correlated functionality. + +The ability and tacit encouragement to start a source file with a list of +forward declarations of entities in that file -- or, for an API file, in its +corresponding implementation file -- will improve readability compared to an +approach in which that style is not possible or would not be used in practice. 
+ +_Efficient and simple compilation:_ This rule is mostly simple and efficient to +implement, and even allows single-pass processing of source files. It supports +and is likely to encourage physical separation of implementation from interface, +potentially leading to build time wins through reduced recompilation. + +However, the requirement to support separate declaration and definition has the +potential to lead to substantial implementation complexity, as it does in C++, +as it imposes the requirement to determine whether two declarations declare the +same entity or different entities -- especially in the context of overloaded +function templates. + +_Diagnosability:_ Because information is provided top-down, diagnostics can also +be provided top-down and in every case the diagnostic will be caused by an error +at the given location or earlier. Fixing errors should require little +backtracking by the developer. + +However, an implementation that strictly confines its processing to top-down +order and produces diagnostics eagerly cannot deliver diagnostics that react +intelligently to contextual cues that appear after the point of the diagnostic. +This approach diminishes the ability for an implementation to pinpoint the cause +of the error and describe it in a developer-oriented fashion. + +_Toolability:_ Limiting information flow to top-down means that tools such as +code completion tools need only consider context prior to the cursor, and they +can be confident that if all the code prior to the cursor is valid that it can +be type-checked and suitable completions offered. + +However, in the case where the user wants to refer to a later-declared entity, +such tools would not be able to use this strategy. They would need to parse as +if there were not a top-down rule in order to find such later-declared entities, +and would likely additionally need the ability to add forward declarations or to +reorder declarations in order to satisfy the ordering requirement. 
+ +### Strict global consistency + +Carbon could follow an approach of requiring the behavior of every entity to be +globally consistent. In this approach, the behavior of every entity would be as +if the entire program could be consulted to determine facts about that entity. + +In practice, to make this work, we would need to limit where those facts can be +declared. For example, we limit implementations of interfaces to appear only in +source files that must already be in use wherever the question "does this type +implement that interface?" can be asked. + +In addition, we need to reject at least the case where some property of the +program recursively depends upon itself: + +``` +struct X { + var a: Array(sizeof(X), i8); +} +``` + +In order to give globally consistent semantics to, for example, a package name, +we would likely need to process all source files comprising a package at the +same time. + +This alternative can be considered either with or without the ability to +separate declarations from definitions. + +_Comprehensibility:_ This rule is simple to explain, and has no special cases. +The disallowance of semantic cycles is likely to be unsurprising as it is a +logical necessity in any rule. + +Applying this rule to local name lookup in block scope does result in some +surprises. For example, C# uses this approach, and combined with its +disallowance of shadowing of local variables, this +[confuses some developers](https://stackoverflow.com/questions/1196941/variable-scope-confusion-in-c-sharp). + +_Ergonomics:_ The developer can organize or arrange their code in any way they +desire. There is never a need to forward-declare or repeat an interface +declaration. Refactoring and code reorganization do not require any non-obvious +changes, because the same code means the same thing regardless of how it is +located relative to other code. + +_Readability:_ Reasoning about code is simple in this model, as such reasoning +is largely not context-sensitive. 
Instead of questioning "what does this do +here?" we can instead consider "what does this do?". Some context sensitivity +may remain, for example due to access and name bindings differing in different +contexts. + +However, to developers accustomed to a top-down semantic model, the ability to +defer giving key information about an entity -- or even declaring it at all -- +until long after it is first used may hinder readability in some circumstances, +particularly when reading code top-down. + +_Efficient and simple compilation:_ This model forces the compilation process to +operate in multiple stages rather than as a single pass. + +Some form of cycle detection is necessary if cycles are possible. However, such +a mechanism is likely to be necessary for template instantiation too, so this is +likely not a novel requirement for Carbon. + +Forcing all files within a package to be compiled together in order to provide +consistent semantics for the package name may place an undesirable scalability +barrier on the build system. + +_Diagnosability:_ The implementation is likely to have more contextual +information when providing diagnostics, improving their quality. However, the +diagnostics may appear in a confusing order: if an early declaration needs +information from a later declaration in order to type-check, diagnostics +associated with that later declaration may be produced first, or may be +interleaved with diagnostics for the earlier declaration, leading to the +programmer potentially revisiting the same code multiple times during a +compile-edit cycle. + +_Toolability:_ This model requires tools to consider the whole file as context, +because code may refer to entities that are only introduced later. 
For an IDE +scenario, where the cursor represents a location where an arbitrary chunk of +code may be missing, this presents a challenge of determining how to +resynchronize the input in order to determine how to interpret the portion of +the source file following the cursor. + +Sophisticated tooling for a top-down model may wish to inspect the trailing +portion of the file anyway, in order to provide a better developer experience, +but this complexity would be forced upon tools with this model. + +### Top-down with minimally deferred type checking + +We could follow a top-down approach generally, but defer type-checking each +top-level entity until we reach the end of that entity. For example, we would +check an entire class as a single unit, following the same principles as in the +globally-consistent rule, but using only information provided prior to the end +of the class definition. This would allow class members to use other members +that have not yet been declared, while not permitting a function definition +preceding the class definition to use such members. + +### Consistent classes, top-down for everything else + +We could provide a globally-consistent rule for some entities and a top-down +rule for others. Following C++'s lead, we could provide a top-down rule for +packages, namespaces, and within functions, but provide a globally-consistent +rule for classes. + +### Context-sensitive local consistency + +We could use different behaviors in different contexts, as follows: + +- For contexts that are fundamentally ordered, such as function bodies, a + top-down rule is used. +- For contexts that are defined across multiple source files, such as packages + and namespaces, we guarantee consistent behavior within each source file, + but the behavior may be inconsistent across source files: different source + files may see different sets of names within a package or namespace, + depending on what they have imported. 
+- For contexts that are defined within a single source file, such as a class + or an interface, we guarantee globally consistent behavior. From a31257a2c08bdaf05f3ec5085bb9dabf409b6430 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Fri, 15 Oct 2021 17:46:18 -0700 Subject: [PATCH 03/13] Respond to review comments and add a proposed direction based on a poll at our weekly meeting. --- proposals/p0875.md | 374 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 324 insertions(+), 50 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index abd779072b870..d6ed31759f7da 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -23,11 +23,14 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Goals](#goals) - [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) - [Alternatives considered](#alternatives-considered) - - [Strict top-down information flow](#strict-top-down-information-flow) - - [Strict global consistency](#strict-global-consistency) - - [Top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) - - [Consistent classes, top-down for everything else](#consistent-classes-top-down-for-everything-else) - - [Context-sensitive local consistency](#context-sensitive-local-consistency) + - [Information flow](#information-flow) + - [Strict top-down](#strict-top-down) + - [Strict global consistency](#strict-global-consistency) + - [Top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) + - [Context-sensitive local consistency](#context-sensitive-local-consistency) + - [Separate declaration and definition](#separate-declaration-and-definition) + - [Allow separate declaration and definition](#allow-separate-declaration-and-definition) + - [Disallow separate declaration and definition](#disallow-separate-declaration-and-definition) @@ -57,7 +60,7 @@ desirable, due to resource limits and performance concerns. 
In these languages: - The behavior of an entity can be different at different places in the same source file. An early use may fail if it depends on information that's provided later, and in some cases a later use may fail when an earlier use - succeeded because the use i's invalid in a way that was not visible at the + succeeded because the use is invalid in a way that was not visible at the point of an earlier use. ### Global consistency @@ -128,11 +131,36 @@ salient to a user and not the implementation details. ## Proposal -TODO: Decide between the alternatives listed below. +Carbon will provide +[context-sensitive local consistency](#context-sensitive-local-consistency), +with [forward declarations](#allow-separate-declaration-and-definition). ## Details -TODO: Fully explain the details of the proposed solution. +Within a single source file, each entity has only one meaning and only one set +of properties, that does not depend on where in the source file you are. + +There are two different name lookup behaviors: + +- For scopes that are inherently ordered -- function bodies and braced code + blocks within them -- name lookup only finds earlier-declared entities in + that scope. +- For scopes in which order is largely immaterial -- all non-statement scopes + -- name lookup always finds all entities declared in the same file, + regardless of where the name is declared relative to the point of lookup. + +The latter rule is used for class scopes, even though the relative order of +field declarations in a class is relevant to the layout of the class and to the +behavior of the program. + +Forward declarations are permitted in order to allow separation of interface and +implementation whenever the developer wishes to do so. All declarations of an +entity are required to be part of the same library, but can be in different +source files. 
If a source file can only see a declaration of an entity and not +its definition, some uses of that entity will be invalid that would be valid if +the definition were visible. For example, without a class definition, we may be +unable to create instances of the class, and without a function definition, we +may be unable to use the function to compute a compile-time constant. ### Goals @@ -179,14 +207,21 @@ Carbon goals: Below, various alternatives are presented and rated according to the [goals](#goals) for this proposal. -### Strict top-down information flow +Generally we need to pick a rule for information flow and a rule for separate +declaration and definition. These choices are not independent: some information +flow rules require separate declaration and definition. + +### Information flow + +#### Strict top-down Carbon could accumulate information top-down. We could require that each program utterance is type-checked and fully validated before any later code is considered. In order to support this and still permit cyclic references between entities, we -would need to permit separate declaration and definition. +would need to +[allow separate declaration and definition](#allow-separate-declaration-and-definition). _Comprehensibility:_ This rule is simple to explain, and has no special cases. However, the inability to look at information from later in the source file is @@ -214,13 +249,18 @@ in dependency order, manually breaking cycles with forward declarations. Common refactoring tasks such as reorganizing code may require effort or tooling assistance in order to preserve a topological order. -In practice, we would expect developers to react to this ruleset by beginning -each source file with a collection of forward declarations. This mitigates the -need to produce a topological ordering, except within those forward declarations -themselves, and other declarations required to provide those forward -declarations. 
For example, a forward declaration of a class member will likely -only be possible within a class definition, and the order in which that class -definition is given can be relevant to the validity of other class definitions. +Developers could adapt to this ruleset by beginning each source file with a +collection of forward declarations. This would mitigate the need to produce a +topological ordering, except within those forward declarations themselves, and +other declarations required to provide those forward declarations. For example, +a forward declaration of a class member will likely only be possible within a +class definition, and the order in which that class definition is given can be +relevant to the validity of other class definitions. However, our experience +with C++ indicates that this will not be done, and instead an ad-hoc and +undisciplined combination of a topological sort and occasional forward +declarations will be used. Indeed, including these additional forward +declarations would increase the maintenance burden and introduce an additional +opportunity for errors due to mismatches between declaration and definition. _Readability:_ Developers wishing to understand code have the advantage that they need only consider prior code, and there is no possibility that a later @@ -240,25 +280,12 @@ topological constraint sometimes leads to internal helpers being ordered immediately before their first use instead of in a more logical position near correlated functionality. -The ability and tacit encouragement to start a source file with a list of -forward declarations of entities in that file -- or, for an API file, in its -corresponding implementation file -- will improve readability compared to an -approach in which that style is not possible or would not be used in practice. - _Efficient and simple compilation:_ This rule is mostly simple and efficient to -implement, and even allows single-pass processing of source files. 
It supports -and is likely to encourage physical separation of implementation from interface, -potentially leading to build time wins through reduced recompilation. - -However, the requirement to support separate declaration and definition has the -potential to lead to substantial implementation complexity, as it does in C++, -as it imposes the requirement to determine whether two declarations declare the -same entity or different entities -- especially in the context of overloaded -function templates. +implement, and even allows single-pass processing of source files. _Diagnosability:_ Because information is provided top-down, diagnostics can also be provided top-down and in every case the diagnostic will be caused by an error -at the given location or earlier. Fixing errors should require little +at the given location or earlier. Fixing errors should require little or no backtracking by the developer. However, an implementation that strictly confines its processing to top-down @@ -278,7 +305,7 @@ if there were not a top-down rule in order to find such later-declared entities, and would likely additionally need the ability to add forward declarations or to reorder declarations in order to satisfy the ordering requirement. -### Strict global consistency +#### Strict global consistency Carbon could follow an approach of requiring the behavior of every entity to be globally consistent. In this approach, the behavior of every entity would be as @@ -303,7 +330,10 @@ we would likely need to process all source files comprising a package at the same time. This alternative can be considered either with or without the ability to -separate declarations from definitions. +separate declarations from definitions. If we permit separate declaration and +definition, we would likely require declarations of the same entity to appear in +the same library; however, that limitation is desirable regardless of which +strategy we choose. 
_Comprehensibility:_ This rule is simple to explain, and has no special cases. The disallowance of semantic cycles is likely to be unsurprising as it is a @@ -315,7 +345,7 @@ disallowance of shadowing of local variables, this [confuses some developers](https://stackoverflow.com/questions/1196941/variable-scope-confusion-in-c-sharp). _Ergonomics:_ The developer can organize or arrange their code in any way they -desire. There is never a need to forward-declare or repeat an interface +desire. There is never a requirement to forward-declare or repeat an interface declaration. Refactoring and code reorganization do not require any non-obvious changes, because the same code means the same thing regardless of how it is located relative to other code. @@ -340,7 +370,11 @@ likely not a novel requirement for Carbon. Forcing all files within a package to be compiled together in order to provide consistent semantics for the package name may place an undesirable scalability -barrier on the build system. +barrier on the build system. This will also tend to push up build latency, +especially for incremental builds, by decreasing the granule size of +compilation. It may increase the scope of recompilations due to additional +physical dependencies unless we implement a mechanism to detect whether a +library has changed in an "important" way. _Diagnosability:_ The implementation is likely to have more contextual information when providing diagnostics, improving their quality. However, the @@ -362,24 +396,81 @@ Sophisticated tooling for a top-down model may wish to inspect the trailing portion of the file anyway, in order to provide a better developer experience, but this complexity would be forced upon tools with this model. -### Top-down with minimally deferred type checking +#### Top-down with minimally deferred type checking + +We could follow a top-down approach generally, but defer type-checking some or +all top-level entities until we reach the end of that entity. 
For example, we +would check an entire class as a single unit, following the same principles as +in the globally-consistent rule, but using only information provided prior to +the end of the class definition. This would allow class members to use other +members that have not yet been declared, while not permitting a function +definition preceding the class definition to use such members. + +Following C++'s lead, we would apply this to at least classes and interfaces, +but perhaps to nothing else. We may still want to apply the top-down rule to +function definitions appearing within a class or interface, as the C# behavior +that the scope of a local variable extends backwards before its declaration may +be surprising. + +This approach generally has the same properties as +[strict top-down](#strict-top-down), except as follows: + +_Comprehensibility:_ Slightly reduced due to additional special-casing of +top-level declarations. Gotchas would be rarer, but may not be fully eliminated. +For example: + +``` +import X; +// G from package X, not G from class X below. +fn F() { X.G(); } +class X { + fn G() {} +} +class Y { + // G from class X below, not G from package X. + fn F() { X.G(); } + class X { + fn G() {} + } +} +``` + +_Ergonomics:_ Improved within classes, as members can now be named everywhere +within the class. + +_Readability:_ The ability to read and understand code may be somewhat harmed, +as the rules for behavior across top-level declarations aren't the same as the +rules for behavior within a top-level declaration. -We could follow a top-down approach generally, but defer type-checking each -top-level entity until we reach the end of that entity. For example, we would -check an entire class as a single unit, following the same principles as in the -globally-consistent rule, but using only information provided prior to the end -of the class definition. 
This would allow class members to use other members -that have not yet been declared, while not permitting a function definition -preceding the class definition to use such members. +However, the behavior of an entity no longer differs at different points within +the same declaration of that entity, so the ability to reason about the behavior +of an entity is somewhat improved compared to the strict top-down approach. For +entities without a separate declaration and definition, the behavior of those +entities is the same everywhere. -### Consistent classes, top-down for everything else +_Efficient and simple compilation:_ This has the same complications as the +strict global consistency approach, except that it does not apply to contexts +that span multiple files, so it doesn't have the build time disadvantages of +requiring all source files in a library to be built together. -We could provide a globally-consistent rule for some entities and a top-down -rule for others. Following C++'s lead, we could provide a top-down rule for -packages, namespaces, and within functions, but provide a globally-consistent -rule for classes. +_Diagnosability:_ Diagnostics should be encountered by the implementation in the +top-down order of the top-level declaration containing the error, but may in +principle appear in any order within that top-level declaration, if there are +forward references that require later code to be analyzed before earlier code +is. -### Context-sensitive local consistency +Diagnostics can be caused by errors appearing both before and after the location +reported, but not arbitrarily far after: later top-level declarations cannot +affect an earlier diagnostic. + +_Toolability:_ Tools that want to correctly model Carbon semantics would be +required to deal with incomplete source code in the vicinity of the cursor. This +is probably the hardest kind of incomplete source code to handle, as the region +of "damaged" code is most likely to be relevant context. 
In practice, this is +likely to present the same difficulties as tooling in the +[strict global consistency](#strict-global-consistency) model. + +#### Context-sensitive local consistency We could use different behaviors in different contexts, as follows: @@ -392,3 +483,186 @@ We could use different behaviors in different contexts, as follows: depending on what they have imported. - For contexts that are defined within a single source file, such as a class or an interface, we guarantee globally consistent behavior. + +Compared to the [strict global consistency](#strict-global-consistency) model, +this would not guarantee that the contents of a package or namespace are the +same everywhere: you only see the names that you declare or import into a +package or namespace. Also, a top-down rule is used within function bodies, as +that likely better matches programmer expectations. + +Compared to the +[top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) +model, this would not require forward declarations for namespace and package +members that are declared or defined later in the same source file. + +_Comprehensibility:_ This model is probably a little easier to understand than +the minimally-deferred type-checking model, because there is no difference +between top-level declarations and nested declarations. + +Because the behavior of every entity is consistent within a file, and every name +lookup is consistent within a scope -- other than the top-down behavior within +function scopes -- there are fewer opportunities for surprises than in the +top-down approaches, and nearly as few as in the strict global consistency +model. + +_Ergonomics:_ As with the strict global consistency model, code can be organized +as the developer desires, and refactoring doesn't have any surprises due to +disrupting a topological order. 
+ +_Readability:_ As with the strict global consistency model, readability is +improved by not being context-sensitive. However, it is file-sensitive, and a +different set of imports may lead to different behavior. + +Similarly, as with the strict global consistency model, developers accustomed to +the top-down model may find it harder to understand code in which a name can be +referenced before it is declared. + +_Efficient and simple compilation:_ This model has the same general costs as the +top-down with minimal deferred type-checking model. The implementation needs to +be able to separate parsing from type-checking, and type-check lazily in order +to handle forward references. However, more state needs to be accumulated before +type-checking can begin. + +This model supports separate compilation of source files, as no attempt is made +to ensure consistency across source files, only within a source file. + +_Diagnosability:_ The compilation process will be split up into multiple phases, +and it is likely that diagnostics from one phase will precede those from +another. For example, a parsing error for a later construct may precede a type +error for an earlier one. + +If an earlier phase fails, it may not be feasible to diagnose later phases. For +example, if parsing encounters an unrecoverable failure, type errors for the +successfully-parsed portion cannot be generated, because they may depend on +constructs appearing later in the same source file. However, this may also mean +that better diagnostics are produced for cases such as a missing `}`, where the +syntactic error will precede or prevent diagnostics for semantic errors caused +by the misinterpretation of the program resulting from that missing `}`. 
+ +Within a particular phase of processing, diagnostics will generally be produced +in a topological order, and the processing can be arranged such that they are +produced in top-down order except where a forward reference requires that a +later declaration is checked earlier. + +_Toolability:_ Largely the same as the strict global consistency model, except +that tools only need to use the current source file as context. Also largely the +same as the top-down with minimally deferred type checking model. + +### Separate declaration and definition + +#### Allow separate declaration and definition + +We could allow two or more declarations of each entity, with at most one +declaration providing a definition and the other declarations merely introducing +the existence of the entity and its basic properties: + +``` +// Forward declaration. +fn F(); +// ... +// Another forward declaration of the same function. +fn F(); +// One and only definition. +fn F() {} +// Yet another forward declaration. +fn F(); +``` + +This could be done in either a free-for-all fashion as in the above example, +where any number of declarations is permitted, or in a more restricted fashion, +where there can be at most one declaration introducing an entity, and, if that +declaration is not a definition, at most one separate implementation providing +the definition. In this case, the separate definition could contain a syntactic +marker indicating that a prior declaration should exist: + +``` +// Forward declaration. +fn F(); +// ... +// Separate definition, with `impl` marker to indicate that this is just an +// implementation of an entity that was already declared. +impl fn F() {} +``` + +_Comprehensibility:_ In general, determining whether two declarations declare the +same entity may not be straightforward. 
+ +- We could require token-for-token identical declarations, but this may result + in ergonomic problems -- developers may wish to leave out information in an + interface that isn't relevant for consumers of that interface, or spell the + same type or parameter name differently in an implementation. +- If we allow the declarations to differ, there may be cases, as there are in + C++, where it's unclear whether two declarations are "sufficiently similar" + so as to declare the same entity. + +_Ergonomics:_ Any case where the declaration and definition are separated +results in repetitive code. However, this repetition may serve a purpose to the +extent that it's presenting an API description or acting as a compilation +barrier. + +_Readability:_ Readability of code may be substantially improved by separating a +declaration of an API from its definition, and permitting an API user to read, +in a small localized portion of a source file, only the information they care +about, without regard for implementation details. + +_Efficient and simple compilation:_ Allowing multiple declarations of an entity +introduces some implementation complexity, as the implementation must look up, +validate, and link together multiple declarations of the same entity. If +redeclarations are permitted across source files -- such as between the +interface and implementation of an API, or between multiple implementation files +-- then this may also require some stable mechanism for identifying the +declaration, such as a name mangling scheme. This may be especially complex in +the case of overloaded function templates, which might differ in arbitrary +Carbon expressions appearing in the function's declaration. This complexity +could be greatly reduced if we require all overloads to be declared together -- +at least in the same source file -- as we can then identify them by name and +index. 
+ +This approach supports physical separation of implementation from interface, +potentially leading to build time wins through reduced recompilation and through +reducing the amount of information fed into each compilation step. + +_Diagnosability:_ Identifying errors where a redeclaration doesn't exactly match +a prior declaration is not completely straightforward. This is especially the +case when function overloads can be added by the same syntax with which a +definition of a prior declaration would be introduced. Even if declarations and +definitions use a different syntax, diagnosing a mismatch between a definition +of an overloaded function and a prior set of declarations requires some amount +of fuzzy matching to infer which one was probably intended. + +_Toolability:_ Supporting separate declarations and definitions will present the +same kinds of complexity for non-compiler tools as for compilers. Tools will +need to be able to reason about multiple declarations providing distinct +information about an entity and the effect of that on how the entity can be used +at different source locations. + +#### Disallow separate declaration and definition + +We could require each entity to be declared in exactly one place, and reject any +cases where the same entity is declared more than once. + +_Comprehensibility:_ This rule is easy to explain and easy to reason about. + +_Ergonomics:_ Minimizes the work required by the developer to implement new +functionality. + +_Readability:_ No ability to present an interface as source code without also +including implementation details. The ability to read and understand an +interface will depend more heavily on tools that can hide implementation +details, such as documentation generators and IDEs with outlining. + +_Efficient and simple compilation:_ This approach leads to a simpler compilation +strategy. 
However, every compilation referring to an entity needs to see -- +either in the same file or an imported file -- a definition of that entity, +meaning there is no physical separation between the APIs of entities and their +definitions, which imposes a performance cost on compilations. + +_Diagnosability:_ Because each entity can only be declared once, there is no +significant challenge in diagnosing redeclaration issues. However, there is +still some work required to diagnose conflicts between similar or functionally +identical function overloads. + +_Toolability:_ Having a unique source location for each entity allows for +somewhat simpler tooling. For example, there is no need to distinguish between +"jump to declaration" and "jump to definition", or to decide which declaration +should be consulted to find documentation comments, parameter names, and so on. From 2d8066c49e91935ab913dd56b9cae1f03466a5a1 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Fri, 7 Jan 2022 18:30:42 -0800 Subject: [PATCH 04/13] Update based on recent discussions. 
--- .../principles/information_accumulation.md | 123 ++++++++++++ proposals/p0875.md | 182 ++++++++++++++---- 2 files changed, 266 insertions(+), 39 deletions(-) create mode 100644 docs/project/principles/information_accumulation.md diff --git a/docs/project/principles/information_accumulation.md b/docs/project/principles/information_accumulation.md new file mode 100644 index 0000000000000..fec0afd841a97 --- /dev/null +++ b/docs/project/principles/information_accumulation.md @@ -0,0 +1,123 @@ +# Principle: Information accumulation + + + + + +## Table of contents + +- [Background](#background) +- [Principle](#principle) +- [Applications of this principle](#applications-of-this-principle) +- [Exceptions](#exceptions) +- [Alternatives considered](#alternatives-considered) + + + +## Background + +There are many different sources of information in a program, and a tool or a +human interpreting code will not in general have full information, but will +still want to draw conclusions about the code. + +Different languages take different approaches to this problem. For example: + +- In C, information is accumulated linearly in each source file independently, + and only information from earlier in the same file is available. A program + can observe that information is incomplete at one point and complete at + another. +- In C++, the behavior is largely similar to C, except: + - Within certain contexts in a class, information from later in the class + definition is available. + - With C++20 modules, information from other source files can be made + available. + - It is easier to observe -- perhaps even accidentally -- that information + is accumulated incrementally. +- In Rust, all information from the entire crate is available everywhere + within that crate, with exceptions for constructs like proc macros that can + see the state of the program being incrementally built. +- In Swift, all information from the entire source file is available within + that source file. 
+ +It is problematic for the same entity to have different behaviors depending on +which information about it is available. This can lead to a violation of +[coherence](/docs/design/generics/goals.md#coherence) and program consistency. + +It is also problematic for both tools and human readers to make information +available prior to its introduction in a source file. + +- For a human reader, this violates the literary principle that ideas should + be introduced before they are referenced. +- For a tool, this makes it harder to reason about incomplete or invalid code + for which a prefix is valid, such as frequently happens while editing a + source file, and expands the scope of code that a tool must understand to + correctly interpret code. +- This also makes it harder to use tools such as bisection to determine the + cause of an error, because any part of the source file could contribute to + an error. + +## Principle + +In Carbon, information is accumulated incrementally within each source file. +Carbon programs are invalid if they would have a different meaning if more +information were available. + +Carbon source files can be interpreted top-down, without referring to +information that appears substantially later in a file. Source files are +expected to be organized into a topological order where that makes sense, with +forward declarations used to introduce names before they are first referenced +when necessary. + +If a program attempts to use information that has not yet been provided, the +program is invalid. There are multiple options for how this can be reported: + +- The program can be rejected as soon as it tries to use information that + might not be known yet. +- For the case where the information can only be provided in the same source + file, an assumption about the information can be made at the point where it + is needed, and the program can be rejected only if that assumption turns out + to be incorrect. 
+ +Disallowing programs from changing meaning in the context of more information +ensures that the program is interpreted consistently or is rejected. This is +especially important to the coherence of generics and templates. + +This principle serves several goals: + +- [Language tools](/docs/project/goals.md#language-tools-and-ecosystem) should + be easier to write and maintain with information accumulating linearly in + the source file. +- [Language evolution](/docs/project/goals.md#software-and-language-evolution) + options for revisiting this choice are kept open by ensuring that allowing + more information to flow backwards is a non-breaking change. +- [Understandability of code](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) + is improved by requiring source files to introduce information before it is + used. + +## Applications of this principle + +- As in C++, and unlike in Rust and Swift, name lookup only finds names + declared earlier. +- Classes are incomplete until the end of their definition. Unlike in C++, any + attempt to observe a property of an incomplete class that is not known until + the class is complete renders the program invalid. +- When an `impl` needs to be resolved, only those `impl`s that have already + been declared are considered. However, if a later `impl` would change the + result of any earlier `impl` lookup, the program is invalid. + +## Exceptions + +Because a class is not complete until its definition has been fully parsed, +applying this rule would make it impossible to define most member functions +within the class definition. In order to still provide the convenience of +defining class member functions inline, such member function bodies are deferred +and processed as if they appeared immediately after the end of the outermost +enclosing class, like in C++. 
+ +## Alternatives considered + +- [Allow information to be used before it is provided](/proposals/p0875.md#strict-global-consistency) diff --git a/proposals/p0875.md b/proposals/p0875.md index d6ed31759f7da..09ed600cbf98a 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -27,9 +27,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Strict top-down](#strict-top-down) - [Strict global consistency](#strict-global-consistency) - [Top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) + - [Top-down with deferred method bodies](#top-down-with-deferred-method-bodies) - [Context-sensitive local consistency](#context-sensitive-local-consistency) - - [Separate declaration and definition](#separate-declaration-and-definition) - - [Allow separate declaration and definition](#allow-separate-declaration-and-definition) - [Disallow separate declaration and definition](#disallow-separate-declaration-and-definition) @@ -131,36 +130,61 @@ salient to a user and not the implementation details. ## Proposal -Carbon will provide -[context-sensitive local consistency](#context-sensitive-local-consistency), +Carbon will use a +[top-down approach with deferred method bodies](#top-down-with-deferred-method-bodies), with [forward declarations](#allow separate declaration and definition). +In any case where the [strict global consistency](#strict-global-consistency) +model would given a different result, the program is invalid. This does not +apply to name lookup, however: the set of names available in a scope are +expected to increase throughout the scope, and later names should not be visible +earlier. + ## Details -Within a single source file, each entity has only one meaning and only one set -of properties, that does not depend on where in the source file you are. +Each entity has only one meaning and only one set of properties, that does not +depend on which source file you are within or where in that source file you are. 
+However, not all information will be available everywhere, and it is an error to +attempt to use information that is not available -- either because it's in a +different source file and not exported or not imported, or because it appears +later in the same source file. + +Name lookup only provides earlier-declared names. However, the bodies of class +member functions that are defined inside the class are reordered to after the +class, so all class member names are declared before the class body is seen. +Unlike in C++, classes are either incomplete or complete, with no intermediate +state. Any attempt to perform name lookup into an incomplete class is an error, +even if the name being looked up has already been declared. -There are two different name lookup behaviors: +``` +class A { + fn F() { + // OK: body of F reordered to after the end of class A, so + // B is in scope here. + var b1: B; + // OK: A is a complete class type here. + var b2: A.B; + } -- For scopes that are inherently ordered -- function bodies and braced code - blocks within them -- name lookup only finds earlier-declared entities in - that scope. -- For scopes in which order is largely immaterial -- all non-statement scopes - -- name lookup always finds all entities declared in the same file, - regardless of where the name is declared relative to the point of lookup. + class B {} -The latter rule is used for class scopes, even though the relative order of -field declarations in a class is relevant to the layout of the class and to the -behavior of the program. + // OK. + var p: B*; + // Error: cannot look up B within incomplete class A. + var q: A.B*; +} +``` Forward declarations are permitted in order to allow separation of interface and -implementation whenever the developer wishes to do so. All declarations of an -entity are required to be part of the same library, but can be in different -source files. 
If a source file can only see a declaration of an entity and not -its definition, some uses of that entity will be invalid that would be valid if -the definition were visible. For example, without a class definition, we may be -unable to create instances of the class, and without a function definition, we -may be unable to use the function to compute a compile-time constant. +implementation whenever the developer wishes to do so, and introduce names early +in order to break cycles or when the developer does not wish to define entities +in topological order. All declarations of an entity are required to be part of +the same library, but can be in different source files. If a point within a +source file can only see a declaration of an entity and not its definition, some +uses of that entity will be invalid that would be valid if the definition were +visible. For example, without a class definition, we may be unable to create +instances of the class, and without a function definition, we may be unable to +use the function to compute a compile-time constant. ### Goals @@ -195,8 +219,15 @@ Carbon goals: - [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem) - See "Toolability" goal. - [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) - - TODO: Order-independence improves the ability to evolve code on a small - scale. + - Ensuring that the program interpretation is consistent with an + interpretation with full information makes it easier to evolve code, as + adding more imports cannot change the meaning of programs. + - Selecting a rule that disallows information from flowing backwards keeps + open more language evolution paths: + - We could allow more information to flow backwards. + - We could expose metaprogramming constructs that can inspect the + state of a partial source file without creating inconsistency in our + model. 
- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) - See "Readability", "Ergonomics", and "Comprehensibility" gaols. - [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development) @@ -470,6 +501,73 @@ of "damaged" code is most likely to be relevant context. In practice, this is likely to present the same difficulties as tooling in the [strict global consistency](#strict-global-consistency) model. +#### Top-down with deferred method bodies + +We could follow a top-down approach generally, but defer all processing of +bodies of class member functions until we have finished processing the class. +For example, we would reinterpret: + +```carbon +class A { + fn F[me: Self]() -> i32 { return me.n; } + fn Make() -> A { return {.n = 5}; } + var n: i32; +} +``` + +exactly as if it were written with the member function bodies out of line: + +```carbon +class A { + fn F[me: Self]() -> i32; + fn Make() -> A; + var n: i32; +} +fn A.F[me: Self]() -> i32 { return me.n; } +fn A.Make() -> A { return {.n = 5}; } +``` + +The [strict top-down](#strict-top-down) rule would be applied to this rewritten +form of the program. + +As in C++, member functions of nested classes would be deferred until the +outermost class is complete. Unlike in C++, this deferral would apply only to +the bodies of member functions, and not to default arguments or other contexts +inside the class. + +This approach generally has the same properties as +[strict top-down](#strict-top-down), except as follows: + +_Comprehensibility:_ Slightly reduced due to additional special-casing of class +member functions, but these rules are well-aligned with the rules from C++, +which do not seem to suffer from major comprehensibility problems in this area. + +_Ergonomics:_ Improved within classes, as members can now be named within member +function bodies. + +_Readability:_ No major concerns are anticipated. 
The very similar rule in C++ +is not known to cause readability problems. + +_Efficient and simple compilation:_ The complexity of this approach is not +substantially greater than the simple top-down approach. Processing of member +function bodies must be deferred, but this can be done either by storing and +replaying the tokens, or by forming a parse tree but deferring the semantic +analysis and type-checking until the end of the class. + +_Diagnosability:_ Diagnostics for errors in member functions may appear after +diagnostics for later code in the same class. + +Developers may be confused by diagnostics claiming that a class member function +body is not available yet and referring to a point in the program text that +lexically follows the member function definition. It is likely possible to +special-case such diagnostics to explain the nature of the problem. + +_Toolability:_ Tools that want to parse incomplete Carbon code would need to +cope with member function bodies containing errors. For the most part, skipping +a brace-balanced function body should be straightforward, but tools will also +need to consider whether they attempt to detect and recover from mismatched +braces within member functions. + #### Context-sensitive local consistency We could use different behaviors in different contexts, as follows: @@ -539,26 +637,32 @@ that better diagnostics are produced for cases such as a missing `}`, where the syntactic error will precede or prevent diagnostics for semantic errors caused by the misinterpretation of the program resulting from that missing `}`. -Within a particular phase of processing, diagnostics will generally be produced -in a topological order, and the processing can be arranged such that they are -produced in top-down order except where a forward reference requires that a -later declaration is checked earlier. 
+ Within a particular phase of processing, + diagnostics will generally be produced in a topological order, + and the processing can be arranged such that they are produced in top - + down order except where a forward reference requires that a later + declaration is checked earlier. -_Toolability:_ Largely the same as the strict global consistency model, except -that tools only need to use the current source file as context. Also largely the -same as the top-down with minimally deferred type checking model. + _Toolability + : _ Largely the same as the strict global consistency model, + except that tools only need to use the current source file + as context.Also largely the same as the top - + down with minimally deferred type checking model. -### Separate declaration and definition + ## #Separate declaration and definition -#### Allow separate declaration and definition + ####Allow separate declaration and definition -We could allow two or more declarations of each entity, with at most one -declaration providing a definition and the other declarations merely introducing -the existence of the entity and its basic properties: + We could allow two or + more declarations of each entity, + with at most one declaration providing a definition and the other + declarations merely introducing the existence of the entity and its + basic properties : ``` -// Forward declaration. -fn F(); + // Forward declaration. + fn + F(); // ... // Another forward declaration of the same function. fn F(); From 7eac76ec38bdb682fca9d5712b4b4715f25877b1 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Mon, 10 Jan 2022 14:11:59 -0800 Subject: [PATCH 05/13] Address review comments and undo mess that prettier made. 
--- proposals/p0875.md | 85 +++++++++++++++++++++++++++------------------- 1 file changed, 51 insertions(+), 34 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index 09ed600cbf98a..f2e6aef5085eb 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -29,6 +29,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) - [Top-down with deferred method bodies](#top-down-with-deferred-method-bodies) - [Context-sensitive local consistency](#context-sensitive-local-consistency) + - [Separate declaration and definition](#separate-declaration-and-definition) + - [Allow separate declaration and definition](#allow-separate-declaration-and-definition) - [Disallow separate declaration and definition](#disallow-separate-declaration-and-definition) @@ -132,7 +134,7 @@ salient to a user and not the implementation details. Carbon will use a [top-down approach with deferred method bodies](#top-down-with-deferred-method-bodies), -with [forward declarations](#allow separate declaration and definition). +with [forward declarations](#allow-separate-declaration-and-definition). In any case where the [strict global consistency](#strict-global-consistency) model would given a different result, the program is invalid. This does not @@ -142,19 +144,31 @@ earlier. ## Details -Each entity has only one meaning and only one set of properties, that does not -depend on which source file you are within or where in that source file you are. -However, not all information will be available everywhere, and it is an error to -attempt to use information that is not available -- either because it's in a -different source file and not exported or not imported, or because it appears -later in the same source file. 
+Each entity has only one meaning and only one set of properties; that meaning +and those properties do not depend on which source file you are within or where +in that source file you are. However, not all information will be available +everywhere, and it is an error to attempt to use information that is not +available -- either because it's in a different source file and not exported or +not imported, or because it appears later in the same source file. + +For example, a full definition of a class `Widget` may be available in some +parts of the program, with only a forward declaration available in other parts +of the program. In this case, any attempt to use `Widget` where only a forward +declaration is available will always either result in the program being invalid +or doing the same thing that would happen if the full information were +available. This is a stronger guarantee than is provided by C++'s One Definition +Rule, which guarantees that every use sees an equivalent definition or no +definition, but does not guarantee that every use has the same behavior -- in +C++ a function could do one thing if `Widget` is defined and a different thing +if `Widget` is only forward declared, and that will not be possible in Carbon. Name lookup only provides earlier-declared names. However, the bodies of class -member functions that are defined inside the class are reordered to after the -class, so all class member names are declared before the class body is seen. -Unlike in C++, classes are either incomplete or complete, with no intermediate -state. Any attempt to perform name lookup into an incomplete class is an error, -even if the name being looked up has already been declared. +member functions that are defined inside the class are reordered to be +separately defined after the class, so all class member names are declared +before the class body is seen. Unlike in C++, classes are either incomplete or +complete, with no intermediate state. 
Any attempt to perform name lookup into an +incomplete class is an error, even if the name being looked up has already been +declared. ``` class A { @@ -221,7 +235,9 @@ Carbon goals: - [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) - Ensuring that the program interpretation is consistent with an interpretation with full information makes it easier to evolve code, as - adding more imports cannot change the meaning of programs. + changing the amount of visible information -- for example, by reordering + code or by adding or removing imports -- cannot change the meaning of a + program from one valid meaning to another. - Selecting a rule that disallows information from flowing backwards keeps open more language evolution paths: - We could allow more information to flow backwards. @@ -232,6 +248,11 @@ Carbon goals: - See "Readability", "Ergonomics", and "Comprehensibility" gaols. - [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development) - See "Efficient and simple compilation" and "Diagnosability" goals. +- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) + - The chosen rule is mostly the same as the corresponding rule in C++, + both in accumulating information top-down and in treating inline method + bodies as a special case. This should keep migration simple and improve + familiarity for experienced C++ developers. ## Alternatives considered @@ -590,7 +611,9 @@ that likely better matches programmer expectations. 
Compared to the [top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) -model, this would not require forward declarations for namespace and package +and +[top-down with deferred method bodies](#top-down-with-deferred-method-bodies) +models, this would not require forward declarations for namespace and package members that are declared or defined later in the same source file. _Comprehensibility:_ This model is probably a little easier to understand than @@ -637,32 +660,26 @@ that better diagnostics are produced for cases such as a missing `}`, where the syntactic error will precede or prevent diagnostics for semantic errors caused by the misinterpretation of the program resulting from that missing `}`. - Within a particular phase of processing, - diagnostics will generally be produced in a topological order, - and the processing can be arranged such that they are produced in top - - down order except where a forward reference requires that a later - declaration is checked earlier. +Within a particular phase of processing, diagnostics will generally be produced +in a topological order, and the processing can be arranged such that they are +produced in top-down order except where a forward reference requires that a +later declaration is checked earlier. - _Toolability - : _ Largely the same as the strict global consistency model, - except that tools only need to use the current source file - as context.Also largely the same as the top - - down with minimally deferred type checking model. +_Toolability:_ Largely the same as the strict global consistency model, except +that tools only need to use the current source file as context. Also largely the +same as the top-down with minimally deferred type checking model. 
- ## #Separate declaration and definition +### Separate declaration and definition - ####Allow separate declaration and definition +#### Allow separate declaration and definition - We could allow two or - more declarations of each entity, - with at most one declaration providing a definition and the other - declarations merely introducing the existence of the entity and its - basic properties : +We could allow two or more declarations of each entity, with at most one +declaration providing a definition and the other declarations merely introducing +the existence of the entity and its basic properties: ``` - // Forward declaration. - fn - F(); +// Forward declaration. +fn F(); // ... // Another forward declaration of the same function. fn F(); From 9e8169745ee9de0b9eaf1581d947632234cd5583 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Mon, 10 Jan 2022 14:34:25 -0800 Subject: [PATCH 06/13] More responses to review comments. --- proposals/p0875.md | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index f2e6aef5085eb..df13efea59bcb 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -767,10 +767,20 @@ _Comprehensibility:_ This rule is easy to explain and easy to reason about. _Ergonomics:_ Minimizes the work required by the developer to implement new functionality. -_Readability:_ No ability to present an interface as source code without also -including implementation details. The ability to read and understand an -interface will depend more heavily on tools that can hide implementation -details, such as documentation generators and IDEs with outlining. +_Readability:_ Each entity has only one declaration, giving a single canonical +location to include documentation, and no question as to where information about +the entity might be found. 
This will likely improve the navigability of source +code and the ease with which all the information about an entity can be +gathered, and reduce the chance that information in the declaration or +definition will be incomplete or that they will be inconsistent with each other. + +However, there would be no ability to present a function or class as source code +without also including implementation details. Such implementation details +cannot be physically separated from the API as presented to clients. The ability +to read and understand an interface will depend more heavily on tools that can +hide implementation details, such as documentation generators and IDEs with +outlining, and implementation techniques such as using `interface`s to present +the API of a class separate from its implementation. _Efficient and simple compilation:_ This approach leads to a simpler compilation strategy. However, every compilation referring to an entity needs to see -- From 85682d2cc12b154bc33ec21137375ad81885237b Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Mon, 10 Jan 2022 15:09:20 -0800 Subject: [PATCH 07/13] Add section describing how the selected alternative was chosen. --- proposals/p0875.md | 52 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/proposals/p0875.md b/proposals/p0875.md index df13efea59bcb..a97a812ad2502 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -21,6 +21,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Proposal](#proposal) - [Details](#details) - [Goals](#goals) + - [Choice of alternative](#choice-of-alternative) - [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) - [Alternatives considered](#alternatives-considered) - [Information flow](#information-flow) @@ -228,6 +229,57 @@ Carbon goals: "hole" in the source file at the cursor position that will be filled by unknown code. 
+### Choice of alternative + +We have a large space of options here, all of which have both benefits and +drawbacks. The rationale for our choice is as follows: + +It is important to allow physically separating declaration from definition, both +to allow the code author to separate the interface of a library from its +implementation and to improve build performance. This is also an important +aspect in providing a language that is familiar to C++ developers. So we choose +to +[allow separate declaration and definition](#allow-separate-declaration-and-definition). + +Once declarations and definitions can be separated, there will be cases where +the meaning of an entity differs in different parts of the program, such as +where only a declaration but not a definition of a class is available in some +source file or package. In order for programs to behave consistently and +predictably, especially in the presence of templates and generics, we do not +want to allow the same operation to have two or more different behaviors +depending on which information is available. So we choose to make programs +invalid if they would make use of information that is not available. + +For build performance reasons, we also want to avoid package-at-a-time or even +library-at-a-time compilation, and want a file-at-a-time compilation strategy. +This implies that if we do make information available before the point where it +is introduced, we should only do that within a single source file, and should +make no attempt to automatically resolve cycles within a package or library. + +This leaves the question of what happens within a single source file: do +entities need to be manually written in dependency order, or is the +implementation expected to use information that appears after its point of +declaration? 
Neither option guarantees that an entity will behave consistently +wherever it's used, because we already gave up that guarantee for entities +visible across source files, and we need to cope with the resulting risk of +incoherence anyway. + +The simplest option for implementation, and the one most similar to C++, would +then be to accumulate information top-down. This is also the choice that is most +consistent with the cross-source-file behavior: moving a prefix of a source file +into a separate, imported file should generally preserve the validity of the +program. Further, this choice gives us the most opportunity for future language +evolution, as it is the most restrictive option, and hence it is +forward-compatible with the other rules under consideration, as well as +supporting paths for evolution that require a top-down view, such as some +approaches to metaprogramming. + +Finally, because class member functions could not be meaningfully defined inline +under a strict top-down rule, because the class would not yet be complete, we +make an ergonomic affordance of reordering the bodies of such functions as if +they appear after the class definition. This also improves consistency with C++, +which has a similar rule. + ## Rationale based on Carbon's goals - [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem) From 3fa2ae452467a99aae3d08287bf567f9d896bf8d Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Mon, 10 Jan 2022 15:10:25 -0800 Subject: [PATCH 08/13] Simplify example to assume less about undecided language questions. --- proposals/p0875.md | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index a97a812ad2502..710366e75fc30 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -174,19 +174,14 @@ declared. ``` class A { fn F() { - // OK: body of F reordered to after the end of class A, so - // B is in scope here. 
- var b1: B; - // OK: A is a complete class type here. - var b2: A.B; + // OK: A and B are complete class types here. + var b: A.B; } class B {} - // OK. - var p: B*; // Error: cannot look up B within incomplete class A. - var q: A.B*; + var p: A.B*; } ``` From e98f7835cbb453e30f1f7bdba4ae94853b0ceab0 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Mon, 10 Jan 2022 16:07:04 -0800 Subject: [PATCH 09/13] More responses to review comments. --- proposals/p0875.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index 710366e75fc30..30ee233f4fef7 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -571,9 +571,12 @@ likely to present the same difficulties as tooling in the #### Top-down with deferred method bodies +**This is the proposed approach.** + We could follow a top-down approach generally, but defer all processing of -bodies of class member functions until we have finished processing the class. -For example, we would reinterpret: +bodies of class member functions until we have finished processing the class, at +which point we would process those member function bodies in the order they +appeared within the class. For example, we would reinterpret: ```carbon class A { From 7642dcfb9f9f8abca16da4000314dc9b01d0f3a0 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Thu, 13 Jan 2022 18:47:56 -0800 Subject: [PATCH 10/13] Update based on review comments. --- .../principles/information_accumulation.md | 24 +-- proposals/p0875.md | 176 +++++++++++++----- 2 files changed, 135 insertions(+), 65 deletions(-) diff --git a/docs/project/principles/information_accumulation.md b/docs/project/principles/information_accumulation.md index fec0afd841a97..3b70bec8c91f8 100644 --- a/docs/project/principles/information_accumulation.md +++ b/docs/project/principles/information_accumulation.md @@ -43,23 +43,6 @@ Different languages take different approaches to this problem. 
For example: - In Swift, all information from the entire source file is available within that source file. -It is problematic for the same entity to have different behaviors depending on -which information about it is available. This can lead to a violation of -[coherence](/docs/design/generics/goals.md#coherence) and program consistency. - -It is also problematic for both tools and human readers to make information -available prior to its introduction in a source file. - -- For a human reader, this violates the literary principle that ideas should - be introduced before they are referenced. -- For a tool, this makes it harder to reason about incomplete or invalid code - for which a prefix is valid, such as frequently happens while editing a - source file, and expands the scope of code that a tool must understand to - correctly interpret code. -- This also makes it harder to use tools such as bisection to determine the - cause of an error, because any part of the source file could contribute to - an error. - ## Principle In Carbon, information is accumulated incrementally within each source file. @@ -120,4 +103,9 @@ enclosing class, like in C++. ## Alternatives considered -- [Allow information to be used before it is provided](/proposals/p0875.md#strict-global-consistency) +- Allow information to be used before it is provided + [globally](/proposals/p0875.md#strict-global-consistency), + [within a file](/proposals/p0875.md#context-sensitive-local-consistency), or + [within a top-level declaration](/proposals/p0875.md#top-down-with-minimally-deferred-type-checking). 
+- [Do not allow inline method bodies to use members before they are declared](/proposals/p0875.md#strict-top-down) +- [Do not allow separate declaration and definition](/proposals/p0875.md#disallow-separate-declaration-and-definition) diff --git a/proposals/p0875.md b/proposals/p0875.md index 30ee233f4fef7..1c7a3857115db 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -201,7 +201,7 @@ use the function to compute a compile-time constant. For this proposal, we have the following goals as refinements of the overall Carbon goals: -- _Comprehensiblity._ Our rules should be understandable, and should minimize +- _Comprehensibility._ Our rules should be understandable, and should minimize surprise and gotchas. Our behavior should be self-consistent, and explainable in only a few sentences. - _Ergonomics._ It should be easy to express common developer desires, without @@ -229,29 +229,39 @@ Carbon goals: We have a large space of options here, all of which have both benefits and drawbacks. The rationale for our choice is as follows: -It is important to allow physically separating declaration from definition, both -to allow the code author to separate the interface of a library from its -implementation and to improve build performance. This is also an important -aspect in providing a language that is familiar to C++ developers. So we choose -to -[allow separate declaration and definition](#allow-separate-declaration-and-definition). - -Once declarations and definitions can be separated, there will be cases where -the meaning of an entity differs in different parts of the program, such as -where only a declaration but not a definition of a class is available in some -source file or package. In order for programs to behave consistently and -predictably, especially in the presence of templates and generics, we do not -want to allow the same operation to have two or more different behaviors -depending on which information is available. 
So we choose to make programs -invalid if they would make use of information that is not available. - -For build performance reasons, we also want to avoid package-at-a-time or even -library-at-a-time compilation, and want a file-at-a-time compilation strategy. -This implies that if we do make information available before the point where it -is introduced, we should only do that within a single source file, and should -make no attempt to automatically resolve cycles within a package or library. - -This leaves the question of what happens within a single source file: do +The first consideration is whether to provide a +[globally consistent view](#strict-global-consistency) of code. Providing such a +view would require us to find a compatible build process that can scale to large +projects, and we do not currently have a design for such a process. Moreover, +this is likely to be a choice that is relatively hard to evolve away from. A +non-scalable build process would be an existential threat to the Carbon language +under its stated goals, so we set this option aside. This should be revisited if +a complete design for a scalable build process with a globally-consistent +semantic model is proposed. + +Once we step away from strict global consistency, there will be cases where the +meaning of an entity differs in different parts of the program. In order for +programs to behave consistently and predictably, especially in the presence of +templates and generics, we do not want to allow the same operation to have two +or more different behaviors depending on which information is available. So we +choose to make programs invalid if they would make use of information that is +not available. + +Even though we have chosen not to provide a globally consistent view, we could +still choose to provide a consistent view throughout a package, library, or +source file, such as in the +[context-sensitive local consistency](#context-sensitive-local-consistency) +model. 
However, again for build performance reasons, we want to avoid +package-at-a-time or even library-at-a-time compilation, and would like a +file-at-a-time compilation strategy. The cost of having different views in +different files is substantially reduced given that we have already chosen to +have different views at least in different packages. So we allow the known +information about an entity to vary between source files, even in the same +library. + +This implies that we at least need forward declarations for entities that are +declared in an `api` file for a library and defined in an `impl` file. However, +this leaves the question of what happens within a single source file: do entities need to be manually written in dependency order, or is the implementation expected to use information that appears after its point of declaration? Neither option guarantees that an entity will behave consistently @@ -269,11 +279,22 @@ forward-compatible with the other rules under consideration, as well as supporting paths for evolution that require a top-down view, such as some approaches to metaprogramming. -Finally, because class member functions could not be meaningfully defined inline -under a strict top-down rule, because the class would not yet be complete, we -make an ergonomic affordance of reordering the bodies of such functions as if -they appear after the class definition. This also improves consistency with C++, -which has a similar rule. +This choice implies that we need forward declarations, both for declarations in +different source files from their definitions and to resolve cycles or cases +where the developer cannot or does not wish to write code in topological order. +Allowing separate declaration and definition may also be valuable to allow the +code author to hide implementation details from readers of the interface, and +present a condensed catalog or table of contents for the library. 
This is also +an important aspect in providing a language that is familiar to C++ developers. +So we choose to +[allow separate declaration and definition](#allow-separate-declaration-and-definition) +in general. + +Given the above decisions, class member functions could not be meaningfully +defined inline under a strict top-down rule, because the class would not yet be +complete. So we make an ergonomic affordance of reordering the bodies of such +functions as if they appear after the class definition. This also improves +consistency with C++, which has a similar rule. ## Rationale based on Carbon's goals @@ -426,7 +447,9 @@ struct X { In order to give globally consistent semantics to, for example, a package name, we would likely need to process all source files comprising a package at the -same time. +same time. This is likely to encourage packages to be small, whereas we have +previously expected the scope of a package to match that of a complete source +code repository. This alternative can be considered either with or without the ability to separate declarations from definitions. If we permit separate declaration and @@ -434,14 +457,16 @@ definition, we would likely require declarations of the same entity to appear in the same library; however, that limitation is desirable regardless of which strategy we choose. -_Comprehensibility:_ This rule is simple to explain, and has no special cases. -The disallowance of semantic cycles is likely to be unsurprising as it is a -logical necessity in any rule. - Applying this rule to local name lookup in block scope does result in some surprises. For example, C# uses this approach, and combined with its disallowance of shadowing of local variables, this [confuses some developers](https://stackoverflow.com/questions/1196941/variable-scope-confusion-in-c-sharp). +As a variant of this alternative, it would be reasonable and in line with likely +programmer expectations to not apply this rule to names in block scope.
+ +_Comprehensibility:_ This rule is simple to explain, and has no special cases, +other than perhaps the block scope variant. The disallowance of semantic cycles +is likely to be unsurprising as it is a logical necessity in any rule. _Ergonomics:_ The developer can organize or arrange their code in any way they desire. There is never a requirement to forward-declare or repeat an interface @@ -475,6 +500,17 @@ compilation. It may increase the scope of recompilations due to additional physical dependencies unless we implement a mechanism to detect whether a library has changed in an "important" way. +We do not have an existence proof of an efficient build strategy for a +Carbon-like language that follows this model. For example, Rust has known build +scalability issues, as does Java for large projects unless a tool like +[ijar](https://github.com/bazelbuild/bazel/tree/master/third_party/ijar) is used +to automatically create something resembling a C++ header file from a source +file. There is a significant risk that such an automated tool would be very hard +to create for Carbon: because the definitions of classes and functions can be +exposed in various ways through compile-time function evaluation, templates, and +metaprogramming, it seems challenging to put any useful bound on which entities +can be stripped without restricting the capabilities of client code. + _Diagnosability:_ The implementation is likely to have more contextual information when providing diagnostics, improving their quality. However, the diagnostics may appear in a confusing order: if an early declaration needs @@ -539,7 +575,20 @@ within the class. _Readability:_ The ability to read and understand code may be somewhat harmed, as the rules for behavior across top-level declarations aren't the same as the -rules for behavior within a top-level declaration. +rules for behavior within a top-level declaration. 
It may be especially jarring +that code that is valid within a class definition is not valid at the top level: + +``` +class A { + // OK + var b: B; + class B {} +} + +// Error, I have not heard of B. +var b: B; +class B {} +``` However, the behavior of an entity no longer differs at different points within the same declaration of that entity, so the ability to reason about the behavior @@ -616,8 +665,10 @@ which do not seem to suffer from major comprehensibility problems in this area. _Ergonomics:_ Improved within classes, as members can now be named within member function bodies. -_Readability:_ No major concerns are anticipated. The very similar rule in C++ -is not known to cause readability problems. +_Readability:_ The ability to read and understand code may be somewhat harmed, +as the rules for behavior within a member function body are different from the +rules everywhere else in the language. However, the very similar rule in C++ is +not known to cause readability problems, so no major concerns are anticipated. _Efficient and simple compilation:_ The complexity of this approach is not substantially greater than the simple top-down approach. Processing of member @@ -755,16 +806,27 @@ fn F(); impl fn F() {} ``` -_Comprehensiblity:_ In general, determining whether two declarations declare the -same entity may not be straightforward. +_Comprehensibility:_ In general, determining whether two declarations declare +the same entity may not be straightforward, particularly for a redeclaration of +an overloaded function. We have some options, but they all have challenges for +comprehensibility or ergonomics of the rule: - We could require token-for-token identical declarations, but this may result in ergonomic problems -- developers may wish to leave out information in an interface that isn't relevant for consumers of that interface, or spell the same type or parameter name differently in an implementation. 
-- If we allow the declarations to differ, there may be cases, as there are in - C++, where it's unclear whether two declarations are "sufficiently similar" - so as to declare the same entity. +- We could allow the declarations to differ so long as they have the same + meaning. However, there may be cases, as there are in C++, where it's + unclear whether two declarations are "sufficiently similar" so as to declare + the same entity. +- We could require all overloads of a function to be declared together in a + group, perhaps with some dedicated syntax, and then match up redeclarations + based on position in the group rather than signature. Such an approach is + likely to lead to developers, especially those familiar with the C++ rule, + being surprised. It will likely also interact poorly with cases where some + but not all of the overloads are defined in their first declaration and the + rest are defined separately, or where it is desirable for different + overloads to be defined in different implementation files. _Ergonomics:_ Any case where the declaration and definition are separated results in repetitive code. However, this repetition may serve a purpose to the @@ -815,7 +877,8 @@ cases where the same entity is declared more than once. _Comprehensibility:_ This rule is easy to explain and easy to reason about. _Ergonomics:_ Minimizes the work required by the developer to implement new -functionality. +functionality and maintain existing functionality. Reduces the chance that a +change in one place will be forgotten in another. _Readability:_ Each entity has only one declaration, giving a single canonical location to include documentation, and no question as to where information about @@ -833,10 +896,29 @@ outlining, and implementation techniques such as using `interface`s to present the API of a class separate from its implementation. _Efficient and simple compilation:_ This approach leads to a simpler compilation -strategy. 
However, every compilation referring to an entity needs to see -- -either in the same file or an imported file -- a definition of that entity, -meaning there is no physical separation between the APIs of entities and their -definitions, which imposes a performance cost on compilations. +strategy. However, in the absence of forward declarations, every compilation +referring to an entity needs to have a build dependency on the full definition +of that entity, meaning there is no physical separation between the APIs of +entities and their definitions. This would impose a potentially-significant +build time cost due to increased recompilations when the definition of an entity +changes. + +There are possibilities to reduce this build time cost. For example, instead of +each declaration identifying whether it is public or private to a library, we +could add a third visibility level to say that the declaration but not the +definition is public, and use a stripping tool as part of the build process to +prevent dependencies on non-public definitions from cascading into unnecessary +rebuilds. + +This approach is also unlikely to be acceptable from an ergonomic standpoint +unless declarations within a source file are visible throughout at least the +entire library containing that file, although this is strictly an independent +choice. Making definitions from a source file visible throughout its library +would presumably require a library-at-a-time build strategy, which is also +expected to introduce build time costs due to changes to a library requiring +more code in that library to be recompiled. This cost could be reduced by using +a reactive compilation model, recompiling only the portions of a library that +are affected by a code change, at some complexity cost. _Diagnosability:_ Because each entity can only be declared once, there is no significant challenge in diagnosing redeclaration issues.
However, there is From 570dfc2f8d825d39f0e6be60fc15d8c3ef91bb0f Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Fri, 14 Jan 2022 14:17:15 -0800 Subject: [PATCH 11/13] Be more explicit about how the Carbon rule differs from the C++ rule. --- proposals/p0875.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index 1c7a3857115db..84106d1950542 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -651,9 +651,9 @@ The [strict top-down](#strict-top-down) would be applied to this rewritten form of the program. As in C++, member functions of nested classes would be deferred until the -outermost class is complete. Unlike in C++, this deferral would apply only to -the bodies of member functions, and not to default arguments or other contexts -inside the class. +outermost class is complete. This deferral would apply only to the bodies of +member functions, unlike in C++ where it also applies to default arguments, +default member initializers, and exception specifications. This approach generally has the properties as [strict top-down](#strict-top-down), except as follows: From 7aff47394f1134049678282da72b0737361a9a24 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Fri, 25 Feb 2022 17:52:45 -0800 Subject: [PATCH 12/13] Respond to review comments. --- .../principles/information_accumulation.md | 12 ---- proposals/p0875.md | 60 +++++++++++++++++-- 2 files changed, 54 insertions(+), 18 deletions(-) diff --git a/docs/project/principles/information_accumulation.md b/docs/project/principles/information_accumulation.md index 3b70bec8c91f8..6fcfb632c5337 100644 --- a/docs/project/principles/information_accumulation.md +++ b/docs/project/principles/information_accumulation.md @@ -69,18 +69,6 @@ Disallowing programs from changing meaning in the context of more information ensures that the program is interpreted consistently or is rejected. This is especially important to the coherence of generics and templates. 
-This principle serves several goals: - -- [Language tools](/docs/project/goals.md#language-tools-and-ecosystem) should - be easier to write and maintain with information accumulating linearly in - the source file. -- [Language evolution](/docs/project/goals.md#software-and-language-evolution) - options for revisiting this choice are kept open by ensuring that allowing - more information to flow backwards is a non-breaking change. -- [Understandability of code](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) - is improved by requiring source files to introduce information before it is - used. - ## Applications of this principle - As in C++, and unlike in Rust and Swift, name lookup only finds names diff --git a/proposals/p0875.md b/proposals/p0875.md index 84106d1950542..9f0e621ff0007 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -27,6 +27,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Information flow](#information-flow) - [Strict top-down](#strict-top-down) - [Strict global consistency](#strict-global-consistency) + - [Block-scope exception](#block-scope-exception) + - [Analysis](#analysis) - [Top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) - [Top-down with deferred method bodies](#top-down-with-deferred-method-bodies) - [Context-sensitive local consistency](#context-sensitive-local-consistency) @@ -138,10 +140,12 @@ Carbon will use a with [forward declarations](#allow-separate-declaration-and-definition). In any case where the [strict global consistency](#strict-global-consistency) -model would given a different result, the program is invalid. This does not -apply to name lookup, however: the set of names available in a scope are -expected to increase throughout the scope, and later names should not be visible -earlier. 
+model with the [block-scope exception](#block-scope-exception) would consider +the program to be invalid or would assign it a different meaning, the program is +invalid. + +Put another way: we accept exactly the cases where those two models both accept +and give the same result, and reject all other cases. ## Details @@ -292,7 +296,7 @@ in general. Given the above decisions, class member functions could not be meaningfully defined inline under a strict top-down rule, because the class would not yet be -complete, we make an ergonomic affordance of reordering the bodies of such +complete. We make an ergonomic affordance of reordering the bodies of such functions as if they appear after the class definition. This also improves consistency with C++, which has a similar rule. @@ -322,6 +326,13 @@ consistency with C++, which has a similar rule. bodies as a special case. This should keep migration simple and improve familiarity for experienced C++ developers. +It should be noted that the decision between the model presented here and the +leading alternative, which was +[context-sensitive local consistency](#context-sensitive-local-consistency), was +very close, and came down to minor details. For each of our goals, neither +alternative had a significant advantage. The similarity to C++ and evolvability +of the alternative in this proposal were ultimately the deciding factors. + ## Alternatives considered Below, various alternatives are presented and rated according to the @@ -457,12 +468,41 @@ definition, we would likely require declarations of the same entity to appear in the same library; however, that limitation is desirable regardless of which strategy we choose. +##### Block-scope exception + Applying this rule to local name lookup in block scope does result in some surprises. 
For example, C# uses this approach, and combined with its disallowance of shadowing of local variables, this [confuses some developers](https://stackoverflow.com/questions/1196941/variable-scope-confusion-in-c-sharp). As a variant of this alternative, it would be reasonable and in line with likely -programmer expectations to not apply this rule to names in block scope. +programmer expectations to not apply this rule to names in block scope. We call +this the _block-scope exception_. + +For example: + +``` +var m: i32 = 1; +fn F(b: bool) -> i32 { + if (b) { + var n: i32 = 2; + // With the block-scope exception: + // * `n` is unambiguously the variable declared on the line above, and + // * `m` is unambiguously the global. + // + // Without the block-scope exception: + // * depending on the rules we choose for shadowing, `n` might name the + // variable above, might be ambiguous, or might be invalid because it + // shadows the variable `n` declared below, and + // * `m` would name the local variable declared below. + return n + m; + } + var n: i32 = 3; + var m: i32 = 4; + return n * m; +} +``` + +##### Analysis _Comprehensibility:_ This rule is simple to explain, and has no special cases, other than perhaps the block scope variant. The disallowance of semantic cycles @@ -895,6 +935,14 @@ hide implementation details, such as documentation generators and IDEs with outlining, and implementation techniques such as using `interface`s to present the API of a class separate from its implementation. +For developers familiar with C++ in particular, the absence of the ability to +separate declaration from definition may create friction, due to the familiarity +of that feature and having their thinking about code layout shaped around it. 
+Given that Carbon aims to be familiar to those coming from C++, an absence of +forward declarations will work against that goal to some extent, although +emiprical evidence suggests that the extent to which this is an issue varies +widely between existing C++ developers. + _Efficient and simple compilation:_ This approach leads to a simpler compilation strategy. However, in the absence of forward declarations, every compilation referring to an entity needs to have a build dependency on the full definition From 89d9113f2fdabe3bf1d6c2fb77bae171160aab12 Mon Sep 17 00:00:00 2001 From: Richard Smith Date: Thu, 3 Mar 2022 17:04:01 -0800 Subject: [PATCH 13/13] Apply suggestions from code review. Co-authored-by: Geoff Romer --- proposals/p0875.md | 382 ++++++++++++++++++++++++--------------------- 1 file changed, 202 insertions(+), 180 deletions(-) diff --git a/proposals/p0875.md b/proposals/p0875.md index 9f0e621ff0007..cd0c48aa91e49 100644 --- a/proposals/p0875.md +++ b/proposals/p0875.md @@ -24,17 +24,23 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Choice of alternative](#choice-of-alternative) - [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) - [Alternatives considered](#alternatives-considered) + - [Separate declaration and definition](#separate-declaration-and-definition) + - [Allow separate declaration and definition](#allow-separate-declaration-and-definition) + - [Analysis](#analysis) + - [Disallow separate declaration and definition](#disallow-separate-declaration-and-definition) + - [Analysis](#analysis-1) - [Information flow](#information-flow) - [Strict top-down](#strict-top-down) + - [Analysis](#analysis-2) - [Strict global consistency](#strict-global-consistency) - [Block-scope exception](#block-scope-exception) - - [Analysis](#analysis) + - [Analysis](#analysis-3) - [Top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking) + - [Analysis](#analysis-4) - [Top-down with deferred 
method bodies](#top-down-with-deferred-method-bodies) + - [Analysis](#analysis-5) - [Context-sensitive local consistency](#context-sensitive-local-consistency) - - [Separate declaration and definition](#separate-declaration-and-definition) - - [Allow separate declaration and definition](#allow-separate-declaration-and-definition) - - [Disallow separate declaration and definition](#disallow-separate-declaration-and-definition) + - [Analysis](#analysis-6) @@ -236,8 +242,8 @@ drawbacks. The rationale for our choice is as follows: The first consideration is whether to provide a [globally consistent view](#strict-global-consistency) of code. Providing such a view would require us to find a compatible build process that can scale to large -projects, and we do not currently have a design for such a process. Moreover, -this is likely to be a choice that is relatively hard to evolve away from. A +projects; see the [detailed analysis](#analysis-3) for details. Moreover, this +is likely to be a choice that is relatively hard to evolve away from. A non-scalable build process would be an existential threat to the Carbon language under its stated goals, so we set this option aside. This should be revisited if a complete design for a scalable build process with a globally-consistent @@ -338,9 +344,182 @@ of the alternative in this proposal were ultimately the deciding factors. Below, various alternatives are presented and rated according to the [goals](#goals) for this proposal. -Generally we need to pick a rule for information flow and a rule for separate -declaration and definition. These choices are not independent: some information -flow rules require separate declaration and definition. +Generally we need to pick a rule for allowing or disallowing separate +declaration and definition, and a rule for information flow. These choices are +not independent: some information flow rules require separate declaration and +definition. 
+ +### Separate declaration and definition + +#### Allow separate declaration and definition + +We could allow two or more declarations of each entity, with at most one +declaration providing a definition and the other declarations merely introducing +the existence of the entity and its basic properties: + +``` +// Forward declaration. +fn F(); +// ... +// Another forward declaration of the same function. +fn F(); +// One and only definition. +fn F() {} +// Yet another forward declaration. +fn F(); +``` + +This could be done in either a free-for-all fashion as in the above example, +where any number of declarations is permitted, or in a more restricted fashion, +where there can be at most one declaration introducing an entity, and, if that +declaration is not a definition, at most one separate implementation providing +the definition. In this case, the separate definition could contain a syntactic +marker indicating that a prior declaration should exist: + +``` +// Forward declaration. +fn F(); +// ... +// Separate definition, with `impl` marker to indicate that this is just an +// implementation of an entity that was already declared. +impl fn F() {} +``` + +##### Analysis + +_Comprehensibility:_ In general, determining whether two declarations declare +the same entity may not be straightforward, particularly for a redeclaration of +an overloaded function. We have some options, but they all have challenges for +comprehensibility or ergonomics of the rule: + +- We could require token-for-token identical declarations, but this may result + in ergonomic problems -- developers may wish to leave out information in an + interface that isn't relevant for consumers of that interface, or spell the + same type or parameter name differently in an implementation. +- We could allow the declarations to differ so long as they have the same + meaning. 
However, there may be cases, as there are in C++, where it's + unclear whether two declarations are "sufficiently similar" so as to declare + the same entity. +- We could require all overloads of a function to be declared together in a + group, perhaps with some dedicated syntax, and then match up redeclarations + based on position in the group rather than signature. Such an approach is + likely to lead to developers, especially those familiar with the C++ rule, + being surprised. It will likely also interact poorly with cases where some + but not all of the overloads are defined in their first declaration and the + rest are defined separately, or where it is desirable for different + overloads to be defined in different implementation files. + +_Ergonomics:_ Any case where the declaration and definition are separated +results in repetitive code. However, this repetition may serve a purpose to the +extent that it's presenting an API description or acting as a compilation +barrier. + +_Readability:_ Readability of code may be substantially improved by separating a +declaration of an API from its definition, and permitting an API user to read, +in a small localized portion of a source file, only the information they care +about, without regard for implementation details. + +_Efficient and simple compilation:_ Allowing multiple declarations of an entity +introduces some implementation complexity, as the implementation must look up, +validate, and link together multiple declarations of the same entity. If +redeclarations are permitted across source files -- such as between the +interface and implementation of an API, or between multiple implementation files +-- then this may also require some stable mechanism for identifying the +declaration, such as a name mangling scheme. This may be especially complex in +the case of overloaded function templates, which might differ in arbitrary +Carbon expressions appearing in the function's declaration. 
This complexity +could be greatly reduced if we require all overloads to be declared together -- +at least in the same source file -- as we can then identify them by name and +index. + +This approach supports physical separation of implementation from interface, +potentially leading to build time wins through reduced recompilation and through +reducing the amount of information fed into each compilation step. + +_Diagnosability:_ Identifying errors where a redeclaration doesn't exactly match +a prior declaration is not completely straightforward. This is especially the +case when function overloads can be added by the same syntax with which a +definition of a prior declaration would be introduced. Even if declarations and +definitions use a different syntax, diagnosing a mismatch between a definition +of an overloaded function and a prior set of declarations requires some amount +of fuzzy matching to infer which one was probably intended. + +_Toolability:_ Supporting separate declarations and definitions will present the +same kinds of complexity for non-compiler tools as for compilers. Tools will +need to be able to reason about multiple declarations providing distinct +information about an entity and the effect of that on how the entity can be used +at different source locations. + +#### Disallow separate declaration and definition + +We could require each entity to be declared in exactly one place, and reject any +cases where the same entity is declared more than once. + +##### Analysis + +_Comprehensibility:_ This rule is easy to explain and easy to reason about. + +_Ergonomics:_ Minimizes the work required by the developer to implement new +functionality and maintain existing functionality. Reduces the chance that a +change in one place will be forgotten in another. + +_Readability:_ Each entity has only one declaration, giving a single canonical +location to include documentation, and no question as to where information about +the entity might be found. 
This will likely improve the navigability of source +code and the ease with which all the information about an entity can be +gathered, and reduce the chance that information in the declaration or +definition will be incomplete or that they will be inconsistent with each other. + +However, there would be no ability to present a function or class as source code +without also including implementation details. Such implementation details +cannot be physically separated from the API as presented to clients. The ability +to read and understand an interface will depend more heavily on tools that can +hide implementation details, such as documentation generators and IDEs with +outlining, and implementation techniques such as using `interface`s to present +the API of a class separate from its implementation. + +For developers familiar with C++ in particular, the absence of the ability to +separate declaration from definition may create friction, due to the familiarity +of that feature and having their thinking about code layout shaped around it. +Given that Carbon aims to be familiar to those coming from C++, an absence of +forward declarations will work against that goal to some extent, although +empirical evidence suggests that the extent to which this is an issue varies +widely between existing C++ developers. + +_Efficient and simple compilation:_ This approach leads to a simpler compilation +strategy. However, in the absence of forward declarations, every compilation +referring to an entity needs to have a build dependency on the full definition +of that entity, meaning there is no physical separation between the APIs of +entities and their definitions. This would impose a potentially-significant +build time cost due to increased recompilations when the definition of an entity +changes. + +There are possibilities to reduce this build time cost. 
For example, instead of +each declaration identifying whether it is public or private to a library, we +could add a third visibility level to say that the declaration but not the +definition is public, and use a stripping tool as part of the build process to +prevent dependencies on non-public definitions from cascading into unnecessary +rebuilds. + +This approach is also unlikely to be acceptable from an ergonomic standpoint +unless declarations within a source file are visible throughout at least the +entire library containing that file, although this is strictly an independent +choice. Making definitions from a source file visible throughout its library +would presumably require a library-at-a-time build strategy, which is also +expected to introduce build time costs due to changes to a library requiring +more code in that library to be recompiled. This cost could be reduced by using +a reactive compilation model, recompiling only the portions of a library that +are affected by a code change, at some complexity cost. + +_Diagnosability:_ Because each entity can only be declared once, there is no +significant challenge in diagnosing redeclaration issues. However, there is +still some work required to diagnose conflicts between similar or functionally +identical function overloads. + +_Toolability:_ Having a unique source location for each entity allows for +somewhat simpler tooling. For example, there is no need to distinguish between +"jump to declaration" and "jump to definition", or to decide which declaration +should be consulted to find documentation comments, parameter names, and so on. ### Information flow @@ -354,6 +533,8 @@ In order to support this and still permit cyclic references between entities, we would need to [allow separate declaration and definition](#allow-separate-declaration-and-definition). +##### Analysis + _Comprehensibility:_ This rule is simple to explain, and has no special cases. 
However, the inability to look at information from later in the source file is likely to result in gotchas: @@ -459,8 +640,8 @@ struct X { In order to give globally consistent semantics to, for example, a package name, we would likely need to process all source files comprising a package at the same time. This is likely to encourage packages to be small, whereas we have -previously expected the scope of a package to match that of a complete source -code repository. +designed package support on the assumption that the scope of a package matches +that of a complete source code repository. This alternative can be considered either with or without the ability to separate declarations from definitions. If we permit separate declaration and @@ -587,6 +768,8 @@ function definitions appearing within a class or interface, as the C# behavior that the scope of a local variable extends backwards before its declaration may be surprising. +##### Analysis + This approach generally has the properties as [strict top-down](#strict-top-down), except as follows: @@ -693,7 +876,12 @@ of the program. As in C++, member functions of nested classes would be deferred until the outermost class is complete. This deferral would apply only to the bodies of member functions, unlike in C++ where it also applies to default arguments, -default member initializers, and exception specifications. +default member initializers, and exception specifications, and unlike +[top-down with minimally deferred type checking](#top-down-with-minimally-deferred-type-checking), +where it also applies to member function signatures and the declarations and +definitions of non-function members. + +##### Analysis This approach generally has the properties as [strict top-down](#strict-top-down), except as follows: @@ -757,6 +945,8 @@ and models, this would not require forward declarations for namespace and package members that are declared or defined later in the same source file. 
+##### Analysis + _Comprehensibility:_ This model is probably a little easier to understand than the minimally-deferred type-checking model, because there is no difference between top-level declarations and nested declarations. @@ -809,171 +999,3 @@ later declaration is checked earlier. _Toolability:_ Largely the same as the strict global consistency model, except that tools only need to use the current source file as context. Also largely the same as the top-down with minimally deferred type checking model. - -### Separate declaration and definition - -#### Allow separate declaration and definition - -We could allow two or more declarations of each entity, with at most one -declaration providing a definition and the other declarations merely introducing -the existence of the entity and its basic properties: - -``` -// Forward declaration. -fn F(); -// ... -// Another forward declaration of the same function. -fn F(); -// One and only definition. -fn F() {} -// Yet another forward declaration. -fn F(); -``` - -This could be done in either a free-for-all fashion as in the above example, -where any number of declarations is permitted, or in a more restricted fashion, -where there can be at most one declaration introducing an entity, and, if that -declaration is not a definition, at most one separate implementation providing -the definition. In this case, the separate definition could contain a syntactic -marker indicating that a prior declaration should exist: - -``` -// Forward declaration. -fn F(); -// ... -// Separate definition, with `impl` marker to indicate that this is just an -// implementation of an entity that was already declared. -impl fn F() {} -``` - -_Comprehensibility:_ In general, determining whether two declarations declare -the same entity may not be straightforward, particularly for a redeclaration of -an overloaded function. 
We have some options, but they all have challenges for -comprehensibility or ergonomics of the rule: - -- We could require token-for-token identical declarations, but this may result - in ergonomic problems -- developers may wish to leave out information in an - interface that isn't relevant for consumers of that interface, or spell the - same type or parameter name differently in an implementation. -- We could allow the declarations to differ so long as they have the same - meaning. However, there may be cases, as there are in C++, where it's - unclear whether two declarations are "sufficiently similar" so as to declare - the same entity. -- We could require all overloads of a function to be declared together in a - group, perhaps with some dedicated syntax, and then match up redeclarations - based on position in the group rather than signature. Such an approach is - likely to lead to developers, especially those familiar with the C++ rule, - being surprised. It will likely also interact poorly with cases where some - but not all of the overloads are defined in their first declaration and the - rest are defined separately, or where it is desirable for different - overloads to be defined in different implementation files. - -_Ergonomics:_ Any case where the declaration and definition are separated -results in repetitive code. However, this repetition may serve a purpose to the -extent that it's presenting an API description or acting as a compilation -barrier. - -_Readability:_ Readability of code may be substantially improved by separating a -declaration of an API from its definition, and permitting an API user to read, -in a small localized portion of a source file, only the information they care -about, without regard for implementation details. 
- -_Efficient and simple compilation:_ Allowing multiple declarations of an entity -introduces some implementation complexity, as the implementation must look up, -validate, and link together multiple declarations of the same entity. If -redeclarations are permitted across source files -- such as between the -interface and implementation of an API, or between multiple implementation files --- then this may also require some stable mechanism for identifying the -declaration, such as a name mangling scheme. This may be especially complex in -the case of overloaded function templates, which might differ in arbitrary -Carbon expressions appearing in the function's declaration. This complexity -could be greatly reduced if we require all overloads to be declared together -- -at least in the same source file -- as we can then identify them by name and -index. - -This approach supports physical separation of implementation from interface, -potentially leading to build time wins through reduced recompilation and through -reducing the amount of information fed into each compilation step. - -_Diagnosability:_ Identifying errors where a redeclaration doesn't exactly match -a prior declaration is not completely straightforward. This is especially the -case when function overloads can be added by the same syntax with which a -definition of a prior declaration would be introduced. Even if declarations and -definitions use a different syntax, diagnosing a mismatch between a definition -of an overloaded function and a prior set of declarations requires some amount -of fuzzy matching to infer which one was probably intended. - -_Toolability:_ Supporting separate declarations and definitions will present the -same kinds of complexity for non-compiler tools as for compilers. Tools will -need to be able to reason about multiple declarations providing distinct -information about an entity and the effect of that on how the entity can be used -at different source locations. 
- -#### Disallow separate declaration and definition - -We could require each entity to be declared in exactly one place, and reject any -cases where the same entity is declared more than once. - -_Comprehensibility:_ This rule is easy to explain and easy to reason about. - -_Ergonomics:_ Minimizes the work required by the developer to implement new -functionality and maintain existing functionality. Reduces the chance that a -change in one place will be forgotten in another. - -_Readability:_ Each entity has only one declaration, giving a single canonical -location to include documentation, and no question as to where information about -the entity might be found. This will likely improve the navigability of source -code and the ease with which all the information about an entity can be -gathered, and reduce the chance that information in the declaration or -definition will be incomplete or that they will be inconsistent with each other. - -However, there would be no ability to present a function or class as source code -without also including implementation details. Such implementation details -cannot be physically separated from the API as presented to clients. The ability -to read and understand an interface will depend more heavily on tools that can -hide implementation details, such as documentation generators and IDEs with -outlining, and implementation techniques such as using `interface`s to present -the API of a class separate from its implementation. - -For developers familiar with C++ in particular, the absence of the ability to -separate declaration from definition may create friction, due to the familiarity -of that feature and having their thinking about code layout shaped around it. -Given that Carbon aims to be familiar to those coming from C++, an absence of -forward declarations will work against that goal to some extent, although -emiprical evidence suggests that the extent to which this is an issue varies -widely between existing C++ developers. 
- -_Efficient and simple compilation:_ This approach leads to a simpler compilation -strategy. However, in the absence of forward declarations, every compilation -referring to an entity needs to have a build dependency on the full definition -of that entity, meaning there is no physical separation between the APIs of -entities and their definitions. This would imposes a potentially-significant -build time cost due to increased recompilations when the definition of an entity -changes. - -There are possibilities to reduce this build time cost. For example, instead of -each declaration identifying whether it is public or private to a library, we -could add a third visibility level to say that the declaration but not the -definition is public, and use a stripping tool as part of the build process to -avoid dependencies on non-public definitions from cascading into unnecessary -rebuilds. - -This approach is also unlikely to be acceptable from an ergonomic strategy -unless declarations within a source file are visible throughout at least the -entire library containing that file, although this is strictly an independent -choice. Making definitions from a source file visible throughout its library -would presumably require a library-at-a-time build strategy, which is also -expected to introduce build time costs due to changes to a library requiring -more code in that library to be recompiled. This cost could be reduced by using -a reactive compilation model, recompiling only the portions of a library that -are affected by a code change, at some complexity cost. - -_Diagnosability:_ Because each entity can only be declared once, there is no -significant challenge in diagnosing redeclaration issues. However, there is -still some work required to diagnose conflicts between similar or functionally -identical function overloads. - -_Toolability:_ Having a unique source location for each entity allows for -somewhat simpler tooling. 
For example, there is no need to distinguish between -"jump to declaration" and "jump to definition", or to decide which declaration -should be consulted to find documentation comments, parameter names, and so on.