structure draft

FedericoPonzi · Jun 12, 2024 · 07fa067 · 07fa067
1 parent ffd984f
commit 07fa067
Show file tree

Hide file tree

Showing 11 changed files with 446 additions and 2 deletions.
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -1,3 +1,19 @@
 # Summary
 
-- [Chapter 1](./chapter_1.md)
+- [TLA+ Wiki intro](./intro.md)
+- [TLA+ Codebase](./codebase/intro.md)
+    - [Contributing](./codebase/contributing.md)
+    - [Setting up the development environment](./codebase/ide.md)
+    - [Codebase Architecture Walkthrough](./codebase/architecture.md)
+    - [Codebase Idiosyncrasies](./codebase/idiosyncrasies.md)
+    - [Wishlist](./codebase/wishlist.md)
+    - [Tree-sitter grammar]()
+- [Learning TLA+]()
+    - [Lamport's video tutorial]()
+    - [Example repo]()
+    - [Notable repos]()
+- [Using TLA+]()
+    - [Standard Library]()
+    - [Community Modules]()
+- [TLA+ Community: what to do when you're stuck](community.md)
+- [TLAF: TLA+ Foundation]()
diff --git a/src/chapter_1.md b/src/chapter_1.md
diff --git a/src/codebase/architecture.md b/src/codebase/architecture.md
@@ -0,0 +1,124 @@
+# Codebase Architecture Walkthrough
+### Modules
+- [pcal](../src/pcal): Converts the pcal language into TLA+ that can by run by TLC2.
+- [tla2tex](../src/tla2tex): "Pretty Prints" TLA+ code into LaTex. This LaTex can then be used for publishing.
+- [tla2sany](../src/tla2sany): Contains the AST and parsing logic for the TLA+ language. The AST classes generated from parsing are used by TLC2.
+- [tlc2](../src/tlc2): The model checker to validate the AST generated by tla2sany. This is the largest and most complex part of the codebase by far.
+
+### The Standard Flow
+This is a generalized mplified description of how this code is normally used, and the involved components. More detail will be given on some of these components in the following sections.
+
+1. (Optional) PCal code translated into TLA+ code. [pcal.trans](src/pcal/trans.java)::main
+2. TLC2 Model Checker run from standard entrypoint [tlc2.TLC](src/tlc2/TLC.java)::main on that code and associated .cfg file
+3. TLC2 class created, arguments parsed, and processing started. [tlc2.TLC](src/tlc2/TLC.java)::TLC, TLC::handleParameters, TLC::process
+4. The spec is parsed and stored in [Tool](../src/tlc2/tool/impl/Tool.java). This occurs in the Tool constructor.
+   1. A [SpecObj](../src/tla2sany/modanalyzer/SpecObj.java) is created with file information about the spec
+   2. A [SpecProcessor](../src/tlc2/tool/impl/SpecProcessor.java) is created with that SpecObj
+   3. At the start of processing the spec, SpecProcessor calls [SANY](../src/tla2sany/drivers/SANY.java)::frontEndMain that parses the spec into an AST
+   4. The SpecProcessor performs a number of extraction and analysis operations on the AST.
+        > Note: These are called in a precise order in the [Tool](../src/tlc2/tool/impl/Tool.java) constructor. Modifying this is highly error prone, though the tests will fail reliably if there is an issue.
+   5. The Tool takes the results of all these operations and assigns them to final variables. The Tool is now the only source of truth for specification information, and is immutable.
+        > Note: This is a critical separation of concerns, to separate the complex job of parsing and resolving the spec, from the job of analyzing it. The tool is the only object that spans both.
+5. The following are selected and initialized
+    - The model checker or simulator. This class performs a certain type of analysis on the spec.
+      - [ModelChecker](../src/tlc2/tool/ModelChecker.java): Standard Breadth-First checking of entire state space
+        - [FPSet](../src/tlc2/tool/fp/FPSet.java): Contains the fingerprints of all explored states.
+        - [StateQueue](../src/tlc2/tool/queue/IStateQueue.java): Contains new states to explore.
+      - [DFIDModelChecker](../src/tlc2/tool/DFIDModelChecker.java): Depth first checking of entire state space, to a pre-specified level of depth.
+        - [FPIntSet](../src/tlc2/tool/fp/dfid/FPIntSet.java)
+      - [Simulator](../src/tlc2/tool/Simulator.java): Randomized exploration and checking of the state space.
+6. The analysis is performed
+    1. Workers are run per configuration to allow multi-threaded analysis
+        - ModelChecker: [Worker](../src/tlc2/tool/Worker.java)
+        - DFIDModelChecker: [DFIDWorker](../src/tlc2/tool/DFIDWorker.java)
+        - Simulator: [SimulationWorker](../src/tlc2/tool/SimulationWorker.java)
+    2. Each checker performs analysis differently, but generally the analysis consists of generating new states from existing states, and checking those new states
+7. The result is returned to the user
+
+### Model Checker Analysis
+
+1. The StateQueue is seeded with the initial states
+2. Workers take states of the StateQueue. From there they generate next potential states. They check that the fingerprint of the state is not already in the FPSet (such that analysis is not duplicated). They then check that state for safety properties, and add it to both the FPSet and the StateQueue.'
+3. Periodically the analysis pauses. If configured to create a checkpoint, a checkpoint is created. If configured to check temporal properties, the Liveness Checking workflow (described below) is performed by calling [LiveChecker](../src/tlc2/tool/liveness/LiveCheck.java)::check. Once done the analysis continues.
+4. When the StateQueue is empty or an error is detected, the analysis is over. 
+
+> Performance Note: The vast majority of CPU cycles are spent either calculating fingerprints or storing fingerprints in the FPSet.                        
+
+### Depth-First Checker Analysis
+
+1. Data structure initialized with InitStates
+2. The analysis occurs by setting a temporary max depth that increases to the configured maximum depth
+3. Worker retrieves unexplored initial state, removing it from the set of init states
+4. Worker explores the state space depth first up to the temporary max depth, storing all seen states in the FPIntSet
+5. Periodically the analysis pauses. If configured to create a checkpoint, a checkpoint is created. If configured to check temporal properties, the Liveness Checking workflow (described below) is performed by calling [LiveCheck](../src/tlc2/tool/liveness/LiveCheck.java)::check. Once done the analysis continues.
+6. Once there are no more levels to explore, or an error is detected, the analysis is over
+
+
+### Simulator Analysis
+
+1. Initial states calculated and checked for validity
+2. Each SimulationWorker retrieves a random initial state
+3. Next states are randomly generated and checked, up to a specified depth. Each one of these runs is a trace.
+4. If liveness checking is enabled, check the trace with [LiveChecker](../src/tlc2/tool/liveness/LiveCheck.java)::checkTrace
+5. Once the specified number of traces have been run, or an error is detected, the analysis is over
+
+### Liveness Checking
+Liveness checking is what allows checking of temporal properties.
+
+Liveness checking is initiated by calling: [LiveCheck](../src/tlc2/tool/liveness/LiveCheck.java)::check or checkTrace.
+
+Every time this occurs it creates new [LiveWorker](../src/tlc2/tool/liveness/LiveWorker.java)(s) to perform the analysis.
+
+> Note: Unlike standard model checking, Liveness Checking by default is not guaranteed to produce the shortest trace that violates the properties. There is a [AddAndCheckLiveCheck](../src/tlc2/tool/liveness/AddAndCheckLiveCheck.java) that extends LiveCheck that attempts to do this, at the cost of significant performance. It is selected in the constructor of [AbstractChecker](../src/tlc2/tool/AbstractChecker.java).
+
+### States and Fingerprints
+States are a fundamental part of TLC. They represent a set of variable values, that entirely determine the state of the system. States are generated and checked as part of the TLC2 analysis process. The base class for this is [TLCState](../src/tlc2/tool/TLCState.java).
+
+Fingerprints are hashes taken of these states using the [FP64](src/tlc2/util/FP64.java) hash. Fingerprints are a 64bit hash representing the state. This is significantly smaller than storing the state itself, and yet collisions are very unlikely (though if they did occur, part of the statespace would not be explored). The fingerprinting process is initiated from [TLCState](../src/tlc2/tool/TLCState.java)::generateFingerPrint. 
+
+There are two primary implementations of state that are very similar:
+- [TLCStateMut](../src/tlc2/tool/TLCStateMut.java): Higher performance, records less information
+- [TLCStateMutExt](../src/tlc2/tool/TLCStateMutExt.java): Lower performance, records more information
+
+Effectively, functions that are no-ops in TLCStateMut, persistently store that information in TLCStateMutExt. This information can include the name of the generating action, and is often needed for testing / debugging purposes.
+
+The implementation to use is selected in the constructor of [Tool](../src/tlc2/tool/impl/Tool.java), by setting a reference Empty state for the analysis.
+
+### High Performance Datastructures
+
+The ModelChecker performs a breadth-first search. In a standard BFS algorithm, there are two main datastructures: a queue of items of explore next, and a set of the nodes already explored. Because of the sheer size of the state space for the ModelChecker, specialized versions of these datastructures are used.
+
+> Note: These data-structures are a large focus of the profiling / testing work, because both performance and correctness are key for the ModelChecker to explore the full state space.
+
+#### Fingerprint Sets (FPSets)
+FPSets are sets with two main operations
+- put fingerprint
+- determine if fingerprint in set
+
+Located in [tlc2.tool.fp](../src/tlc2/tool/fp). All but [FPIntSet](../src/tlc2/tool/fp/dfid/FPIntSet.java) extend [FPSet](../src/tlc2/tool/fp/FPSet.java). FPIntSet is used specifically for depth-first-search and is not discussed here.
+
+In practice the performance and size of the FPSet determine the extent of analysis possible. The [OffHeapDiskFPSet](../src/tlc2/tool/fp/OffHeapDiskFPSet.java) is in practice the most powerful of the FPSets, efficiently spanning the operations across off-heap memory and the disk. There are also memory only FPSets that may be preferable for smaller state spaces.
+
+#### StateQueue
+Located in [tlc2.tool.queue](../src/tlc2/tool/queue), with all implementations extending [StateQueue](../src/tlc2/tool/queue/StateQueue.java).
+
+The default implementation is [DiskStateQueue](../src/tlc2/tool/queue/DiskStateQueue.java). Because of the size of the queue, storing it in memory may not be an option.
+
+### Symmetry Sets
+A discussion of Symmetry Sets can be found [here](https://www.learntla.com/core/constants.html).
+
+They are implemented in the [TLCState](../src/tlc2/tool/TLCState.java)::generateFingerPrint function by calculated all permutations of a particular state (for all symmetry set model values), and returning the smallest fingerprint. Therefore all permutations would have a same fingerprint, and so only the first example generated would be explored.
+
+> Note: The order dependant nature of the fingerprint means that when using symmetry sets, the majority of the cpu cycles are spent "normalizing" the data structures such that the order the fingerprint gets assembled in are consistent. An order independent fingerprint hash would remove this performance hit, however would significantly increase the likelihood of collisions unless the hash algorithm itself was upgraded.
+
+### Debugger
+The [AttachingDebugger](../src/tlc2/debug/AttachingDebugger.java) is the main debugger. It works in conjunction with the [DebugTool](../src/tlc2/tool/impl/DebugTool.java) to allow a user to step through a TLC execution.
+
+> Note: Using the debugger will keep the process alive indefinitely due to the listener. The eclipse network listener is not interruptable, so thread interruption behavior will not work. It relies on process exit.
+
+### Coverage
+The coverage functionality determines what statements in the TLA+ spec have been used.
+Located in [tlc2.tool.coverage](../src/tlc2/tool/coverage)
+
+### Unique Strings
+String comparison can be a relatively expensive operation compared to primitive comparison. There are lots of string comparisons required in the code, but with a relatively limited set of strings.[UniqueString](../src/util/UniqueString.java) maps each string to a monotonically increasing integer. Then comparisons are reduced to the much cheaper integer comparisons. This is used throughout the codebase.
diff --git a/src/codebase/contributing.md b/src/codebase/contributing.md
@@ -0,0 +1,72 @@
+# Contributing
+
+TL;DR: engage maintainers early & often, be surgical in your changes, and write lots of tests.
+
+We welcome contributions to this open source project!
+Before beginning work, please take some time to familiarize yourself with our development processes and expectations.
+The TLA⁺ tools are used to validate safety-critical systems, so maintaining quality is paramount.
+The existing code can also be quite complicated to modify and it is difficult to review changes effectively.
+So, there are several practices that are necessary for having a good contribution experience.
+The last thing anybody wants is for you to feel as though you have wasted your time!
+If you open a massive pull request (PR) out of nowhere it is **very unlikely to be merged**.
+Follow this guide to ensure your effort is put to the best possible use.
+
+Always remember that we are all volunteers here.
+Be kind to everybody!
+You can review our Code of Conduct [here](.github/CODE_OF_CONDUCT.md).
+
+The Contribution Process & Style
+--------------------------------
+
+The most important step is to engage with existing developers & maintainers before beginning work.
+The best way to do this is to comment on an issue you want to fix, or open a new issue if a suitable candidate does not exist.
+It is also very helpful to join the [monthly online community meeting](https://groups.google.com/g/tlaplus/c/CpAEnrf-DHQ/m/YrORpIfSBwAJ) to introduce yourself and speak with people directly.
+The development team will help you better understand the scope & difficulty of the proposed changes, along with what parts of the codebase they'll touch and what additional tests are required to safely merge them.
+Building & maintaining consensus around these aspects will ensure a smooth review & merge process.
+
+To begin work on your change, fork this repository then create an appropriately-named branch to track your work.
+You will eventually submit a PR between your feature branch and the master branch of this repository, at which point developers & maintainers will review your changes.
+After integrating this feedback your PR will then hopefully be merged.
+
+When creating and submitting your changes, closely follow these guidelines:
+ * **Be surgical:** What is the smallest diff you can make that still accomplishes your goal?
+ Avoid the urge to fix unrelated stylistic issues in a file you are changing.
+ * **Write tests first:** If you're changing a specific part of the codebase, first ensure it has good test coverage; if it does, write a few more tests yourself anyway!
+ Writing tests is a great way to learn how the code functions.
+ If possible, submit your tests in a separate PR so their benefits can be realized immediately.
+ We don't fully subscribe to Test-Driven Development (TDD) but having good existing test coverage of a changed area is a prerequisite to changes being merged.
+ * **Split up your changes:** If parts of your changes provide some benefit by themselves - like additional tests - split them into a separate PR.
+ It is preferable to have several small PRs merged quickly than one large PR that takes longer to review.
+ * **Craft your commits well:** This can require advanced git knowledge (see [DEVELOPING.md](DEVELOPING.md)).
+ Treat your commits as forms of communication, not a way to bludgeon the codebase into shape.
+ You don't need to check this, but ideally the build & tests should pass on every commit.
+ This means you will often be amending commits as you work instead of adding new ones.
+
+When you open a PR against this repo, the continuous integration (CI) checks will run.
+These ensure your changes do not break the build & tests.
+While you can run most of these checks locally, if you'd like to run the CI at any time you can open a PR between your feature branch and the master branch of *your own* fork of this repo.
+
+Contribution Possibilities
+--------------------------
+
+These tools have a large number existing issues to fix, which you can see on the repo's [issues tab](https://github.com/tlaplus/tlaplus/issues).
+In particular, consider looking at issues [tagged with "help wanted"](https://github.com/tlaplus/tlaplus/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22).
+For new developers, there are issues [tagged "good first issue"](https://github.com/tlaplus/tlaplus/labels/good%20first%20issue).
+
+For new features, there are a number of [substantial improvements we'd like to see implemented](general/docs/contributions.md).
+If you have an idea for a new feature, open an issue in this repository to start a discussion.
+For more fundamental proposals that involve changes to the TLA⁺ language itself, you can [open a RFC](https://github.com/tlaplus/rfcs/issues).
+Often it will be necessary to build community support for your idea; for this you can use the [mailing list](https://groups.google.com/g/tlaplus) and especially the [monthly online community meeting](https://groups.google.com/g/tlaplus/c/CpAEnrf-DHQ/m/YrORpIfSBwAJ).
+Expressions of goodwill like volunteering labor to review pending PRs is also appreciated!
+
+Non-code contributions are also welcome!
+There is ample work available in expanding developer documentation (like this very document) or user-facing TLA⁺ documentation.
+We also collect [quality metrics](https://sonarcloud.io/organizations/tlaplus/projects), and the reports indicate several low-hanging fruits to pick.
+
+Getting Started
+---------------
+
+Take the time to set up your local development environment.
+See [DEVELOPING.md](DEVELOPING.md) for details.
+Good luck, and have fun!
+
diff --git a/src/codebase/ide.md b/src/codebase/ide.md
@@ -0,0 +1 @@
+# Setting up the development environment