Skip to content

Latest commit

 

History

History
116 lines (64 loc) · 13.8 KB

overview.md

File metadata and controls

116 lines (64 loc) · 13.8 KB
title category categoryindex index
Overview
Compiler Internals
200
100

Overview

There are several artifacts involved in the development of F#:

  • FSharp.Compiler.Service (docs, source). Contains all logic for F# compilation - including parsing, syntax tree processing, typechecking, constraint solving, optimizations, IL importing, IL writing, pretty printing of F# constructs, and F# metadata format processing - and the F# compiler APIs for tooling.

  • The F# compiler executable, called fsc, which is called as a console app. It sets the .NET GC into batch mode and then invokes FSharp.Compiler.Service with command-line arguments.

  • The FSharp.Core Library, called FSharp.Core. Contains all primitive F# types and logic for how they interact, core data structures and library functions for operating on them, structured printing logic, units of measure for scientific programming, core numeric functionality, F# quotations, F# type reflection logic, and asynchronous programming types and logic.

  • The F# Interactive tool, called fsi. A REPL for F# that supports execution and pretty-printing of F# code and results, loading F# script files, referencing assemblies, and referencing packages from NuGet.

The FSharp.Compiler.Service is by far the largest of these components and contains nearly all logic that fsc and fsi use. It is the primary subject of this guide.

Key compiler data formats and representations

The following are the key data formats and internal data representations of the F# compiler code in its various configurations:

  • Input source files Read as Unicode text, or binary for referenced assemblies.

  • Input command-line arguments See CompilerOptions.fs for the full code implementing the arguments table. Command-line arguments are also accepted by the F# Compiler Service API in project specifications, and as optional input to F# Interactive.

  • Tokens, see pars.fsy, lex.fsl, lexhelp.fs and related files.

  • Abstract Syntax Tree (AST), see SyntaxTree.fs, the untyped syntax tree resulting from parsing.

  • Typed Abstract Syntax Tree (Typed Tree), see TypedTree.fs, TypedTreeBasics.fs, TypedTreeOps.fs, and related files. The typed, bound syntax tree including both type/module definitions and their backing expressions, resulting from type checking and the subject of successive phases of optimization and representation change.

  • Type checking context/state, see for example TcState in ParseAndCheckInputs.fsi and its constituent parts, particularly TcEnv in CheckExpressions.fsi and NameResolutionEnv in NameResolution.fsi. A set of tables representing the available names, assemblies etc. in scope during type checking, plus associated information.

  • Abstract IL, the output of code generation, then used for binary generation, and the input format when reading .NET assemblies, see ILModuleDef in il.fsi.

  • The .NET Binary format (with added "pickled" F# Metadata resource), the final output of fsc.exe, see the ECMA 335 specification and the ilread.fs and ilwrite.fs binary reader/generator implementations. The added F# metadata is stored in a binary resource, see TypedTreePickle.fs.

  • The incrementally emitted .NET reflection assembly, the incremental output of fsi.exe. See ilreflect.fs.

Key constructs and APIs for F# tooling

The following are the most relevant parts of the F# compiler tooling, making up the "engine" and API surface area of FSharp.Compiler.Service.

Key compiler phases

The following is a diagram of how different phases of F# compiler work:

F# compiler phases

The following are the key phases and high-level logical operations of the F# compiler code in its various configurations:

  • Basic lexing. Produces a token stream from input source file text.

  • White-space sensitive lexing. Accepts and produces a token stream, augmenting per the F# Language Specification.

  • Parsing. Accepts a token stream and produces an AST per the grammar in the F# Language Specification.

  • Resolving references. For .NET SDK generally references are resolved explicitly by external tooling. There is a legacy aspect to this if references use old .NET Framework references including for scripting. See ReferenceResolver.fs for the abstract definition of compiler reference resolution. See LegacyMSBuildReferenceResolver.fs for reference resolution used by the .NET Framework F# compiler when running on .NET Framework. See SimulatedMSBuildReferenceResolver.fs when not using the .NET Framework F# compiler. See DependencyManager for reference resolution and package management used in fsi.

  • Importing referenced .NET binaries, see import.fsi/import.fs. Accepts file references and produces a Typed Tree node for each referenced assembly, including information about its type definitions (and type forwarders if any).

  • Importing referenced F# binaries and optimization information as Typed Tree data structures, see TypedTreePickle.fs. Accepts binary data and produces Typed Tree nodes for each referenced assembly, including information about its type/module/function/member definitions.

  • Sequentially type checking files, see CheckDeclarations.fsi/CheckDeclarations.fs. Accepts an AST plus a type checking context/state and produces new Typed Tree nodes incorporated into an updated type checking state, plus additional Typed Tree Expression nodes used during code generation. A key part of this is checking syntactic types and expressions, see CheckExpressions.fsi/CheckExpressions.fs including the state held across the checking of a file (see TcFileState) and the environment active as we traverse declarations and expressions (see TcEnv).

  • Pattern match compilation, see PatternMatchCompilation.fsi/PatternMatchCompilation.fs. Accepts a subset of checked Typed Tree nodes representing F# pattern matching and produces Typed Tree expressions implementing the pattern matching. Called during type checking as each construct involving pattern matching is processed.

  • Constraint solving, see ConstraintSolver.fsi/ConstraintSolver.fs.A constraint solver state is maintained during type checking of a single file, and constraints are progressively asserted (i.e. added to this state). Fresh inference variables are generated and variables are eliminated (solved). Variables are also generalized at various language constructs, or explicitly declared, making them "rigid". Called during type checking as each construct is processed.

  • Post-inference type checks, see PostInferenceChecks.fsi/PostInferenceChecks.fs. Called at the end of type checking/inference for each file. A range of checks that can only be enforced after type checking on a file is complete, such as analysis when using byref<'T> or other IsByRefLike structs.

  • Quotation translation, see QuotationTranslator.fsi/QuotationTranslator.fs/QuotationPickler.fsi/QuotationPickler.fs. Generates the stored information for F# quotation nodes, generated from the Typed Tree expression structures of the F# compiler. Quotations are ultimately stored as binary data plus some added type references. "ReflectedDefinition" quotations are collected and stored in a single blob.

  • Optimization phases, primarily the "Optimize" (peephole/inlining) and "Top Level Representation" (lambda lifting) phases, see Optimizer.fsi/Optimizer.fs and InnerLambdasToTopLevelFuncs.fsi/InnerLambdasToTopLevelFuncs.fs and LowerCallsAndSeqs.fs. Each of these takes Typed Tree nodes for types and expressions and either modifies the nodes in place or produces new Typed Tree nodes. These phases are orchestrated in CompilerOptions.fs

  • Code generation, see IlxGen.fsi/IlxGen.fs. Accepts Typed Tree nodes and produces Abstract IL nodes, sometimes applying optimizations.

  • Abstract IL code rewriting, see EraseClosures.fs and EraseUnions.fs. Eliminates some constructs by rewriting Abstract IL nodes.

  • Binary emit, see ilwrite.fsi/ilwrite.fs.

  • Reflection-Emit, see ilreflect.fs.

These and transformations used to build the following:

Bootstrapping

The F# compiler is bootstrapped. That is, an existing F# compiler is used to build a "proto" compiler from the current source code. That "proto" compiler is then used to compile itself, producing a "final" compiler. This ensures the final compiler is compiled with all relevant optimizations and fixes.

FSharp.Build

FSharp.Build.dll and Microsoft.FSharp.targets give MSBuild support for F# projects (.fsproj) and contain the. Although not strictly part of the F# compiler, they are essential for using F# in all contexts for .NET, aside from some more targeted scripting scenarios. The targets expose things like the CoreCompile and Fsc tasks called by MSBuild.