Author: José Valim <jose(dot)valim(at)gmail(dot)com>,
Eric Bailey,
Radek Szymczyszyn
Status: Final/24.0 Implemented in OTP release 24
Type: Standards Track
Created: 04-Jan-2018
Post-History:
https://github.com/erlef/documentation-wg/issues/3
https://github.com/erlang/otp/pull/2545
https://github.com/erlang/otp/pull/2803
https://github.com/erlef/build-and-packaging-wg/issues/25
This EEP proposes an official API documentation storage to be used by by BEAM languages. By standardizing how API documentation is stored, it will be possible to write tools that work across languages.
Currently, different programming languages and libraries running on BEAM devise their own schemas for storing and accessing documentation.
For example, Elixir and LFE provide a h
helper in their shell that
can print the documentation of any module:
iex> h String
A String in Elixir is a UTF-8 encoded binary.
However, Elixir is only able to show docs for Elixir modules. LFE is only able to show docs for LFE functions and so on. If documentation is standardized, such features can be easily added to other languages in a way that works consistently across all BEAM languages.
Furthermore, each language ends up building their own tools for generating, processing and converting documentation. We hope a unified approach to documentation will improve the compatibility between tools. For instance, an Erlang IDE will be able to show inline documentation for any module and function, regardless if the function is part of OTP, a library or even written in Elixir, LFE or Alpaca.
Note: in this document, the word "documentation" refers exclusively to the API documentation of modules and functions. Guides, tutorials and others materials are also essential to projects but not the focus of this EEP.
Note: This EEP is not about documentation format. It is about a mechanism for storing documentation to make it easier to produce other formats. For example, a tool can read the documentation and produce man pages from it.
This EEP is divided in three parts. The first defines the two places the documentation can be stored, the second defines the shape of the documentation and the third discusses integration with OTP.
There are two main mechanisms in which BEAM languages store documentation:
in the filesystem (usually in the /doc
directory) and inside .beam
files.
This EEP recognizes both options and aim to support both. To look for
documentation for a module name example
, a tool should:
-
Look for
example.beam
in the code path, parse the BEAM file and retrieve theDocs
chunk -
If the chunk is not available, it should look for "example.beam" in the code path and find the
doc/chunks/example.chunk
file in the application that defines theexample
module -
If a
.chunk
file is not available, then documentation is not available
The choice of using a chunk or the filesystem is completely up to the
language or library. In both cases, the documentation can be added or
removed at any moment by stripping the Docs
chunk or by removing the
doc/chunks
directory.
For example, languages like Elixir and LFE attach the Docs
chunk at
compilation time, which can be controlled via a compiler flag. On the
other hand, projects like OTP itself will likely generate the doc/chunks
entries on a separate command, completely unrelated from code compilation.
In both storages, the documentation is written in the exactly same
format: an Erlang term serialized to binary via term_to_binary/1
.
The term may be optionally compressed when serialized and must follow
the type specification below:
{docs_v1,
Anno :: erl_anno:anno(),
BeamLanguage :: atom(),
Format :: mime_type(),
ModuleDoc :: #{optional(DocLanguage) := DocValue} | none | hidden,
Metadata :: map(),
Docs ::
[{{Kind, Name, Arity},
Anno :: erl_anno:anno(),
Signature :: [binary()],
Doc :: #{optional(DocLanguage) := DocValue} | none | hidden,
Metadata :: map()
}]} when DocLanguage :: binary(),
DocValue :: binary() | term()
where in the root tuple we have:
-
Anno
- annotation (line, column, file) of the definition itself (seeerl_anno
) -
BeamLanguage
- an atom representing the language, for example:erlang
,elixir
,lfe
,alpaca
, etc -
Format
- the mime type of the documentation, such as "text/markdown" or "application/erlang+html" (see the FAQ for a discussion on this field) -
ModuleDoc
- a map with the documentation language as key, such as<<"en">>
or<<"pt_BR">>
, and the documentation as a binary value. It may be the atomnone
in case there is no documentation or the atomhidden
if documentation has been explicitly disabled for this entry -
Metadata
- a map of atom keys with any term as value. This can be used to add annotations like the "authors" of a module, "deprecated", or anything else a language or documentation tool may find relevant -
Docs
- a list of documentation for other entities (such as functions and types) in the module
For each entry in Docs
, we have:
-
{Kind, Name, Arity}
- the kind, name and arity identifying the function, callback, type, etc. The official entities are:function
,type
andcallback
. Other languages will add their own. For instance, Elixir and LFE may addmacro
-
Anno
- annotation (line, column, file) of the module documentation or of the definition itself (see erl_anno) -
Signature
- the signature of the entity. It is is a list of binaries. Each entry represents a binary in the signature that can be joined with a whitespace or a newline. For example,["binary_to_atom(Binary, Encoding)", "when is_binary(Binary)"]
may be rendered as as a single line or two lines. It exists exclusively for exhibition purposes -
Doc
- a map with the documentation language as key, such as<<"en">>
or<<"pt_BR">>
, and the documentation as a value. The documentation may either be a binary or any Erlang term, both described byFormat
. If it is an Erlang term, then theFormat
must be "application/erlang+SUFFIX", such as "application/erlang+html" when the documentation is an Erlang representation of an HTML document. TheDoc
may also be the atomnone
in case there is no documentation or the atomhidden
if documentation has been explicitly disabled for this entry -
Metadata
- a map of atom keys with any term as value
Note: the documentation map can be empty. In this case, a reference to said function was added to the documentation index, making it effectively public, but no documentation was written.
This shared format is the heart of the EEP as it is what effectively allows cross-language collaboration.
The Metadata
field exists to allow languages, tools and libraries to
add custom information to each entry. This EEP documents the
following metadata keys:
-
authors := [binary()]
- a list of authors as binaries -
cross_references := [module() | {module(), {Kind, Name, Arity}}]
- a list of modules or module entries that can be used as cross references when generating documentation -
deprecated := binary()
- when present, it means the current entry is deprecated with a binary that represents the reason for deprecation and a recommendation to replace the deprecated code -
since := binary()
- a binary representing the version such entry was added, such as<<"1.3.0">>
or<<"20.0">>
-
edit_url := binary()
- a binary representing a URL to change to change the documentation itself
Any key may be added to Metadata at any time. Keys that are frequently used by the community can be standardized in future versions.
The last part focuses on integrating the previous parts with OTP docs, tools and workflows. The items below are suggestions and are not necessary for the adoption of this EEP, neither by OTP nor by any other language or library.
At this point we should consider changes to OTP such as:
-
Distributing the
doc/chunks/*.chunk
files as part of OTP and changing the tools that ship with OTP to rely on them. For example,erl -man lists
could be changed to locate thelists.chunk
file, parsing the documentation out and then converting it to a man page on the fly. This task may require multiple changes, as OTP stores documentation on XML files as well as directly in the source code.edoc
itself should likely be augmented with functions that spit out.chunk
files from the source code -
Adding
h(Module)
,h(Module, Function, Arity)
, and similar to Erlang's shell to print the documentation of a module or of a given function and arity. This should be able to print docs any other library or language that implements this proposal
Q: Why do we have a Format entry in the documentation?
The main trade-off in the proposal is the documentation format. We have two options:
- Allow each language/library/tool to choose their own documentation format
- Impose a unified documentation format on all languages
A unified format for documentation gives no flexibility to languages and libraries in choosing how documentation is written. As the ecosystem gets more diverse, it will be unlikely to find a format that suits all. For this reason we introduced a Format field that allows each language and library to pick their documentation format. The downside is that, if the Elixir docs are written in Markdown and a language does not know how to format Markdown, then the language will have to choose to either not show the Elixir docs or show them raw (i.e. in Markdown).
Erlang is in a privileged position. All languages will be able to support whatever format is chosen for Erlang since all languages run on Erlang and will have direct access to Erlang's tooling.
Q: If I have an Erlang/Elixir/LFE/Alpaca library that uses a custom documentation toolkit, will I also be able to leverage this?
As long as the documentation ends up up in the Docs
chunk or inside
the doc/chunks
directory, we absolutely do not care how the
documentation was originally written. If you use a custom format,
you may need to teach your language of choice how to render it though.
See the previous question.
This document has been placed in the public domain.