- Ways to contribute
- Finding your way through the source grammar
- Target language dependent code
- Code conventions
- Where is Wolfram Language documented?
- Helpful Mathematica code snippets
- Mathematica oddities
Write a unit test (or doctest for the Python target) for every function in the FullFormEmitter
or PostParser
.
Write some automated tests comparing the equivalence of Mathematica's output to FoxySheep.
Here are some ideas:
- Find an input for which FoxySheep and Mathematica give differing
FullForm
's and document it in a new ticket. - Align lexical names to Mathematica.
- Implement parsing of "special input": http://reference.wolfram.com/language/tutorial/InputSyntax.html
- Implement a FullFormEmitter in a new target language. As of April, 2017, ANTLR4 can currently target Java, C#, Python 2&3, JavaScript, Go, C++, and Swift. See the ANTLR4 docs for up-to-date information about target languages.
- Implement a translator to your favorite CAS's input language.
- Implement a semantic analyzer (in your favorite target language).
- Implement error recovery rules. (This is best left to experienced developers.)
- Create a new option. Here are some ideas:
- An option affecting parsing of subtraction: to automatically simplify
Times[-1, a]
to-a
in the case thata
is a number literal; to outputSubtract[a, b]
fora-b
; to emulate Mathematica. - An option for
PostParser
to flatten functions having theFlat
attribute (Attributes[name]
containsFlat
). Similarly forOneIdentity
. - An option specifying some translation behavior.
- An option affecting parsing of subtraction: to automatically simplify
See Known Bugs.
The lexer rules are implemented in FoxySheepLexerRules.g4
, while the parser rules are in FoxySheep.g4
. Most of the action in the FoxySheep grammar happens in the expr
production rule. The grammar relies on ANTLR to implement correct operator precedence according to the order of the alternatives.
ANTLR generates a parse tree, but the tree needs to be walked using the PostParser listener (implemented in src/PostParser.java
and FoxySheep/PostParser.py
) which restructures the parse tree to flatten some of the operators.
The FullForm emitter is a visitor class. It is implemented in src/FullFormEmitter.java
and FoxySheep/FullFormEmitter.py
.
The project attempts to keep target language dependent code in the grammar to a minimum. There is minimal target language dependent code embedded in FoxySheepLexerRules.g4
that is written to be both valid Java and valid Python. It should be trivial to port this code to another target language. This code can be found:
- In every bracket-like lexer rule.
- In the actions on the PLUS, MINUS, PLUSMINUS, and MINUSPLUS lexer rules.
- In the action on DOUBLESEMICOLON ";;".
- In the NEWLINE action.
To make this embedded target language dependent code work, we put supporting member functions in a superclass that the generated lexer subclasses.
There is a post parser and FullForm emitter for Java and Python. Creating these classes for another target language would be a great way for you to contribute!
Parser rules go in FoxySheep.g4
while lexer rules go in FoxySheepLexerRules.g4
.
A readable and correct grammar is much more important than a grammar which produces a fast lexer/parser.
Unless you have a compelling reason to do otherwise, you should prefer to write
| expr (OPTOKENA | OPTOKENB | OPTOKENC) expr
for three infix operators having the same precedence instead of the three lines
| expr OPTOKENA expr
| expr OPTOKENB expr
| expr OPTOKENC expr
which forces an arbitrary and artificial precedence on the operators.
Use a lexer rule when the object is treated as a monolithic "chunk" by the parser. Use a parser rule when the object needs to be taken apart and inspected by the parser, that is, when the parser "cares" about the individual pieces on the RHS of the rule.
Maximize the use of fragment
in lexer rules.
Use named tokens.
Tokens that are string literals or character classes are in all caps. Tokens that are regular expressions or rules are in CamelCase. (If someone wants to go through and regularize the naming, go for it.)
Special symbols are just unicode characters. No non-ascii characters should appear in the code itself. Use escape codes instead. Mathematica provides a database of named characters here: .../SystemFiles/FrontEnd/TextResources/UnicodeCharacters.tr
.
There is no official grammar available. However, the language is described in detail in a few documents published on Wolfram Research's website.
Document | Description |
---|---|
Operator Input Forms | Describes the majority of the Wolfram Language syntax, including operator precedence and associativity. |
Wolfram Language Syntax | A "big picture" overview of the syntax, much more terse than the above document. |
Input Syntax | Detailed descriptions of syntax for numeric literals, strings, symbols and contexts, and other language constructs. |
Special Characters | A description of how special characters work in Mathematica. This is useful in dealing with operators involving special symbols. |
Listing of Named Characters | Lists all of the special characters Mathematica has a name for. This is useful when working with operators involving special symbols. |
Mathematica includes facilities in the language itself to describe the syntax, options, messages, etc., of functions. The most important functions that do this are given below.
Document | Description |
---|---|
SyntaxInformation |
Gives "ArgumentPattern" and other syntax properties of built-in functions. |
WolframLanguageData |
Can be used to query a database of a variety of information on Wolfram Language symbols, including many—but not all—of the information that these other functions produce. |
Definition |
Prints out attributes, defaults, options, and, for user defined functions, definitions (*-Values ). |
OwnValues |
Gives a symbol's OwnValues. |
DownValues |
DownValues are a symbols function definitions. |
UpValues |
Like DownValues , but more obscure. |
Defaults |
Default values for function arguments. |
Attributes |
Gives "attributes" of a function, i.e. whether the function is listable, flat, etc. |
Messages |
Gives all messages associated to a function. |
"\\u" <> TextString[BaseForm[ToCharacterCode["\[And]"][[1]], 16]]
FullForm["\[And]"]
FullForm
also gives what is essentially a syntax tree of any Wolfram Language expression. For example,
In[1]:= FullForm[Hold[Plus@Integrate[#, x] & /@ {x^2, x - 1/x}]]
Out[1]= Hold[Map[Function[Plus[Integrate[Slot[1],x]]],
List[Power[x,2],Plus[x,Times[-1,Times[1,Power[x,-1]]]]]]]
The Hold
is necessary. It's helpful to pass the output of FullForm
to TextString
so that it's easier to copy+paste the output: TextString[FullForm[Hold[ expression ]]]
.