Lexical structure (ref)
Programs are written using the Unicode character set, using the UTF-8 encoding. Every Nemerle source file is reduced to a sequence of lexical units (tokens) separated by sequences of whitespace characters (blanks).
There are four classes of lexical tokens:
- identifiers
- keywords
- literals
- blanks
/* A comment. */
// Also a comment
foo // identifier
foo_bar foo' foo3 // other identifiers
'a 'foo'bar'baz // more identifiers
42 // integer literal
1_000_000 // _ can be used for readability
1_42_00 // or unreadability...
0x2a // hexadecimal integer literal
0o52 // octal integer literal
0b101010 // binary integer literal
'a' // character literal
'\n' // also a character literal
"foo\nbar" // string literal
"foo" "bar" // same as "foobar"
@"x\n" // same as "x\\n"
@"x
y" // same as "x\n y"
<#This string type can contain any symbols, including "
and new lines. It does not support escape codes
like "\n".#> // same as "This string type can contain any symbols, including \"\nand new lines. "
// + "It does not support escape codes\nlike \"\\n\"."
@if // keyword used as identifier
<#Test <# Inner #> end#> // same as "Test <# Inner #> end" (i.e. this string type supports nesting)
3.14f // float literal
3.14d, 3.14 // double literal
3.14m // decimal literal
10 // int
10u // unsigned int
10b, 10sb, 10bs // signed byte
10ub, 10bu // unsigned byte
10L // long
10UL, 10LU // unsigned long
Spaces, vertical and horizontal tabulation characters, new-page characters, new-line characters and comments (called blanks altogether) are discarded, but can separate other lexical tokens.
A traditional comment begins with /* and ends with */. Traditional comments cannot be nested. An end-of-line comment starts with // and ends with the line terminator (ASCII LF character).
There is a set of preprocessing directives used for conditional compilation and changing line numbering context. They are the same as in C#. Allowed directives are:
#define
#undef
#if
#elif
#else
#endif
#line
#error
#warning
#region
#endregion
#pragma
Ordinary identifiers consist of letters, digits, underscores and apostrophes, but cannot begin with a digit. An identifier may be quoted with the @ character, which is stripped. Quoting removes any lexical and syntactic meaning from the following string of characters up to the next blank, which lets the programmer use keywords as identifiers.
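The identifier rule above can be modeled with a regular expression. The following is a minimal Python sketch, not the compiler's lexer. It assumes "letters" means ASCII letters only, it checks the shape of a token without rejecting keywords or resolving the overlap with character literals, and the names IDENT_RE, is_ordinary_identifier and unquote are hypothetical.

```python
import re

# Ordinary identifier: letters, digits, underscores and apostrophes,
# not starting with a digit. ASCII letters only (a simplifying assumption).
IDENT_RE = re.compile(r"[A-Za-z_'][A-Za-z0-9_']*")


def is_ordinary_identifier(text):
    """Return True if text has the shape of an ordinary identifier."""
    return IDENT_RE.fullmatch(text) is not None


def unquote(token):
    """Strip a leading @ quote, so '@if' names the identifier 'if'."""
    return token[1:] if token.startswith("@") else token


# Show the verdict for a few tokens from the examples at the top of the page.
for token in ["foo", "foo_bar", "foo'", "foo3", "'a", "3foo", "@if"]:
    print(token, is_ordinary_identifier(unquote(token)))
```

Only 3foo is rejected here, because it begins with a digit.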
There is an important difference between identifiers starting with an underscore character (_) and the other ones. When you define a local value whose name starts with _ and never use it, the compiler does not complain. It does warn about other unused values.
Symbolic identifiers consist of the following characters: =, <, >, @, ^, |, &, +, -, *, /, $, %, !, ?, ~, ., :, #.
Symbolic identifiers are treated as standard identifiers, except that they are always treated as infix operators.
The following identifiers are used as keywords, and may not be used unquoted in any other context:
_, abstract, and, array, as, base, catch, class, def, delegate, do, else, enum, event, extern, false, finally, for, foreach, fun, if, implements, in, interface, internal, lock, macro, match, module, mutable, namespace, new, null, out, override, params, private, protected, public, ref, sealed, static, struct, syntax, this, throw, true, try, type, typeof, unless, using, variant, virtual, void, when, where, while, assert, ignore.
The following infix identifiers are reserved keywords: =, $, ?, |, ->, =>, <[, ]>, &&, ||.
There are a few kinds of literals:
- String literals, enclosed in " ", @" " or <# #>.
- Character literals, enclosed in '.
- Numeric literals, divided into integer literals and floating point literals.
A string literal represents a string constant. Nemerle supports three forms of string literal:
- Regular string literals.
- Verbatim string literals.
- Recursive string literals.
A regular string literal consists of zero or more characters enclosed in double quotes. It may include simple escape sequences (such as \n for the newline character) as well as hexadecimal and Unicode escape sequences (see character literals for details).
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. The characters between the double quotes are taken verbatim. The only exception is the sequence "", which stands for a single " character. Simple, hexadecimal and Unicode escape sequences are not recognized in verbatim string literals. A verbatim string literal may span multiple lines.
Examples:
def s1 = "Nemerle string !"; // Nemerle string !
def s2 = @"Nemerle string !"; // Nemerle string !
def s3 = "Nemerle\tstring !"; // Nemerle string !
def s4 = @"Nemerle\tstring !"; // Nemerle\tstring !
def s5 = "I heard \"zonk !\""; // I heard "zonk !"
def s6 = @"I heard ""zonk !"""; // I heard "zonk !"
def s7 = "\\\\trunk\\ncc\\ncc.exe"; // \\trunk\ncc\ncc.exe
def s8 = @"\\trunk\ncc\ncc.exe"; // \\trunk\ncc\ncc.exe
def s9 = "\"Nemerle\"\nrocks\n!"; // "Nemerle"
// rocks
// !
def s10 = @"""Nemerle"" // same as s9
rocks
!";
String s10 is a verbatim string literal that spans 3 lines.
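The verbatim rule can be illustrated by decoding a literal's source text into its value. The following is a minimal Python sketch under stated assumptions: decode_verbatim is a hypothetical helper, it receives the whole literal including the @ and both quotes, and it does not verify that every interior quote is doubled.

```python
def decode_verbatim(source):
    """Decode the source text of a verbatim literal into its string value."""
    well_formed = (
        len(source) >= 3
        and source.startswith('@"')
        and source.endswith('"')
    )
    if not well_formed:
        raise ValueError("not a verbatim string literal")
    body = source[2:-1]
    # The only special sequence is "" which stands for a single ".
    # Backslashes are ordinary characters and are left untouched.
    return body.replace('""', '"')


print(decode_verbatim('@"I heard ""zonk !"""'))    # I heard "zonk !"
print(decode_verbatim(r'@"\\trunk\ncc\ncc.exe"'))  # \\trunk\ncc\ncc.exe
```

The two calls mirror s6 and s8 from the examples above.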
A recursive string literal is similar to a verbatim string literal, but it can contain double-quote characters without doubling them (which makes it more flexible) and it can contain nested <# #> strings.
def s1 = @"Nemerle\tstring !"; // Nemerle\tstring !
def s2 = <#Nemerle\tstring !#>; // Nemerle\tstring !
def s3 = @"I heard ""zonk !"""; // I heard "zonk !"
def s4 = <#I heard "zonk !"#>; // I heard "zonk !"
def s5 = "\"Nemerle\"\nrocks\n!"; // "Nemerle"
// rocks
// !
def s6 = <#"Nemerle"
rocks
!#>; // same as s5
def s7 = <#"Nemerle"<#Nested#>string#>; // "Nemerle"<#Nested#>string
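One way to find the end of a recursive string literal is to count nesting depth: each <# increases it, each #> decreases it, and the literal ends when the depth returns to zero. The following is a minimal Python sketch of that idea. It is an assumption about how such a scanner could work, not the Nemerle lexer itself, and scan_recursive_string is a hypothetical name.

```python
def scan_recursive_string(text, start=0):
    """Return (value, end) for a <# ... #> literal beginning at text[start].

    value is the content between the outermost delimiters and end is the
    index just past the closing #>.
    """
    if not text.startswith("<#", start):
        raise ValueError("expected '<#'")
    depth = 1
    i = start + 2
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "<#":
            depth += 1  # a nested literal opens
            i += 2
        elif pair == "#>":
            depth -= 1  # the innermost open literal closes
            i += 2
            if depth == 0:
                # Strip only the outermost delimiters.
                return text[start + 2:i - 2], i
        else:
            i += 1
    raise ValueError("unterminated recursive string literal")


value, end = scan_recursive_string("<#Test <# Inner #> end#> rest")
print(value)  # Test <# Inner #> end
```

The inner delimiters stay part of the value, which matches the <#Test <# Inner #> end#> example at the top of the page.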
You can put the $ operator before a string literal to enable string interpolation (the $ operator is now shorthand for Nemerle.IO.sprint).
def x = 40;
def y = 42;
System.Console.Write ($ "$(x + 2) == $y\n")
Any expression can be used in $(...), but there may be problems with embedded strings and similar constructs. It is meant for simple expressions such as array or field access, method calls, etc. If you need embedded strings, use the <# #> string type. For example:
WriteLine($<#Test $("concate" + "nation")!#>); // => Test concatenation!
You can also use the ..$ notation. It simplifies printing sequences (types that implement System.Collections.Generic.IEnumerable[T]).
The syntax: ..$sequence
Or: ..$(seq; separatorStringExpression)
Or: ..$(seq; separatorStringExpression; elementConversionFunction)
For example:
using System.Console;
def x = 1;
def lst = [1, 2, 3, 52];
WriteLine($<#lst = ..$(lst; "; ");#>); // Similar to $@"lst = ..$(lst; ""; "");" but easier to read.
def sep = "; ";
def cnv = x => "0x" + x.ToString("X");
WriteLine($@"x = $x; lst = ..$(lst; sep; cnv);");
WriteLine($".$x;");
WriteLine($"lst = '..$lst';");
The output of the snippet above will be:
lst = 1; 2; 3; 52;
x = 1; lst = 0x1; 0x2; 0x3; 0x34;
.1;
lst = '1, 2, 3, 52';
See more examples in https://github.com/rsdn/nemerle/tree/master/ncc/testsuite/positive/printf.n
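The three ..$ forms behave like joining a sequence with an optional separator and an optional element conversion. The following Python sketch models that behavior and is not Nemerle code. splice_sequence is a hypothetical name, and the default separator ", " is inferred from the sample output above rather than from a specification.

```python
def splice_sequence(seq, separator=", ", convert=str):
    """Model of ..$seq, ..$(seq; separator) and ..$(seq; separator; convert)."""
    return separator.join(convert(item) for item in seq)


def to_hex(value):
    """Counterpart of the cnv function in the snippet above."""
    return "0x" + format(value, "X")


lst = [1, 2, 3, 52]
print("lst = " + splice_sequence(lst, "; ") + ";")          # lst = 1; 2; 3; 52;
print("lst = " + splice_sequence(lst, "; ", to_hex) + ";")  # lst = 0x1; 0x2; 0x3; 0x34;
print("lst = '" + splice_sequence(lst) + "';")              # lst = '1, 2, 3, 52';
```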
A character literal consists of one character enclosed in single quotes (' ') or an escape sequence of the form '\X', where X can be one of the following: [FIXME:]
- \, ', " - these represent (respectively) a backslash, a single quote and a double quote
- 0 - zero character
- $ - dollar sign
- 0X - where X is an octal ASCII code (up to three digits) of the character we want to represent (N)
- xX - where X is a hexadecimal ASCII code (exactly two digits) of the character we want to represent (N)
- xX - where X is a hexadecimal Unicode code (at most four digits) of the character we want to represent
- uX - where X is a hexadecimal Unicode code (exactly four digits) of the character we want to represent (N)
- UX - where X is a hexadecimal Unicode code (exactly eight digits) of the character we want to represent (N)
- a - matches a bell (alarm) (N)
- b - matches a backspace \u0008
- r - matches carriage return \u000D
- v - matches vertical tab \u000B (N)
- t - matches horizontal tab
- f - matches form feed \u000C (N)
- n - matches a new line \u000A
- e - matches an escape \u001B
- cX - matches an ASCII control character; for example \cC represents control-C (N)
Character literals have type char.
Numeric literals may have a suffix describing their type, as shown in the token examples above. The suffix is insensitive to case and to the order of its symbols.
The underscore can be used to separate groups of digits.
Integer literals may be prefixed with 0x, 0o or 0b to denote hexadecimal, octal or binary encoding respectively. Prefixes are case-insensitive too.
decimal_literal = digits [ { '_' digits } ] [suffix]
digits = { decimal_digit }
suffix = integer_suffix
integer_suffix = 'b' | 'sb' | 'ub' | 's' | 'us' | 'u' | 'l' | 'lu'
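The prefix, underscore and suffix rules for integer literals can be combined into a small parser. The following is a minimal Python sketch under stated assumptions, not the compiler's behavior. parse_integer_literal is a hypothetical name, the type names are descriptive labels taken from the token examples (with "short" assumed for the s suffix), no range checking is done, and a trailing b after hexadecimal digits is read as a digit rather than a suffix.

```python
import re

# Suffix letters in sorted order -> descriptive type label. Sorting makes the
# lookup insensitive to the order of the suffix symbols.
SUFFIX_TYPES = {
    "": "int",
    "u": "unsigned int",
    "b": "signed byte",
    "bs": "signed byte",
    "bu": "unsigned byte",
    "s": "short",
    "su": "unsigned short",
    "l": "long",
    "lu": "unsigned long",
}
PREFIX_BASES = {"0x": 16, "0o": 8, "0b": 2}
DIGIT_CLASSES = {16: "0-9a-f", 10: "0-9", 8: "0-7", 2: "01"}


def parse_integer_literal(text):
    """Return (value, type_label) for an integer literal, or raise ValueError."""
    lowered = text.lower()  # prefixes and suffixes are case-insensitive
    candidates = []
    if lowered[:2] in PREFIX_BASES:
        candidates.append((PREFIX_BASES[lowered[:2]], lowered[2:]))
    # Fall back to decimal, so that "0b" reads as 0 with a byte suffix.
    candidates.append((10, lowered))
    for base, body in candidates:
        d = DIGIT_CLASSES[base]
        match = re.fullmatch(rf"([{d}]+(?:_[{d}]+)*)([bslu]*)", body)
        if match is None:
            continue
        digits, suffix = match.groups()
        key = "".join(sorted(suffix))
        if key in SUFFIX_TYPES:
            return int(digits.replace("_", ""), base), SUFFIX_TYPES[key]
    raise ValueError(f"not an integer literal: {text!r}")


for literal in ["42", "1_000_000", "0x2a", "0o52", "0b101010", "10LU", "10bs"]:
    print(literal, parse_integer_literal(literal))
```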
Floating point literals, defined as:
floating_point_literal =
[ digits_ ] '.' digits_ [ exponent ] [ suffix ]
| digits_ exponent [ suffix ]
| digits_ suffix
exponent = exponential_marker [ sign ] digits
digits = { digit }
digits_ = digits [ { '_' digits } ]
exponential_marker = 'e' | 'E'
sign = '+' | '-'
digit = decimal_digit
suffix = floating_point_suffix
floating_point_suffix = 'f' | 'd' | 'm'
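The floating point grammar above translates almost directly into a regular expression. The following Python sketch is one such translation under stated assumptions. FLOAT_RE and classify_float_literal are hypothetical names, suffixes are matched in either case (as the text says suffixes are case-insensitive), and an unsuffixed literal is labeled double, following the 3.14 example in the token list.

```python
import re

DIGITS_ = r"[0-9]+(?:_[0-9]+)*"  # digits_ = digits [ { '_' digits } ]
EXPONENT = r"[eE][+-]?[0-9]+"    # exponent = exponential_marker [ sign ] digits
SUFFIX = r"[fdmFDM]"             # floating_point_suffix, either case

# One alternative per line of the floating_point_literal rule.
FLOAT_RE = re.compile(
    rf"(?:(?:{DIGITS_})?\.{DIGITS_}(?:{EXPONENT})?(?:{SUFFIX})?"
    rf"|{DIGITS_}{EXPONENT}(?:{SUFFIX})?"
    rf"|{DIGITS_}{SUFFIX})"
)

SUFFIX_TYPES = {"f": "float", "d": "double", "m": "decimal"}


def classify_float_literal(text):
    """Return a type label for a floating point literal, or None if no match."""
    if FLOAT_RE.fullmatch(text) is None:
        return None
    # A literal without a suffix ends in a digit and defaults to double.
    return SUFFIX_TYPES.get(text[-1].lower(), "double")


for literal in ["3.14f", "3.14d", "3.14", "3.14m", ".5", "1e10", "2d", "10"]:
    print(literal, classify_float_literal(literal))
```

The last token, 10, yields None because it has no decimal point, exponent or floating point suffix, so under this grammar it is not a floating point literal.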
literal = 'true' | 'false' | 'null' | '()'
true and false have type bool and represent the true and false boolean values, respectively.
null represents a special instance of any reference type that you cannot dereference.
() represents the only instance of type void.