Language Spec

The c4wa compiler operates on a subset of the C language. In this section, we attempt to describe this subset. Obviously, details might change as work on the compiler continues.

To make it clear from the start, there is no expectation that any existing C code, other than completely trivial, would pass through c4wa compilation unchanged. However, for typical C code which doesn't rely on external functions or libraries (excluding malloc and everything you can easily implement or import from your runtime), doesn't use any compiler- or OS-specific features, and doesn't make too much use of the more obscure (unions, gotos, etc.) or recently added (long long, _Generic, etc.) C language features, making it c4wa-compatible shouldn't take too much effort.

Instead of attempting to support any existing C code (an attempt which would be futile anyway without supporting a significant chunk of the standard library), the emphasis is on making it easier to write new code:

  1. with as few limitations as possible compared to standard C,
  2. such that this new code would still compile with standard C compiler.

TL;DR

To get this out of the way as early as possible, here are some of the most commonly used features of C language NOT supported by c4wa.

  • No standard library; other than a handful of built-in utilities, all functions must be implemented or imported
  • switch
  • typedef
  • union
  • enum
  • static variables or functions
  • while() {...} loop
  • size_t
  • wide char
  • Array initializers
  • Assignment operators =, +=, ++ etc. in expressions
  • Assignment of structs or using struct (not pointer) as an argument
  • long/float literals
  • labels and goto
  • Pointers to arrays, arrays of arrays
  • Function names as variables, indirect function calls
  • Bit Fields
  • Pragmas
  • Rarely used qualifiers restrict, volatile and specifiers auto and register
  • Almost all new features introduced in C99 and later standards (except runtime-length arrays, intermingled declarations, and one-line comments which are all supported)

A bit more details

As already mentioned, with a few exceptions, c4wa out of the box doesn't support any C standard library methods. Typically, you can import missing functionality from your runtime environment; that's how we could support a printf function, for example. One particular instance where you cannot rely on imported functionality is dynamic memory allocation, and c4wa does provide certain memory management utilities, covered in detail below.

c4wa supports Web Assembly primitive types (i32, i64, f32 and f64, which translate to C as int, long, float and double respectively), also char and short (which are internally i32, except for struct members and arrays; some operations also work differently for them), pointers, including void * pointers (also i32 internally), structures and arrays. Not all possible combinations are supported though; for example, you can't have a pointer to an array.

Any integer type could be unsigned (unlike standard C though, primitive type cannot be omitted, unsigned a is invalid). sizeof is supported (but may return results different from native C compiler due to different pointer size and alignment rules in WASM).

typedef isn't supported. All structures must have a name, and the only way to declare a variable of type struct is to use the syntax struct NAME. A struct can have another struct as its member, or a pointer to itself. Recursive declarations are allowed. There are no unions.
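As a minimal sketch of a self-referencing struct, the following linked list uses only the named `struct NAME` syntax described above. The names `node` and `list_sum` are hypothetical, and the code is written to also compile with a standard C compiler; it has not been checked against any particular c4wa version.

```c
// A struct may refer to itself through a pointer member.
struct node {
    int value;
    struct node * next;
};

// Adds up the values of a singly linked list terminated by a 0 pointer.
int list_sum(struct node * head) {
    int sum = 0;
    for (struct node * p = head; p != 0; p = p->next)
        sum += p->value;
    return sum;
}
```

Every variable is declared as `struct node`, never through a typedef alias.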

c4wa supports all C operators, but assignment isn't treated as an operator, so you can't have syntax like a = ptr[i ++] etc. Chain assignments a = b = c ... are supported with some limitations though.
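To illustrate the restriction, here is a hedged sketch of a routine that ordinary C code would typically write with `dst[k++] = src[i]`. Under the rule above, the increment moves into its own statement. The function name `copy_nonzero` is hypothetical.

```c
// Copies the non-zero elements of src into dst and returns how many were copied.
int copy_nonzero(int * dst, int * src, int n) {
    int k = 0;
    for (int i = 0; i < n; i ++) {
        if (src[i] != 0) {
            dst[k] = src[i];   // plain assignment as a statement
            k ++;              // increment as a separate statement
        }
    }
    return k;
}
```

The rewritten form is still valid standard C, so the same source can be verified with a native compiler.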

Usual pointer arithmetic is supported; & operator can be used with some limitations. Note that pointers in c4wa are 32 bit.

The only "native" loop type in Web Assembly is do ... while(); you are encouraged to use it whenever practical, since this creates cleaner and simpler WAT/WASM code. Since it is so common in C, we do nevertheless support a regular for loop, but not the while() { } loop. Use for(; ... ;) syntax if you must. You can use comma , to have multiple initializations or increments, though the semantics aren't entirely consistent with the C standard.
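The two loop forms can be sketched as follows; both functions are hypothetical examples, written to stay within the subset described here while remaining valid standard C. The first uses `do ... while()`, the second spells a would-be `while (b != 0)` loop as `for (; ... ;)`.

```c
// Sum of decimal digits of a non-negative number, using do ... while.
int digit_sum(int n) {
    int sum = 0;
    do {
        sum += n % 10;
        n /= 10;
    }
    while (n > 0);
    return sum;
}

// Euclid's algorithm; `while (b != 0) { ... }` is written as `for (; b != 0;)`.
int gcd(int a, int b) {
    for (; b != 0;) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}
```

Note that `do ... while()` always runs its body at least once, so it fits only when that first iteration is safe, as it is for `digit_sum(0)`.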

You can define multiple variables in one definition like int *a, b, c[2] and you can initialize variables when you define them, e.g. int * x = alloc_smth(), but not both at the same time. There is no initialization syntax for arrays or structures (except literal strings, which are zero-terminated char arrays); neither arrays nor structs could be assigned to.

If you reach the end of a non-void function without returning a value, this will trigger "unreachable" run time error in WASM even if return value is never actually used.

Though you can simply ignore the return value of a non-void function, you can't use any other expression as a statement. So for example, if you have functions foo and bar returning int,

int x = foo(); 
foo();           // ignore result, OK
bar ();          // same
foo() + bar();   // attempt to use expression as a statement, not allowed 

Compiling multiple source files

If you specify more than one source file, the compiler will yield one "bundle" WAT or WASM file. Functions called from a file they aren't defined in must be declared extern; more on this below. You can't currently have global variables shared between multiple files in a way which would be consistent with a C compiler and wouldn't depend on the order of files, so better not try.

C Preprocessor

c4wa relies on external preprocessor installed on your system; by default it is invoked as "gcc -E -C", this could be changed with command line option -Xpreprocessor.command. Preprocessor is not required though, c4wa first checks all incoming files for any preprocessor directives and only runs them through preprocessor when necessary, or if there is any -D<name>[=<value>] command line option.

When preprocessor is used by c4wa, symbol C4WA is always defined. You can use it to create separate branches for c4wa and standard C compiler. Compiler option -xh will print all defined symbols, while option -v will print full preprocessor command.

c4wa also comes with a few include files of its own; they are installed as part of ZIP distribution and path is automatically added to the preprocessor.

Finally, c4wa does process so-called "line directives" inserted by a preprocessor and therefore will use proper line numbers when reporting syntax errors.

Built-in functions, built-in libraries and system libraries

While there is nothing resembling C standard library in c4wa, it does support a few utilities. They come in the form of built-in functions, built-in libraries and system libraries.

Built-in functions are typically such that could be directly mapped to WASM instructions (in other words, they are inline functions). There is a full list further down in the documentation.

Built-in libraries, on the other hand, are separate pieces of functionality that could be optionally added to the output. They don't become available unless explicitly "linked" with -l<library name> command line option. Technically, "linking" with such library is functionally equivalent to adding additional source files to compile, except these source files are part of the compiler installation.

Note that functions provided by libraries still need to be declared as extern before usage; built-in functions, on the other hand, do not need to be declared. Some libraries might have dedicated header files (with a name matching a standard C header file, e.g. stdarg.h or stdlib.h)

System libraries apply to a few instances where something which behaves as a built-in function in reality is implemented with a library. This is the only situation where some (very small) library will be added to your output transparently. Other than that, WAT/WASM output will only have functions you implemented in C and nothing more.

Import and export

Syntax of C doesn't exactly match Web Assembly concepts of "imported" and "exported" symbols (global variables, functions and memory); instead of introducing new incompatible syntax, c4wa solves this problem by reinterpreting existing attributes, as follows:

Function definition could be extern; this makes function exported. A function which is not extern will not be exported. Obviously, you should always have at least one extern function, but you can have as many as you want.

c4wa will only output generated code for functions which are exported (that is, declared extern) or are called from an exported function. If you have no exported functions, you'll get an empty module and a warning.

Function declaration could be static or extern; either attribute will cause declared function not to be imported. If neither attribute is present, it will be considered imported.

For example, when you declare a function like double atan2(double, double) (no attributes), it is interpreted as imported, and if it is not provided by the run-time, this will trigger an error.

static and extern declarations are for functions defined elsewhere in the same file (in case of static) or in another file (extern). You never need a static declaration to compile with c4wa, but you might need it for compatibility with a standard C compiler. An extern declaration could come in handy if you have more than one source file to compile, or when using a built-in library.

Be careful: an attempt to declare function without extern, while perfectly legal in standard C, will lead c4wa to treat your function as imported, and if it is defined later (including in a library), it'll trigger a compiler error.

double atan2(double, double); // no attribute, function considered imported

..................
double x = atan2(2.0, 3.0);   // no problems so far, function is declared

..................
double atan2(double y, double x) {  // oops, can't define imported function, compiler error;
                                    // change declaration to add `static` or `extern`
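For contrast, here is a hedged sketch of a file that stays clear of that error, following the rules listed above: a `static` declaration for a function defined later in the same file, an `extern` definition for the exported entry point, and a plain definition for the internal helper. The function names are hypothetical.

```c
static int twice(int x);            // defined later in this file, so not imported

extern int twice_plus_one(int x) {  // extern definition: exported
    return twice(x) + 1;
}

int twice(int x) {                  // plain definition: internal, not exported
    return 2 * x;
}
```

A standard C compiler accepts the same file, since a function first declared `static` keeps internal linkage when it is later defined without a storage class.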

Global variables could be static, extern, or neither. An extern variable will be exported, and a variable which is neither extern nor static will be imported (just like a function declaration). static global variables are neither imported nor exported.

(A global variable could also be const, in which case it is implicitly static unless declared as extern).

Memory behaviour is determined by compiler options (see here). It could be imported, exported (current default), purely internal or not be present at all.

All objects are exported and imported under their actual names in C (except memory, which doesn't have a C identifier and so export/import name is determined by compiler option module.memoryStatus). When importing, module name is set by compiler option module.importName (default is c4wa). So for example, if you want to import function atan2 from JavaScript runtime, you declare it in C source as double atan2(double, double) (remember: no attributes), and then use this code in JavaScript to import:

WebAssembly.instantiate(wasm_bytes, {c4wa: {atan2: Math.atan2}});

Any C attribute not listed above is explicitly not allowed (so for example you can't have static function definition).

Memory

A simple C program might not require a linear memory at all; you can use compiler option module.memoryStatus=none to not add any memory declaration in the output. However, many features, such as taking address of a local variable, using structs or arrays, calling functions with variable number of arguments (like printf), and obviously allocating memory directly, won't work without linear memory.

Composition of linear memory

Generated WAT/WASM file will have two special blocks of linear memory with configurable sizes.

  • Stack, size module.stackSize (default 1024), for stack variables;
  • Data, for string literals; size is flexible depending on actual space used.

Note that very first byte of memory (address 0) isn't used by the stack; this is done so no pointer could have value 0.

The rest of memory, from byte number module.stackSize + (actual data size) onwards, can only be accessed by using __builtin_memory variable or one of provided memory managers.

Low-level memory access

You can access linear memory directly from your C program by utilizing the built-in global variable __builtin_memory. (It has type void * and actual numeric value 0). This has to be done very carefully, obviously, because you must make sure you assign memory correctly, and because you may overwrite the stack section or the data section.

For that reason, __builtin_memory should always be used together with another built-in global variable __builtin_offset which is set to the offset where data segment ends. Its type is int.
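A minimal sketch of how the two built-ins might be combined into a trivial "bump" allocator is shown below. This is an illustration only, not the implementation of any c4wa library: `bump_alloc` and `bump_used` are hypothetical names, the `#ifndef C4WA` branch is a native stand-in added so the code can be tried with a standard C compiler, and the sketch assumes linear memory is already large enough (it never grows it).

```c
#ifndef C4WA
// Native stand-ins (assumption, for testing outside c4wa only).
static char native_memory[4096];
#define __builtin_memory ((void *) native_memory)
#define __builtin_offset 0
#endif

static int bump_used = 0;   // bytes handed out so far

// Returns the next free block located after the stack and data sections.
void * bump_alloc(int size) {
    char * base = (char *) __builtin_memory;
    char * block = base + __builtin_offset + bump_used;
    bump_used += size;
    return block;
}
```

Starting every allocation at `__builtin_offset` is what keeps the blocks from overlapping the stack and data sections.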

Memory managers

Memory manager is a module built on the top of low-level access to __builtin_memory to implement methods like malloc and free for dynamic memory access.

There are currently three memory managers, in order of increasing complexity:

| Library name | Description |
| --- | --- |
| mm_incr | Incremental memory allocation; nothing is ever released |
| mm_fixed | Fixed-sized chunk allocation |
| mm_uni | Universal memory manager |

Incorporating universal memory manager with command line option -lmm_uni pretty much allows a programmer to use malloc and free as one normally would. In many ways, this is not the most optimized solution though, and it could be an overkill for simpler tasks.

Memory alignment

From version 0.5, c4wa supports different alignment options, with command line argument -a; valid values are 1, 2, 4, 8. Value 1 (default) means "no alignment", whereas with -a 8 all 64-bit memory access would be aligned to 8 bytes, 32-bit to 4 bytes, etc.

In the current WASM spec, memory alignment is merely a "hint", which allows runtime to optimize memory access. You are free to use unaligned memory (alignment = 1), or even provide incorrect hints (which essentially was the case up to version 0.4). Perhaps for this reason, in my testing using aligned memory access brings little, if any, performance benefit.

Generally, alignment=1 results in a slightly simpler WASM code using slightly less memory; that's the reason it is the default. For performance-critical applications, you may want to do a comparison with other alignment options for your target runtime, and select whatever works best for you. I suspect that for now there will be very little difference, but situation might change with newer runtimes.

In the meantime, the other side of the complete indifference of available runtimes to alignment hints is that it's impossible to verify them; to address this, c4wa now has a built-in interpreter (invoked with -e). With this option, c4wa will attempt, after compilation, to actually execute the program (function main will be called with no arguments and import function printf will be made available). It'll throw an exception on any memory access inconsistent with alignment hint. Unfortunately, because this simple interpreter is a lot slower than a typical runtime, non-trivial applications need to be scaled down to make it practical to test.

Note also that when -e is active, generated WAT and/or WASM files will only be saved with explicit -o option. Simply running c4wa-compile -e file.c will invoke the interpreter, but will save nothing regardless of the results.

Finally, if you're using low-level memory access (see above), you are then responsible for following the alignment rules (which are, of course, only relevant if alignment > 1). To facilitate this, there is a built-in global __builtin_alignment and a preprocessor symbol C4WA_ALIGNMENT, both set according to option -a.
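When managing offsets by hand, following the alignment rules usually comes down to rounding an offset up to the next multiple of the alignment. A hedged sketch of such a helper follows; `align_up` is a hypothetical name, not something c4wa provides.

```c
// Rounds offset up to the nearest multiple of alignment (alignment must be >= 1).
int align_up(int offset, int alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}
```

Under c4wa one would presumably call it as `align_up(offset, __builtin_alignment)`; with the default alignment of 1 it returns the offset unchanged.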

stack variables

Web Assembly supports unlimited number of local variables, so when you have a local variable in C, we directly map it to a local variable in Web Assembly (see below on how WAT names for these variables are selected).

In some situations though we have to store value in the stack, keeping its address as a local variable. In this case we refer to this variable as stack variable. This happens in these two cases:

First, this happens if you declare a variable of type array (not pointer) or struct. In this case, your variable is allocated in the stack (first module.stackSize bytes of linear memory). Actual WASM local variable holds a pointer to this memory.

&

However, c4wa would also assign a regular primitive type variable to the stack if you attempt to take an address of this variable.

Consider this C code:

void foo(int par) { ... }

void bar() {
    int a = 14;
    foo(a);
}

Compiled to WAT, function bar will look like this:

(func $bar
  (local $a i32)
  (set_local $a (i32.const 14)) 
  (call $foo (get_local $a)))

Let's now change parameter of foo to a pointer and argument to &a:

void foo(int * par) { ... }

void bar() {
    int a = 14;
    foo(&a);
}

This simple change will result in a very different WAT output for bar :

(func $bar
  (local $@stack_entry i32)
  (local $a i32)
  (set_local $@stack_entry (global.get $@stack))
  (set_local $a (global.get $@stack))
  (global.set $@stack (i32.add (global.get $@stack) (i32.const 4)))
  (i32.store (get_local $a) (i32.const 14))
  (call $foo (get_local $a))
  (global.set $@stack (get_local $@stack_entry)))

What happened here? We can't pass an address of a local variable; the only way to return a value from a Web Assembly function other than through a return value is through linear memory: to pass a memory address (or index) and have function write data to this address.

So, there is still a local variable $a but now it holds a memory address. Any attempt to access or change it will necessitate memory access. Additionally, we need to adjust stack pointer, preserve stack pointer value at function entrance and restore it at all function exit points.

Still, if everything works as it should, whether a local variable is allocated in the stack or not should make no difference in execution, but performance might well suffer.

You can't take an address of an array (since it's already the address) or of a global variable (since globals can't be put on the stack).

One peculiarity of c4wa is that the expression &a[1] is interpreted as (&a)[1] and not &(a[1]) as it should be. This is related to left recursion in Antlr4, and I haven't been able to solve this yet without significant changes to the grammar. For practical use, this is hardly a problem: you can always use parentheses or simply replace this expression with a + 1, which is what it is anyway.
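Both workarounds can be sketched as follows. The function names are hypothetical, and the code is also valid standard C, where the two spellings mean the same thing.

```c
// Writes value to the location target points to.
void store(int * target, int value) {
    *target = value;
}

int address_demo() {
    int a[3];
    a[0] = 0;
    a[1] = 0;
    a[2] = 0;
    store(a + 1, 42);     // address of a[1], written without &
    store(&(a[2]), 7);    // parentheses force the intended grouping
    return a[1] + a[2];
}
```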

stack arrays

You can bypass manual memory allocation by using stack. Variable-length arrays are supported: When you declare an array, type variable[size], size doesn't have to be a compile-time constant, it could be any valid integer expression, subject to the limit imposed by allocated stack space.

For example, if you need to allocate integer array of size N and fill it in with consecutive numbers 0 ... N-1, either of these two alternatives will work:

extern void * malloc(int);

int * arr = malloc(N * sizeof(int)); // make sure to link with a supported memory manager!
for (int i = 0; i < N; i ++)
    arr[i] = i; 

or

int arr[N];
for (int i = 0; i < N; i ++)
    arr[i] = i; 

In the second version you don't need to worry about selecting or implementing a memory manager; however, you need to be mindful of the available stack size (see above). We are not checking for stack overflow, so if you take too much memory you'd start overwriting your DATA section.

Arrays are permitted in structs, but must have fixed (= known at compile time) size.

Use case: returning complex data types from exported functions

While c4wa compiler allows us to write a code where C functions could exchange complex data structures via linear memory, it can't provide a ready-to-use solution if there is a need to exchange such data types between Web Assembly code and the runtime, since it knows nothing of the runtime.

Consider one example. Let's say we need to call a function from JS runtime to return a boundary box for a certain 2D region, as determined by 4 numbers: xmin, xmax, ymin, ymax.

A regular C function would look like this:

void find_boundary_box(int * p_xmin, int * p_xmax, int * p_ymin, int * p_ymax) { ... }  

We can't easily use it as an exported function though, because then the runtime would need to know exactly which memory addresses it could pass as parameters, and this would mean that memory allocation inside C/WASM code would need to be coordinated with the runtime. It's not impossible, but it's complicated and a bad design.

One obvious alternative is to simply split this into 4 separate functions, one for each integer to be returned, with some internal cache to avoid unnecessary repeated calculations. This has the advantage of being limited to C/WASM code and not requiring any new logic in terms of communications with the runtime; but it's still complicated, requires separate logic to save and invalidate the static cache, etc.

Finally, we could allocate new memory region, store 4 integers there, and return memory address to the runtime. This is a lot better, but one remaining issue is that if we want to reclaim this memory later, it'll again require a rather complicated logic since it couldn't be released in a function where it was allocated.

Using stack allocation solves this last problem, because stack memory is already tracked globally. So here is what we can do:

C-code:

extern int * find_boundary_box() {
    int boundary_box[4];
    ......................
    boundary_box[0] = xmin;
    boundary_box[1] = xmax;
    boundary_box[2] = ymin;
    boundary_box[3] = ymax;
    
    return boundary_box;
}

JavaScript runtime:

// ............................
const wasm = await WebAssembly.instantiate(bytes, import_object);
const exports = wasm.instance.exports;
const linear_memory = new Uint8Array(exports.memory.buffer);
const boundary_box = exports.find_boundary_box();
const [xmin, xmax, ymin, ymax] = [...Array(4).keys()].map(i => read_i32(linear_memory, boundary_box + 4 * i));

JavaScript function read_i32 could be imported from here.

If you are verifying your code in standard C compiler (which is recommended), it will probably complain about returning stack value from a function find_boundary_box. You can safely ignore it, or if it bothers you, restructure your code slightly to avoid this warning:

extern int * find_boundary_box() {
    int boundary_box[4];
    int * ret_val = boundary_box;
    ......................
    boundary_box[0] = xmin;
    boundary_box[1] = xmax;
    boundary_box[2] = ymin;
    boundary_box[3] = ymax;
    
    return ret_val;
}

Memory functions

These functions behave as normal C functions in C code with the given signatures, but c4wa internally replaces them with Web Assembly memory operators:

| Name | Arguments | Return Value | Description |
| --- | --- | --- | --- |
| memset | void * addr, char value, int size | void | Same as in C library |
| memcpy | void * dest, void * src, int size | void | Same as in C library |
| memgrow | int n_pages | int | Increase memory size by specified number of pages (1 page = 64K); returns old number of pages |
| memsize | none | int | Get current memory size in pages |

Note that memset and memcpy are known as bulk-memory operations and as of now are still considered experimental; they may not be supported by all runtimes. If you compile WAT file which includes these operators with wat2wasm, you must include option --enable-bulk-memory.

(P.S. 2022-01: latest version of wat2wasm no longer needs or recognizes --enable-bulk-memory)

c4wa supports these operations by default. However, for better compatibility, it provides a transparent emulation, which can be enabled with compiler option -Xwasm.bulk-memory=false.

Since memgrow and memsize are WASM-specific, when cross-compiling with a native C compiler you should provide a suitable replacement, e.g.

#ifndef C4WA
static int __memory_size = 1;
#define memgrow(size) __memory_size += (size)
#define memsize() __memory_size
#endif 
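As a hedged sketch of how these two functions might be used together, the helper below grows linear memory until it can hold a given number of bytes. `ensure_memory` is a hypothetical name; the native shim repeats the replacement idea shown above so that the block is self-contained.

```c
#ifndef C4WA
// Native replacement for the WASM-specific functions (testing only).
static int __memory_size = 1;
#define memgrow(size) (__memory_size += (size))
#define memsize() __memory_size
#endif

// Makes sure linear memory holds at least `bytes` bytes (1 page = 65536 bytes).
void ensure_memory(int bytes) {
    int pages_needed = (bytes + 65535) / 65536;
    int pages_now = memsize();
    if (pages_needed > pages_now)
        memgrow(pages_needed - pages_now);
}
```

The return value of `memgrow` (the old number of pages) is simply ignored here, which is allowed for function calls.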

Built-in functions

In addition to memory functions memset, memcpy, memgrow, memsize discussed above, there are a few other built-in functions:

  • min and max work with any numerical arguments (of the same type), and will return result of the same type as arguments;
  • floor, ceil, sqrt, fabs. These functions work for float or double arguments, and will return same type as passed. Note that this is different from Standard C library which has special float versions of these functions, such as sqrtf; however these float functions are so rarely used, it hardly seems practical to add these names to c4wa, and in any case, any incompatibility could be easily addressed with a macro;
  • abort triggers "RuntimeError: unreachable" exception;
  • __builtin_clz, __builtin_ctz, __builtin_clzl, __builtin_ctzl, __builtin_popcount, __builtin_popcountl (see gcc documentation). Note that while in gcc the behaviour of CLZ/CTZ functions is explicitly undefined if the argument is 0 (in practice, implementations typically return 0), in WASM these functions return the full number of bits in the argument (so 32 for __builtin_clz and __builtin_ctz, 64 for __builtin_clzl and __builtin_ctzl). Note also that in the GNU C compiler, builtin functions don't need to be declared, thus you don't need any extra glue to cross-compile with a GNU-compatible compiler (just be mindful of argument 0).
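The cross-compilation points from the list above can be sketched as follows. The `min`/`max` macros are a native-only shim (an assumption about how one might fill the gap, since standard C has no such functions), and `clamp` and `clz32` are hypothetical helper names. The explicit zero check makes `clz32` return 32 for 0 on a GNU-compatible compiler as well, matching the WASM behaviour described above.

```c
#ifndef C4WA
// Standard C has no min/max; these macros stand in for the c4wa built-ins.
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
#endif

// Limits x to the range [lo, hi].
int clamp(int x, int lo, int hi) {
    return max(lo, min(x, hi));
}

// Number of leading zero bits in a 32-bit value, with a defined result for 0.
int clz32(unsigned int x) {
    if (x == 0)
        return 32;
    return __builtin_clz(x);
}
```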

Strings and chars

Web Assembly has special DATA section and data instruction to store strings in memory. Memory for DATA is allocated at compile time based on actual total lengths of all string literals (plus terminating zero byte).

All string literals in C code are placed in DATA section with terminating \0; identical strings are assigned same memory address. Just like in standard C, when assigned to a variable or passed as an argument to a function, string literals have type char *.

Again, like in newer C compilers, consecutive string literals are joined together, so the following code is valid:

char * file = 
    "Line 1\n"
    "Line 2\n"
    "Line 3\n";

Unlike most C implementations, string literals are writable. The following code will work in Web Assembly but will probably trigger a Bus Error with native C compiler:

char * name = "peter";
name[0] = 'P';

Char literals are supported, including all standard escape sequences.

Note that strings and chars are 8-bit. If you include a Unicode character in a string, it'll be decoded into bytes with UTF-8 encoding. You can't have a Unicode character as char literal.

You can use built-in function memcpy to copy string literal to char array, but remember to account for terminating zero byte:

char name[5];
memcpy(name, "John", 5);

For example, this is a c4wa-compatible implementation of strlen function:

int strlen(char * str) {
    int n = 0;
    if (!*str)
        return 0;   // empty string: the loop body below must not run
    do {
        str ++;
        n ++;
    }
    while(*str);
    return n;
}
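In the same spirit, a string comparison can be sketched with the `for(; ... ;)` loop form. The name `str_equal` is hypothetical (it is not part of c4wa), and the function returns 1 for equal strings and 0 otherwise.

```c
// Returns 1 if the two zero-terminated strings are identical, 0 otherwise.
int str_equal(char * a, char * b) {
    for (; *a && *a == *b; ) {
        a ++;
        b ++;
    }
    // Either a reached its terminator or the strings differ at this position.
    return *a == *b;
}
```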

You can freely pass strings to imported functions, which will then need to read actual characters from memory (and probably convert 8-bit data to Unicode strings); this is how printf function works in the testing suite.

It must be acknowledged that c4wa isn't a good environment for writing code that deals with strings. This is in part because C itself isn't, and in part because working with strings often means allocating and freeing memory, and you do need a decent memory manager for that.

Functions with variable number of arguments

c4wa fully supports standard C syntax to define or declare functions with variable argument list. The following example (borrowed from here; note that on this occasion, original C code compiles in c4wa without a single change) will compile and produce same output in both c4wa and standard C:

#include <stdio.h>
#include <stdarg.h>

double average (int num, ...) {
    va_list arguments;
    double sum = 0;
    va_start ( arguments, num );
    for ( int x = 0; x < num; x++ )
        sum += va_arg ( arguments, double );
    va_end ( arguments );
    return sum / num;
}

extern int main() {
    printf( "%.2f\n", average ( 3, 12.2, 22.3, 4.5 ) );
    printf( "%.2f\n", average ( 5, 3.3, 2.2, 1.1, 5.5, 3.3 ) );
    return 0;
}

You can also have imported functions with variable arguments; when adding them to your runtime, actual implementation will have one additional argument after required ones, which is a memory address where all subsequent arguments shall be read (it'll be passed even if there are no additional arguments in the function call). Each optional argument, regardless of type, will occupy exactly 8 bytes (64 bits) in linear memory.

printf

The best example of this approach is function printf from the test suite.

For the purposes of c4wa, it is defined as follows:

void printf(char * format, ...);

(You can also include file stdio.h, which as of current version doesn't have anything except this one line).

Since there are no attributes, this is an imported function; since there is exactly one required argument, actual runtime implementation would have two arguments, format and offset.

Let's consider this call of printf :

int A;
unsigned long B;
double C;
char * D = "some string";
........................
printf("A = %d, B = %lx, C = %.6f, D = %s\n", A, B, C, D);

In this case, there are 5 actual arguments, but imported function will still be called with two arguments:

1-st argument format: memory address to read format string from (just like in C, any array, including string, is passed as memory address of its first element);
2-nd argument offset: memory address to read the rest of the arguments from.

To acquire the actual values A, B, C and D, the implementation will then need to gain access to linear memory and read arguments at the following memory locations:

Variable A at address offset (actual 32-bit value of A is converted to 64–bit);
Variable B at address offset + 8;
Variable C at address offset + 16;
String D is a string to be read from a location which value (32-bit converted to 64) is stored at address offset + 24.

When passing arguments, all integer values are converted to long, and all float values to double.
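The layout above can be illustrated from the host side. The sketch below is ordinary C (not the c4wa subset) for a hypothetical runtime that sees linear memory as a byte array; the function names are made up for illustration, and it assumes a little-endian host, matching the byte order of WASM linear memory.

```c
#include <stdint.h>
#include <string.h>

// Reads the index-th optional argument as a 64-bit integer.
int64_t vararg_i64(const unsigned char * memory, int offset, int index) {
    int64_t value;
    memcpy(&value, memory + offset + 8 * index, sizeof value);
    return value;
}

// Reads the index-th optional argument as a 64-bit float.
double vararg_f64(const unsigned char * memory, int offset, int index) {
    double value;
    memcpy(&value, memory + offset + 8 * index, sizeof value);
    return value;
}
```

For the printf call above, A would be `vararg_i64(memory, offset, 0)`, C would be `vararg_f64(memory, offset, 2)`, and the characters of D would start at `memory + vararg_i64(memory, offset, 3)`.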

There is a sample node.js runtime implementation of printf here, which you can re-use. It doesn't achieve 100% compatibility with the C standard, but it is reasonably close. File run-wasm is an example of how it could be used in a runtime if WASM code is exporting memory.

Operators

c4wa supports all 40+ C operators, with only minimal and mostly inconsequential differences with standard C. Known inconsistencies and bugs are:

  • Incorrect prioritization of &;
  • Assignment operators: =, ++, --, +=, -=, *=, /=, %=, >>=, <<=, &=, ^=, |= could not be re-used in an expression; operators in c4wa have no immediate side effects (that is, other than via function calls). Thus, operators ++ and -- are postfix only (a++ is valid, ++a is not);
  • Comma , isn't technically an operator, it's an alternative to block { ... } to make a composite statement. a = b, c is illegal, but a = b, c = d or i ++, j ++ are ok;
  • Boolean expressions !!x, !(x == 0), !x == 0, x != 0, (x == 0) == 0 are always simplified to just x, whereas it should be 1 if x ≠ 0.

Boolean operators and values

Booleans should be reasonably consistent with C: 0 is false, 1 is true, etc. There isn't any built-in support for true or false constants, feel free to add your own via preprocessor or globals.

One thing to note, c4wa does support proper semantics for boolean && and || (so when evaluating A && B, if A evaluates to false, B is not evaluated, similarly for A || B), but at a price of generating more complex code (since there is no built-in support for such operations in Web Assembly). You may consider using bitwise & and | instead in some situations, which directly translate to WASM instructions resulting in simpler and faster code.
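A hedged sketch of where each operator fits: the bitwise form is safe when both operands are plain 0/1 comparisons with no side effects, while the short-circuit form is required when the right operand must not be evaluated. Function names are hypothetical.

```c
// Both comparisons are side-effect free and yield 0 or 1, so & is enough.
int in_range(int x, int lo, int hi) {
    return (x >= lo) & (x <= hi);
}

// Here && is required: *s must not be read when s is a 0 pointer.
int first_char_is(char * s, char c) {
    return s != 0 && *s == c;
}
```

Replacing `&&` with `&` in the second function would dereference a 0 pointer, so the substitution is only an optimization for cases like the first.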

Casts and constants

The rules of automatic casting in c4wa are broadly consistent with a standard C compiler, but perhaps somewhat simplified.

If an assignment, function call, return expression or binary operation is used with inconsistent types, c4wa will automatically and silently apply a cast to bring lower-width value to higher-width, and will also convert any integer type to any float type regardless of width (so assigning long to float is OK), but not the other way around without an explicit cast.

int a;
long b;
float f;
double g;

b = a; // OK
a = b; // Syntax error
g = f; // OK
f = (float) g; // wouldn't work without a cast

(Additionally, when two integer types differ only in that one of them is unsigned, mixing them without an explicit cast triggers a compilation error.)
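The same rules apply to function arguments and return expressions. The sketch below uses hypothetical function names and is valid standard C; it is meant to mirror the widening and narrowing rules described above rather than to document verified c4wa output.

```c
/* int operand is widened to long automatically; no cast needed. */
long widen_sum(int a, long b) {
    return a + b;
}

/* Going from long to int loses width, so the cast is explicit. */
int narrow(long b) {
    return (int) b;
}

/* Integer to float conversion is automatic regardless of width;
 * the float result is then widened to double for the return. */
double mix(long n, float f) {
    return n + f;
}
```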

Now, all integer constants in c4wa have type int and all float constants (differentiated from integer constants by having a dot . ) have type double. This may create a problem when initializing a float, for example

float x = 1.14;

Is this legal? 1.14 is a double which can't be assigned to a float without a cast. c4wa solves this problem by applying a special rule to constants: they automatically adopt the type of the non-constant operand. So, when you assign 1.14 to a float, it automatically becomes float, as if you wrote

float x = (float) 1.14;

A few other examples of permissible and not permissible assignments:

long longNumber = -18;
float floatNumber = 1.234e2;
int intNumber = -57.4; // not even a warning, unlike standard C compiler

int * ptr = 0; // still OK

long * lptr = 1;   // Nope, this is explicitly not allowed. Only constant `0` could be assigned to a pointer.

Sometimes these two rules (automatic widening of the narrower operand, and constants adopting the type of the non-constant operand) could be in conflict. For example, how shall we interpret the comparison x > 1.0, where x is a float value? We could either interpret 1.0 as float and perform a 32-bit float comparison, or convert x to double and do a 64-bit float comparison.

Current implementation tends to prefer the latter, but this is still work in progress and subject to change.
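The choice matters because the two interpretations can give different answers. The following sketch (hypothetical function names, standard C semantics) makes each interpretation explicit with a cast, so the result does not depend on which rule the compiler prefers. It uses 0.1, which has no exact binary representation, so the float and double approximations differ.

```c
/* Interpretation 1: the constant adopts the type of x (32-bit comparison). */
int equal_as_float(float x) {
    return x == (float) 0.1;
}

/* Interpretation 2: x is widened to double (64-bit comparison). */
int equal_as_double(float x) {
    return (double) x == 0.1;
}
```

With float x = (float) 0.1, the first function reports equality and the second does not, since widening x to double does not recover the digits lost when 0.1 was rounded to float. Writing the cast explicitly is a simple way to stay independent of the current implementation.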

To assign one pointer to another, they must have the same type, unless one of them is void *.

NULL

While there is no built-in NULL constant, you can define NULL as 0 and all customary C syntax would work as expected:

#define NULL 0

.................
int * p_x = NULL;

if (p_x) { ....

if (!p_x) { ....

if (p_x && *p_x > 0) { ....

The only caveat is that a 0 pointer value can be dereferenced. The following code won't trigger any run-time errors but will silently overwrite some of your stack space:

int * p_int = 0;
* p_int = 57;

(In theory, we might have used -1 instead of 0 as an illegal pointer value, so an attempt to dereference it would have failed; this however would lead to quite a lot of complications, like properly interpreting pointers as booleans, if (ptr) {...}, etc; besides, even with ptr=-1, we would still have the same problem with ptr[idx] where idx > 0.

It is way easier to keep NULL=0. While NULL checks by themselves are mostly OK as a design pattern, a programmer shouldn't rely too much on runtime errors as validation. Also, in the future we may add a special "debug" mode with some additional run-time checks, including stack overflow and dereferencing 0).
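Since dereferencing 0 does not fail at run time, an explicit check is the only protection. A minimal sketch, with a hypothetical helper name, assuming NULL is defined as described above (the #ifndef guard only avoids a clash with standard headers when compiling natively):

```c
#ifndef NULL
#define NULL 0
#endif

/* Return the pointed-to value, or `fallback` when the pointer is NULL.
 * Without the check, reading *p with p == 0 would silently return
 * whatever happens to be stored at address 0. */
int read_or_default(int * p, int fallback) {
    if (!p) {
        return fallback;
    }
    return *p;
}
```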

Local variables mapping

Web Assembly binary format (WASM) doesn't have a concept of a variable name; it references variables by their consecutive numbers. However, if your output is a text-based WAT file, it does have variable names. Since there is a close correspondence between local variables in C and in the generated Web Assembly, it is tempting to simply map local variables in C directly to WAT names, so that if, for example, you have a variable long acnt_id in C code, it would be mapped to (local $acnt_id i64) in WAT; luckily, that's exactly what c4wa does, most of the time. However, since release 0.4, c4wa supports block scope for local variables, which makes things a tad more complicated.

Consider this fragment of C code:

int a;

for (...) {
   double a = ...;
.....................   

Now we have two variables named a, one in the exterior scope and one inside the block; they even have different types, and so we must choose two separate WAT names for them. In this case, c4wa maps the first a to $a, and all variables a inside the embedded blocks to auto-generated names based on the original variable name a and a unique block id.

Now, let's consider the subsequent code in the same fragment:

int a;

for (...) {
   double a = ...;
.....................
}

double x = -57;   

When we get to variable x, we could map it directly to $x in WAT code; however, this would not be optimal, since we already had to add a double=f64 variable for the interior a; thus c4wa always attempts to re-use no longer needed (out of scope) variables, if the type matches.

In the end, mapping to WAT names can get complicated; the end result however is WAT code which fully respects block scoping for variables and uses only the minimal necessary number of locals.
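From the C side, the observable behavior is ordinary block scoping. The sketch below (hypothetical function name) completes the fragment above into a full function: the inner double a shadows the outer int a only inside the loop body, and x is declared after the inner a has gone out of scope. The WAT names actually chosen by c4wa are not shown, since they are auto-generated.

```c
double scope_demo(int n) {
    int a = 10;              /* outer a */
    double total = 0;
    int i;

    for (i = 0; i < n; i ++) {
        double a = 0.5 * i;  /* inner a shadows the outer one inside this block */
        total = total + a;
    }

    double x = -57;          /* declared after inner a is out of scope */
    return total + a + x;    /* `a` here is the outer int again */
}
```

For n = 4 the inner a takes the values 0, 0.5, 1 and 1.5, so total is 3 and the function returns 3 + 10 - 57 = -44.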

Globals

Web Assembly supports global variables, so you can freely use them in C code. However, Web Assembly requires that all non-imported globals be initialized, and furthermore, you can only initialize globals to compile-time constants.

int Num_of_Points;        // imported, can't initialize
extern double Volume = 0; // exported, must initialize
static N = 10;            // internal, neither imported nor exported
const test_mode = 1;      // non-mutable, implicitly 'static' unless declared 'extern'

Global values can be initialized to any compile-time constant; compile-time expressions may use sizeof.

static variables in functions aren't supported.
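Because static variables inside functions are unavailable, state that must persist between calls can live in an internal global instead. A minimal sketch with hypothetical names, written with explicit types so that it also compiles with a standard C compiler; it has not been verified against c4wa:

```c
static int call_count = 0;                 /* internal global, constant initializer */
const int RECORD_BYTES = 4 * sizeof(int);  /* compile-time expression using sizeof */

/* Return the byte offset of the next record and advance the counter. */
int next_offset() {
    int offset = call_count * RECORD_BYTES;
    call_count ++;
    return offset;
}
```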

const

Unlike standard C, the const attribute isn't part of the type definition, but an optional attribute of variable initialization (remember that in c4wa you can initialize only one variable per declaration; int a = 1, b = 1 isn't valid).

const int x = N + 1;         // OK, `x` can no longer be assigned to
const int x;                 // invalid
void main(const char * []);  // invalid 

Cross-compiling with C

Cross-compiling the code with standard C compiler is listed above as one of the design goals, and it should indeed be simple and straightforward; c4wa does not use any special syntax or markers not known to a C compiler. Nevertheless, a few adjustments might be necessary:

  • Due to incompatible dynamic memory semantics, you need to either emulate customary C functions malloc and free in c4wa (e.g. by using one of the provided memory managers), or emulate linear memory in native C compiler;
  • Some built-in functions might not be available in C library or require an explicit declaration;
  • c4wa is tolerant to functions calling each other in any order within the same file, whereas a standard C compiler may require a declaration before first use.
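For the last point, a forward declaration is the usual fix on the standard C side. The sketch below uses hypothetical function names; whether c4wa accepts the extra declaration unchanged is an assumption here, and if it does not, the declaration can be wrapped in #ifndef C4WA.

```c
/* Forward declaration: is_even() calls is_odd() before it is defined. */
int is_odd(int n);

/* Mutually recursive parity check for n >= 0. */
int is_even(int n) {
    if (n == 0) {
        return 1;
    }
    return is_odd(n - 1);
}

int is_odd(int n) {
    if (n == 0) {
        return 0;
    }
    return is_even(n - 1);
}
```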

In addition to including proper header files for C library functions (such as malloc, free or printf), you would need to somehow emulate WASM-specific functions which do not exist in standard C library. This is an example of the header you can include into your program:

#ifndef C4WA
static int __memory_size = 1;
#define memgrow(size) (__memory_size += (size))
#define memsize() (__memory_size)
#define min(a,b) (((a) < (b)) ? (a) : (b))
#define max(a,b) (((a) < (b)) ? (b) : (a))
#endif
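A short sketch of code written against such a header. It repeats the relevant macros (fully parenthesized) so that it is self-contained, and assumes, as the emulation suggests, that memsize() returns the current number of memory pages and memgrow(n) adds n pages. The helper ensure_pages is hypothetical.

```c
#ifndef C4WA
/* Native emulation: only the page count is tracked, no memory is allocated. */
static int __memory_size = 1;
#define memgrow(size) (__memory_size += (size))
#define memsize() (__memory_size)
#define max(a,b) (((a) < (b)) ? (b) : (a))
#endif

/* Grow memory until at least `needed` pages are available;
 * return the resulting page count. */
int ensure_pages(int needed) {
    if (memsize() < needed) {
        memgrow(needed - memsize());
    }
    return memsize();
}
```

Wrapping each macro body in parentheses matters once the macros appear inside larger expressions: with an unparenthesized body, 2 * max(a, b) would expand to a conditional whose condition is 2 * ((a) < (b)).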

Comments

Both C-style /* ... */ and C++ line comments // ........ are supported.