v80.c: v80 assembler in c89 #13

gvvaughan · 2024-08-07T19:18:17Z

started work on #4

gvvaughan · 2024-08-07T19:41:30Z

@Kroc please feel free to scribble any feedback or suggestions all over this PR, it's far from ready to merge at the moment!

v1/v80.c

gvvaughan · 2024-08-07T21:26:50Z

Next task is to rewrite the grammar comment to be line oriented to see if I can reduce the amount of lexical book-keeping compared to the token based grammar I've half implemented so far...

gvvaughan · 2024-08-07T21:52:24Z

Pasting my questions and @Kroc's answers here for easy reference:

I have a 32 byte static token buffer for everything right now (label names, const names, numbers etc) to help enforce the token length limit, but presumably we want to handle strings of arbitrary length?
There are strings in v80 and they are 'arbitrary' in length, but line-length in v80 is hard capped at 127 cols to limit memory usage on 8-bit systems and the C implementation should enforce this too so that source code written on PC will assemble on Z80.

When v80 encounters a string, it simply writes the bytes to the code-segment one by one so the string is never stored anywhere whole -- with one exception: the file-name of an include .i statement is captured whole, but because CP/M doesn't have subfolders, the length of this is known to be limited. At the moment expressions are not allowed in include file-names, but this might be supported in the future.

When parsing expressions following .b, and the results don't fit in one byte, do you mask off the low eight bits? mask and right shift (but then that's the same as .w)? write big-endian order bytes? bail out with an error? something else?

It's an error -- when v80 encounters .b it sets a 'parameter size' variable for how many bytes (1, in this case) that expressions must fit into. If an expr > $ff then it's an error. Note that with .w using a string is an error, you can't have an ASCII string expanded to words.
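The parameter-size rule above can be sketched in C as follows. This is only an illustration: `fits_param_size` is a hypothetical name, not a function from v80 or v80.c.

```c
/* Hypothetical helper (not from v80.c): returns 1 if `value` fits in
 * `size` bytes (1 for .b, 2 for .w), or 0 if the assembler should
 * report an error instead of truncating. */
static int fits_param_size(unsigned long value, unsigned size)
{
    if (size == 1) return value <= 0xFFul;    /* .b: must fit one byte */
    if (size == 2) return value <= 0xFFFFul;  /* .w: must fit one word */
    return 0;   /* v80 only outputs bytes or words */
}
```

With this shape, `.b $100` is rejected rather than silently masked to `$00`.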

“errors.txt” contains all possible errors in v80 and an explanation of what causes them so it’s a good source of detail on parsing behaviour

Are values (literal and/or resulting from expressions) limited to 16bits by the assembler? Or in principle could I configure an ISA for a 32bit machine?

Yes, v80 is limited to a 16-bit number internally for everything. Considering that v80 can only output bytes or words to the code-segment, 32-bit results don't actually have a practical use! Note that v80 allows underflow but errors on overflow! This is so that the negate operator can work, because a number like -7 is a negate unary operator followed by the positive number 7.
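A minimal sketch of that behaviour in C, assuming that "underflow is allowed" means results wrap modulo 65536 (two's complement). The names `add16` and `neg16` are hypothetical and not taken from v80.c.

```c
/* Hypothetical sketch: 16-bit expression arithmetic where overflow is an
 * error but underflow wraps, so "-7" (negate applied to 7) gives $fff9. */
static int add16(unsigned long a, unsigned long b, unsigned long *out)
{
    unsigned long sum = a + b;
    if (sum > 0xFFFFul) return 0;   /* overflow: caller reports an error */
    *out = sum;
    return 1;
}

static unsigned long neg16(unsigned long a)
{
    /* unsigned subtraction wraps; keep only the low 16 bits */
    return (0ul - a) & 0xFFFFul;
}
```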

Seems like the parser should be line oriented? Or can, say, an incomplete expression continue on a new line?

For memory and parsing-simplification reasons, expressions are limited to one line; the entire parser is line-orientated to allow for parsing a file larger than memory allows. v80 is 335KB of code which obviously doesn't fit into 64 KB of RAM :P

But you have to understand that v80 is purposefully limited to fit into 8-bit hardware and that a C89 version shouldn't be assembling code that can't be assembled on real 8-bit hardware otherwise that defeats the point!

Would you be interested in discussing using a context free grammar to simplify the implementation, so we don't have to track indentation levels for conditionals, whether tokens are the first on a new line or not for constants and labels etc?

v80 is not trying to be an ideal assembler; it's trying to be minimal so that it can support many systems. Things like context-free grammars, macros etc. are features for a better, more language-orientated assembler (hopefully written in v80) -- v80 exists to bootstrap 8-bit software on 8-bit machines instead of relying on PC-only toolchains. Ergo, it has no goal to be anything more than a brutally simple assembler that acts as the bedrock of a broader range of 8-bit software. If an 8-bit computer can't modify and assemble its own software then it might as well be proprietary. An 8-bit computer that can only run software that has to be compiled on a PC is not a real computer and v80 aims to break that cycle by allowing code on a PC to also assemble on 8-bit hardware.

gvvaughan · 2024-08-07T22:19:53Z

@Kroc 'nother question about local labels (possibly leading to reducing heap usage quite a bit):

do you have documented support for jumping to local labels from outside of the non-local to which they apply?

In my fantasyvm assembler I have gone back and forth on supporting that, but currently keep all the local labels in their own table without using the non-local prefix. The local labels table is reset every time a new non-local label is defined, and unresolved local label references throw an error at that point. The downside is that if you really do need to jump into a local label from outside the current non-local label's scope, you end up having to promote some of the locals to non-local and there can be a cascade of promotions around that area as a result. I'm thinking about adding persistent locals that are recorded in the non-local label table if I find it problematic later.

Kroc · 2024-08-08T11:11:26Z

Local labels are simply appended to the last non-local label defined forming a complete label-name.
"release/readme.txt" documents each feature, are you referring to that?

1.4 Local Labels:
--------------------------------------------------------------------------------
Local labels can be "reused", as they automatically append themselves to
the last defined, non-local, label:

|   _local                  ; error: local label without label!
|
|   :label1
|   _first                  ; defines :label1_first
|   _second     jr _first   ; defines :label1_second, jumps to :label1_first
|
|   :label2
|   _first                  ; defines :label2_first
|   _second     jr _first   ; defines :label2_second, jumps to :label2_first

Note that the combined length of the local label name and its parent must not
exceed 31 characters, including label sigil:

|   :2345678901234567890    ; 20 chars
|   _234567890              ; 30 chars - OK
|   _23456789012            ; 32 chars - invalid symbol error!

It was done this way for ease of implementation, but I would like to add anonymous labels in the future or change the way local labels are implemented so that they don't take up so much heap space.
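For a C implementation, the naming rule from the readme excerpt could look roughly like this. `make_local_name` and `SYMBOL_MAX` are hypothetical names used for illustration only.

```c
#include <string.h>

#define SYMBOL_MAX 31   /* combined length limit quoted in the readme */

/* Hypothetical helper: joins `parent` (e.g. ":label1") and `local`
 * (e.g. "_first") into `out`, which must hold SYMBOL_MAX + 1 bytes.
 * Returns 1 on success, 0 if there is no parent label yet or the
 * combined name is too long. */
static int make_local_name(const char *parent, const char *local, char *out)
{
    size_t plen, llen;

    if (parent == NULL || parent[0] == '\0') return 0;
    plen = strlen(parent);
    llen = strlen(local);
    if (plen + llen > SYMBOL_MAX) return 0;
    memcpy(out, parent, plen);
    memcpy(out + plen, local, llen + 1);    /* includes the final NUL */
    return 1;
}
```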

gvvaughan · 2024-08-08T16:48:11Z

Sort of. I wondered whether you want to be able to rely on, eg:

:nonlocal1
_local1
:nonlocal2
_local1             jr :nonlocal1_local1

And if that's not an explicit goal, I think there's some low hanging fruit in heap size savings with segregating local labels into a short-lived table that gets reset at every non-local label boundary. (and allowing local labels a full 31 characters since there's no longer any need to prepend the non-local label)

Kroc · 2024-08-08T17:13:36Z

The heap in v80 cannot deallocate anything, ever! If a label gets added, it cannot ever be removed, because once something else gets added to the heap (like a deferred expression, a new constant), the heap cannot shrink without deleting something else important. The space cannot be reused because that creates a fragmentation problem that would take hundreds of bytes of code to work with. The heap is append only.

Hope is not lost however; we could have label records include a sub-label linked list on the end of it so that only the local labels names are stored attached to the parent label by a linked list. The downside to this would be greater complexity and code size in label searches.
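For readers more at home in C than Z80, an append-only heap of the kind described above behaves like a bump allocator. The sketch below is illustrative only; `heap_append`, `HEAP_SIZE` and the buffer are assumptions, not v80's actual layout.

```c
#include <stddef.h>

#define HEAP_SIZE 1024              /* arbitrary size for the sketch */

static unsigned char heap[HEAP_SIZE];
static size_t heap_top = 0;         /* only ever moves upwards */

/* Append-only allocation: returns NULL when the heap is exhausted.
 * There is deliberately no matching free(), so nothing is ever
 * removed or reused and fragmentation cannot occur. */
static void *heap_append(size_t nbytes)
{
    void *p;

    if (nbytes > HEAP_SIZE - heap_top) return NULL;
    p = heap + heap_top;
    heap_top += nbytes;
    return p;
}
```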

- need a line-based parser to watch v80.v80, so instead of reading the next token from the input stream on demand, we buffer the next line
- redid the GRAMMAR to support a line-based parser
- factored out a better memory management API and built a getdelim and getline work-alike implementation with it
- the tokenizer now sets a start pointer into the buffered line, and a token length
- reworked the error messages to match errors.txt docs more closely -- can't resist including the current token in the error message for ease of use
- added support for nested .i, along with input file stack management
- added support for .a, along with a placeholder output stream
- redid most of the low-level string functions for consistency and robustness
- lost constants and labels support -- they need a do-over with the line-based parser

gvvaughan · 2024-08-10T01:30:51Z

@Kroc Heap limitations make sense. For v80.c, I'll use the same "append local to non-local name" for symbol table entries as you, effectively supporting jumping to local labels from another non-local block.

Largely rewrote v80.c today to take into account your earlier answers. Any other feedback welcome as I make progress...

gvvaughan · 2024-08-10T01:40:46Z

Hmm.. just occurred to me that you could have local symbols in their own linked list, and as long as each entry is the same size (32bytes for the label name, and 4 bytes for the next entry pointer) and a zero length name marking the end of the list when searching, then there's no need to deallocate anything. When a new non-local label is encountered, we can error out for unresolved local label references, and then put a 0x0 tombstone at the head of the list. New local labels would then overwrite the entries from the local label list in place starting at the head (making sure that if the next entry was allocated, it gets a 0x0 tombstone) and reusing following entries until they are all used up, and then additional local labels get pushed onto the head of the list as before.

The size of the local labels list would only ever be 36bytes * largest-number-of-locals-in-a-single-scope. Surely much better for very large programs, which are the ones most likely to overflow the heap?
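A rough C sketch of the tombstone idea, simplified so that new entries are linked at the tail rather than pushed at the head. All names here (`Local`, `locals_reset`, `locals_find`, `locals_add`) are hypothetical, and `calloc` stands in for whatever allocator the real heap would provide.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Local {
    char name[32];          /* empty name = tombstone (end of live entries) */
    unsigned addr;
    struct Local *next;     /* entries are never freed, only reused */
} Local;

static Local *locals = NULL;

/* Start a new non-local scope: one write hides every old entry. */
static void locals_reset(void)
{
    if (locals != NULL) locals->name[0] = '\0';
}

static Local *locals_find(const char *name)
{
    Local *p;

    for (p = locals; p != NULL && p->name[0] != '\0'; p = p->next)
        if (strcmp(p->name, name) == 0) return p;
    return NULL;
}

/* Reuse the first tombstoned entry, or grow the list by one entry. */
static Local *locals_add(const char *name, unsigned addr)
{
    Local **pp = &locals;

    while (*pp != NULL && (*pp)->name[0] != '\0') pp = &(*pp)->next;
    if (*pp == NULL) {
        *pp = (Local *)calloc(1, sizeof(Local));
        if (*pp == NULL) return NULL;
    }
    strncpy((*pp)->name, name, sizeof((*pp)->name) - 1);
    (*pp)->name[sizeof((*pp)->name) - 1] = '\0';
    (*pp)->addr = addr;
    /* the entry after a reused one becomes the new tombstone */
    if ((*pp)->next != NULL) (*pp)->next->name[0] = '\0';
    return *pp;
}
```

The list only grows to the largest number of locals seen in a single scope, which is the property the comment above is after.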

gvvaughan · 2024-08-10T01:58:24Z

- fix a few little compilation failures when compiling with strict c89 mode only

Kroc · 2024-08-11T22:17:04Z

Thinking about it, what I'm trying to get at is that changes to v80's design in Z80 code can take weeks, even months -- it took six months of meticulous crafting instruction-by-instruction and I'm not the fastest developer already. Given that the assembler is now self-assembling, I don't want to break it without careful consideration, and rewriting what already works is equally time consuming, so there has to be a clear net win.

This brings me on to instructions; I hadn't thought far enough ahead about a C version (I didn't actually think anybody would take up the offer), but the C version should reuse the instruction table binary so that this work isn't duplicated for every ISA -- v80 is unique in that support for different CPU instruction sets requires minimal code changes. The instruction set is encoded as a binary tree (see "isa_z80.v80") with a small amount of CPU-specific code to handle parameters ("v80_z80.v80"). However, I'm in the process of rewriting this table logic (see branch "v2") to both greatly simplify the instruction tables (see "is2_6502.v80" in branch "v2" for just how much simpler) and hopefully save more bytes, so you'll want to hold off on parsing instructions for the moment.

gvvaughan · 2024-08-12T02:56:18Z

Oh, I didn't mean to imply you should change the algorithm, but I think it's definitely worth throwing an error when attempting to jump into the middle of a local label from another scope so that some space optimizations are still on the table in case you want to do that one day 😁

In the unlikely event that the C version catches up, I might bug you for some specs for the v2 tables then. I secretly want to add support for my fantasy vm ISA after all!

- added a line-wise tokenizer; keeping track of buffers and token start and end offsets by hand was too finicky
- minimally tested

- define UINT_MAX if compiler/headers don't have it
- set new global skipcol to UINT_MAX
- add indent field to Include struct
- new parse_condition sets skipcol if condition expression fails
- parse_file sets files->indent from the column of the first token as each line is tokenized, not parsing any new lines until the first one with an indent no more than skipcol, when skipcol is reset to UINT_MAX
- moved the line-too-long diagnostic to tokenize_line
- when tokenize_line reaches a comment, return what was already tokenized, potentially avoiding line-too-long failures for comments
- new diagnostic when a string token is found where a (non-byte-)expression is expected
- expect a (non-byte-)expression after any keyword except .b, and also after a condition and when setting a constant
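The skipcol mechanism in this commit message can be pictured with the sketch below. It only mirrors the behaviour described in the bullets; `condition_failed` and `should_parse` are hypothetical names, and the real parse_condition and parse_file in v80.c may be organised differently.

```c
#include <limits.h>

#ifndef UINT_MAX
#define UINT_MAX ((unsigned)-1)     /* fallback if the headers lack it */
#endif

static unsigned skipcol = UINT_MAX; /* UINT_MAX means "not skipping" */

/* Called when a condition whose first token is at column `indent`
 * evaluates false: everything indented deeper will be skipped. */
static void condition_failed(unsigned indent)
{
    skipcol = indent;
}

/* Returns 1 if a line whose first token is at column `indent` should
 * be parsed, 0 if it belongs to a failed condition's block. */
static int should_parse(unsigned indent)
{
    if (skipcol == UINT_MAX) return 1;
    if (indent > skipcol) return 0;     /* still inside the skipped block */
    skipcol = UINT_MAX;                 /* indent no more than skipcol */
    return 1;
}
```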

- exit with usage message for bad command line arguments
- new xfopen helper to open a FILE* or exit with a diagnostic
- do file extension substitution on input path to make an output filename if none was given, or fallback to v.out if there was no extension match in the table
- keep all opened File objects on a stack and ensure they are all closed before exit
- adjust grammar and implementation to allow multiple keywords on a single input line
- snprintf is a C99 addition, carefully use sprintf instead
- fix a variable declaration after a statement (C99 feature)

- adjust parser to work in two passes
- for parsing pass 1, don't emit bytes
- for parsing pass 2, don't set label addresses
- reset include stack and pc value before each pass
- elide __attribute__ annotations when __GNUC__ is not defined
- simplify extreplace a little
- fix a bug with closing files from the include stack
- fix a bug with ERR_BADVALUE being too eager in .b and .w args
- fix a bug with double for loop in .b argument parsing

gvvaughan · 2024-08-16T01:58:57Z

I should add tests to flush out bugs in another PR, and I don't have any code to read the opcode tables yet - but the parser handles v1/isa_6502.v80 and v1/isa_z80.v80 and produces plausible looking binary output files, so it seems to be minimally functional.

What's the usual way of building an assembler that does opcode lookup in the tables? And do you have a spec for v2 tables I can implement?

- found some code that looks like `$ $ + 1 _label` in the cpm v80 assembly files... changed the parser to support that as setting PC to the result of an expression (followed by a local label)
- fixed a bug in keyword parsing, where we should return a token that can't be parsed as part of the keyword arguments so the caller can try a different leg of the recursive descent
- don't attempt to close the standard streams

- diagnose number overflow at any point in evaluation of an expression
- can't close and reopen stdin, so remove '-' sentinel from command line
- use separate len and num fields in Token struct so that we always (for the duration of working with a specific line anyway) have the token text, even when there's a number value in the token now
- use a single T_COND, storing the condition type (= - ! +) in the newly available num field
- simplify parse_condition and parse_line accordingly
- new simpler err_fatal_token replaces both err_fatal_token_str and err_fatal_token_value
- simplify callers with new token_new_number and token_new_string
- simplify tokenize_line
- remove unused functions stack_zstreq, token_type and token_value

Kroc · 2024-08-16T15:49:42Z

Sorry for the slow response, I'm rather busy at home whilst my son is off school over summer. The process of parsing the instruction tables is covered by parseMnemonic in "v80_asm.v80" (

v80/v1/v80_asm.v80

Lines 1234 to 1377 in cebb049

    
    :parseMnemonic
    ;===============================================================================
    ; parse an instruction into opcodes:
    ;
    ; the CPU-specific module (e.g. "v80_z80.wla") provides a binary tree,
    ; :opcodes, that this routine walks to match instruction names to opcodes
    ; and a CPU-specific set of flags that determines which parameters are required
    ;
    ; in:   A               first character of word to parse
    ;       HL              heap addr
    ; out:  HL              heap addr is advanced for any expressions deferred
    ;       IY              binary code is appended to the code-segment,
    ;       IX              and the virtual program-counter is advanced
    ;       A, BC|DE        (clobbered)
    ;-------------------------------------------------------------------------------
            ex.DE.HL                        ; swap heap to DE for now
            ld.HL   :opcodes                ; start at beginning of opcode tree
            ; the first character is already in A
            ;
            set5.A                          ; force lowercase (see desc. below)
            jr      _0                      ; jump into the parsing loop
            ;=======================================================================
            ; match; follow the branch:
            ;-----------------------------------------------------------------------
            ; once a character matches, the next two bytes are either
            ; an offset to the next branch to follow, or an opcode pair
            ;
    _match  inc.HL                          ; step over the matched character
            ld.C*HL                         ; read the offset lo-byte | opcode-byte
            inc.HL                          ; move to next byte in tree
            ld.B*HL                         ; read the offset hi-byte | opcode-flags
            bit7.B                          ; is hi-bit of hi-byte set?
            jr?nz   _opcode                 ; if so, this is an opcode
            ; add the offset to the current position to jump to the new branch:
            ; NOTE: the offset in the binary tree is reduced by 1 to compensate
            ; for adding from the hi-byte addr, rather than the lo-byte addr
            ;
            adc.HL.BC
            ; if the hi-bit is set on the hi-byte, then it's an opcode + flag pair,
            ; not a jump! we branch away after the add to get a free flag-check
            ;
            ; TODO: this requires bit 6 of the opcode-flags to always be zero
            ;       otherwise the ADC can overflow, voiding this check. this
            ;       would leave us only 5 unique bits for any CPU
            ;
            ;jp?m      ,      @opcode         ; if hi-bit set, emit opcode
            ; get character from input file:
            ;-----------------------------------------------------------------------
    _next   call    :readChar               ; read from input file
            cp      #SPC + 1                ; is it whitespace? (hold carry...)
            ; force lowercase, without also affecting
            ; numbers / [most] punctuation:
            ;
            ; this essentially forces ASCII codes 64-95 (@A-Z[\]^_) to codes
            ; 96-127 (`a-z{|}~) which makes A-Z lowercase with the caveat that
            ; some punctuation cannot be differentiated "@"<->"`", "[]"<->"{}",
            ; "\"<->"|" and "^"<->"~" but we aren't using any of those in the
            ; instruction names anyway
            ;
            ; it also means that ASCII codes 0-31 (non-visible) are promoted
            ; to 32-64 (visible), but we have already checked for ASCII codes
            ; 32 (space) or below and this is signalled by the carry flag; so
            ; even though the below instruction would change tab into ")", we
            ; will undo this afterwards
            ;
            set5.A                          ; force partial lowercase
            jr?nc   _0                      ; was this a non-visible char before?
            xor.A                           ; any whitespace = end-of-word (0)
    _0      ld.BC   3                       ; this is faster than INC HL x 3!
            ; compare with opcode tree:
            ;-----------------------------------------------------------------------
    _cp     cp*HL                           ; compare input char with tree char
            jr?z    _match                  ; characters match?
            ; if the hi-bit of the character from the opcode tree is set, it's
            ; either a continuation character (>128) or the end of a branch (=255)
            ;
            bit7*HL                         ; check bit 7 of character
            jr?nz   _cont                   ; handle continuation char / end
            ; no match; try the next character:
            ;
    _skip   add.HL.BC                       ; skip 3 bytes in opcode tree
            jr      _cp                     ; compare next char in tree
            ;-----------------------------------------------------------------------
            ; handle continuation character / end-of-branch:
            ;
            ; a continuation character has no branch -- one character has to
            ; immediately follow another -- any mismatch is an unknown opcode
            ;
    _cont   or      %10000000               ; *add* top bit to input char
            cp*HL                           ; redo comparison with tree
            inc.HL                          ; (move to next char in tree)
            jr?z    _next                   ; match, check next char
            jp      :errInvalIns            ; error for continuation mismatch
            ;=======================================================================
            ; emit opcode(s):
            ;-----------------------------------------------------------------------
            ; if a branch ends in an opcode then no more characters must follow,
            ; with one exception -- an apostrophe can be appended to an instruction
            ; for indicating shadow registers. this is a crude hack as no check is
            ; made to ensure it's a register at the end, but it saves hundreds of
            ; extra branches in the opcode tree
            ;
    _opcode and.A                           ; if the last char is already 0,
            jr?z    _ok                     ; then no further check is needed
    _get    call    :readChar               ; read one more character
            cp      ''                      ; if it is apostrophe,
            jr?z    _get                    ;  then ignore and go again
            cp      #SPC + 1                ; is it whitespace (or eof)?
            jp?nc   :errInvalIns            ; if not, invalid instruction!
    _ok     ex.DE.HL                        ; swap heap back to HL
            ; the flags byte is a set of flags for CPU-specifics and what, if any,
            ; kind of parameter is required. regardless of ISA, a "0" (with hi-bit
            ; removed) always indicates no-parameters
            ;
            ld.A.B                          ; opcode flags byte
            and     %01111111               ; remove the top bit
            ; if flags byte is non-zero, analyse further (this routine is in
            ; the CPU-specific module, e.g. "v80_z80.v80" or "v80_6502.v80")
            ;
            jp?nz   :emitOpcode
            ; single opcode, no params:
            ;-----------------------------------------------------------------------
            ld*IY.C [ 0 ]                   ; emit opcode byte
            inc.IY                          ; move to next byte in code-segment
            inc.IX                          ; increment virtual program-counter
            ret

). Sorry that I don't have it better described somewhere but it's a small amount of code; the tables themselves describe and demonstrate the structure so it's possible to use that alone as a guide. I'm getting near the end of the v2 instruction parser but have been struggling a lot with focus. The v2 parser is only guaranteed to make the instruction tables easier to read and write, performance is an unknown factor at the moment until I complete my prototype, so there's a small possibility v2 might be abandoned.
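As a reading aid, here is a hedged C sketch of the same tree walk. It is not a port of parseMnemonic: the tiny `demo_tree` is invented for the example (only the Z80 encodings NOP = $00 and RET = $C9 are real), the branch offset is simply added to the index of the hi-byte, and the apostrophe exception is left out. `next_char` and `lookup_opcode` are hypothetical names.

```c
#include <stddef.h>

/* Invented two-instruction tree in the layout described above:
 * [char][lo][hi] entries, chars with bit 7 set are continuations,
 * 0xFF ends a branch, and hi-bit set in [hi] marks opcode + flags. */
static const unsigned char demo_tree[] = {
    'n', 5, 0,                          /* index 0: branch at 2 + 5 = 7   */
    'r', 8, 0,                          /* index 3: branch at 5 + 8 = 13  */
    0xFF,                               /* index 6: end of root branch    */
    'o' | 0x80, 'p' | 0x80, 0, 0x00, 0x80, 0xFF,    /* "nop" -> $00 */
    'e' | 0x80, 't' | 0x80, 0, 0xC9, 0x80, 0xFF     /* "ret" -> $C9 */
};

/* Next input character as the walker sees it: whitespace or end of
 * string becomes 0 (end of word), anything else has bit 5 forced on. */
static unsigned char next_char(const char **p)
{
    unsigned char c = (unsigned char)**p;

    if (c <= ' ') return 0;
    ++*p;
    return (unsigned char)(c | 0x20);
}

/* Returns the opcode byte (0-255) and stores the flags, or -1 if the
 * word is not in the tree. */
static int lookup_opcode(const unsigned char *tree, const char *word,
                         unsigned char *flags)
{
    size_t pos = 0;
    unsigned char c = next_char(&word);

    for (;;) {
        if (!(tree[pos] & 0x80) && tree[pos] == c) {
            unsigned lo = tree[pos + 1], hi = tree[pos + 2];
            if (hi & 0x80) {                    /* opcode + flags pair */
                if (c != 0 && next_char(&word) != 0) return -1;
                *flags = (unsigned char)(hi & 0x7F);
                return (int)lo;
            }
            pos += 2 + ((hi << 8) | lo);        /* follow the branch */
            c = next_char(&word);
        } else if (tree[pos] & 0x80) {          /* continuation or end */
            if (tree[pos] == 0xFF || tree[pos] != (c | 0x80)) return -1;
            ++pos;
            c = next_char(&word);
        } else {
            pos += 3;                           /* try the next sibling */
        }
    }
}
```

A caller would pass the real `:opcodes` table in place of `demo_tree`, after confirming the exact offset convention against the table sources.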

The "build.bat" script does some testing by building samples of the entire Z80/6502 instruction set and comparing against the same produced with WLA-DX; maybe this would be a starting point? I haven't examined the PR enough to know what the build requirements of your C version are and if/how this would work as part of the current, rather crude, system. I use a batch file only so that v80 can be built out-of-the-box without having to install any dependencies or deal with high up-front demands like requiring knowledge of Docker -- remember that whatever is required to build v80 is itself a dependency of the 8-bit software at the end of the pipeline and the goal is to get away from gigabytes of constantly evolving build infrastructure :P

gvvaughan · 2024-08-16T18:40:34Z

No apologies necessary. I'm setting off on a 2-3 week road trip tomorrow, so any free time I would have had for coding will probably be spent on driving instead. Absolutely no hurry on anything from my perspective.

Build requirements for v80.c are a c89 C-compiler toolchain and a libc with support for stdio FILE* streams and a selection of c89 *printf calls (these could be coded around if it needs to build and run in an environment without stdio, but I'd rather not -- it's a lot of boring code) as well as stdlib.h for malloc, free and exit calls (could probably write a custom allocator if malloc and free are missing, managing without exit is probably a bit harder). If sys/param.h is available, it'll use the proper values for some constants, but has sensible fallbacks if not. If sys/stat.h is available, it'll check inode types when opening files for reading.

I was looking at your build.bat, and even though I enforce CP/M compatible filenames for .I arguments, you can pass any path to the compiled v80.c on the command line... I should probably take any directory prefix from the command line input file and prepend it to any filenames that come from .I args so you don't have to run it from the directory with the sources inside to find the include files.

I haven't tried building anything with WLA-DX or runcpm yet, so that's probably a good thing for me to get going to decide how to proceed, but I'd also like to write specific tests to exercise the tokenizer and parser in v80.c which probably needs a custom test harness anyway... which is why I don't want to pile that all on top of this PR.

I'm still not clear on how to assemble the *.v80 files to end up with a working assembler that contains the instruction lookup tables and the code that uses them to assemble instruction op-codes. It appears that that assembler needs to exist before it's possible to assemble the table lookup code?!?

And finally (for now ;-) ) -- I was thinking it might be easier to share the instruction opcode to binary mappings between v80.c and v80 proper if we define the instruction set separately somewhere that v80.c can load directly into a hash table, and I also provide some code to generate the lookup table sources (for v80 sources) rather than you hand coding them. That will let you tune the format for speed/space efficiency without the work of hand coding the tables too. WDYT?

- with _POSIX_C_SOURCE=1, use all local function implementations
- add preprocessor guards to use library functions as available
- unroll single use of TOKEN_TYPES x-macro
- defer to standard ctype functions and use them as available
- split xstrtou into two, and use standard strtoul library function if available
- replace uses of zstrncpy and non-standard zstrlcpy with standard strlcat and strlcpy when available, or interface compatible local implementations otherwise (note: it can take some coaxing with feature macros to get declarations out of the standard headers!)
- remove some newly unused functions
- add some section comments
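The guard-plus-fallback pattern mentioned in this commit could look something like the sketch below. `HAVE_STRLCPY` and `xstrlcpy` are hypothetical names chosen for the example; the actual macros and function names in v80.c are not shown in this thread.

```c
#include <string.h>

/* HAVE_STRLCPY is a hypothetical feature macro that a Makefile or
 * configure step would define when the C library provides strlcpy. */
#ifdef HAVE_STRLCPY
#define xstrlcpy strlcpy
#else
/* Interface-compatible fallback: copies at most size-1 bytes, always
 * NUL-terminates when size > 0, and returns strlen(src) so callers
 * can detect truncation. */
static size_t xstrlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
#endif
```

Truncation shows up as a return value greater than or equal to the buffer size.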

- provide fallback dirname() function in case libgen.h is missing
- new global zincludedir
- save a copy of the directory of infile argument to zincludedir (or "." if argv[1] has no directory component)
- adjust parse_keyword_include and helpers to search zincludedir
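A simplified sketch of extracting the include directory from the input path. This is not a full POSIX dirname() replacement (trailing slashes and other corner cases are ignored), and `path_dir` is a hypothetical name.

```c
#include <string.h>

/* Simplified sketch, not a full POSIX dirname(): writes the directory
 * part of `path` into `out` (capacity `size`), or "." if the path has
 * no directory component. Returns 0 if the result does not fit. */
static int path_dir(const char *path, char *out, size_t size)
{
    const char *slash = strrchr(path, '/');
    size_t n;

    if (slash == NULL) {
        path = ".";
        n = 1;
    } else if (slash == path) {
        n = 1;                          /* "/file" -> "/" */
    } else {
        n = (size_t)(slash - path);
    }
    if (n + 1 > size) return 0;
    memcpy(out, path, n);
    out[n] = '\0';
    return 1;
}
```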

- improve the option parser in v80.c, add a new `-i` option that preloads the symbol table with the named ISA
- use a hash table for the symbol table instead of a linked list
- new .m keyword support. `.m instruction body tokens` stores `instruction` as a key in the symbol table with the rest of the tokenized line as its value
- new v1/tbl_6502.v80 defines the 6502 ISA using .m
- new v1/tbl_z80.v80 defines the Z80 ISA using .m
- when the parser encounters a (.m defined) instruction, it switches to parsing the associated macro body, usually injecting the bytes from the body into codesegment, evaluating expressions as necessary to calculate those bytes: Except for the following tokens
  + .b - consume a byte from the assembly source, evaluating an expression if necessary, and write the result to the codesegment
  + .w - consume a word from the assembly source, evaluating an expression if necessary, and write the resulting two bytes in little-endian order to the codesegment
  + .r - consume a word from the assembly source, evaluating an expression if necessary, treating that as a destination address, and write a single byte to the codesegment as a relative offset to that destination address
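The `.w` and `.r` cases can be illustrated with the sketch below. The helper names are hypothetical, and the relative-offset convention is an assumption based on how Z80 `jr` and 6502 branches encode their displacement (measured from the address just after the displacement byte); how v80.c tracks the program counter at that point is not shown in this thread.

```c
/* Hypothetical helper: write a 16-bit word in little-endian order.
 * Returns the number of bytes written. */
static unsigned emit_word(unsigned char *out, unsigned value)
{
    out[0] = (unsigned char)(value & 0xFF);         /* low byte first */
    out[1] = (unsigned char)((value >> 8) & 0xFF);  /* then high byte */
    return 2;
}

/* Hypothetical helper: write a one-byte relative offset to `dest`,
 * where `disp_addr` is the address of the displacement byte itself.
 * Returns 0 if the destination is out of the -128..127 range. */
static int emit_rel(unsigned char *out, unsigned disp_addr, unsigned dest)
{
    long off = (long)dest - ((long)disp_addr + 1);

    if (off < -128 || off > 127) return 0;
    out[0] = (unsigned char)off;    /* conversion wraps to two's complement */
    return 1;
}
```

For example, a Z80 `jr` at $0100 that jumps to itself has its displacement byte at $0101 and encodes as $FE.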

gvvaughan · 2024-08-28T14:26:34Z

Had a couple of unexpected evenings to finish the code!

This implements the instruction tables for v80.c, as well as loading and parsing. It produces sensible looking (but untested) cpm_z80.com binary from the assembly sources, so can now serve as a bootstrap mechanism.

I need to write some code to generate the isa_*.v80 tables for the v80 assembler from the tbl_*.v80 tables for the C assembler, and validate that the binary it generates runs and regenerates bit-identical content from itself when reassembling itself.

QQ: v80.c is becoming hard to navigate at this size when editing it, but also having everything in a single file makes it easier to compile. I'm tempted to pull the polyfills (for missing libc APIs) and maybe some of the data structures (linked lists, hash tables, perhaps the tokenizer) into individual pseudo-headers. That would mean adding -I$PWD/v1 to the compiler invocation to pull all that code back in (but still a single compilation unit), but would make editing and navigating the code a lot easier for me. Do you have a preference? I could be nudged either way quite easily...

Kroc · 2024-08-30T17:01:15Z

> QQ: v80.c is becoming hard to navigate at this size when editing it, but also having everything in a single file makes it easier to compile. I'm tempted to pull the polyfills (for missing libc APIs) and maybe some of the data structures (linked lists, hash tables, perhaps the tokenizer) into individual pseudo-headers. That would mean adding -I$PWD/v1 to the compiler invocation to pull all that code back in (but still a single compilation unit), but would make editing and navigating the code a lot easier for me. Do you have a preference? I could be nudged either way quite easily...

Thank you for your hard work! Yes, you should split the code where you are essentially "patching" the base C-functionality; I fully expect that additional replacement functions may be needed for certain combinations of operating system and compiler -- C89 compatibility was very variable in compilers even late into the 90s! Such monkey-patching and non-portable considerations shouldn't factor into the code of v80 itself so that others may have an easier time fixing for their choice of compiler/OS.
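As an illustration of the kind of split being discussed, a polyfill pseudo-header might look something like the sketch below. It assumes a `NO_STRING_H` macro like the one passed on the compiler command line later in this thread; the `poly_*` names are hypothetical, and `<stddef.h>` is assumed to be available for `size_t`.

```c
#include <stddef.h>   /* size_t; assumed to exist on the target platform */

#ifdef NO_STRING_H
/* Fallbacks used only when the platform has no usable <string.h>. */
static size_t poly_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        ++p;
    return (size_t)(p - s);
}

static int poly_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b) {
        ++a;
        ++b;
    }
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}

/* Route the standard names to the fallbacks for the rest of the unit. */
#define strlen poly_strlen
#define strcmp poly_strcmp
#else
#include <string.h>   /* the system versions are preferred when present */
#endif
```

Because the header is pulled into the single compilation unit, the rest of the code can keep calling `strlen` and `strcmp` without knowing which version it got.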

- add a more robust option processing loop
- support --version
- support -h, --help with some basic option help
- rename --isa to --include

- polyfill/: a new directory with replacements for likely candidates for missing system headers and apis. Note: it's not a replacement for the system library, only fallbacks for apis used by this project
- error.c: error handling
- file.h, file.c: file handling
- stack.c: simple generic stack datatype
- hash.c: simple hash table datatype with buckets made with stack.c
- symtab.c: symbols and a symbol table for them made with hash.c
- token.h, token.c: token data type, and a line at a time tokenizer
- parser.c: recursive descent parser for v80 assembly using token.c
- main.c: command line processing, and driver for feeding the parser
- Makefile: simple rules for making versions of v80 from the above to check that it works with almost nothing from libc using c89, and also using c99 with optimized libc functions instead of polyfills.
- README.md: A little about how to build and use it all.
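To make the hash.c / stack.c relationship concrete, here is a minimal sketch of a hash table whose buckets are singly linked stacks. It is not the code from this PR: the `table`, `node`, `table_put` and `table_get` names, the bucket count and the djb2-style hash are all assumptions chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64   /* arbitrary size for the sketch */

/* One stack element; each bucket is a stack of these. */
typedef struct node {
    struct node *next;
    const char  *key;     /* not copied: the caller keeps it alive */
    const char  *value;   /* not copied: the caller keeps it alive */
} node;

typedef struct {
    node *buckets[NBUCKETS];   /* all NULL when the table is empty */
} table;

/* djb2-style string hash. */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381UL;
    while (*s)
        h = h * 33UL + (unsigned char)*s++;
    return h;
}

/* Push a new entry onto the key's bucket; returns -1 if out of memory. */
int table_put(table *t, const char *key, const char *value)
{
    node **top = &t->buckets[hash_key(key) % NBUCKETS];
    node  *n   = (node *)malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->key   = key;
    n->value = value;
    n->next  = *top;
    *top     = n;
    return 0;
}

/* Walk the bucket from the top; the most recent push for a key wins. */
const char *table_get(const table *t, const char *key)
{
    const node *n = t->buckets[hash_key(key) % NBUCKETS];

    for (; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->value;
    return NULL;
}
```

Pushing onto a stack means a redefinition shadows the older entry rather than replacing it, which keeps insertion simple at the cost of leaving the old node in the bucket.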

gvvaughan · 2024-09-04T02:41:54Z

Okay, all done @Kroc!

If I compile main.c from bootstrap to make a v80 executable on my machine:

$ cd bootstrap
$ cc -std=c89 -pedantic -ggdb3 -D_POSIX_C_SOURCE=1 -DNO_STRING_H -DNO_SYS_STAT_H -DNO_CTYPE_H -DNO_LIBGEN_H -DNO_SIZE_T -DNDEBUG -I. -o ./v80 main.c

And then use that to make a cpm_z80.com file for CP/M (note the use of the simplified tbl_z80.v80 table to populate the instruction lookup table):

$ ./v80 -i tbl_z80.v80 ../v1/cpm_z80.v80 v80c.com

It produces identical bytes after recompiling itself with ntvcm (according to vbindiff):

$ ntvcm -l ../bootstrap/v80c.com cpm_z80.v80

And also identical bytes to recompiling sources with your most recent v80.com release:

$ ntvcm -l ../release/v80.com cpm_z80.v80

Incidentally, the byte encodings for the set* instructions are the same as the res* instructions in your v1/is2_z80.v80 file. I discovered and corrected those in my bootstrap/tbl_z80.v80 file when comparing binaries, but I haven't done a full audit to see if there are other typos in there.

If you like and merge this PR, I'll be happy to work on generating the is2_*.v80 files from the simpler tbl_*.v80 tables when you've finalized the format. Or to isa_*.v80 if you decide to abandon the v2 format.

Also, feel free to let me know if you have any suggestions for changes or improvements to what is already here.

Kroc · 2024-09-04T20:46:31Z

> Incidentally the byte encodings for the set* instructions are the same as the res* instructions in your v1/is2_z80.v80 file. I discovered and corrected those in my bootstrap/tbl_z80.com file when comparing binaries, but I haven't done a full audit to see if there are other typos in there.

I had seen this and fixed it, but maybe that was only on the v2 branch :/ I can't remember things straight. My son will be back to school next week and I'll focus on integrating your C version then. I think we should merge it in the current state to a separate branch; are you able to update the PR to use a different branch (or is this something I need to do?)

gvvaughan · 2024-09-04T21:02:06Z

Cool! I can definitely do it if you tell me what branch you'd like me to retarget to. I think you might also be able to do it with the edit button near the very top of the PR page? Let me know whenever you're ready!

v80.c: work in progress for a v80 assembler in c89

1f457eb

Kroc reviewed Aug 7, 2024

Kroc marked this pull request as draft August 7, 2024 20:52

gvvaughan force-pushed the gvvaughan/v80-c-bootstrap branch from 66b46d8 to 625420c Compare August 10, 2024 01:26

ensure strict C89 compliance

bc97674

- fix a few little compilation failures that only occur when compiling in strict c89 mode

gvvaughan added 4 commits August 13, 2024 23:08

implemented the rest of the parser, except conditions

5961166

- added a line-wise tokenizer; keeping track of buffers and token start and end offsets by hand was too finicky
- minimally tested

gvvaughan marked this pull request as ready for review August 16, 2024 01:45

gvvaughan added 2 commits August 15, 2024 21:03

gvvaughan changed the title ~~v80.c: work in progress for a v80 assembler in c89~~ v80.c: v80 assembler in c89 Aug 16, 2024

gvvaughan added 2 commits August 27, 2024 10:34

gvvaughan added 2 commits August 30, 2024 13:19

improve command line option processing

726782f

- add a more robust option processing loop
- support --version
- support -h, --help with some basic option help
- rename --isa to --include

Kroc changed the base branch from main to c September 12, 2024 15:31

Kroc merged commit 5e25949 into Kroc:c Sep 19, 2024
