Decompiling program CGEN.COM
Decompiling the program CGEN.COM from Hi-Tech C compiler v3.09
C compilers for the Z80 processor were discussed on one of the Russian forums devoted to old computers. A forum member writing under the name OrionExt posted a link to a listing of CGEN.COM disassembled with IDA. According to OrionExt, the Hi-Tech C compiler was written in C and built with itself, but without the code optimization option.
This caught my interest, and I continued the decompilation work he had started. Things turned out to be a little more complicated: the compiler is written in Hi-Tech C and was compiled with optimization. Thank you to OrionExt for the work done and the initial information.
The CGEN.HUF file contains the files obtained by disassembling the CGEN.COM binary executable (the code generator from the Hi-Tech C compiler v3.09 for CP/M). The C and assembler source code is adapted for building the CGEN1.COM executable, which is byte-for-byte identical to the original file when built from the assembly sources.
For Russian-speaking users, the DOC directory contains a translation of the Hi-Tech C compiler user manual and a description of the decompilation in Russian, in the files Z80DOC3rus.pdf and Readme_ru.pdf.
The files are bundled using the enhuff program. To extract them into the working directory, use the dehuff program:
    dehuff x CGEN.HUF
After that, the following files will appear in the working directory:
    *.c, *.asm       - Source codes of a disassembled program in C
                       and assembly languages;
    Makefile         - File for compiling a new executable program
                       in assembly language;
    Make_c           - File for compiling a new executable program
                       in C language;
    lkcgen, linkcgen - Files for linking object files (called from
                       makefile or makefile_c);
    Readme_en.txt    - File with English description (this file);
    cgen.h           - Include file with definitions of variables
                       and functions;
    CGEN.SYM         - The original program symbol file for the
                       debugger, for example, ZSID;
    CGEN.COM         - Original executable file from the package;
    0.txt            - Simple file for testing the created program;
    STDIO.H          - A modified version of the standard include
                       file;
    cgen_all_c,
    cgen_all_asm     - Scripts for copying C and assembler source
                       code into a single file for easier viewing.
The following files are also present:
    LIBRARY.HUF      - Library as separate files.
    SOURCE.HUF       - C source files not present in CGEN.HUF.
Note: all source files use CRLF line endings, as expected by CP/M.
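If a source file has been edited on a Unix-like host, its line endings may need to be restored before it is used under CP/M. The following is only a minimal sketch of such a conversion; the function to_crlf is a hypothetical helper and is not part of the package.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the package): copies src to dst,
 * turning every bare LF into CR LF and leaving existing CR LF pairs
 * untouched. Returns the number of bytes written, or 0 if dst is too
 * small to hold the converted text. */
size_t to_crlf(const char *src, size_t n, char *dst, size_t cap)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        /* An LF is "bare" when it is not preceded by a CR. */
        int bare_lf = (src[i] == '\n') && (i == 0 || src[i - 1] != '\r');

        if (out + (bare_lf ? 2 : 1) > cap)
            return 0;               /* destination buffer too small */
        if (bare_lf)
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

In the worst case (every byte is a bare LF) the output is twice the size of the input, so a destination buffer of 2 * n bytes is always sufficient.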
To compile and link a new executable file, you need to run the command:
    make
After its completion, the following files will be created:
    cgen1.com         - new executable file;
    cgen1.map         - memory map;
    cgen1.sym         - symbol file for the ZSID debugger;
    cgen1.sym.sorted  - symbol file sorted in ascending order of
                        addresses.
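The sorted symbol file holds the same entries as cgen1.sym, ordered by ascending address. The sketch below only illustrates that ordering step in C; it is not the script used by the Makefile, and the struct sym layout and the function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one symbol-file entry. */
struct sym {
    unsigned addr;      /* 16-bit Z80 address */
    const char *name;   /* symbol name */
};

/* qsort comparator: ascending by address. */
static int by_addr(const void *a, const void *b)
{
    const struct sym *x = a, *y = b;

    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Sorts n entries of tab in place in ascending order of addresses. */
void sort_syms(struct sym *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    if (n > 1)
        qsort(tab, n, sizeof *tab, by_addr);
}
```

With the entries in address order, the symbol that contains a given address is the last one whose address does not exceed it.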
To check that the newly built program works, enter the command
    cgen1 0.txt
The generated Z80 assembly code will be displayed on the screen.
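Beyond this functional test, the rebuilt cgen1.com can be compared with the original CGEN.COM byte by byte. The following is a minimal sketch of such a check, assuming both files have already been read into memory; first_mismatch is a hypothetical helper and is not part of the package.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the package).
 * Returns -1 if both images are identical; otherwise returns the offset
 * of the first differing byte, or the shorter length when one image is
 * a prefix of the other. */
long first_mismatch(const unsigned char *a, size_t na,
                    const unsigned char *b, size_t nb)
{
    size_t i, n = na < nb ? na : nb;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return na == nb ? -1L : (long)n;
}
```

Because CP/M loads a .COM file at address 0100h, adding 0x100 to the returned offset gives the address to look up in cgen1.map or in the symbol file.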
The CGEN.COM program was written in C. Some functions of the program and of the standard library were modified by the authors in order to optimize it (reduce the size of the program and increase its speed) and, of course, to complicate its decompilation.
These modified standard library functions are located in the files libc1.asm, libc2.asm, libc3.asm, libc4.asm, and libc5.asm. The source code is split into files in the way needed to obtain an exact copy of the original executable file.
Disassembling the original executable code and decompiling it into C source code made it clear that the authors had changed the code at the assembly level.
To complicate decompilation, code was added to some functions that does not affect their logic but makes them harder to understand.
For the same purpose, variables and code from the MS-DOS version of the program, unused in the CP/M version, were deliberately left in some functions.
In some functions, the assembler code generated by the compiler was edited to change the size of the function. These edits do not change the logic of the function, but they rule out using the C version.
In the original executable file, the instructions that restore the stack after a function call were removed in several places in the program, including in a library function.
The location of the text constants used for the informational messages of the code generator was changed.
Code was added in various places to correct the effect of the bugs that had been introduced, and it does so not explicitly but through accesses to an array. (I have not yet understood how it works.)
A rather strange (and difficult to understand) scheme is used for allocating dynamic memory when building the symbol table.
In general, a rather complex scheme of protection against decompilation was used when creating the program. As a result, the assembler source code is not yet fully relocatable.
The three recovered functions in 1F4B.c, 2D09.c and 54B6.c turned out to be too large for the optimizer, so the corresponding assembler files are used for linking.
When the C source files are compiled, several warnings are issued; they are caused by structures that are not yet worked out in enough detail for the variable values stored in them. The warnings are included as comments in the source files.
The code generated by the C compiler is added as comments to the assembly language sources, and almost all the differences are marked.
The command
    make -f make_c
compiles the C sources and creates the executable file cgen1.com, which does not work correctly yet, or rather, does not work at all.
The command
    make clear
removes all created object and executable files from the working directory, and the command
    make compress
creates a package file containing all the necessary files (this requires the enhuff program).
The contents of files with the .HUF extension are essentially a backup copy of the files in use.
The non-commercial purpose of this painstaking work is to popularize the old Hi-Tech C v3.09 compiler (Hi-Tech Software) among potential fans of 8-bit computers and to extend its service life beyond the CP/M environment (Digital Research, Inc.), so that it can work fully under the Unix-like operating system UZI-180 without using its CP/M emulator.
The way to achieve this is to recreate relocatable object code, replace the CP/M system functions (I/O, memory allocation, etc.) with the equivalent UZI-180 calls, and build an executable file for that operating system. After that, the entire package of this wonderful compiler is to be recreated.
The Hi-Tech C compiler V3.09 is provided free of charge for any use, private or commercial, strictly as-is. No warranty or product support is offered or implied including merchantability, fitness for a particular purpose, or non-infringement. In no event will Hi-Tech Software or its corporate affiliates be liable for any direct or indirect damages.
You may use this software for whatever you like, providing you ACKNOWLEDGE that the copyright to this software remains with Hi-Tech Software and its corporate affiliates.
All copyrights to the algorithms used, binary code, trademarks, etc. belong to the legal owner, Microchip Technology Inc. and its subsidiaries. Commercial use and distribution of the recreated source code without permission from the copyright holder is strictly FORBIDDEN.
Plans for further work:
- create completely relocatable source code for the CGEN.COM program;
- do the same work on the rest of the programs;
- write a guide to using the Hi-Tech C V3.09 compiler from the
  point of view of generating compact and optimal code, based on the
  experience of recreating this program.
Thanks to:
- Hi-Tech Software for writing the compiler and providing it for free use;
- OrionExt for the initial disassembly of the CGEN.COM program;
- all the authors who are not indifferent to CP/M and have written
  wonderful emulators: cpm (Keiji Murakami), iz-cpm (Iván Izaguirre),
  zxcc (John Elliott), aliados (Julián Albo), cpm for osx (Thomas Harte),
  tnylpo (Georg Brein), and others;
- Tony Nicholson for maintaining information about this compiler;
- the author of emu2, a simple x86 and DOS emulator for the Linux
  terminal, which allows you to run the DOS version of the Hi-Tech C
  compiler v4.11 from a makefile under Linux or OS X.
Andrey Nikitin (nikitinprior@gmail.com)