nikitinprior/dcgen

dcgen

Decompiling program CGEN.COM

Decompiling the program CGEN.COM from Hi-Tech C compiler v3.09

Introduction

On a Russian forum about vintage computers, C compilers for the Z80 processor were being discussed. A user posting as OrionExt shared a link to a listing of CGEN.COM disassembled with IDA. According to OrionExt, the Hi-Tech C compiler was written in C and built with itself, but without the code optimization option.

This question interested me, and I continued his work on decompiling the program. Things turned out to be a little more complicated: the compiler is written in Hi-Tech C, but it was compiled with optimization. Thanks to OrionExt for the work done and the initial information.

The CGEN.HUF file contains the files obtained by disassembling the CGEN.COM binary executable (the code generator from the Hi-Tech C compiler v3.09 for CP/M). The C and assembler source code is adapted for building the CGEN1.COM executable, which matches the original file byte for byte when built from the assembly sources.

For Russian-speaking users, the DOC directory contains a translation of the Hi-Tech C compiler user manual and a description of the decompilation, both in Russian, in the files Z80DOC3rus.pdf and Readme_ru.pdf.

Compiling the program

The files are bundled using the enhuff program. To extract them into the working directory, use the dehuff program:

dehuff x CGEN.HUF

After that, the following files will appear in the working directory:

*.c, *.asm      - Source code of the disassembled program in C
                  and assembly language;
Makefile        - Makefile for building a new executable from
                  the assembly language sources;
Make_c          - Makefile for building a new executable from
                  the C sources;
lkcgen,         - Files for linking the object files (called
linkcgen          from makefile or makefile_c);
Readme_en.txt   - English description (this file);
cgen.h          - Include file with definitions of variables
                  and functions;
CGEN.SYM        - Symbol file of the original program for a
                  debugger such as ZSID;
CGEN.COM        - Original executable file from the package;
0.txt           - Simple file for testing the built program;
STDIO.H         - Modified version of the standard include
                  file;
cgen_all_c,     - Scripts for copying the C and assembler source
cgen_all_asm      code into a single file for easier viewing.

The following files are also present:

LIBRARY.HUF 	- Library as separate files.
SOURCE.HUF 	- C source files not present in CGEN.HUF. 

Note that all source files use CRLF line endings, as expected by CP/M.
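If the sources pass through an editor or version control system on another platform, the line endings can be silently converted. The following is a minimal sketch of a check for that, not part of the package: the function name count_bare_lf is hypothetical, and it only inspects a buffer that has already been read into memory in binary mode.

```c
#include <assert.h>
#include <stddef.h>

/* Count line feeds in buf that are NOT preceded by a carriage return.
   A result of 0 means every line in the buffer ends in CRLF.
   (Hypothetical helper, not part of the dcgen sources.) */
size_t count_bare_lf(const char *buf, size_t len)
{
    size_t i, bare = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\n' && (i == 0 || buf[i - 1] != '\r'))
            bare++;
    }
    return bare;
}
```

A non-zero result for any .c or .asm file suggests that the file should be converted back to CRLF before it is handed to the CP/M tools.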

To compile and link a new executable file, you need to run the command:

make

When it finishes, the following files will have been created:

cgen1.com        - new executable file;
cgen1.map        - memory map;
cgen1.sym        - symbol file for the ZSID debugger;
cgen1.sym.sorted - symbol file sorted in ascending order of
		   addresses; 
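How the Makefile produces the sorted symbol file is not described here. As a rough illustration of the idea only, the sketch below assumes that each symbol entry is a hexadecimal address followed by a name (for example "1F4B name"); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct symbol {              /* hypothetical in-memory form of one entry */
    unsigned addr;           /* address parsed from hexadecimal text */
    char name[32];           /* symbol name, truncated to 31 characters */
};

/* Parse one "hhhh name" entry; returns 1 on success, 0 otherwise.
   The entry layout is an assumption, not taken from the package. */
int parse_symbol(const char *text, struct symbol *out)
{
    assert(text != NULL && out != NULL);
    return sscanf(text, "%x %31s", &out->addr, out->name) == 2;
}

/* qsort comparison: ascending order of addresses. */
static int by_address(const void *pa, const void *pb)
{
    const struct symbol *a = pa, *b = pb;
    return (a->addr > b->addr) - (a->addr < b->addr);
}

/* Sort an array of symbols in place by address. */
void sort_symbols(struct symbol *syms, size_t count)
{
    assert(syms != NULL || count == 0);
    if (count > 1)
        qsort(syms, count, sizeof syms[0], by_address);
}
```

An address-ordered listing is convenient when matching the addresses in a disassembly against function names.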

To check that the compiled program works, enter the command:

cgen1 0.txt

and the screen will display the generated Z80 assembly code.
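Since the goal is an exact copy of the original executable, it can also be useful to compare cgen1.com with CGEN.COM byte by byte. The following is a minimal sketch of such a comparison, not a tool from the package; the function name first_difference is hypothetical, and both streams are assumed to have been opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Return -1 if both streams hold the same bytes, otherwise the offset
   of the first differing byte. A length mismatch counts as a difference
   at the offset where the shorter stream ends.
   (Hypothetical helper, not part of the dcgen sources.) */
long first_difference(FILE *a, FILE *b)
{
    long offset = 0;
    int ca, cb;

    assert(a != NULL && b != NULL);
    for (;;) {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb)
            return offset;      /* bytes differ, or one stream ended */
        if (ca == EOF)
            return -1;          /* both ended together: identical */
        offset++;
    }
}
```

Called with fopen("CGEN.COM", "rb") and fopen("cgen1.com", "rb"), a result of -1 indicates identical files. Any other result is a file offset; for a CP/M .COM file, which loads at 0100h, adding 0100h gives the corresponding address in the disassembly.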

Decompilation results

The CGEN.COM program was written in C. The authors changed some functions of the program and of the standard library to optimize them (reducing the size of the program and increasing its speed) and, of course, to complicate decompilation.

These modified standard library functions are located in the files libc1.asm, libc2.asm, libc3.asm, libc4.asm, and libc5.asm. The source code is split into files this way in order to obtain an exact copy of the original executable file.

Disassembling the original executable code and decompiling it into C source code made it clear that the authors had changed the code at the assembly level.

To complicate decompilation, code was added to some functions that does not affect their logic but makes them harder to understand.

For the same purpose, variables and code from the MS-DOS version of the program, unused in the CP/M version, were deliberately left in some functions.

In some functions, the assembler code generated by the compiler was edited by hand to change the size of the function. These edits do not change the logic of the function, but they rule out using the C version.

In several places in the original executable file, including a library function, the instructions that restore the stack after a function call were removed.

The location of the text constants used for the informational messages of the code generator was changed.

To compensate for the bugs introduced in various places, code was added to correct their effects, and it does so not explicitly but through access to an array. (I have not yet understood how it works.)

A rather strange (and difficult to understand) scheme for allocating dynamic memory is used when building the symbol table.

In general, a rather complex scheme of protection against decompilation was used when creating the program. As a result, the assembler source code is not yet fully relocatable.

The three recovered functions in 1F4B.c, 2D09.c and 54B6.c turned out to be too large for the optimizer, so the corresponding assembler files are used for linking.

When the C source files are compiled, several warnings are issued because the structures are not yet elaborated well enough for the variable values stored in them. The warnings are included as comments in the source files.

The code generated by the C compiler is added as comments to the assembly language sources, and almost all the differences are marked.

Additional Information

The command

make -f make_c

compiles the C sources into an executable file cgen1.com, which does not yet work correctly or, rather, does not work at all.

The command

make clear

removes all created object and executable files from the working directory, and the command

make compress

creates a package file containing all the necessary files (this requires the enhuff program).

The files with the .HUF extension are essentially backup copies of the files in use.

What is this all for

The non-commercial purpose of this painstaking work is to popularize the old Hi-Tech C v3.09 compiler (Hi-Tech Software) among potential fans of 8-bit computers and to extend its service life beyond the CP/M environment (Digital Research, Inc.), so that it can run natively under the Unix-like operating system UZI-180 without its CP/M emulator.

The plan is to recreate the relocatable object code, replace the CP/M system functions (I/O, memory allocation, etc.) with equivalent UZI-180 calls, and build an executable file for that operating system. After that, the entire package of this wonderful compiler is to be recreated in the same way.

Copyright

The Hi-Tech C compiler V3.09 is provided free of charge for any use, private or commercial, strictly as-is. No warranty or product support is offered or implied including merchantability, fitness for a particular purpose, or non-infringement. In no event will Hi-Tech Software or its corporate affiliates be liable for any direct or indirect damages.

You may use this software for whatever you like, providing you ACKNOWLEDGE that the copyright to this software remains with Hi-Tech Software and its corporate affiliates.

All copyrights to the algorithms used, binary code, trademarks, etc. belong to the legal owner, Microchip Technology Inc. and its subsidiaries. Commercial use and distribution of the recreated source code without permission from the copyright holder is strictly FORBIDDEN.

Plans for the future

  - create fully relocatable source code for the CGEN.COM program;
  - do the same work on the rest of the programs;
  - write a guide to using the Hi-Tech C V3.09 compiler from the
point of view of generating compact and optimal code, based on the
experience of recreating this program.

Appreciation

  - Hi-Tech Software for writing the compiler and providing it for free use.
  - OrionExt for the initial disassembly of the CGEN.COM program.
  - All the authors who care about CP/M and have written
wonderful emulators: cpm (Keiji Murakami), iz-cpm (Iván Izaguirre),
zxcc (John Elliott), aliados (Julián Albo), cpm for osx (Thomas Harte),
tnylpo (Georg Brein), and others.
  - Tony Nicholson for maintaining information about this compiler.
  - The author of emu2, a simple x86 and DOS emulator for the Linux terminal,
which allows the DOS version of the Hi-Tech C compiler v4.11 to be run
from a makefile under Linux or OS X.

Andrey Nikitin (nikitinprior@gmail.com)
