Add refactoring guide.
Rot127 committed Sep 9, 2024
1 parent 87bc6db commit fdb628c
Showing 3 changed files with 113 additions and 68 deletions.
13 changes: 7 additions & 6 deletions CONTRIBUTING.md
@@ -86,15 +86,15 @@ Support
Updating an Architecture
------------------------

The update tool for Capstone is called `auto-sync` and can be found in `suite/auto-sync`.
The update tool for Capstone is called `Auto-Sync` and can be found in `suite/auto-sync`.

Not all architectures are supported yet.
Run `suite/auto-sync/Updater/ASUpdater.py -h` to get a list of currently supported architectures.

The documentation how to update with `auto-sync` or refactor an architecture module
can be found in [docs/AutoSync.md](docs/AutoSync.md).
The documentation how to update with `Auto-Sync` or refactor an architecture module
can be found in [suite/auto-sync/README.md](suite/auto-sync/README.md).

If a module does not support `auto-sync` yet, it is highly recommended to refactor it
If a module does not support `Auto-Sync` yet, it is highly recommended to refactor it
instead of attempting to update it manually.
Refactoring will take less time and updates it during the procedure.

@@ -104,10 +104,11 @@ One for `x86` and another for all the other architectures.
Until now it was not worth refactoring this unique `x86` backend. So `x86` is not
supported currently.

Adding an architecture
Adding an Architecture
----------------------

If your architecture is supported in LLVM or one of its forks, you can use `Auto-Sync` to
add the new module.
Check out [suite/auto-sync/README.md](suite/auto-sync/README.md).

Otherwise, you need to implement the disassembler on your own.
Otherwise, you need to implement the disassembler on your own and make it work with the Capstone API.
84 changes: 22 additions & 62 deletions suite/auto-sync/README.md
@@ -5,9 +5,9 @@ SPDX-License-Identifier: BSD-3

# Architecture updater - Auto-Sync

`auto-sync` is the architecture update tool for Capstone.
`Auto-Sync` is the architecture update tool for Capstone.
Because the architecture modules of Capstone use mostly code from LLVM,
we need to update this part with every LLVM release. `auto-sync` helps
we need to update this part with every LLVM release. `Auto-Sync` helps
with this synchronization between LLVM and Capstone's modules by
automating most of it.

@@ -57,14 +57,14 @@ Just ensure it is in your `PATH` as `llvm-mc` and `FileCheck` (not as `llvm-mc-1

## Architecture

Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how Auto-Sync works.
Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how `Auto-Sync` works.

This step is essential! Please don't skip it.

## Update an architecture

Updating an architecture module to the newest LLVM release, is only possible if it uses Auto-Sync.
Not all arch-modules support Auto-Sync yet.
Updating an architecture module to the newest LLVM release, is only possible if it uses `Auto-Sync`.
Not all arch-modules support `Auto-Sync` yet.

Check if your architecture is supported.

@@ -94,6 +94,22 @@ you will get build errors if you try to compile Capstone.

The last step to finish the update is to fix those build errors by hand.

## Refactor an architecture

Not all architecture modules support `Auto-Sync` yet.
Here is an overview of the steps to add support for it.

<hr>

To refactor one of them to use `Auto-Sync`, please follow the [RefactorGuide.md](RefactorGuide.md).

## Adding a new architecture

Adding a new architecture follows the same steps as above. With the exception that you need
to implement all the Capstone files from scratch.

Check out an architecture that already supports `Auto-Sync` for guidance and open an issue if you need help.

## Additional details

### Overview updated files
@@ -162,60 +178,4 @@ Documentation about the `.inc` file generation is in the [llvm-capstone](https:/
python3 -m usort format src/autosync
python3 -m black src/autosync
```

## Refactor an architecture for Auto-Sync framework

Not all architecture modules support Auto-Sync yet.
Here is an overview of the steps to add support for it.

<hr>

To refactor one of them to use `auto-sync`, you need to add it to the configuration.

1. Add the architecture to the supported architectures list in `ASUpdater.py`.
2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)

Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:

```
./Updater/ASUpdater.py -a <ARCH> -s IncGen Translate
```

The task after this is to:

- Replace leftover C++ syntax with its C equivalent.
- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
- Edit the main header file of the architecture (`include/capstone/<ARCH>.h`) to include the generated enums (see below)
- Add any missing logic to the translated files.
- Make it build and write tests.
- Run the Differ again and always select the old nodes.

**Notes:**

- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):

```
// generate content <FILENAME.inc> begin
// generate content <FILENAME.inc> end
```

The update script will insert the content of the `.inc` file at this place.

- If you find yourself fixing the same syntax error multiple times,
please consider adding a `Patch` to the `CppTranslator` for this case.

- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.

- Running the `Differ` after everything is done, preserves your version of syntax corrections, and the next user can auto-apply them.

- Sometimes the LLVM code uses a single function from a larger source file.
It is not worth it to translate the whole file just for this function.
Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.
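
The marker mechanism from the notes above can be pictured with a short Python sketch. This is not the actual update script, only an illustration of the idea under stated assumptions: the function name `patch_header`, the file name `FooGenCSRegEnum.inc` and the enum `foo_reg` are hypothetical, and the markers are assumed to appear exactly as written in the note, each on its own line.

```python
def patch_header(header_text: str, inc_name: str, inc_content: str) -> str:
    """Replace everything between the begin/end markers of `inc_name`.

    Hypothetical helper for illustration; the real update script may work differently.
    """
    begin = f"// generate content <{inc_name}> begin"
    end = f"// generate content <{inc_name}> end"
    out = []
    skipping = False  # True while we are between the two markers
    found = False
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped == begin:
            out.append(line)                      # keep the begin marker
            out.extend(inc_content.splitlines())  # insert the new content
            skipping = True
            found = True
        elif stripped == end:
            out.append(line)                      # keep the end marker
            skipping = False
        elif not skipping:
            out.append(line)                      # text outside the markers is untouched
    if not found or skipping:
        raise ValueError(f"begin/end markers for {inc_name} are missing or unbalanced")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    header = (
        "typedef enum {\n"
        "// generate content <FooGenCSRegEnum.inc> begin\n"
        "\tFOO_OLD = 1,\n"
        "// generate content <FooGenCSRegEnum.inc> end\n"
        "} foo_reg;\n"
    )
    print(patch_header(header, "FooGenCSRegEnum.inc", "\tFOO_REG_A = 1,\n\tFOO_REG_B = 2,"))
```

Because the markers themselves are kept, running the patch a second time with the same content leaves the header unchanged. This is why the comments, including the `<>` brackets, must stay in the header.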

## Adding a new architecture

Adding a new architecture follows the same steps as above. With the exception that you need
to implement all the Capstone files from scratch.

Check out an `auto-sync` supporting architectures for guidance and open an issue if you need help.

84 changes: 84 additions & 0 deletions suite/auto-sync/RefactorGuide.md
@@ -0,0 +1,84 @@
# Refactor guide

This is a step-by-step overview of how to refactor an architecture.

It can also be used to add a new architecture module, as long as the architecture is supported by LLVM or a fork of it.

Please always contact us in the [Auto-Sync tracking issue](https://github.com/capstone-engine/capstone/issues/2015)
before working on a module.
We can provide support and save you a lot of time.

Don't hesitate to ask any questions in our [Telegram Community channel](https://t.me/CapstoneEngine),
especially if you feel stuck or struggle to understand where an issue is coming from.
The update process is, although already simplified, relatively complex.

## Refactoring

Note:
- If we talk about C++ files in the steps below, we always refer to the files in the LLVM repo.
- `PrinterCapstone` is the class defined in `llvm-capstone/llvm/utils/TableGen/PrinterCapstone.cpp`
- Always attempt to make the translated C file behave as closely as possible to the original C++ file! This greatly helps debugging and assures that Capstone behaves almost exactly the same as original LLVM.

- ### Prepare
  - Read `CONTRIBUTING.md`
  - Read `docs/ARCHITECTURE.md`
  - Read `suite/auto-sync/README.md`
  - Read `suite/auto-sync/ARCHITECTURE.md`
  - Read `suite/auto-sync/intro.md`
  - Delete all files in `arch/<ARCH>/`, except the `ARCHModule.*` and `ARCHMapping.*`.
  - `cd suite/auto-sync/`
- ### Generate `inc` files
  - `pip install -e .`
  - Clone and build `llvm-tblgen` (see docs)
  - Quickly check the options of the updater: `ASUpdater -h`
  - Add the arch name in `Target.py`
  - In [llvm-capstone](https://github.com/capstone-engine/llvm-capstone), handle the arch in `PrinterCapstone.cpp::decoderEmitterEmitFieldFromInstruction()` (add decoder function)
  - Generate: `ASUpdater -s IncGen -a ARCH`
    - Errors? Check if the error message tells you what to do. If no hint exists, ask us.
  - Check if the `inc` files in `build` look good.
- ### Translation and Patching
  - Check for template functions in `<ARCH>InstPrinter.cpp` and `<ARCH>Disassembler.cpp`
  - Add a new config to `arch_config.json` by copying an existing one (LoongArch is a minimal example).
    - Don't forget to add `ARCHInstPrinter.cpp` to the list of the `AddCSDetail` tests!
    - Add as a minimum the `<ARCH>InstPrinter.cpp`, `<ARCH>InstPrinter.h` and `<ARCH>Disassembler.cpp` to the translation list.
    - Tip: The variables used in there are defined in `path_vars.json`
  - Add architecture-specific includes in `Patches/Includes.py`. Copy the code from another architecture to begin with.
  - Prepare the API header (`<arch>.h`) for patching:
    - Check the generated `inc` files. File names like `<ARCH>GenCS<something>Enum.inc` contain enumerations for the header. Those get patched into the main header file of the architecture.
    - Remove old values and add `// generated content <...> begin` comments for patching. Check out `loongarch.h` as an example.
  - Commit all changes so far.
    - The next step will write to `arch/` and the `include/capstone/<arch>.h` header!
  - Run generation, translation and copy/patch the files: `ASUpdater -a <ARCH> -w --copy-translated -s IncGen Translate PatchArchHeader`
- ### Clean up
  - #### Check: All necessary files
    - Arch header:
      - Invalid characters in enum identifiers? Replace the character in `PrinterCapstone::normalizedMnemonic`
    - In `arch/<ARCH>`:
      - Missing identifiers/symbols? Check if they are somewhere in the generated files. If yes, include them and update `Includes.py`. If not, you have to find the LLVM source file where they are defined and add it to `arch_config.json` to translate it.
      - Or it needs the `SystemOperands.inc` file. This file can also be generated by adding the arch to the list in `inc_gen.json`.
    - Note: When you start the next step, you likely don't want to generate, translate and copy files again, because your hand-made fixes would get overwritten. So ensure you no longer use the `-w` flag for `ASUpdater` and that you have checked thoroughly that all necessary files got translated!
    - Commit to save changes so far.
  - #### Remove and fix C++ syntax
    - Remove all **obviously irrelevant** C++ code from the translated files (e.g. class initializers)
    - Double check whether non-obvious cases are important. Remember: removing something might lead to bugs later!
      - If in doubt, ask us.
    - If you fix the same syntax over and over again, consider adding a Patch for the `CppTranslator`.
    - Common problems:
      - Missing namespace prefix: `unsigned GR32Regs[]` should be `unsigned ARCH_GR32Regs[]`. See the `namespace begin/end` comments in the code.
      - TODO: Add more.
    - If in doubt, check the original C++ file in the LLVM repo.
- ### Make it build
  - Add `ARCHLinkage.h` and the functions in the `InstPrinter.c`, `ArchDisassembler.c`.
  - Add essential code in `ARCHMapping.c`. Essential is everything **not** related to details.
  - If unsure how to do Capstone <-> LLVM code things, always check LoongArch. If LoongArch doesn't handle this case, check Mips or SystemZ.
- ### Run tests & Fixing bugs
  - Update regression MC tests: Map LLVM `mattr` and `mcpu` names to the CS identifiers if necessary. To do so, edit the `mcupdater.json` config file.
  - Update tests: `ASUpdater -s MCUpdate -a Arch -w`
  - Run MC tests: `cstest tests/MC/Arch`
- ### Add details
  - Effectively copy the behavior from `LoongArchMapping.c` or `SystemZMapping.c` but change the values.
  - Changes to the API (structs in `arch.h`) are only allowed if it was wrong before. Otherwise only extensions.
  - Don't forget to update the Python bindings.
  - Run detail tests to check results.
  - Run detail tests with coverage. `ArchMapping.c` should be covered near 100%
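
The commands quoted in the steps above can be collected in one place. The following Python sketch only assembles those command lines in the order the guide mentions them; it does not run anything. The helper `refactor_commands` is hypothetical, the arch name `LoongArch` is just an illustrative value (check `ASUpdater -h` for the names it really accepts), and it assumes `ASUpdater` and `cstest` are available on your `PATH`.

```python
import shlex
from typing import List


def refactor_commands(arch: str) -> List[List[str]]:
    """Return the command lines quoted in this guide, in order (hypothetical helper)."""
    return [
        # Quickly check the options of the updater.
        ["ASUpdater", "-h"],
        # Generate the `inc` files.
        ["ASUpdater", "-s", "IncGen", "-a", arch],
        # Generate, translate and copy/patch. This writes to `arch/` and the API header!
        ["ASUpdater", "-a", arch, "-w", "--copy-translated",
         "-s", "IncGen", "Translate", "PatchArchHeader"],
        # Update the regression MC tests.
        ["ASUpdater", "-s", "MCUpdate", "-a", arch, "-w"],
        # Run the MC tests.
        ["cstest", f"tests/MC/{arch}"],
    ]


if __name__ == "__main__":
    for cmd in refactor_commands("LoongArch"):
        print(shlex.join(cmd))
```

The manual work (cleaning up C++ syntax, making it build, adding details) happens between these commands, so treat the list as a checklist and not as a script to run in one go. In particular, commit before the third command, since it overwrites files.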
