diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 3b33813c3b..e016a6134d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -86,15 +86,15 @@ Support Updating an Architecture ------------------------ -The update tool for Capstone is called `auto-sync` and can be found in `suite/auto-sync`. +The update tool for Capstone is called `Auto-Sync` and can be found in `suite/auto-sync`. Not all architectures are supported yet. Run `suite/auto-sync/Updater/ASUpdater.py -h` to get a list of currently supported architectures. -The documentation how to update with `auto-sync` or refactor an architecture module -can be found in [docs/AutoSync.md](docs/AutoSync.md). +The documentation on how to update with `Auto-Sync` or refactor an architecture module +can be found in [suite/auto-sync/README.md](suite/auto-sync/README.md). -If a module does not support `auto-sync` yet, it is highly recommended to refactor it +If a module does not support `Auto-Sync` yet, it is highly recommended to refactor it instead of attempting to update it manually. Refactoring will take less time and updates it during the procedure. @@ -104,10 +104,11 @@ One for `x86` and another for all the other architectures. Until now it was not worth it to refactoring this unique `x86` backend. So `x86` is not supported currently. -Adding an architecture +Adding an Architecture ---------------------- If your architecture is supported in LLVM or one of its forks, you can use `Auto-Sync` to add the new module. +Check out [suite/auto-sync/README.md](suite/auto-sync/README.md). -Otherwise, you need to implement the disassembler on your own. +Otherwise, you need to implement the disassembler on your own and make it work with the Capstone API.
diff --git a/suite/auto-sync/README.md b/suite/auto-sync/README.md index 7973209364..24f82728ec 100644 --- a/suite/auto-sync/README.md +++ b/suite/auto-sync/README.md @@ -5,9 +5,9 @@ SPDX-License-Identifier: BSD-3 # Architecture updater - Auto-Sync -`auto-sync` is the architecture update tool for Capstone. +`Auto-Sync` is the architecture update tool for Capstone. Because the architecture modules of Capstone use mostly code from LLVM, -we need to update this part with every LLVM release. `auto-sync` helps +we need to update this part with every LLVM release. `Auto-Sync` helps with this synchronization between LLVM and Capstone's modules by automating most of it. @@ -57,14 +57,14 @@ Just ensure it is in your `PATH` as `llvm-mc` and `FileCheck` (not as `llvm-mc-1 ## Architecture -Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how Auto-Sync works. +Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how `Auto-Sync` works. This step is essential! Please don't skip it. ## Update an architecture -Updating an architecture module to the newest LLVM release, is only possible if it uses Auto-Sync. -Not all arch-modules support Auto-Sync yet. +Updating an architecture module to the newest LLVM release is only possible if it uses `Auto-Sync`. +Not all arch-modules support `Auto-Sync` yet. Check if your architecture is supported. @@ -94,6 +94,22 @@ you will get build errors if you try to compile Capstone. The last step to finish the update is to fix those build errors by hand. +## Refactor an architecture + +Not all architecture modules support `Auto-Sync` yet. +Here is an overview of the steps to add support for it. +
+ +To refactor one of them to use `Auto-Sync`, please follow the [RefactorGuide.md](RefactorGuide.md). + +## Adding a new architecture + +Adding a new architecture follows the same steps as above, with the exception that you need +to implement all the Capstone files from scratch. + +Check out one of the architectures that already support `Auto-Sync` for guidance, and open an issue if you need help. + ## Additional details ### Overview updated files @@ -162,60 +178,4 @@ Documentation about the `.inc` file generation is in the [llvm-capstone](https:/ python3 -m usort format src/autosync python3 -m black src/autosync ``` - -## Refactor an architecture for Auto-Sync framework - -Not all architecture modules support Auto-Sync yet. -Here is an overview of the steps to add support for it. -
- -To refactor one of them to use `auto-sync`, you need to add it to the configuration. - -1. Add the architecture to the supported architectures list in `ASUpdater.py`. -2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`) - -Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step: - -``` -./Updater/ASUpdater.py -a -s IncGen Translate -``` - -The task after this is to: - -- Replace leftover C++ syntax with its C equivalent. -- Implement the `add_cs_detail()` handler in `Mapping` for each operand type. -- Edit the main header file of the architecture (`include/capstone/.h`) to include the generated enums (see below) -- Add any missing logic to the translated files. -- Make it build and write tests. -- Run the Differ again and always select the old nodes. - -**Notes:** - -- Some generated enums must be included in the `include/capstone/.h` header. -At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets): - - ``` - // generate content begin - // generate content end - ``` - -The update script will insert the content of the `.inc` file at this place. - -- If you find yourself fixing the same syntax error multiple times, -please consider adding a `Patch` to the `CppTranslator` for this case. - -- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own. - -- Running the `Differ` after everything is done, preserves your version of syntax corrections, and the next user can auto-apply them. - -- Sometimes the LLVM code uses a single function from a larger source file. -It is not worth it to translate the whole file just for this function. -Bundle those lonely functions in `DisassemblerExtension.c`. - -## Adding a new architecture - -Adding a new architecture follows the same steps as above. With the exception that you need -to implement all the Capstone files from scratch. 
- -Check out an `auto-sync` supporting architectures for guidance and open an issue if you need help. + diff --git a/suite/auto-sync/RefactorGuide.md b/suite/auto-sync/RefactorGuide.md new file mode 100644 index 0000000000..5538a92792 --- /dev/null +++ b/suite/auto-sync/RefactorGuide.md @@ -0,0 +1,84 @@ +# Refactor guide + +This is a step-by-step overview of how to refactor an architecture. + +It can also be used to add a new architecture module, as long as the architecture is supported by LLVM or a fork of it. + +Please always contact us in the [Auto-Sync tracking issue](https://github.com/capstone-engine/capstone/issues/2015) +before working on a module. +We can provide support and save you a lot of time. + +Don't hesitate to ask any questions in our [Telegram Community channel](https://t.me/CapstoneEngine). + +This is especially true if you feel stuck or struggle to understand where an issue is coming from. +Although already simplified, the update process is still relatively complex. + +## Refactoring + +Note: +- C++ files mentioned in the steps below always refer to the files in the LLVM repo. +- `PrinterCapstone` is the class defined in `llvm-capstone/llvm/utils/TableGen/PrinterCapstone.cpp` +- Always attempt to make the translated C file behave as closely as possible to the original C++ file! This greatly helps debugging and ensures that Capstone behaves almost exactly like the original LLVM. + +- ### Prepare + - Read `CONTRIBUTING.md` + - Read `docs/ARCHITECTURE.md` + - Read `suite/auto-sync/README.md` + - Read `suite/auto-sync/ARCHITECTURE.md` + - Read `suite/auto-sync/intro.md` + - Delete all files in `arch//`, except the `ARCHModule.*` and `ARCHMapping.*` files.
+ - `cd suite/auto-sync/` +- ### Generate `inc` files + - `pip install -e .` + - Clone and build `llvm-tblgen` (see docs) + - Quickly check the options of the updater: `ASUpdater -h` + - Add the arch name to `Target.py` + - In [llvm-capstone](https://github.com/capstone-engine/llvm-capstone), handle the arch in `PrinterCapstone.cpp::decoderEmitterEmitFieldFromInstruction()` (add a decoder function) + - Generate: `ASUpdater -s IncGen -a ARCH` + - Errors? Check if the error message tells you what to do. If no hint exists, ask us. + - Check if the `inc` files in `build` look good. +- ### Translation and Patching + - Check for template functions in `InstPrinter.cpp` and `Disassembler.cpp` + - Copy an existing config into `arch_config.json` as a new entry (LoongArch is a minimal example). + - Don't forget to add `ARCHInstPrinter.cpp` to the list of the `AddCSDetail` tests! + - At a minimum, add `InstPrinter.cpp`, `InstPrinter.h` and `Disassembler.cpp` to the translation list. + - Tip: The variables used in there are defined in `path_vars.json` + - Add architecture-specific includes to `Patches/Includes.py`. To start, copy the code from another architecture. + - Prepare the API header (`.h`) for patching: + - Check the generated `inc` files. Files with names like `GenCSEnum.inc` contain enumerations for the header. Those get patched into the main header file of the architecture. + - Remove old values and add `// generated content <...> begin` comments for patching. Check out `loongarch.h` as an example. + - Commit all changes so far. + - The next step will write to the `arch/` and `include/capstone/.h` header! + - Run generation and translation, and copy/patch the files: `ASUpdater -a -w --copy-translated -s IncGen Translate PatchArchHeader` +- ### Clean up + - #### Check: All necessary files + - Arch header: + - Invalid characters in enum identifiers? Replace the character in `PrinterCapstone::normalizedMnemonic` + - In `arch/` + - Missing identifiers/symbols? -> Check if they are somewhere in the generated files.
If yes, include them and update `Includes.py`. If not, you have to find the LLVM source file where they are defined and add it to `arch_config.json` to translate it. + - Or it needs the `SystemOperands.inc` file. It can be generated by adding the arch to the list in `inc_gen.json`. + - Note: Once you start the next step, you likely don't want to generate, translate and copy the files again, because your hand-made fixes would get overwritten. So make sure you no longer use the `-w` flag for `ASUpdater`, and check thoroughly beforehand that all necessary files were translated! + - Commit to save your changes so far. + - #### Remove and fix C++ syntax + - Remove all **obviously irrelevant** C++ code from the translated files (e.g. class initializers) + - Double-check whether non-obvious cases are important. Remember: removing something might lead to bugs later! + - If in doubt, ask us. + - If you fix the same syntax over and over again, consider adding a `Patch` to the `CppTranslator`. + - Common problems: + - Missing namespace prefix: `unsigned GR32Regs[]` should be `unsigned ARCH_GR32Regs[]`. See the `namespace begin/end` comments in the code. + - TODO: Add more. + - If in doubt, check the original C++ file in the LLVM repo. +- ### Make it build + - Add `ARCHLinkage.h` and the functions in `InstPrinter.c` and `ArchDisassembler.c`. + - Add the essential code to `ARCHMapping.c`. Essential is everything **not** related to details. + - If you are unsure how to connect Capstone and LLVM code, always check LoongArch first. If LoongArch doesn't handle the case, check Mips or SystemZ. +- ### Run tests & fix bugs + - Update the MC regression tests: if necessary, map the LLVM `mattr` and `mcpu` names to the CS identifiers by editing the `mcupdater.json` config file. + - Update tests: `ASUpdater -s MCUpdate -a Arch -w` + - Run MC tests: `cstest tests/MC/Arch` +- ### Add details + - Essentially, copy the behavior from `LoongArchMapping.c` or `SystemZMapping.c`, but change the values.
+ - Changes to the API (structs in `arch.h`) are only allowed if the API was wrong before. Otherwise, only extensions are allowed. + - Don't forget to update the Python bindings. + - Run the detail tests to check the results. + - Run the detail tests with coverage. `ArchMapping.c` should be covered close to 100%.
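The command-line steps of the refactor guide above can be sketched as a small shell helper, so that each phase is run with the same flags every time. This is a minimal sketch, not part of Capstone. The `ASUpdater` and `cstest` invocations and their flags are copied from the guide. The function names `run` and `autosync_step`, the `DRY_RUN` switch and the phase names `gen`, `translate` and `test` are hypothetical. The sketch also assumes that `ASUpdater` was installed via `pip install -e .` and that `cstest` is on your `PATH` and is run from the repository root. By default it only prints the commands. The manual clean-up and build steps between the phases are intentionally not automated.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Capstone): prints, or with DRY_RUN=0 runs,
# the Auto-Sync commands of one refactor phase. Commands and flags are copied
# from the refactor guide; names and the DRY_RUN switch are assumptions.

# Echo a command and only execute it when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Usage: autosync_step <gen|translate|test> <ARCH> [capstone-root]
# The body runs in a subshell, so `cd` never changes the caller's directory.
autosync_step() (
  set -e
  step="$1"
  arch="${2:?usage: autosync_step <gen|translate|test> <ARCH> [capstone-root]}"
  root="${3:-.}"

  case "$step" in
    gen | translate | test) ;;
    *)
      echo "unknown step: $step (expected gen, translate or test)" >&2
      exit 1
      ;;
  esac

  run cd "$root/suite/auto-sync"
  case "$step" in
    gen)
      # Install the updater, then generate only the .inc files (check them in build/).
      run pip install -e .
      run ASUpdater -s IncGen -a "$arch"
      ;;
    translate)
      # Writes into arch/ and include/capstone/ and overwrites manual fixes: commit first!
      run ASUpdater -a "$arch" -w --copy-translated -s IncGen Translate PatchArchHeader
      ;;
    test)
      # Only after the clean-up and a successful build: update and run the MC tests.
      run ASUpdater -s MCUpdate -a "$arch" -w
      run cd ../..
      run cstest "tests/MC/$arch"
      ;;
  esac
)
```

For example, `autosync_step translate LoongArch` only prints the `cd` and the `ASUpdater` call of the translation phase, while `DRY_RUN=0 autosync_step gen LoongArch /path/to/capstone` actually executes the generation phase. Printing by default is a deliberate choice: the translation phase overwrites hand-made fixes, so it should never run by accident.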