Zig-based alternative to linker scripts #3206

skyfex · 2019-09-10T10:19:31Z

Sorry for the cheesy title, feel free to change it

With the current state and plans of Zig, it looks like the only case where you need a language other than Zig for writing firmware/software is when you need a custom linker script. What's worse, the documentation for GCC/LLD linker scripts is not all that great, and the scripts can be hard to understand. The scripting language is also not really powerful enough on its own to do more complex things, so I've seen a case where the C preprocessor was used to generate linker scripts.

If linker scripts could be replaced with Zig code, Zig would end up being an extremely elegant solution for embedded firmware in particular.

As far as I see it, there are two parts to solving this:

Linking cannot be handled by compile time evaluation, but compile time constant values/structures should be available to the linker scripts. Some values, such as fixed memory regions and addresses, could be calculated at compile time. These values are generally useful both for the Zig runtime code itself and for the linker script. There may be some work to define compile time functions, and pre-defined variable names or struct types that the linker can use.
Define a way for Zig code to run at link time, and define functions and data structures that give the Zig linker code information about the sizes and flags of sections, and instruct the linker how code and data should be laid out in memory.
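
A minimal sketch of the first part, assuming a hypothetical `MemoryRegion` struct and made-up flash/RAM sizes (neither is an existing Zig API or a real chip): fixed regions are declared once as compile-time constants, so runtime code and any later layout step can both read them.

```zig
const std = @import("std");

// Hypothetical description of a fixed memory region; not part of Zig itself.
pub const MemoryRegion = struct {
    name: []const u8,
    origin: u32,
    length: u32,

    // First address past the region.
    pub fn end(self: MemoryRegion) u32 {
        return self.origin + self.length;
    }
};

// Illustrative values only; a real part's datasheet would supply these.
pub const flash = MemoryRegion{ .name = "flash", .origin = 0x0000_0000, .length = 256 * 1024 };
pub const ram = MemoryRegion{ .name = "ram", .origin = 0x2000_0000, .length = 64 * 1024 };

// Container-level constants are evaluated at compile time, so runtime
// code can use this value without any link-time symbol.
pub const initial_stack_pointer: u32 = ram.end();

comptime {
    // Fails the build if the two regions were declared overlapping.
    std.debug.assert(flash.end() <= ram.origin);
}
```

Anything that depends on section sizes still has to wait for link time; only the fixed parts of the layout can be expressed this way.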

This feature may be dependent on the Zig self-hosting linker #1535, but it would be nice if a proof-of-concept could be done with LLD. Does LLD have an API that can be used or does it only work with linker scripts?

andrewrk · 2019-09-10T14:18:44Z

I like where this is going.

> This feature may be dependent on the Zig self-hosting linker #1535, but it would be nice if a proof-of-concept could be done with LLD. Does LLD have an API that can be used or does it only work with linker scripts?

It's always possible that this feature would ultimately generate a linker script (without necessarily exposing it to the programmer) for use with LLD.

skyfex · 2020-03-01T20:39:38Z

I thought I'd play around with this idea. Is there a way to write a file at compile time? I couldn't find anything on that, and just putting standard file IO code in comptime doesn't seem to work.

Another question is if there's a good way of extracting symbols and their values (for constants) in the build system before linking?

andrewrk · 2020-03-01T21:31:16Z

It's not possible by design to write a file in comptime code. I don't think that's needed anyway for this feature.

The feature might need access to metadata about symbols in object files, assembled assembly files, and compiled C files.
To do that the code will probably need to read ELF files and PE files.

markfirmware · 2020-05-06T01:34:28Z

@skyfex @andrewrk I have a project that generates the .ld file:

https://github.com/markfirmware/zig-vector-table/blob/c69f04caf843d9b7977980a5230f3ced415003dc/build.zig#L118-L202
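
For readers who do not want to follow the link, here is a rough sketch of what generating a `.ld` fragment from Zig data can look like. It is not taken from the linked project: the `Region` struct and `renderMemory` function are hypothetical names, and only the `MEMORY` command is rendered.

```zig
const std = @import("std");

// Hypothetical input record; attrs uses GNU ld attribute letters such as "rx".
const Region = struct {
    name: []const u8,
    attrs: []const u8,
    origin: u32,
    length: u32,
};

/// Renders a GNU ld MEMORY command for `regions` into `buf` and
/// returns the written slice. Fails if `buf` is too small.
fn renderMemory(buf: []u8, regions: []const Region) ![]const u8 {
    var used: usize = 0;
    used += (try std.fmt.bufPrint(buf[used..], "MEMORY\n{{\n", .{})).len;
    for (regions) |r| {
        used += (try std.fmt.bufPrint(
            buf[used..],
            "  {s} ({s}) : ORIGIN = 0x{x:0>8}, LENGTH = {d}\n",
            .{ r.name, r.attrs, r.origin, r.length },
        )).len;
    }
    used += (try std.fmt.bufPrint(buf[used..], "}}\n", .{})).len;
    return buf[0..used];
}
```

A build step could write the returned text to a file and pass it to the linker, which keeps the region data in Zig while still relying on the existing script support.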

markfirmware · 2020-05-27T16:47:36Z

@skyfex most recent: https://github.com/markfirmware/zig-vector-table/blob/master/linker.zig

skyfex · 2020-05-27T20:55:36Z

Excellent. I looked a bit at doing build scripts, but it seemed the documentation there was lacking so it was a bit difficult. This looks very much like what I was thinking.

What I would like is for the information used to generate the linker script to be available both at runtime and in the build script. But I suppose that's just a matter of having it in a shared zig file which is imported in both build.zig and the program itself?

I also looked at the source code of LLD to see if Zig could somehow direct linking without a linker script at all. It looks like it should be pretty simple, it's just a matter of setting up "OutputSections":

https://github.com/llvm-mirror/lld/blob/0f13b95a30df8808f4507259b45f4afa0b480d75/ELF/OutputSections.h

The tricky part is that some of the section attributes are defined as dynamic expressions, e.g.:
Expr addrExpr;

As far as I understand, this is because an output section can be located relative to the address and size of the sections that precede it, and their sizes depend on the number and size of the input sections they contain. The address is evaluated after all the sections have been set up.

So the problem of defining a format for a Zig linker script boils down to how you decide to generate those expressions.

Expr is defined here btw, and is just a function:

https://github.com/llvm-mirror/lld/blob/d51111a692e7ec957b91b9ba3ec626c54c5c50a5/ELF/LinkerScript.h

using Expr = std::function<ExprValue()>;

Where ExprValue is just a struct that evaluates to an address, possibly relative to a section and possibly with alignment.

Could a function in build.zig (or imported in build.zig) somehow be passed along to LLD and be assigned as an Expr function?

alexnask · 2020-05-27T21:26:19Z

> What I would like is for the information used to generate the linker script be available both at runtime and in the build script. But I suppose that's just a matter of having it in a shared zig file which is imported in both build.zig and the program itself?

I think the cleanest way to do this would be to use exe.addBuildOption(...) in your build file with the generated linker script, as well as outputting the .ld file.
The script will then be available in @import("build_options").

markfirmware · 2020-05-28T02:27:34Z

Whilst pondering your comments, I added the generated files to the repo for your perusal.

https://github.com/markfirmware/zig-vector-table/tree/master/generated/generated_linker_files

markfirmware · 2020-05-29T00:03:56Z

Regarding build_options, that might be a nice place to put things but they need to be primitive types - see #3127.

Regarding working more directly with lld, that will take more effort than I have available. It does raise the question, though, of how much of the linker machinery needs to be exposed to the bare metal programmer. They need to know flash vs ram, that code, read-only data and initial mutable data go in flash, that bss needs to be set to 0 in ram, and that there can be data in ram with undefined initial values. Then there is the root, the vector table, in flash that drags in everything else. These are abstractions that must be understood, and it might be possible to understand them without referring to a linker.

I don't think the .ARM.exidx section is needed - I need to study that further.

pixelherodev · 2020-05-29T02:08:52Z

Just a piece of data: custom linker scripts can be very useful for e.g. JITs on AMD64.

For instance, if I'm emulating a 32-bit system, and I can ensure that no host data (the application itself) resides at an address used by the system I'm emulating, I can map physical memory to the emulated device 1:1 and never have to translate addresses (which can significantly improve translation times).

ikskuh · 2020-05-29T08:56:33Z

There's a lot more you can do with linker scripts, even:

Embedded Resources
Encoding meta-data for plugin systems (just use a custom section)
Add application icons
Encode multiple entry points for non-standard OS (also via custom section)
Encode compatibility information (use ELF files for firmware updates and encode the compatible hardware platforms in custom sections)

markfirmware · 2020-06-02T04:28:13Z

Ok, so for some more complex use cases, linker scripts will still be valuable.

skyfex · 2020-06-02T07:44:28Z

I'll share what we use linker scripts for where I work. We design microcontrollers, and these are the features we use in our linker scripts:

Each product has its own linker script, which defines memory areas and uses an INCLUDE statement to include a shared linker script
The shared linker script lays out interrupt vectors, code/.text, C constructor/destructor functions, CRT stuff, and read-only data in flash memory
It also adds ARM-specific sections for exception table data (.ARM.extab) and stack unwinding (.ARM.exidx)
It lays out data which should be copied to RAM when the chip boots (using the AT instruction to store it in flash). Here we're using some alignment instructions to make sure certain sections are word-aligned
It lays out zero-initialised static memory, heap and stack. There are some arithmetic expressions and assertions to make sure heap and stack do not overlap
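
The heap/stack arithmetic and assertion in the last item could be expressed in Zig roughly as follows. This is only a sketch: `Range`, `Plan`, `planRam`, the RAM bounds and the sizes are all invented for illustration, and the static data size is passed in by hand because only the linker knows it in practice.

```zig
const std = @import("std");

const Range = struct { start: u32, end: u32 }; // end is exclusive
const Plan = struct { heap: Range, stack: Range };

// Rounds addr up to a power-of-two alignment.
fn alignUp(addr: u32, alignment: u32) u32 {
    std.debug.assert(alignment != 0 and (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/// Hypothetical RAM plan: static data at the bottom, heap right above it
/// (8-byte aligned), stack at the very top. Errors if heap and stack collide.
fn planRam(ram: Range, static_size: u32, heap_size: u32, stack_size: u32) error{Overlap}!Plan {
    if (stack_size > ram.end - ram.start) return error.Overlap;
    const heap_start = alignUp(ram.start + static_size, 8);
    const heap = Range{ .start = heap_start, .end = heap_start + heap_size };
    const stack = Range{ .start = ram.end - stack_size, .end = ram.end };
    if (heap.end > stack.start) return error.Overlap;
    return Plan{ .heap = heap, .stack = stack };
}

// Made-up 64 KiB RAM with made-up sizes.
const example_ram = Range{ .start = 0x2000_0000, .end = 0x2001_0000 };
pub const plan = planRam(example_ram, 0x1234, 16 * 1024, 4 * 1024) catch
    @compileError("heap and stack overlap");

comptime {
    // Force evaluation so an overlapping plan fails the build.
    _ = plan;
}
```

With these numbers the heap starts at 0x20001238 and the stack occupies the top 4 KiB, so the check passes; enlarging the heap to 60 KiB with an 8 KiB stack would make `planRam` return `error.Overlap`.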

There's a big use-case for both us and our customers for customising the linker script when it comes to laying out static content, heap and stack in memory. The RAM is divided into blocks. Each block can be shut off to conserve power, so you may want to avoid blocks if you're not using all of the RAM. Or you can locate temporary data in blocks that will be shut off in sleep mode. There's also a cross-bar between each peripheral with DMA and each RAM block, so if you make sure DMA buffers are located in a specific block not otherwise used by the CPU, you can guarantee that the peripheral does not compete with the CPU for access to RAM.

markfirmware · 2020-06-03T23:55:30Z

Thank you @skyfex.

I think the simple case at https://github.com/markfirmware/zig-vector-table/blob/master/generated/generated_linker_files/generated_linker_script.ld is consistent with a subset of your use case with the exception that there are no alignment directives. I need to check to see if the relevant input sections are already marked with alignment requirements that are then respected by the linker.

The articulation of memory into blocks for security, power and mutation is something I will study next.

The most complex memory structure in a zig linker script that I've seen is https://github.com/vegecode/BurnedHead/blob/master/firmware/linker.ld @vegecode

Diltsman · 2020-06-23T01:48:16Z

Don't forget the case where a data structure must be placed at a specific location. I generally have done this by placing the structure in a custom section and then using the linker script to place that section at the desired location.

ikskuh · 2020-07-21T19:43:30Z

For the record: if we want to get rid of linker scripts, we have to be able to model this in order to support embedded systems:

.binary : AT (0x1000) { *(.source) } >0x2000

This means: Collect all symbols from the section .source, assign them addresses starting at 0x2000 and store them at 0x1000.

So consider this code:

export var i: u32 linksection(".source") = 10; // initialized to 10 to put into .data
export var p: *u32 linksection(".source") = &i; // also initialized

so if we now link this with the script above considering little endian, we get the following result:

0x1000: 0x0A 0x00 0x00 0x00 # this is i
0x1004: 0x00 0x20 0x00 0x00 # this is p

&i will yield 0x2000, &p will yield 0x2004

So the linked address is different from the stored address, which is required for embedded systems to initialize the RAM at boot time. Zig code also needs to be able to query section boundaries; builtin functions would be useful here:

const sectionInfo = @section(".data");
@TypeOf(sectionInfo) == struct {
    name: []const u8,
    virtual_memory: []u8,
    stored_memory: []u8,
};
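
To make the stored-versus-linked distinction concrete, here is a sketch of the copy a startup routine would perform, assuming Zig 0.11 or later for the two-argument `@memcpy`. The `SectionInfo` struct mirrors the one proposed above, except that `stored_memory` is made read-only here; since `@section` does not exist, the slices are supplied by the caller.

```zig
const std = @import("std");

// Mirrors the proposed result of the hypothetical @section builtin.
const SectionInfo = struct {
    name: []const u8,
    virtual_memory: []u8, // where code expects the data at run time (e.g. 0x2000)
    stored_memory: []const u8, // where the image keeps the initial bytes (e.g. 0x1000)
};

/// Copies a section's initial contents from its stored location
/// to its run-time location, as boot code does for .data.
fn initSection(info: SectionInfo) void {
    std.debug.assert(info.virtual_memory.len == info.stored_memory.len);
    @memcpy(info.virtual_memory, info.stored_memory);
}
```

On real hardware the two slices would point at flash and RAM; in a hosted test they can simply be two arrays, with the stored bytes taken from the dump above.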

ikskuh · 2021-03-19T10:24:52Z

I made some concept art for a link.zig file:

https://gist.github.com/MasterQ32/65e7a158a600a94dad22ab5a15be080f#file-linkerscript-zig

The core idea here is that you can have an imperative way of controlling the linking. The script here displays the power to modify section data in the linking step and to compute things like checksums, and it also gives fine-grained control over load vs. virtual addresses.

We could implement linker scripts on top of that, or just offer a way to use Zig to declare the info for the linker.

My implementation idea would be to have a link_runner.zig, similar to the build runner, that is compiled into a specialized version of zld where the linking logic is declared by the file instead of using the default one.
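
As a rough illustration of the "modify section data and compute checksums" idea, and not the API from the gist, a link-time hook might look like the sketch below. The function name `patchChecksum` and the convention of reserving the last four bytes of the image are assumptions; `std.hash.Crc32` is the standard library's CRC-32, and the single-argument `@truncate` assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Hypothetical link-time hook: computes CRC-32 over everything except
/// the last four bytes of a section image, then stores the result in
/// those four bytes in little-endian order.
fn patchChecksum(image: []u8) error{TooSmall}!u32 {
    if (image.len < 4) return error.TooSmall;
    const payload = image[0 .. image.len - 4];
    const crc = std.hash.Crc32.hash(payload);
    const tail = image[image.len - 4 ..];
    // Written byte by byte so the result does not depend on host endianness.
    tail[0] = @truncate(crc);
    tail[1] = @truncate(crc >> 8);
    tail[2] = @truncate(crc >> 16);
    tail[3] = @truncate(crc >> 24);
    return crc;
}
```

A linker script cannot express this kind of computation, which is why it is usually done by a separate post-link tool today.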

markfirmware · 2021-04-06T19:03:19Z

What is zld? Is there a future for generated .ld files?

I have updated my approach to explicate input sections https://github.com/markfirmware/zig-vector-table/blob/a2c56a793792f6e078813107387fab389121e415/system_model.zig#L1-L7

I am still using simple declarative structures (the next step in complexity might be to use a builder pattern.) The only artificial non-zig steps that the embedded programmer must employ are to understand these compiler section names and add them to system_model.zig by trial and error. I am guessing that they are determined by llvm at this point.

andrewrk · 2021-04-06T19:04:47Z

Related to this issue, check out the readme of mold, specifically the part where it talks about linker scripts.

markfirmware · 2021-04-07T17:34:29Z

Thanks. I note from the linker scripts section:

> It looks like there are two things that truly cannot be done by a post-link editing tool: (a) mapping input sections to output sections, and (b) applying relocations.

This is mainly what my .ld generation does.
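
Point (a) can be pictured as a small lookup table. The sketch below is an assumption-laden illustration, not the `.ld` generator mentioned above: the `Rule` struct, the table contents and `outputSectionFor` are invented, and the matching imitates the common `*(.text .text.*)` pattern by accepting either the exact base name or the base name followed by a dot.

```zig
const std = @import("std");

const Rule = struct { input_base: []const u8, output: []const u8 };

// Illustrative table only; a real layout would list more sections.
const rules = [_]Rule{
    .{ .input_base = ".text", .output = ".text" },
    .{ .input_base = ".rodata", .output = ".rodata" },
    .{ .input_base = ".data", .output = ".data" },
    .{ .input_base = ".bss", .output = ".bss" },
};

// True for ".text" and ".text.foo", false for ".textual".
fn matches(input: []const u8, base: []const u8) bool {
    if (std.mem.eql(u8, input, base)) return true;
    return input.len > base.len and
        std.mem.startsWith(u8, input, base) and
        input[base.len] == '.';
}

/// Returns the output section for an input section name, or null
/// when no rule applies.
fn outputSectionFor(input: []const u8) ?[]const u8 {
    for (rules) |rule| {
        if (matches(input, rule.input_base)) return rule.output;
    }
    return null;
}
```

Returning null for unmatched names leaves the policy for orphan sections to the caller.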

matu3ba · 2023-07-19T19:06:48Z

I have collected the most notable linker scripts for kernels from FreeRTOS, Zephyr and Linux, plus a multi-target one, at ikskuh/link.zig-concept-art#1 (comment)

For reference:

1. vendor-provided PIC linker scripts with multi-arch support (license reproduced verbatim)
2. https://github.com/torvalds/linux/blob/ccff6d117d8dc8d8d86e8695a75e5f8b01e573bf/arch/x86/kernel/vmlinux.lds.S with macros, suggested as a realistic one on https://mcyoung.xyz/2021/06/01/linker-script/ (GPL2); you can see wildly different ones if you search via fd '.*.lds.S' in the Linux kernel code
3. https://github.com/tock/tock/blob/036ecda4a3cac4081e86f150c9891908c99d505e/boards/kernel_layout.ld, a simple and heavily commented one, suggested as a realistic one on https://mcyoung.xyz/2021/06/01/linker-script/ (Apache/MIT)
4. Zephyr has nice common linker scripts for language-specific bits, including logging stuff etc., here: https://github.com/zephyrproject-rtos/zephyr/tree/20e7c6db6cc29984b653c7bb103ed1cee8cc24b7/include/zephyr/linker/common-rom
5. FreeRTOS has nice abstractions over linkers, see http://freertoshal.github.io/doxygen/group__LINKER.html, and see for example this one, which uses assertions heavily: https://github.com/FreeRTOS/FreeRTOS/blob/e39bb188dd35b58734f0f38b281ed44051dbc4b6/FreeRTOS/Demo/CORTEX_A9_Cyclone_V_SoC_DK/cycloneV-dk-oc-ram.ld#L4 (license reproduced verbatim)

@markfirmware I do see that you do not generate the linker script anymore in commit markfirmware/zig-vector-table@797e17d. What technical reasons did you have for not generating the linker script anymore?

71GA · 2024-01-25T13:23:06Z

In embedded, we deal with microcontrollers (MCU) which embed microprocessors (MCPU). So, when we say ARM Cortex-M4 MCU we actually mean an MCU which embeds one variant of ARM Cortex-M4 MCPU. It could embed any other ARM MCPU or RISC-V MCPU...

Everything starts when you power on the device! The MCPU (ARM Cortex-M4 for example) turns on and spends some meaningless cycles preparing to interpret ARM instructions, and then it starts interpreting them... When it starts interpreting, we first feed it our custom ARM Cortex-M4 "startup code" (written using "unified ARM assembly" and the "ARM instruction set" supported by the MCPU).

But this "startup code" is like a punch card! If the holes (individual ARM instructions from the supported ARM instruction set) are at the wrong positions (memory locations) it will crash! This is what the linker script has to solve. It enables programmers to use labels inside the "startup code", and it lets us decide at which memory location to position a single ARM instruction or several of them.

If this is not possible in Zig, then it cannot be used on embedded without an OS (baremetal). It could still be used on embedded with an OS (which has its own "startup code" that it feeds to the MCPU at startup).

andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 10, 2019

andrewrk added this to the 0.6.0 milestone Sep 10, 2019

andrewrk modified the milestones: 0.6.0, 0.7.0 Feb 13, 2020

andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 27, 2020

ikskuh mentioned this issue Nov 10, 2020

Allow extern variables to volatile #7052

Open

ghost mentioned this issue Dec 27, 2020

Native Assembler: Improvements, Tweaks, Enhancements #7561

Open

andrewrk mentioned this issue May 9, 2021

completely eliminate dependency on LLD #8726

Open

4 tasks

andrewrk removed this from the 0.8.0 milestone May 19, 2021

andrewrk added this to the 0.9.0 milestone May 19, 2021

wizzard0 mentioned this issue Oct 23, 2021

Debug programs built by zig cannot be loaded into Instruments.app (Zig programs built for macOS do not have entitlements / proper bundling) #9977

Closed

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

skyfex changed the title ~~Linker scripts must die~~ Zig-based alternative to linker scripts Aug 23, 2022

andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023

andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023

ikskuh mentioned this issue Sep 21, 2023

Tracking Issue: Problems with Zig ZigEmbeddedGroup/microzig#143

Open

andrewrk mentioned this issue Jan 8, 2024

Entrypoint is now required to be _start on thumb-freestanding builds? #18482

Closed
