[FR] Marlin ARM binary size reduction (with proposed solution) #19427

Closed
pmjdebruijn opened this issue Sep 17, 2020 · 8 comments
Labels
C: Build / Toolchain T: Feature Request Features requested by users.

Comments

@pmjdebruijn
Contributor

Description

The size of an ARM compiled binary tends to be fairly large, especially for a full featured Marlin build.

Marlin 2.0.6.1 for an SKR Mini E3 builds to roughly 261K (with additional features enabled), which by itself doesn't fit into the 256K of flash officially supported on the STM32F103RC, let alone once the 28K bootloader at the start and the 4K of EEPROM emulation at the end are accounted for.
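
The flash budget described here can be checked with a few lines of arithmetic. This is only a sketch: it assumes "K" means KiB (1024 bytes) and takes the "roughly 261K" figure at face value, and the function name is made up for illustration.

```python
def usable_flash_kib(total_kib, bootloader_kib, eeprom_kib):
    """Flash left for the firmware once bootloader and EEPROM emulation are reserved."""
    return total_kib - bootloader_kib - eeprom_kib

# Figures from the post; "K" is assumed to mean KiB.
usable = usable_flash_kib(total_kib=256, bootloader_kib=28, eeprom_kib=4)
image_kib = 261  # "roughly 261K", taken at face value
overflow = image_kib - usable

print(f"usable flash: {usable} KiB")
print(f"image size:   {image_kib} KiB")
print(f"overflow:     {overflow} KiB")
```

Under those assumptions the firmware has 224 KiB to work with, so the build needs to shrink by a few tens of KiB to fit in the officially supported flash.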

While there's a 512K mode for the STM32F103RC chip, which mitigates this a little, it essentially means we're using an area of the flash that the vendor does not warrant to be properly functional, even if it seems to be. In practical terms, even if there are problems in that second 256K block of flash, as long as the last 4K and the first few tens of K are fine, most people won't have issues with Marlin. Still, it's not a particularly great solution.

So initially I was looking for compiler flags to reduce the binary size, but quickly found out that there were few gains to be made. At the bottom of this article, however, there was a little gem:
https://thborges.github.io/blog/marlin/2019/01/07/reducing-marlin-binary-size.html

Proposed Solution

I've tested this specifically for my SKR Mini E3 V1.2, like so:

echo 'Import("env")'                                                      > buildroot/share/PlatformIO/scripts/nanolib.py
echo 'env.Append(LINKFLAGS=["--specs=nano.specs"])'                      >> buildroot/share/PlatformIO/scripts/nanolib.py
sed -i 's@  buildroot/share/PlatformIO/scripts/STM32F103RC_SKR_MINI.py@&\n  buildroot/share/PlatformIO/scripts/nanolib.py@' platformio.ini
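
For readers who prefer to create the file by hand, the two echo commands above write a two-line PlatformIO extra script. A slightly more defensive version might look like the sketch below. The helper name add_nano_specs and the NameError guard are additions for illustration, not part of the original proposal; Import and env are provided by PlatformIO/SCons when the script runs as part of a build.

```python
# Sketch of buildroot/share/PlatformIO/scripts/nanolib.py
NANO_SPECS_FLAG = "--specs=nano.specs"

def add_nano_specs(env):
    """Append the newlib-nano specs file to the linker flags of a build environment."""
    env.Append(LINKFLAGS=[NANO_SPECS_FLAG])

try:
    # Import() is injected by SCons when PlatformIO runs an extra script.
    Import("env")  # noqa: F821
except NameError:
    pass  # not running under PlatformIO, e.g. when imported for a quick check
else:
    add_nano_specs(env)  # noqa: F821
```

The sed command then adds the script's path to platformio.ini on the line after the existing STM32F103RC_SKR_MINI.py entry, so that it runs for that build environment.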

This reduces the resulting binary by several tens of K, so a featureful Marlin build can easily fit in 256K of flash with bootloader and EEPROM emulation accounted for.

And it seems to work just fine: I've completed a few prints already, though broader testing is still needed.

Additional Info

Looking a bit further, it may not be that surprising that it works as seemingly flawlessly as it does, given that newlib-nano is a blend of two libraries that Marlin already uses separately, namely newlib and avr-libc:

https://keithp.com/newlib-nano/

Here's a library you can use when developing a system using a 32-bit processor with only a few kB of memory. You don't need an allocator, and you can still have stdio to a console and even other devices. This is a fork of newlib, with the stdio bits replaced with the stdio bits from avr-libc.

I would love to hear your thoughts on newlib-nano, and more specifically on considering it as the new (future) standard libc for ARM boards, as it may benefit boards other than the SKR Mini E3 as well. And if all goes well (fingers crossed), it might have few, if any, tangible disadvantages.

@pmjdebruijn pmjdebruijn added the T: Feature Request Features requested by users. label Sep 17, 2020
@thomas374b

thomas374b commented Sep 18, 2020

Marlin is already very good at sorting out unused code. But sometimes you can squeeze out a few more kB if you remove some (unused) object files from the linker statement. You may also gain some bytes if you tell the linker to remove debugging information. Put this in the linker script:

/* Remove information from the standard libraries */
/DISCARD/ :
{
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
}

A few bytes could also be gained by removing arguments to function calls and using global variables instead. But this may make the code less readable and more difficult to maintain.

I found out that -mthumb and -flto (link time optimization) need to be given to both compilation and linking.
Optimizing for size (-Os) is best for both code size and RAM usage. gcc should be used for linking, not ld; this affects optimization and command-line syntax. When I tried to optimize code size I ended up with the following set of flags.

CFLAGS
-ffunction-sections -fdata-sections -mthumb -fsingle-precision-constant -fmerge-all-constants --specs=nano.specs --specs=nosys.specs -falign-labels=4 -falign-jumps=4 -falign-functions=4 -mtune=cortex-m3 -fno-non-call-exceptions -ffreestanding -finline-small-functions -findirect-inlining -Os -mcpu=cortex-m3

CXXFLAGS
-std=gnu++17 -fno-rtti -fno-exceptions -fno-use-cxa-atexit -fno-common -fno-threadsafe-statics

LDFLAGS
-mthumb -flto -u_printf_float -Wl,--gc-sections,-Map,Marlin.map,--cref,--check-sections,--unresolved-symbols=report-all,--warn-common,--relax -TLPC1768.ld --specs=nano.specs --specs=nosys.specs -static -Wl,--start-group -lstdc++ -lgcc -lc -lm -Wl,--end-group
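
In a PlatformIO build, one way to make sure the same flags reach both the compile and the link step is an extra script along the lines of the sketch below. The function names are hypothetical, and the use of CCFLAGS (the SCons variable for options passed to both the C and C++ compilers) is an assumption about how one would wire this into Marlin's build rather than something taken from this thread. The small checker simply reports which required flags are absent from a flag string.

```python
# Flags that, per the comment above, must be given to both compiler and linker.
SHARED_FLAGS = ["-mthumb", "-flto"]

def apply_shared_flags(env, flags=SHARED_FLAGS):
    """Append the same flags to the compile (CCFLAGS) and link (LINKFLAGS) settings."""
    env.Append(CCFLAGS=list(flags), LINKFLAGS=list(flags))

def missing_from(flag_string, required=SHARED_FLAGS):
    """Return the required flags that do not appear in a space-separated flag string."""
    present = set(flag_string.split())
    return [flag for flag in required if flag not in present]

try:
    # Import() is injected by SCons when PlatformIO runs an extra script.
    Import("env")  # noqa: F821
except NameError:
    pass  # not running under PlatformIO
else:
    apply_shared_flags(env)  # noqa: F821

# Quick check against shortened versions of the flag lists above.
print(missing_from("-ffunction-sections -fdata-sections -mthumb -Os -mcpu=cortex-m3"))
print(missing_from("-mthumb -flto -u_printf_float --specs=nano.specs"))
```

Running the checker over a full flag list before a build is a cheap way to catch a flag that was added to only one of the two steps.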

IMPORTANT

  • Avoid -funwind-tables and -mpoke-function-name; those create more debugging symbols.

  • Using -fshort-enums or -funsigned-bitfields, or aligning to something smaller than 32 bits, will not help reduce the code size.

  • Do not use either of -mint8 or -fsigned-char; those can render the binary unusable. A full review of all used int types is needed to avoid overflows or comparison mismatches.

  • Be careful that you don't optimize away the constructors of statically instantiated classes when you play with the linker script. Gcc strictly removes anything that is not referenced. The result will be smaller, but nothing will work anymore. I had a hard time figuring out what went wrong.

I was using gcc version 9.3.1 and building for BigTreeTech SKR v1.4 Turbo with LPC1769.

@pmjdebruijn
Contributor Author

Thanks for adding that...

To be more specific regarding newlib vs newlib-nano: with my custom but otherwise identical Marlin configuration, the switch results in a size reduction of 261316 - 199072 = 62244 bytes.
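
To put those byte counts in proportion, the saving can be expressed in KiB and as a percentage of the original image. A minimal sketch, using only the two sizes quoted above (the function name is made up for illustration):

```python
def size_reduction(before_bytes, after_bytes):
    """Return (bytes saved, percent saved relative to the original size)."""
    saved = before_bytes - after_bytes
    return saved, 100.0 * saved / before_bytes

saved, percent = size_reduction(261316, 199072)
print(f"saved {saved} bytes ({saved / 1024:.1f} KiB), {percent:.1f}% smaller")
# prints: saved 62244 bytes (60.8 KiB), 23.8% smaller
```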

@jmz52
Contributor

jmz52 commented Sep 29, 2020

It's not really a Marlin issue but a libmaple one (libmaple is used for HAL STM32F1).
Switch to HAL STM32 and you'll get about a 30% size reduction.
I was able to fit Delta math with a graphics UI into the 128KB of an STM32F103RBT6.

@pmjdebruijn
Copy link
Contributor Author

@jmz52 unless I'm looking at the wrong thing, it's the same difference: it seems HAL STM32 uses newlib-nano by default.

@sjasonsmith
Contributor

We are beginning to transition STM32F1 boards over to the STM32 HAL. We are very near feature parity between the two, and two MKS Robin boards have environments available for both HALs.

It likely won't be worth investing effort into improving the Maple-based STM32F1 builds: our intent is to discontinue use of that framework, since it is deprecated by PlatformIO.

@Foxies-CSTL
Contributor

We are very near feature parity between the two, and two MKS Robin boards have environments available for both HALs.

Hi,
I also switched the environment of FLSun's "Hispeed" board (an evolved clone of the MKS Robin_mini) from STM32F1 to the STM32 HAL. It was a success: it allowed me to remove the "add_nanolib.py" script and to have an environment identical to that of the MKS_Robin_nano board. My binaries compile without errors in this environment when I use the Marlin UI (TFT MKS 32), but I have some problems when I select the Classic UI ("XPT2046 :: Init ()").
There you go, if that helps.

@sjasonsmith
Contributor

We just updated to use a new Maple version which reduces FLASH usage by ~30kB, similar to this recommendation.

At this point we don't want to invest additional effort into more Maple improvements. We need to focus on migrating our STM32F1 boards into HAL/STM32, so that the Maple dependency can be eliminated entirely.

For this reason I'm going to close this out, even if there might be some potential for further improvements.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 31, 2021