Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: XCHAL_HAVE_ADDX config is virtually ineffective against GCC 10.1 or later #19

Merged
merged 2 commits into from
Dec 8, 2020
Merged

Fix: XCHAL_HAVE_ADDX config is virtually ineffective against GCC 10.1 or later #19

merged 2 commits into from
Dec 8, 2020

Conversation

jjsuwa-sys3175
Copy link

@earlephilhower
Copy link
Owner

This looks very nice, thanks! Did you build GCC with this machine definition change and verify it works, or do you need me to run a build pass on my server and try some trivial examples?

@jjsuwa-sys3175
Copy link
Author

Did you build GCC with this machine definition change and verify it works, or do you need me to run a build pass on my server and try some trivial examples?

partially built (C compiler only, with neither C++ nor libraries), and the example:

/* ADDX[248]/SUBX[248] */
unsigned int test_addx2(unsigned int a) {
    return a * 3;
}
unsigned int test_addx4(unsigned int a) {
    return a * 5;
}
unsigned int test_addx8(unsigned int a) {
    return a * 9;
}
unsigned int test_subx2(unsigned int a, unsigned int b) {
    return a * 2 - b;
}
unsigned int test_subx4(unsigned int a, unsigned int b) {
    return a * 4 - b;
}
unsigned int test_subx8(unsigned int a) {
    return a * 7;
}

will be compiled to:

	.file	"test.c"
	.text
	.literal_position
	.align	4
	.global	test_addx2
	.type	test_addx2, @function
test_addx2:
	addx2	a2, a2, a2
	ret.n
	.size	test_addx2, .-test_addx2
	.literal_position
	.align	4
	.global	test_addx4
	.type	test_addx4, @function
test_addx4:
	addx4	a2, a2, a2
	ret.n
	.size	test_addx4, .-test_addx4
	.literal_position
	.align	4
	.global	test_addx8
	.type	test_addx8, @function
test_addx8:
	addx8	a2, a2, a2
	ret.n
	.size	test_addx8, .-test_addx8
	.literal_position
	.align	4
	.global	test_subx2
	.type	test_subx2, @function
test_subx2:
	subx2	a2, a2, a3
	ret.n
	.size	test_subx2, .-test_subx2
	.literal_position
	.align	4
	.global	test_subx4
	.type	test_subx4, @function
test_subx4:
	subx4	a2, a2, a3
	ret.n
	.size	test_subx4, .-test_subx4
	.literal_position
	.align	4
	.global	test_subx8
	.type	test_subx8, @function
test_subx8:
	subx8	a2, a2, a2
	ret.n
	.size	test_subx8, .-test_subx8
	.ident	"GCC: (GNU) 10.2.0"

[cmdline: ./gcc/xgcc -B./gcc/ -Os -fno-inline-functions -mlongcalls -mtext-section-literals -Wextra -S test.c ]

@earlephilhower
Copy link
Owner

earlephilhower commented Dec 8, 2020

-edit- Got the labels backwards. d'oh!

I just applied all 3 of your patches and rebuilt the chain, then did testing using my audio library (MP3 decoder).

Patched (all 3) Gcc10.2 + newlib 4.0:
Orig Gcc10.2 + newlib 4.0:

Executable segment sizes:
IROM   : 413020          - code in flash         (default or ICACHE_FLASH_ATTR) 
IRAM   : 28437   / 32768 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...) 
DATA   : 1612  )         - initialized variables (global, static) in RAM/HEAP 
RODATA : 1420  ) / 81920 - constants             (global, static) in RAM/HEAP 
BSS    : 25888 )         - zeroed variables      (global, static) in RAM/HEAP 

Orig Gcc10.2 + newlib 4.0:
Patched (all 3) Gcc10.2 + newlib 4.0:

Executable segment sizes:
IROM   : 414060          - code in flash         (default or ICACHE_FLASH_ATTR) 
IRAM   : 28437   / 32768 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...) 
DATA   : 1612  )         - initialized variables (global, static) in RAM/HEAP 
RODATA : 1420  ) / 81920 - constants             (global, static) in RAM/HEAP 
BSS    : 25888 )         - zeroed variables      (global, static) in RAM/HEAP 

While it's not dramatic, it is a decrease and I'm not finding any problems. Than you for the patches! I'll apply them w/the newlib 4 PR.

@earlephilhower earlephilhower merged commit d753ad6 into earlephilhower:master Dec 8, 2020
@jjsuwa-sys3175
Copy link
Author

Thanks for merging @earlephilhower, i'll download that.

FYI: adding -free -fipa-pta to GCC options for the building target (Xtensa) saves a bit of IROM size: see arendst/Tasmota#9749 and arendst/Tasmota@e7cff92.


 # Environment variables for configure and building targets.  Only use $(call setenv,$@)
 ifeq ($(LTO),true)
-    CFFT := "-mlongcalls -flto -Wl,-flto -Os -g"
+    CFFT := "-mlongcalls -flto -Wl,-flto -Os -g -free -fipa-pta"
 else ifeq ($(LTO),false)
-    CFFT := "-mlongcalls -Os -g"
+    CFFT := "-mlongcalls -Os -g -free -fipa-pta"
 else
     $(error Need to specify LTO={true,false} on the command line)
 endif
-compiler.c.flags=-c {compiler.warning_flags} -std=gnu17 {build.stacksmash_flags} -Os -g -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -falign-functions=4 -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.waveform} {build.mmuflags} {build.non32xferflags}
+compiler.c.flags=-c {compiler.warning_flags} -std=gnu17 {build.stacksmash_flags} -Os -g -free -fipa-pta -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -falign-functions=4 -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.waveform} {build.mmuflags} {build.non32xferflags}
-compiler.cpp.flags=-c {compiler.warning_flags} {build.stacksmash_flags} -Os -g -mlongcalls -mtext-section-literals -fno-rtti -falign-functions=4 {build.stdcpp_level} -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.waveform} {build.mmuflags} {build.non32xferflags}
+compiler.cpp.flags=-c {compiler.warning_flags} {build.stacksmash_flags} -Os -g -free -fipa-pta -mlongcalls -mtext-section-literals -fno-rtti -falign-functions=4 {build.stdcpp_level} -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.waveform} {build.mmuflags} {build.non32xferflags}

@jjsuwa-sys3175 jjsuwa-sys3175 deleted the repair-TARGET_ADDX branch December 8, 2020 19:10
@igrr
Copy link

igrr commented Dec 8, 2020

@jcmvbkbc do you think this change should be applied upstream?

@jcmvbkbc
Copy link

jcmvbkbc commented Dec 9, 2020

@igrr I guess. I'm also curious what happened that the optimization that worked in 4.8 doesn't work any more, I'll have a look.

@jcmvbkbc
Copy link

Done.
I've fixed the original predicates and patterns instead of duplicating them.

@jjsuwa-sys3175
Copy link
Author

thx for upstreaming @jcmvbkbc.

ps. how do you like the other two; #20 and #21?

@jcmvbkbc
Copy link

@jjsuwa-sys3175 Let me take a look. Also feel free to ping/cc me on future xtensa port improvements.

jjsuwa-sys3175 added a commit to jjsuwa-sys3175/esp-quick-toolchain that referenced this pull request Dec 21, 2020
this PR supersedes earlephilhower#19, earlephilhower#20, earlephilhower#21 and earlephilhower#25, by backporting from upstream:

06ff8708f0b834cf1b35afa46113c6c973905cad "gcc: xtensa: fix PR target/98285"
64a54505ec8249178b9767d1420354f8eb55de50 "gcc: xtensa: rearrange DI mode constant loading"
40bf68bbe0bdba305fde4ab825a06c085ba486fc "gcc: xtensa: add optimizations for shift operations"
18e86fae2a14f78e70aae06afce6bb9853068bb1 "gcc: xtensa: implement bswapsi2, bswapdi2 and helpers"
jcs pushed a commit to jcs/esp-quick-toolchain that referenced this pull request Jun 19, 2021
this PR supersedes earlephilhower#19, earlephilhower#20, earlephilhower#21 and earlephilhower#25, by backporting from upstream:

06ff8708f0b834cf1b35afa46113c6c973905cad "gcc: xtensa: fix PR target/98285"
64a54505ec8249178b9767d1420354f8eb55de50 "gcc: xtensa: rearrange DI mode constant loading"
40bf68bbe0bdba305fde4ab825a06c085ba486fc "gcc: xtensa: add optimizations for shift operations"
18e86fae2a14f78e70aae06afce6bb9853068bb1 "gcc: xtensa: implement bswapsi2, bswapdi2 and helpers"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants