Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: both S and M in .section flags #19

Open
mcspr opened this issue Aug 18, 2021 · 28 comments
Open

Q: both S and M in .section flags #19

mcspr opened this issue Aug 18, 2021 · 28 comments

Comments

@mcspr
Copy link

mcspr commented Aug 18, 2021

Remembering not easily reproducible issue from back in April:
https://gitter.im/esp8266/Arduino?at=606c9a5c0147fb05c5de1d59

I noticed these flags after the section name string in the PSTR(...) macro:

#define PSTRN(s,n) (__extension__({static const char __pstr__[] __attribute__((__aligned__(n))) __attribute__((section( "\".irom0.pstr." __FILE__ "." __STRINGIZE(__LINE__) "." __STRINGIZE(__COUNTER__) "\", \"aSM\", @progbits, 1 #"))) = (s); &__pstr__[0];}))

Per https://sourceware.org/binutils/docs/as/Section.html#index-Section-Stack-3

For sections with both M and S, a string which is a suffix of a larger string is considered a duplicate. Thus "def" will be merged with "abcdef"; A reference to the first "def" will be changed to a reference to "abcdef"+3.

If such offset had happened on pstr... what does that look like for const char[] arrays used across the app? Is it an actually desired behaviour since the resulting pointer would (?) not be aligned?


Some extra info, sorry for blowing the text size :)

I still have the offending .bin & .elf generated back in april and still with gcc10.2 timestamped 201223, but not the rest of the build directory including the suggested .map file and neither the (possibly broken?) sources that generated it. What I meant it looked like:

4026fa70 <_mqttInitCommands()::__pstr__>:
4026fa70:   514d 5454 492e 464e 004f 0000               MQTT.INFO...

However, dumping the code with current toolchain & code:
(with only change being the namespace { ... } surrounding the function, and looking at objdump it inlined the function into the one it is called from)

40282678 <(anonymous namespace)::_mqttInitCommands()::__pstr__>:
40282678:   514d 5454 492e 464e 004f 0000               MQTT.INFO...

40282684 <(anonymous namespace)::_mqttInitCommands()::__pstr__>:
40282684:   514d 5454 522e 5345 5445 0000               MQTT.RESET..

And, there are some instances where the 'merging' seems to happen on strings like F("FACTORY.RESET") e.g. settingsInitCommands() here uses it, and terminal function contains F("RESET"):

4028187c <(anonymous namespace)::_settingsInitCommands()::__pstr__>:
4028187c:   4146 5443 524f 2e59                         FACTORY.

40281884 <(anonymous namespace)::_terminalInitCommands()::__pstr__>:
40281884:   4552 4553 0054 0000                         RESET...

asm code for the April build _mqttInitCommands() is slightly different as well, but I am not sure what to look at there for

Since then, I was not really been able to reproduce this issue (reliably, or even at all, and do not have some kind of small example that could showcase the issue), but I wonder if the 'merge' flag somehow messed with the pointers. Or it is something related to gcc / binutils / different order of .o files when linking / etc., and it was silently resolved due to updates.

@earlephilhower
Copy link
Owner

Hmmm...I'm not sure I get the nature of the problem, sorry.

The string coalescing will definitely generate non-32b aligned PSTRs, but all the PSTR code does support non-32b starting strings. I don't think this would affect const char []s in the code because those would be copied to RAM anyway as part of .rodata anyway (and PSTRs aren't in rodata section so can't be merged w/non PSTRs).

@mcspr
Copy link
Author

mcspr commented Aug 18, 2021

Yeah, sorry for too early of an issue as this is very likely not the case. Experimenting with suffix-merging some strings, even having a standalone one that would match the suffix, has everything working correctly and alignment also works itself out when strings are on a 4 byte boundary. And I think I missed the part that these are C strings with a '\0' as part of the blob.

While this is not technically a newlib problem... revisiting the broken .elf once more, it has some weird structure to the generated function right where the PSTR(...) happened to be. The missing piece does some arithmetic at the beginning which causes an exception where the address became 0x0f4e0000. The correctly working one simply loads 2 pointers to the lambda and the __pstr__, and also what I see with the current gcc10.3 toolchain.

void _mqttInitCommands() {
    terminalRegisterCommand(F("MQTT.RESET"), [](const terminal::CommandContext&) { ... });
    terminalRegisterCommand(F("MQTT.INFO"), [](const terminal::CommandContext&) { ... });
(gdb) disassemble 0x402083a4
Dump of assembler code for function _mqttInitCommands():
   0x402083a4 <+0>:     l32r    a3, 0x40208398
   0x402083a7 <+3>:     movi    a2, 0x7a7
   0x402083aa <+6>:     addi    a1, a1, -16
   0x402083ad <+9>:     slli    a2, a2, 17
   0x402083b0 <+12>:    s32i    a0, a1, 12
   0x402083b3 <+15>:    call0   0x40213488 <terminalRegisterCommand(__FlashStringHelper const*, void (*)(terminal::CommandContext const&))>
   0x402083b6 <+18>:    l32r    a3, 0x4020839c
   0x402083b9 <+21>:    l32r    a2, 0x402083a0
   0x402083bc <+24>:    call0   0x40213488 <terminalRegisterCommand(__FlashStringHelper const*, void (*)(terminal::CommandContext const&))>
   0x402083bf <+27>:    l32i    a0, a1, 12
   0x402083c2 <+30>:    addi    a1, a1, 16
   0x402083c5 <+33>:    ret.n
End of assembler dump.
(gdb) x (*0x4020839c)
0x40208380 <_FUN(terminal::CommandContext const&)>:     0x02f0c112
(gdb) p (const char*)(*0x402083a0)
$1 = 0x4026fa70 <_mqttInitCommands()::__pstr__> "MQTT.INFO"
(gdb) x (*0x40208398)
0x40209c70 <_FUN(terminal::CommandContext const&)>:     0x02f0c112

@mcspr
Copy link
Author

mcspr commented Aug 19, 2021

Also, I've noticed this patch at the gcc repo which kind of looks like the resulting code where l32r is supposed to be replaced by the movi + shift:
https://github.com/earlephilhower/esp-quick-toolchain/blob/master/patches/gcc10.2/gcc-xtensa-rearrange-DI-mode-constant-loading.patch

But, without a reproducible example, idk if I'd claim this a toolchain bug just yet.

@earlephilhower
Copy link
Owner

FWIW that patch also exists in the 10.3 toolchain. IIRC it was changes to reduce generated code size for some subset of constant values.

@mcspr
Copy link
Author

mcspr commented Aug 21, 2021

There's also esp8266/Arduino#8189 (comment) which mentioned semi-random issues, but the report got lost in the other discussion thread. There's a full build directory, too, which I grabbed and tried to compare certain pieces of code. Like, there is similar excvaddr=0x13840000 which seems to come from the similar instruction:

Reading symbols from ESP_Easy_mega_20210624_normal_300_ESP8266_4M1M.elf...
(gdb) disassemble _Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String
...
   0x4021d406 <+18>:    bgeu    a3, a2, 0x4021d40c <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+24>
   0x4021d409 <+21>:    j       0x4021d538 <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+324>
   0x4021d40c <+24>:    movi    a3, 0x4e1
   0x4021d40f <+27>:    slli    a3, a3, 18
   0x4021d412 <+30>:    addx4   a2, a2, a3
   0x4021d415 <+33>:    l32i.n  a2, a2, 0
   0x4021d417 <+35>:    jx      a2
...

But this piece does not look like this when I try building it with a clean directory:

(gdb) disassemble _Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String
...
   0x40216336 <+18>:    bgeu    a3, a2, 0x4021633c <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+24>
   0x40216339 <+21>:    j       0x40216468 <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+324>
   0x4021633c <+24>:    l32r    a3, 0x40216310
   0x4021633f <+27>:    addx4   a2, a2, a3
   0x40216342 <+30>:    l32i.n  a2, a2, 0
   0x40216344 <+32>:    jx      a2
...

(but idk how to continue inspecting, since 0x40216310 seems to point to some place inside of the function itself and it does not seem to do so in the calculation above with movi and resolves to some pre-linking address instead)

And the symptoms sound very similar:

cc @jjsuwa-sys3175 nonetheless, maybe the above makes some sense

@mcspr
Copy link
Author

mcspr commented Aug 22, 2021

Another interesting thing, which really points to the toolchain. Running the following script on Windows:
https://gist.github.com/mcspr/a433474aa40ab0f6433f9240455fd624#file-trying-py
That generates semi-random functions that accept pointers as arguments
(NOTICE file is created in the same directory as the script)

With xtensa-lx106-elf-g++ binary in PATH:

PS > py -3 .\trying.py
xtensa-lx106-elf-g++ (GCC) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


[7/1024] (766)

_Z11function171v:
1018:   movi    a4, 0x705
1019:   slli    a4, a4, 16

The same script running on a WSL Linux instance does nothing all of 1024 iterations.

edit: After failing once, script has an added --existing option to retry the build on the same test.cpp file. It can sometimes succeed, but most of the time it loses the same exact function pointer and replaces it with something random (...and it also has a different number for each compilation failure)
edit2: Script also changed to only generate a single function, and removed almost everything but the function definitions + call sites.

@jjsuwa-sys3175
Copy link

@mcspr thanks for quick test script

first, fetched from https://github.com/earlephilhower/esp-quick-toolchain, and locally-built on debootstraped x86_64 Debian Buster:

pi@debootstrap:/tmp/artifact$ python trying.py --cxx bin/xtensa-lx106-elf-g++
Using built-in specs.
COLLECT_GCC=bin/xtensa-lx106-elf-g++
COLLECT_LTO_WRAPPER=/tmp/artifact/libexec/gcc/xtensa-lx106-elf/10.3.0/lto-wrapper
Target: xtensa-lx106-elf
Configured with: ../configure --prefix=/tmp/artifact --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.0 (GCC) 

[1023/1024] (745))

ok, that seems no flaws.

next, downloaded 2 pre-built toolchains from https://github.com/esp8266/Arduino, "i686-mingw32" and "x86_64-mingw32":
(download locations:
https://github.com/earlephilhower/esp-quick-toolchain/releases/download/3.0.4-gcc10.3/i686-w64-mingw32.xtensa-lx106-elf-1757bed.210717.zip
https://github.com/earlephilhower/esp-quick-toolchain/releases/download/3.0.4-gcc10.3/x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210717.zip)

"i686-mingw32"

C:\Users\Administrator>python Z:\trying.py --cxx "C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe"
Using built-in specs.
COLLECT_GCC=C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe
COLLECT_LTO_WRAPPER=c:/program\ files\ (x86)/arduino/hardware/esp8266com/esp8266/tools/xtensa-lx106-elf/bin/../libexec/gcc/xtensa-lx106-elf/10.3.0/lto-wrapper.exe
Target: xtensa-lx106-elf
Configured with: /workdir/repo/gcc-gnu/configure --prefix=/workdir/xtensa-lx106-elf.win32 --build=x86_64-linux-gnu --host=i686-w64-mingw32 --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.0 (GCC)

[1023/1024] (461)

"i686-mingw32" seems no problem.

"x86_64-mingw32"

C:\Users\Administrator>python Z:\trying.py --cxx "C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe"
Using built-in specs.
COLLECT_GCC=C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe
COLLECT_LTO_WRAPPER=c:/program\ files\ (x86)/arduino/hardware/esp8266com/esp8266/tools/xtensa-lx106-elf/bin/../libexec/gcc/xtensa-lx106-elf/10.3.0/lto-wrapper.exe
Target: xtensa-lx106-elf
Configured with: /workdir/repo/gcc-gnu/configure --prefix=/workdir/xtensa-lx106-elf.win64 --build=x86_64-linux-gnu --host=x86_64-w64-mingw32 --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.0 (GCC)

[2/1024] (439)

_Z8functionv:
test.S:2066:    movi    a4, 0x43b
test.S:2068:    slli    a4, a4, 16

whoa!

what is the differences between?

@jjsuwa-sys3175
Copy link

we have the big question: "what's happening?"
ok, we do quick'n'dirty but effective probe insertion - "printf debugging" :)

place this as "gcc-xtensa-rearrange-DI-mode-constant-loading-debug-probe.patch" on /patches/gcc10.3/. and then rebuild the toolchain (especially for "x86_64-mingw32").

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index e56083e29..29bf38293 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1083,6 +1083,11 @@ xtensa_emit_move_sequence (rtx *operands, machine_mode mode)
 		{
 		  emit_move_insn (dst, GEN_INT (srcval >> shift));
 		  emit_insn (gen_ashlsi3_internal (dst, dst, GEN_INT (shift)));
+
+		  fprintf (stderr, "[xtensa_emit_move_sequence @ %s()] '" HOST_WIDE_INT_PRINT_DEC "[" HOST_WIDE_INT_PRINT_HEX "]' => '%d[0x%x] << %d'\n",
+			   current_function_name (),
+			   srcval, srcval, (int)(srcval >> shift), (int)(srcval >> shift), shift);
+
 		  return 1;
 		}
 	    }

[example]

pi@debootstrap:/tmp/artifact$ cat test.c
extern int foo(unsigned int x);

int test(void) {
    return foo(0x07FE0000UL);
}
pi@debootstrap:/tmp/artifact$ bin/xtensa-lx106-elf-gcc -Os -save-temps -mlongcalls -mtext-section-literals -falign-functions=4 -ffunction-sections -fdata-sections -Wall -Werror=return-type -free -fipa-pta -fno-exceptions -S test.c
[xtensa_emit_move_sequence @ test()] '134086656[0x7fe0000]' => '1023[0x3ff] << 17'
pi@debootstrap:/tmp/artifact$ cat test.s
        .file   "test.c"
        .text
        .section        .text.test,"ax",@progbits
        .literal_position
        .align  4
        .global test
        .type   test, @function
test:
        movi    a2, 0x3ff
        addi    sp, sp, -16
        slli    a2, a2, 17
        s32i.n  a0, sp, 12
        call0   foo
        l32i.n  a0, sp, 12
        addi    sp, sp, 16
        ret.n
        .size   test, .-test
        .ident  "GCC: (GNU) 10.3.0"

@mcspr
Copy link
Author

mcspr commented Aug 22, 2021

From the example above it's a definitive constant, but why does pointer becomes a constant thing? It feels like there's another layer broken, and the constants-loading patch just happened to detect it.

I have tried building locally, but no success so far with GCC=10.3 and a modified Makefile with gcc-gnu master build without patches. Running things inside of msys2 CLI with all pre-requisite tools installed, but it fails on patch stage even where there are zero patches related to the gcc. log.stage.patch has no errors :/ nvm, I actually missed some tools, it's building

@mcspr
Copy link
Author

mcspr commented Aug 26, 2021

it's building

It still fails the test above:

a433474aa40ab0f6433f9240455fd624> py -3 .\trying.py --cxx ".\x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210826\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe"
Using built-in specs.
COLLECT_GCC=.\x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210826\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe
COLLECT_LTO_WRAPPER=c:/users/maxim/documents/gist/a433474aa40ab0f6433f9240455fd624/x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210826/xtensa-lx106-elf/bin/../libexec/gcc/xtensa-lx106-elf/12.0.0/lto-wrapper.exe
Target: xtensa-lx106-elf
Configured with: /home/builder/dev/esp-quick-toolchain/repo/gcc-gnu/configure --prefix=/home/builder/dev/esp-quick-toolchain/xtensa-lx106-elf.win64 --build=x86_64-linux-gnu --host=x86_64-w64-mingw32 --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210825 (experimental) (GCC)

[1/1024] (287)

_Z8functionv:
test.S:1174:    movi    a3, 0x37b
test.S:1175:    slli    a3, a3, 17

I guess it's enough for an upstream gcc bug? Unless there are other things to test; build flags specific to mingw, mingw version, included 'blob' archives, etc.

btw, should the issue be moved to esp-quick-toolchain?


Running Debian 11 installation with mingw8, gcc10.2 and trying to build the current gcc master by hijacking the current config and replacing GCC=version with 05ace2946b4369b49026789d5a83635076b10422 (`master` at the time of writing).

Some sidenotes:

  • since this is just an experiment, running without most of the patches (mkdir patches/master) but the very basic ones (shortname and func-sect, will try more later. could they break though?). The mentioned constants patch is already included as of this point: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=64a54505ec8249178b9767d1420354f8eb55de50
  • make logging with parallel jobs sucks, but there's -Oline (or anything from that option) to the rescue. Output is not really sync'ed, still, so the main way to discover errors is to search for 'Waiting for unfinished jobs' in the log.stage.*
  • it seems touch .stage.%.% is needed during the development process, since the failing stage drags everything with it

Actually relevant to the build process:

  • Can't run libelf's configure script in WIN64 stage, failing on the very basic 'Checking C compiler' step. Very easily solved by unpacking libelf-0.8.13.tar.gz, running autoreconf -i and repacking it back
  • gdb fails to build in WIN64 stage, and needs this future patch https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=5f23a08201ed01570b34f5cff99a95fc7b9e2fdb, replacing the target file from gdb/gdbsupport/common-defs.h to gdb/common/common-defs.h
  • libstdc++-v3 fails to build due to upstream bug with fenv_t declaration. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100017#c20 patch should fix it, gist of it is -nostdinc++ added to the c++ build targets. But, not yet sure how to generalize it, and instead I just modified the generated src/c++17/Makefile while building the WIN64 stage
  • (also) not sure if multiple rebuilds of libstdc++-v3 are necessary for each stage? this would slightly speed up the process

@TD-er
Copy link

TD-er commented Aug 27, 2021

@mcspr thanks for pointing me here. Looks interesting to see if this indeed is a compiler/linker bug that's been bugging us for quite some time.

I was just wondering, since you started mentioning the flash string optimization, whether this is a bug that's only (mainly) manifests on Windows?
Is there something really different between building on Linux and Windows?
No idea if it is related at all, but I wonder why the builds made in Linux are always slightly (a few kB) larger than builds made in Windows.
Could it be that builds made in Linux do align blocks differently and by doing this maybe making it harder for these issues to reproduce them?

In another issue you also mentioned a thought about some code being recompiled, but the string which was first a flash string duplicate (or partial duplicate) then was no longer present. But the objects using such a (partial) duplicate flash string may still refer to it.
Have you been able to disprove this idea, or is it still a possible issue? When you suggested it, it sounded plausible.

@mcspr
Copy link
Author

mcspr commented Aug 27, 2021

In another issue you also mentioned a thought about some code being recompiled, but the string which was first a flash string duplicate (or partial duplicate) then was no longer present. But the objects using such a (partial) duplicate flash string may still refer to it.
Have you been able to disprove this idea, or is it still a possible issue? When you suggested it, it sounded plausible.

Just a speculation, first message proved to be incorrect and everything works as expected with the flash string(s) suffix merging.

With the gist above, compiler loses / replaces the real pointer with a different instruction via an optimization that is only supposed to happen on constants, which these pointers are really not.
(...and very likely, just like with the case of the flash string, pointed-at data will discarded as unused so that's why it's missing?)

ref. CONSTANT_P (src) & xtensa_tls_referenced_p(src) reading the gcc code:

Which sometimes seem to match the address that happens here.

edit: Or, it is missing some additional condition that will check whether the referenced thing can actually be optimized? Upon further observation, generated addresses under mingw build are fairly short:

[xtensa_emit_move_sequence @ function()] '511442944[0x1e7c0000]' => '1951[0x79f] << 18'

While on the linux build it's something like '0x7f4f316d9fc0', which obviously does not fit the operand for a very long time

@mcspr
Copy link
Author

mcspr commented Aug 28, 2021

re. re. reading the above, should the l32r+slli conversion also check for src's code (GET_CODE(X) == ...) and reject 'invalid' things?
For example, it should definitely work for const_int (example above where value is inline with the func call):

CONST_INT_P(src)

(https://code.woboq.org/gcc/gcc/rtl.h.html#770)
And also seems to result in the same thing when there is const int a = 12345; func(a);.

In case of PSTR(...) / function ptr, it's symbol_ref

SYMBOL_REF_P(src)

Which should probably be kept as-is?

@jjsuwa-sys3175
Copy link

jjsuwa-sys3175 commented Aug 28, 2021

should the l32r+slli conversion also check for src's code (GET_CODE(X) == ...) and reject 'invalid' things?

seems to be good workaround :)

patches/gcc10.3/gcc-xtensa-reject-symbol-refs-from-constant-synthesis.patch:

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index e56083e29..af265b5cc 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1074,7 +1074,8 @@ xtensa_emit_move_sequence (rtx *operands, machine_mode mode)
        {
          /* Try to emit MOVI + SLLI sequence, that is smaller
             than L32R + literal.  */
-         if (optimize_size && mode == SImode && register_operand (dst, mode))
+         if (optimize_size && mode == SImode && CONST_INT_P (src) 
+             && register_operand (dst, mode))
            {
              HOST_WIDE_INT srcval = INTVAL (src);
              int shift = ctz_hwi (srcval);

(edit: replaced !SYMBOL_REF_P with CONST_INT_P)

@mcspr
Copy link
Author

mcspr commented Aug 28, 2021

seems to be good workaround :)

patches/gcc10.3/gcc-xtensa-reject-symbol-refs-from-constant-synthesis.patch:

what a proper fix would be though?
also, based on the CONSTANT_P condition it matches label_ref, const, const_double and high, but I'm not sure all of those don't have the same problem as symbols? shound't the check be for just the CONST_INT_P(src)?

@jjsuwa-sys3175
Copy link

jjsuwa-sys3175 commented Aug 28, 2021

shound't the check be for just the CONST_INT_P(src)?

ah, that's more safer way:

-+         if (optimize_size && mode == SImode && !SYMBOL_REF_P (src) &&
-+             register_operand (dst, mode))
++         if (optimize_size && mode == SImode && CONST_INT_P (src)
++             && register_operand (dst, mode))

@mcspr
Copy link
Author

mcspr commented Aug 28, 2021

Also submitted at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102115

@jjsuwa-sys3175
Copy link

should we contact @jcmvbkbc, the Xtensa port's maintainer?

@mcspr
Copy link
Author

mcspr commented Aug 29, 2021

should we contact @jcmvbkbc, the Xtensa port's maintainer?

yeah, I wasn't sure what to do about that in the bugzilla interface. hope this ping is enough

@jcmvbkbc
Copy link

jcmvbkbc commented Sep 7, 2021

@mcspr @jjsuwa-sys3175 thanks for tagging me. I agree with the fix posted here and will commit it to the master and to the gcc-11 branch.

@jcmvbkbc
Copy link

jcmvbkbc commented Sep 7, 2021

I've also found that configuring gcc with --enable-checking=all would have caught it immediately.

@jjsuwa-sys3175
Copy link

jjsuwa-sys3175 commented Sep 7, 2021

hi @jcmvbkbc:
do you know the location suitable to discuss a topic of this kind (about non-configuration-specific GCC Xtensa port)?
your repository (https://github.com/jcmvbkbc/gcc-xtensa)? it seems to be no newer than 8.1.0 there...

@jcmvbkbc
Copy link

jcmvbkbc commented Sep 8, 2021

do you know the location suitable to discuss a topic of this kind (about non-configuration-specific GCC Xtensa port)?

@jjsuwa-sys3175 I don't. So far majority of issues were reported in a project-specific manner for the project that found them, mainly on project mailing lists. Perhaps reporting an issue on the gcc mailing list would work too, I can't recall anyone doing that for the xtensa target though. Regardless of how I'm notified of the issue I usually file a bugzilla entry for gcc or binutils. I understand that github issues looks simpler than gcc mailing list or gcc bugzilla.

your repository (https://github.com/jcmvbkbc/gcc-xtensa)? it seems to be no newer than 8.1.0 there...

This repo predates gcc switch to git so its history doesn't match bit-for-bit the current official gcc git tree. And github is not a part of my gcc workflow, so pushing code to the master there is not something that happens naturally. But that probably shouldn't inhibit reporting bugs there.

@mcspr
Copy link
Author

mcspr commented Sep 17, 2021

GCC issue is still pending, but... returning to the original question. Given the following example:

#ifdef NATIVE
#include <cstdint>
#include <cstdio>
#include <iterator>
#else
#include <Arduino.h>
#endif

void addresses(const char* const begin, const char* const end) {
    ::printf("%p:%p -> %.*s (%u)\n", begin, end, end - begin, begin, end - begin);
}

template <size_t Size>
void addresses(const char (&buf)[Size]) {
    addresses(std::begin(buf), std::end(buf));
}

void test() {
    addresses("");
    addresses(","); // broken
    addresses("999::");
    addresses("-5:doesntmatter");
}

#ifndef NATIVE
void setup() {
    Serial.begin(115200);
    delay(1000);
    ::printf("\n\n\n");
    test();
}

void loop() {
    delay(100);
}
#else
int main() {
    ::puts("\n\n\n");
    test();
}
#endif

Output is:

0x3ffe87fa:0x3ffe87fb ->  (1)
0x3ffe87fb:0x3ffe87e2 -> (␚␔ (4294967271)  <--- end < begin :/
0x3ffe87e2:0x3ffe87e8 -> 999:: (6)
0x3ffe87e8:0x3ffe87f8 -> -5:doesntmatter (16)

@jcmvbkbc
Copy link

Output is:

0x3ffe87fa:0x3ffe87fb ->  (1)
0x3ffe87fb:0x3ffe87e2 -> (␚␔ (4294967271)  <--- end < begin :/
0x3ffe87e2:0x3ffe87e8 -> 999:: (6)
0x3ffe87e8:0x3ffe87f8 -> -5:doesntmatter (16)

FTR, I cannot reproduce this with the mainline xtensa gcc.
@mcspr Can you compile it to the assembly (using -S in the g++ command line) and post the result here?

@mcspr
Copy link
Author

mcspr commented Oct 14, 2021

@mcspr Can you compile it to the assembly (using -S in the g++ command line) and post the result here?

Using current esp8266 arduino 3.0.2 toolchain based on gcc10.3, with platformio as builder for the example above
https://gist.github.com/mcspr/420ea92769f491485b9d73a9e319a366

main.cpp.S
(command c/p from the platformio output building main.cpp.o, replaced -c with -S and removed -g)

$ ~/.platformio/packages/toolchain-xtensa/bin/xtensa-lx106-elf-g++ -o main.cpp.S -S -fno-rtti -std=gnu++17 -fno-exceptions -Os -mlongcalls -mtext-section-literals -falign-functions=4 -U__STRICT_ANSI__ -D_GNU_SOURCE -ffunction-sections -fdata-sections -Wall -Werror=return-type -free -fipa-pta -DPLATFORMIO=50201 -DESP8266 -DARDUINO_ARCH_ESP8266 -DARDUINO_ESP8266_WEMOS_D1MINI -DF_CPU=80000000L -D__ets__ -DICACHE_FLASH -DARDUINO=10805 -DARDUINO_BOARD="PLATFORMIO_D1_MINI" -DFLASHMODE_DIO -DLWIP_OPEN_SRC -DNONOSDK22x_190703=1 -DTCP_MSS=536 -DLWIP_FEATURES=1 -DLWIP_IPV6=0 -DVTABLES_IN_FLASH -DMMU_IRAM_SIZE=0x8000 -DMMU_ICACHE_SIZE=0x8000 -Isrc -I/home/runner/.platformio/packages/framework-arduinoespressif8266/tools/sdk/include -I/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266 -I/home/runner/.platformio/packages/toolchain-xtensa/include -I/home/runner/.platformio/packages/framework-arduinoespressif8266/tools/sdk/lwip2/include -I/home/runner/.platformio/packages/framework-arduinoespressif8266/variants/d1_mini main.cpp

        .file   "main.cpp"
        .text
        .section        .rodata._Z9addressesPKcS0_.str1.1,"aMS",@progbits,1
.LC0:
        .string "%p:%p -> (%d)\n"
        .section        .text._Z9addressesPKcS0_,"ax",@progbits
        .literal_position
        .literal .LC1, .LC0
        .align  4
        .global _Z9addressesPKcS0_
        .type   _Z9addressesPKcS0_, @function
_Z9addressesPKcS0_:
        sub     a5, a3, a2
        mov.n   a4, a3
        mov.n   a3, a2
        l32r    a2, .LC1
        addi    sp, sp, -16
        s32i.n  a0, sp, 12
        call0   printf
        l32i.n  a0, sp, 12
        addi    sp, sp, 16
        ret.n
        .size   _Z9addressesPKcS0_, .-_Z9addressesPKcS0_
        .section        .rodata._Z4testv.str1.1,"aMS",@progbits,1
.LC2:
        .string ""
.LC5:
        .string ","
.LC8:
        .string "999::"
.LC11:
        .string "-5:doesntmatter"
        .section        .text._Z4testv,"ax",@progbits
        .literal_position
        .literal .LC3, .LC2+1
        .literal .LC4, .LC2
        .literal .LC6, .LC5+2
        .literal .LC7, .LC5
        .literal .LC9, .LC8+6
        .literal .LC10, .LC8
        .literal .LC12, .LC11+16
        .literal .LC13, .LC11
        .align  4
        .global _Z4testv
        .type   _Z4testv, @function
_Z4testv:
        l32r    a3, .LC3
        l32r    a2, .LC4
        addi    sp, sp, -16
        s32i.n  a0, sp, 12
        call0   _Z9addressesPKcS0_
        l32r    a3, .LC6
        l32r    a2, .LC7
        call0   _Z9addressesPKcS0_
        l32r    a3, .LC9
        l32r    a2, .LC10
        call0   _Z9addressesPKcS0_
        l32r    a3, .LC12
        l32r    a2, .LC13
        call0   _Z9addressesPKcS0_
        l32i.n  a0, sp, 12
        addi    sp, sp, 16
        ret.n
        .size   _Z4testv, .-_Z4testv
        .section        .rodata.setup.str1.1,"aMS",@progbits,1
.LC15:
        .string "\n\n\n"
        .section        .text.setup,"ax",@progbits
        .literal_position
        .literal .LC14, Serial
        .literal .LC16, .LC15
        .align  4
        .global setup
        .type   setup, @function
setup:
        movi.n  a7, 0
        l32r    a2, .LC14
        movi    a3, 0xe1
        addi    sp, sp, -16
        movi.n  a6, 1
        mov.n   a5, a7
        movi.n  a4, 0x1c
        slli    a3, a3, 9
        s32i.n  a0, sp, 12
        call0   _ZN14HardwareSerial5beginEm12SerialConfig10SerialModehb
        l32r    a2, .LC16
        call0   puts
        call0   _Z4testv
        l32i.n  a0, sp, 12
        addi    sp, sp, 16
        ret.n
        .size   setup, .-setup
        .section        .text.loop,"ax",@progbits
        .literal_position
        .align  4
        .global loop
        .type   loop, @function
loop:
        ret.n
        .size   loop, .-loop
        .weak   BUILTIN_LED
        .section        .rodata.BUILTIN_LED,"a"
        .align  4
        .type   BUILTIN_LED, @object
        .size   BUILTIN_LED, 4
BUILTIN_LED:
        .word   2
        .ident  "GCC: (GNU) 10.3.0"

@jcmvbkbc
Copy link

.LC5:
        .string ","
...
        .literal .LC6, .LC5+2
        .literal .LC7, .LC5
...
        l32r    a3, .LC6
        l32r    a2, .LC7
        call0   _Z9addressesPKcS0_

So the compiler output is good,

   0x40201067 <+23>:    l32r    a3, 0x40201048
   0x4020106a <+26>:    l32r    a2, 0x40201044
   0x4020106d <+29>:    call0   0x40201020 <addresses(char const*, char const*)>

but the final binary is not. Looks like link-time relaxation issue to me.
Does adding the option -Wl,--no-relax to the linker command line fix the issue?

@mcspr
Copy link
Author

mcspr commented Oct 14, 2021

but the final binary is not. Looks like link-time relaxation issue to me.
Does adding the option -Wl,--no-relax to the linker command line fix the issue?

It does

mcspr pushed a commit to esp8266/Arduino that referenced this issue Dec 2, 2021
Fixes a hard-to-track bug in GCC 10.x.
earlephilhower/newlib-xtensa#19
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102115

GCC 10.3 had an issue with addressing constant integer literals which would
result in crazy offsets being used and random crashes in production.
Update with an upstream GCC 11 bugfix by @jjsuwa-sys3175
earlephilhower/esp-quick-toolchain#31
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=dcb2873cd32b263643bfd9d1298b35d6cd028f0a
earlephilhower added a commit to earlephilhower/esp-quick-toolchain that referenced this issue Dec 18, 2021
As mentioned in the earlephilhower/newlib-xtensa#19 (comment)
Solves the issue with building on current Debian 11 by

    regenerating libelf ./configure, otherwise it would incorrectly detect the current compiler command and fail the checking whether the C compiler works... step
    adding binutils patch ignoring _FORTIFY_SOURCE
    (and per https://sourceware.org/git/?p=binutils-gdb.git;a=history;f=gdb/gdbsupport/common-defs.h, there are no further changes)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants