Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang integrated assembler generates incorrect relocations for mips32 #65020

Open
haohaolee opened this issue Aug 27, 2023 · 21 comments · May be fixed by #83115
Open

clang integrated assembler generates incorrect relocations for mips32 #65020

haohaolee opened this issue Aug 27, 2023 · 21 comments · May be fixed by #83115

Comments

@haohaolee
Copy link

haohaolee commented Aug 27, 2023

When I build openssl with clang 16, the built binary openssl just crashes. Then I tried to narrow down to the issue, and it seems it is related to the integrated assembler of clang, because it generates different object from the one generated by gas, and finally the linker would complain that "can't find matching R_MIPS_LO16 relocation for R_MIPS_GOT16"

I have preprocessed the problematic assembly file, so it can be easily verified.
aes-mips.s.zip

Reproduced steps:

  1. clang -target mipsel-buildroot-linux-musl -fPIC -pthread -mabi=32 -mips2 -Wa,--noexecstack -Qunused-arguments -Wall -O3 -c -o aes-mips.o aes-mips.s
  2. llvm-readelf -r aes-mips.o
Relocation section '.rel.text' at offset 0x1be0 contains 12 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000340  00001405 R_MIPS_HI16            00000000   _gp_disp
00000344  00001406 R_MIPS_LO16            00000000   _gp_disp
00000760  00001405 R_MIPS_HI16            00000000   _gp_disp
00000764  00001406 R_MIPS_LO16            00000000   _gp_disp
00000b38  00001405 R_MIPS_HI16            00000000   _gp_disp
00000b3c  00001406 R_MIPS_LO16            00000000   _gp_disp
00000b80  00001405 R_MIPS_HI16            00000000   _gp_disp
00000b84  00001406 R_MIPS_LO16            00000000   _gp_disp
00000378  00001209 R_MIPS_GOT16           00000000   .rodata
00000798  00001209 R_MIPS_GOT16           00000000   .rodata
00000b50  00001209 R_MIPS_GOT16           00000000   .rodata
00000b98  00001209 R_MIPS_GOT16           00000000   .rodata

Relocation section '.rel.pdr' at offset 0x1c40 contains 7 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000000  00000102 R_MIPS_32              00000000   .text
00000020  00001302 R_MIPS_32              00000340   AES_encrypt
00000040  00000102 R_MIPS_32              00000000   .text
00000060  00001502 R_MIPS_32              00000760   AES_decrypt
00000080  00000102 R_MIPS_32              00000000   .text
000000a0  00001602 R_MIPS_32              00000b38   AES_set_encrypt_key
000000c0  00001702 R_MIPS_32              00000b80   AES_set_decrypt_key
  1. Note that there 4 R_MIPS_GOT16 without corresponding R_MIPS_LO16

If we tried the assembler from binutils.

  1. mipsel-buildroot-linux-musl-as -march mips2 -mabi 32 -call_nonpic -EL -KPIC --noexecstack -o gas-aes-mips.o aes-mips.s
  2. llvm-readelf -r gas-aes-mips.o
Relocation section '.rel.text' at offset 0x1c04 contains 16 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000340  00001105 R_MIPS_HI16            00000000   _gp_disp
00000344  00001106 R_MIPS_LO16            00000000   _gp_disp
00000378  00000a09 R_MIPS_GOT16           00000000   .rodata
0000037c  00000a06 R_MIPS_LO16            00000000   .rodata
00000740  00001105 R_MIPS_HI16            00000000   _gp_disp
00000744  00001106 R_MIPS_LO16            00000000   _gp_disp
00000778  00000a09 R_MIPS_GOT16           00000000   .rodata
0000077c  00000a06 R_MIPS_LO16            00000000   .rodata
00000b14  00001105 R_MIPS_HI16            00000000   _gp_disp
00000b18  00001106 R_MIPS_LO16            00000000   _gp_disp
00000b2c  00000a09 R_MIPS_GOT16           00000000   .rodata
00000b30  00000a06 R_MIPS_LO16            00000000   .rodata
00000b60  00001105 R_MIPS_HI16            00000000   _gp_disp
00000b64  00001106 R_MIPS_LO16            00000000   _gp_disp
00000b78  00000a09 R_MIPS_GOT16           00000000   .rodata
00000b7c  00000a06 R_MIPS_LO16            00000000   .rodata

Relocation section '.rel.pdr' at offset 0x1c84 contains 7 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000000  00000102 R_MIPS_32              00000000   .text
00000020  00001002 R_MIPS_32              00000340   AES_encrypt
00000040  00000102 R_MIPS_32              00000000   .text
00000060  00001202 R_MIPS_32              00000740   AES_decrypt
00000080  00000102 R_MIPS_32              00000000   .text
000000a0  00001302 R_MIPS_32              00000b14   AES_set_encrypt_key
000000c0  00001402 R_MIPS_32              00000b60   AES_set_decrypt_key
  1. For each rodata, there is a pair of LO16 and GOT16.
@haohaolee
Copy link
Author

Hi @MaskRay I am not sure if you happen to know how to figure out this or shed a light?

@llvmbot
Copy link
Member

llvmbot commented Aug 27, 2023

@llvm/issue-subscribers-backend-mips

@brad0
Copy link
Contributor

brad0 commented Aug 29, 2023

@FlyGoat @wzssyqa Ping.

@brad0
Copy link
Contributor

brad0 commented Jan 17, 2024

cc @yingopq

@haohaolee
Copy link
Author

haohaolee commented Jan 19, 2024

FYI, I have spent some time on investigating this issue, and I can create a much simpler case to reproduce this too. Simply put, if the static local variable is defined before the code that references it, everything works fine; if the variable is defined after that, it breaks.

By the way, I am trying to fix it on my own, but I am a beginner to compiler(assembler), so I spent a lot of time studying the assembler code as well. Any suggestion is appreciated.

Best regards

@brad0
Copy link
Contributor

brad0 commented Jan 21, 2024

@wzssyqa Ping.

@yingopq
Copy link
Contributor

yingopq commented Jan 23, 2024

cc @yingopq

OK, I am researching this issue.

@haohaolee
Copy link
Author

cc @yingopq

OK, I am researching this issue.

Please let me know if you need more info

@yingopq
Copy link
Contributor

yingopq commented Jan 25, 2024

FYI, I have spent some time on investigating this issue, and I can create a much simpler case to reproduce this too. Simply put, if the static local variable is defined before the code that references it, everything works fine; if the variable is defined after that, it breaks.

By the way, I am trying to fix it on my own, but I am a beginner to compiler(assembler), so I spent a lot of time studying the assembler code as well. Any suggestion is appreciated.

Best regards

Could you support aes-mips.c and file above mentioned simple case? Thanks.

@haohaolee
Copy link
Author

@yingopq
There is no aes-mips.c, because this issue is only related to assembler. Here I have a way simpler example:

.text
.globl main
main:
.ent main
.frame $29, 32, $31
.set noreorder
.cpload $25
.set reorder

subu $29, $29, 32
sw $31, 28($29)
.cprestore 24
la $4, hello
jal printf

lw $31, 28($29)
addiu $29, $29, 32
jr $31
.end main

.rdata
hello: .asciz "Hello world\n"

The symbol hello is at the end of this file and will trigger this issue. If we move the definition of hello to the beginning of the file, everything is fine.

After digging into the code, I found that while parsing the assembly, all the symbols would be examined, and when hello is handled firstly in main function, if hello is defined already, the assembler can know it is a local symbol, and generate the correct instructions - a pair of R_MIPS_LO16 and R_MIPS_GOT16. If main cannot see the definition of hello, it will assume it is a global (external) symbol, and generate the incorrect instructions.

However, I am not familiar with LLVM, I have no idea how to fix up this after first round parsing, maybe a second parse?

@wzssyqa
Copy link
Contributor

wzssyqa commented Feb 4, 2024

It is not easy to "fix", but we can claim that this is not a bug ;)

In GAS's documents:

A local symbol is any symbol beginning with certain local label prefixes.
By default, the local label prefix is @samp{.L} for ELF systems or
@samp{L} for traditional a.out systems, but each target may have its own
set of local label prefixes.

So we can support .L scheme, and ask user to use the names with this style.

diff --git a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
index 3c673ae938fd..d22deb9a6587 100644
--- a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
+++ b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
@@ -2914,14 +2914,21 @@ bool MipsAsmParser::loadAndAddSymbolAddress(const MCExpr *SymExpr,
     }
 
     bool IsPtr64 = ABI.ArePtrs64bit();
+
+    if (StringRef(Res.getSymA()->getLoc().getPointer()).starts_with(".L"))
+           printf (".L symbol\n");
+
     bool IsLocalSym =
         Res.getSymA()->getSymbol().isInSection() ||
         Res.getSymA()->getSymbol().isTemporary() ||
+        StringRef(Res.getSymA()->getLoc().getPointer()).starts_with(".L") ||
         (Res.getSymA()->getSymbol().isELF() &&
          cast<MCSymbolELF>(Res.getSymA()->getSymbol()).getBinding() ==
              ELF::STB_LOCAL);
     bool UseXGOT = STI->hasFeature(Mips::FeatureXGOT) && !IsLocalSym;
 
+    printf ("%d\n", IsLocalSym);
+
     // The case where the result register is $25 is somewhat special. If the
     // symbol in the final relocation is external and not modified with a
     // constant then we must use R_MIPS_CALL16 instead of R_MIPS_GOT16

@wzssyqa
Copy link
Contributor

wzssyqa commented Feb 4, 2024

Ohh, currently $LABEL: is supported for O32.
So you can just use

$hello: .asciz "Hello world\n"

@haohaolee
Copy link
Author

Hi @wzssyqa thank you for your explanation and quick implementation for ".L", but I still have some questions:

  1. though GAS's doc says any symbol with PrivateGlobalPrefix or ".L" should be considered a local symbol, GAS's implementation did not follow its own doc (by considering a symbol without ".L" or PrivateGlobalPrefix a local one). Do I understand correctly?
  2. Since I found this issue when building openssl using LLVM for mips, do you suggest that I should create an issue for them to fix this in their assembly code? (They have a couple of assembly files to implement algorithms like AES or SHA)

@yingopq
Copy link
Contributor

yingopq commented Feb 5, 2024

I have a method and was implementing and testing. When mips asm parse instruction la, check whether .rdata section was accessed and parsed, if it was not, then while circle search the following statement, check if symbol( eg hello) was existed. If it was in .rdata section, IsLocalSym is true.

@haohaolee
Copy link
Author

I have a method and was implementing and testing. When mips asm parse instruction la, check whether .rdata section was accessed and parsed, if it was not, then while circle search the following statement, check if symbol( eg hello) was existed. If it was in .rdata section, IsLocalSym is true.

Sounds wonderful. Looking forward to it

@wzssyqa
Copy link
Contributor

wzssyqa commented Feb 5, 2024

Hi @wzssyqa thank you for your explanation and quick implementation for ".L", but I still have some questions:

  1. though GAS's doc says any symbol with PrivateGlobalPrefix or ".L" should be considered a local symbol, GAS's implementation did not follow its own doc (by considering a symbol without ".L" or PrivateGlobalPrefix a local one). Do I understand correctly?

Yes. We should fix it, and I find that the link refused to link these object.

  1. Since I found this issue when building openssl using LLVM for mips, do you suggest that I should create an issue for them to fix this in their assembly code? (They have a couple of assembly files to implement algorithms like AES or SHA)

Yes. For OpenSSL, we may need to put the .rodata before .text.
If you have interesting, you can feedback it to OpenSSL.

wzssyqa added a commit to wzssyqa/llvm-project that referenced this issue Feb 5, 2024
For asm code like:
xx:
	la	$2,$yy
$yy:
	nop

MIPS O32 use `$` as PrivateGlobalPrefix, while another
`$` is added for getOrCreateSymbol, thus:
  error: Undefined temporary symbol $$yy

We also set symbols that starts with `.L` as local for O32.

See: llvm#65020.
wzssyqa added a commit to wzssyqa/llvm-project that referenced this issue Feb 6, 2024
For asm code like:
xx:
	la	$2,$yy
$yy:
	nop

MIPS O32 use `$` as PrivateGlobalPrefix, while another
`$` is added for getOrCreateSymbol, thus:
  error: Undefined temporary symbol $$yy

We also set symbols that starts with `.L` as local for O32.

See: llvm#65020.
@haohaolee
Copy link
Author

Hi @wzssyqa
Thank you for the suggestion.
The current approach to fix OpenSSL building issue should be moving the problematic code block to the beginning of the file, but I guess OpenSSL forks may not be interested in this refactoring.
I am wondering if we can wait for this PR to be merged into clang, then we can ask OpenSSL upstream to adapt it (I suppose the change should be tiny at that time). What do you think?

MaskRay pushed a commit to wzssyqa/llvm-project that referenced this issue Feb 14, 2024
For asm code like:
xx:
	la	$2,$yy
$yy:
	nop

MIPS O32 use `$` as PrivateGlobalPrefix, while another
`$` is added for getOrCreateSymbol, thus:
  error: Undefined temporary symbol $$yy

We also set symbols that starts with `.L` as local for O32.

See: llvm#65020.
MaskRay added a commit that referenced this issue Feb 14, 2024
…la macro (#80644)

When parsing the `la` macro, we add a duplicate `$` prefix in
`getOrCreateSymbol`,
leading to `error: Undefined temporary symbol $$yy` for code like:

```
xx:
	la	$2,$yy
$yy:
	nop
```

Remove the duplicate prefix.

In addition, recognize `.L`-prefixed symbols as local for O32.

See: #65020.

---------

Co-authored-by: Fangrui Song <i@maskray.me>
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 15, 2024
…la macro (llvm#80644)

When parsing the `la` macro, we add a duplicate `$` prefix in
`getOrCreateSymbol`,
leading to `error: Undefined temporary symbol $$yy` for code like:

```
xx:
	la	$2,$yy
$yy:
	nop
```

Remove the duplicate prefix.

In addition, recognize `.L`-prefixed symbols as local for O32.

See: llvm#65020.

---------

Co-authored-by: Fangrui Song <i@maskray.me>
(cherry picked from commit c007fbb)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 16, 2024
…la macro (llvm#80644)

When parsing the `la` macro, we add a duplicate `$` prefix in
`getOrCreateSymbol`,
leading to `error: Undefined temporary symbol $$yy` for code like:

```
xx:
	la	$2,$yy
$yy:
	nop
```

Remove the duplicate prefix.

In addition, recognize `.L`-prefixed symbols as local for O32.

See: llvm#65020.

---------

Co-authored-by: Fangrui Song <i@maskray.me>
(cherry picked from commit c007fbb)
@wzssyqa
Copy link
Contributor

wzssyqa commented Feb 21, 2024

Hi @wzssyqa Thank you for the suggestion. The current approach to fix OpenSSL building issue should be moving the problematic code block to the beginning of the file, but I guess OpenSSL forks may not be interested in this refactoring. I am wondering if we can wait for this PR to be merged into clang, then we can ask OpenSSL upstream to adapt it (I suppose the change should be tiny at that time). What do you think?

It has been merged. If you have interesting, you can try to figure out a PR to OpenSSL.

@yingopq
Copy link
Contributor

yingopq commented Feb 27, 2024

I have a method and was implementing and testing. When mips asm parse instruction la, check whether .rdata section was accessed and parsed, if it was not, then while circle search the following statement, check if symbol( eg hello) was existed. If it was in .rdata section, IsLocalSym is true.

test result:

$ sudo ./build/bin/llvm-readelf -r 2.o

Relocation section '.rel.text' at offset 0x140 contains 6 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000000  00000405 R_MIPS_HI16            00000000   _gp_disp
00000004  00000406 R_MIPS_LO16            00000000   _gp_disp
00000018  00000209 R_MIPS_GOT16           00000000   .rodata
0000001c  00000206 R_MIPS_LO16            00000000   .rodata
00000020  0000050b R_MIPS_CALL16          00000000   printf
00000024  00000525 R_MIPS_JALR            00000000   printf

Relocation section '.rel.pdr' at offset 0x170 contains 1 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000000  00000302 R_MIPS_32              00000000   main

patch

diff --git a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
index 36aab383da68..f9f6704230f5 100644
--- a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
+++ b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
@@ -149,6 +149,7 @@ class MipsAsmParser : public MCTargetAsmParser {
                        // directive.
   bool IsLittleEndian;
   bool IsPicEnabled;
+  bool HasParseRdata;
   bool IsCpRestoreSet;
   int CpRestoreOffset;
   unsigned GPReg;
@@ -555,6 +556,7 @@ public:
     IsPicEnabled = getContext().getObjectFileInfo()->isPositionIndependent();
 
     IsCpRestoreSet = false;
+    HasParseRdata = false;
     CpRestoreOffset = -1;
     GPReg = ABI.GetGlobalPtr();
 
@@ -2920,11 +2922,40 @@ bool MipsAsmParser::loadAndAddSymbolAddress(const MCExpr *SymExpr,
         (Res.getSymA()->getSymbol().isELF() &&
          cast<MCSymbolELF>(Res.getSymA()->getSymbol()).getBinding() ==
              ELF::STB_LOCAL);
+
     // For O32, "$"-prefixed symbols are recognized as temporary while
     // .L-prefixed symbols are not (PrivateGlobalPrefix is "$"). Recognize ".L"
     // manually.
     if (ABI.IsO32() && Res.getSymA()->getSymbol().getName().starts_with(".L"))
       IsLocalSym = true;
+    else {
+      if (HasParseRdata == false) {
+        StringRef CurrentASMContent = StringRef(IDLoc.getPointer());
+
+        //Get local symbol name LocalSymbol from "la $number, localsymbolname\n ... "
+        size_t NewlineIndex = CurrentASMContent.find_first_of('\n');
+        size_t CommaIndex = CurrentASMContent.find_first_of(',');
+        size_t SymbolLength = NewlineIndex - CommaIndex - 2;
+        StringRef LocalSymbol =  CurrentASMContent.take_front(NewlineIndex).take_back(SymbolLength);
+
+        //Get and check if ".rdata" section exist.
+        size_t RdataIndex = CurrentASMContent.find( ".rdata");
+        if (RdataIndex != StringRef::npos) {
+          StringRef Rdata = CurrentASMContent.substr(RdataIndex);
+
+         //Check if rdata section contain local symbol.
+         if (1 == Rdata.contains(LocalSymbol)) {
+           //Check if "LocalSymbol:" exist.
+           size_t A = Rdata.find(LocalSymbol);
+            size_t B = Rdata.find(':', A);
+           if (B - A == LocalSymbol.size()) {
+              IsLocalSym = true;
+              LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Has definition of local symbol "  << LocalSymbol << " after 'la' instruction" << "\n");
+           }
+         }
+        }
+      }
+    }
     bool UseXGOT = STI->hasFeature(Mips::FeatureXGOT) && !IsLocalSym;
 
     // The case where the result register is $25 is somewhat special. If the
@@ -8073,6 +8104,7 @@ bool MipsAsmParser::parseDirectiveTpRelWord() {
   if (getLexer().isNot(AsmToken::EndOfStatement))
     return Error(getLexer().getLoc(),
                 "unexpected token, expected end of statement");
+  HasParseRdata = true;
   Parser.Lex(); // Eat EndOfStatement token.
   return false;
 }

@yingopq
Copy link
Contributor

yingopq commented Feb 27, 2024

@wzssyqa Could you help review this patch? Thanks!

yingopq added a commit to yingopq/llvm-project that referenced this issue Feb 27, 2024
… for mips32

When mips asm parse instruction la, check whether .rdata section
was accessed and parsed, if it was not, then check the following
statement, check if local symbol(eg "hello:") was in .rdata
section, if it was in it, IsLocalSym is true.

Fix llvm#65020
yingopq added a commit to yingopq/llvm-project that referenced this issue Apr 3, 2024
…or mips32

Modify asm scan starting point to ensure scaning .rdata section before
.text section.
Save begin location, check if have .rdata section, confirm where to
begin parse. If have it and .rdata after .text, begin parse from .rdata,
when go to Eof, jump to begin location and then continue parse. If did
not have .rdata or .rdata is before .text, jump to begin location and
then parse.

Fix llvm#65020
@haohaolee
Copy link
Author

@yingopq There is no aes-mips.c, because this issue is only related to assembler. Here I have a way simpler example:

.text
.globl main
main:
.ent main
.frame $29, 32, $31
.set noreorder
.cpload $25
.set reorder

subu $29, $29, 32
sw $31, 28($29)
.cprestore 24
la $4, hello
jal printf

lw $31, 28($29)
addiu $29, $29, 32
jr $31
.end main

.rdata
hello: .asciz "Hello world\n"

The symbol hello is at the end of this file and will trigger this issue. If we move the definition of hello to the beginning of the file, everything is fine.

After digging into the code, I found that while parsing the assembly, all the symbols would be examined, and when hello is handled firstly in main function, if hello is defined already, the assembler can know it is a local symbol, and generate the correct instructions - a pair of R_MIPS_LO16 and R_MIPS_GOT16. If main cannot see the definition of hello, it will assume it is a global (external) symbol, and generate the incorrect instructions.

However, I am not familiar with LLVM, I have no idea how to fix up this after first round parsing, maybe a second parse?

Hi @MaskRay , this is the simple demo of this issue. The current progress for this issue is that now we can use ".L" to denote local symbols, but gas can even work without using ".L". I suppose the latest PR is trying to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants