"Never resolved function from blockaddress" linking xfs.ko #1215

Open
arndb opened this issue Dec 17, 2020 · 23 comments
Labels
[ARCH] arm64 This bug impacts ARCH=arm64 [BUG] llvm A bug that should be fixed in upstream LLVM [FEATURE] LTO Related to building the kernel with LLVM Link Time Optimization [FIXED][LLVM] 15 This bug was fixed in LLVM 15.x Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. [TOOL] lld The issue is relevant to LLD linker

@arndb

arndb commented Dec 17, 2020

Building randconfig arm64 kernels occasionally results in this error with LTO:

$ ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o xfs.lto.o --whole-archive xfs.o
ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM11.0.1' Reader: 'LLVM 11.0.1')

It turns out that this only happens in configurations that include xfs, but it also does happen for xfs as a loadable module, which makes it much easier to reproduce.

The attached archive contains a copy of the object files that go into an affected xfs.ko kernel module, along with a one-line script for reproducing the problem.

xfs-lld-never-resolved-function.tar.gz
https://lore.kernel.org/lkml/CAK8P3a1Xfpt7QLkvxjtXKcgzcWkS8g9bmxD687+rqjTafTzKrg@mail.gmail.com/

@nickdesaulniers nickdesaulniers added [ARCH] arm64 This bug impacts ARCH=arm64 [FEATURE] LTO Related to building the kernel with LLVM Link Time Optimization [BUG] llvm A bug that should be fixed in upstream LLVM [TOOL] lld The issue is relevant to LLD linker labels Dec 17, 2020
@nickdesaulniers
Member

cc @samitolvanen

@arndb
Author

arndb commented Dec 17, 2020

This appears to fix all instances I found. I have no idea what is special about these two files, but I found this list experimentally. The other files in libxfs are not affected, but disabling LTO in only one of the two files is not sufficient.

I also found this only happens with CONFIG_DYNAMIC_FTRACE.

commit 18ed03a2163b8668553b9f63255a1697326b68ef
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Fri Dec 18 00:15:29 2020 +0100

    xfs: work around clang lto issue
    
    Something causes the link process to fail:
    
    ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM11.0.1' Reader: 'LLVM 11.0.1')
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 04611a1068b4..df03efdb731e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -12,6 +12,9 @@ obj-$(CONFIG_XFS_FS)		+= xfs.o
 # this one should be compiled first, as the tracing macros can easily blow up
 xfs-y				+= xfs_trace.o
 
+CFLAGS_REMOVE_libxfs/xfs_bmap.o		+= $(CC_FLAGS_LTO)
+CFLAGS_REMOVE_libxfs/xfs_inode_fork.o	+= $(CC_FLAGS_LTO)
+
 # build the libxfs code first
 xfs-y				+= $(addprefix libxfs/, \
 				   xfs_ag.o \

@bwendling

bwendling commented Jan 10, 2021

The xfs_bmap.o module is mangled. If you look at the disassembly, you have this function:

define internal fastcc void @trace_xfs_read_extent(...
 ...
29:                                               ; preds = %20
  %30 = inttoptr i64 %27 to %struct.tracepoint_func*
  %31 = getelementptr inbounds %struct.tracepoint_func, %struct.tracepoint_func* %30, i64 0, i32 1
  %32 = load i8*, i8** %31, align 8
  %33 = call i32 @__traceiter_xfs_read_extent(i8* %32, %struct.xfs_inode* %0, %struct.xfs_iext_cursor* %1, i32 %2, i64 ptrtoint (i8* blockaddress(@xfs_iread_bmbt_block, %73) to i64)) #11
  br label %34

Note blockaddress(@xfs_iread_bmbt_block, %73) is referencing a block in another function.

@nickdesaulniers
Member

Note blockaddress(@xfs_iread_bmbt_block, %73) is referencing a block in another function.

This is one thing I've always disliked (or perhaps simply misunderstood) about LLVM's blockaddress Constant. In C, labels are scoped to functions, but in LLVM IR, they're not necessarily (the two operands are the Function and BasicBlock). I haven't yet seen such a case where a Function contains a blockaddress whose Function operand was a different Function, but perhaps this is a case I should sit down and look at further.

@nickdesaulniers
Member

nickdesaulniers commented Apr 28, 2021

I wasn't able to reproduce with the following configs:

$ grep -e DYNAMIC_FTRACE -e FUNCTION_TRACER -e LTO_CLANG_THIN -e XFS_FS -n .config
787:CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
791:CONFIG_LTO_CLANG_THIN=y
8069:CONFIG_XFS_FS=y
8197:# CONFIG_VXFS_FS is not set
8919:CONFIG_HAVE_FUNCTION_TRACER=y
8921:CONFIG_HAVE_DYNAMIC_FTRACE=y
8922:CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
8935:CONFIG_FUNCTION_TRACER=y
8937:CONFIG_DYNAMIC_FTRACE=y
8938:CONFIG_DYNAMIC_FTRACE_WITH_REGS=y

@arndb, was this with thin LTO or full LTO? Can you post the config if it's still reproducible?

It doesn't reproduce with full LTO either, though I did get:

incomplete ORC unwind tables in file: vmlinux
Failed to sort kernel tables

@samitolvanen
Member

I can reproduce this with full LTO at least:

$ grep -E '(LTO_|XFS)' kernel-build/.config
CONFIG_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_HAS_LTO_CLANG=y
# CONFIG_LTO_NONE is not set
CONFIG_LTO_CLANG_FULL=y
# CONFIG_LTO_CLANG_THIN is not set
CONFIG_XFS_FS=m
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_VXFS_FS=m

@nickdesaulniers
Member

I was able to reproduce this just now with an ARCH=i386 allmodconfig build using full LTO, while testing this. I had to disable FTRACE, COMPILE_TEST, and GCOV.

ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM13.0.0git' Reader: 'LLVM 13.0.0git')
  LTO [M] lib/test_xarray.lto.o
make[1]: *** [scripts/Makefile.modpost:134: fs/xfs/xfs.lto.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1882: modules] Error 2

@nickdesaulniers
Member

I gave this a shot today, but wasn't able to repro with ToT LLVM (~clang-13) for aarch64+full LTO (or thin LTO) + XFS_FS. Strange.

@nickdesaulniers
Member

nickdesaulniers commented Jun 11, 2021

I'm able to repro on i386 on next (see #1395 for config). (I see now that Arnd mentioned CONFIG_DYNAMIC_FTRACE; I should re-test with that).

@nickdesaulniers nickdesaulniers removed the asm goto related to the implementation of asm goto label Jun 11, 2021
@nickdesaulniers
Member

On i386, it's xfs_iformat_extents referenced from trace_xfs_read_extent. (For arm64, the referencing function was the same, trace_xfs_read_extent.)

define internal fastcc void @trace_xfs_read_extent(
...
%call43 = call i32 @__SCT__tp_func_xfs_read_extent(i8* inreg %7, %struct.xfs_inode* inreg %ip, %struct.xfs_iext_cursor* inreg %cur, i32 %state, i32 ptrtoint (i8* blockaddress(@xfs_iformat_extents, %for.inc) to i32)) #8

(so it's a blockaddress Constant, but not related to asm goto). This comes about after IPSCCP.

@xfs_iformat_extents's definition does appear in fs/xfs/libxfs/xfs_inode_fork.o.ll though, so I don't think the front half of the LTO compilation is necessarily making any mistakes (unless there's some kind of requirement for full LTO that a blockaddress's function operand always match the function that the blockaddress's user instruction is scoped to).

I extended LLVM to scan the functions defined in the module when the "Never resolved function from blockaddress" error occurs, and xfs_iformat_extents does exist...but only a declaration, not a definition.

I suspect there's a pass running during LTO which deletes what looks like dead functions without checking whether the function's address is taken via Function::hasAddressTaken.

I think we can rerun the lld invocation with -mllvm -print-after-all (and with -r removed) to see what pass is messing things up.

$ ld.lld -m elf_i386 -mllvm -import-instr-limit=5 -o fs/xfs/xfs.lto.o  --whole-archive fs/xfs/xfs.o -mllvm -print-after-all 2> log.txt

but grepping that for xfs_iformat_extents gets no hits. Was xfs_iformat_extents not packaged into fs/xfs/xfs.lto.o properly? But:

$ llvm-nm fs/xfs/xfs.o
...
libxfs/xfs_inode_fork.o:
...
-------- t xfs_iformat_extents

It's super weird that xfs_iext_state_to_fork, which is also defined in fs/xfs/libxfs/xfs_inode_fork.o.ll, does show up in:

ld.lld -m elf_i386 -mllvm -import-instr-limit=5 -o fs/xfs/xfs.lto.o  --whole-archive fs/xfs/xfs.o -mllvm -print-after-all 2>&1 |grep xfs_iext_state_to_fork
...
define internal %struct.xfs_ifork* @xfs_iext_state_to_fork(%struct.xfs_inode* inreg %0, i32 inreg %1) #0 align 64 {
...
define internal %struct.xfs_ifork* @xfs_iext_state_to_fork(%struct.xfs_inode* inreg %0, i32 inreg %1) #0 align 64 {
...

and yet xfs_iformat_extents does not:

ld.lld -m elf_i386 -mllvm -import-instr-limit=5 -o fs/xfs/xfs.lto.o  --whole-archive fs/xfs/xfs.o -mllvm -print-after-all 2>&1 |grep xfs_iformat_extents

@nickdesaulniers
Member

I suspect there's a pass running during LTO which deletes what looks like dead functions without checking whether the function's address is taken via Function::hasAddressTaken.
was xfs_iformat_extents not packaged in fs/xfs/xfs.lto.o properly?

I'm reading through how LLVM performs LTO; I don't have the smoking gun yet, but it seems it does "summary-based DCE" where summary is referring to either ModuleSummaryIndex or FunctionSummary. I'm suspicious of one of those being wrong, at the moment. A murder mystery to solve (at least) next week.

@nickdesaulniers
Member

Also, it looks like this is possibly related to __this_address from fs/xfs/xfs_linux.h or _THIS_IP_ (reminiscent of #263).

diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 1d174909f9bd..198aee12cb3a 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -145,7 +145,7 @@ xfs_iformat_extents(
                        }
 
                        xfs_iext_insert(ip, &icur, &new, state);
-                       trace_xfs_read_extent(ip, &icur, state, _THIS_IP_);
+                       /*trace_xfs_read_extent(ip, &icur, state, _THIS_IP_);*/
                        xfs_iext_next(ifp, &icur);
                }
        }

With this change, the build succeeds. It looks like IPSCCP is sinking _THIS_IP_ into trace_xfs_read_extent. The only explicit callers of xfs_iformat_extents (xfs_iformat_data_fork and xfs_iformat_attr_fork) both have hidden visibility, so I'd bet they're not being imported for LTO; but trace_xfs_read_extent, which has the blockaddress reference to xfs_iformat_extents, has internal linkage. I'm guessing there's a call chain in which xfs_iformat_extents is called from an externally visible function, such that xfs_iformat_extents is imported for LTO but trace_xfs_read_extent is not (resulting in the observed error). Now to prove or disprove that hypothesis.
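A minimal user-space sketch of the suspected pattern, assuming GCC or Clang (it relies on the GNU labels-as-values, local-label, and statement-expression extensions). `MY_THIS_IP` is a hypothetical macro modelled on the kernel's `_THIS_IP_`, and `trace_event()` / `format_extents()` are hypothetical stand-ins for `trace_xfs_read_extent()` and `xfs_iformat_extents()`. Because the static callee has a single call site that always passes the same label address, an optimizer running IPSCCP may substitute that constant into the callee, which is one way a blockaddress naming one function can end up inside another.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's _THIS_IP_: a GNU C statement
 * expression that evaluates to the address of a local label. */
#define MY_THIS_IP ({ __label__ here; here: (unsigned long)&&here; })

static unsigned long last_caller_ip;
static int trace_calls;

/* Hypothetical stand-in for trace_xfs_read_extent(): a static, noinline
 * callee that receives the caller's "instruction pointer" as an integer. */
__attribute__((noinline)) static void trace_event(int state, unsigned long caller_ip)
{
  (void)state;
  last_caller_ip = caller_ip;
  trace_calls++;
}

/* Hypothetical stand-in for xfs_iformat_extents(): the only caller of
 * trace_event(), always passing a label address from one call site. */
void format_extents(int nr_extents)
{
  assert(nr_extents >= 0);
  for (int i = 0; i < nr_extents; i++)
    trace_event(i, MY_THIS_IP);
}
```

Whether the constant is actually sunk depends on the compiler version and optimization level; at the source level the argument is just an integer.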

@twd2
Member

twd2 commented Jul 20, 2021

Hi, I'm trying to enable LTO for RISC-V, and can reproduce this issue.

I used allyesconfig with COMPILE_TEST, FTRACE, KASAN, and GCOV disabled, and:

$ grep -E '(LTO_|XFS)' .config             
CONFIG_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_HAS_LTO_CLANG=y
# CONFIG_LTO_NONE is not set
CONFIG_LTO_CLANG_FULL=y
# CONFIG_LTO_CLANG_THIN is not set
CONFIG_XFS_FS=y
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_VXFS_FS=y

or this:

$ grep -E '(LTO_|XFS)' .config       
CONFIG_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_HAS_LTO_CLANG=y
# CONFIG_LTO_NONE is not set
CONFIG_LTO_CLANG_FULL=y
# CONFIG_LTO_CLANG_THIN is not set
CONFIG_XFS_FS=m
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_VXFS_FS=y

Command used: ARCH=riscv LLVM=1 LLVM_IAS=1 make HOSTCC=gcc vmlinux -j16
Kernel version: torvalds@2734d6c1b1a0
LLVM version: llvm/llvm-project@1b61d837b9d0
config-xfs.txt

@emojifreak

amd64 in Linux 5.15.5 also produces this error with the script in #1516.
With linux 5.15.5 allmodconfig, I only see this error with amd64 and arm64.

@nathanchance
Member

I reduced down fs/overlayfs/file.c (as initially reported in #1516):

$ cat file.i
struct fd {
  int file;
} kmem_cache_alloc(), ovl_write_iter_real;
struct {
  int dep_map;
} * percpu_rwsem_release_sem;
struct {
  void *ki_complete;
} * is_sync_kiocb_kiocb;
void *ovl_write_iter___trans_tmp_4;
_Bool ovl_write_iter___trans_tmp_3;
void lock_release(int *, long);
static void percpu_rwsem_release(long ip) {
  lock_release(&percpu_rwsem_release_sem->dep_map, ip);
}
int file_remove_privs();
long ovl_write_iter() {
  long ret = file_remove_privs();
  if (ret)
    goto out_unlock;
  ret = (&ovl_write_iter_real)->file;
  if (ret)
    goto out_unlock;
  ovl_write_iter___trans_tmp_3 = is_sync_kiocb_kiocb->ki_complete == 0;
  if (ovl_write_iter___trans_tmp_3) {
    kmem_cache_alloc();
    if (ovl_write_iter___trans_tmp_4)
      goto out;
    percpu_rwsem_release(({
      __here:
        (long)&&__here;
    }));
  }
out:
out_unlock:
  return ret;
}

$ clang -O2 -flto -fsanitize=object-size -Wall -Wextra -c -o file.o file.i

$ ld.lld -m elf_x86_64 -r -o /dev/null --whole-archive file.o
ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM14.0.0git' Reader: 'LLVM 14.0.0git')

It does not reproduce without -fsanitize=object-size, which is why everyone is seeing it with allmodconfig.

@rickyz

rickyz commented Dec 1, 2021

I took a look at this bug for fun. Here's a hand-minimized repro:

void f(long);

__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}

void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}

$ clang -O2 -flto -c repro.c -o repro.i
$ ld.lld repro.i
ld.lld: error: Never resolved function from blockaddress (...)

Relevant part of the LLVM IR, generated with:
$ clang -O2 -flto -c repro.c -fno-discard-value-names -S

...
define dso_local void @repro() #0 {
entry:
  br label %label

label:                                            ; preds = %entry
  tail call fastcc void @fun()
  ret void
}

define internal fastcc void @fun() unnamed_addr #1 {
entry:
  tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
  ret void
}
...

We can see that clang figured out that the only argument to fun() is always the same, so it propagated that value (the address of label in repro) into fun() and dropped the parameter.
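The same transformation can be sketched at the C level with an ordinary integer argument in place of a label address; the `fun_specialized` name is hypothetical and only shows the conceptual result. IPSCCP substitutes the constant into the callee, and a later cleanup such as dead argument elimination can then drop the unused parameter, matching the parameterless `@fun()` in the IR above. The label version cannot be written by hand in C, because `&&label` is only valid inside the function that contains the label; an LLVM IR blockaddress constant carries no such restriction.

```c
#include <assert.h>

/* Before: the callee takes a parameter, but its only call site always
 * passes the same constant. */
__attribute__((noinline)) static long fun(long x)
{
  return x + 1;
}

long repro_before(void)
{
  return fun(41);
}

/* After (conceptually): the constant has been propagated into the callee
 * and the now-dead parameter removed. Hypothetical name; the real passes
 * rewrite LLVM IR, not C source. */
__attribute__((noinline)) static long fun_specialized(void)
{
  return 41 + 1;
}

long repro_after(void)
{
  return fun_specialized();
}
```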

Searching LLVM code for the source of the error message gives
https://github.com/llvm/llvm-project/blob/4b55329/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L853
(there is another error with the same string, but building lld with a change to the error message shows that this is the one that triggers). Reading through this code, we see that BitcodeReader lazily reads an LLVM bitcode file. The reader appears to initially return placeholder objects that need to be "materialized" on demand.

The code has some special handling of blockaddress IR constants, as a blockaddress that appears within a function F may reference a basic block from a different function G (as is the case in the IR for the bug repro).

When materializing a function with a blockaddress(Fn, Fn_BB), the code needs to separately handle the cases where Fn either has or has not yet been materialized. That logic appears here:
https://github.com/llvm/llvm-project/blob/4b55329/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L2901-L2927

If Fn has been materialized (the Fn->empty() check), then the reader creates a BlockAddress referring to the appropriate Function/BasicBlock object. If Fn has not been materialized, then the reader creates a placeholder BasicBlock for the BlockAddress and adds Fn to a queue of functions to be materialized. When Fn is materialized, it will make sure to use the precreated BasicBlock object.

After adding some debug prints, it appears that lld's use of BitcodeReader breaks this lazy blockaddress handling. In the repro case, the linker does materialize repro before it attempts to materialize fun. However, before it attempts to materialize fun, it steals the basic blocks out of repro's Function object here:
https://github.com/llvm/llvm-project/blob/4b55329/llvm/lib/Linker/IRMover.cpp#L1116-L1117

This causes the Fn->empty() check mentioned above to pass for repro when fun is materialized, which makes the code think that repro has not yet been materialized. That code path enqueues repro to be materialized. When that happens, this code:
https://github.com/llvm/llvm-project/blob/4b55329/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L162-L163
detects (using IsMaterializable() instead of empty()) that repro has already been materialized, and errors out.
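The failure sequence described above can be replayed with a toy model, written in C for illustration only. `struct toy_func` and every function here are hypothetical: they mimic just the two checks involved (an `empty()`-style test when a blockaddress is parsed, and an `isMaterializable()`-style test when the deferred target is processed) and are not the BitcodeReader or IRMover APIs. Splicing the already-materialized body out of `repro` before `fun` is parsed is enough to send the blockaddress down the deferred path, where it can never be resolved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a Function as seen by a lazy bitcode reader. */
struct toy_func {
  bool materializable; /* body still unparsed in the bitcode file */
  int num_blocks;      /* 0 plays the role of Function::empty() */
};

static void materialize(struct toy_func *f)
{
  assert(f->materializable); /* a body can only be parsed once */
  f->materializable = false;
  f->num_blocks = 2; /* arbitrary non-empty body */
}

/* Models IRMover moving the blocks into the destination module: the
 * source function becomes empty although it was already materialized. */
static void splice_out(struct toy_func *f)
{
  f->num_blocks = 0;
}

/* Models parsing blockaddress(@target, %bb) inside another function.
 * Returns false where the real reader reports "Never resolved function
 * from blockaddress". */
static bool resolve_blockaddress(struct toy_func *target)
{
  if (target->num_blocks > 0)
    return true; /* body present: refer to the block directly */
  /* "Empty" is taken to mean "not parsed yet", so resolution is deferred
   * to the target's materialization (collapsed into an immediate call
   * here). That is impossible if the target was already parsed. */
  if (!target->materializable)
    return false;
  materialize(target);
  return true;
}

/* Replays the repro: materialize repro, optionally splice it out, then
 * parse fun, which holds blockaddress(@repro, %label). */
bool replay_link(bool splice_before_fun)
{
  struct toy_func repro = { true, 0 };
  materialize(&repro);
  if (splice_before_fun)
    splice_out(&repro);
  return resolve_blockaddress(&repro);
}
```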

This looks like a fairly old bug in lld. Fixing it looks tricky - it seems that BitcodeReader requires that materialized Functions aren't modified while there are still functions that will be materialized later, and lld violates that assumption. Maintaining this invariant seems like it might conflict with the goal of lazy reading. Maybe one idea is to maintain some reverse dependency information in bitcode files. Then materializing a function F could trigger immediate materialization of all functions with a blockaddress pointing into F.

@FCLC

FCLC commented Jan 21, 2022

It seems that there is progress on this issue from the LLVM side. Great to see!

In the meantime, for those experimenting with Clang LTO kernels in dev/profiling environments, it's worth mentioning that disabling the XFS file system will let you build as normal as a temporary workaround, assuming you don't need XFS.

I was able to build against 5.16.2 with 13.0.1-+rc1-1~exp4 and additional KFLAGS: -O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -std=gnu99

Next when time permits I will be attempting to build with clang 14 followed by the ICX compiler, also LLVM based. This may help us track down these issues.

@nickdesaulniers nickdesaulniers added the Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. label Feb 16, 2022
@nickdesaulniers
Member

@nickdesaulniers nickdesaulniers self-assigned this Mar 1, 2022
@nickdesaulniers nickdesaulniers added the [PATCH] Submitted A patch has been submitted for review label Mar 1, 2022
@nickdesaulniers nickdesaulniers added [FIXED][LLVM] 15 This bug was fixed in LLVM 15.x and removed [PATCH] Submitted A patch has been submitted for review labels Apr 12, 2022
@nickdesaulniers
Member

nickdesaulniers commented Apr 12, 2022

This is fixed now in clang-15. I've requested a backport to clang-14.0.1 (we'll see if that request is accepted or not).

I think we still might want to disable XFS for LTO (or rather disable LTO on XFS), at least for older LLVM versions? Maybe @arndb's patch with a version check added, once we know whether 14.0.1 will also get the fix?

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Apr 12, 2022
IRLinker builds a work list of functions to materialize, then moves them
from a source module to a destination module one at a time.

This is a problem for blockaddress Constants, since they need not refer
to the function they are used in; IPSCCP is quite good at sinking these
constants deep into other functions when passed as arguments.

This would lead to curious errors during LTO:
  ld.lld: error: Never resolved function from blockaddress ...
based on the ordering of function definitions in IR.

The problem was that IRLinker would basically do:

  for function f in worklist:
    materialize f
    splice f from source module to destination module

in one pass, with Functions being lazily added to the running worklist.
This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.
This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs
list incorrectly, as it will never re-materialize an already
materialized (but spliced out) function.

Because of the possibility that blockaddress Constants may appear in
Functions other than the ones they reference, this patch adds a new
bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of
Functions that contain BlockAddress Constants that refer back to this
Function, rather than the Function they are scoped in. We then
materialize those functions when materializing `f` from the example loop
above. This might over-materialize Functions should the user of
BitcodeReader ultimately decide not to link those Functions, but at
least now we can avoid this ordering-related issue with blockaddresses.

Fixes: llvm#52787
Fixes: ClangBuiltLinux/linux#1215

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D120781

(cherry picked from commit 23ec578)
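The idea behind the fix can be sketched with a toy model, under the same caveats as any illustration: the types and functions below are hypothetical, and `ba_users` only stands in for the list that FUNC_CODE_BLOCKADDR_USERS records. Materializing a function also materializes every function holding a blockaddress into it, so those blockaddresses are resolved while the body is still present, before the linker splices it out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BA_USERS 4

/* Hypothetical model of a Function; not LLVM's API. */
struct toy_func {
  bool materialized;
  int num_blocks;                          /* 0 plays the role of empty() */
  struct toy_func *ba_target;              /* function our blockaddress names */
  bool ba_resolved;                        /* did that blockaddress resolve? */
  struct toy_func *ba_users[MAX_BA_USERS]; /* stands in for BLOCKADDR_USERS */
  int num_ba_users;
};

static void materialize(struct toy_func *f)
{
  if (f->materialized)
    return;
  f->materialized = true;
  f->num_blocks = 1;

  if (f->ba_target) {
    struct toy_func *t = f->ba_target;
    if (!t->materialized)
      materialize(t); /* target not parsed yet: parse it first */
    /* Fails if the target was parsed and then spliced out. */
    f->ba_resolved = t->num_blocks > 0;
  }

  /* The fix: parse every function that holds a blockaddress into f now,
   * while f's body is still in place. */
  for (int i = 0; i < f->num_ba_users; i++)
    materialize(f->ba_users[i]);
}

/* Models the IRLinker loop: materialize f, then splice its body away. */
static void link_one(struct toy_func *f)
{
  materialize(f);
  assert(f->materialized);
  f->num_blocks = 0;
}

/* Links repro, then fun (which holds blockaddress(@repro, %label)).
 * Returns whether fun's blockaddress resolved. */
bool replay_link_with_fix(bool record_ba_users)
{
  struct toy_func repro = {0}, fun = {0};
  fun.ba_target = &repro;
  if (record_ba_users) {
    repro.ba_users[0] = &fun;
    repro.num_ba_users = 1;
  }
  link_one(&repro);
  link_one(&fun);
  return fun.ba_resolved;
}
```

Without the reverse list, `fun` is only parsed after `repro` has been emptied, which reproduces the ordering problem; with it, `fun` is parsed during `repro`'s materialization.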
@nathanchance
Member

I would rather see something along the lines of:

diff --git a/arch/Kconfig b/arch/Kconfig
index 3e35c5523fe4..8d4a92b71516 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -690,6 +690,7 @@ config LTO_NONE
 config LTO_CLANG_FULL
        bool "Clang Full LTO (EXPERIMENTAL)"
        depends on HAS_LTO_CLANG
+       depends on CLANG_VERSION >= 140001
        depends on !COMPILE_TEST
        select LTO_CLANG
        help

as it better addresses the issue. XFS was not the only driver to trigger this (#1516), so adding something there feels wrong. We can talk about that once the backport request is addressed.
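For reference, a small sketch of how such a threshold reads, assuming the kernel's usual encoding of compiler versions as 10000 * major + 100 * minor + patch (so 140001 is clang 14.0.1). The helper names are hypothetical; Kconfig evaluates the `depends on` expression itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: encode major.minor.patch as one integer, assuming
 * the 10000 * major + 100 * minor + patch scheme. */
static int clang_version_code(int major, int minor, int patch)
{
  assert(minor >= 0 && minor < 100 && patch >= 0 && patch < 100);
  return 10000 * major + 100 * minor + patch;
}

/* Mirrors the proposed "depends on CLANG_VERSION >= 140001" line. */
bool full_lto_allowed(int major, int minor, int patch)
{
  return clang_version_code(major, minor, patch) >= 140001;
}
```

Under this encoding 14.0.0 (140000) is rejected while 14.0.1 and 15.0.0 (150000) are accepted; if only clang-15 carries the fix, the threshold would become 150000.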

@nathanchance
Member

It sounds like the backport request was rejected. Do we want to try to add a denylist of configs plus a version check or just the version check to CONFIG_LTO_CLANG_FULL?

@nickdesaulniers
Member

Let's update the CLANG_VERSION for full LTO, once clang-15 is generally available in distributions?

@nathanchance
Member

Sure, although that will likely be quite a while, given that LLVM 15.0.0 has to first be released. We can consider doing it sooner if we receive new reports about this issue.

mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
9 participants