-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flang: Operations with complex floating point numbers give unexpected results on MinGW #61976
Comments
Assembly from .text
.def @feat.00;
.scl 3;
.type 0;
.endef
.globl @feat.00
.set @feat.00, 0
.file "FIRModule"
.def _QQmain;
.scl 2;
.type 32;
.endef
.globl _QQmain
.p2align 4, 0x90
_QQmain:
.seh_proc _QQmain
subq $104, %rsp
.seh_stackalloc 104
.seh_endprologue
movl $1065353216, 92(%rsp)
movl $0, 88(%rsp)
movss 88(%rsp), %xmm0
movss 92(%rsp), %xmm1
movss %xmm1, 84(%rsp)
movss %xmm0, 80(%rsp)
movsd 80(%rsp), %xmm0
movapd %xmm0, 48(%rsp)
leaq 48(%rsp), %rcx
movl $2, %edx
callq _FortranAcpowi
movlpd %xmm0, 72(%rsp)
movss 72(%rsp), %xmm0
movss 76(%rsp), %xmm1
movss %xmm1, 100(%rsp)
movss %xmm0, 96(%rsp)
leaq _QQcl.2E2F746573742D706F772E66393000(%rip), %rdx
movl $-1, %ecx
movl $6, %r8d
callq _FortranAioBeginExternalListOutput
movq %rax, %rcx
movq %rcx, 40(%rsp)
movss 96(%rsp), %xmm1
movss 100(%rsp), %xmm2
callq _FortranAioOutputComplex32
movq 40(%rsp), %rcx
callq _FortranAioEndIoStatement
nop
addq $104, %rsp
retq
.seh_endproc
.section .rdata,"dr"
.weak _QQcl.2E2F746573742D706F772E66393000
_QQcl.2E2F746573742D706F772E66393000:
.asciz "./test-pow.f90"
.globl _QQEnvironmentDefaults
.p2align 3, 0x0
_QQEnvironmentDefaults:
.quad 0 And LLVM IR with ; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-w64-windows-gnu"
@_QQcl.2E2F746573742D706F772E66393000 = linkonce constant [15 x i8] c"./test-pow.f90\00"
@_QQEnvironmentDefaults = constant ptr null
declare ptr @malloc(i64)
declare void @free(ptr)
define void @_QQmain() {
%1 = alloca { float, float }, i64 1, align 8
%2 = alloca { float, float }, i64 1, align 8
store { float, float } { float 0.000000e+00, float 1.000000e+00 }, ptr %2, align 4
%3 = load { float, float }, ptr %2, align 4
%4 = alloca <2 x float>, i64 1, align 8
store { float, float } %3, ptr %4, align 4
%5 = load <2 x float>, ptr %4, align 8
%6 = call <2 x float> @_FortranAcpowi(<2 x float> %5, i32 2)
%7 = alloca <2 x float>, i64 1, align 8
store <2 x float> %6, ptr %7, align 8
%8 = load { float, float }, ptr %7, align 4
store { float, float } %8, ptr %1, align 4
%9 = call ptr @_FortranAioBeginExternalListOutput(i32 -1, ptr @_QQcl.2E2F746573742D706F772E66393000, i32 6)
%10 = load { float, float }, ptr %1, align 4
%11 = extractvalue { float, float } %10, 0
%12 = extractvalue { float, float } %10, 1
%13 = call i1 @_FortranAioOutputComplex32(ptr %9, float %11, float %12)
%14 = call i32 @_FortranAioEndIoStatement(ptr %9)
ret void
}
declare <2 x float> @_FortranAcpowi(<2 x float>, i32)
declare ptr @_FortranAioBeginExternalListOutput(i32, ptr, i32)
declare i1 @_FortranAioOutputComplex32(ptr, float, float)
declare i32 @_FortranAioEndIoStatement(ptr)
!llvm.module.flags = !{!0, !1}
!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 8, !"PIC Level", i32 2} |
Not sure if the following helps for comparison. (I hope I understood @mstorsjo's hints on IRC correctly): struct cpx { float r, i; };
struct cpx _FortranAcpowi(struct cpx base, int exp);
int main(void)
{
struct cpx c;
c.r = 0;
c.i = 1;
struct cpx z = _FortranAcpowi(c, 2);
return 0;
} The assembly for .text
.def @feat.00;
.scl 3;
.type 0;
.endef
.globl @feat.00
.set @feat.00, 0
.file "test-pow.c"
.def main;
.scl 2;
.type 32;
.endef
.globl main # -- Begin function main
.p2align 4, 0x90
main: # @main
.seh_proc main
# %bb.0:
pushq %rbp
.seh_pushreg %rbp
subq $32, %rsp
.seh_stackalloc 32
leaq 32(%rsp), %rbp
.seh_setframe %rbp, 32
.seh_endprologue
callq __main
movabsq $4575657221408423936, %rcx # imm = 0x3F80000000000000
movl $2, %edx
callq _FortranAcpowi
xorl %eax, %eax
addq $32, %rsp
popq %rbp
retq
.seh_endproc
# -- End function
.addrsig |
Ok, so in the case of C, at least with this definition of the struct, Clang expects to pass the struct of two floats packed into one GPR as the first argument. While in the case of the Fortran generated call above:
Here the complex struct is packed, stored on the stack and the address of it is passed as the first argument. I think this might be the mismatch at play here. From the LLVM IR generated, in the Flang case, So I believe this boils down to Flang not knowing what the C level ABI is for complex numbers on this particular platform and how it's represented. |
It looks like https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/flang/lib/Optimizer/CodeGen/Target.cpp#L134-L136 is the code that might be responsible for picking the right type here - it expects the linux type of ABI for all x86_64 targets here. https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/flang/lib/Optimizer/CodeGen/Target.cpp#L102-L104 looks like what we'd want it to be for x86_64 on Windows. |
Thank you very much for tracking that down. I hope it is ok to extent that to the double precision case:
Result:
Generated assembly for
Mock C example for comparison:
Result:
Assembly for
|
@llvm/issue-subscribers-flang-ir |
Thank you @mstorsjo for pointing out the fix. The following change makes the (single precision) complex example work for me: diff --git a/flang/lib/Optimizer/CodeGen/Target.cpp b/flang/lib/Optimizer/CodeGen/Target.cpp
index 50d144909a1c..83b6261c1d34 100644
--- a/flang/lib/Optimizer/CodeGen/Target.cpp
+++ b/flang/lib/Optimizer/CodeGen/Target.cpp
@@ -232,6 +232,71 @@ struct TargetX86_64 : public GenericTarget<TargetX86_64> {
};
} // namespace
+//===----------------------------------------------------------------------===//
+// x86_64 (x86 64 bit) Windows target specifics.
+//===----------------------------------------------------------------------===//
+
+namespace {
+struct TargetX86_64Win : public GenericTarget<TargetX86_64Win> {
+ using GenericTarget::GenericTarget;
+
+ static constexpr int defaultWidth = 64;
+
+ CodeGenSpecifics::Marshalling
+ complexArgumentType(mlir::Location loc, mlir::Type eleTy) const override {
+ CodeGenSpecifics::Marshalling marshal;
+ const auto *sem = &floatToSemantics(kindMap, eleTy);
+ if (sem == &llvm::APFloat::IEEEsingle()) {
+ // i64 pack both floats in a 64-bit GPR
+ marshal.emplace_back(mlir::IntegerType::get(eleTy.getContext(), 64),
+ AT{});
+ } else if (sem == &llvm::APFloat::IEEEdouble()) {
+ // Use a type that will be translated into LLVM as:
+ // { double, double } struct of 2 double
+ marshal.emplace_back(mlir::TupleType::get(eleTy.getContext(),
+ mlir::TypeRange{eleTy, eleTy}),
+ AT{});
+ } else if (sem == &llvm::APFloat::IEEEquad()) {
+ // Use a type that will be translated into LLVM as:
+ // { fp128, fp128 } struct of 2 fp128, byval, align 16
+ marshal.emplace_back(
+ fir::ReferenceType::get(mlir::TupleType::get(
+ eleTy.getContext(), mlir::TypeRange{eleTy, eleTy})),
+ AT{/*align=*/16, /*byval=*/true});
+ } else {
+ TODO(loc, "complex for this precision");
+ }
+ return marshal;
+ }
+
+ CodeGenSpecifics::Marshalling
+ complexReturnType(mlir::Location loc, mlir::Type eleTy) const override {
+ CodeGenSpecifics::Marshalling marshal;
+ const auto *sem = &floatToSemantics(kindMap, eleTy);
+ if (sem == &llvm::APFloat::IEEEsingle()) {
+ // <2 x t> vector of 2 eleTy
+ marshal.emplace_back(fir::VectorType::get(2, eleTy), AT{});
+ } else if (sem == &llvm::APFloat::IEEEdouble()) {
+ // Use a type that will be translated into LLVM as:
+ // { double, double } struct of 2 double
+ marshal.emplace_back(mlir::TupleType::get(eleTy.getContext(),
+ mlir::TypeRange{eleTy, eleTy}),
+ AT{});
+ } else if (sem == &llvm::APFloat::IEEEquad()) {
+ // Use a type that will be translated into LLVM as:
+ // { fp128, fp128 } struct of 2 fp128, sret, align 16
+ marshal.emplace_back(
+ fir::ReferenceType::get(mlir::TupleType::get(
+ eleTy.getContext(), mlir::TypeRange{eleTy, eleTy})),
+ AT{/*align=*/16, /*byval=*/false, /*sret=*/true});
+ } else {
+ TODO(loc, "complex for this precision");
+ }
+ return marshal;
+ }
+};
+} // namespace
+
//===----------------------------------------------------------------------===//
// AArch64 linux target specifics.
//===----------------------------------------------------------------------===//
@@ -548,8 +613,12 @@ fir::CodeGenSpecifics::get(mlir::MLIRContext *ctx, llvm::Triple &&trp,
return std::make_unique<TargetI386>(ctx, std::move(trp),
std::move(kindMap));
case llvm::Triple::ArchType::x86_64:
- return std::make_unique<TargetX86_64>(ctx, std::move(trp),
- std::move(kindMap));
+ if (trp.isOSWindows())
+ return std::make_unique<TargetX86_64Win>(ctx, std::move(trp),
+ std::move(kindMap));
+ else
+ return std::make_unique<TargetX86_64>(ctx, std::move(trp),
+ std::move(kindMap));
case llvm::Triple::ArchType::aarch64:
return std::make_unique<TargetAArch64>(ctx, std::move(trp),
std::move(kindMap)); It still crashes for |
Got it to work for both float and double complex. The LLVM IR for both function arguments and return values looks equivalent to the C mock examples: From dc8830b6d616c03b41b287b3c34065a0592a566e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Markus=20M=C3=BCtzel?= <markus.muetzel@gmx.de>
Date: Thu, 6 Apr 2023 18:36:50 +0200
Subject: [PATCH] [flang][codegen]: Complex numbers in function arguments on
Windows.
Function arguments or return values that are complex floating point values
aren't correctly lowered for Windows 64-bit targets.
Add a target that is specific for that platform.
---
flang/lib/Optimizer/CodeGen/Target.cpp | 76 +++++++++++++++++++++++++-
1 file changed, 74 insertions(+), 2 deletions(-)
diff --git a/flang/lib/Optimizer/CodeGen/Target.cpp b/flang/lib/Optimizer/CodeGen/Target.cpp
index 50d144909a1c..e3f6757ac573 100644
--- a/flang/lib/Optimizer/CodeGen/Target.cpp
+++ b/flang/lib/Optimizer/CodeGen/Target.cpp
@@ -232,6 +232,74 @@ struct TargetX86_64 : public GenericTarget<TargetX86_64> {
};
} // namespace
+//===----------------------------------------------------------------------===//
+// x86_64 (x86 64 bit) Windows target specifics.
+//===----------------------------------------------------------------------===//
+
+namespace {
+struct TargetX86_64Win : public GenericTarget<TargetX86_64Win> {
+ using GenericTarget::GenericTarget;
+
+ static constexpr int defaultWidth = 64;
+
+ CodeGenSpecifics::Marshalling
+ complexArgumentType(mlir::Location loc, mlir::Type eleTy) const override {
+ CodeGenSpecifics::Marshalling marshal;
+ const auto *sem = &floatToSemantics(kindMap, eleTy);
+ if (sem == &llvm::APFloat::IEEEsingle()) {
+ // i64 pack both floats in a 64-bit GPR
+ marshal.emplace_back(mlir::IntegerType::get(eleTy.getContext(), 64),
+ AT{});
+ } else if (sem == &llvm::APFloat::IEEEdouble()) {
+ // Use a type that will be translated into LLVM as:
+ // { double, double } struct of 2 double, byval, align 8
+ marshal.emplace_back(
+ fir::ReferenceType::get(mlir::TupleType::get(
+ eleTy.getContext(), mlir::TypeRange{eleTy, eleTy})),
+ AT{/*align=*/8, /*byval=*/true});
+ } else if (sem == &llvm::APFloat::IEEEquad()) {
+ // Use a type that will be translated into LLVM as:
+ // { fp128, fp128 } struct of 2 fp128, byval, align 16
+ marshal.emplace_back(
+ fir::ReferenceType::get(mlir::TupleType::get(
+ eleTy.getContext(), mlir::TypeRange{eleTy, eleTy})),
+ AT{/*align=*/16, /*byval=*/true});
+ } else {
+ TODO(loc, "complex for this precision");
+ }
+ return marshal;
+ }
+
+ CodeGenSpecifics::Marshalling
+ complexReturnType(mlir::Location loc, mlir::Type eleTy) const override {
+ CodeGenSpecifics::Marshalling marshal;
+ const auto *sem = &floatToSemantics(kindMap, eleTy);
+ if (sem == &llvm::APFloat::IEEEsingle()) {
+ // i64 pack both floats in a 64-bit GPR
+ marshal.emplace_back(mlir::IntegerType::get(eleTy.getContext(), 64),
+ AT{});
+ } else if (sem == &llvm::APFloat::IEEEdouble()) {
+ // Use a type that will be translated into LLVM as:
+ // { double, double } struct of 2 double, sret, align 8
+ marshal.emplace_back(
+ fir::ReferenceType::get(mlir::TupleType::get(
+ eleTy.getContext(), mlir::TypeRange{eleTy, eleTy})),
+ AT{/*align=*/8, /*byval=*/false, /*sret=*/true});
+ } else if (sem == &llvm::APFloat::IEEEquad()) {
+ // Use a type that will be translated into LLVM as:
+ // { fp128, fp128 } struct of 2 fp128, sret, align 16
+ marshal.emplace_back(
+ fir::ReferenceType::get(mlir::TupleType::get(
+ eleTy.getContext(), mlir::TypeRange{eleTy, eleTy})),
+ AT{/*align=*/16, /*byval=*/false, /*sret=*/true});
+ } else {
+ TODO(loc, "complex for this precision");
+ }
+ return marshal;
+ }
+};
+} // namespace
+
//===----------------------------------------------------------------------===//
// AArch64 linux target specifics.
//===----------------------------------------------------------------------===//
@@ -548,8 +616,12 @@ fir::CodeGenSpecifics::get(mlir::MLIRContext *ctx, llvm::Triple &&trp,
return std::make_unique<TargetI386>(ctx, std::move(trp),
std::move(kindMap));
case llvm::Triple::ArchType::x86_64:
- return std::make_unique<TargetX86_64>(ctx, std::move(trp),
- std::move(kindMap));
+ if (trp.isOSWindows())
+ return std::make_unique<TargetX86_64Win>(ctx, std::move(trp),
+ std::move(kindMap));
+ else
+ return std::make_unique<TargetX86_64>(ctx, std::move(trp),
+ std::move(kindMap));
case llvm::Triple::ArchType::aarch64:
return std::make_unique<TargetAArch64>(ctx, std::move(trp),
std::move(kindMap));
--
2.38.0.windows.1
Can I do something more to help? |
Can you create a patch? https://llvm.org/docs/Phabricator.html |
One question, When Flang developers requested to join LLVM project years ago, they promised to add support for Windows. |
All Flang developers are volunteers. Flang is an open-source project and everyone is welcome to contribute. If Windows enablement is the most important feature for you then you should join the development effort. Now, since you have asked, we have been supporting Windows as much as possible. What has been done so far: The existence of Windows CI ensures that whatever changes are being made are tested on the Windows platform. I think all these (and more that I have missed) more than satisfies any promise that was made. |
I mean by volunteers everyone who contribute for free, Most of Flang contributors are paid developers (including you from AMD, if I am not mistaking) Windows CI was added after Windows support was added by volunteers, and the developers who added that CI (I think from Linaro) are not the one who promised Windows support. |
I work for Arm. If people are paid, they are also given priorities. For some it might be fixing all issues seen in tests, for some it might be enabling GPU offloading, for some it might be performance.
I don't know who made the promise. Having Windows support is a good thing. But there are a lot of people who have helped in adding support for Windows including developers at companies. Also writing the code in a reasonably portable way in the first place also helped to port. We should not forget all that effort.
Thanks for that. At the same time, we should not forget the effort of setting up the post-commit CI, enabling tests for Windows, and fixing any Windows issues seen while submitting patches. Sometimes this has not been easy without access to a local Windows machine. |
I uploaded a patch to Phabricator here: https://reviews.llvm.org/D147768 |
What about aarch64 on Windows? |
Apparently, that is working for you, isn't it? |
It is for Octave. Just hoping that fixing for x86_64 won't break aarch64 :) |
mingw-w64-flang is already patched. |
Those are different targets. Making changes for one doesn't affect the other. |
Function arguments or return values that are complex floating point values aren't correctly lowered for Windows x86 32-bit and 64-bit targets. See: #61976 Add targets that are specific for these platforms and OS. With thanks to @mstorsjo for pointing out the fix. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D147768
Should be fixed with this change: 774703e |
Function arguments or return values that are complex floating point values aren't correctly lowered for Windows x86 32-bit and 64-bit targets. See: llvm/llvm-project#61976 Add targets that are specific for these platforms and OS. With thanks to @mstorsjo for pointing out the fix. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D147768 (cherry picked from commit 774703e)
Function arguments or return values that are complex floating point values aren't correctly lowered for Windows x86 32-bit and 64-bit targets. See: llvm/llvm-project#61976 Add targets that are specific for these platforms and OS. With thanks to @mstorsjo for pointing out the fix. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D147768 (cherry picked from commit 774703e)
Function arguments or return values that are complex floating point values aren't correctly lowered for Windows x86 32-bit and 64-bit targets. See: llvm#61976 Add targets that are specific for these platforms and OS. With thanks to @mstorsjo for pointing out the fix. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D147768
Function arguments or return values that are complex floating point values aren't correctly lowered for Windows x86 32-bit and 64-bit targets. See: llvm#61976 Add targets that are specific for these platforms and OS. With thanks to @mstorsjo for pointing out the fix. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D147768 (cherry picked from commit 774703e)
@mmuetzel I am having this same problem on iOS AArch64 Darwin (I’m jailbroken but this would also apply to M1 Silicon Macs). I compiled your Fortran and C examples with flang 16.0.2 (which I compiled myself):
The middle result is the problem. As you can see, the C version compiled with clang-16.0.2 and the LFortran 0.19.0 version both give the correct result. Will the solution be to make a Flang Target patch for AArch64 like you did for Windows x86-64? Or does this indicate a larger issue? |
@badger200: I don't see how this is the same problem. Afaict, the three results you showed seem to be equivalent. The only difference seems to be how many significant digits are displayed. |
Function arguments or return values that are complex floating point values aren't correctly lowered for Windows x86 32-bit and 64-bit targets. See: llvm#61976 Add targets that are specific for these platforms and OS. With thanks to @mstorsjo for pointing out the fix. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D147768 (cherry picked from commit 774703e)
Function arguments or return values that are complex floating point values aren't correctly lowered for Windows x86 32-bit and 64-bit targets. See: llvm/llvm-project#61976 Add targets that are specific for these platforms and OS. With thanks to @mstorsjo for pointing out the fix. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D147768
The following code compiled with Flang 16.0.0 installed from MSYS2 gives unexpected results:
Result:
It almost looks like random memory is dereferenced for the real part, and the imaginary part is always 0.
It is working correctly with GFortran:
If there is anything I can do to help debugging, please let me know.
The text was updated successfully, but these errors were encountered: