From a30dc75616b3193a4b0ae7a9bdeabfb132fc44f1 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Tue, 5 Dec 2023 16:50:20 +0000
Subject: [PATCH] build based on d768d09

---
 dev/.documenter-siteinfo.json           | 2 +-
 dev/Details/Interprecision_1/index.html | 2 +-
 dev/Details/Stats/index.html            | 2 +-
 dev/Details/Termination/index.html      | 2 +-
 dev/Half_1/index.html                   | 2 +-
 dev/References/index.html               | 2 +-
 dev/functions/MPArray/index.html        | 2 +-
 dev/functions/MPGArray/index.html       | 2 +-
 dev/functions/hlu!/index.html           | 2 +-
 dev/functions/mpgeslir/index.html       | 4 ++--
 dev/functions/mpglu!/index.html         | 2 +-
 dev/functions/mpglu/index.html          | 2 +-
 dev/functions/mpgmir/index.html         | 2 +-
 dev/functions/mplu!/index.html          | 2 +-
 dev/functions/mplu/index.html           | 2 +-
 dev/index.html                          | 2 +-
 16 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-05T16:18:15","documenter_version":"1.2.1"}}
+{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-05T16:50:14","documenter_version":"1.2.1"}}

diff --git a/dev/Details/Interprecision_1/index.html b/dev/Details/Interprecision_1/index.html
@@ -14,4 +14,4 @@
 1024 9.6e-13 1.1e-12 3.4e-15 4.8e-15 1.3e-03 1.4e-03
 2048 1.0e-12 1.2e-12 5.1e-15 4.5e-15 7.2e-03 6.8e-03
 4096 2.1e-12 2.1e-12 6.6e-15 7.5e-15 2.4e-02 2.5e-02
 8192 3.3e-12 3.2e-12 9.0e-15 1.0e-14 8.4e-02 8.9e-02

diff --git a/dev/Details/Stats/index.html b/dev/Details/Stats/index.html
@@ -22,4 +22,4 @@
 1.21892e-04
 5.25805e-11
 2.56462e-14
 1.33227e-15

As you can see, IR does well for this problem. The package uses the initial iterate $x = 0$, so the initial residual is simply $r = b$ and the first entry in the residual history is $\| b \|_\infty$. The iteration terminates successfully after four matrix-vector products.
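
To reproduce a history like the one above (a hedged sketch: mplu and the reporting keyword are documented later in these pages, and Gmat comes from MultiPrecisionArrays.Examples):

using MultiPrecisionArrays, MultiPrecisionArrays.Examples, LinearAlgebra
N = 512; A = I - 800.0 * Gmat(N); b = ones(N)  # same family of test problem as elsewhere in the docs
MPF = mplu(A)                       # multiprecision LU factorization
mout = \(MPF, b; reporting = true)  # solve and keep the residual history
mout.rhist                          # the first entry is norm(b, Inf)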

There are more examples for this in [2].

diff --git a/dev/Details/Termination/index.html b/dev/Details/Termination/index.html

Terminating the while loop

We terminate the loop when

\[\| r \| < \tau \| b \|\]

where $\tau = 10 * eps(TH)$ and $eps(TH)$ is the high precision floating point machine epsilon. The problem with this criterion is that IR can stagnate, especially for ill-conditioned problems, before the termination criterion is attained. We detect stagnation by looking for an unacceptably slow decrease (or an increase) in the residual norm. So we will terminate the iteration if

\[\| r_{new} \| \ge 0.9 \| r_{old} \|\]

even if the small residual condition is not satisfied.
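
A minimal sketch of the combined test, assuming illustrative names (rnorm, rnormold, bnorm, and the high precision type TH are not the package's internal identifiers):

# true if IR should stop: either converged or stagnated
function terminate_ir(rnorm, rnormold, bnorm, TH)
    tau = 10 * eps(TH)                    # e.g. TH = Float64
    converged = rnorm < tau * bnorm       # small residual: success
    stagnated = rnorm >= 0.9 * rnormold   # residual norm no longer decreasing enough
    return converged || stagnated
end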

diff --git a/dev/Half_1/index.html b/dev/Half_1/index.html
@@ -64,4 +64,4 @@
 0.2875508 0.004160166

 julia> println(norm(b-A*z,Inf)/norm(b,Inf)," ",norm(b-A*y,Inf)/norm(b,Inf))
 0.0012593127 1.4025759e-5

So the relative error and relative residual norm for GMRES-IR are much smaller than those for IR.

diff --git a/dev/References/index.html b/dev/References/index.html

References

[1] C. T. Kelley. MultiPrecisionArrays.jl (2023). Julia package.
[2] C. T. Kelley. Using MultiPrecisionArrays.jl: Iterative Refinement in Julia (2023). arXiv:2311.14616 [math.NA].
[3] C. T. Kelley. SIAMFANLEquations.jl (2022). Julia package.
[4] C. T. Kelley. Solving Nonlinear Equations with Iterative Methods: Solvers and Examples in Julia. No. 20 in Fundamentals of Algorithms (SIAM, Philadelphia, 2022).
[5] C. T. Kelley. Notebook for Solving Nonlinear Equations with Iterative Methods: Solvers and Examples in Julia. https://github.com/ctkelley/NotebookSIAMFANL (2022). IJulia notebook.
[6] N. J. Higham. Accuracy and Stability of Numerical Algorithms (SIAM, Philadelphia, 1996); xxviii+688 pp.
diff --git a/dev/functions/MPArray/index.html b/dev/functions/MPArray/index.html
@@ -4,4 +4,4 @@
 AL::Array{TL,2}
 residual::Vector{TH}
 onthefly::Bool
 end

The constructor just builds an MPArray with TH=Float64. Set TL=Float16 to get double/half IR.
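
A hedged usage sketch of both constructors (the matrix here is an arbitrary well-conditioned example, not from the package):

using MultiPrecisionArrays, LinearAlgebra
A = I + 0.1 * rand(100, 100)       # Float64, so TH = Float64
MPA = MPArray(A)                   # TL = Float32: double/single IR
MPA16 = MPArray(A; TL = Float16)   # double/half IR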

MultiPrecisionArrays.MPArray (Method)

MPArray(AH::Array{Float32,2}; TL = Float16, onthefly = true)

Default single precision constructor for MPArray with TL = Float16.

If your high precision array is single, then your low precision array is half (Duh!).

We do the triangular solves with on-the-fly interprecision transfer in this case because the bit of extra accuracy makes a difference and, at least for now, on-the-fly interprecision transfers are cheaper.

Data structures etc. are the same as in the double-single/half case, but you don't have the option to go lower than half.

diff --git a/dev/functions/MPGArray/index.html b/dev/functions/MPGArray/index.html

MPGArray: constructor

MultiPrecisionArrays.MPGArray (Method)

MPGArray(AH::Array{Float64,2}; basissize=10, TL=Float32)

An MPGArray stores the high precision matrix, the low precision factorization, the Krylov basis, and a few other things GMRES needs. If the high precision matrix is double, the low precision is single by default. Half is an option, which you get with TL=Float16.
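
A hedged usage sketch of both methods (the matrix is an arbitrary example):

using MultiPrecisionArrays, LinearAlgebra
A = I + 0.1 * rand(512, 512)
MPG = MPGArray(A; basissize = 20)   # double/single with a 20-vector Krylov basis
MPG16 = MPGArray(Float32.(A))       # single/half; TL defaults to Float16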

MultiPrecisionArrays.MPGArray (Method)

MPGArray(AH::Array{Float32,2}; basissize=10, TL=Float16)

An MPGArray stores the high precision matrix, the low precision factorization, the Krylov basis, and a few other things GMRES needs. Since high precision is single, low is half. I'm leaving the kwarg for TL in there because it makes it easier to cut/paste calls to MPGArray with different precisions into a CI loop.

diff --git a/dev/functions/hlu!/index.html b/dev/functions/hlu!/index.html

hlu!: Get LU to perform reasonably well for Float16

MultiPrecisionArrays.hlu! (Method)

hlu!(A::Matrix{T}; minbatch=16) where {T}

Return the LU factorization of A.

C. T. Kelley, 2023

This function is a hack of generic_lufact! which is part of

https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/lu.jl

I "fixed" the code to be Float16 only and fixed pivoting to only MaxRow.

All I did in the factorization was thread the critical loop with Polyester.@batch and put @simd in the inner loop. These changes got me a 10x speedup on my Mac M2 Pro with 8 performance cores. I'm happy.

I set Polyester's minbatch to 16, which worked best for me. YMMV
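
A hedged usage sketch (the matrix is illustrative, and the call is qualified in case hlu! is not exported):

using MultiPrecisionArrays, LinearAlgebra
A16 = Float16.(I + 0.1 * rand(1024, 1024))
AF = MultiPrecisionArrays.hlu!(A16; minbatch = 16)   # overwrites A16 with the factors
x = AF \ ones(Float16, 1024)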

diff --git a/dev/functions/mpgeslir/index.html b/dev/functions/mpgeslir/index.html

mpgeslir: IR solver

MultiPrecisionArrays.mpgeslir (Method)

mpgeslir(AF::MPFact, b; reporting=false, verbose=true)

Use a multi-precision factorization to solve a linear system with plain vanilla iterative refinement.

MPFact is a union of all the MultiPrecision factorizations in the package. The triangular solver will dispatch on the various types depending on how the interprecision transfers get done.
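
For example (a hedged sketch, with a matrix A and right side b assumed; mplu is documented later in these pages):

MPF = mplu(A)          # an MPFact factorization
x = mpgeslir(MPF, b)   # the same solve as x = MPF \ b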

MultiPrecisionArrays.mpgeslir (Method)

mpgeslir(MPA::MPArray, b; reporting = false, verbose = true)

Use a multi-precision factorization to solve a linear system with plain vanilla iterative refinement.

This version is analogous to A\b and combines the factorization and the solve. You start with MPA=MPArray(A) and then pass MPA to mpgeslir, which does the factorization and the solve in one call.

You can also get the multiprecision factorization directly with

MPF=mplu(A)

and then pass MPF to mpgeslir.

I use this to get some timing results and it's also convenient if you want to do factor and solve in one statement.

You can also get this with x = MPA\b.

If you set the kwarg reporting to true you can get the IR residual history. The output of

x = MPA\b

or

x = MPF\b

is the solution. The output of

mout = \(MPA,b; reporting=true)

or

mout = \(MPF,b; reporting=true)

is a structure. mout.sol is the solution. mout.rhist is the residual history. mout also contains the datatypes TH for high precision and TL for low precision.

Example

julia> using MultiPrecisionArrays.Examples
 
 julia> N=4096; A = I - 800.0 * Gmat(N); b=ones(N);
 
@@ -21,4 +21,4 @@
 julia> [mout.TH mout.TL]
 1×2 Matrix{DataType}:
  Float64  Float32
diff --git a/dev/functions/mpglu!/index.html b/dev/functions/mpglu!/index.html

mpglu!: Factor an MPGArray and set it up for GMRES by allocating room
diff --git a/dev/functions/mpglu/index.html b/dev/functions/mpglu/index.html

mpglu: Combine MPGArray construction and factorization
diff --git a/dev/functions/mpgmir/index.html b/dev/functions/mpgmir/index.html
@@ -70,4 +70,4 @@
 julia> mpout.TL
 Float16

diff --git a/dev/functions/mplu!/index.html b/dev/functions/mplu!/index.html

mplu!: Simple MPArray factorization

MultiPrecisionArrays.mplu! (Method)

mplu!(MPA::MPArray)

Plain vanilla MPArray factorization: Factor the low precision copy and leave the high precision matrix alone. You get a factorization object as output and can use \ to solve linear systems.

The story on interprecision transfers is that you can set the Boolean onthefly when you construct the MPArray. If you use mplu then you get the defaults

  • If onthefly == false then the solver downcasts the residual before the solve and avoids the N^2 interprecision transfers.

  • If onthefly == true then the solver does interprecision transfers on the fly and incurs the N^2 interprecision transfer cost for that.

    onthefly == true is what you must use if you plan to use the low precision factorization as a preconditioner in IR-GMRES or you're working in Float16 and the matrix is very ill-conditioned.

    onthefly == nothing means you take the defaults.
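
A sketch of the construct-then-factor pattern with on-the-fly transfers (hedged; the matrix and sizes are illustrative):

using MultiPrecisionArrays, LinearAlgebra
A = I + 0.1 * rand(1000, 1000)
MPA = MPArray(A; onthefly = true)   # required if the factorization will precondition GMRES-IR
MPF = mplu!(MPA)                    # factor the low precision copy in place
x = MPF \ ones(1000)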

diff --git a/dev/functions/mplu/index.html b/dev/functions/mplu/index.html

mplu: Combine MPArray construction and factorization
diff --git a/dev/index.html b/dev/index.html
@@ -87,4 +87,4 @@
 MPA = MPArray(A; TL=TL, onthefly=onthefly)
 MPF = mplu!(MPA)
 return MPF
 end

The function mplu has two keyword arguments. The easy one to understand is TL, which is the precision of the factorization. Julia has support for single (Float32) and half (Float16) precisions. If you set TL=Float16 then low precision will be half. Don't do that unless you know what you're doing. Using half precision is a fast way to get incorrect results. Look at the section on half precision in this Readme for a bit more bad news.
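
For example (a hedged sketch, with a Float64 matrix A assumed):

MPF = mplu(A)                  # low precision defaults to Float32
MPF16 = mplu(A; TL = Float16)  # half precision; see the warnings above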

The other keyword argument is onthefly. That keyword controls how the triangular solvers from the factorization work. When you solve

\[LU d = r\]

the LU factors are in low precision and the residual $r$ is in high precision. If you let Julia and LAPACK figure out what to do, then the solves will be done in high precision and the entries in the LU factors will be converted to high precision with each binary operation. The output $d$ will be in high precision. This is called interprecision transfer on the fly, and onthefly = true tells the solvers to do it that way. You have $N^2$ interprecision transfers with each solve and, as we will see, that can have a non-trivial cost.

When low precision is Float32, the default is onthefly = false. This converts $r$ to low precision, does the solve entirely in low precision, and then promotes $d$ to high precision. You need to be careful to avoid overflow and, more importantly, underflow when you do that, so we scale $r$ to be a unit vector before conversion to low precision and reverse the scaling when we promote $d$. We take care of this for you.
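
A sketch of that scaled downcast, with hypothetical names (illustrative, not the package's internal code):

using LinearAlgebra
function downcast_residual(r::Vector{Float64}, TL)
    s = norm(r, Inf)   # scale so the converted vector is O(1)
    rl = TL.(r ./ s)   # convert without underflow or overflow
    return rl, s       # promote d and undo the scaling after the low precision solve
end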

mplu calls the constructor for the multiprecision array and then factors the low precision matrix. In some cases, such as nonlinear solvers, you will want to separate the constructor and the factorization. When you do that, remember that mplu! overwrites the low precision copy of A with the factors, so you can't reuse the multiprecision array for other problems unless you restore the low precision copy.
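
A sketch of the separated pattern (hedged: Anew is hypothetical, and the field names AH and AL are assumptions based on the MPArray struct shown earlier):

MPA = MPArray(A)
MPF = mplu!(MPA)    # the low precision copy now holds the factors
# To reuse the storage for a new matrix, restore both copies first:
MPA.AH .= Anew
MPA.AL .= Anew
MPF = mplu!(MPA)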
