
Commit 692e192: Update README.md

ctkelley authored Jul 23, 2024
1 parent e8829fd commit 692e192

Showing 1 changed file (README.md) with 10 additions and 10 deletions.
@@ -34,21 +34,17 @@

__The half precision LU for Float16 in this package is much faster (more than 10

## What's new?

- v0.1.0: Better docs and ...
  - I no longer export the constructors and the MPArray factorizations. You should only be using `mplu`, `mplu!`, `mpglu`, `mpglu!`, ... (see the usage sketch after this list).
  - Notation and variable names changed to conform with standard practice (TH --> TW for working precision, TL --> TF for factorization precision, etc.). If you just use the multiprecision factorizations with no options, you will not notice this.
  - An explanation of why I am not excited about evaluating the residual in extended precision, plus a bit of support for doing that anyhow.
  - Replacing Polyester with [OhMyThreads](https://github.com/JuliaFolds2/OhMyThreads.jl) v0.5 or later. I am worried about [this](https://discourse.julialang.org/t/why-is-loopvectorization-deprecated/109547/74).

- v0.1.1: Better docs and updated termination criterion (normwise backward error)

- v0.1.2: Better docs and ...
  - Krylov-IR for high precision residuals

- v0.1.3: Still better docs and ...
  - Fixing a performance bug.
  - Add options to the termination criterion. __Change default back to small residuals.__

- v0.1.4: Continuous improvement for the docs and ...
  - Enable fine control of termination criteria parameters __(XXX! Adults only!)__

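Since the constructors are no longer exported, everything goes through `mplu` and friends. Here is a minimal usage sketch, assuming the default pairing of a Float64 working precision with a Float32 factorization precision; the matrix and right-hand side are made-up test data. For reference, the normwise backward error behind the v0.1.1 termination test is conventionally $\| b - A x \| / (\| A \| \| x \| + \| b \|)$.

```julia
# Minimal usage sketch (illustrative data; default precisions assumed:
# Float64 working precision, Float32 factorization precision).
using MultiPrecisionArrays
using LinearAlgebra

N = 1024
A = I + 0.1 * rand(N, N)   # made-up, well-conditioned test matrix
b = rand(N)

MPF = mplu(A)   # factor a low precision copy of A
x = MPF \ b     # backslash runs IR with the stored factors
```
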
## Can I complain about this package?

Yes, but ...
@@ -118,6 +114,10 @@

one incurs the storage penalty of making a low precision copy of $A$ and reaps the benefit of only having to factor the low precision copy.
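
For readers who have not seen iterative refinement written down, here is a rough sketch of the idea. It is simplified (fixed iteration count, no termination test) and is not the package's actual implementation:

```julia
# Rough sketch of LU-based iterative refinement (IR); simplified and
# not the package's internals: fixed iteration count, no termination test.
using LinearAlgebra

function ir_sketch(A::Matrix{Float64}, b::Vector{Float64}; maxit = 5)
    AF = lu!(Float32.(A))    # the storage penalty: a low precision copy of A,
    x = zeros(length(b))     # which is all that gets factored
    for _ in 1:maxit
        r = b - A * x        # residual in the working precision
        d = AF \ Float32.(r) # correction from the low precision factors
        x .+= d              # update the working precision iterate
    end
    return x
end
```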

While you might think this is a good idea for all problems, it is not so good for smaller problems. The reason is that IR swaps the factorization cost for
a matrix-vector multiply and the two triangular solves for LU __in each IR iteration__. Triangular solves do not thread as well as factorizations or matrix-vector
multiplies, and that can affect the performance in a significant way, even though it is only $N^2$ work. The details are [in the docs](https://ctkelley.github.io/MultiPrecisionArrays.jl/dev/Details/N2Work).
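
To see the tradeoff on your own machine, a timing sketch along these lines compares the one-time $N^3$ factorization with the $N^2$ work that every IR iteration repeats (the size is arbitrary and BenchmarkTools is assumed to be installed):

```julia
# Timing sketch for the cost tradeoff; results vary with hardware,
# BLAS library, and thread count.
using LinearAlgebra, BenchmarkTools

N = 2048
A = I + 0.1 * rand(N, N)
b = rand(N)
A32 = Float32.(A)
b32 = Float32.(b)

@btime lu($A32);      # O(N^3): factor the low precision copy, once
AF = lu(A32)
@btime $A * $b;       # O(N^2): one matrix-vector multiply per IR iteration
@btime $AF \ $b32;    # O(N^2): the two triangular solves per IR iteration
```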



## Example
