Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new MatrixFields module, along with unit tests and performance tests #1326

Merged
merged 1 commit into from
Jul 21, 2023

Conversation

dennisYatunin
Copy link
Member

@dennisYatunin dennisYatunin commented Jun 14, 2023

Purpose

This PR encompasses the first two tasks of the implicit interface updates SDI. That is, it refactors StencilCoefs into BandMatrixRow and ApplyStencil/ComposeStencil into MultiplyColumnwiseBandMatrixField, and it also adds a comprehensive set of tests. In the process of refactoring, the code for low-level matrix operations has been simplified and useful new functionality has been added. In order to avoid conflicts with pre-existing code, the new data structures and functions have been placed in a separate module called MatrixFields; this module can be removed once we are ready to replace all of the old code.

Content

API improvements

  • Matrix field rows now have simple constructors (e.g., DiagonalMatrixRow and TridiagonalMatrixRow) that can be used to construct matrix fields on the fly in broadcast expressions.
  • Matrix fields are now automatically promoted to common types for basic arithmetic operations. This involves promoting the entries in the rows, as well as padding rows with zeros when rows have unequal numbers of entries. In addition, LinearAlgebra.I (or, rather, LinearAlgebra.UniformScaling) is automatically promoted to a DiagonalMatrixRow when used with BandMatrixRows. So, for example, it is now possible to evaluate DiagonalMartrixRow(1) - 2 * TridiagonalMatrixRow(3, 4, 5) - I / 6.
  • Matrix fields are now fully integrated with RecursiveApply, which allows a single matrix field to store multiple matrices by using Tuples or NamedTuples as entries instead of scalars. This is achieved by using the new rzero function instead of zero, and also by making +, -, *, and / call radd, rsub, rmul, and rdiv when used on BandMatrixRows. This change will make it possible to, e.g., store the derivatives of all tracer tendencies with respect to vertical velocity as a single matrix field in ClimaAtmos, which will lead to simpler code and better performance.
  • All matrix-matrix and matrix-vector multiplications can now be handled using the operator, which is an alias for MultiplyColumnwiseBandMatrixField(). The implementation of this operator is much simpler than the implementation of ApplyStencil/ComposeStencil, and it is effectively localized to the single function multiply_matrix_at_index. In order to avoid inference failures for complex matrix field broadcast expressions, the recursion limit for this function needed to be disabled. Unfortunately, just as with the pre-existing code, inlining this function with @propagate_inbounds drastically increased compilation time, and it also caused inference failures for particularly complicated matrix broadcast expressions. Although not using @propagate_inbounds causes a 3--4x slowdown in the evaluation of matrix broadcast expressions, it avoids a 10--100x slowdown in their compilation.
  • The matrix of the divergence operator can now be properly represented as a matrix of covectors, rather than a matrix of scalars. Since the divergence operator turns vectors into scalars, multiplying its matrix by a matrix/vector of vectors should return a matrix/vector of scalars. However, this is not as simple as just multiplying a covector by a vector and returning a scalar, since the vector may need to be projected onto the dual axis of the covector before the multiplication. So, matrix-matrix and matrix-vector multiplication is now implemented using the rmul_with_projection function instead of rmul. In general, this function works just like rmul, but, when the first argument is a covector (an AdjointAxisVector) and the second is a vector or higher-order tensor (an AxisTensor), it uses the local geometry to project the second argument before the multiplication. In the future, we may want to generalize this to higher-order cotensors.
  • The MatrixFields module also includes functions for converting matrix fields to more conventional arrays that are compatible with LinearAlgebra.jl. The function column_field2array converts a matrix field defined on a column space to a BandedMatrix from BandedMatrices.jl, and it also converts a regular field defined on a column space to a Vector. The function field2arrays works for fields defined on multiple columns by calling column_field2array on each column.
  • Matrix fields now get displayed in the REPL as actual matrices. Specifically, when a matrix field contains Number entries, its first column is passed to column_field2array_view, which is identical to column_field2array, except that it allocates less memory by returning a view of the field's underlying data. Although column_field2array and column_field2array_view work for matrix fields with Tuple/NamedTuple entries, the show method for a BandedMatrix crashes when displaying such entries. In addition, this show method results in somewhat unreadable output for other struct entries, like vectors and covectors. So, all non-Number matrix fields still get displayed in the REPL like regular fields.

Related ClimaCore Changes

  • To ensure that promotion and conversion of BandMatrixRow entries works correctly for nested Tuple/NamedTuple entries, the rpromote_type and rconvert functions have been added to RecursiveApply. The first function uses rmaptype to recursively call promote_type, and the second function uses rmap and rzero to make a more type-stable version of Julia's built-in convert function.
  • To support manipulating matrices of covectors (and, in the future, matrices of higher-order cotensors), all basic arithmetic operations have been defined for AdjointAxisTensor. This includes +, -, *, /, \, and ==, as well as the zero function. The new methods undo the adjoint and fall back to the methods for AxisTensor.
  • To simplify the setup of the non-scalar matrix tests, map has been generalized from only working for a single Field to working for multiple Fields.

Tests

  • All of the following test files check for correctness, allocations, and type instabilities.
  • The file test/MatrixFields/band_matrix_row.jl tests low-level operations with BandMatrixRows, ensuring that constructors, conversions, arithmetic operations, and combinations with LinearAlgebra.I work for numbers and nested Tuple/NamedTuple values.
  • The file test/MatrixFields/rmul_with_projection.jl tests rmul_with_projection for all currently supported combinations of scalars, vectors, tensors, and covectors, as well as several combinations of nested Tuple/NamedTuple values.
  • The file test/MatrixFields/field2array.jl checks that column_field2array, column_field2array_view, and field2arrays work as expected for very simple matrix fields.
  • The file test/MatrixFields/matrix_field_broadcasting.jl tests a wide range of scalar and non-scalar matrix field broadcast expressions. It compares each scalar matrix field broadcast expression against an equivalent implementation using field2arrays and LinearAlgebra's mul!, making sure that the matrix field implementation is roughly as performant as the array implementation. It also compares each non-scalar matrix field broadcast expression (e.g., an expression with matrices of covectors or NamedTuples) against an equivalent scalar matrix field implementation.
  • I have also checked that the new implementation is at least as performant as the pre-existing implementation for those cases that the pre-existing API can support. For example, to compare the two implementations for the "diagonal matrix times bi-diagonal matrix times tri-diagonal matrix times quad-diagonal matrix times vector" test case, run the following code after test/MatrixFields/matrix_field_broadcasting.jl:
    using ClimaCore: MatrixFields, Operators
    function compare_to_old_implementation()
        ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec, ᶠvec = random_test_fields(Float64)
        result = @. ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        inputs = (ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec)
        func!(result, ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec) =
            @. result = ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        time, allocs = @benchmark call_func(func!, result, inputs)
        
        to_old_version(row::BMR) where {BMR <: MatrixFields.BandMatrixRow} =
            Operators.StencilCoefs{MatrixFields.outer_diagonals(BMR)...}(row.entries)
        apply = Operators.ApplyStencil()
        compose = Operators.ComposeStencils()
        ᶜᶜmat_old = to_old_version.(ᶜᶜmat)
        ᶜᶠmat_old = to_old_version.(ᶜᶠmat)
        ᶠᶠmat_old = to_old_version.(ᶠᶠmat)
        ᶠᶜmat_old = to_old_version.(ᶠᶜmat)
        result_old = @. apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        inputs_old = (ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec)
        func!_old(result_old, ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec) =
            @. result_old = apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        time_old, allocs_old = @benchmark call_func(func!_old, result_old, inputs_old)
    
        @assert allocs == allocs_old == 0
    
        @info "New implementation time: $time s"
        @info "Old implementation time: $time_old s"
    end
    compare_to_old_implementation()
    
    This code prints out the following:
    [ Info: New implementation time: 1.5848e-5 s
    [ Info: Old implementation time: 1.5283e-5 s
    

@dennisYatunin dennisYatunin force-pushed the dy/matrix_fields branch 2 times, most recently from 03a4c2e to a4c796a Compare June 16, 2023 20:16
bors bot added a commit that referenced this pull request Jun 16, 2023
1334: Extend and improve RecursiveApply, fix DSS inference r=charleskawczynski a=charleskawczynski

This PR:
 - Extends and improves RecursiveApply (incorporates some changes/functionality in #1326) by working with Tuples/NamedTuples in a type-stable way
 - Adds RecursiveApply tests
 - Fix DSS inference and corresponding test

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
bors bot added a commit that referenced this pull request Jun 16, 2023
1334: Extend and improve RecursiveApply, fix DSS inference r=charleskawczynski a=charleskawczynski

This PR:
 - Extends and improves RecursiveApply (incorporates some changes/functionality in #1326) by working with Tuples/NamedTuples in a type-stable way
 - Adds RecursiveApply tests
 - Fix DSS inference and corresponding test

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
@dennisYatunin dennisYatunin force-pushed the dy/matrix_fields branch 4 times, most recently from f68577d to 6265bdf Compare June 21, 2023 18:22
@dennisYatunin
Copy link
Member Author

dennisYatunin commented Jun 21, 2023

Couple of updates:

  • All of the reviewer comments have been addressed.
  • All of the Manifest.jl files have been updated.
  • I added a file that tests low-level operations on BandMatrixRows. This showed that type promotion was not working as expected for Tuple/NamedTuple entries, and it showed that Julia's default convert is not type stable for nested Tuple/NamedTuple entries. So, I also added rpromote_type and rconvert to RecursiveApply.
  • I added a file that tests field-to-array conversions.
  • I removed rmul_with_projection_return_type, which I was using to get the return type of rmul_with_projection, and replaced it with Base._return_type, which is how Base.Broadcast gets the return types of broadcast expressions.
  • I made matrix fields of Numbers display in the REPL as matrices by overwriting the show method for Fields.Field{<:Fields.AbstractData{<:BandMatrixRow}} and calling column_field2array_view. @simonbyrne, Charlie said you might want to take a look at this change.

@dennisYatunin
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jun 21, 2023
@bors
Copy link
Contributor

bors bot commented Jun 21, 2023

try

Build failed:

@dennisYatunin
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jun 21, 2023
@bors
Copy link
Contributor

bors bot commented Jun 21, 2023

try

Build failed:

@dennisYatunin
Copy link
Member Author

dennisYatunin commented Jul 13, 2023

Also, it looks like the new invalidations are due to adding BandedMatrices.jl as a dependency to ClimaCore, because they remain unchanged when I comment out the new code in the MatrixFields module. Since this dependency is needed for printing and array-based utility functions, there isn't really a good way to get rid of the invalidations. The total invalidation count is only increasing by a fraction of a percent, though, so this isn't a big issue.

@simonbyrne
Copy link
Member

I'm broadly okay with this, but it is pretty complex, so it could do with a few more signposts so that others can find their way around. Could you expand the docs a bit to explain how it works?

@dennisYatunin
Copy link
Member Author

@simonbyrne @charleskawczynski I've added a new MatrixFields section to the docs, where I put all the docstrings from this module. I also moved the derivation of the matrix multiplication operator's algorithm there, and I added a summary docstring for the entire module. Is there anything else I should do before merging?

@dennisYatunin
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2023
@dennisYatunin
Copy link
Member Author

bors try-

@dennisYatunin
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2023
@bors
Copy link
Contributor

bors bot commented Jul 19, 2023

try

Build failed:

@dennisYatunin
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2023
@bors
Copy link
Contributor

bors bot commented Jul 19, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@dennisYatunin
Copy link
Member Author

bors r+

bors bot added a commit that referenced this pull request Jul 21, 2023
1326: Add new MatrixFields module, along with unit tests and performance tests r=dennisYatunin a=dennisYatunin

# Purpose

This PR encompasses the first two tasks of the implicit interface updates [SDI](#1230). That is, it refactors `StencilCoefs` into `BandMatrixRow` and `ApplyStencil`/`ComposeStencil` into `MultiplyColumnwiseBandMatrixField`, and it also adds a comprehensive set of tests. In the process of refactoring, the code for low-level matrix operations has been simplified and useful new functionality has been added. In order to avoid conflicts with pre-existing code, the new data structures and functions have been placed in a separate module called `MatrixFields`; this module can be removed once we are ready to replace all of the old code.

## Content

### API improvements
- Matrix field rows now have simple constructors (e.g., `DiagonalMatrixRow` and `TridiagonalMatrixRow`) that can be used to construct matrix fields on the fly in broadcast expressions.
- Matrix fields are now automatically promoted to common types for basic arithmetic operations. This involves promoting the entries in the rows, as well as padding rows with zeros when rows have unequal numbers of entries. In addition, `LinearAlgebra.I` (or, rather, `LinearAlgebra.UniformScaling`) is automatically promoted to a `DiagonalMatrixRow` when used with `BandMatrixRow`s. So, for example, it is now possible to evaluate `DiagonalMartrixRow(1) - 2 * TridiagonalMatrixRow(3, 4, 5) - I / 6`.
- Matrix fields are now fully integrated with `RecursiveApply`, which allows a single matrix field to store multiple matrices by using `Tuple`s or `NamedTuple`s as entries instead of scalars. This is achieved by using the new `rzero` function instead of `zero`, and also by making `+`, `-`, `*`, and `/` call `radd`, `rsub`, `rmul`, and `rdiv` when used on `BandMatrixRow`s. This change will make it possible to, e.g., store the derivatives of all tracer tendencies with respect to vertical velocity as a single matrix field in `ClimaAtmos`, which will lead to simpler code and better performance.
- All matrix-matrix and matrix-vector multiplications can now be handled using the `⋅` operator, which is an alias for `MultiplyColumnwiseBandMatrixField()`. The implementation of this operator is much simpler than the implementation of `ApplyStencil`/`ComposeStencil`, and it is effectively localized to the single function `multiply_matrix_at_index`. In order to avoid inference failures for complex matrix field broadcast expressions, the recursion limit for this function needed to be disabled. Unfortunately, just as with the pre-existing code, inlining this function with ``@propagate_inbounds`` drastically increased compilation time, and it also caused inference failures for particularly complicated matrix broadcast expressions. Although not using ``@propagate_inbounds`` causes a 3--4x slowdown in the evaluation of matrix broadcast expressions, it avoids a 10--100x slowdown in their compilation.
- The matrix of the divergence operator can now be properly represented as a matrix of covectors, rather than a matrix of scalars. Since the divergence operator turns vectors into scalars, multiplying its matrix by a matrix/vector of vectors should return a matrix/vector of scalars. However, this is not as simple as just multiplying a covector by a vector and returning a scalar, since the vector may need to be projected onto the dual axis of the covector before the multiplication. So, matrix-matrix and matrix-vector multiplication is now implemented using the `rmul_with_projection` function instead of `rmul`. In general, this function works just like `rmul`, but, when the first argument is a covector (an `AdjointAxisVector`) and the second is a vector or higher-order tensor (an `AxisTensor`), it uses the local geometry to project the second argument before the multiplication. In the future, we may want to generalize this to higher-order cotensors.
- The `MatrixFields` module also includes functions for converting matrix fields to more conventional arrays that are compatible with `LinearAlgebra.jl`. The function `column_field2array` converts a matrix field defined on a column space to a `BandedMatrix` from `BandedMatrices.jl`, and it also converts a regular field defined on a column space to a `Vector`. The function `field2arrays` works for fields defined on multiple columns by calling `column_field2array` on each column. 
- Matrix fields now get displayed in the REPL as actual matrices. Specifically, when a matrix field contains `Number` entries, its first column is passed to `column_field2array_view`, which is identical to `column_field2array`, except that it allocates less memory by returning a view of the field's underlying data. Although `column_field2array` and `column_field2array_view` work for matrix fields with `Tuple`/`NamedTuple` entries, the `show` method for a `BandedMatrix` crashes when displaying such entries. In addition, this `show` method results in somewhat unreadable output for other struct entries, like vectors and covectors. So, all non-`Number` matrix fields still get displayed in the REPL like regular fields.

### Related ClimaCore Changes
- To ensure that promotion and conversion of `BandMatrixRow` entries works correctly for nested `Tuple`/`NamedTuple` entries, the `rpromote_type` and `rconvert` functions have been added to RecursiveApply. The first function uses `rmaptype` to recursively call `promote_type`, and the second function uses `rmap` and `rzero` to make a more type-stable version of Julia's built-in `convert` function.
- To support manipulating matrices of covectors (and, in the future, matrices of higher-order cotensors), all basic arithmetic operations have been defined for `AdjointAxisTensor`. This includes `+`, `-`, `*`, `/`, `\`, and `==`, as well as the `zero` function. The new methods undo the adjoint and fall back to the methods for `AxisTensor`.
- To simplify the setup of the non-scalar matrix tests, `map` has been generalized from only working for a single `Field` to working for multiple `Field`s.

### Tests
- All of the following test files check for correctness, allocations, and type instabilities.
- The file `test/MatrixFields/band_matrix_row.jl` tests low-level operations with `BandMatrixRow`s, ensuring that constructors, conversions, arithmetic operations, and combinations with `LinearAlgebra.I` work for numbers and nested `Tuple`/`NamedTuple` values.
- The file `test/MatrixFields/rmul_with_projection.jl` tests `rmul_with_projection` for all currently supported combinations of scalars, vectors, tensors, and covectors, as well as several combinations of nested `Tuple`/`NamedTuple` values.
- The file `test/MatrixFields/field2array.jl` checks that `column_field2array`, `column_field2array_view`, and `field2arrays` work as expected for very simple matrix fields.
- The file `test/MatrixFields/matrix_field_broadcasting.jl` tests a wide range of scalar and non-scalar matrix field broadcast expressions. It compares each scalar matrix field broadcast expression against an equivalent implementation using `field2arrays` and LinearAlgebra's `mul!`, making sure that the matrix field implementation is roughly as performant as the array implementation. It also compares each non-scalar matrix field broadcast expression (e.g., an expression with matrices of covectors or `NamedTuple`s) against an equivalent scalar matrix field implementation.
- I have also checked that the new implementation is at least as performant as the pre-existing implementation for those cases that the pre-existing API can support. For example, to compare the two implementations for the "diagonal matrix times bi-diagonal matrix times tri-diagonal matrix times quad-diagonal matrix times vector" test case, run the following code after `test/MatrixFields/matrix_field_broadcasting.jl`:
    ```
    using ClimaCore: MatrixFields, Operators
    function compare_to_old_implementation()
        ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec, ᶠvec = random_test_fields(Float64)
        result = `@.` ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        inputs = (ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec)
        func!(result, ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec) =
            `@.` result = ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        time, allocs = `@benchmark` call_func(func!, result, inputs)
        
        to_old_version(row::BMR) where {BMR <: MatrixFields.BandMatrixRow} =
            Operators.StencilCoefs{MatrixFields.outer_diagonals(BMR)...}(row.entries)
        apply = Operators.ApplyStencil()
        compose = Operators.ComposeStencils()
        ᶜᶜmat_old = to_old_version.(ᶜᶜmat)
        ᶜᶠmat_old = to_old_version.(ᶜᶠmat)
        ᶠᶠmat_old = to_old_version.(ᶠᶠmat)
        ᶠᶜmat_old = to_old_version.(ᶠᶜmat)
        result_old = `@.` apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        inputs_old = (ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec)
        func!_old(result_old, ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec) =
            `@.` result_old = apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        time_old, allocs_old = `@benchmark` call_func(func!_old, result_old, inputs_old)

        `@assert` allocs == allocs_old == 0

        `@info` "New implementation time: $time s"
        `@info` "Old implementation time: $time_old s"
    end
    compare_to_old_implementation()
    ```
    This code prints out the following:
    ```
    [ Info: New implementation time: 1.5848e-5 s
    [ Info: Old implementation time: 1.5283e-5 s
    ```

Co-authored-by: Dennis Yatunin <dyatun@gmail.com>
@bors
Copy link
Contributor

bors bot commented Jul 21, 2023

Build failed:

@dennisYatunin
Copy link
Member Author

bors r+

@bors
Copy link
Contributor

bors bot commented Jul 21, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit 4c61e33 into main Jul 21, 2023
5 of 6 checks passed
@bors bors bot deleted the dy/matrix_fields branch July 21, 2023 23:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants