Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PETSc 3.20.0 broke the world #963

Closed
2 tasks done
guyer opened this issue Oct 5, 2023 · 6 comments · Fixed by #982
Closed
2 tasks done

PETSc 3.20.0 broke the world #963

guyer opened this issue Oct 5, 2023 · 6 comments · Fixed by #982
Labels

Comments

@guyer
Copy link
Member

guyer commented Oct 5, 2023

Breakages with PETSc observed in #946 correspond to PETSc 3.20.0, (released 2023-09-28, conda-forge petsc4py feedstock updated 2023-10-04). Previous PETSc 3.18.4 worked fine.

  • Assembly error

    Exception raised:
        Traceback (most recent call last):
          File "/Users/guyer/mambaforge-arm/envs/petsc4py319/lib/python3.11/doctest.py", line 1351, in __run
            exec(compile(example.source, filename, "single",
          File "<doctest fipy.matrices.petscMatrix._PETScMatrix.__mul__[10]>", line 1, in <module>
            numerix.allclose(numerix.array((1,2,3),'d') * L1, tmp)
                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
          File "/Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py", line 168, in __rmul__
            self.matrix.multTranspose(x, y)
          File "petsc4py/PETSc/Mat.pyx", line 1356, in petsc4py.PETSc.Mat.multTranspose
        petsc4py.PETSc.Error: error code 73
        [0] MatMultTranspose() at /Users/runner/miniforge3/conda-bld/petsc_1695965956423/work/src/mat/interface/matrix.c:2637
        [0] Object is in wrong state
        [0] Not for unassembled vector, did you call VecAssemblyBegin()/VecAssemblyEnd()?
  • Really slow
    Comparison of the runtimes for FiPy's test suite vs petsc4py version (these are only the 40 slowest tests when using petsc4py 3.20.1):
    Comparison of runtimes vs petsc4py version for 40 slowest test runtimes with petsc4py 3.20.1

    Ratio of runtimes for FiPy's test suite between petsc4py versions
    Ratio of runtimes for FiPy's test suite between petsc4py versions

@guyer guyer added the solvers label Oct 5, 2023
@guyer
Copy link
Member Author

guyer commented Nov 28, 2023

Mat.setValuesCSR() got astronomically slower. Compare petsc4py 3.18.4:

Total time: 0.154681 s
File: /Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py
Function: addAt at line 284

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   284                                               @profile
   285                                               def addAt(self, vector, id1, id2):
   286                                                   """
   287                                                   Add elements of `vector` to the positions in the matrix corresponding to (`id1`,`id2`)
   288                                           
   289                                                       >>> L = _PETScMatrixFromShape(rows=3, cols=3, bandwidth=3)
   290                                                       >>> L.put([3.,10.,numerix.pi,2.5], [0,0,1,2], [2,1,1,0])
   291                                                       >>> L.addAt([1.73,2.2,8.4,3.9,1.23], [1,2,0,0,1], [2,2,0,0,2])
   292                                                       >>> print(L)
   293                                                       12.300000  10.000000   3.000000  
   294                                                           ---     3.141593   2.960000  
   295                                                        2.500000      ---     2.200000  
   296                                                   """
   297        33        348.0     10.5      0.2          self.matrix.assemble(self.matrix.AssemblyType.FLUSH)
   298        66     154330.0   2338.3     99.8          self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
   299        33          3.0      0.1      0.0                                   addv=True)

to petsc4py 3.20.1

Total time: 82.0462 s
File: /Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py
Function: addAt at line 284

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   284                                               @profile
   285                                               def addAt(self, vector, id1, id2):
   286                                                   """
   287                                                   Add elements of `vector` to the positions in the matrix corresponding to (`id1`,`id2`)
   288                                           
   289                                                       >>> L = _PETScMatrixFromShape(rows=3, cols=3, bandwidth=3)
   290                                                       >>> L.put([3.,10.,numerix.pi,2.5], [0,0,1,2], [2,1,1,0])
   291                                                       >>> L.addAt([1.73,2.2,8.4,3.9,1.23], [1,2,0,0,1], [2,2,0,0,2])
   292                                                       >>> print(L)
   293                                                       12.300000  10.000000   3.000000  
   294                                                           ---     3.141593   2.960000  
   295                                                        2.500000      ---     2.200000  
   296                                                   """
   297        33       3853.0    116.8      0.0          self.matrix.assemble(self.matrix.AssemblyType.FLUSH)
   298        66   82042326.0    1e+06    100.0          self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
   299        33          3.0      0.1      0.0                                   addv=True)

(self._ijv2csr() did not change)

@guyer
Copy link
Member Author

guyer commented Nov 28, 2023

@guyer
Copy link
Member Author

guyer commented Nov 28, 2023

PETSc matrix preallocation has been commented out since the suite was added to FiPy

  • Uncommenting matrix preallocation
    matrix.setPreallocationNNZ(bandwidth) # FIXME: ??? None, bandwidth
    raises exceptions
    Traceback (most recent call last):
      :      
      File "/Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py", line 291, in addAt
        self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
      File "PETSc/Mat.pyx", line 1019, in petsc4py.PETSc.Mat.setValuesCSR
      File "PETSc/petscmat.pxi", line 1008, in petsc4py.PETSc.matsetvalues_csr
      File "PETSc/petscmat.pxi", line 1001, in petsc4py.PETSc.matsetvalues_ijv
    petsc4py.PETSc.Error: error code 63
    [0] MatSetValues() at /Users/runner/miniforge3/conda-bld/petsc_1675165805075/work/src/mat/interface/matrix.c:1474
    [0] MatSetValues_SeqAIJ() at /Users/runner/miniforge3/conda-bld/petsc_1675165805075/work/src/mat/impls/aij/seq/aij.c:450
    [0] Argument out of range
    [0] New nonzero at (0,0) caused a malloc
    Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check
  • Turning off the nonzero allocation check
    matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False)
    leads to very slow tests
    • 49 s for petsc4py 3.18.4 with no preallocation
    • 503 s for petsc4py 3.18.4 with (faulty) preallocation
    • 1570 s for petsc4py 3.20.1 with no preallocation

@guyer
Copy link
Member Author

guyer commented Nov 28, 2023

setPreallocationNNZ() didn't work because it's the number of non-zeros in the matrix, not per row (like Trilinos).

setPreallocationNNZ() sets the number of non-zeros per row, but setting it to zero is "bad".

Making a simplistic attempt at preallocation

@@ -482,7 +482,7 @@ class _PETScMatrix(_SparseMatrix):
 
 class _PETScMatrixFromShape(_PETScMatrix):
 
-    def __init__(self, rows, cols, bandwidth=0, sizeHint=None, matrix=None, comm=PETSc.COMM_SELF):
+    def __init__(self, rows, cols, bandwidth=1, sizeHint=None, matrix=None, comm=PETSc.COMM_SELF):
         """Instantiates and wraps a PETSc `Mat` matrix
 
         Parameters
@@ -510,13 +510,14 @@ class _PETScMatrixFromShape(_PETScMatrix):
             matrix.setSizes([[rows, None], [cols, None]])
             matrix.setType('aij') # sparse
             matrix.setUp()
-#             matrix.setPreallocationNNZ(bandwidth) # FIXME: ??? None, bandwidth
-#             matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False)
+            if bandwidth > 0:
+                matrix.setPreallocationNNZ(bandwidth)
+                matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False)

combined with setting bandwidth=1 in all subclasses leads to tests running:

  • in 58 s for petsc4py 3.18.4 (on battery; same as master).
  • in 66 s for petsc4py 3.20.1

@guyer
Copy link
Member Author

guyer commented Nov 29, 2023

Comparison of the runtimes for FiPy's test suite vs petsc4py version, including case of naïve preallocation (these are only the 40 slowest tests when using petsc4py 3.20.1):
Comparison of runtimes vs petsc4py version for 40 slowest test runtimes with petsc4py 3.20.1

Ratio of runtimes for FiPy's test suite between petsc4py versions, when using naïve preallocation
Ratio of runtimes for FiPy's test suite between petsc4py versions, when using naïve preallocation

Overall, about the same, and worst cases are within a factor of 4.

@guyer
Copy link
Member Author

guyer commented Nov 29, 2023

Turning on MAT_NEW_NONZERO_ALLOCATION_ERR produces 34 failures in the test suite. Better preallocation might recover more of the slowdown, but this calls for a deeper look at bandwidth= and sizeHint=.

@guyer guyer closed this as completed in #982 Dec 4, 2023
guyer added a commit that referenced this issue Dec 4, 2023
Fixes #963
(Actually, PETSC 3.19.0 broke the world.)

This PR:
- Assembles before `multTranspose` to prevent newly added exception
- Renames `bandwidth` to `nonZerosPerRow` and removes `sizeHint`

  The two were confusingly redundant:
  - PySparse takes `sizeHint`, the number of non-zeros in the matrix.
  - PyTrilinos takes `NumEntriesPerRow`.
  - petsc4py didn't used to be clear what it took, but is now
    documented as number of non-zeros per row (of the local portion
    of the matrix, but we'll ignore that part).
  - scipy doesn't preallocate.
  - Linear algebra
    [defines bandwidth](https://en.wikipedia.org/wiki/Band_matrix#Bandwidth)
    as "number $k$ such that $a_{i,j}=0$ if $|i-j| > k$", which is
    roughly half the number of non-zeros per row (and only applies
    to a band-diagonal matrix).
    Better to be explicit about what we really mean.

  Now all take same parameter and PySparse adjusts as needed.

  `sizeHint` was introduced in @a15d696 (in 2006!) to
  "allow smaller preallocations", but it was never used that way.
  Now, `nonZerosPerRow` can take an array_like to specify row-by-row
  preallocations, which are directly supported by PyTrilinos and petsc4py,
  and can be simulated for PySparse.

  Added `exactNonZeros`, which may have performance benefits for
  PyTrilinos and petsc4py. Currently unused.
- Uses `Term`'s knowledge of own stencil to preallocate more effectively.
  Still doesn't do a good job with vector equations, but that's a deeper change
  (the resolution of which might help #920).

* Assemble before multTranspose

* Rename `bandwidth` to `nonZerosPerRow` and remove `sizeHint`

The two were confusingly redundant:
- PySparse takes `sizeHint`, the number of non-zeros in the matrix.
- PyTrilinos takes `NumEntriesPerRow`.
- petsc4py didn't used to be clear what it took, but is now
  documented as number of non-zeros per row (of the local portion
  of the matrix, but we'll ignore that part).
- scipy doesn't preallocate.
- Linear algebra
  [defines bandwidth](https://en.wikipedia.org/wiki/Band_matrix#Bandwidth)
  as "number $k$ such that $a_{i,j}=0$ if $|i-j| > k$", which is
  roughly half the number of non-zeros per row (and only applies
  to a band-diagonal matrix).
  Better to be explicit about what we really mean.

Now all take same parameter and PySparse adjusts as needed.

`sizeHint` was introduced in @a15d696 (in 2006!) to
"allow smaller preallocations", but it was never used that way.
Now, `nonZerosPerRow` can take an array_like to specify row-by-row
preallocations, which are directly supported by PyTrilinos and petsc4py,
and can be simulated for PySparse.

Added `exactNonZeros`, which may have performance benefits for
PyTrilinos and petsc4py. Currently unused.

* Fix(?) conda/mamba installs

* Fix(?) race condition
guyer added a commit that referenced this issue Dec 4, 2023
Fixes #963
(Actually, PETSC 3.19.0 broke the world.)

This PR:
- Assembles before `multTranspose` to prevent newly added exception
- Renames `bandwidth` to `nonZerosPerRow` and removes `sizeHint`

  The two were confusingly redundant:
  - PySparse takes `sizeHint`, the number of non-zeros in the matrix.
  - PyTrilinos takes `NumEntriesPerRow`.
  - petsc4py didn't used to be clear what it took, but is now
    documented as number of non-zeros per row (of the local portion
    of the matrix, but we'll ignore that part).
  - scipy doesn't preallocate.
  - Linear algebra
    [defines bandwidth](https://en.wikipedia.org/wiki/Band_matrix#Bandwidth)
    as "number $k$ such that $a_{i,j}=0$ if $|i-j| > k$", which is
    roughly half the number of non-zeros per row (and only applies
    to a band-diagonal matrix).
    Better to be explicit about what we really mean.

  Now all take same parameter and PySparse adjusts as needed.

  `sizeHint` was introduced in @a15d696 (in 2006!) to
  "allow smaller preallocations", but it was never used that way.
  Now, `nonZerosPerRow` can take an array_like to specify row-by-row
  preallocations, which are directly supported by PyTrilinos and petsc4py,
  and can be simulated for PySparse.

  Added `exactNonZeros`, which may have performance benefits for
  PyTrilinos and petsc4py. Currently unused.
- Uses `Term`'s knowledge of own stencil to preallocate more effectively.
  Still doesn't do a good job with vector equations, but that's a deeper change
  (the resolution of which might help #920).
- Fixes(?) conda/mamba installs
- Fixes(?) race condition
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant