Add dtype argument to EquivalentSources #278

santisoler · 2021-11-10T13:43:56Z

The dtype argument is used to cast the inputs of the fit method, to build
the locations of the point sources and to allocate the Jacobian matrix, which
in turn sets the dtype of the predictions. Add a new utility function for
casting the inputs of the fit method to the desired dtype. Add a new test file
for the equivalent sources utility functions. Add tests for the new feature.
Simplify the test suite for equivalent sources through fixtures.
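The three roles of dtype described above (casting the fit inputs, building the source locations, allocating the Jacobian) can be sketched with NumPy alone. The helper names `cast_fit_input`, `build_points` and `jacobian`, the one-source-per-data-point layout and the bare 1/r kernel are assumptions for illustration. This is not harmonica's implementation.

```python
import numpy as np


def cast_fit_input(coordinates, data, weights, dtype):
    """Cast coordinates, data and optional weights to ``dtype`` (hypothetical helper)."""
    dtype = np.dtype(dtype)
    coordinates = tuple(np.asarray(c, dtype=dtype) for c in coordinates)
    data = np.asarray(data, dtype=dtype)
    if weights is not None:
        weights = np.asarray(weights, dtype=dtype)
    return coordinates, data, weights


def build_points(coordinates, depth, dtype):
    """Put one point source ``depth`` below each data point (assumed layout)."""
    easting, northing, upward = (np.ravel(c) for c in coordinates[:3])
    points = (easting, northing, upward - depth)
    return tuple(p.astype(dtype) for p in points)


def jacobian(coordinates, points, dtype):
    """Fill an (n_data, n_sources) matrix with a simplified 1/r kernel."""
    easting, northing, upward = (np.ravel(c) for c in coordinates[:3])
    # Allocating with the requested dtype is what halves the memory for float32
    jac = np.empty((easting.size, points[0].size), dtype=dtype)
    # Broadcast data points (rows) against sources (columns)
    dx = easting[:, np.newaxis] - points[0][np.newaxis, :]
    dy = northing[:, np.newaxis] - points[1][np.newaxis, :]
    dz = upward[:, np.newaxis] - points[2][np.newaxis, :]
    jac[:] = 1 / np.sqrt(dx**2 + dy**2 + dz**2)
    return jac


# Usage: float64 inputs are cast to float32 all the way to the Jacobian
coordinates = (np.linspace(0, 1000, 5), np.zeros(5), np.zeros(5))
data = np.ones(5)
coords32, data32, _ = cast_fit_input(coordinates, data, None, "float32")
points = build_points(coords32, depth=500.0, dtype="float32")
jac = jacobian(coords32, points, "float32")
print(jac.dtype, jac.shape)  # float32 (5, 5)
```

Casting once at the start of `fit` means every later array inherits the requested precision without each step having to repeat the conversion.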

Reminders:

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

santisoler · 2021-11-10T15:13:53Z

harmonica/tests/test_eq_sources_cartesian.py

+    # Check the data type of the source coefficients
+    #  assert eqs.coefs_.dtype == np.dtype(dtype)


@leouieda I've noticed that the coefs_ are cast to float64 even if all the inputs to vdb.least_squares are float32. Even so, the predictions are created as float32.
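NumPy's type promotion shows why the coefficient dtype matters here. This sketch, using made-up arrays, shows one plausible way predictions can stay float32 even with float64 coefficients: accumulating into an output array allocated with the requested dtype. It is an illustration of NumPy behaviour, not a description of harmonica's predict code.

```python
import numpy as np

rng = np.random.default_rng(0)
jac32 = rng.random((4, 3), dtype=np.float32)  # Jacobian stored as float32
coefs64 = rng.random(3)  # coefficients returned as float64

# A plain matrix product follows type promotion: float32 @ float64 -> float64
naive = jac32 @ coefs64
print(naive.dtype)  # float64

# Writing into an output allocated with the requested dtype keeps float32;
# in-place addition casts float64 -> float32 under the default "same_kind" rule
prediction = np.zeros(jac32.shape[0], dtype=np.float32)
prediction += jac32 @ coefs64
print(prediction.dtype)  # float32
```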

I was trying to track this down. It seems that scikit-learn uses the dtype of the Jacobian for everything, but according to this line the coefs actually come from scipy.linalg.solve. The docs for solve don't mention dtype, so it could be that it always returns float64 from the underlying LAPACK functions.

It would be fine to leave the coefs the way they are returned since they usually don't require much storage. It's mostly the Jacobian that's affected.
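A quick bit of arithmetic backs up the point that the Jacobian, not the coefficients, dominates storage. The sketch assumes one source per data point and the 40 x 40 data grid used in the comparison script further down this thread. The `storage_bytes` helper is hypothetical.

```python
import numpy as np


def storage_bytes(n_data, n_sources, dtype):
    """Bytes needed for a dense Jacobian and for the coefficient vector."""
    itemsize = np.dtype(dtype).itemsize  # 8 for float64, 4 for float32
    return n_data * n_sources * itemsize, n_sources * itemsize


# 40 x 40 data grid, assuming one source per data point
n_data = n_sources = 40 * 40
for dtype in ("float64", "float32"):
    jac_bytes, coef_bytes = storage_bytes(n_data, n_sources, dtype)
    print(f"{dtype}: Jacobian {jac_bytes / 1e6:.2f} MB, coefs {coef_bytes / 1e3:.1f} kB")
# float64: Jacobian 20.48 MB, coefs 12.8 kB
# float32: Jacobian 10.24 MB, coefs 6.4 kB
```

The Jacobian grows with the product of data and source counts, while the coefficients grow only with the number of sources, so keeping the coefs in float64 costs almost nothing.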

leouieda · 2021-11-12T15:44:19Z

Any thoughts on defaulting to float32?

santisoler · 2021-11-12T16:51:51Z

> Any thoughts on defaulting to float32?

I ran this little script to compare the accuracy of the float32 and float64 results:

import verde as vd
import harmonica as hm
import matplotlib.pyplot as plt

# Synthetic survey: a 40 x 40 grid of observation points at zero height
region = (-3e3, -1e3, 5e3, 7e3)
shape = (40, 40)
coordinates = vd.grid_coordinates(region=region, shape=shape, extra_coords=0)
# 6 x 6 point masses 1 km deep, with masses following a checkerboard pattern
points = vd.grid_coordinates(region=region, shape=(6, 6), extra_coords=-1e3)
masses = vd.datasets.CheckerBoard(amplitude=1e13, region=region).predict(points)
data = hm.point_mass_gravity(coordinates, points, masses, field="g_z")

# Fit and predict on the same points, once with each dtype
eqs64 = hm.EquivalentSources(dtype="float64")
eqs64.fit(coordinates, data)
prediction_64 = eqs64.predict(coordinates)

eqs32 = hm.EquivalentSources(dtype="float32")
eqs32.fit(coordinates, data)
prediction_32 = eqs32.predict(coordinates)


fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)

maxabs = vd.maxabs(prediction_32, prediction_64)

pc = ax1.pcolormesh(
    *coordinates[:2], prediction_32, vmin=-maxabs, vmax=maxabs, cmap="seismic"
)
plt.colorbar(pc, ax=ax1)
ax1.set_title("Prediction 32bits")
ax1.set_aspect("equal")

pc = ax2.pcolormesh(
    *coordinates[:2], prediction_64, vmin=-maxabs, vmax=maxabs, cmap="seismic"
)
plt.colorbar(pc, ax=ax2)
ax2.set_title("Prediction 64bits")
ax2.set_aspect("equal")

diff = data - prediction_32
maxabs = vd.maxabs(diff)

pc = ax3.pcolormesh(*coordinates[:2], diff, vmin=-maxabs, vmax=maxabs, cmap="seismic")
plt.colorbar(pc, ax=ax3)
ax3.set_title("Diff 32bits")
ax3.set_aspect("equal")

diff = data - prediction_64
maxabs = vd.maxabs(diff)

pc = ax4.pcolormesh(*coordinates[:2], diff, vmin=-maxabs, vmax=maxabs, cmap="seismic")
plt.colorbar(pc, ax=ax4)
ax4.set_title("Diff 64bits")
ax4.set_aspect("equal")

plt.show()

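With the following result:

[Figure: four panels titled "Prediction 32bits", "Prediction 64bits", "Diff 32bits" and "Diff 64bits"]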

The errors of the float32 predictions are 5 orders of magnitude larger than the float64 ones. In light of these results, I would keep float64 as the default.
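The data-minus-prediction panels can be condensed into a single number. This sketch swaps harmonica for a plain NumPy stand-in: a hypothetical `relative_residual` helper, a simplified vertical-distance-over-r³ kernel with constants dropped, one source under each data point, and an exact `np.linalg.solve` instead of the damped least-squares fit. Its numbers therefore won't match the figure. It only shows the float32 misfit sitting well above the float64 one. NumPy may factorize float32 matrices in double precision internally, so this mostly captures the cost of storing the Jacobian, data and coefficients in float32.

```python
import numpy as np


def relative_residual(dtype, spacing=100.0, depth=150.0, n=15):
    """Relative RMS misfit when Jacobian, data and solution are stored in dtype."""
    # Data on a regular n x n grid at zero height; one source beneath each point
    east, north = np.meshgrid(np.arange(n) * spacing, np.arange(n) * spacing)
    east, north = east.ravel(), north.ravel()
    dx = east[:, None] - east[None, :]
    dy = north[:, None] - north[None, :]
    dz = depth  # vertical distance between the data and source planes
    # Simplified point-source kernel (vertical distance / r**3), constants dropped
    jac64 = dz / (dx**2 + dy**2 + dz**2) ** 1.5
    rng = np.random.default_rng(42)
    coefs_true = rng.normal(size=east.size)
    data64 = jac64 @ coefs_true
    # Store the system in the requested precision and solve it
    coefs = np.linalg.solve(jac64.astype(dtype), data64.astype(dtype))
    # Evaluate the misfit in float64 so only the storage precision differs
    residual = data64 - jac64 @ coefs.astype(np.float64)
    return np.sqrt(np.mean(residual**2)) / np.sqrt(np.mean(data64**2))


for dtype in ("float32", "float64"):
    print(dtype, relative_residual(dtype))
```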

If the dataset is large enough that it won't fit in memory, but not soooo large, it would be interesting to compare float32 against gradient-boosted equivalent sources. I suspect float32 would give better results.
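The range where float32 helps can be put in numbers. Assuming one source per data point, a dense Jacobian needs N² × itemsize bytes, so halving the item size only grows the largest feasible N by a factor of √2. The 16 GiB budget and the `max_square_problem` helper are hypothetical, and the sketch ignores every other allocation.

```python
import math

import numpy as np


def max_square_problem(memory_bytes, dtype):
    """Largest N such that an N x N Jacobian of ``dtype`` fits in ``memory_bytes``."""
    itemsize = np.dtype(dtype).itemsize
    return math.isqrt(memory_bytes // itemsize)


budget = 16 * 1024**3  # hypothetical 16 GiB of RAM set aside for the Jacobian
n64 = max_square_problem(budget, "float64")
n32 = max_square_problem(budget, "float32")
print(n64, n32, n32 / n64)
```

Beyond roughly √2 times the float64 limit, float32 alone no longer fits, which is where approaches such as gradient-boosted equivalent sources become necessary.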

leouieda · 2021-11-13T13:41:03Z

👍 sounds good to me! We can add a tutorial about this at some point. Merge away!

santisoler added 3 commits November 10, 2021 10:41

Improve docstring of jacobian method

a3deda1

Also removes the comment to ignore pylint warning that doesn't apply anymore.

Fix pylint complains

9dba79b

santisoler added the enhancement Idea or request for a new feature label Nov 10, 2021

santisoler added 3 commits November 10, 2021 11:08

Remove unwanted comment

f5368a1

Simplify eqs cartesian tests through fixtures

6b35224

Add damping parameter on test_dtype

f67949b

santisoler commented Nov 10, 2021

Remove unused arguments from test functions

efa8abd

santisoler requested a review from leouieda November 10, 2021 15:55

Add test function for checking accuracy when dtype is float32

6d3bef4

santisoler merged commit 24e7ef4 into master Nov 15, 2021

santisoler deleted the eqs-dtype branch November 15, 2021 12:57
