-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dtype argument to EquivalentSources #278
Conversation
The dtype argument is used to cast the inputs of the fit method, to build the location of the point sources, to allocate the Jacobian matrix and therefore to produce predictions. Add a new utility function for casting the inputs of the fit method to the desired dtype. Add new test file for the equivalent sources utility functions. Add tests for the new feature.
Also removes the comment to ignore pylint warning that doesn't apply anymore.
# Check the data type of the source coefficients | ||
# assert eqs.coefs_.dtype == np.dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leouieda I've noticed that the coefs_
are casted to float64
even if the inputs to vdb.least_squares
are all float32
. Even that, the predictions are created as float32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to track this down. It seems that scikit-learn uses the dtype of the Jacobian for everything. But the coeffs are actually coming from scipy.linalg.solve according to this line. The docs for solve
don't mention dtype so it could be that they always return a float64
from the underlying LAPACK functions.
It would be fine to leave the coefs the way they are returned since they usually don't require much storage. It's mostly the Jacobian that's affected.
Any thoughts on defaulting to float32? |
I ran this little script to compare the accuracy of float32 results: import verde as vd
import harmonica as hm
import matplotlib.pyplot as plt
region = (-3e3, -1e3, 5e3, 7e3)
shape = (40, 40)
coordinates = vd.grid_coordinates(region=region, shape=shape, extra_coords=0)
points = vd.grid_coordinates(region=region, shape=(6, 6), extra_coords=-1e3)
masses = vd.datasets.CheckerBoard(amplitude=1e13, region=region).predict(points)
data = hm.point_mass_gravity(coordinates, points, masses, field="g_z")
eqs64 = hm.EquivalentSources(dtype="float64")
eqs64.fit(coordinates, data)
prediction_64 = eqs64.predict(coordinates)
eqs32 = hm.EquivalentSources(dtype="float32")
eqs32.fit(coordinates, data)
prediction_32 = eqs32.predict(coordinates)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
maxabs = vd.maxabs(prediction_32, prediction_64)
pc = ax1.pcolormesh(
*coordinates[:2], prediction_32, vmin=-maxabs, vmax=maxabs, cmap="seismic"
)
plt.colorbar(pc, ax=ax1)
ax1.set_title("Prediction 32bits")
ax1.set_aspect("equal")
pc = ax2.pcolormesh(
*coordinates[:2], prediction_64, vmin=-maxabs, vmax=maxabs, cmap="seismic"
)
plt.colorbar(pc, ax=ax2)
ax2.set_title("Prediction 64bits")
ax2.set_aspect("equal")
diff = data - prediction_32
maxabs = vd.maxabs(diff)
pc = ax3.pcolormesh(*coordinates[:2], diff, vmin=-maxabs, vmax=maxabs, cmap="seismic")
plt.colorbar(pc, ax=ax3)
ax3.set_title("Diff 32bits")
ax3.set_aspect("equal")
diff = data - prediction_64
maxabs = vd.maxabs(diff)
pc = ax4.pcolormesh(*coordinates[:2], diff, vmin=-maxabs, vmax=maxabs, cmap="seismic")
plt.colorbar(pc, ax=ax4)
ax4.set_title("Diff 64bits")
ax4.set_aspect("equal")
plt.show() With the following result: The errors of the float32 are 5 orders of magnitude higher than using float64. In the light of these results, I would say to keep float64 by default. In case the dataset is large enough so it won't fit in memory, but not soooo large, would be interesting to see a comparison between using float32 and gradient-boosted equivalent sources. I suspect float32 would give better results. |
👍 sounds good to me! We can add a tutorial about this at some point. Merge away! |
The
dtype
argument is used to cast the inputs of thefit
method, to buildthe location of the point sources, to allocate the Jacobian matrix and
therefore to produce predictions. Add a new utility function for casting the
inputs of the fit method to the desired
dtype
. Add new test file for theequivalent sources utility functions. Add tests for the new feature. Simplify
test suite for equivalent sources through fixtures.
Reminders:
make format
andmake check
to make sure the code follows the style guide.doc/api/index.rst
and the base__init__.py
file for the package.AUTHORS.md
file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.