Organize baseline images by method and/or cache baseline images during GitHub workflows #2117
Comments
Sounds good. Another possible option: running |
Here are the numbers of test images grouped by module name at commit 561eb41. Maybe we can start opening individual PRs for the modules with >10 baseline images like
I'm not sure whether we could just skip the modules with only one baseline image, or whether we should do all of them for consistency. Once the top modules are all done, we might see the number of HTTP calls fall below the threshold. P.S. Here's the script used to generate the statistics:
import os
import glob

import pandas as pd

# Collect all baseline images in the test baseline directory
png_images = glob.glob("pygmt/tests/baseline/test_*.png")
# Keep only the filename stem (no directory, no .png extension)
df = pd.DataFrame(
    data=[os.path.splitext(os.path.basename(filepath))[0] for filepath in png_images],
    columns=["filename"],
)
# The module name is the second "_"-separated token, e.g. test_grdimage_... -> grdimage
df_modulename = df.filename.str.split("_", expand=True)[1]
print(len(df_modulename))  # total number of baseline images
print(df_modulename.value_counts())  # number of baseline images per module
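For anyone who wants the same per-module counts without pandas, here is a minimal standard-library sketch that also lists the modules above the >10 threshold mentioned above. It assumes, like the pandas script, that baseline images are named test_<module>_*.png; the helper names `module_name` and `modules_above` are hypothetical.

```python
import glob
import os
from collections import Counter


def module_name(filepath):
    """Return the module part of a baseline image name, e.g. 'grdimage'
    for '.../test_grdimage_xyz.png' (same rule as the pandas script)."""
    stem = os.path.splitext(os.path.basename(filepath))[0]
    # The module name is the second "_"-separated token
    return stem.split("_")[1]


def modules_above(filepaths, threshold=10):
    """Count baseline images per module and return (module, count) pairs
    for modules with more than `threshold` images, largest first."""
    counts = Counter(module_name(path) for path in filepaths)
    return [(name, n) for name, n in counts.most_common() if n > threshold]


if __name__ == "__main__":
    png_images = glob.glob("pygmt/tests/baseline/test_*.png")
    print(len(png_images))  # total number of baseline images
    for name, n in modules_above(png_images):
        print(name, n)  # candidates for a per-module PR
```

Each module returned by `modules_above` would become one tracked directory (and one .dvc file) under the per-module layout.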
Thanks @weiji14 for compiling that information! The plan to start with the methods with many baseline images sounds good to me. I don't think we'll need to do the methods with just one image if the tests stop failing frequently for this reason. |
Or, if you want to give @seisman's suggestion a quick try, you could modify lines 120 to 124 of pygmt/.github/workflows/ci_tests.yaml at commit 561eb41. |
I think it's best to try to fix the root of the problem, which is that we are not using dvc in an optimal way. |
I'm on board with the plan to make one .dvc file per module. Is the process as simple as moving all of the baseline images from a specific module (e.g. |
I don't think it will work. The tests will fail to find the baseline images. |
I don't think we will organize baseline images by method (i.e., tracking directories rather than tracking individual files) as proposed in the OP. The main reasons are:
So, I'm inclined to close the issue. Feel free to re-open it if you don't agree. |
Description of the desired feature
I've noticed that our test workflows often fail during the dvc pull step because one or two of the baseline images fail to download. For example, 8 of the last 10 runs in https://github.com/GenericMappingTools/pygmt/actions/workflows/ci_tests.yaml report at least one job failing for this reason. This is because relying on >160 HTTPS calls is unstable (and slow).
This proposal is to restructure the baseline images following the suggestion in #1490 (comment): organize the baseline images into directories by method, with one .dvc file for each directory rather than a 1:1 match between baseline images and .dvc files. This should reduce the number of HTTPS calls from >160 to ~25, which would increase stability and speed. The DAGsHub team previously mentioned that these problems could eventually be fixed by using a different connection protocol, but I don't think we should wait for that.
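A rough way to see why fewer requests helps: if each download fails independently with the same small probability p, the chance that a pull of n files hits at least one failure is 1 - (1 - p)^n. The sketch below assumes that independence and uses an arbitrary illustrative p of 1%, not a measured value; it only shows how strongly the failure rate depends on n.

```python
def pull_failure_probability(n_requests, p_fail):
    """Probability that at least one of n independent requests fails,
    assuming every request fails with the same probability p_fail."""
    return 1.0 - (1.0 - p_fail) ** n_requests


if __name__ == "__main__":
    p = 0.01  # illustrative per-request failure rate, not a measured value
    for n in (160, 25):
        print(f"{n} requests: {pull_failure_probability(n, p):.0%} chance of a failed pull")
```

With p = 1%, this gives roughly 80% for 160 requests versus roughly 22% for 25 requests. The real per-request rate is unknown, but the drop from 160 to ~25 requests helps for any p.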
Another option is to cache the dvc files (both .dvc/cache and pygmt/tests/baseline/*.png), similar to how we cache the .gmt files, so that the dvc pull step only updates outdated files. However, this only works around the core issue, so I suggest doing it in addition to the restructuring proposed above.
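DVC does this bookkeeping itself, but the idea behind "only updates outdated files" can be sketched as comparing each local image with the checksum recorded in its .dvc file. Only images that are missing or have a different checksum would need to be fetched. This is a minimal sketch, assuming the single-output .dvc layout of DVC 2.x (an outs: entry with an md5: line) and plain md5 checksums for binary files such as PNGs. `read_dvc_md5`, `file_md5` and `is_outdated` are hypothetical helpers, not part of DVC.

```python
import glob
import hashlib
import os


def read_dvc_md5(dvc_path):
    """Return the md5 recorded in a single-output .dvc file.

    Assumes a layout like 'outs:' followed by '- md5: <hash>',
    so a simple line scan is enough (no YAML parser needed)."""
    with open(dvc_path, encoding="utf-8") as f:
        for line in f:
            key, _, value = line.strip().lstrip("- ").partition(":")
            if key == "md5":
                return value.strip()
    return None


def file_md5(path):
    """md5 of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_outdated(image_path):
    """True if the image is missing or differs from its <image>.dvc entry."""
    if not os.path.exists(image_path):
        return True
    return file_md5(image_path) != read_dvc_md5(image_path + ".dvc")


if __name__ == "__main__":
    dvc_files = glob.glob("pygmt/tests/baseline/*.png.dvc")
    images = [p[: -len(".dvc")] for p in dvc_files]
    outdated = [p for p in images if is_outdated(p)]
    print(f"{len(outdated)} of {len(images)} baseline images need to be pulled")
```

With .dvc/cache restored from a previous run, most images would already be up to date, so only the few changed ones would need an HTTPS request.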
Are you willing to help implement and maintain this feature? Yes