Wide-scale testing on Earthscope #252

Closed
11 tasks done
kkappler opened this issue Apr 9, 2023 · 2 comments

kkappler commented Apr 9, 2023

Scripts are being kept in the earthscope_tests branch, under aurora/test_utils/earthscope for now.

Tasks include

  • Accessing XML files from SPUD
    This is done in 00_catalog_SPUD.py
  • Method for testing mt_metadata I/O on SPUD XMLs
    Done in 01_test_load_spud_tfs.py (a rough sketch of this check appears after this list)
    Results of these tests are being posted to mt_metadata issue 143
  • Method for extracting remote references from tf
    Prototypes exist but need review/discussion
  • Add list of RR stations to tf review csv
    A prototype exists, but since extraction of RR is unstable, so is this.
  • Verify that lists of RR stations are accurate -- spot-checking a few manually to start
  • Method for mth5 builds from Earthscope for processing
    03_test_download_from_earthscope.py
  • Tracking for test results of mth5 builds
  • Address Issue #275: Inconsistent coverage between metadata and data at IRIS/Earthscope
  • Address Issue #276: mda string in SPUD XML IFF IRIS/Earthscope data?
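The exact contents of 01_test_load_spud_tfs.py are not reproduced here, but a minimal sketch of the kind of pass/fail scan it performs might look like the following. The local cache directory, the output column names, and the assumption that mt_metadata's TF object exposes a read() method accepting an EMTF XML path are all illustrative, not a copy of the actual script.

```python
# Hedged sketch of a read-check over locally cached SPUD XMLs.
from pathlib import Path

import pandas as pd
from mt_metadata.transfer_functions.core import TF

SPUD_XML_DIR = Path("spud_xml")          # hypothetical local cache of SPUD XMLs
REVIEW_CSV = Path("01_spud_xml_review.csv")

rows = []
for xml_file in sorted(SPUD_XML_DIR.glob("*.xml")):
    row = {"xml": xml_file.name, "read_ok": False, "error": ""}
    try:
        tf = TF()
        tf.read(xml_file)                # parse the EMTF XML into a TF object
        row["read_ok"] = True
    except Exception as exc:             # log the failure and keep scanning
        row["error"] = str(exc)
    rows.append(row)

pd.DataFrame(rows).to_csv(REVIEW_CSV, index=False)
```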

Karl ToDo

  • Driver for aurora processing / iterator over valid XMLs with data to generate TFs
  • TF comparison methods (one candidate metric is sketched below)
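For the TF comparison item, one possible metric is sketched below: a relative RMS difference between two impedance arrays evaluated on a common period grid. This is only an illustration of the kind of comparison 05_compare_tfs.py could make; the shape convention (n_periods, 2, 2) and the interpolation-to-common-periods step that would precede it are assumptions.

```python
import numpy as np


def impedance_rms_misfit(z_test: np.ndarray, z_ref: np.ndarray) -> float:
    """Relative RMS misfit between two impedance arrays of shape (n_periods, 2, 2).

    Hedged sketch of a candidate comparison metric; both arrays are assumed
    to be sampled at the same periods (interpolate beforehand if not).
    """
    z_test = np.asarray(z_test, dtype=complex)
    z_ref = np.asarray(z_ref, dtype=complex)
    if z_test.shape != z_ref.shape:
        raise ValueError("impedance arrays must share a common period grid")
    num = np.abs(z_test - z_ref) ** 2
    den = np.abs(z_ref) ** 2
    mask = den > 0                       # avoid dividing by zero reference elements
    return float(np.sqrt(num[mask].sum() / den[mask].sum()))
```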
kkappler added a commit that referenced this issue Apr 9, 2023
This branch is not expected to modify code in aurora.
We can add the tests back in when we are ready to merge;
in the meantime it just seems silly to run tests on every commit.

When the time comes to merge, we just need to copy
.github/workflows/test.yml back into the repo.

[Issue(s): #252]
kkappler added a commit that referenced this issue Apr 9, 2023
Also add a folder to hold supporting functions if needed.

[Issue(s): #252]
kkappler added a commit that referenced this issue Jun 3, 2023
- modify stage 01 to use get_summary_table_name
- modify stage 01 to add support for remotes_2
- stage 03 is in dev -- not working yet
- stage 04 is in dev
- add EXPERIMENT_PATH as a place to store inventory/metadata (dataless h5s)
- factor get_remotes_2 out of get_remotes
- add support for summary_table filename make/load

issue #252
kkappler added a commit that referenced this issue Jun 19, 2023
There is an issue when the time intervals are incorrect;
this was handled by returning "None".
It was never expected to be encountered,
but it seems that there are mth5s with end time earlier than start time.

While this should be fixed upstream, for now, in order to avoid an
exception when building the kernel_dataset, we should at least
return a correctly shaped output.

Since the overlap() method is supposed to return a start_time and an
end_time, returning None is not acceptable, but returning
None, None is OK (at least structurally).

This fix is being inserted to support the WideScale Testing task.

[Issue(s): #252]
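A sketch of the shape of that fix is below. The function signature and names are illustrative rather than the actual aurora code, but they show why returning a (None, None) pair keeps downstream unpacking intact where a bare None would not.

```python
from typing import Optional, Tuple

import pandas as pd


def overlap(
    t1_start: pd.Timestamp, t1_end: pd.Timestamp,
    t2_start: pd.Timestamp, t2_end: pd.Timestamp,
) -> Tuple[Optional[pd.Timestamp], Optional[pd.Timestamp]]:
    """Return the (start, end) of the overlap of two time intervals.

    Illustrative sketch: if the intervals are degenerate (e.g. an mth5 whose
    end time precedes its start time) or simply do not overlap, return
    (None, None) so that callers that unpack ``start, end = overlap(...)``
    still receive a correctly shaped result.
    """
    start = max(t1_start, t2_start)
    end = min(t1_end, t2_end)
    if end <= start:
        return None, None
    return start, end
```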
kkappler added a commit that referenced this issue Jun 19, 2023
kkappler added a commit that referenced this issue Jun 29, 2023
- deprecate unused TMP_FROM_EMTF argument
- add testing control param restrict_to_first_n_rows
- make SPUD paths a dict, keyed by emtf, data, base

[Issue(s): #252]
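On the last bullet of that commit, the SPUD paths dict might be shaped roughly like the snippet below; the directory names are hypothetical, and only the keys (emtf, data, base) come from the commit message.

```python
from pathlib import Path

# Hypothetical layout for the SPUD paths dict; directory names are illustrative.
SPUD_BASE = Path("data/spud")
SPUD_PATHS = {
    "base": SPUD_BASE,
    "emtf": SPUD_BASE / "emtf_xml",   # EMTF-style TF XMLs
    "data": SPUD_BASE / "data_xml",   # data XMLs
}
```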

kkappler commented Sep 8, 2023

An entire first pass of this task has now run, with aurora results in reasonable agreement with SPUD in most cases.

A follow-up task is to take the six stages of testing:

  • 00_catalog_SPUD.py
  • 01_test_load_spud_tfs.py
  • 02_test_station_inventory_valid.py
  • 03_test_download_from_earthscope.py
  • 04_test_processing_with_aurora.py
  • 05_compare_tfs.py

and wrap them in a common framework. Towards this, I forked a widescale_test branch off of earthscope_tests, where each of the six steps can be wrapped as an instance of a WideScaleTest class. The idea is that each test has an output table (defined by a schema); the table is prepared as a dataframe, and dask then iterates over the rows.
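A minimal sketch of that pattern, assuming a pandas dataframe of work items and dask.delayed for the per-row fan-out, is below. The class and method names (WideScaleTest, prepare_dataframe, enrich_row) are placeholders rather than the actual widescale_test code.

```python
import dask
import pandas as pd


class WideScaleTest:
    """One stage of the wide-scale test: build an input df, emit an output table."""

    #: columns of the stage's output table (its "schema"); set by subclasses
    output_columns: list = []

    def prepare_dataframe(self) -> pd.DataFrame:
        """Build the table of work items (one row per XML, station, etc.)."""
        raise NotImplementedError

    def enrich_row(self, row: pd.Series) -> dict:
        """Process one row; return a dict matching output_columns."""
        raise NotImplementedError

    def run(self) -> pd.DataFrame:
        df = self.prepare_dataframe()
        # Fan the rows out as lazy dask tasks, then collect into the output table.
        tasks = [dask.delayed(self.enrich_row)(row) for _, row in df.iterrows()]
        results = dask.compute(*tasks)
        return pd.DataFrame(list(results), columns=self.output_columns)
```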

I am attaching previous result csvs from stages 00, 01, 02 here for comparison with the updated versions.

00_spud_xml_scrape.csv

01_spud_xml_review_2023-09-07_203451.csv

kkappler commented

All stages have successfully executed on gadi, as well as on my local machine.

If we were going to do this again, I would make the following updates:

  1. More testing of dask; it was not clear that dask was speeding things up much for stage 01
  2. Merge the h5 files, either into one archive or at least one archive per survey, rather than one mth5 per station (a sketch of an HDF5-level merge appears after this list)
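On point 2, a hedged sketch of what an HDF5-level merge could look like is below, using h5py's Group.copy to pull each station group out of its per-station file into a single per-survey file. The internal group path "Survey/Stations" is an assumption about the mth5 layout, and a merged file built this way would still lack mth5 file-level metadata, so in practice going through the MTH5 API would be the safer route.

```python
from pathlib import Path

import h5py

STATION_H5_DIR = Path("mth5_per_station")   # hypothetical directory of per-station mth5 files
MERGED_H5 = Path("survey_merged.h5")        # hypothetical merged output file

with h5py.File(MERGED_H5, "a") as merged:
    stations_grp = merged.require_group("Survey/Stations")
    for station_file in sorted(STATION_H5_DIR.glob("*.h5")):
        with h5py.File(station_file, "r") as src:
            for station_id in src["Survey/Stations"]:
                if station_id in stations_grp:
                    continue                 # station already merged
                # Group.copy brings over the whole station subtree (runs, channels, ...)
                src.copy(f"Survey/Stations/{station_id}", stations_grp)
```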

For reference, I zipped and attached the summary table csvs from gadi.

summary_tables_gadi_20230928.zip
