Handling of multiple acquisition runs as a single processing run #80
The ProcessingRunTS class could be used to manage, or at least reference, the data.
Run merging requirements: NaN filling is a general solution, with two potential complications.
Ideally we want a model that is completely general (for disjoint time series). The TSCollection is associated with a list of time intervals ℐ0 = {(a,b)_i} such that all data to be processed lie in the union of the (a,b)_i. The individual elements of ℐ0 are normally acquisition runs, or intervals properly contained in acquisition runs, but we want to be careful not to exclude the case of joining acquisition runs via some gap-fill technique. For example, a few long acquisition runs with only a short gap in between may want to be processed over very long periods (longer than either acquisition run can yield alone). A companion set of intervals specifies where synthetic data (interpolation, etc.) can be overlain on the original set; in effect it marks the intervals that should be infilled so that runs can be treated as continuous. This is particularly useful in the case of a few missing samples, but could have wider application.
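As a minimal sketch of the interval model above (pure pandas, with made-up run data; none of these names come from aurora), reindexing the union of two runs onto a regular time axis inserts NaN in the gap, marking exactly the companion interval that a synthetic-data step could later infill:

```python
import numpy as np
import pandas as pd

# Two hypothetical acquisition runs with a short gap between them,
# sampled at 1 Hz (values are illustrative only).
run_a = pd.Series(
    np.ones(5), index=pd.date_range("2020-01-01 00:00:00", periods=5, freq="s")
)
run_b = pd.Series(
    2 * np.ones(5), index=pd.date_range("2020-01-01 00:00:08", periods=5, freq="s")
)

# Treat the union of the run intervals as one continuous interval:
# reindexing onto a regular axis inserts NaN in the gap, which a
# synthetic-data step (interpolation, etc.) could later overwrite.
merged = pd.concat([run_a, run_b])
full_index = pd.date_range(merged.index[0], merged.index[-1], freq="s")
continuous = merged.reindex(full_index)

gap = continuous[continuous.isna()]
print(len(continuous), len(gap))  # 13 total samples, 3 of them NaN-filled
```

Here `continuous.interpolate()` would be one possible infill; the point is only that the gap interval is explicitly represented.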
The place where this will be implemented in the code is the function process_mth5_run in aurora/pipelines/process_mth5.py. The current function structure is:
To support multiple runs we could replace run_id (currently a string) with, optionally, a list of strings, each specifying a run. Implementing this change does not look too complicated, and the function structure would stay very similar. Instead of extracting a single run, applying the STFT, and processing, we would extract each run in the list and STFT each individually. The STFTs would then be merged into one xarray of spectral measurements, and that array would be passed to the TF estimation method. This solution should work in general for single-station processing. For multiple-station processing there is one more layer to consider: the run labels will not in general be the same for different stations. We would need an iterable of runs for the station of interest, and also an iterable of runs for the remote reference station. The determination of which runs will be processed is currently not supported. When there are many stations (MMT) we would need to handle many subcases. Might need another version of …
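A rough sketch of the merge step described above, using a naive numpy STFT in place of aurora's actual spectrogram code (stft_one_run, the run contents, and the window parameters are all illustrative assumptions, and a bare array stands in for the xarray of spectral measurements):

```python
import numpy as np

def stft_one_run(data, nperseg=64, step=32):
    """Naive STFT of one run (hypothetical stand-in for aurora's STFT step):
    frame the series with a hop of `step` samples and FFT each frame."""
    n_frames = (len(data) - nperseg) // step + 1
    frames = np.stack([data[i * step : i * step + nperseg] for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (time_window, frequency)

rng = np.random.default_rng(0)
# Two synthetic runs of different lengths standing in for a run list.
runs = [rng.standard_normal(1024), rng.standard_normal(2048)]

# STFT each run individually, then merge along the time-window axis into
# a single spectral array for the TF-estimation step (in aurora this would
# carry real time coordinates rather than being a bare array).
stfts = [stft_one_run(run) for run in runs]
merged = np.concatenate(stfts, axis=0)
print(merged.shape)  # (94, 33): 31 + 63 windows, 33 rfft frequencies
```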
While working on issue #80 and PR #184, I noticed that the processing config defaults to estimator.engine = "RME_RR". This is fine, but I find I need to specify "RME" explicitly when there is only one station. So a couple of fixes were added: 1. The Processing class now has a validate() method: if there is no RR station _and_ estimator.engine is "RME_RR", the engine gets reset to "RME". 2. Added the ability to pass a kwarg called estimator to a ConfigCreator instance; the kwarg is a dict, and if "engine" is a key, its value overwrites the estimator engine. The Parkfield SS run test was updated to use the config_creator method; the cas04 test is using validate(). [Issue(s): #80]
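A standalone sketch of the validate() fallback described above; the real Processing class lives in aurora, and this toy version only mimics the RME_RR-to-RME reset (class and attribute names are assumed for illustration):

```python
class Estimator:
    """Toy stand-in for the estimator section of a processing config."""
    def __init__(self, engine="RME_RR"):
        self.engine = engine

class Processing:
    """Toy stand-in for the Processing class; only models validate()."""
    def __init__(self, remote_reference_station=None, estimator=None):
        self.remote_reference_station = remote_reference_station
        self.estimator = estimator or Estimator()

    def validate(self):
        # With no remote reference station, RME_RR cannot run;
        # fall back to single-station RME.
        if self.remote_reference_station is None and self.estimator.engine == "RME_RR":
            self.estimator.engine = "RME"

p = Processing()           # single-station: no RR station supplied
p.validate()
print(p.estimator.engine)  # -> "RME"
```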
Allow the request list to have multiple stations and modify channel_summary_to_make_mth5 to group by (station, run) rather than just run. Add tests of multistation mth5 creation to the cas04 tests. [Issue(s): #80]
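To illustrate why grouping by (station, run) matters, here is a toy channel summary (columns and station names invented for the example; this is not mth5's actual summary schema). Grouping by run alone would merge same-named runs from different stations:

```python
import pandas as pd

# Toy stand-in for an mth5 channel summary.
channel_summary = pd.DataFrame(
    {
        "station": ["CAS04", "CAS04", "CAS04", "NVR08"],
        "run": ["a", "a", "b", "a"],
        "component": ["ex", "hy", "ex", "ex"],
    }
)

# Grouping by run alone would wrongly merge run "a" of CAS04 with
# run "a" of NVR08; grouping by (station, run) keeps them separate.
groups = channel_summary.groupby(["station", "run"])
print(sorted(groups.groups.keys()))
# [('CAS04', 'a'), ('CAS04', 'b'), ('NVR08', 'a')]
```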
Replaced dicts with classes: we now have a SyntheticRun and a SyntheticStation. These will be used to create an example synthetic case with many runs. [Issue(s): #80]
Change from timedelta.seconds to timedelta.total_seconds(). Remove run_id from sort_by; it should be only (station, starttime). [Issue(s): #80]
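The timedelta fix matters because .seconds is only the seconds component of Python's (days, seconds, microseconds) decomposition, not the full duration, so it silently drops whole days:

```python
from datetime import timedelta

# A duration spanning more than a day:
dt = timedelta(days=2, seconds=30)

print(dt.seconds)          # 30       (only the seconds component -- wrong as a duration)
print(dt.total_seconds())  # 172830.0 (2 * 86400 + 30 -- the full duration)
```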
This is done in the frequency domain. Issue #152 is still open about doing this in the time domain.
Use Cases:
Case 2 can be handled by breaking process_mth5_decimation_level into:
stft_agg = []
for run in runlist:
    stft_obj = make_stft_decimation_level(run)
    stft_agg.append(stft_obj)
tf_obj = process_stft_decimation_level(stft_agg)