Filters not included for all channels #157

kkappler · 2023-07-22T22:10:38Z

This relates to aurora issue 252 Wide scale testing on Earthscope.

In an effort to better understand disagreements in TFs between aurora and SPUD, a closer look at station CAS04 was undertaken. CAS04 does result in an incorrect TF, and the reason appears to be because not all channel-runs are associated with filters in the mth5.

I verify that this problem can be reproduced by running the jupyter notebook:
mth5/docs/examples/notebooks/make_mth5_driver_v0.2.0.ipynb

I modified the notebook on fc branch, adding two cells. Right after the the channel_summary_df is created, an iterrows operation is performed, and for each row of the channel_summary, the number of filters is counted, and recorded in a new column "n_filters".

Three entries in that table have no filters, channel ey in run c and channels ex,ey in run d. We may have encountered this before in aurora issue 31, however, I believe we worked around it with a custom XML provided by Tim. The metadata at IRIS look OK now though.

Using IRIS data services, we get this xml for which I manually checked that there do appear to be filters associated with every channel.

So, something is likely going wrong in the mth5 build. As an additional hint, when I build the "dataless" mth5 for CAS04, rather than having the expected number of rows (20, 5ch x 4 runs) there are 17 rows, and the missing rows correspond to exactly the rows that have no filters in the mth5.

The text was updated successfully, but these errors were encountered:

There was a hardcoded assumption in get_inventory_from_df that only one valid value of channel instance should be added to the list of returned channels from a station. This is not only false in the case of wildcards (e.g. *F*, *Q*), it is also false when a channel has been broken into multiple runs. CAS04 is a good example of this. See mth5 issue #157, and aurora 277 for more details.

kkappler · 2023-07-29T02:28:13Z

This change resulted in obtaining TFs that are in good agreement with SPUD on the CAS04 tests as well as no rows with zero filters in the channel_summary, so I think we want to keep the change.

@laura-iris : Could you please also take a look at the logic in get_inventory_from_df and see if it can be simplified somewhat ? Originally I had hoped to just iterate a single function over rows of the request_df, facilitating paralellization. It may be that simplifying the request_df before it hits get_inventory_from_df is appropriate (e.g. converting wildcards to explicit network-station-channel lists, and maybe other simplifications).

Some notes are in issue #157 and aurora issue 277

laura-iris · 2023-07-31T21:38:44Z

This reminds me a lot of some work I did last year, and when I was looking through my code and notes I see that what appears to be the exact same function used to live in make_mth5.py instead of fdsn.py. I had worked on looping over the epochs and simplifying some of the logic at that point. It seems like the changes weren't translated over to fdsn.py.

@kkappler can you take a look at this old merge request and let me know if it also looks like the same code? #106

kkappler · 2023-08-01T02:55:10Z

@laura-iris This does look like the same code (cleaned up and fixed already!) It looks like it makes sense to try and replace the existing get_inventory_from_df method in fdsn.py on fc branch with the fix you had made previously in in make_mth5.py, that had been merged into master

I don't completely understand it, but here is what it looks like happened:

In an effort to generalize make_mth5.py so that is can get data from clients other than earthscope/IRIS, @kujaku11 modified make_mth5.py to use a "client plugin model". In the current implementation there are two client plugins, fdsn.FDSN, and geomag.GeomagClient (for Intermagnet observatories). In future, other clients can follow this pattern.

The factoring of fdsn.py out of make_mth5.py was started on Feb 4, 2022, on the clients branch of mth5.
The PR #82 for the clients branch was closed and merged into master on March 27, 2023.

However, your fix to issue #105 was merged into master on August 19, 2022 (while clients branch was still in development). At this point in time however, clients had never been merged into master, so you had no way to know that code had been factored out of make_mth5.py.

It looks like what happened is that somehow during this generalization process an older version of the data wrangling wound up in fdsn. It also looks like this had already gone pete tong on master by Nov 14, 2022.

kkappler · 2023-08-03T23:03:54Z

@laura-iris I took a look at the modifications you made to make_mth5 and have blended those into fdsn.py, together with some clean up I did as well.

I kept your structure of using
self._process_list()
which uses a dict to select based on mth5_version
self._run_010 or self._run_020, and your idea of self._loop_stations() method.
However, the logic in loop_stations uses my self.wrangle_runs_into_containers() method, which has a simplified logic from the old loop_statons.

I'll commit to fix_issue_157 and test this weekend.

Re-add methods: - _loop_stations - _run_010 - _run_020 - _process_list These functions were lost in a merge sometime last year. Some additional notes are in issue #157

Replace logic in make_mth5_from_fdsn_client with a call to self._process_list issue #157

kkappler · 2023-08-06T16:57:28Z

See mt_metadata issue 151

There is a bug in mt_metadata/mt_metadata/timeseries/stationxml/xml_channel_mt_channel.py causing
multiple newly discovered filters to be renamed with the same name.

kkappler · 2023-08-06T20:22:31Z

Substantially modified the code (but not the ultimate approach) in mth5/clients/fdsn.py.

Replaced nested loop logic with alternative approach

Make one dict of networks and one dict of stations up front
loop over df only one time, packing channels into stations
then pack stations into networks
finally pack networks into inventory and return (with data if requested)

This should be much easier to debug in future.

The only complexity is this idea that you can have namespace clashes with
network_id. We handle this (as per tronan's original version) by
pairing network with a timestamp.

@laura-iris Can you take a look at my latest commit and see if it makes sense to you?

kkappler · 2023-08-12T03:52:19Z

@laura-iris and I discussed this today, the logic reproduces the old code, BUT, still does not generically support wildcards ... consider the case if a network code has multiple instances, in different time periods, then we only receive the zeroth instance.

This should be handled in a later SOW, and the logic for handling should be in either validate dataframe or in in the network handling (build_network_dict) where multiple start-times for a network (station) would be handled

kkappler · 2023-08-12T19:39:16Z

Filters are now correctly included for all channels, but are not showing up in the second run:

kkappler · 2023-08-12T23:19:54Z

@kujaku11 This is seems to be because we need an additional operation in fdsn.py

I am not sure where the best place to put it is ... here is a snapshot of the main flow of make_mth5_from_fdsn_client

In this case, the inventory has only 5-channels (they never changed):

but the streams implies two runs:

experiment = translator.xml_to_mt(inv)
is unaware that there are multiple runs, because it sees only the inventory,

So we need to do one of the following:

A) Count the number of runs implied by streams, and then augment the experiement
B) Before closing the mth5, (which is aware that there are two runs) check for filterless channels, and if they exist, deduce the appropriate filters
C) something else
The important things to be aware of here are:
1: It is possible to have multiple instances of the same channel name, and this seems to have been handled

It is possible to have one channel with multiple runs (this is where we have problems today)

I think that we can take a tack like (B) with a function called: repair_missing_filters(m)
This would look for missing filters, and then if they are found, see if the same channel has filters on an earlier run.

kkappler · 2023-08-13T01:37:40Z

The following workaround is being tested, which at least allows me to move forward in the analysis of TF values (it is committed in sandbox/mth5_helpers.py on earthscope_tests branch of aurora)

kujaku11 · 2023-08-18T23:33:00Z

@kkappler have a look at commit afc39f0. I added some logic if there are more runs found in the data than the metadata. It will use metadata from the 1st run and channels to populate metadata in other runs.

kkappler · 2023-08-24T22:19:59Z

@kujaku11 almost, but the both the first and second runs are named 001

Almost certain we just need to pop the "id" key from metadata dict before overwriting -- working on fix now

kujaku11 · 2023-09-23T23:49:18Z

Fixed with update of FDSN

kkappler added bug Something isn't working Filters Functionality dealing with how filters are used, applied, and described labels Jul 22, 2023

kkappler assigned kujaku11 Jul 23, 2023

kkappler mentioned this issue Jul 29, 2023

Modify FDSN logic to acknowledge multiple channels #158

Merged

2 tasks

kkappler assigned laura-iris Jul 29, 2023

kkappler added a commit that referenced this issue Aug 4, 2023

Add old functions to fdsn from make_mth5

e41eccb

Re-add methods: - _loop_stations - _run_010 - _run_020 - _process_list These functions were lost in a merge sometime last year. Some additional notes are in issue #157

kkappler added a commit that referenced this issue Aug 4, 2023

Factoring

d862124

Replace logic in make_mth5_from_fdsn_client with a call to self._process_list issue #157

kkappler added a commit that referenced this issue Aug 25, 2023

fix non-unique run names, #157

73d4670

kujaku11 closed this as completed Sep 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filters not included for all channels #157

Filters not included for all channels #157

kkappler commented Jul 22, 2023

kkappler commented Jul 29, 2023 •

edited

Loading

laura-iris commented Jul 31, 2023

kkappler commented Aug 1, 2023

kkappler commented Aug 3, 2023

kkappler commented Aug 6, 2023

kkappler commented Aug 6, 2023

kkappler commented Aug 12, 2023

kkappler commented Aug 12, 2023

kkappler commented Aug 12, 2023

kkappler commented Aug 13, 2023

kujaku11 commented Aug 18, 2023

kkappler commented Aug 24, 2023 •

edited

Loading

kujaku11 commented Sep 23, 2023

Filters not included for all channels #157

Filters not included for all channels #157

Comments

kkappler commented Jul 22, 2023

kkappler commented Jul 29, 2023 • edited Loading

laura-iris commented Jul 31, 2023

kkappler commented Aug 1, 2023

kkappler commented Aug 3, 2023

kkappler commented Aug 6, 2023

kkappler commented Aug 6, 2023

kkappler commented Aug 12, 2023

kkappler commented Aug 12, 2023

kkappler commented Aug 12, 2023

kkappler commented Aug 13, 2023

kujaku11 commented Aug 18, 2023

kkappler commented Aug 24, 2023 • edited Loading

kujaku11 commented Sep 23, 2023

kkappler commented Jul 29, 2023 •

edited

Loading

kkappler commented Aug 24, 2023 •

edited

Loading