Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filters not included for all channels #157

Closed
kkappler opened this issue Jul 22, 2023 · 13 comments
Closed

Filters not included for all channels #157

kkappler opened this issue Jul 22, 2023 · 13 comments
Assignees
Labels
bug Something isn't working Filters Functionality dealing with how filters are used, applied, and described

Comments

@kkappler
Copy link
Collaborator

This relates to aurora issue 252 Wide scale testing on Earthscope.

In an effort to better understand disagreements in TFs between aurora and SPUD, a closer look at station CAS04 was undertaken. CAS04 does result in an incorrect TF, and the reason appears to be because not all channel-runs are associated with filters in the mth5.

I verify that this problem can be reproduced by running the jupyter notebook:
mth5/docs/examples/notebooks/make_mth5_driver_v0.2.0.ipynb

I modified the notebook on fc branch, adding two cells. Right after the the channel_summary_df is created, an iterrows operation is performed, and for each row of the channel_summary, the number of filters is counted, and recorded in a new column "n_filters".

Three entries in that table have no filters, channel ey in run c and channels ex,ey in run d. We may have encountered this before in aurora issue 31, however, I believe we worked around it with a custom XML provided by Tim. The metadata at IRIS look OK now though.

Using IRIS data services, we get this xml for which I manually checked that there do appear to be filters associated with every channel.

So, something is likely going wrong in the mth5 build. As an additional hint, when I build the "dataless" mth5 for CAS04, rather than having the expected number of rows (20, 5ch x 4 runs) there are 17 rows, and the missing rows correspond to exactly the rows that have no filters in the mth5.

@kkappler kkappler added bug Something isn't working Filters Functionality dealing with how filters are used, applied, and described labels Jul 22, 2023
kkappler added a commit that referenced this issue Jul 29, 2023
There was a hardcoded assumption in get_inventory_from_df that
only one valid value of channel instance should be added to the list
of returned channels from a station.

This is not only false in the case of wildcards (e.g. *F*, *Q*),
it is also false when a channel has been broken into multiple runs.

CAS04 is a good example of this.

See mth5 issue #157, and aurora 277 for more details.
@kkappler
Copy link
Collaborator Author

kkappler commented Jul 29, 2023

This change resulted in obtaining TFs that are in good agreement with SPUD on the CAS04 tests as well as no rows with zero filters in the channel_summary, so I think we want to keep the change.

@laura-iris : Could you please also take a look at the logic in get_inventory_from_df and see if it can be simplified somewhat ? Originally I had hoped to just iterate a single function over rows of the request_df, facilitating paralellization. It may be that simplifying the request_df before it hits get_inventory_from_df is appropriate (e.g. converting wildcards to explicit network-station-channel lists, and maybe other simplifications).

Some notes are in issue #157 and aurora issue 277

@laura-iris
Copy link
Collaborator

This reminds me a lot of some work I did last year, and when I was looking through my code and notes I see that what appears to be the exact same function used to live in make_mth5.py instead of fdsn.py. I had worked on looping over the epochs and simplifying some of the logic at that point. It seems like the changes weren't translated over to fdsn.py.

@kkappler can you take a look at this old merge request and let me know if it also looks like the same code? #106

@kkappler
Copy link
Collaborator Author

kkappler commented Aug 1, 2023

@laura-iris This does look like the same code (cleaned up and fixed already!) It looks like it makes sense to try and replace the existing get_inventory_from_df method in fdsn.py on fc branch with the fix you had made previously in in make_mth5.py, that had been merged into master

I don't completely understand it, but here is what it looks like happened:

In an effort to generalize make_mth5.py so that is can get data from clients other than earthscope/IRIS, @kujaku11 modified make_mth5.py to use a "client plugin model". In the current implementation there are two client plugins, fdsn.FDSN, and geomag.GeomagClient (for Intermagnet observatories). In future, other clients can follow this pattern.

The factoring of fdsn.py out of make_mth5.py was started on Feb 4, 2022, on the clients branch of mth5.
The PR #82 for the clients branch was closed and merged into master on March 27, 2023.

However, your fix to issue #105 was merged into master on August 19, 2022 (while clients branch was still in development). At this point in time however, clients had never been merged into master, so you had no way to know that code had been factored out of make_mth5.py.

It looks like what happened is that somehow during this generalization process an older version of the data wrangling wound up in fdsn. It also looks like this had already gone pete tong on master by Nov 14, 2022.

@kkappler
Copy link
Collaborator Author

kkappler commented Aug 3, 2023

@laura-iris I took a look at the modifications you made to make_mth5 and have blended those into fdsn.py, together with some clean up I did as well.

I kept your structure of using
self._process_list()
which uses a dict to select based on mth5_version
self._run_010 or self._run_020, and your idea of self._loop_stations() method.
However, the logic in loop_stations uses my self.wrangle_runs_into_containers() method, which has a simplified logic from the old loop_statons.

I'll commit to fix_issue_157 and test this weekend.

kkappler added a commit that referenced this issue Aug 4, 2023
Re-add methods:
- _loop_stations
- _run_010
- _run_020
- _process_list

These functions were lost in a merge sometime last year.
Some additional notes are in issue #157
kkappler added a commit that referenced this issue Aug 4, 2023
Replace logic in make_mth5_from_fdsn_client with a call to self._process_list

issue #157
@kkappler
Copy link
Collaborator Author

kkappler commented Aug 6, 2023

See mt_metadata issue 151

There is a bug in mt_metadata/mt_metadata/timeseries/stationxml/xml_channel_mt_channel.py causing
multiple newly discovered filters to be renamed with the same name.

@kkappler
Copy link
Collaborator Author

kkappler commented Aug 6, 2023

Substantially modified the code (but not the ultimate approach) in mth5/clients/fdsn.py.

Replaced nested loop logic with alternative approach

  • Make one dict of networks and one dict of stations up front
  • loop over df only one time, packing channels into stations
  • then pack stations into networks
  • finally pack networks into inventory and return (with data if requested)

This should be much easier to debug in future.

The only complexity is this idea that you can have namespace clashes with
network_id. We handle this (as per tronan's original version) by
pairing network with a timestamp.

@laura-iris Can you take a look at my latest commit and see if it makes sense to you?

@kkappler
Copy link
Collaborator Author

@laura-iris and I discussed this today, the logic reproduces the old code, BUT, still does not generically support wildcards ... consider the case if a network code has multiple instances, in different time periods, then we only receive the zeroth instance.

This should be handled in a later SOW, and the logic for handling should be in either validate dataframe or in in the network handling (build_network_dict) where multiple start-times for a network (station) would be handled

@kkappler
Copy link
Collaborator Author

Filters are now correctly included for all channels, but are not showing up in the second run:
image

@kkappler
Copy link
Collaborator Author

@kujaku11 This is seems to be because we need an additional operation in fdsn.py

I am not sure where the best place to put it is ... here is a snapshot of the main flow of make_mth5_from_fdsn_client

image

In this case, the inventory has only 5-channels (they never changed):
image

but the streams implies two runs:
image

experiment = translator.xml_to_mt(inv)
is unaware that there are multiple runs, because it sees only the inventory,
image

So we need to do one of the following:

  • A) Count the number of runs implied by streams, and then augment the experiement
  • B) Before closing the mth5, (which is aware that there are two runs) check for filterless channels, and if they exist, deduce the appropriate filters
  • C) something else
    The important things to be aware of here are:
    1: It is possible to have multiple instances of the same channel name, and this seems to have been handled
  1. It is possible to have one channel with multiple runs (this is where we have problems today)

I think that we can take a tack like (B) with a function called: repair_missing_filters(m)
This would look for missing filters, and then if they are found, see if the same channel has filters on an earlier run.

@kkappler
Copy link
Collaborator Author

The following workaround is being tested, which at least allows me to move forward in the analysis of TF values (it is committed in sandbox/mth5_helpers.py on earthscope_tests branch of aurora)
image

@kujaku11
Copy link
Owner

@kkappler have a look at commit afc39f0. I added some logic if there are more runs found in the data than the metadata. It will use metadata from the 1st run and channels to populate metadata in other runs.

@kkappler
Copy link
Collaborator Author

kkappler commented Aug 24, 2023

@kujaku11 almost, but the both the first and second runs are named 001

image

Almost certain we just need to pop the "id" key from metadata dict before overwriting -- working on fix now

kkappler added a commit that referenced this issue Aug 25, 2023
@kujaku11
Copy link
Owner

Fixed with update of FDSN

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Filters Functionality dealing with how filters are used, applied, and described
Projects
None yet
Development

No branches or pull requests

3 participants