CAS04 Dataset reconciliation #31

kkappler · 2021-07-09T22:24:29Z

The CAS04 dataset processes with aurora but differs from the archived results in SPUD by an factor of ~1e-12. Specifically, the aurora TFs are much smaller than those on SPUD.

A plausible source of error here is if the Electric field data were interpretted as mV/km , but were actually V/m, then the E field spectra would be too large by 1e-6. Because apparent resistivity is proportional to 1/|E|^2 this would make the TF differ by 1e-12.

The data were generated by this python file
https://github.com/kujaku11/mth5/blob/master/examples/make_mth5_from_iris_dmc_local.py

And the results pulled from the web were accessed here:
http://ds.iris.edu/spud/emtf/18633652

ToDo: Review the metadata units for all stages of the filters. This is a NIMS long period system.

-Moved matlab_z_file_reader into a a specific folder for working with matlab z-files -updated z-file reader to give the ability to trim N periods off the z-file -modified z-file writer to take an optional argument for orientation strings -fixed a bug in the z-file writer - removed np.flip() on data array which was a hackaround, no longer needed because aurora uses period preferentially to frequency when working with TFs. [issue #31, among others]

kkappler · 2021-10-02T21:33:59Z

created a script called issue_31.py in aurora/tests/earthscope/cas04/ which dumps the metadata as it provided in the mth5 that is made by:
mth5/examples/make_mth5_from_iris_dmc.py

START SUMMARY FOR CHANNEL ex
Total number of filters = 6

FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ex



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL ey
Total number of filters = 6

FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ey



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hx
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hx time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hx



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hy
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hy time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hy



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hz
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hz time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hz

This is a little bit confusing. The scale factors may be correct in their product but they are not correct in their individual values.

Requests and recommendations:

When the XML are created, order the filter stages along the data flow.
Transducer --> Analog Filters --> A2D Converter --> digitial filters.

This would change, for example ex to

FILTER# 0 [was 2]          mV/km --> V/m                    Gain:1e-06
FILTER# 1 [was 3]            V/m --> V                      Gain:92.0  
FILTER# 2 [was 0]              V --> V                      Gain:1.0
FILTER# 3 [was 1]              V --> V                      Gain:1.0
FILTER# 4 [was 4]              V --> count                  Gain:484733700000000.0
FILTER# 5 [was 5]          count --> count                  Gain:1.0

from:

FILTER# 0 electric field 5 pole butterworth low-pass     mV/km --> mV/km,    Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass    mV/km --> mV/km,    Gain:1.0
FILTER# 2 mv per km to v per m                           mV/km --> V/m,      Gain:1e-06
FILTER# 3 v per m to v                                     V/m --> V,        Gain:92.0
FILTER# 4 v to counts (electric)                             V --> count,    Gain:484733700000000.0
FILTER# 5 electric time offset                           count --> count,    Gain:1.0

and hx is ok how it is:

FILTER# 0 magnetic field 3 pole butterworth low-pass         nT --> V,     Gain:1.0
FILTER# 1 v to counts (magnetic)                              V --> count, Gain:100.0
FILTER# 2 hx time offset                                  count --> count, Gain:1.0

Use SI units in the metadata, let the units be converted in the processing
Counts to volts for electric are not realistic, possibly this system is already reporting in mV/km? and the additional 1e-6 is why I have a scale factor error?
Counts to volts for electric and magnetic should nominally be the same

kkappler · 2021-10-02T21:36:38Z

This requires a review with the team, and relates to issue #36

The way that the scale factors are setup in SEED is that the data in counts are divided by the product of the stages. Compare the CAS04 metadata with that of SAO

Ctrl+F LQ2

In that example the total sensitivity is -9.96867550678789E8 ~= 1e9

This station uses the dipole length (236m) as an analog gain in stage 1, together with the PZ response of the EFSC front end analog filter. Stage 2 is the 10x gain on the EFSC. Stage 3 is volts to counts, with a gain of 427135. i.e. ~4e5 counts per volt.

Working back from data in counts we have the following logic:

Say you read 400000 counts. This corresponds to 1 "Gained Volt" after the amplifier, which is due to 0.1 "True Volt" before the amplifier which is around 0.0004 V/m, which is ~420mV/km. A plausible field to observe at SAO during a major storm.

In the case of CAS04 the counts to Volts is ~1.72 x 2^48, whereas at SAO it is around 2^19. It doesn't seem plausible that there are enough bits on the digitizer to allow for a 2^48 counts per volt.

It would be helpful to have the config files from the SPUD data processing run. Are these available?

kkappler · 2021-10-28T16:29:27Z

@akelbert I am going to put a summary of what we discussed this morning and a request for follow up info in later this morning -- testing comm

akelbert · 2021-10-28T17:11:23Z

@kkappler I'm attaching the configuration files that we used for the RR CAS04 processing (NASA-MTA20 survey). Also attaching the four CAS04 SP files with an added *.txt extension - otherwise GitHub won't attach.

CF_for_sharing.zip
CAS04a.sp.txt
CAS04b.sp.txt
CAS04c.sp.txt
CAS04d.sp.txt

kkappler · 2021-10-28T18:13:02Z

Thanks @akelbert - these are a great start and I can use these to check the units. Are there the other processing parameters by any chance? Specifically:

pwset.cfg #prewhitening settings
band_setup.cfg # frequency bands
decset.cfg # decimation specification
options.cfg or options_RR.cfg # has setting about RR station, coherence sorting parameters etc.
tranmt.cfg (or similar) # config file pointing to options, input and output filessystem

I think these are all of them, but I don't see where the apodization window is defined ... maybe I am missing a file? In any case, it would be ideal if we could group together all the settings used in the processing that generated the SPUD data.

akelbert · 2021-10-28T18:17:32Z

Hi @kkappler, these settings are all in the zip archive above. The *.cfg files cannot be attached to issues in GitHub, extension not recognized. The zip has them all.

akelbert · 2021-10-28T18:31:51Z

@kkappler on "counts only in IRIS", see the FDSN SEED manual, http://www.fdsn.org/pdf/SEEDManual_V2.4.pdf Blockette 1000 (miniSEED - Data Only) on p. 123. As far as I'm aware, that's the unique data archiving format allowed at IRIS.

kkappler · 2021-10-28T18:38:23Z

@akelbert The zip archive is empty. Maybe because the files inside need to be "tricked" into being .txt as well?

akelbert · 2021-10-28T18:51:59Z

@kkappler trying again; if that still doesn't work I'll rename them all to txt files.

CF_for_sharing.zip

kkappler · 2021-11-01T17:17:06Z

I am going to add a few comments to describe the differences I see between the mt_metadata and the files provided by Anna. Here is a description of the metadata that I am interacting with sourced from the current station_xml.

I am using data that is created by the example in mth5 here:

once that file is created, filling in filepath with your local path in the example code below allows interaction with the filters.

from pathlib import Path
from mth5 import mth5

filepath = Path("path/to/file.h5")
mth5_obj = mth5.MTH5()
mth5_obj.open_mth5(filepath, mode="r")
cas04 = mth5_obj.stations_group.get_station("CAS04")
run_a=cas04.get_run('a')
ex = run_a.get_channel("ex")
ex.channel_response_filter.filters_list

makes the following output

[{
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "butterworth filter",
         "gain": 1.0,
         "name": "electric field 5 pole butterworth low-pass",
         "normalization_factor": 313383.601119193,
         "poles": [
             "(-3.883009+11.951875j)",
             "(-3.883009-11.951875j)",
             "(-10.166194+7.386513j)",
             "(-10.166194-7.386513j)",
             "(-12.566371+0j)"
         ],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "mV/km",
         "zeros": []
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "butterworth filter",
         "gain": 1.0,
         "name": "electric field 1 pole butterworth high-pass",
         "normalization_factor": 1.00000351809047,
         "poles": [
             "(-0.000167+0j)"
         ],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "mV/km",
         "zeros": [
             "0j"
         ]
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "unit conversion",
         "gain": 1e-06,
         "name": "mv per km to v per m",
         "normalization_factor": 1.00000351809047,
         "poles": [],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "V/m",
         "zeros": []
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "electric dipole",
         "gain": 94.0,
         "name": "v per m to v",
         "normalization_factor": 1.00000351809047,
         "poles": [],
         "type": "zpk",
         "units_in": "V/m",
         "units_out": "V",
         "zeros": []
     }
 },
 {
     "coefficient_filter": {
         "calibration_date": "1980-01-01",
         "comments": "analog to digital conversion",
         "gain": 484733700000000.0,
         "name": "v to counts (electric)",
         "type": "coefficient",
         "units_in": "V",
         "units_out": "count"
     }
 },
 {
     "time_delay_filter": {
         "calibration_date": "1980-01-01",
         "comments": "time correction",
         "delay": -0.285,
         "gain": 1.0,
         "name": "electric time offset",
         "type": "time delay",
         "units_in": "count",
         "units_out": "count"
     }
 }]

akelbert · 2021-11-02T18:34:49Z

@kkappler @kujaku11 so I'm still confused on our resolution on the gain factor question. I combed through my USGS email over the years, and at least since 2015 there was no mention of the factor 4.847337E14. Prior to the latest archiving where this factor was introduced as the Volts to counts conversion, the gain for this conversion was always set to 4.09601E8 (makes perfect sense to me). This change followed a lengthy discussion (I believe) a suggestion from Tim Ronan - better check with him! After looking through all the documents, I am inclined to go back to the 4.09601E8 for this re-archiving. Also, prior to the year 2020, we've always archived the electrics in V/m, e.g., http://service.iris.edu/fdsnws/station/1/query?net=EM&sta=IDA11&cha=*&level=response. Now, we've made the change to mV/km, as in http://service.iris.edu/fdsnws/station/1/query?net=ZU&sta=KSR28&cha=*&level=response. This, too, followed a lengthy discussion with everyone, including Gary, why said that the only reason this wasn't done in the past was that IRIS didn't support non-SI units at the time this archiving was initially setup.

The only info I could find in the archiving docs follows below (also, attached):

Gain for stage 1 for magnetic channels (Butterworth 3 poles low-pass) is calculated as 1/hfac*1e9, where hfac is taken from .hed file, at zero frequency (conversion from Tesla to digital counts). Typical value of hfac is 0.01, then gain will be 1e11, as shown in the table below. Sensitivity is a product of gains over entire channel at zero frequency, that is also 1/hfac109.
Gain for stage 1 for electric channels is equal to the dipole length (m), coming from .hed files (100m in the example below) and corrected for normalization frequency 0.01 (i.e. multiplied by A0/A1, where A1 calculated at frequency=0.01Hz). Gain for stage 2 for magnetic channels (low-pass) is calculated as 1/efac103, where efac is taken from *.hed file, and multiplied by A0/A1, where A1 calculated at frequency=0.01Hz (conversion from Volts to digital counts). For example, typical value for MT1 NIMS efac is 2.441406e-6, thus typical value for gain will be 4.0960e+08, multiplied by A0/A1 (~1). Sensitivity is again calculated as a product of gains over entire channel at 0.01Hz frequency.

OSU_Data_Archiving_2006.pdf

At this point, for my re-archiving I'd like to switch back to that original gain factor which makes sense to me - the new one doesn't. Would that fix the discrepancy that you're seeing? I also suggest that we stick with the practical units, but OK with switching back to SI units if that's for some reason easier. OK with moving the dipole length multiplication upfront to make it Stage 1. At any rate, let us set up a telecon between the three of us, at least, but better - with Gary, and finalize these details so that I can complete the archiving and move on. I really have to at this time.

kkappler · 2021-11-18T22:01:21Z

Updated XML no longer have very large count to volt conversion. Still need to check that dipole length is correct (it should be 92).

Testing against the archived result on SPUD is a bit complicated by the fact that the SPUD results are .zmm (not .zrr).

Anna will run a .zrr and share

The .zrr along with the processing configuration from EMTF,
The updated station_xml with 92m dipole

akelbert · 2021-11-22T17:50:22Z

@kkappler here are three decent remote reference options for CAS04 and the corrected/updated StationXML for this site. Please check against Aurora!

CF_for_sharing.zip
.
CAS04_stationxml.xml.zip
CAS04_zrr.zip

kkappler · 2021-12-01T18:21:13Z

@akelbert is it possible to generate .zrr for non-merged runs...
specifically these examples all use b,c,d together. Can you please create one with only run b (or only run c or only run d or all of the above?) It isn't important right now which of them, just as long as I know which one it is. This will allow me to run the tests without merging runs which is still being built out.

kkappler · 2021-12-03T02:11:17Z

@akelbert : Also, can I get the station.xml for the remote reference station as well? That way I can pull the reference data based on the listed runs available in the reference station
...
I suppose that if you just ran single station processing on b,c, or d we could circumvent this for now

timronan · 2021-12-04T00:54:05Z

@akelbert CAS04_stationxml.xml.zip does not pass validation.

211,Error,8P,CAS04,,,2020-06-02T18:41:43,2020-07-13T21:46:12,Chan: LQE Loc: 2020-07-01T19:36:55 2020-07-13T21:46:12 epoch overlaps with LQE 2020-07-01T19:36:55 2020-07-13T21:46:12

8P.CAS04.LQE 2020-06-02T18:41:43 2020-07-13T21:46:12 is duplicated in CAS04_stationxml.xml.zip . The attached zip file has the updated stationxml file attached.

Also, this stationxml has a different network code than the CAS04 station in the IRIS archive. The archived version of CAS04 is ZU.CAS04 while the stationxml is labeled as 8P.CAS04. This leads to problems when trying extracting time series from the the IRIS archive unless this Network naming conflict is manually resolved.

The attached metadata does not resolve this Network name conflict because I think that ZU may be renamed to 8P but it is important to note this Network naming conflict so we don't accidentally trip over this inconsistency in the future.

CAS04_stationxml_updated.xml.zip

Karl says: This file is corrupt but a verison of it is committed in aurora/tests/cas04/cas04_from_tim_20211203.xml

Outline of workflow from xml to mth5 started [Issue(s): #31]

Modified process_mth5 by creating process_mth5_runs, nearly a copy of process_mth5_run. This method will take lists of runs to process and eventually supercede process_mth5_run. [Issue(s): #31, #118, #80]

Modified process_mth5 to loop over runs and create merged FC object. Testing ok on decimation level zero. Now need to add looping over decimation levels. [Issue(s): #31, #118, #80]

kkappler · 2022-06-07T03:32:39Z

@kujaku11 : Revisiting this dataset in the context of merging runs. The currently archived version is missing filters for electric channels. In particular:

channel ex has identical filters for runs a,b,c but no filters at all for run d.
channel ey has identical filters for runs a,b but no filters at all for runs c,d.
channels hx, hy have all identical filters for all runs a,b,c,d and pass a cursory inspection

If you want to build the mth5 locally to inspect it, this can be done using the new operate_aurora.ipynb in docs/notebooks on main branch. Can you PTAL and see if an updated metadata file with consistent channels can be passed to Tim or Laura for archiving?

In a pinch I will try just overwriting the electric field filters from run "a" onto "c" and "d" in operate_aurora.ipynb and see if I can workaround that way.

Using the updated method in mth5 locally (see mth5 issue #105), am now able to process runs c and d for CAS04 as single station. Working on getting a similar h5 built in tests/cas04 [Issue(s): #31, #80]

kkappler · 2022-07-09T18:39:05Z

The specific issue with the metadata (cited above on June 6, 2022) is still under investigation by @laura-iris , and she will be looking at the make_mth5 logic this month. In the meantime, a variation to make_mth5 has been applied on the fix_issue_105 branch of mth5. Details are at https://github.com/kujaku11/mth5/issues/105 & https://github.com/kujaku11/mth5/pull/106.

With this fix in place I am now able to generate mth5 files with all the stations ["CAS04", "CAV07", "NVR11", "REV06"]
and am starting to get remote reference processed results. I will report these results on this issue as they are generated.

Confirm that 01_make_cas04_mth5.py makes the big 4-station file Removed other stuff into an ipynb for show and tell [Issue(s): #31]

kkappler · 2022-07-23T22:53:18Z

Pasted below are comparisons of the z-files returned by aurora and emtf.
They agree reasonably well at low frequencies. At high frequencies, we know that coherence sorting was used by emtf, and this feature is not yet available in aurora.

Note that NVR11 was processed using a different band setup file due to issue #196

kkappler · 2022-07-24T16:23:20Z

Closing this issue for now. We may chose to re-open when we address coherence sorting module.

kkappler added invalid This doesn't seem right question Further information is requested labels Oct 2, 2021

kkappler mentioned this issue Oct 2, 2021

Add Support for Units #36

Closed

kkappler added a commit that referenced this issue Jan 16, 2022

Adding cas04 workflow

6d77a60

Outline of workflow from xml to mth5 started [Issue(s): #31]

kkappler added a commit that referenced this issue Mar 4, 2022

Towards multiple runs:

0746299

Modified process_mth5 to loop over runs and create merged FC object. Testing ok on decimation level zero. Now need to add looping over decimation levels. [Issue(s): #31, #118, #80]

kkappler mentioned this issue Mar 8, 2022

Processing Dataset Definition #118

Closed

kkappler mentioned this issue Jun 7, 2022

Merged Runs in Frequency Domain #184

Merged

20 tasks

kkappler mentioned this issue Jul 8, 2022

NIMS jupyter notebook for training #193

Closed

8 tasks

kkappler added a commit that referenced this issue Jul 9, 2022

Move some mth5 make out of cas04 and into ipynb

36783f4

Confirm that 01_make_cas04_mth5.py makes the big 4-station file Removed other stuff into an ipynb for show and tell [Issue(s): #31]

kkappler removed the invalid This doesn't seem right label Jul 24, 2022

kkappler closed this as completed Jul 24, 2022

kkappler added a commit that referenced this issue Jul 22, 2023

add USGS zfile (see Issue #31)

3422ecf

kkappler mentioned this issue Jul 22, 2023

Filters not included for all channels kujaku11/mth5#157

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CAS04 Dataset reconciliation #31

CAS04 Dataset reconciliation #31

kkappler commented Jul 9, 2021 •

edited

Loading

kkappler commented Oct 2, 2021 •

edited

Loading

kkappler commented Oct 2, 2021 •

edited

Loading

kkappler commented Oct 28, 2021

akelbert commented Oct 28, 2021

kkappler commented Oct 28, 2021

akelbert commented Oct 28, 2021

akelbert commented Oct 28, 2021

kkappler commented Oct 28, 2021

akelbert commented Oct 28, 2021

kkappler commented Nov 1, 2021

akelbert commented Nov 2, 2021

kkappler commented Nov 18, 2021

akelbert commented Nov 22, 2021

kkappler commented Dec 1, 2021

kkappler commented Dec 3, 2021 •

edited

Loading

timronan commented Dec 4, 2021 •

edited by kkappler

Loading

kkappler commented Jun 7, 2022 •

edited

Loading

kkappler commented Jul 9, 2022 •

edited

Loading

kkappler commented Jul 23, 2022

kkappler commented Jul 24, 2022

CAS04 Dataset reconciliation #31

CAS04 Dataset reconciliation #31

Comments

kkappler commented Jul 9, 2021 • edited Loading

kkappler commented Oct 2, 2021 • edited Loading

kkappler commented Oct 2, 2021 • edited Loading

kkappler commented Oct 28, 2021

akelbert commented Oct 28, 2021

kkappler commented Oct 28, 2021

akelbert commented Oct 28, 2021

akelbert commented Oct 28, 2021

kkappler commented Oct 28, 2021

akelbert commented Oct 28, 2021

kkappler commented Nov 1, 2021

akelbert commented Nov 2, 2021

kkappler commented Nov 18, 2021

akelbert commented Nov 22, 2021

kkappler commented Dec 1, 2021

kkappler commented Dec 3, 2021 • edited Loading

timronan commented Dec 4, 2021 • edited by kkappler Loading

kkappler commented Jun 7, 2022 • edited Loading

kkappler commented Jul 9, 2022 • edited Loading

kkappler commented Jul 23, 2022

kkappler commented Jul 24, 2022

kkappler commented Jul 9, 2021 •

edited

Loading

kkappler commented Oct 2, 2021 •

edited

Loading

kkappler commented Oct 2, 2021 •

edited

Loading

kkappler commented Dec 3, 2021 •

edited

Loading

timronan commented Dec 4, 2021 •

edited by kkappler

Loading

kkappler commented Jun 7, 2022 •

edited

Loading

kkappler commented Jul 9, 2022 •

edited

Loading