Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAS04 Dataset reconciliation #31

Closed
kkappler opened this issue Jul 9, 2021 · 20 comments
Closed

CAS04 Dataset reconciliation #31

kkappler opened this issue Jul 9, 2021 · 20 comments
Labels
question Further information is requested

Comments

@kkappler
Copy link
Collaborator

kkappler commented Jul 9, 2021

The CAS04 dataset processes with aurora but differs from the archived results in SPUD by an factor of ~1e-12. Specifically, the aurora TFs are much smaller than those on SPUD.

A plausible source of error here is if the Electric field data were interpretted as mV/km , but were actually V/m, then the E field spectra would be too large by 1e-6. Because apparent resistivity is proportional to 1/|E|^2 this would make the TF differ by 1e-12.

The data were generated by this python file
https://github.com/kujaku11/mth5/blob/master/examples/make_mth5_from_iris_dmc_local.py

And the results pulled from the web were accessed here:
http://ds.iris.edu/spud/emtf/18633652

ToDo: Review the metadata units for all stages of the filters. This is a NIMS long period system.

kkappler added a commit that referenced this issue Sep 30, 2021
-Moved matlab_z_file_reader into a a specific folder for working with
matlab z-files
-updated z-file reader to give the ability to trim N periods off the
z-file
-modified z-file writer to take an optional argument for orientation
strings
-fixed a bug in the z-file writer - removed np.flip() on data array
which was a hackaround, no longer needed because aurora uses period
preferentially to frequency when working with TFs.

[issue #31, among others]
@kkappler
Copy link
Collaborator Author

kkappler commented Oct 2, 2021

created a script called issue_31.py in aurora/tests/earthscope/cas04/ which dumps the metadata as it provided in the mth5 that is made by:
mth5/examples/make_mth5_from_iris_dmc.py

START SUMMARY FOR CHANNEL ex
Total number of filters = 6

FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ex



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL ey
Total number of filters = 6

FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ey



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hx
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hx time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hx



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hy
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hy time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hy



<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hz
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hz time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hz

This is a little bit confusing. The scale factors may be correct in their product but they are not correct in their individual values.

Requests and recommendations:

  1. When the XML are created, order the filter stages along the data flow.
    Transducer --> Analog Filters --> A2D Converter --> digitial filters.

This would change, for example ex to

FILTER# 0 [was 2]          mV/km --> V/m                    Gain:1e-06
FILTER# 1 [was 3]            V/m --> V                      Gain:92.0  
FILTER# 2 [was 0]              V --> V                      Gain:1.0
FILTER# 3 [was 1]              V --> V                      Gain:1.0
FILTER# 4 [was 4]              V --> count                  Gain:484733700000000.0
FILTER# 5 [was 5]          count --> count                  Gain:1.0

from:

FILTER# 0 electric field 5 pole butterworth low-pass     mV/km --> mV/km,    Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass    mV/km --> mV/km,    Gain:1.0
FILTER# 2 mv per km to v per m                           mV/km --> V/m,      Gain:1e-06
FILTER# 3 v per m to v                                     V/m --> V,        Gain:92.0
FILTER# 4 v to counts (electric)                             V --> count,    Gain:484733700000000.0
FILTER# 5 electric time offset                           count --> count,    Gain:1.0

and hx is ok how it is:

FILTER# 0 magnetic field 3 pole butterworth low-pass         nT --> V,     Gain:1.0
FILTER# 1 v to counts (magnetic)                              V --> count, Gain:100.0
FILTER# 2 hx time offset                                  count --> count, Gain:1.0
  1. Use SI units in the metadata, let the units be converted in the processing

  2. Counts to volts for electric are not realistic, possibly this system is already reporting in mV/km? and the additional 1e-6 is why I have a scale factor error?

  3. Counts to volts for electric and magnetic should nominally be the same

@kkappler
Copy link
Collaborator Author

kkappler commented Oct 2, 2021

This requires a review with the team, and relates to issue #36

The way that the scale factors are setup in SEED is that the data in counts are divided by the product of the stages. Compare the CAS04 metadata with that of SAO

Ctrl+F LQ2

In that example the total sensitivity is -9.96867550678789E8 ~= 1e9

This station uses the dipole length (236m) as an analog gain in stage 1, together with the PZ response of the EFSC front end analog filter. Stage 2 is the 10x gain on the EFSC. Stage 3 is volts to counts, with a gain of 427135. i.e. ~4e5 counts per volt.

Working back from data in counts we have the following logic:

Say you read 400000 counts. This corresponds to 1 "Gained Volt" after the amplifier, which is due to 0.1 "True Volt" before the amplifier which is around 0.0004 V/m, which is ~420mV/km. A plausible field to observe at SAO during a major storm.

In the case of CAS04 the counts to Volts is ~1.72 x 2^48, whereas at SAO it is around 2^19. It doesn't seem plausible that there are enough bits on the digitizer to allow for a 2^48 counts per volt.

It would be helpful to have the config files from the SPUD data processing run. Are these available?

@kkappler kkappler added invalid This doesn't seem right question Further information is requested labels Oct 2, 2021
@kkappler
Copy link
Collaborator Author

@akelbert I am going to put a summary of what we discussed this morning and a request for follow up info in later this morning -- testing comm

@akelbert
Copy link

@kkappler I'm attaching the configuration files that we used for the RR CAS04 processing (NASA-MTA20 survey). Also attaching the four CAS04 SP files with an added *.txt extension - otherwise GitHub won't attach.

CF_for_sharing.zip
CAS04a.sp.txt
CAS04b.sp.txt
CAS04c.sp.txt
CAS04d.sp.txt

@kkappler
Copy link
Collaborator Author

Thanks @akelbert - these are a great start and I can use these to check the units. Are there the other processing parameters by any chance? Specifically:

  • pwset.cfg #prewhitening settings
  • band_setup.cfg  # frequency bands
  • decset.cfg # decimation specification
  • options.cfg or options_RR.cfg # has setting about RR station, coherence sorting parameters etc.
  • tranmt.cfg (or similar) # config file pointing to options, input and output filessystem

I think these are all of them, but I don't see where the apodization window is defined ... maybe I am missing a file?  In any case, it would be ideal if we could group together all the settings used in the processing that generated the SPUD data.

@akelbert
Copy link

Hi @kkappler, these settings are all in the zip archive above. The *.cfg files cannot be attached to issues in GitHub, extension not recognized. The zip has them all.

@akelbert
Copy link

@kkappler on "counts only in IRIS", see the FDSN SEED manual, http://www.fdsn.org/pdf/SEEDManual_V2.4.pdf Blockette 1000 (miniSEED - Data Only) on p. 123. As far as I'm aware, that's the unique data archiving format allowed at IRIS.

@kkappler
Copy link
Collaborator Author

@akelbert The zip archive is empty. Maybe because the files inside need to be "tricked" into being .txt as well?

@akelbert
Copy link

@kkappler trying again; if that still doesn't work I'll rename them all to txt files.

CF_for_sharing.zip

@kkappler
Copy link
Collaborator Author

kkappler commented Nov 1, 2021

I am going to add a few comments to describe the differences I see between the mt_metadata and the files provided by Anna. Here is a description of the metadata that I am interacting with sourced from the current station_xml.

I am using data that is created by the example in mth5 here:

once that file is created, filling in filepath with your local path in the example code below allows interaction with the filters.

from pathlib import Path
from mth5 import mth5

filepath = Path("path/to/file.h5")
mth5_obj = mth5.MTH5()
mth5_obj.open_mth5(filepath, mode="r")
cas04 = mth5_obj.stations_group.get_station("CAS04")
run_a=cas04.get_run('a')
ex = run_a.get_channel("ex")
ex.channel_response_filter.filters_list

makes the following output

[{
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "butterworth filter",
         "gain": 1.0,
         "name": "electric field 5 pole butterworth low-pass",
         "normalization_factor": 313383.601119193,
         "poles": [
             "(-3.883009+11.951875j)",
             "(-3.883009-11.951875j)",
             "(-10.166194+7.386513j)",
             "(-10.166194-7.386513j)",
             "(-12.566371+0j)"
         ],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "mV/km",
         "zeros": []
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "butterworth filter",
         "gain": 1.0,
         "name": "electric field 1 pole butterworth high-pass",
         "normalization_factor": 1.00000351809047,
         "poles": [
             "(-0.000167+0j)"
         ],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "mV/km",
         "zeros": [
             "0j"
         ]
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "unit conversion",
         "gain": 1e-06,
         "name": "mv per km to v per m",
         "normalization_factor": 1.00000351809047,
         "poles": [],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "V/m",
         "zeros": []
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "electric dipole",
         "gain": 94.0,
         "name": "v per m to v",
         "normalization_factor": 1.00000351809047,
         "poles": [],
         "type": "zpk",
         "units_in": "V/m",
         "units_out": "V",
         "zeros": []
     }
 },
 {
     "coefficient_filter": {
         "calibration_date": "1980-01-01",
         "comments": "analog to digital conversion",
         "gain": 484733700000000.0,
         "name": "v to counts (electric)",
         "type": "coefficient",
         "units_in": "V",
         "units_out": "count"
     }
 },
 {
     "time_delay_filter": {
         "calibration_date": "1980-01-01",
         "comments": "time correction",
         "delay": -0.285,
         "gain": 1.0,
         "name": "electric time offset",
         "type": "time delay",
         "units_in": "count",
         "units_out": "count"
     }
 }]

@akelbert
Copy link

akelbert commented Nov 2, 2021

@kkappler @kujaku11 so I'm still confused on our resolution on the gain factor question. I combed through my USGS email over the years, and at least since 2015 there was no mention of the factor 4.847337E14. Prior to the latest archiving where this factor was introduced as the Volts to counts conversion, the gain for this conversion was always set to 4.09601E8 (makes perfect sense to me). This change followed a lengthy discussion (I believe) a suggestion from Tim Ronan - better check with him! After looking through all the documents, I am inclined to go back to the 4.09601E8 for this re-archiving. Also, prior to the year 2020, we've always archived the electrics in V/m, e.g., http://service.iris.edu/fdsnws/station/1/query?net=EM&sta=IDA11&cha=*&level=response. Now, we've made the change to mV/km, as in http://service.iris.edu/fdsnws/station/1/query?net=ZU&sta=KSR28&cha=*&level=response. This, too, followed a lengthy discussion with everyone, including Gary, why said that the only reason this wasn't done in the past was that IRIS didn't support non-SI units at the time this archiving was initially setup.

The only info I could find in the archiving docs follows below (also, attached):

Gain for stage 1 for magnetic channels (Butterworth 3 poles low-pass) is calculated as 1/hfac*1e9, where hfac is taken from .hed file, at zero frequency (conversion from Tesla to digital counts). Typical value of hfac is 0.01, then gain will be 1e11, as shown in the table below. Sensitivity is a product of gains over entire channel at zero frequency, that is also 1/hfac109.
Gain for stage 1 for electric channels is equal to the dipole length (m), coming from .hed files (100m in the example below) and corrected for normalization frequency 0.01 (i.e. multiplied by A0/A1, where A1 calculated at frequency=0.01Hz). Gain for stage 2 for magnetic channels (low-pass) is calculated as 1/efac103, where efac is taken from *.hed file, and multiplied by A0/A1, where A1 calculated at frequency=0.01Hz (conversion from Volts to digital counts). For example, typical value for MT1 NIMS efac is 2.441406e-6, thus typical value for gain will be 4.0960e+08, multiplied by A0/A1 (~1). Sensitivity is again calculated as a product of gains over entire channel at 0.01Hz frequency.

OSU_Data_Archiving_2006.pdf

At this point, for my re-archiving I'd like to switch back to that original gain factor which makes sense to me - the new one doesn't. Would that fix the discrepancy that you're seeing? I also suggest that we stick with the practical units, but OK with switching back to SI units if that's for some reason easier. OK with moving the dipole length multiplication upfront to make it Stage 1. At any rate, let us set up a telecon between the three of us, at least, but better - with Gary, and finalize these details so that I can complete the archiving and move on. I really have to at this time.

@kkappler
Copy link
Collaborator Author

Updated XML no longer have very large count to volt conversion. Still need to check that dipole length is correct (it should be 92).

Testing against the archived result on SPUD is a bit complicated by the fact that the SPUD results are .zmm (not .zrr).

Anna will run a .zrr and share

  1. The .zrr along with the processing configuration from EMTF,
  2. The updated station_xml with 92m dipole

@akelbert
Copy link

@kkappler here are three decent remote reference options for CAS04 and the corrected/updated StationXML for this site. Please check against Aurora!

CF_for_sharing.zip
.
CAS04_stationxml.xml.zip
CAS04_zrr.zip

Screen Shot 2021-11-22 at 10 43 23 AM

@kkappler
Copy link
Collaborator Author

kkappler commented Dec 1, 2021

@akelbert is it possible to generate .zrr for non-merged runs...
specifically these examples all use b,c,d together. Can you please create one with only run b (or only run c or only run d or all of the above?) It isn't important right now which of them, just as long as I know which one it is. This will allow me to run the tests without merging runs which is still being built out.

@kkappler
Copy link
Collaborator Author

kkappler commented Dec 3, 2021

@akelbert : Also, can I get the station.xml for the remote reference station as well? That way I can pull the reference data based on the listed runs available in the reference station
...
I suppose that if you just ran single station processing on b,c, or d we could circumvent this for now

@timronan
Copy link
Collaborator

timronan commented Dec 4, 2021

@akelbert CAS04_stationxml.xml.zip does not pass validation.

211,Error,8P,CAS04,,,2020-06-02T18:41:43,2020-07-13T21:46:12,Chan: LQE Loc: 2020-07-01T19:36:55 2020-07-13T21:46:12 epoch overlaps with LQE 2020-07-01T19:36:55 2020-07-13T21:46:12

8P.CAS04.LQE 2020-06-02T18:41:43 2020-07-13T21:46:12 is duplicated in CAS04_stationxml.xml.zip . The attached zip file has the updated stationxml file attached.

Also, this stationxml has a different network code than the CAS04 station in the IRIS archive. The archived version of CAS04 is ZU.CAS04 while the stationxml is labeled as 8P.CAS04. This leads to problems when trying extracting time series from the the IRIS archive unless this Network naming conflict is manually resolved.

The attached metadata does not resolve this Network name conflict because I think that ZU may be renamed to 8P but it is important to note this Network naming conflict so we don't accidentally trip over this inconsistency in the future.

CAS04_stationxml_updated.xml.zip

Karl says: This file is corrupt but a verison of it is committed in aurora/tests/cas04/cas04_from_tim_20211203.xml

kkappler added a commit that referenced this issue Jan 16, 2022
Outline of workflow from xml to mth5 started

[Issue(s): #31]
kkappler added a commit that referenced this issue Feb 26, 2022
Modified process_mth5 by creating process_mth5_runs, nearly a copy of process_mth5_run.
This method will take lists of runs to process and eventually supercede process_mth5_run.

[Issue(s): #31, #118, #80]
kkappler added a commit that referenced this issue Mar 4, 2022
Modified process_mth5 to loop over runs and create merged FC object.
Testing ok on decimation level zero.  Now need to add looping over
decimation levels.

[Issue(s): #31, #118, #80]
@kkappler
Copy link
Collaborator Author

kkappler commented Jun 7, 2022

@kujaku11 : Revisiting this dataset in the context of merging runs. The currently archived version is missing filters for electric channels. In particular:

  • channel ex has identical filters for runs a,b,c but no filters at all for run d.
  • channel ey has identical filters for runs a,b but no filters at all for runs c,d.
  • channels hx, hy have all identical filters for all runs a,b,c,d and pass a cursory inspection

If you want to build the mth5 locally to inspect it, this can be done using the new operate_aurora.ipynb in docs/notebooks on main branch. Can you PTAL and see if an updated metadata file with consistent channels can be passed to Tim or Laura for archiving?

In a pinch I will try just overwriting the electric field filters from run "a" onto "c" and "d" in operate_aurora.ipynb and see if I can workaround that way.

kkappler added a commit that referenced this issue Jun 18, 2022
Using the updated method in mth5 locally (see mth5 issue #105),
am now able to process runs c and d for CAS04 as single station.

Working on getting a similar h5 built in tests/cas04

[Issue(s): #31, #80]
@kkappler
Copy link
Collaborator Author

kkappler commented Jul 9, 2022

The specific issue with the metadata (cited above on June 6, 2022) is still under investigation by @laura-iris , and she will be looking at the make_mth5 logic this month. In the meantime, a variation to make_mth5 has been applied on the fix_issue_105 branch of mth5. Details are at https://github.com/kujaku11/mth5/issues/105 & https://github.com/kujaku11/mth5/pull/106.

With this fix in place I am now able to generate mth5 files with all the stations ["CAS04", "CAV07", "NVR11", "REV06"]
and am starting to get remote reference processed results. I will report these results on this issue as they are generated.

kkappler added a commit that referenced this issue Jul 9, 2022
Confirm that 01_make_cas04_mth5.py makes the big 4-station file
Removed other stuff into an ipynb for show and tell

[Issue(s): #31]
@kkappler
Copy link
Collaborator Author

Pasted below are comparisons of the z-files returned by aurora and emtf.
They agree reasonably well at low frequencies. At high frequencies, we know that coherence sorting was used by emtf, and this feature is not yet available in aurora.

Note that NVR11 was processed using a different band setup file due to issue #196

CAS04_RRCAV07
CAS04_RRNVR11
CAS04_RRREV06

@kkappler kkappler removed the invalid This doesn't seem right label Jul 24, 2022
@kkappler
Copy link
Collaborator Author

Closing this issue for now. We may chose to re-open when we address coherence sorting module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants