dataless mth5 from fdsn #151
@kujaku11 After re-running with the latest updates here is the summary: Identified 6 exception types
which are getting parsed as bogus station names
and processing Tag: RET11a_RET11_AZV19_TXD27_TXD25_AZU19-RET11a_U19-RET11a_RET11_AZV19_TXD27_TXD25_AZU19-NVT11a_COS21_RER14_NMX2 so it is probably supposed to be NMX21
Below is pasted a condensed output of error messages.
Re-ran this test on 20230720, using wildcards ["F", "Q"] as channel selection. Results were:
TOTAL #Exceptions 519 of 2007 Cases
However, only two channels are getting saved to the h5, probably something to do with the length of the request df when wildcards are used. I'll look into this.
When wildcards are not used and explicit channel lists are passed, the stats improve slightly:
There were 2007 unique network-station pairs in 2007 rows
{'IndexError': 227, 'ValueError': 18, 'AttributeError': 158, 'TypeError': 1, 'FDSNTimeoutException': 1, 'MTValidatorError': 1}
TOTAL #Exceptions 406 of 2007 Cases
02_exceptions_summary_2023-07-22_082438.txt
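As a quick sanity check on the explicit-channel run, the per-type counts should add up to the reported total. A minimal sketch, with the dictionary copied from the summary above (the variable names are just for illustration):

```python
# Exception counts copied from the explicit-channel run summary.
exception_counts = {
    "IndexError": 227,
    "ValueError": 18,
    "AttributeError": 158,
    "TypeError": 1,
    "FDSNTimeoutException": 1,
    "MTValidatorError": 1,
}
n_cases = 2007  # unique network-station pairs in the run

# Sum the per-type counts and compare against the reported total.
total_exceptions = sum(exception_counts.values())
n_built = n_cases - total_exceptions

print(f"TOTAL #Exceptions {total_exceptions} of {n_cases} Cases")
print(f"Cases without an exception: {n_built}")
```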
@kujaku11 Ignoring wildcards for now, there is a more fundamental problem: the filters in many of the mth5 files are not correct. I added two more columns to the widescale dataless mth5 building summary dataframe. Importantly, the pasted example, for network EM and station ORF08, shows that
Also, in many of the files (including CAS04, which processes in agreement with SPUD TFs) there are 6 stages for E-fields and 3 for magnetics. So there is also a question about why the filters are stored in different ways, but the first and foremost problem is to sort out why the second stage of ey is sometimes being ignored.
*CAS04 looks like: {'ex': 6, 'ey': 6, 'hx': 3, 'hy': 3, 'hz': 3}
The problem with not getting all the filters entered was in:
This function parses the filters from obspy into mt filters.
The method _add_filter_number() was issuing redundant numbers when encountering multiple new, nameless filters. I tried to fix this and created a branch (fix_issue_151).
Also, note that if the filter has a name, we do not assess whether or not it is new... is this OK? I think so, because if it has a name, it will be registered in ch_filter_dict, which is returned by _xml_response_to_mt and then merged into the survey before the next channel is processed. Relates to issue #151
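To illustrate the kind of bug described here, below is a minimal sketch of handing out unique placeholder names to nameless filters. This is not the mt_metadata code: `assign_filter_names` and the `unnamed_filter_N` naming scheme are hypothetical, and the real `_add_filter_number()` may work quite differently. The point is only that the counter has to advance past every name already in use, including names registered while processing earlier channels.

```python
def assign_filter_names(filter_names, existing_names=()):
    """Replace missing names with unique placeholders.

    Hypothetical helper for illustration only; NOT the mt_metadata
    implementation of _add_filter_number().
    """
    used = set(existing_names)  # names already registered (e.g. from earlier channels)
    counter = 0
    result = []
    for name in filter_names:
        if not name:  # None or empty string means "nameless"
            # Advance the counter until the candidate name is unused,
            # so two nameless filters never receive the same number.
            while True:
                counter += 1
                candidate = f"unnamed_filter_{counter}"
                if candidate not in used:
                    break
            name = candidate
        used.add(name)
        result.append(name)
    return result


names = assign_filter_names([None, "dipole", None, ""])
print(names)
```

Passing the already-registered names via `existing_names` is one way to mirror the merge step mentioned above, where the filters of one channel are merged into the survey before the next channel is processed.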
@kkappler |
@kkappler |
@kkappler The |
@kkappler As for the filters, this is a |
Re-ran data-less wide-scale build Sat 19 Aug, 2023. 5 errors. I think 2 are moot, and will confirm, and will debug another this week, leaving 2 errors I don't understand.
===============================================
*** EXCEPTIONS SUMMARY ***
Identified 5 exception types
5 instances of XMLSyntaxError, with 1 unique error(s)
However, rerunning the code the next day, EM_IDK11 seems to have built fine. Weird. Makes me wonder if this is some error during transmission of data, and sometimes a malformed xml is received? I will start tracking which Network-Station pairs give the error, to see if there is a pattern.
['value 361.4 out of bounds (0, 360)' 'value 362.6 out of bounds (0, 360)'
377 instances of NotImplementedError, with 2 unique error(s)
1 instances of TypeError, with 1 unique error(s)
There were 2005 unique network-station pairs in 2005 rows
{'XMLSyntaxError': 5, 'IndexError': 196, 'ValueError': 18, 'NotImplementedError': 377, 'TypeError': 1}
(https://github.com/kujaku11/mt_metadata/files/12386965/02_exceptions_summary_2023-08-19_171631.txt)
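One way to track the intermittent XMLSyntaxError cases noted above is to record the failing network-station pairs for each run and flag the pairs that fail in some runs but not others. A minimal sketch, assuming each run's failures are available as a list of (network, station) tuples; the run labels and the `failure_counts` helper are hypothetical, and the only entry shown is the EM_IDK11 case mentioned above:

```python
from collections import Counter


def failure_counts(runs):
    """Count in how many runs each (network, station) pair failed."""
    counts = Counter()
    for failed_pairs in runs.values():
        # set() guards against a pair being listed twice within one run
        counts.update(set(failed_pairs))
    return counts


# Hypothetical run labels; EM_IDK11 failed in one run and built fine in the next.
runs = {
    "run_1": [("EM", "IDK11")],
    "run_2": [],
}

counts = failure_counts(runs)
# Pairs that failed in at least one run but not in every run
intermittent = sorted(pair for pair, n in counts.items() if n < len(runs))
print(intermittent)
```

Pairs that fail in every run would point to a problem with the served metadata, while pairs that show up only occasionally would be more consistent with a transmission problem.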
Rerunning with latest updates yields:
@kujaku11 can you try parsing the xml for 8P, REU09? (example link) |
The
To handle this, I have created a custom error in the earthscope test_utils called DataAvailabilityError; this is triggered when Laura's data availability tables show no available data. There were 196 (out of 2005) instances of this error. All 196 were checked to give 404 Not Found when the following url was checked
This brings the current set of errors down to:
Which are both understood. The only remaining were:
@kkappler I was able to parse |
@kujaku11 I confirm 8P_REU09.h5 builds. The only outstanding case is now 8P REX11. |
@kkappler Can you confirm that the 8P_REX11 stationxml can be read? I think the issue with building the H5 is that channel LQN only has a tag for run d. I don't know if this is a mistake in the metadata or if LQN was only recorded for run d. This is what the metadata from IRIS says. And here's the data availability:
So it looks like LQN was only recorded for |
@kkappler I was able to build an H5 for REX11. Here's the channel summary. Can you confirm or refute the building of REX11? |
As part of widescale testing on earthscope, one exercise undertaken was to try to create mth5s from the fdsn StationXML served by IRIS/Earthscope.
The iteration was ultimately sourced by stations scraped from SPUD transfer functions.
Station and network codes were extracted directly from the "data" XML files (files accessed from URLs of the form https://ds.iris.edu/spudservice/data/) in all cases where a grep for the string "mda" found a match.
Moreover, any remote stations listed in said files were also added to the iterator.
In all, on a first pass, 2007 independent cases were identified.
When a remote-reference station was accessed, because the TF XML files do not associate a network code with it, it was assumed to be in the same network as the "primary" station (the one associated with the transfer function).
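The rule for assembling the list of cases can be sketched as follows. This is only an illustration of the logic described above, not the code used for the test: the record layout (`network`, `station`, `remotes`) and the station and network codes are placeholders.

```python
def build_cases(tf_records):
    """Collect unique (network, station) pairs from transfer-function records.

    Remote-reference stations carry no network code of their own, so they
    are assumed to belong to the network of the primary station.
    """
    cases = set()
    for record in tf_records:
        network = record["network"]
        cases.add((network, record["station"]))
        for remote in record.get("remotes", []):
            cases.add((network, remote))  # inherit the primary station's network
    return sorted(cases)


# Placeholder records, not real SPUD entries
tf_records = [
    {"network": "XX", "station": "STA01", "remotes": ["STA02", "STA03"]},
    {"network": "XX", "station": "STA02", "remotes": []},
    {"network": "YY", "station": "STA09", "remotes": ["STA01"]},
]

cases = build_cases(tf_records)
print(len(cases), "independent cases")
```

Note that under this assumption a remote station that actually belongs to a different network is requested under the wrong network code, which would be one plausible source of "data not found" failures.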
The results of the first-pass attempt to build these 2007 mth5 files (with metadata only) were recorded in a dataframe and are linked to this ticket in the following table. 02_local_metadata_coverage.csv
In summary, 1631 h5 files were built successfully and 376 exceptions were encountered.
The exception type value counts are as follows:
The IndexError cases likely indicate that data were not found. This could be for a few reasons, such as:
Here is a sample URL to the response level metadata for Network=EM, station=FL001, which has the IndexError:
https://service.iris.edu/fdsnws/station/1/query?net=EM&sta=FL001&level=response&format=xml&includecomments=true&includeavailability=true&nodata=404
There were 78 unique AttributeError messages, all associated with the "Person" object. A few are listed below; the full set can be seen by reading in the csv and calling unique on the AttributeError rows, i.e.:
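A minimal sketch of that step using only the standard library, assuming the csv has columns named `exception` and `error_message` (both column names are hypothetical; check the actual header of 02_local_metadata_coverage.csv). The rows below are placeholders standing in for the real file:

```python
import csv
import io


def unique_messages(handle, exception_type,
                    type_column="exception", message_column="error_message"):
    """Return the distinct messages for one exception type, in file order.

    The column names are assumptions; adjust them to the real csv header.
    """
    seen = []
    for row in csv.DictReader(handle):
        message = row[message_column]
        if row[type_column] == exception_type and message not in seen:
            seen.append(message)
    return seen


# Placeholder rows; with the real file use open("02_local_metadata_coverage.csv")
sample_csv = io.StringIO(
    "network,station,exception,error_message\n"
    "XX,STA01,AttributeError,placeholder message A\n"
    "XX,STA02,IndexError,placeholder message C\n"
    "XX,STA03,AttributeError,placeholder message B\n"
    "XX,STA04,AttributeError,placeholder message A\n"
)

messages = unique_messages(sample_csv, "AttributeError")
print(len(messages), "unique AttributeError messages")
```

With pandas the same selection is a boolean mask on the exception column followed by `.unique()` on the message column.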
Here is a sample URL to the response level metadata for Network=8P, station=CAV09, which has the AttributeError:
https://service.iris.edu/fdsnws/station/1/query?net=8P&sta=CAV09&level=response&format=xml&includecomments=true&includeavailability=true&nodata=404
The XMLSyntaxError cases had only a single unique value: "Start tag expected, '<' not found, line 1, column 1"
Here is a sample URL to the response level metadata for Network=EM, station=NEN28, which has the XMLSyntaxError:
https://service.iris.edu/fdsnws/station/1/query?net=EM&sta=NEN28&level=response&format=xml&includecomments=true&includeavailability=true&nodata=404
The ValueError cases were all of the form:
Here is a sample URL to the response level metadata for Network=EM, station=CON25, which has the ValueError:
https://service.iris.edu/fdsnws/station/1/query?net=EM&sta=CON25&level=response&format=xml&includecomments=true&includeavailability=true&nodata=404
And the lone TypeError was:
Here is the URL to the response level metadata for Network=8P, station=REU09, which had the TypeError:
https://service.iris.edu/fdsnws/station/1/query?net=8P&sta=REU09&level=response&format=xml&includecomments=true&includeavailability=true&nodata=404
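The sample URLs above all follow the same pattern, so the response-level metadata URL for any network-station pair can be generated rather than typed by hand. A minimal sketch using only the standard library; the function name `station_response_url` is made up for this example, and the query parameters are copied from the URLs above:

```python
from urllib.parse import urlencode

BASE_URL = "https://service.iris.edu/fdsnws/station/1/query"


def station_response_url(network, station):
    """Build the response-level StationXML query URL for one station."""
    params = {
        "net": network,
        "sta": station,
        "level": "response",
        "format": "xml",
        "includecomments": "true",
        "includeavailability": "true",
        "nodata": "404",  # ask the service for a 404 instead of an empty 204
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = station_response_url("EM", "FL001")
print(url)
```

Because of `nodata=404`, a request for a pair with no metadata returns 404 Not Found, which makes the "data not found" cases easy to distinguish from parsing errors.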