Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_data fails on file with fs == 0 #81

Open
tclements opened this issue Feb 22, 2021 · 1 comment
Open

read_data fails on file with fs == 0 #81

tclements opened this issue Feb 22, 2021 · 1 comment

Comments

@tclements
Copy link
Contributor

With SeisIO v1.2.0

(@v1.5) pkg> st
Status `~/.julia/environments/v1.5/Project.toml`
  [b372bb87] SeisIO v1.2.0

I found an mseed file with 113 different unique sampling rates.

using SeisIO
SeisIO.SEED.scan_seed("/home/timclements/CIHEC__BHZ___1999289.ms")

        CHANNEL |       N PTS | N GAPS | N FS 
----------------+-------------+--------+-----
    CI.HEC..BHZ |     1359978 |    151 |  113 

1-element Array{String,1}:
 "CI.HEC..BHZ, nx = 1359978, ngaps = 151, nfs = 113"

When trying to read the file, I get a BoundsError:

S = read_data("mseed","/home/timclements/CIHEC__BHZ___1999289.ms")
ERROR: BoundsError: attempt to access 65535-element Array{Float32,1} at index [0]
Stacktrace:
 [1] getindex at ./array.jl:809 [inlined]
 [2] SEED_Steim!(::IOStream, ::SeisIO.SeisIOBuf, ::UInt16) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/1_mSEEDdec.jl:193
 [3] parserec!(::SeisData, ::SeisIO.SeisIOBuf, ::IOStream, ::Int64, ::Int64, ::Bool, ::Int64) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/2_parserec.jl:246
 [4] parsemseed!(::SeisData, ::IOStream, ::Int64, ::Int64, ::Bool, ::Int64) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/readmseed.jl:11
 [5] read_mseed_file!(::SeisData, ::String, ::Int64, ::Int64, ::Bool, ::Bool, ::Int64) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/readmseed.jl:23
 [6] read_data!(::SeisData, ::String, ::String; cf::String, full::Bool, jst::Bool, ll::UInt8, memmap::Bool, nx_add::Int64, nx_new::Int64, strict::Bool, swap::Bool, v::Int64, vl::Bool) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Wrappers/read_data.jl:120
 [7] read_data(::String, ::String; full::Bool, cf::String, jst::Bool, ll::UInt8, memmap::Bool, nx_add::Int64, nx_new::Int64, strict::Bool, swap::Bool, v::Int64, vl::Bool) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Wrappers/read_data.jl:314
 [8] read_data(::String, ::String) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Wrappers/read_data.jl:313
 [9] top-level scope at REPL[12]:1

Checking this out with obspy shows that there are some data gaps with sampling rate == 0 Hz:

import obspy 
import numpy as np
st = obspy.read("CIHEC__BHZ___1999289.ms")
print(st[39].stats)
         network: CI
         station: HEC
        location: 
         channel: BHZ
       starttime: 1999-10-16T09:46:43.675537Z
         endtime: 1999-10-16T09:46:43.675537Z
   sampling_rate: 0.0
           delta: 0
            npts: 0
           calib: 1.0
         _format: MSEED
           mseed: AttribDict({'dataquality': 'D', 'number_of_records': 1, 'encoding': 'STEIM1', 'byteorder': '>', 'record_length': 512, 'filesize': 2440192})

# find unique sampling rates 
np.unique([tr.stats.sampling_rate for tr in st])
array([  0.,  20.])

# how many samples have fs == 0 
np.sum([tr.stats.sampling_rate == 0. for tr in st])
56

I only found this once while reading a few million files, so this is not a pressing error at all but the file might be good to include in the test set. mseed file attached here:
mseed.zip

@jpjones76
Copy link
Owner

jpjones76 commented Mar 23, 2021

I'll have to check the file manually to see what's going on with sample rate. If field 11 of the data header is 0.0, and field 10 isn't, I think it's not valid SEED. If field 10 is 0.0, there's supposed to be special handling. I've never coded that case because I've never encountered an example that mixed opaque data with normal data on the same channel. [Per the SEED manual, field 10 = 0.0 implies that the data record contains opaque data, i.e., Blockette 2000.]

It's funny that you should mention it as a one in a million occurrence, in fact: opaque data is so rare that I've only seen it in one test file, which I had to ask Chad Trabant at IRIS to find. Even he couldn't find an example immediately, and the test file he found contained only opaque data for the channel -- so this file is the first case I've heard of where a single channel mixes time-series and opaque data. I actually wonder how the IRIS seed reader handles it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants