Inconsistency in handling data read errors in _read_data_from_dfile #549

Open · wangyinz opened this issue Jun 18, 2024 · 3 comments
Labels: bug (Something isn't working)

@wangyinz (Member)

Read the code below first.

if tr.stats.endtime.timestamp != mspass_object.endtime():
    message = "Inconsistent endtimes detected\n"
    message += "Endtime expected from MongoDB document = {}\n".format(
        UTCDateTime(mspass_object.endtime())
    )
    message += "Endtime set by obspy reader = {}\n".format(tr.stats.endtime)
    message += "Endtime is derived in mspass and should have been repaired - cannot recover this datum so it was killed"
    mspass_object.elog.log_error(alg, message, ErrorSeverity.Invalid)
    mspass_object.kill()
# These two lines are needed to properly initialize
# the DoubleVector before calling Trace2TimeSeries
tr_data = tr.data.astype("float64")  # Convert the nparray type to double, to match the DoubleVector
mspass_object.npts = len(tr_data)
mspass_object.data = DoubleVector(tr_data)
# We can't use Trace2TimeSeries because we loose
# all but miniseed metadata if we do that.
# We do, however, need to compare post errors
# if there is a mismatch
if mspass_object.npts > 0:
    mspass_object.set_live()
else:
    message = "Error during read with format={}\n".format(format)
    message += "Unable to reconstruct data vector"
    mspass_object.elog.log_error(alg, message, ErrorSeverity.Invalid)
    mspass_object.kill()

Basically, the datum gets killed when the endtime doesn't match, and the elog message claims that the datum was killed. However, it is later set live again because npts is greater than 0, which is almost always true, since the endtime could not have been computed otherwise. Inside the database the corresponding elog record looks like this:

{
   "_id":{
      "$oid":"6670faebf7b0b388f4b5bcd7"
   },
   "logdata":[
      {
         "job_id":{
            "$numberInt":"0"
         },
         "algorithm":"Database._read_data_from_dfile:  ",
         "badness":"ErrorSeverity.Invalid",
         "error_message":"Inconsistent endtimes detected\nEndtime expected from MongoDB document = 2011-04-07T15:17:43.975000Z\nEndtime set by obspy reader = 2011-04-07T15:17:43.975000Z\nEndtime is derived in mspass and should have been repaired - cannot recover this datum so it was killed",
         "process_id":{
            "$numberInt":"3266"
         }
      }
   ],
   "data_tag":"serial_preprocessed",
   "wf_TimeSeries_id":{
      "$oid":"6670f5c8131a86997a8e1d1e"
   }
}

Note that this record is from the earthscope2024 notebook here.

I think all we probably need is to change the wording of the elog message to reflect that the error is potentially harmless. Or, we could refine the check so that mismatches within a certain threshold are acceptable (which is the case above: the two endtimes print identically, so the difference is smaller than the printed microsecond precision).
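For concreteness, here is a minimal sketch of what a tolerance-based version of that check could look like, reusing the names from the snippet above (tr, mspass_object, alg, UTCDateTime, ErrorSeverity); the half-sample tolerance is only a suggestion, not existing MsPASS behavior:

# Sketch only: treat endtimes as consistent when they differ by less than
# half a sample interval; only larger discrepancies are logged and killed.
dt = tr.stats.delta  # sample interval in seconds
endtime_error = abs(tr.stats.endtime.timestamp - mspass_object.endtime())
if endtime_error >= 0.5 * dt:
    message = "Inconsistent endtimes detected\n"
    message += "Endtime expected from MongoDB document = {}\n".format(
        UTCDateTime(mspass_object.endtime())
    )
    message += "Endtime set by obspy reader = {}\n".format(tr.stats.endtime)
    message += "Mismatch of {} s exceeds half the sample interval - datum killed".format(
        endtime_error
    )
    mspass_object.elog.log_error(alg, message, ErrorSeverity.Invalid)
    mspass_object.kill()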

@pavlis (Collaborator) commented Jun 18, 2024 via email

@wangyinz (Member, Author)

I found it by looking at the records in the elog collection in the database. This mismatch error actually happened pretty frequently even in that small dataset. Anyway, I don't think this needs immediate attention, but we do need to fix it. I am just opening the issue here as a bookmark.

@pavlis (Collaborator) commented Jun 20, 2024

I think on the test data set it is related to an error thrown, but handled, on one of the files. This problem is fundamentally created by the fact that we are using obspy's reader to crack miniseed files when running the read_data method of Database, while the index read_data uses is created by our custom reader that only indexes the files. Since both use the same underlying miniseed library, a working hypothesis is that some packets in the files have intact headers but the compressed data sections of those packets have errors that make them impossible to decompress. I think obspy's reader will truncate any trace with such an error at the end of the previous packet. That would explain the behavior if my guess is correct.
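One way to test that hypothesis (my suggestion, nothing that exists in MsPASS) would be to read one of the offending files directly with obspy and look for traces that end early or for gaps, which is what an undecompressible packet should produce. A rough diagnostic sketch, where the file path is hypothetical:

# Diagnostic sketch: read a suspect miniseed file with obspy and look for
# short traces or gaps between traces of the same channel.
from obspy import read

path = "suspect_file.mseed"  # hypothetical path to one of the problem files
st = read(path, format="MSEED")
for tr in st:
    print(tr.id, tr.stats.starttime, tr.stats.endtime, tr.stats.npts)
# get_gaps() reports gaps/overlaps between consecutive traces with the same id
for gap in st.get_gaps():
    print("gap:", gap)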

@wangyinz added the bug (Something isn't working) label on Jun 20, 2024