IOOS catalog data services health monitor #60
Comments
The previous IOOS Catalog was renamed to the Service Monitor and is still operational: https://monitor.ioos.us/

I wasn't able to find a service matching the URL you posted, but there is an FVCOM Nowcast here: https://monitor.ioos.us/services/590765c1b46a3d0055e83e36 (not sure if this is the same one).

Note: the Service Monitor considers it down because it doesn't respond to the .das request within 60 s; testing from here, it returns in around 70 s.
@daf, excellent. How does one go about changing the characteristics of individual tests (e.g. raising the .das request timeout from 60 s to 80 s for that dataset)? Is there a repo where we can submit a PR?
I don't think it currently has a per-service timeout; it's just a global one. See here: Low-hanging fruit here would be to make the second timeout equal to the first (they should likely be equal). For a per-service timeout, you'd have to add the field to the Service model, have some way of setting it (via the Admin area?) and use it in the same harvest code linked above. That is a much more involved process.
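A minimal sketch of the per-service timeout idea described above, assuming a hypothetical `Service` class with an optional `timeout_s` field and a hypothetical `GLOBAL_TIMEOUT_S` constant. None of these names come from the actual Service Monitor code; they only illustrate the "override if set, otherwise fall back to the global" logic.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed global default, matching the 60 s limit mentioned in this thread.
GLOBAL_TIMEOUT_S = 60.0


@dataclass
class Service:
    """Hypothetical stand-in for the Service model; field names are assumptions."""
    url: str
    timeout_s: Optional[float] = None  # per-service override; None means "use the global"


def effective_timeout(service: Service, default: float = GLOBAL_TIMEOUT_S) -> float:
    """Return the per-service timeout if one is set, otherwise the global default."""
    if service.timeout_s is not None:
        return service.timeout_s
    return default
```

With this shape, a slow dataset could be given `timeout_s=80` while every other service keeps the global value, and the harvest code would only need to call `effective_timeout(service)` wherever it currently uses the global.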
@rsignell-usgs @daf Just FYA, we're considering retiring the Service Monitor site altogether at some point in the future. It's unclear what the need is for this capability, and some of the functionality is duplicated by the Axiom Sensor Dashboard (for SOS services anyway). The Service Monitor's monitoring/harvesting routines have caused problems for RA data services in the past, and there is maintenance overhead in keeping it online. We hadn't had many requests for this capability recently, until this one.

The Service Monitor has been rather neglected the past few years, in terms of development focus and, as far as we know, end-user usage. Basically, we have to figure out what the most logical path forward is, given limited resources.

The Sensor Dashboard does quite well monitoring SOS services. Can that be extended to also check OPeNDAP and/or ERDDAP service endpoints? Is this in Axiom's plan for the Dashboard already? From what I've heard, this might be an easier solution than continuing to develop/extend and support the existing Service Monitor. Open to suggestions, however.

How necessary is the capability to monitor service health, and what information do we need when we talk about monitoring (somewhere to go to see past uptime for a service, daily uptime emails of all services to an email list, as has been done in the past, or instant alerting to a POC if a service goes down)?

cc: @kwilcox @shane-axiom @ericmbernier @benjwadams @kknee @dpsnowden @kbailey-noaa
I recall the Service Monitor had an ultimate goal of being able to automate the IOOS Asset Inventory process. Is that still the case?
@kwilcox We still have that as a goal; it may just be that we build a new tool to do that, partially based on the Service Monitor code. It hasn't really been decided at this point. Ideas or thoughts welcome. Either way, it will use the metadata in the Catalog for content, so there's reason to make sure that is maintained as well as it can be.
The basic use case here is: before I run a catalog-driven workflow, I'd like to know which services are failing. Every hour we could test the health of the main service endpoints (e.g. THREDDS, SOS, ncWMS server endpoints). We could write a script that extracts those endpoints from a catalog search into a list to be crawled. Then we could crawl all the datasets on a more leisurely basis (however long it takes), perhaps just doing them sequentially and then starting again. Kind of like painting the Golden Gate Bridge. Would this take a few days?
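The hourly endpoint check could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`http_probe`, `check_endpoints`, `failing`) are hypothetical, the endpoint list is assumed to have already been extracted from a catalog search, and "healthy" is simplified to "answers with a 2xx status before the timeout".

```python
import urllib.request
from typing import Callable, Dict, Iterable, List


def http_probe(url: str, timeout_s: float = 60.0) -> bool:
    """Return True if the URL answers with a 2xx status.

    Note: urlopen's timeout applies to individual blocking socket
    operations, not to the total duration of the request.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Any network error, HTTP error status, or timeout counts as "down".
        return False


def check_endpoints(
    urls: Iterable[str], probe: Callable[[str], bool] = http_probe
) -> Dict[str, bool]:
    """Probe each endpoint once and map URL -> healthy?"""
    return {url: probe(url) for url in urls}


def failing(results: Dict[str, bool]) -> List[str]:
    """Return the sorted list of endpoints that failed their probe."""
    return sorted(url for url, ok in results.items() if not ok)
```

A scheduler such as cron could run `failing(check_endpoints(endpoint_urls))` every hour, while the slower dataset-level crawl could reuse the same probe sequentially, for example by iterating over `itertools.cycle(dataset_urls)` so that it starts again once it reaches the end.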
Do we have tools to monitor the health of the data services in the IOOS catalog?
@ocefpaf just discovered this morning that his demo wasn't working because
http://crow.marine.usf.edu:8080/thredds/dodsC/FVCOM-Nowcast-Agg.nc.html
is broken. The THREDDS endpoint is there, but the OPeNDAP access to this dataset isn't working, giving a java.io.EOFException error (yes, it has been reported to the provider).
Unfortunately this is a common occurrence, and it would be great to know about problems in advance instead of discovering them when we want to give demos!
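Because the THREDDS server can be up while an individual dataset's OPeNDAP access is broken, a dataset-level check needs to request something from the dataset itself, such as its `.das` response. A small sketch of deriving that URL from a link like the one above; the helper name `das_url` is hypothetical, and it relies only on the standard OPeNDAP convention of appending suffixes such as `.html`, `.dds` or `.das` to the dataset URL.

```python
def das_url(opendap_url: str) -> str:
    """Turn an OPeNDAP dataset URL (or its .html/.dds/.das form) into the .das URL."""
    base = opendap_url
    # Strip one known response suffix, if present, to recover the dataset URL.
    for suffix in (".html", ".das", ".dds"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return base + ".das"
```

Fetching the resulting URL with a timeout and treating an error or an empty response as a failure would catch the case described here, where the server answers but the dataset does not.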