Need to create a system status page for che.openshift.io #1224
Comments
Should we have the root epic openshiftio/openshift.io#4730 under this repository?
@slemeur having the user story under openshift.io works just fine IMO (I personally do not think we should have it under rh-che, since the status service would be a separate repo).
Does this service allow you to feed metrics or programmatic configuration changes to it? E.g., can you tell it to start monitoring a given route URL that doesn't exist yet, and measure the time until it does?
This relates to the other need, tracking OpenShift route-creation times: notify the service when the oc API call starts, and let it determine the time taken for the route to become actually accessible.
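The measurement asked about above (start a timer when the route is requested, stop it when the URL first answers) can be sketched roughly as follows. This is only an illustration, not something from the thread: `time_until_accessible` and `http_probe` are hypothetical names, the timeout and polling interval are arbitrary, and "accessible" is assumed to mean "an HTTP GET returns a 2xx or 3xx status".

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def time_until_accessible(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it returns True; return elapsed seconds, or None on timeout."""
    start = clock()  # assumed to be taken at "oc api call start time"
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


def http_probe(url: str, timeout_s: float = 5.0) -> Callable[[], bool]:
    """Build a probe that treats any 2xx/3xx response as 'route is accessible' (an assumption)."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return 200 <= resp.status < 400
        except (urllib.error.URLError, OSError):
            # Route not resolvable or not serving yet; keep polling.
            return False
    return probe
```

A caller would invoke `time_until_accessible(http_probe(route_url))` right after issuing the route-creation request. Note that this only shows reachability from wherever the probe runs, which is the caveat raised in the next comments.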
@fche AFAIK, it is planned to be done on the che-server side and exposed via a Prometheus metric - eclipse-che/che#12699
cc: @gorkem
OK, assuming it is in a position to reliably tell whether the routes are externally accessible.
@fche if we opt for a custom dsaas service, the major question is: who will be the primary owner / maintainer?
Aye, there is the rub. But independent of that question, one can work out in greater detail just what info you'd like to see there.
@fche I believe most of the details are covered in the following user-story - openshiftio/openshift.io#4730
What do you think the chances are that many or all of the datasets you are talking about could be rendered entirely as grafana (or perhaps pcp) dashboards? So, assume there is a queryable metric database near the rhche server. Assume it's been gathering the status/health metrics being discussed over at openshiftio/openshift.io#4730. Does the "system status" have to be anything other than a preconfigured dashboard - with some combination of graphical or textual forms we can generate?
I believe everything could be rendered entirely via grafana, but the goal of statuspage is to make it user-friendly: easy to update, easy to notify users, easy to create incidents, easy to schedule maintenance, etc.
Could we think about it as the public status-page being downstream of our internal status dashboards & machinery? i.e., not tightly coupled to che, but rather to a hypothetical dev-console health dashboard?
IMO, che.openshift.io is a very special case not |
Understood, just trying to minimize the number of bits of machinery and maximize reusability. Maybe think of it more like - a running copy of che should have its own health display for the benefit of each of its users. Can the public dashboard be another consumer of that same data & maybe even some of the same renderings?
Well, potentially it could, but ideally the status page should be deployed separately from the monitored service: if the service is down, the status page should still be up with the reported incident (if the status page is part of the service itself, it would go down together with the service during an incident / scheduled maintenance).
Yup, kind of like a reliable mirror.
As a prototype, before we do a full proper operator / openshift4 / prometheus flavoured thing, we could perhaps layer a small piece of new code on top of the existing osd-monitor-poc pcp-based infrastructure, to relay metric threshold crossing events to statuspage.io. We'd need to know a sample metric name and threshold predicate, and statuspage.io API credentials.
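A minimal sketch of what such a relay could look like, assuming timestamped samples for one metric arrive as an iterable. Everything here is illustrative: `CrossingEvent`, `threshold_crossings` and `relay` are hypothetical names, the metric name and threshold in the usage example are made up, and the actual statuspage.io call is deliberately left behind a `publish` callback because the thread does not yet specify the API parameters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class CrossingEvent:
    metric: str
    timestamp: float
    value: float
    breached: bool  # True: predicate started holding; False: it cleared


def threshold_crossings(
    metric: str,
    samples: Iterable[Tuple[float, float]],
    predicate: Callable[[float], bool],
) -> Iterator[CrossingEvent]:
    """Yield an event each time the predicate's truth value changes.

    Assumption: the metric is considered healthy before the first sample,
    so a first sample that already breaches the threshold emits an event.
    """
    breached = False
    for timestamp, value in samples:
        now_breached = predicate(value)
        if now_breached != breached:
            yield CrossingEvent(metric, timestamp, value, now_breached)
            breached = now_breached


def relay(events: Iterable[CrossingEvent], publish: Callable[[CrossingEvent], None]) -> int:
    """Hand each event to `publish` (e.g. a statuspage.io client); return how many were sent."""
    count = 0
    for event in events:
        publish(event)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical metric name and threshold, for illustration only.
    samples = [(1.0, 0.5), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
    relay(threshold_crossings("route.creation.seconds", samples, lambda v: v > 2.0), print)
```

Emitting only on state changes, rather than on every breaching sample, keeps the number of outbound calls small, which fits the concern raised below about having to be picky on a limited plan.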
@fche will you be able to give a hand with implementing the push part in the next sprint? (First we need to figure out which metrics we are going to push: the hobby plan offers only 2 system metrics, so we need to be picky.)
Can indeed help with a quick prototype, presuming building on the present osd-monitor-poc machinery, not major new stuff. It's about as complicated as adding a new outbound zabbix relay.
Sounds good, I will reach out once I have more details about the params for the statuspage API.
Closing this epic since https://che.statuspage.io/ is set up and we have a separate issue for contributing system metrics to statuspage (which is currently not a priority) - #1286
Currently there is no status page for che.openshift.io which would provide information about the state of the platform. There are many different online services that are providing information about the state of their platform:
It was decided to use an account on https://www.statuspage.io/ instead of creating a custom dsaas service.
sub-tasks:
Related openshift.io user-story - openshiftio/openshift.io#4730