Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List Clusters API call returns Internal Server Error after installation #306

Open
fbalak opened this issue Oct 5, 2017 · 21 comments
Open

Comments

@fbalak
Copy link
Contributor

fbalak commented Oct 5, 2017

I have cluster with 4 nodes. I have installed tendrl and after all was installed I had to wait about 2 minutes before I was able to use tendrl ui because List Clusters API call was returning 500 Internal Server Error. Most of other API calls were working correctly. (I was able to log in and navigate through sites)

After 2 minutes the site and API call started working correctly.

This is how it looks right after installation:
105_server_error

Tested with:

tendrl-commons-1.5.2-20171004T104238.6d5ffef.noarch
tendrl-api-1.5.2-20170927T205654.a9e16c0.noarch
tendrl-ui-1.5.2-20170928T151925.1cdc907.noarch
tendrl-node-agent-1.5.2-20171004T105957.709badc.noarch
tendrl-monitoring-integration-1.5.2-20171005T132717.1b8a4c1.noarch
tendrl-grafana-plugins-1.5.2-20171005T132717.1b8a4c1.noarch
tendrl-notifier-1.5.2-20170927T205657.566bfa0.noarch
tendrl-api-httpd-1.5.2-20170927T205654.a9e16c0.noarch
glusterfs-4.0dev-0.176.gitfabee9d.el7.centos.x86_64
@anivargi
Copy link
Contributor

anivargi commented Oct 7, 2017

@fbalak It would be really helpful if you can provide the API logs when it was in error state.

@anivargi anivargi self-assigned this Oct 7, 2017
@fbalak
Copy link
Contributor Author

fbalak commented Oct 9, 2017

@anivargi Sure, here is /var/log/messages from server where the api is run:
messages-server.txt
If you want access to the machine, I can grant it to you if you send me ssh public key.

@anivargi
Copy link
Contributor

anivargi commented Oct 9, 2017

@fbalak can you share /var/log/tendrl/api/error.log

@fbalak
Copy link
Contributor Author

fbalak commented Oct 9, 2017

@anivargi here: error.log

@anivargi
Copy link
Contributor

anivargi commented Oct 9, 2017

@r0h4n @shtripat looks like TendrlContext is temporarily empty even if the cluster_id is present, which causes the API to fail, I can add a check for that, but does this need to be handled by the backend?

@r0h4n
Copy link
Contributor

r0h4n commented Oct 10, 2017

Assuming you have not imported this cluster, tendrl-node-agent will start syncing all the data after the default "sync_interval" (as given in node-agent.conf.yaml). This is expected behavior and the UI and API should respond appropriately before the first sync completes

If the cluster is a Tendrl managed cluster, then we need to investigate further.

@anivargi
Copy link
Contributor

@fbalak

@fbalak
Copy link
Contributor Author

fbalak commented Oct 12, 2017

@anivargi @r0h4n I have been testing this and from the UI point of view it looks good now:
1012_cluster_error
I am just concerned that the API call still returns 500 Internal Server Error and it provide no information for user about what is happening. If we know where the problem is in the backend can you please update error message and exit code? I would suggest 503 Service Unavailable.

@fbalak
Copy link
Contributor Author

fbalak commented Oct 23, 2017

After discussion with @dahorak we came to conclusion that there should not be any 5xx error. It would be best to return not entirely imported cluster in response and add some loading API attribute that would be reflected in UI but It would also make sense to return code 202 Accepted and return empty list.

@mbukatov
Copy link
Contributor

mbukatov commented Oct 31, 2017

Based on discussion with @r0h4n today, I would propose the following.

Since Tendrl needs to wait for all node agents to send all data required to describe the cluster, the cluster can't be shown immediately. Only when Tendrl has all the data, it makes sense to return the cluster in the api/1.0/clusters api call.

Unfortunately there are few problems with the current behavior:

  • when the admin logs in for the first time, there are no indication that the cluster is being detected
  • the api returns 500 (Internal Server Error) status code while tendrl is collecting the data
  • ui shows just "no clusters detected", as if nothing happened (ignoring 500 http status code)

I don't think it makes sense to return not fully detected cluster in clusters list for multiple reasons:

  • data model tendrl uses doesn't include not fully detected cluster, cluster is either fully detected or it doesn't exist, so listing cluster still being detected in the list would require changes in the data model
  • ui would need to be updated to handle this

So one possible approach would be to return empty list in the clusters call response until the cluster is fully detected. And then to let the user know about cluster detecting taking place, tendrl could send the notification about this.

@mbukatov
Copy link
Contributor

I searched in the design documents and the ui design doesn't anticipate with any delay in detection of the cluster for import: https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/244820318

@julienlim How would you suggest to address this delay in cluster detection? Would the notification as suggester be ok?

@mbukatov
Copy link
Contributor

@fbalak @dahorak I don't think that returning 202 Accepted makes sense, as the cluster detection is not triggered by clusters api call.

@mbukatov
Copy link
Contributor

mbukatov commented Oct 31, 2017

@fbalak I disagree with your description in #306 (comment) - showing spinner during cluster detection looks much more reasonable compared current 500 status code which is ignored by the ui. Unfortunately it seems that the 500 status code was misused to indicate that the cluster detection is running, which is not ok.

@dahorak
Copy link
Contributor

dahorak commented Nov 1, 2017

@mbukatov I agree with you, I suggested @fbalak 200 OK with empty clusters list (if it is not possible to mark cluster as not fully detected).

@julienlim
Copy link
Member

Current takeaway on answers we need to the following questions:

  • How to convey the information about the ongoing detection on the api level
  • How to show this ongoing detection in the UI; it would need be clear for a first time user as well. Note: Troubleshooting section in documentation should also mention / address this.

@Tendrl/tendrl-qe @[email protected] @mcarrano @julienlim

@mbukatov
Copy link
Contributor

mbukatov commented Nov 1, 2017

Thinking about my previous suggestion to use notification: sending notification that tendrl is detecting a cluster makes sense from backend perspective, but could we use this notification to control when to show spinner in the ui?

@r0h4n
Copy link
Contributor

r0h4n commented Nov 1, 2017

  • Tendrl can provide notifications as to whenever cluster detection is in progress or is complete. It can also set a "detection_in_progress" flag (per node since we havent detected any cluster yet).

@Tendrl/tendrl-core do you guys think we should add this right away or wait for the next release?

@r0h4n
Copy link
Contributor

r0h4n commented Nov 8, 2017

More updates on this issue discussed in today's daily meeting

  • We need help from @julienlim to understand possibilities of how to highlight the cluster detection progress in case of single cluster and in case of multiple clusters.

@julienlim
Copy link
Member

@Tendrl/tendrl-core @Tendrl/tendrl-qe @mcarrano @r0h4n

Here's our recommendation from a UI perspective:

(1) When there is only 1 cluster that is being detected (but not yet ready for importing), show the following blank slate:

Title: Cluster detected
Summary: Tender server is connecting to the Tendrl node agents to detect the cluster resources.

(2) When there are multiple clusters and a detected cluster shows up in the Cluster list (but is not yet ready for Import), in the column before the button, show the following message “Cluster detection in progress” and disable Import button. When cluster detection has completed, then enable the “Import” button. Currently, the status of the cluster prior to being imported shows a “?” which is probably still okay to leave as-is.

If there's any other scenarios that I've not addressed, please advise and we can come up with other recommendations accordingly.

@anivargi
Copy link
Contributor

anivargi commented Nov 9, 2017

Fixed in #332

@shirshendu
Copy link

@julienlim This issue is pretty old. Is it still persisting for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants