Chatie API Server Down Accident Report #98

Open
su-chang opened this issue Feb 11, 2023 · 5 comments

@su-chang

su-chang commented Feb 11, 2023

Token Service Discovery Service Accident

Our Wechaty puppet service discovery service has been experiencing outages since 3 pm on Feb 7.

  1. 10 am Feb 7: We noticed abnormal disk usage on some instances, cleared the log files, and kept the instances running normally. At that time api.chatie.io was working well.
  2. 3 pm Feb 7: The problem broke out in the afternoon. While working on it, we found that api.chatie.io was returning HTTP status code 503.
  3. 2 am Feb 8: @huan shared detailed information from Heroku, see: 🔥🔥🔥 api.chatie.io service abnormal, HTTP error code 503 #97 (comment)
  4. 8 am Feb 8: We confirmed that api.chatie.io was out of service because it received too many requests (token initialization on api.chatie.io) within a few seconds.
  5. 9 am Feb 8: We found the bug in wechaty-puppet-workpro: a Node.js timer function that initializes the token on api.chatie.io was not being cleared correctly. We also noted that the only temporary fix was to restart all containers.
  6. 10 am Feb 8: We confirmed the time window for restarting all containers.
  7. 2 pm Feb 8: We restarted all containers.
  8. 2:30 pm Feb 8: The server was fully restored.
  9. 6 pm Feb 8: We created the hotfix PR to fix this problem.
  10. 9 pm Feb 8: The PR was merged and ready to deploy.
  11. 0 pm Feb 9: We started deploying to some instances.
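
The issue does not show the faulty code, so the following is only a minimal sketch of the class of bug described in item 5: a repeating timer that is started again without the previous one being cleared, so the number of token-initialization requests grows with every restart of the logic. The names `TokenRegistrar`, `register`, and `intervalMs` are hypothetical and are not taken from wechaty-puppet-workpro.

```typescript
// Hypothetical sketch: these names are illustrative and do not come from
// wechaty-puppet-workpro or any Wechaty package.
type RegisterFn = () => void

class TokenRegistrar {
  private readonly register: RegisterFn
  private readonly intervalMs: number
  private timer: ReturnType<typeof setInterval> | undefined

  constructor(register: RegisterFn, intervalMs: number) {
    this.register = register
    this.intervalMs = intervalMs
  }

  // Safe to call repeatedly (for example on every reconnect): the previous
  // interval is cleared first, so at most one timer is ever active.
  start(): void {
    this.stop()
    this.register()
    this.timer = setInterval(this.register, this.intervalMs)
  }

  // Clears the interval and forgets the handle so a later start() is clean.
  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer)
      this.timer = undefined
    }
  }

  get active(): boolean {
    return this.timer !== undefined
  }
}
```

Calling `stop()` at the top of `start()` is the key design choice here: without it, each extra `start()` would leave one more interval running and firing `register` indefinitely.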
@su-chang
Author

TODO

We will continue deploying the fixed version to the remaining instances before Feb 15.

@huan
Member

huan commented Feb 12, 2023

Could you explain point 5: why wechaty-puppet-workpro needs to init token on api.chatie.io?

If I remember correctly, the wechaty service discovery is managed by Wechaty itself?

@su-chang
Author

su-chang commented Feb 13, 2023

wechaty-puppet-workpro is based on wechaty-grpc, not on wechaty-puppet-service.

The logic that initializes the token on api.chatie.io is therefore maintained by workpro.
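
When a client maintains its own token-initialization logic like this, one common way to avoid sending many requests within a few seconds is to retry with exponential backoff and random jitter. The sketch below assumes this approach for illustration only; it is not what workpro does. `backoffDelayMs`, `registerWithBackoff`, and all parameters are hypothetical names.

```typescript
// Hypothetical helper: delay before the n-th retry (attempt is 0-based).
// The random source is injectable so the function can be tested.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 60000,
  random: () => number = Math.random,
): number {
  // Double the ceiling on each attempt, capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt)
  // "Full jitter": pick uniformly in [0, ceiling) so that clients which
  // restart at the same moment do not all retry at the same moment.
  return Math.floor(random() * ceiling)
}

// Hypothetical wrapper: `register` stands in for whatever call initializes
// the token. Resolves with the number of attempts that were needed.
async function registerWithBackoff(
  register: () => Promise<void>,
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> =
    (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await register()
      return attempt + 1
    } catch (err) {
      lastError = err
      // Wait before the next try, but not after the final failure.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt))
      }
    }
  }
  throw lastError
}
```

The jitter matters most in a situation like restarting all containers at once, where many clients would otherwise register in the same instant.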

@huan
Member

huan commented Feb 13, 2023

I think this is a bad idea, but I hope it will work well in the future.

The protocol might be changed someday so please be prepared to follow new protocols.

@su-chang
Author

Thanks for your advice, we will pay attention to it.
