Service Registration Failures During Node Registration #24461

Open
mr-karan opened this issue Nov 14, 2024 · 1 comment

@mr-karan
Contributor

Nomad version

1.7.7

Operating system and Environment details

  • Running in an AWS environment
  • Ubuntu 24.04

Issue

Service registration errors and task failures occur while the node is registering with the servers.

Reproduction steps

  1. Node starts registration process
  2. Multiple service registration deletion attempts fail
  3. Template rendering issues occur for HAProxy peer service
  4. Sibling task failures cascade to other services

Expected Result

  • Clean node registration
  • Successful service registration management
  • Proper template rendering for HAProxy peer service
  • Successful task execution without cascading failures

Actual Result

Multiple cascading failures observed:

  1. Service registration errors:
     [ERROR] client.rpc: error performing RPC to server: error="rpc error: rpc error: service registration not found"
  2. Template failures:
     Missing: nomad.service(haproxy-peer)
  3. Task failures:
     Setup Failure: failed to setup alloc: pre-run hook "group_services" failed: no servers
  4. Forced termination:
     Exit Code: 0, Exit Message: "executor: error waiting on process: rpc error: code = Canceled desc = grpc: the client connection is closing"

Nomad Client logs

Nov 14 08:11:51 [INFO]  agent: (runner) starting
Nov 14 08:11:51 [ERROR] client.rpc: error performing RPC to server: error="rpc error: rpc error: service registration not found" rpc=ServiceRegistration.DeleteByID server=172.31.2.217:4647
Nov 14 08:11:51 [INFO]  client.service_registration.nomad: attempted to delete non-existent service registration: service_id=_nomad-task-d90c47ce-f4be-0fa3-e019-5d1b522e64a1-group-haproxy-default-haproxy-peer-haproxy-peer-net namespace=kite
Nov 14 08:12:01 [INFO]  client: node registration complete
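To count how many deletion attempts fail and against which server, the client log lines above can be split into structured fields. The following is a minimal sketch, assuming the `timestamp [LEVEL] component: message key=value` layout seen in the excerpt above; the regular expressions and the `parse_client_log` helper are illustrative and not part of Nomad.

```python
import re

# Assumed layout, inferred only from the log excerpt in this report:
#   "<Mon DD HH:MM:SS> [LEVEL] <component>: <message with key=value pairs>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\]\s+"
    r"(?P<component>[\w.]+): "
    r"(?P<message>.*)$"
)
# A value is either a double-quoted string or a run of non-space characters.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_client_log(line):
    """Return a dict for one client log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["fields"] = {
        key: value.strip('"')
        for key, value in FIELD_RE.findall(record["message"])
    }
    return record


# Line copied from the client logs in this report.
ERROR_LINE = (
    'Nov 14 08:11:51 [ERROR] client.rpc: error performing RPC to server: '
    'error="rpc error: rpc error: service registration not found" '
    'rpc=ServiceRegistration.DeleteByID server=172.31.2.217:4647'
)

if __name__ == "__main__":
    record = parse_client_log(ERROR_LINE)
    print(record["level"], record["fields"]["rpc"], record["fields"]["server"])
```

Grouping the parsed records by `fields["rpc"]` and `fields["server"]` would show whether every failure is a `ServiceRegistration.DeleteByID` call against the same server.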

Nomad Alloc Events Timeline

Nov 14, '24 08:10:36 - Terminated (Exit Code: 0)
Nov 14, '24 08:10:35 - Killing (Sent interrupt, 5s grace period)
Nov 14, '24 08:10:33 - Template Missing: nomad.service(haproxy-peer)
Nov 14, '24 08:10:30 - Sibling Task Failed (prepare-logging-setup)
Nov 14, '24 08:10:30 - Setup Failure (group_services hook failed)
Nov 14, '24 07:31:19 - Started
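Because the alloc events are listed newest first and the client logs oldest first, the ordering is easier to follow once both are merged into one chronological timeline. The sketch below does that with the timestamps from this report. It assumes both sources use the same time zone and that the client log lines, which omit the year, are from 2024; the messages are shortened and the `merged_timeline` helper is hypothetical.

```python
from datetime import datetime

# Entries copied from this report. Assumptions: same time zone for both
# sources, and year 2024 for the client log lines (which omit the year).
CLIENT_LOGS = [
    ("2024-11-14 08:11:51", "client", "agent: (runner) starting"),
    ("2024-11-14 08:11:51", "client", "ServiceRegistration.DeleteByID: service registration not found"),
    ("2024-11-14 08:11:51", "client", "attempted to delete non-existent service registration"),
    ("2024-11-14 08:12:01", "client", "node registration complete"),
]
ALLOC_EVENTS = [
    ("2024-11-14 07:31:19", "alloc", "Started"),
    ("2024-11-14 08:10:30", "alloc", "Setup Failure (group_services hook failed)"),
    ("2024-11-14 08:10:30", "alloc", "Sibling Task Failed (prepare-logging-setup)"),
    ("2024-11-14 08:10:33", "alloc", "Template Missing: nomad.service(haproxy-peer)"),
    ("2024-11-14 08:10:35", "alloc", "Killing (Sent interrupt, 5s grace period)"),
    ("2024-11-14 08:10:36", "alloc", "Terminated (Exit Code: 0)"),
]


def merged_timeline(*sources):
    """Merge (timestamp, source, message) tuples into chronological order."""
    rows = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), source, message)
        for entries in sources
        for ts, source, message in entries
    ]
    # sorted() is stable, so entries sharing a timestamp keep their input order.
    return sorted(rows, key=lambda row: row[0])


if __name__ == "__main__":
    for when, source, message in merged_timeline(CLIENT_LOGS, ALLOC_EVENTS):
        print(f"{when:%H:%M:%S}  {source:<6}  {message}")
```

In the merged order, the alloc failures between 08:10:30 and 08:10:36 all come before the first client log line at 08:11:51, and node registration completes last at 08:12:01.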

The primary issue appears to be service registration and template rendering failures, particularly for the HAProxy peer service. These failures cascade into failures of dependent services and tasks.

@tgross
Member

tgross commented Nov 14, 2024

@mr-karan if the node hasn't registered yet, how is it running services? Is this a node that was running services and then restarted?
