ensure that only registered cluster workers are asked to report metrics #182
base: master
Conversation
The ordering issue (new AggregatorRegistry() vs. cluster.fork()) is a breaking change in this PR's current state. It might be sufficient for the worker to retry sending the init message until the master acknowledges, for up to some amount of time. Sort of adds a lot of code...
Didn't comment on the cleanup stuff (stray console.log, linter stuff).
lib/cluster.js
Outdated
that.aliveWorkers.add(workerId);
console.log(that.aliveWorkers);
	}
});
Instead of adding another cluster.on("message") listener, can you do this in addListeners?
To do that I might have to move the aliveListeners to be a module-level variable. Would that be ok?
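For illustration only, a rough sketch of what that could look like, with registration handled inside the existing listener setup; the module-level set, the message type string, and the addListeners shape here are assumptions, not the PR's actual code:

const cluster = require('cluster');

// Assumed module-level state, as discussed above.
const aliveWorkers = new Set();

function addListeners() {
	// A single message listener handles both metrics responses and registration.
	cluster.on('message', (worker, message) => {
		if (message && message.type === 'prom-client:register') {
			// The worker announced that it has required prom-client.
			aliveWorkers.add(worker.id);
		}
		// ...existing metrics-response handling stays here...
	});

	cluster.on('exit', worker => {
		// Stop asking workers that have died for metrics.
		aliveWorkers.delete(worker.id);
	});
}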
lib/cluster.js
Outdated
worker.send(message)
	}
}
that.aliveWorkers = new Set([...that.aliveWorkers].filter(x => !failedWorkers.has(x)));
Can you just do this.aliveWorkers.remove(...) in the loop, instead of making a new Set instance each time?
I'll have to check the semantics of mutating the Set while iterating, but yes will change.
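For what it's worth, removing entries while iterating the same Set with for...of is well-defined in JavaScript (a deleted entry simply isn't visited later), and the Set method is delete rather than remove. A minimal sketch, assuming the surrounding loop walks that.aliveWorkers and sends the metrics request to each worker:

for (const workerId of that.aliveWorkers) {
	const worker = cluster.workers[workerId];
	if (!worker || !worker.isConnected()) {
		// Mutate in place instead of rebuilding a new Set afterwards.
		that.aliveWorkers.delete(workerId);
		continue;
	}
	worker.send(message);
}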
* Options are:
*   coordinated: If false (default), request metrics from all cluster workers. If true, request metrics only from workers that have required prom-client.
* @param {object?} options object
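As a usage illustration only, assuming the option is accepted by the AggregatorRegistry constructor, which this JSDoc appears to document:

const { AggregatorRegistry } = require('prom-client');

// coordinated: true -> only workers that registered themselves (i.e. required
// prom-client) are asked for metrics; false (the default) keeps the old
// behaviour of asking every cluster worker.
const aggregatorRegistry = new AggregatorRegistry({ coordinated: true });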
Is the only reason this needs to be an option because setting it to true makes the order of forking vs. new AggregatorRegistry() matter? I'd rather go for an implementation that isn't sensitive to that, e.g. the worker repeatedly attempts to register with the master until the master acks the registration.
Having a heterogeneous pool of workers (not all of them setting up prom-client) is unusual, but I think the behavior achieved when this is true is what should happen by default.
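A rough sketch of that worker-side retry, just to gauge how much code it would add; the message type names, interval, and timeout are made-up placeholders:

// Worker side: announce ourselves repeatedly until the master acks, so the
// result no longer depends on whether the master created the
// AggregatorRegistry before or after forking this worker.
function registerWithMaster(intervalMs = 100, timeoutMs = 5000) {
	const timer = setInterval(
		() => process.send({ type: 'prom-client:register' }),
		intervalMs
	);
	const giveUp = setTimeout(() => clearInterval(timer), timeoutMs);

	process.on('message', message => {
		if (message && message.type === 'prom-client:register-ack') {
			clearInterval(timer);
			clearTimeout(giveUp);
		}
	});
}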
Yes, the only reason to make it an option is that making it the default would break backwards compatibility.
Given that a metrics client should be as unobtrusive as possible, I would be wary of adding a prolonged discovery phase. In my consuming codebase I would certainly prefer to keep things simple and accept that ordering matters.
In the case of a homogeneous cluster, the previous implementation cleanly sidesteps a lot of the issues with “garbage collecting” the workers, and nobody else has complained so far :)
	help: 'Number of connected cluster workers reporting to prometheus',
	registers: [registry]
});
g.set(request.workerCount);
Noted; however, this is very specific to the coordinated option, so perhaps it should be exposed only when coordinated is true. Perhaps a similar metric that counts all the cluster workers could be added to the default metrics instead?
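If that default metric were wanted, a sketch of counting every forked worker on the master, independent of the coordinated option; the metric name and the polling interval are assumptions:

const cluster = require('cluster');
const { Gauge, register } = require('prom-client');

// Counts all forked workers, whether or not they report metrics themselves,
// in contrast to the coordinated-only gauge shown in the diff above.
const clusterWorkerCount = new Gauge({
	name: 'nodejs_cluster_worker_count', // hypothetical name
	help: 'Number of cluster workers currently forked',
	registers: [register]
});

setInterval(() => {
	clusterWorkerCount.set(Object.keys(cluster.workers).length);
}, 10000).unref();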
(Sorry for the delay, busy week. Will try to send more feedback later today.)
Any news here?
Sorry to drop the ball. I'm sorta hesitant to add the complexity to support an unusual use case (heterogeneous workers in a cluster), but this approach works. My preference would be a version that isn't sensitive to the order of new AggregatorRegistry() vs. cluster.fork(). The PR also still needs some formatting/linting issues addressed.
d6f2cf6 to 29b6d86
This is a PR related to #181, as a starting point for discussion.
Tests don't pass yet as I used spaces instead of tabs :)