Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added metrics for master in cluster mode #324

Closed
wants to merge 1 commit into from

Conversation

dfilkovi
Copy link

In cluster mode, only metrics from workers are aggregated, and if you use metrics i master also this isn't shown in aggregated metrics. This fix adds master metrics also.

Copy link
Collaborator

@zbjornson zbjornson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, cluster aggregation was designed to aggregate homogenous data. For example, this will skew nodejs_eventloop_lag_seconds, which is averaged. I guess if you're not collecting those default metrics in the master, then that won't affect the aggregation. The cluster master also isn't generally doing much work--I'm curious: what are you monitoring? At the least, I would note the above (regarding skewing when aggregating the same metrics from the master and workers) in the readme.

Also needs the changelog updated and ideally a test.

if (request.failed) return; // Callback already run with Error.
if (request.failed) return; // Callback already run with Error.

request.responses.push(Registry.globalRegistry.getMetricsAsJSON()); //add metrics of master
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you correct the indentation here please?

@dfilkovi
Copy link
Author

I get it, maybe then this isn't the best option.
Maybe to have an option to turn this on/off because there are many cases in which you want to get master metrics, let's say for example you are receiving some messages from kafka/rabbitmq to your node process and you are passing this messages to clients connected to workers but still want to save them to database, and you could use master process for that and workers for clients.

@zbjornson
Copy link
Collaborator

Going to close this for now, but if you want to update it so it's configurable and documented, happy to reconsider. #280 is a more versatile solution to the same issue however.

@zbjornson zbjornson closed this Jan 23, 2021
@KeithWoods
Copy link

@zbjornson you mentioned this fix would "this will skew nodejs_eventloop_lag_seconds, which is averaged" - is that because each child, by default, doesn't have a pid (process id) label so the master will effectively aggregate/average nodejs_eventloop_lag_seconds across all worker processes as from a prometheus POV it looks like the same metric, thus resulting in a possible skew as the master is generally quite?

If so, depending on the use case the default averaging might not be ideal, however for backwards compatibility #280 would be less breaking and still flexible enough.

For my case, the master is effectively a router, all workers are similar but may have different metric values such as lag depending on their workload, so I've added a pid to each worker. I've made a local fix which effectively is the love child of this pr and #280. This looks to avoid the averaging issue.

nodejs_eventloop_lag_seconds{pid="11215"} 0.002395987  // worker
nodejs_eventloop_lag_seconds{pid="11216"} 0.002386548 // worker
nodejs_eventloop_lag_seconds{pid="11213"} 0.004258793 // master

I've also popped some comments on #280 , I might be able to pick that up and add tests. Still interested in the above question to clarify my understanding (if you have the time :) ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants