Added metrics for master in cluster mode #324

dfilkovi · 2020-01-29T16:50:32Z

In cluster mode, only metrics from workers are aggregated, and if you use metrics i master also this isn't shown in aggregated metrics. This fix adds master metrics also.

zbjornson

Hrm, cluster aggregation was designed to aggregate homogenous data. For example, this will skew nodejs_eventloop_lag_seconds, which is averaged. I guess if you're not collecting those default metrics in the master, then that won't affect the aggregation. The cluster master also isn't generally doing much work--I'm curious: what are you monitoring? At the least, I would note the above (regarding skewing when aggregating the same metrics from the master and workers) in the readme.

Also needs the changelog updated and ideally a test.

zbjornson · 2020-01-30T23:04:20Z

lib/cluster.js

-					if (request.failed) return; // Callback already run with Error.
+                    if (request.failed) return; // Callback already run with Error.
+
+                    request.responses.push(Registry.globalRegistry.getMetricsAsJSON()); //add metrics of master


Can you correct the indentation here please?

dfilkovi · 2020-02-15T11:23:16Z

I get it, maybe then this isn't the best option.
Maybe to have an option to turn this on/off because there are many cases in which you want to get master metrics, let's say for example you are receiving some messages from kafka/rabbitmq to your node process and you are passing this messages to clients connected to workers but still want to save them to database, and you could use master process for that and workers for clients.

zbjornson · 2021-01-23T18:19:30Z

Going to close this for now, but if you want to update it so it's configurable and documented, happy to reconsider. #280 is a more versatile solution to the same issue however.

KeithWoods · 2022-01-04T06:56:03Z

@zbjornson you mentioned this fix would "this will skew nodejs_eventloop_lag_seconds, which is averaged" - is that because each child, by default, doesn't have a pid (process id) label so the master will effectively aggregate/average nodejs_eventloop_lag_seconds across all worker processes as from a prometheus POV it looks like the same metric, thus resulting in a possible skew as the master is generally quite?

If so, depending on the use case the default averaging might not be ideal, however for backwards compatibility #280 would be less breaking and still flexible enough.

For my case, the master is effectively a router, all workers are similar but may have different metric values such as lag depending on their workload, so I've added a pid to each worker. I've made a local fix which effectively is the love child of this pr and #280. This looks to avoid the averaging issue.

nodejs_eventloop_lag_seconds{pid="11215"} 0.002395987  // worker
nodejs_eventloop_lag_seconds{pid="11216"} 0.002386548 // worker
nodejs_eventloop_lag_seconds{pid="11213"} 0.004258793 // master

I've also popped some comments on #280 , I might be able to pick that up and add tests. Still interested in the above question to clarify my understanding (if you have the time :) ).

Added metrics for master in cluster mode

5625712

zbjornson requested changes Jan 30, 2020

View reviewed changes

zbjornson mentioned this pull request Feb 24, 2020

Added opportunity add any registry to merge with clusters #280

Open

zbjornson closed this Jan 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added metrics for master in cluster mode #324

Added metrics for master in cluster mode #324

dfilkovi commented Jan 29, 2020

zbjornson left a comment

zbjornson Jan 30, 2020

dfilkovi commented Feb 15, 2020

zbjornson commented Jan 23, 2021

KeithWoods commented Jan 4, 2022

Added metrics for master in cluster mode #324

Added metrics for master in cluster mode #324

Conversation

dfilkovi commented Jan 29, 2020

zbjornson left a comment

Choose a reason for hiding this comment

zbjornson Jan 30, 2020

Choose a reason for hiding this comment

dfilkovi commented Feb 15, 2020

zbjornson commented Jan 23, 2021

KeithWoods commented Jan 4, 2022