Support balancing amongst multiple Clickhouse shards (multiple DSNs in config) #189
Comments
I didn't get the data-flow architecture. Does it look like nginx -> otel-collectors -> ClickHouse? You should not need to load-balance ingestion to ClickHouse shards; a distributed ClickHouse cluster does that itself using a shard key. You should just need to declare multiple shards in the ClickHouse cluster config, and on k8s it is even simpler. I don't see the need to load-balance via nginx.

Are you trying to create a multi-region ClickHouse cluster and ingest data into the shards of the applications' region? Or do you have multiple ClickHouse clusters? Please share more details on the use case and the need, along with the data flow, to help us understand more. Also, share the infra where you are running SigNoz.
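For context on the "shard key" point above, this is roughly what declaring shards looks like on the ClickHouse side: a local table on every shard, plus a `Distributed` table that spreads inserts across shards by a sharding expression. This is a minimal hypothetical sketch, not SigNoz's actual schema; the cluster name `my_cluster`, database `telemetry`, and columns are all illustrative.

```sql
-- Local storage table, created on every node of the (hypothetical) cluster.
CREATE TABLE telemetry.spans_local ON CLUSTER my_cluster
(
    trace_id  String,
    ts        DateTime64(9),
    payload   String
)
ENGINE = MergeTree
ORDER BY (trace_id, ts);

-- "Router" table: the Distributed engine forwards each inserted row to a
-- shard chosen by the sharding expression (here, a hash of trace_id), so
-- writes sent to any single node get spread across the whole cluster.
CREATE TABLE telemetry.spans ON CLUSTER my_cluster
AS telemetry.spans_local
ENGINE = Distributed(my_cluster, telemetry, spans_local, cityHash64(trace_id));
```

With this in place, an ingestor only needs to write to `telemetry.spans` on any reachable node; ClickHouse handles the fan-out to shards.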
@srikanthccv Do you have context on this?
Hi! Happy to try and answer questions as I understand them, although I don't have the entire picture myself.

I want the usual things: mostly resilience against failure of a single node and/or maintenance downtime; avoiding write amplification, especially in cascading failure states (since this is telemetry data, I'll admit to a slight fear of backpressure causing the "problem-solving tool" to turn into a "problem-exacerbating tool" once everything's on fire); and query performance (we don't want some percentage of reads hitting a node that's overloaded with all of the entire cluster's writes). Also, we're not really sure exactly how much data we're going to want to ingest yet, but we're sure it will be A Lot, so there might be throughput limitations on a single node.

To be fair, we're entirely new to storing telemetry in ClickHouse; this is still an experimental foray. Our ClickHouse expertise lies in an entirely different dataset, with different constraints. Perhaps this is silly premature optimization, but it's also something worrying enough to us (and with which we have enough experience in another domain) to be a bit careful.

Arch questions: currently one cluster of ~80 nodes dedicated to telemetry work, all in a single DC; I doubt we'll stand up another.

Data-flow: For the moment, due to #156, we're playing with 5 non-Signoz
So, all very messy, and somewhat temporary; we're still feeling this technology out right now. 😓 Hope that answers your questions!
Just following up on this; let me know y'all's thoughts. (=
@ELLIOTTCABLE just to make sure I follow you: you have a big cluster with tens of nodes and want to balance the ingestion load across the nodes, as in not overload one node with the work of receiving and distributing the data to the other shards/servers. I am highlighting that to avoid any confusion with the data balancing that happens with the

I would prefer not to have the balancing logic in the exporter, mainly because
Adding HTTP protocol support should be fairly straightforward, and I think that is the best choice for supporting the use case.
We'd like to balance writes from the SigNoz ingestor to multiple Clickhouse shards.
Our preferred approach is to use the HTTP ClickHouse protocol behind NginX (see #188); but another alternative would be to explicitly balance in the clickhousetracesexporter across multiple ClickHouse-TCP connections.
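For the nginx-fronted HTTP approach described above, the load-balancing layer can be plain nginx config, since ClickHouse's HTTP interface listens on an ordinary port (8123 by default). The sketch below is hypothetical: the upstream name, shard hostnames, and tuning values are illustrative, not anything from the SigNoz deployment.

```nginx
# Hypothetical sketch: balance ClickHouse HTTP (default port 8123)
# across three shard hosts. Hostnames and timeouts are examples only.
upstream clickhouse_http {
    least_conn;                                        # prefer the least-busy shard
    server ch-shard-1:8123 max_fails=2 fail_timeout=10s;
    server ch-shard-2:8123 max_fails=2 fail_timeout=10s;
    server ch-shard-3:8123 max_fails=2 fail_timeout=10s;
}

server {
    listen 8123;
    location / {
        proxy_pass http://clickhouse_http;
        proxy_http_version 1.1;
    }
}
```

The exporter would then point its single HTTP DSN at the nginx listener, and a failed shard is taken out of rotation by `max_fails`/`fail_timeout` rather than by logic inside the exporter.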