
Any chance for replication factor > 1 support? #9

Open
nbrownus opened this issue Nov 2, 2018 · 7 comments

Comments

@nbrownus

nbrownus commented Nov 2, 2018

So happy to see someone has picked up the project. Wondering if supporting replication is on the roadmap.

@azhiltsov
Member

We currently have no plans for it, as we stopped using a replication factor quite a while ago. Instead we run a setup where carbon-c-relay creates multiple copies of the same metrics. This gives us more predictable placement on the stores, and a more predictable failure scenario when more than one store fails. Another advantage is back-filling data after a failure: with replication factor = 1 and equally sized rings, it is easy to find the source host for a back-fill.

@aptituz

aptituz commented May 10, 2019

@azhiltsov That sounds interesting. Could you elaborate a bit on your setup? Do you have multiple carbon clusters for that copy-based "replication"?

@Civil
Member

Civil commented May 10, 2019

https://fosdem.org/2017/schedule/event/graphite_at_scale/ - there is more information there, but the TL;DR:

  1. With replication factor 1 and duplication at the relay level, you have a 1-to-1 mapping of hosts. You will always be able to easily find a metric.
  2. It's basically a trade-off between how much data you lose and the probability of losing it. With RF > 1, you will always lose some data when 2 servers fail. It will be a small portion of the data, but still something. With RF = 1 but 2 clusters holding the same data, the probability of losing data is lower (it's a conditional probability), but if it does happen, the amount of lost data is higher.

You can think of it as RAID 5 vs. RAID 10, with the exception that if you lose n+1 servers when you have RF = n, you don't lose all the data, only a portion of it.
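The trade-off above can be checked numerically. A minimal sketch, assuming 4 stores, uniformly random metric placement, and exhaustive enumeration of every 2-store failure (this is an illustration of the probabilities, not carbon-c-relay's actual hashing):

```python
import random
from itertools import combinations

random.seed(42)
M, N = 10000, 4  # metrics, stores

# Scenario A: one ring of 4 stores with RF=2 -- each metric has 2 replicas.
rf2 = [tuple(random.sample(range(N), 2)) for _ in range(M)]
# Scenario B: two clusters of 2 stores with RF=1 -- one copy per cluster.
dup = [(random.randrange(2), 2 + random.randrange(2)) for _ in range(M)]

def loss_stats(placement):
    """Over every possible 2-store failure, report (probability that any
    data is lost, average fraction lost given that some loss occurred)."""
    fracs = [sum(set(p) <= set(f) for p in placement) / M
             for f in combinations(range(N), 2)]
    lossy = [x for x in fracs if x > 0]
    return len(lossy) / len(fracs), sum(lossy) / len(lossy)

p_a, frac_a = loss_stats(rf2)  # RF=2: every 2-store failure loses a little
p_b, frac_b = loss_stats(dup)  # duplication: loss is rarer, but bigger
print(f"RF=2:        P(loss)={p_a:.2f}, lost fraction={frac_a:.3f}")
print(f"duplication: P(loss)={p_b:.2f}, lost fraction={frac_b:.3f}")
```

With RF=2, every possible 2-store failure loses the metrics whose replica pair is exactly the failed pair; with duplication, a same-cluster double failure loses nothing, but a cross-cluster one loses a larger slice.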

@azhiltsov
Member

@aptituz our carbon-c-relay config snippet would look like this:

cluster carbon_a
    jump_fnv1a_ch replication 1
        10.19.19.24:2003=000
        10.19.20.34:2003=001
    ;
cluster carbon_b
    jump_fnv1a_ch replication 1
        10.18.19.4:2003=000
        10.18.20.4:2003=001
    ;
match ^carbon\.
    send to
        carbon_a
        carbon_b
    stop;

The 'match' section above is what actually creates the two replicas.
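To illustrate the 1-to-1 placement this buys you, here is a rough Python sketch of 64-bit FNV-1a feeding Lamping-Veach jump consistent hashing, the family of functions `jump_fnv1a_ch` names. Carbon-c-relay's exact key construction may differ, so treat this as an illustration, not its real placement:

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash of a metric name."""
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def jump_hash(key: int, num_buckets: int) -> int:
    """Lamping-Veach jump consistent hash: key -> bucket index."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b

carbon_a = ["10.19.19.24:2003", "10.19.20.34:2003"]
carbon_b = ["10.18.19.4:2003", "10.18.20.4:2003"]

metric = b"carbon.agents.host1.cpu"
idx = jump_hash(fnv1a_64(metric), 2)
# Equal ring sizes mean the metric lands at the same ring index in both
# clusters, so if its store in carbon_a dies, the back-fill source is
# simply carbon_b[idx].
print(metric.decode(), "->", carbon_a[idx], "and", carbon_b[idx])
```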

@aptituz

aptituz commented May 14, 2019

And how is this handled on the other side of the stack? Is graphite-web/carbonapi/carbonzipper or whatever is used to query the data just configured to use the hosts of both clusters?

@Civil
Member

Civil commented May 14, 2019

You just query all the servers from carbonzipper.
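Concretely, the backend list could simply enumerate every store from both clusters. A hypothetical sketch for the setup above (the `backends` key follows carbonzipper's example config, but the port and layout here are assumptions for illustration):

```yaml
backends:
    - "http://10.19.19.24:8080"
    - "http://10.19.20.34:8080"
    - "http://10.18.19.4:8080"
    - "http://10.18.20.4:8080"
```

carbonzipper fans each query out to all backends and merges the replies, so whichever copy survives gets served.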

@aptituz

aptituz commented May 14, 2019

Thanks! :)
