How is etcd important for coreos setup? #658
Comments
At the moment, there isn't much dependency on etcd. The only role it really plays right now is in OS upgrades: the cluster takes an etcd lock to ensure that not all machines are rebooted at once. In the future we could do more dynamic bootstrapping of the machines, allowing them to discover each other better (rather than relying on Ansible host/group variables to bootstrap the cluster at launch time). When we ported the Ansible config over to CoreOS, we deliberately didn't go the whole hog on making everything fully dynamic, just to try and ease the transition. We'd be up for the discussion; it depends how much we want to couple the project to CoreOS as the only place we can deploy to.
Hmm. I'd vote for removing the dynamic part from the core setup and focusing on the dynamic aspect of the Mesos cluster. Rolling updates can probably be handled well at the Terraform level (or even with a script that taints and redeploys the master machines one by one, including a graceful Mesos shutdown). I find that less magical and more explicit than a rolling update in the etcd cluster. All in all, instead of handling discoverability in two places (etcd, Consul) and fleet setup in two places (fleetd, Mesos), I'd focus on Consul and Mesos and make the base setup more static. In particular, I can see value in deploying slaves with CentOS 7 or Ubuntu 16.04 LTS as the base system (both support systemd).
I think the only requirements for the host system for Apollo should be: network, storage (like GlusterFS or Ceph), systemd, and Docker support. All the rest can be handled with proper Docker services in systemd, no?
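To make the "taint and redeploy one by one" idea concrete, here is a minimal sketch that only builds the ordered list of Terraform commands, without running them. The resource type and name (`aws_instance`, `mesos_master`) are hypothetical and not taken from the Apollo repo, and the `type.name[index]` address form is the one used by recent Terraform releases; older releases addressed counted resources differently. The graceful Mesos shutdown is left as a placeholder because the thread does not say how it would be done.

```python
def rolling_redeploy_plan(resource_type, resource_name, count):
    """Return the ordered commands for redeploying `count` instances one at a time."""
    plan = []
    for index in range(count):
        # Hypothetical resource address; adjust to the real Terraform config.
        address = f"{resource_type}.{resource_name}[{index}]"
        # Placeholder: a graceful Mesos shutdown for this node would go here.
        plan.append(["terraform", "taint", address])
        # Recreate only the tainted instance before moving on to the next one.
        plan.append(["terraform", "apply", f"-target={address}"])
    return plan


if __name__ == "__main__":
    for command in rolling_redeploy_plan("aws_instance", "mesos_master", 3):
        print(" ".join(command))
```

Keeping the plan as data makes it easy to review before anything is executed, which fits the "less magical and more explicit" goal.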
We currently support rolling updates for different parts of the platform via two Ansible playbooks
Well, I don't see how etcd and fleetd play a role in bootstrapping the cluster. There are two levels of clustering here:
According to the CoreOS documentation, "The update service is an optional hosted service provided by CoreOS and is not included in a standard CoreOS cluster", and it starts at $995/month, which is a pretty sad requirement for an open source product. Of course you could set up something open source like CoreRoller, but it's not an officially supported solution, which makes it questionable for production deployment. On the other hand, tainting instances in Terraform and redeploying them serially with a script is a far more straightforward solution, and one you could potentially use alongside the Ansible scripts that update containers. Other than the CoreUpdate feature, you don't use etcd and fleetd for anything. They just expose additional complexity to handle, so why bother?
OK, I see etcd is only used in one place: to bootstrap the Consul cluster in https://github.com/Capgemini/Apollo/blob/devel/roles/consul/templates/consul-discovery.service.j2 But you already have this information (the IP addresses of the master nodes) from the Terraform output. You don't need etcd to discover the initial set of servers. Even if you don't want to use them, there's a free discovery service hosted by Atlas, as described in https://www.consul.io/docs/guides/bootstrapping.html
You are right, we don't need fleet. I will raise a ticket to disable this.
I think DigitalOcean automatically updates the coreos-stable channel, and you can just explicitly update the AMI on AWS if you decide to upgrade. As far as initial bootstrapping goes, I'd just use
Can't see the use case for fleet at the minute
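As a rough illustration of bootstrapping Consul from a static list of master IPs instead of etcd, the sketch below builds `consul agent` arguments for one server. The output name `master_ips` is an assumption, not something defined in the Apollo repo, and the JSON shape (each output wrapped in an object with a `value` field) is what `terraform output -json` is assumed to produce. The flags used (`-server`, `-bootstrap-expect`, `-advertise`, `-retry-join`) are standard Consul agent options.

```python
import json


def master_ips_from_terraform(output_json, key="master_ips"):
    """Extract the master IP list from `terraform output -json` text.

    `key` is a hypothetical output name; use whatever the Terraform config exports.
    """
    outputs = json.loads(output_json)
    return outputs[key]["value"]


def consul_server_args(master_ips, node_ip):
    """Build `consul agent` arguments for the server running on `node_ip`."""
    args = [
        "agent",
        "-server",
        # Wait for all known masters before electing a leader.
        f"-bootstrap-expect={len(master_ips)}",
        f"-advertise={node_ip}",
    ]
    # Join every other master; retry until they become reachable.
    for ip in master_ips:
        if ip != node_ip:
            args.append(f"-retry-join={ip}")
    return args


if __name__ == "__main__":
    sample = '{"master_ips": {"value": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}}'
    ips = master_ips_from_terraform(sample)
    print("consul " + " ".join(consul_server_args(ips, "10.0.0.1")))
```

The resulting argument list could be rendered into the systemd unit at provisioning time, so no discovery service is needed for the initial set of servers.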
Raised #659 to turn off fleet; agreed, it's not necessary. I don't think there is a simple answer (at the moment) on the etcd issue, as it could depend on how we would like to do autoscaling of instances in the future. I would agree that at the moment it's not that necessary, but I'm not fully sold (yet) on the idea of handling the updates solely via terraform taint etc. If we were to do much more dynamic autoscaling of instances, etcd could come into its own, because we could dynamically adjust and configure the cluster at scale time without needing to interact with an external service (e.g. Ansible). It might be worth exploring how we would tackle autoscaling of instances first, before deciding whether to ditch etcd. It could turn out that autoscaling is really easy by hooking straight into a launch config with Ansible, and we don't need etcd. In that case I'd be in favour of removing it for simplicity. If we were to keep etcd, we should move to the "production cluster" setup, where consensus is achieved by the "master" node set (3/5 servers) and each slave is an etcd worker.
What about an alternative approach to deploying: instead of using Ansible to provision hosts, just generate
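For context on the "3/5 servers" figure: a consensus cluster needs a majority of its voting members to agree, so the choice of 3 or 5 masters determines how many can fail. A small sketch of that arithmetic (the function names are illustrative only):

```python
def quorum(servers):
    """Smallest majority of a cluster with `servers` voting members."""
    return servers // 2 + 1


def failures_tolerated(servers):
    """How many voting members can be lost while a majority remains."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} servers: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

With 3 masters the cluster survives one failure, and with 5 it survives two.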
Hey,
I see Apollo uses Consul for service discovery and the services are deployed with Ansible. How important, then, are the initial etcd cluster and the fleet daemon? Isn't the only role of CoreOS currently to provide a consistent base system with systemd support? Can we just disable etcd and fleet?