
feat(influxdb): Use stable/influxdb chart #185

Closed

Conversation

jchauncey
Member

@jchauncey jchauncey commented Mar 24, 2017

closes #180

  • This moves to using the influxdb chart in the stable repository
  • Sets the version to 0.4.0
  • Adds two pre-upgrade hooks to the base monitor chart. These hooks allow the chart to migrate data from the old volume mount location to the new one provided by the influxdb chart.
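The hook wiring described above could look roughly like the following. This is a sketch only, not the actual chart source: the job name matches the `migrate-data` pods seen during upgrade, but the image, command, and exact mount paths are assumptions.

```yaml
# Hypothetical sketch of a pre-upgrade hook Job. The image, command,
# and mount details are illustrative; only the /data -> /var/lib/influxdb
# migration direction is taken from the testing steps below.
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-data
  annotations:
    "helm.sh/hook": pre-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate-data
        image: alpine:3.5
        # copy data from the old volume mount location to the new one
        command: ["sh", "-c", "cp -a /data/. /var/lib/influxdb/data/"]
```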

Manual Testing Steps:

Notes

  • This PR does not allow for upgrading from no persistence to persistence.
  • If the influxdb pod gets stuck in a ContainerCreating state, it is usually because the pod was scheduled onto a different node than the one that originally hosted the PVC. If you delete the pod it will be rescheduled and should start normally. I am going to work on a hook that does this automatically once the majority of this PR is done.

Prereqs

  • The logger-redis-cache secret (you can get this from the workflow chart)
  • The ability to create a persistent volume
  • Running at least version 2.3.0 of tiller (helm init --upgrade)

Test InfluxDB persistence Migration

Install master of deis/monitor

  • helm upgrade deis-monitor . --install --namespace deis --set influxdb.persistence.enabled=true
  • kubectl exec into the running influxdb pod. You will notice that all the data is stored at /data/*

Install this PR of deis/monitor

  • Clone this PR
  • Run a helm dependency build in charts/monitor
  • Run helm upgrade deis-monitor . --namespace deis --set influxdb.persistence.enabled=true
  • While this upgrade is taking place you can do a kubectl get pods --watch --namespace=deis and watch the kill-pods and migrate-data pods do their thing.
  • kubectl exec into the new running influxdb pod. You should now see data in /var/lib/influxdb/data/*
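For reference, the persistence flag passed via --set in the steps above is equivalent to this values-file fragment:

```yaml
# values fragment equivalent to --set influxdb.persistence.enabled=true
influxdb:
  persistence:
    enabled: true
```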

Test InfluxDB deployment resource is deleted after upgrade/install

We can no longer just turn off parts of the chart. Instead we must delete the deployment resource after install/upgrade. The biggest problem here is that the influxdb chart builds its resource names programmatically, so we rely on the convention that the deployment will be named {{ .Release.Name }}-influxdb.
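The deletion hook might be sketched as a post-install/post-upgrade Job along these lines. This is an illustration, not the chart's actual template: the job name and kubectl image are assumptions; only the hook type, the off-cluster condition, and the {{ .Release.Name }}-influxdb naming convention come from the PR description.

```yaml
# Hypothetical sketch of the deletion hook; job name and image are
# illustrative. Rendered only when influx is configured off-cluster.
{{- if eq .Values.global.influxdb_location "off-cluster" }}
apiVersion: batch/v1
kind: Job
metadata:
  name: delete-influxdb
  annotations:
    "helm.sh/hook": post-install,post-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: delete-influxdb
        image: lachlanevenson/k8s-kubectl
        # --ignore-not-found keeps the job from failing on a clean install
        # where the deployment never existed
        command: ["kubectl", "delete", "deployment",
                  "{{ .Release.Name }}-influxdb", "--ignore-not-found"]
{{- end }}
```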

Install

This is to validate that a clean install of this chart will do the right thing with off-cluster influx.

  • Start with a clean cluster
  • Run helm upgrade deis-monitor . --install --namespace deis --set global.influxdb_location=off-cluster,influxdb.url=http://some.other.url
  • Validate that influxdb is not running after install is finished
  • kubectl exec into a running telegraf pod. Check that config.toml has your influxdb url set.
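The two --set flags used above map to this values-file fragment, if you prefer passing a file with -f:

```yaml
# values fragment equivalent to
# --set global.influxdb_location=off-cluster,influxdb.url=http://some.other.url
global:
  influxdb_location: off-cluster
influxdb:
  url: http://some.other.url
```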

Upgrade

This is to validate that upgrading from an existing install (no matter its configuration) does the correct thing. Note that we kill the telegraf pods and let them restart so they pick up the new off-cluster influx configuration, in case the user was not using it previously.

  • Start with an existing monitor install (its state doesn't matter)
  • Run helm upgrade deis-monitor . --namespace deis --set global.influxdb_location=off-cluster,influxdb.url=http://some.other.url
  • Validate that influxdb is not running after upgrade is finished
  • kubectl exec into a running telegraf pod. Check that config.toml has your influxdb url set.

Test No Persistence -> No Persistence upgrade

Install master of deis/monitor

  • helm upgrade deis-monitor . --install --namespace deis

Install this PR of deis/monitor

  • Clone this PR
  • Run a helm dependency build in charts/monitor
  • Run helm upgrade deis-monitor . --namespace deis
  • Validate that all pods come up correctly and data is shown in grafana

Test older tiller vs New tiller

  • Make sure you already have a version of monitor installed since we are only testing the upgrade path here

Install a version older than 2.3.0 of tiller

  • Clone repo
  • git checkout a tag (2.2.3 for example)
  • make build
  • ./bin/tiller
  • Run helm upgrade deis-monitor . --namespace deis --host localhost:44134 --set influxdb.persistence.enabled=true
  • Validate that no upgrade hooks were run by doing kubectl get jobs
  • The upgrade should proceed and install fine (but it will not migrate the data). Since we have no way to stop the execution of the chart at this point, this is the best we can do. @mboersma is this ok?

@jchauncey jchauncey self-assigned this Mar 24, 2017
@deis-bot

@rimusz is a potential reviewer of this pull request based on my analysis of git blame information. Thanks @jchauncey!

@vdice
Member

vdice commented Mar 30, 2017

With respect to Test InfluxDB deployment resource is deleted after upgrade/install, all appears well (expectations met) with the following caveat: the helm upgrade command technically errors out and exits non-zero:

 $ helm upgrade deis-workflow workflow-pr/workflow --version v2.12.1-20170330220532-sha.77e675e --set global.influxdb_location=off-cluster
Release "deis-workflow" has been upgraded. Happy Helming!
Error: deployments.extensions "deis-workflow-influxdb" not found

 $ echo $?
1

Possible to avoid erroring out in this scenario?

@vdice
Member

vdice commented Mar 30, 2017

W/r/t Test InfluxDB persistence Migration:

I tested at the level of a full Workflow install, and although the expectations listed in the description are met (new volume locations for storing influxdb data), after the Workflow upgrade my grafana instance cannot locate the migrated data (dashboards are empty; grafana pod logs show 2017/03/30 22:38:07 http: proxy error: dial tcp 10.131.246.29:80: i/o timeout)

Here were my steps: https://gist.github.com/vdice/d0325647ad4136feb76fa8e9e0a0725e

@vdice
Member

vdice commented Mar 30, 2017

W/r/t Test existing installation with no persistence and upgrade to influxdb with persistence:

When attempting to upgrade, the expectations are not met; namely, the migrate-data job/pod does not finish (see below) and the upgrade fails:

 $ helm upgrade --wait deis-workflow workflow-pr/workflow --version v2.12.1-20170330220532-sha.77e675e -f values-only-influxdb-persistent.yaml
Error: UPGRADE FAILED: timed out waiting for the condition

 $ kd describe po migrate-data-5zzmh-msld4
...
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  2m		1s		11	{default-scheduler }			Warning		FailedScheduling	[SchedulerPredicates failed due to persistentvolumeclaims "deis-monitor-influxdb" not found, which is unexpected., SchedulerPredicates failed due to persistentvolumeclaims "deis-monitor-influxdb" not found, which is unexpected., SchedulerPredicates failed due to persistentvolumeclaims "deis-monitor-influxdb" not found, which is unexpected.]

@jchauncey jchauncey force-pushed the fix-180 branch 5 times, most recently from 7f55ec5 to c8eb29a Compare April 11, 2017 20:49
@jchauncey jchauncey added this to the v2.14 milestone Apr 13, 2017
@jchauncey
Member Author

This PR is waiting on an upstream change before it can be tested.

closes deis#180
* This moves to using the influxdb chart in the stable repository
* Sets the version to 0.4.0
* Adds two pre-upgrade hooks to the base monitor chart. These hooks allow the chart to migrate data from the old volume mount location to the new one provided by the influxdb chart.
* If `influxdb_location` is set to `off-cluster` it will delete the deployment resource using a post-install or post-upgrade hook.
@jchauncey
Member Author

With the k8s bug around binding persistent volumes on pods that have moved nodes I think I will just close this PR for now.

@jchauncey jchauncey closed this Apr 27, 2017
@jchauncey jchauncey removed this from the v2.14 milestone Apr 27, 2017