
feat(influxdb): Use stable/influxdb chart #185

Closed

Conversation

jchauncey
Member

@jchauncey jchauncey commented Mar 24, 2017

closes #180

  • This moves to using the influxdb chart in the stable repository
  • Sets the version to 0.4.0
  • Adds two pre-upgrade hooks to the base monitor chart. These hooks allow the chart to migrate data from the old volume mount location to the new one provided by the influxdb chart.
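The hook wiring described above could look roughly like the following. This is a sketch only, not the actual chart source: the job name matches the `migrate-data` pods seen during upgrade, but the image, command, and exact mount paths are assumptions.

```yaml
# Hypothetical sketch of a pre-upgrade hook Job. The image, command,
# and mount details are illustrative; only the /data -> /var/lib/influxdb
# migration direction is taken from the testing steps below.
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-data
  annotations:
    "helm.sh/hook": pre-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate-data
        image: alpine:3.5
        # copy data from the old volume mount location to the new one
        command: ["sh", "-c", "cp -a /data/. /var/lib/influxdb/data/"]
```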

Manual Testing Steps:

Notes

  • This PR does not allow for upgrading from no persistence to persistence.
  • If the influxdb pod gets stuck in a ContainerCreating state, it is usually because the pod was scheduled onto a different node than the one that originally hosted the PVC. If you delete the pod it will be rescheduled and should start normally. I am going to work on a hook that does this automatically once the majority of this PR is done.

Prereqs

  • The logger-redis-cache secret (you can get this from the workflow chart)
  • The ability to create a persistent volume
  • Running at least version 2.3.0 of tiller (helm init --upgrade)

Test InfluxDB persistence Migration

Install master of deis/monitor

  • helm upgrade deis-monitor . --install --namespace deis --set influxdb.persistence.enabled=true
  • kubectl exec into the running influxdb pod. You will notice that all the data is stored at /data/*

Install this PR of deis/monitor

  • Clone this PR
  • Run a helm dependency build in charts/monitor
  • Run helm upgrade deis-monitor . --namespace deis --set influxdb.persistence.enabled=true
  • While this upgrade is taking place you can do a kubectl get pods --watch --namespace=deis and watch the kill-pods and migrate-data pods do their thing.
  • kubectl exec into the new running influxdb pod. You should now see data in /var/lib/influxdb/data/*
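For reference, the persistence flag passed via --set in the steps above is equivalent to this values-file fragment:

```yaml
# values fragment equivalent to --set influxdb.persistence.enabled=true
influxdb:
  persistence:
    enabled: true
```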

Test InfluxDB deployment resource is deleted after upgrade/install

We can no longer just turn off parts of the chart. Instead we must delete the deployment resource after install/upgrade. The biggest problem here is that the influxdb chart builds its resource names programmatically, so we rely on the convention that the deployment will be named {{ .Release.Name }}-influxdb.
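The deletion hook might be sketched as a post-install/post-upgrade Job along these lines. This is an illustration, not the chart's actual template: the job name and kubectl image are assumptions; only the hook type, the off-cluster condition, and the {{ .Release.Name }}-influxdb naming convention come from the PR description.

```yaml
# Hypothetical sketch of the deletion hook; job name and image are
# illustrative. Rendered only when influx is configured off-cluster.
{{- if eq .Values.global.influxdb_location "off-cluster" }}
apiVersion: batch/v1
kind: Job
metadata:
  name: delete-influxdb
  annotations:
    "helm.sh/hook": post-install,post-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: delete-influxdb
        image: lachlanevenson/k8s-kubectl
        # --ignore-not-found keeps the job from failing on a clean install
        # where the deployment never existed
        command: ["kubectl", "delete", "deployment",
                  "{{ .Release.Name }}-influxdb", "--ignore-not-found"]
{{- end }}
```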

Install

This is to validate that a clean install of this chart will do the right thing with off-cluster influx.

  • Start with a clean cluster
  • Run helm upgrade deis-monitor . --install --namespace deis --set global.influxdb_location=off-cluster,influxdb.url=http://some.other.url
  • Validate that influxdb is not running after install is finished
  • kubectl exec into a running telegraf pod. Check that config.toml has your influxdb url set.
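The two --set flags used above map to this values-file fragment, if you prefer passing a file with -f:

```yaml
# values fragment equivalent to
# --set global.influxdb_location=off-cluster,influxdb.url=http://some.other.url
global:
  influxdb_location: off-cluster
influxdb:
  url: http://some.other.url
```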

Upgrade

This is to validate that upgrading from an existing install (no matter its configuration) does the correct thing. Note that we kill the telegraf pods and let them restart so they pick up the new off-cluster influx configuration, in case the user was not using it previously.

  • Start with an existing monitor install (its state doesn't matter)
  • Run helm upgrade deis-monitor . --namespace deis --set global.influxdb_location=off-cluster,influxdb.url=http://some.other.url
  • Validate that influxdb is not running after upgrade is finished
  • kubectl exec into a running telegraf pod. Check that config.toml has your influxdb url set.

Test No Persistence -> No Persistence upgrade

Install master of deis/monitor

  • helm upgrade deis-monitor . --install --namespace deis

Install this PR of deis/monitor

  • Clone this PR
  • Run a helm dependency build in charts/monitor
  • Run helm upgrade deis-monitor . --namespace deis
  • Validate that all pods come up correctly and data is shown in grafana

Test older tiller vs New tiller

  • Make sure you already have a version of monitor installed since we are only testing the upgrade path here

Install a version older than 2.3.0 of tiller

  • Clone repo
  • git checkout a tag (2.2.3 for example)
  • make build
  • ./bin/tiller
  • Run helm upgrade deis-monitor . --namespace deis --host localhost:44134 --set influxdb.persistence.enabled=true
  • Validate that no upgrade hooks were run by doing kubectl get jobs
  • The upgrade should proceed and install fine (but it will not migrate the data). Since we have no way to stop the execution of the chart at this point, this is the best we can do. @mboersma is this ok?

@jchauncey jchauncey self-assigned this Mar 24, 2017
@deis-bot

@rimusz is a potential reviewer of this pull request based on my analysis of git blame information. Thanks @jchauncey!

@vdice
Member

vdice commented Mar 30, 2017

With respect to Test InfluxDB deployment resource is deleted after upgrade/install, all appears well (expectations met) with the following caveat: the helm upgrade command technically errors out and exits non-zero:

 $ helm upgrade deis-workflow workflow-pr/workflow --version v2.12.1-20170330220532-sha.77e675e --set global.influxdb_location=off-cluster
Release "deis-workflow" has been upgraded. Happy Helming!
Error: deployments.extensions "deis-workflow-influxdb" not found

 $ echo $?
1

Possible to avoid erroring out in this scenario?

@vdice
Member

vdice commented Mar 30, 2017

W/r/t Test InfluxDB persistence Migration:

I tested at the level of a full Workflow install, and although the expectations listed in the description are met (new volume locations for storing influxdb data), after the Workflow upgrade my grafana instance cannot locate the migrated data (dashboards are empty; grafana pod logs show 2017/03/30 22:38:07 http: proxy error: dial tcp 10.131.246.29:80: i/o timeout)

Here were my steps: https://gist.github.com/vdice/d0325647ad4136feb76fa8e9e0a0725e

@vdice
Member

vdice commented Mar 30, 2017

W/r/t Test existing installation with no persistence and upgrade to influxdb with persistence:

When attempting to upgrade, the expectations are not met; namely, the migrate-data job/pod does not finish (see below) and the upgrade fails:

 $ helm upgrade --wait deis-workflow workflow-pr/workflow --version v2.12.1-20170330220532-sha.77e675e -f values-only-influxdb-persistent.yaml
Error: UPGRADE FAILED: timed out waiting for the condition

 $ kd describe po migrate-data-5zzmh-msld4
...
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  2m		1s		11	{default-scheduler }			Warning		FailedScheduling	[SchedulerPredicates failed due to persistentvolumeclaims "deis-monitor-influxdb" not found, which is unexpected., SchedulerPredicates failed due to persistentvolumeclaims "deis-monitor-influxdb" not found, which is unexpected., SchedulerPredicates failed due to persistentvolumeclaims "deis-monitor-influxdb" not found, which is unexpected.]

@jchauncey jchauncey force-pushed the fix-180 branch 5 times, most recently from 7f55ec5 to c8eb29a Compare April 11, 2017 20:49
@jchauncey jchauncey added this to the v2.14 milestone Apr 13, 2017
@jchauncey
Member Author

This PR is waiting on an upstream change before it can be tested.

closes deis#180
* This moves to using the influxdb chart in the stable repository
* Sets the version to 0.4.0
* Adds two pre-upgrade hooks to the base monitor chart. These hooks allow the chart to migrate data from the old volume mount location to the new one provided by the influxdb chart.
* If `influxdb_location` is set to `off-cluster` it will delete the deployment resource using a post-install or post-upgrade hook.
@jchauncey
Member Author

With the k8s bug around binding persistent volumes on pods that have moved nodes I think I will just close this PR for now.

@jchauncey jchauncey closed this Apr 27, 2017
@jchauncey jchauncey removed this from the v2.14 milestone Apr 27, 2017