Kubernetes Storage. Rook. The Boss Fight. Still a bit messy. But it works. Most of the time.
There must be a reason Red Hat OpenShift Data Foundation is expensive ...
Now seriously: Storage is one of the most critical bits in general. Many workloads are stateful, and not every Kubernetes infrastructure solves the problem nicely. That is where I found myself a few times in the past. We were given virtual machines with basic disks attached - VMware VMDKs in my case. Customers demanded ... you name it - everything: RWX/RWO volumes, S3, snapshots, backup/recovery - superfast and always available. The code reflects these roots.
Disclaimer: We started by borrowing proven things from the Rook project and adapted them as we went along.
Demo creating a Minikube cluster and running a few tests 🪄🎩🐰
make apply-r00ki-aio test-csi-io test-csi-snapshot test-velero
- Awesome local first Rook Ceph Dev Experience
- First Class Observability
- Fail early and loud (Notifications)
- Simplicity (yes, really)
- Composability
- Target minikube, vanilla Kubernetes and OpenShift.
- Add the Rook Ops bits not covered by the Operator
- Declarative trumps Imperative
- ArgoCD is great, but helmfile appears even better for our use case
- We aim for first class citizens. For Rook, it's the helm charts; for some operators, it's OLM Subscriptions.
We cover:
- Single (All in One Cluster) Deployments targeting minikube and Production Kubernetes (including OpenShift)
- Two Cluster Deployments (Service and Consumer) targeting minikube and Production Kubernetes (including OpenShift)
- Kube-Prometheus bits all wired up - including alerts
- Shiny Dashboards (including Grafana)
- Seamless integration with ArgoCD, specifically deas/argcocd-conductor
Some opinions first:
- Ceph is complex
- Automating Trust Relationships is hard
make
minikube
kubectl
helmfile
Running make shows help for basic tasks and gives you an idea of where to start.
We want the lifecycle of things (create/destroy) to be as fast as possible. We ship support for leveraging registry mirrors via pull-through caching.
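A minimal sketch of the pull-through idea, not the project's actual make targets. It assumes Docker is available on the host and uses the stock registry:2 image in proxy mode. The names MIRROR_NAME, MIRROR_PORT, MIRROR_HOST and DRY_RUN, and their defaults, are made up for illustration. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env sh
# Sketch only: names and defaults below are assumptions, not r00ki settings.
MIRROR_NAME="${MIRROR_NAME:-registry-mirror}"   # hypothetical container name
MIRROR_PORT="${MIRROR_PORT:-5000}"              # hypothetical host port
MIRROR_HOST="${MIRROR_HOST:-192.168.122.1}"     # assumed host address reachable from the minikube VM
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Start a pull-through cache for Docker Hub using the registry image's proxy mode.
start_mirror() {
  run docker run -d --restart=always --name "$MIRROR_NAME" \
    -p "${MIRROR_PORT}:5000" \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
}

# Start minikube so that image pulls try the mirror first.
start_minikube() {
  run minikube start --registry-mirror="http://${MIRROR_HOST}:${MIRROR_PORT}"
}

start_mirror
start_minikube
```

Run it with DRY_RUN=0 to actually start the mirror and the cluster. Since the cache lives outside the cluster, it survives repeated create/destroy cycles.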
- Use dyff to separate out value files?
- Separate out Observability, add Logging and Alerting
- Support for Mon v2
- Support for TLS/encryption
- Replace imperative bits by declarative ones
- Introduce Pentesting - maybe even Chaos Scenarios
- Improve Observability / Include Alerts
- Smoketests in CI
- Clean up bits around TODO tags sprinkled across the code
- Use LVM instead of raw disks/partitions?
- Performance: How/When do multiple disks per node make sense?
- Exercise Upgrade/Recreate and Disaster Recovery + build tests
- Introduce unhappy path tests - likely leveraging Litmus
- Proper cascaded removal of CephCluster?
- Finding/cleaning up orphans (volumes or buckets)
- Go deeper with nix/devenv - maybe even replace mise
- "To sum up: the Docker daemon does not currently support multiple registry mirrors ..." -> minikube start --registry-mirror="http://yourmirror"
- DNS via the kvm network's dnsmasq is slow from inside minikube Kubernetes and times out for S3. Patching CoreDNS gets around the issue.
- mons on port 3300 (workaround: use port 6789 / ROOK_EXTERNAL_CEPH_MON_DATA):
  2024-12-16T16:56:02.784+0000 7fd593d1c000 -1 failed for service _ceph-mon._tcp
  mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized
  Warning FailedMount 2m25s kubelet (combined from similar events): MountVolume.MountDevice failed for volume "pvc-026c86e8-9ee4-4261-a7e4-083011b80494" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 192.168.122.231:3300:/volumes/csi/csi-vol-7072e90c-5d6b-477b-bbab-655b76d0425f/e8d828a3-a1ad-4a22-9b36-7d5bc9fe9026 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/f172f41f387d01c38f46e71a4097304d70c35494e81e1c8a070549de56234790/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-2436134297,mds_namespace=myfs,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
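The port 3300 workaround amounts to handing the consumer cluster v1 monitor endpoints (port 6789) instead of v2 ones. A minimal sketch, assuming ROOK_EXTERNAL_CEPH_MON_DATA holds comma-separated name=ip:port pairs as used by Rook's external cluster import script. The helper name mon_data_v1 and the monitor name "a" are hypothetical; the IP is the one from the mount error.

```shell
#!/usr/bin/env sh
# Rewrite msgr2 monitor endpoints (port 3300) to msgr1 (port 6789).
# Assumed input format: "name=ip:port[,name=ip:port...]".
mon_data_v1() {
  printf '%s\n' "$1" | sed -E 's/:3300(,|$)/:6789\1/g'
}

# Sample value: the monitor name "a" is made up, the IP comes from the mount error.
ROOK_EXTERNAL_CEPH_MON_DATA="$(mon_data_v1 'a=192.168.122.231:3300')"
export ROOK_EXTERNAL_CEPH_MON_DATA
echo "$ROOK_EXTERNAL_CEPH_MON_DATA"
```

Only a port that closes an endpoint is rewritten, so an address that merely contains the digits 3300 elsewhere stays untouched.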
- Looking up Monitors through DNS
- OperatorHub Sub Outdated - at 1.1.1
- Monitor OpenShift Virtualization using user-defined projects and Grafana
- How to create a long lived service account token in RHOCP4
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.