Setting up Rook Ceph #350

Bpazy opened this issue Jan 3, 2025 · 3 comments

Bpazy commented Jan 3, 2025

One open question for the k8s cluster: how should storage for stateful applications be handled?

To solve this, I plan to set up a Ceph cluster: its CephFS provides a standard POSIX filesystem, and Rook-Ceph uses Kubernetes itself to simplify deploying Ceph, so I'll go with Rook-Ceph directly.


Bpazy commented Jan 8, 2025

1. Install the Rook Ceph operator first

$ helm repo add rook-release https://charts.rook.io/release
$ helm upgrade --install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph -f charts/rook-values.yaml

The contents of charts/rook-values.yaml are as follows (all values are the chart defaults):

# Default values for rook-ceph-operator
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

image:
  # -- Image
  repository: docker.io/rook/ceph
  # -- Image tag
  # @default -- `master`
  tag: v1.16.1
  # -- Image pull policy
  pullPolicy: IfNotPresent

crds:
  # -- Whether the helm chart should create and update the CRDs. If false, the CRDs must be
  # managed independently with deploy/examples/crds.yaml.
  # **WARNING** Only set during first deployment. If later disabled the cluster may be DESTROYED.
  # If the CRDs are deleted in this case, see
  # [the disaster recovery guide](https://rook.io/docs/rook/latest/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion)
  # to restore them.
  enabled: true

# -- Pod resource requests & limits
resources:
  limits:
    memory: 512Mi
  requests:
    cpu: 200m
    memory: 128Mi

# -- Kubernetes [`nodeSelector`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector) to add to the Deployment.
nodeSelector: 
# Constraint rook-ceph-operator Deployment to nodes with label `disktype: ssd`.
# For more info, see https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
#  disktype: ssd

# -- List of Kubernetes [`tolerations`](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to add to the Deployment.
tolerations:

# -- Delay to use for the `node.kubernetes.io/unreachable` pod failure toleration to override
# the Kubernetes default of 5 minutes
unreachableNodeTolerationSeconds: 5

# -- Whether the operator should watch cluster CRD in its own namespace or not
currentNamespaceOnly: false

# -- Pod annotations
annotations: {}

# -- Global log level for the operator.
# Options: `ERROR`, `WARNING`, `INFO`, `DEBUG`
logLevel: INFO

# -- If true, create & use RBAC resources
rbacEnable: true

rbacAggregate:
  # -- If true, create a ClusterRole aggregated to [user facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles) for objectbucketclaims
  enableOBCs: false

# -- If true, create & use PSP resources
pspEnable: false

# -- Set the priority class for the rook operator deployment if desired
priorityClassName:

# -- Set the container security context for the operator
containerSecurityContext:
  runAsNonRoot: true
  runAsUser: 2016
  runAsGroup: 2016
  capabilities:
    drop: ["ALL"]
# -- If true, loop devices are allowed to be used for osds in test clusters
allowLoopDevices: false

# Settings for whether to disable the drivers or other daemons if they are not
# needed
csi:
  # -- Enable Ceph CSI RBD driver
  enableRbdDriver: true
  # -- Enable Ceph CSI CephFS driver
  enableCephfsDriver: true
  # -- Disable the CSI driver.
  disableCsiDriver: "false"

  # -- Enable host networking for CSI CephFS and RBD nodeplugins. This may be necessary
  # in some network configurations where the SDN does not provide access to an external cluster or
  # there is significant drop in read/write performance
  enableCSIHostNetwork: true
  # -- Enable Snapshotter in CephFS provisioner pod
  enableCephfsSnapshotter: true
  # -- Enable Snapshotter in NFS provisioner pod
  enableNFSSnapshotter: true
  # -- Enable Snapshotter in RBD provisioner pod
  enableRBDSnapshotter: true
  # -- Enable Host mount for `/etc/selinux` directory for Ceph CSI nodeplugins
  enablePluginSelinuxHostMount: false
  # -- Enable Ceph CSI PVC encryption support
  enableCSIEncryption: false

  # -- Enable volume group snapshot feature. This feature is
  # enabled by default as long as the necessary CRDs are available in the cluster.
  enableVolumeGroupSnapshot: true
  # -- PriorityClassName to be set on csi driver plugin pods
  pluginPriorityClassName: system-node-critical

  # -- PriorityClassName to be set on csi driver provisioner pods
  provisionerPriorityClassName: system-cluster-critical

  # -- Policy for modifying a volume's ownership or permissions when the RBD PVC is being mounted.
  # supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
  rbdFSGroupPolicy: "File"

  # -- Policy for modifying a volume's ownership or permissions when the CephFS PVC is being mounted.
  # supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
  cephFSFSGroupPolicy: "File"

  # -- Policy for modifying a volume's ownership or permissions when the NFS PVC is being mounted.
  # supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
  nfsFSGroupPolicy: "File"

  # -- OMAP generator generates the omap mapping between the PV name and the RBD image
  # which helps CSI to identify the rbd images for CSI operations.
  # `CSI_ENABLE_OMAP_GENERATOR` needs to be enabled when we are using rbd mirroring feature.
  # By default OMAP generator is disabled and when enabled, it will be deployed as a
  # sidecar with CSI provisioner pod, to enable set it to true.
  enableOMAPGenerator: false

  # -- Set CephFS Kernel mount options to use https://docs.ceph.com/en/latest/man/8/mount.ceph/#options.
  # Set to "ms_mode=secure" when connections.encrypted is enabled in CephCluster CR
  cephFSKernelMountOptions:

  # -- Enable adding volume metadata on the CephFS subvolumes and RBD images.
  # Not all users might be interested in getting volume/snapshot details as metadata on CephFS subvolume and RBD images.
  # Hence enable metadata is false by default
  enableMetadata: false

  # -- Set replicas for csi provisioner deployment
  provisionerReplicas: 2

  # -- Cluster name identifier to set as metadata on the CephFS subvolume and RBD images. This will be useful
  # in cases like for example, when two container orchestrator clusters (Kubernetes/OCP) are using a single ceph cluster
  clusterName:

  # -- Set logging level for cephCSI containers maintained by the cephCSI.
  # Supported values from 0 to 5. 0 for general useful logs, 5 for trace level verbosity.
  logLevel: 0

  # -- Set logging level for Kubernetes-csi sidecar containers.
  # Supported values from 0 to 5. 0 for general useful logs (the default), 5 for trace level verbosity.
  # @default -- `0`
  sidecarLogLevel:

  # -- CSI driver name prefix for cephfs, rbd and nfs.
  # @default -- `namespace name where rook-ceph operator is deployed`
  csiDriverNamePrefix:

  # -- CSI RBD plugin daemonset update strategy, supported values are OnDelete and RollingUpdate
  # @default -- `RollingUpdate`
  rbdPluginUpdateStrategy:

  # -- A maxUnavailable parameter of CSI RBD plugin daemonset update strategy.
  # @default -- `1`
  rbdPluginUpdateStrategyMaxUnavailable:

  # -- CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate
  # @default -- `RollingUpdate`
  cephFSPluginUpdateStrategy:

  # -- A maxUnavailable parameter of CSI cephFS plugin daemonset update strategy.
  # @default -- `1`
  cephFSPluginUpdateStrategyMaxUnavailable:

  # -- CSI NFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate
  # @default -- `RollingUpdate`
  nfsPluginUpdateStrategy:

  # -- Set GRPC timeout for csi containers (in seconds). It should be >= 120. If this value is not set or is invalid, it defaults to 150
  grpcTimeoutInSeconds: 150

  # -- Burst to use while communicating with the kubernetes apiserver.
  kubeApiBurst:

  # -- QPS to use while communicating with the kubernetes apiserver.
  kubeApiQPS:

  # -- The volume of the CephCSI RBD plugin DaemonSet
  csiRBDPluginVolume:
  #  - name: lib-modules
  #    hostPath:
  #      path: /run/booted-system/kernel-modules/lib/modules/
  #  - name: host-nix
  #    hostPath:
  #      path: /nix

  # -- The volume mounts of the CephCSI RBD plugin DaemonSet
  csiRBDPluginVolumeMount:
  #  - name: host-nix
  #    mountPath: /nix
  #    readOnly: true

  # -- The volume of the CephCSI CephFS plugin DaemonSet
  csiCephFSPluginVolume:
  #  - name: lib-modules
  #    hostPath:
  #      path: /run/booted-system/kernel-modules/lib/modules/
  #  - name: host-nix
  #    hostPath:
  #      path: /nix

  # -- The volume mounts of the CephCSI CephFS plugin DaemonSet
  csiCephFSPluginVolumeMount:
  #  - name: host-nix
  #    mountPath: /nix
  #    readOnly: true

  # -- CEPH CSI RBD provisioner resource requirement list
  # csi-omap-generator resources will be applied only if `enableOMAPGenerator` is set to `true`
  # @default -- see values.yaml
  csiRBDProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-resizer
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-attacher
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-snapshotter
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-rbdplugin
      resource:
        requests:
          memory: 512Mi
        limits:
          memory: 1Gi
    - name : csi-omap-generator
      resource:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi

  # -- CEPH CSI RBD plugin resource requirement list
  # @default -- see values.yaml
  csiRBDPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi
    - name : csi-rbdplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi

  # -- CEPH CSI CephFS provisioner resource requirement list
  # @default -- see values.yaml
  csiCephFSProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-resizer
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-attacher
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-snapshotter
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-cephfsplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi

  # -- CEPH CSI CephFS plugin resource requirement list
  # @default -- see values.yaml
  csiCephFSPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi
    - name : csi-cephfsplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi

  # -- CEPH CSI NFS provisioner resource requirement list
  # @default -- see values.yaml
  csiNFSProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi
    - name : csi-nfsplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi
    - name : csi-attacher
      resource:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi

  # -- CEPH CSI NFS plugin resource requirement list
  # @default -- see values.yaml
  csiNFSPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi
    - name : csi-nfsplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi

  # Set provisionerTolerations and provisionerNodeAffinity for provisioner pod.
  # The CSI provisioner would be best to start on the same nodes as other ceph daemons.

  # -- Array of tolerations in YAML format which will be added to CSI provisioner deployment
  provisionerTolerations:
  #    - key: key
  #      operator: Exists
  #      effect: NoSchedule

  # -- The node labels for affinity of the CSI provisioner deployment [^1]
  provisionerNodeAffinity: #key1=value1,value2; key2=value3
    # requiredDuringSchedulingIgnoredDuringExecution:
    #   nodeSelectorTerms:
    #     - matchExpressions:
    #         - key: role
    #           operator: In
    #           values:
    #             - ceph
  # Set pluginTolerations and pluginNodeAffinity for plugin daemonset pods.
  # The CSI plugins need to be started on all the nodes where the clients need to mount the storage.

  # -- Array of tolerations in YAML format which will be added to CephCSI plugin DaemonSet
  pluginTolerations:
  #    - key: key
  #      operator: Exists
  #      effect: NoSchedule

  # -- The node labels for affinity of the CephCSI RBD plugin DaemonSet [^1]
  pluginNodeAffinity: # key1=value1,value2; key2=value3
    # requiredDuringSchedulingIgnoredDuringExecution:
    #   nodeSelectorTerms:
    #     - matchExpressions:
    #         - key: role
    #           operator: In
    #           values:
    #             - ceph
  # -- Enable Ceph CSI Liveness sidecar deployment
  enableLiveness: false

  # -- CSI CephFS driver metrics port
  # @default -- `9081`
  cephfsLivenessMetricsPort:

  # -- CSI Addons server port
  # @default -- `9070`
  csiAddonsPort:

  # -- Enable Ceph Kernel clients on kernel < 4.17. If your kernel does not support quotas for CephFS
  # you may want to disable this setting. However, this will cause an issue during upgrades
  # with the FUSE client. See the [upgrade guide](https://rook.io/docs/rook/v1.2/ceph-upgrade.html)
  forceCephFSKernelClient: true

  # -- Ceph CSI RBD driver metrics port
  # @default -- `8080`
  rbdLivenessMetricsPort:

  serviceMonitor:
    # -- Enable ServiceMonitor for Ceph CSI drivers
    enabled: false
    # -- Service monitor scrape interval
    interval: 10s
    # -- ServiceMonitor additional labels
    labels: {}
    # -- Use a different namespace for the ServiceMonitor
    namespace:

  # -- Kubelet root directory path (if the Kubelet uses a different path for the `--root-dir` flag)
  # @default -- `/var/lib/kubelet`
  kubeletDirPath:

  # -- Duration in seconds that non-leader candidates will wait to force acquire leadership.
  # @default -- `137s`
  csiLeaderElectionLeaseDuration:

  # -- Deadline in seconds that the acting leader will retry refreshing leadership before giving up.
  # @default -- `107s`
  csiLeaderElectionRenewDeadline:

  # -- Retry period in seconds the LeaderElector clients should wait between tries of actions.
  # @default -- `26s`
  csiLeaderElectionRetryPeriod:

  cephcsi:
    # -- Ceph CSI image repository
    repository: quay.io/cephcsi/cephcsi
    # -- Ceph CSI image tag
    tag: v3.13.0

  registrar:
    # -- Kubernetes CSI registrar image repository
    repository: registry.k8s.io/sig-storage/csi-node-driver-registrar
    # -- Registrar image tag
    tag: v2.11.1

  provisioner:
    # -- Kubernetes CSI provisioner image repository
    repository: registry.k8s.io/sig-storage/csi-provisioner
    # -- Provisioner image tag
    tag: v5.0.1

  snapshotter:
    # -- Kubernetes CSI snapshotter image repository
    repository: registry.k8s.io/sig-storage/csi-snapshotter
    # -- Snapshotter image tag
    tag: v8.2.0

  attacher:
    # -- Kubernetes CSI Attacher image repository
    repository: registry.k8s.io/sig-storage/csi-attacher
    # -- Attacher image tag
    tag: v4.6.1

  resizer:
    # -- Kubernetes CSI resizer image repository
    repository: registry.k8s.io/sig-storage/csi-resizer
    # -- Resizer image tag
    tag: v1.11.1

  # -- Image pull policy
  imagePullPolicy: IfNotPresent

  # -- Labels to add to the CSI CephFS Deployments and DaemonSets Pods
  cephfsPodLabels: #"key1=value1,key2=value2"

  # -- Labels to add to the CSI NFS Deployments and DaemonSets Pods
  nfsPodLabels: #"key1=value1,key2=value2"

  # -- Labels to add to the CSI RBD Deployments and DaemonSets Pods
  rbdPodLabels: #"key1=value1,key2=value2"

  csiAddons:
    # -- Enable CSIAddons
    enabled: false
    # -- CSIAddons sidecar image repository
    repository: quay.io/csiaddons/k8s-sidecar
    # -- CSIAddons sidecar image tag
    tag: v0.11.0

  nfs:
    # -- Enable the nfs csi driver
    enabled: false

  topology:
    # -- Enable topology based provisioning
    enabled: false
    # NOTE: the value here serves as an example and needs to be
    # updated with node labels that define domains of interest
    # -- domainLabels define which node labels to use as domains
    # for CSI nodeplugins to advertise their domains
    domainLabels:
    # - kubernetes.io/hostname
    # - topology.kubernetes.io/zone
    # - topology.rook.io/rack

  # -- Whether to skip any attach operation altogether for CephFS PVCs. See more details
  # [here](https://kubernetes-csi.github.io/docs/skip-attach.html#skip-attach-with-csi-driver-object).
  # If cephFSAttachRequired is set to false it skips the volume attachments and makes the creation
  # of pods using the CephFS PVC fast. **WARNING** It's highly discouraged to use this for
  # CephFS RWO volumes. Refer to this [issue](https://github.com/kubernetes/kubernetes/issues/103305) for more details.
  cephFSAttachRequired: true
  # -- Whether to skip any attach operation altogether for RBD PVCs. See more details
  # [here](https://kubernetes-csi.github.io/docs/skip-attach.html#skip-attach-with-csi-driver-object).
  # If set to false it skips the volume attachments and makes the creation of pods using the RBD PVC fast.
  # **WARNING** It's highly discouraged to use this for RWO volumes as it can cause data corruption.
  # csi-addons operations like Reclaimspace and PVC Keyrotation will also not be supported if set
  # to false since we'll have no VolumeAttachments to determine which node the PVC is mounted on.
  # Refer to this [issue](https://github.com/kubernetes/kubernetes/issues/103305) for more details.
  rbdAttachRequired: true
  # -- Whether to skip any attach operation altogether for NFS PVCs. See more details
  # [here](https://kubernetes-csi.github.io/docs/skip-attach.html#skip-attach-with-csi-driver-object).
  # If cephFSAttachRequired is set to false it skips the volume attachments and makes the creation
  # of pods using the NFS PVC fast. **WARNING** It's highly discouraged to use this for
  # NFS RWO volumes. Refer to this [issue](https://github.com/kubernetes/kubernetes/issues/103305) for more details.
  nfsAttachRequired: true

# -- Enable discovery daemon
enableDiscoveryDaemon: false
# -- Set the discovery daemon device discovery interval (default to 60m)
discoveryDaemonInterval: 60m

# -- The timeout for ceph commands in seconds
cephCommandsTimeoutSeconds: "15"

# -- If true, run rook operator on the host network
useOperatorHostNetwork:

# -- If true, scale down the rook operator.
# This is useful for administrative actions where the rook operator must be scaled down, while using gitops style tooling
# to deploy your helm charts.
scaleDownOperator: false

## Rook Discover configuration
## toleration: NoSchedule, PreferNoSchedule or NoExecute
## tolerationKey: Set this to the specific key of the taint to tolerate
## tolerations: Array of tolerations in YAML format which will be added to agent deployment
## nodeAffinity: Set to labels of the node to match

discover:
  # -- Toleration for the discover pods.
  # Options: `NoSchedule`, `PreferNoSchedule` or `NoExecute`
  toleration:
  # -- The specific key of the taint to tolerate
  tolerationKey:
  # -- Array of tolerations in YAML format which will be added to discover deployment
  tolerations:
  #   - key: key
  #     operator: Exists
  #     effect: NoSchedule
  # -- The node labels for affinity of `discover-agent` [^1]
  nodeAffinity:
  #   key1=value1,value2; key2=value3
  #
  #   or
  #
  #   requiredDuringSchedulingIgnoredDuringExecution:
  #     nodeSelectorTerms:
  #       - matchExpressions:
  #           - key: storage-node
  #             operator: Exists
  # -- Labels to add to the discover pods
  podLabels: # "key1=value1,key2=value2"
  # -- Add resources to discover daemon pods
  resources:
  #   - limits:
  #       memory: 512Mi
  #   - requests:
  #       cpu: 100m
  #       memory: 128Mi

# -- Runs Ceph Pods as privileged to be able to write to `hostPaths` in OpenShift with SELinux restrictions.
hostpathRequiresPrivileged: false

# -- Whether to create all Rook pods to run on the host network, for example in environments where a CNI is not enabled
enforceHostNetwork: false

# -- Disable automatic orchestration when new devices are discovered.
disableDeviceHotplug: false

# -- The revision history limit for all pods created by Rook. If blank, the K8s default is 10.
revisionHistoryLimit:

# -- Blacklist certain disks according to the regex provided.
discoverDaemonUdev:

# -- imagePullSecrets option allow to pull docker images from private docker registry. Option will be passed to all service accounts.
imagePullSecrets:
# - name: my-registry-secret

# -- Whether the OBC provisioner should watch on the operator namespace or not, if not the namespace of the cluster will be used
enableOBCWatchOperatorNamespace: true

# -- Specify the prefix for the OBC provisioner in place of the cluster namespace
# @default -- `ceph cluster namespace`
obcProvisionerNamePrefix:

monitoring:
  # -- Enable monitoring. Requires Prometheus to be pre-installed.
  # Enabling will also create RBAC rules to allow Operator to create ServiceMonitors
  enabled: false

Verify the installation:

ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl get pod -n rook-ceph
NAME                                                     READY   STATUS      RESTARTS        AGE
rook-ceph-operator-68d9f8b984-w7tmj                      1/1     Running     0               18m
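
Since crds.enabled is true in the values above, the chart also installs the Rook CRDs. A quick extra sanity check (a minimal sketch; the exact resource list depends on the chart version):

# List the CRDs registered by the chart, then tail the operator log for errors
$ kubectl api-resources --api-group=ceph.rook.io
$ kubectl -n rook-ceph logs deploy/rook-ceph-operator | tail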

2. Configure the Rook Ceph cluster

With the operator in place, the cluster itself can be configured through the k8s CRDs.

$ kubectl apply -f cluster.yaml

The contents of cluster.yaml are as follows. I only changed the storage.nodes devices configuration, manually specifying each node and its disk (see the note after the manifest for how I picked the device names):

#################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.

# For example, to create the cluster:
#   kubectl create -f crds.yaml -f common.yaml -f operator.yaml
#   kubectl create -f cluster.yaml
#################################################################################################################

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v18 is Reef, v19 is Squid
    # RECOMMENDATION: In production, use a specific version tag instead of the general v19 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such as quay.io/ceph/ceph:v19.2.0-20240927
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: quay.io/ceph/ceph:v19.2.0
    # Whether to allow unsupported versions of Ceph. Currently Reef and Squid are supported.
    # Future versions such as Tentacle (v20) would require this to be set to `true`.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. Must be specified. If there are multiple clusters, the directory must be unique for each cluster.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/latest/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false
  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  # WaitTimeoutForHealthyOSDInMinutes defines the time (in minutes) the operator would wait before an OSD can be stopped for upgrade or restart.
  # If the timeout exceeds and OSD is not ok to stop, then the operator would skip upgrade for the current OSD and proceed with the next one
  # if `continueUpgradeAfterChecksEvenIfNotHealthy` is `false`. If `continueUpgradeAfterChecksEvenIfNotHealthy` is `true`, then operator would
  # continue with the upgrade of an OSD even if its not ok to stop after the timeout. This timeout won't be applied if `skipUpgradeChecks` is `true`.
  # The default wait timeout is 10 minutes.
  waitTimeoutForHealthyOSDInMinutes: 10
  # Whether or not requires PGs are clean before an OSD upgrade. If set to `true` OSD upgrade process won't start until PGs are healthy.
  # This configuration will be ignored if `skipUpgradeChecks` is `true`.
  # Default is false.
  upgradeOSDRequiresHealthyPGs: false
  mon:
    # Set the number of mons to be started. Generally recommended to be 3.
    # For highest availability, an odd number of mons should be specified.
    count: 3
    # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
    # Mons should only be allowed on the same node for test environments where data loss is acceptable.
    allowMultiplePerNode: false
  mgr:
    # When higher availability of the mgr is needed, increase the count to 2.
    # In that case, one mgr will be active and one in standby. When Ceph updates which
    # mgr is active, Rook will update the mgr services to match the active mgr.
    count: 2
    allowMultiplePerNode: false
    modules:
      # List of modules to optionally enable or disable.
      # Note the "dashboard" and "monitoring" modules are already configured by other settings in the cluster CR.
      - name: rook
        enabled: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: false
    # The url of the Prometheus instance
    # prometheusEndpoint: <protocol>://<prometheus-host>:<port>
    # Whether SSL should be verified if the Prometheus server is using https
    # prometheusEndpointSSLVerify: false
  # enable prometheus alerting for cluster
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: false
    # Whether to disable the metrics reported by Ceph. If false, the prometheus mgr module and Ceph exporter are enabled.
    # If true, the prometheus mgr module and Ceph exporter are both disabled. Default is false.
    metricsDisabled: false
    # Ceph exporter metrics config.
    exporter:
      # Specifies which performance counters are exported.
      # Corresponds to --prio-limit Ceph exporter flag
      # 0 - all counters are exported
      perfCountersPrioLimit: 5
      # Time to wait before sending requests again to exporter server (seconds)
      # Corresponds to --stats-period Ceph exporter flag
      statsPeriodSeconds: 5
  network:
    connections:
      # Whether to encrypt the data in transit across the wire to prevent eavesdropping the data on the network.
      # The default is false. When encryption is enabled, all communication between clients and Ceph daemons, or between Ceph daemons will be encrypted.
      # When encryption is not enabled, clients still establish a strong initial authentication and data integrity is still validated with a crc check.
      # IMPORTANT: Encryption requires the 5.11 kernel for the latest nbd and cephfs drivers. Alternatively for testing only,
      # you can set the "mounter: rbd-nbd" in the rbd storage class, or "mounter: fuse" in the cephfs storage class.
      # The nbd and fuse drivers are *not* recommended in production since restarting the csi driver pod will disconnect the volumes.
      encryption:
        enabled: false
      # Whether to compress the data in transit across the wire. The default is false.
      # See the kernel requirements above for encryption.
      compression:
        enabled: false
      # Whether to require communication over msgr2. If true, the msgr v1 port (6789) will be disabled
      # and clients will be required to connect to the Ceph cluster with the v2 port (3300).
      # Requires a kernel that supports msgr v2 (kernel 5.11 or CentOS 8.4 or newer).
      requireMsgr2: false
    # enable host networking
    #provider: host
    # enable the Multus network provider
    #provider: multus
    #selectors:
    #  The selector keys are required to be `public` and `cluster`.
    #  Based on the configuration, the operator will do the following:
    #    1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
    #    2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
    #
    #  In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
    #
    #  public: public-conf --> NetworkAttachmentDefinition object name in Multus
    #  cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
    # Provide internet protocol version. IPv6, IPv4 or empty string are valid options. Empty string would mean IPv4
    #ipFamily: "IPv6"
    # Ceph daemons to listen on both IPv4 and Ipv6 networks
    #dualStack: false
    # Enable multiClusterService to export the mon and OSD services to peer cluster.
    # This is useful to support RBD mirroring between two clusters having overlapping CIDRs.
    # Ensure that peer clusters are connected using an MCS API compatible application, like Globalnet Submariner.
    #multiClusterService:
    #  enabled: false

  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: false
    # Uncomment daysToRetain to prune ceph crash entries older than the
    # specified number of days.
    #daysToRetain: 30
  # enable log collector, daemons will log on files and rotate
  logCollector:
    enabled: true
    periodicity: daily # one of: hourly, daily, weekly, monthly
    maxLogSize: 500M # SUFFIX may be 'M' or 'G'. Must be at least 1M.
  # automate [data cleanup process](https://github.com/rook/rook/blob/master/Documentation/Storage-Configuration/ceph-teardown.md#delete-the-data-on-hosts) in cluster destruction.
  cleanupPolicy:
    # Since cluster cleanup is destructive to data, confirmation is required.
    # To destroy all Rook data on hosts during uninstall, confirmation must be set to "yes-really-destroy-data".
    # This value should only be set when the cluster is about to be deleted. After the confirmation is set,
    # Rook will immediately stop configuring the cluster and only wait for the delete command.
    # If the empty string is set, Rook will not destroy any data on hosts during uninstall.
    confirmation: ""
    # sanitizeDisks represents settings for sanitizing OSD disks on cluster deletion
    sanitizeDisks:
      # method indicates if the entire disk should be sanitized or simply ceph's metadata
      # in both case, re-install is possible
      # possible choices are 'complete' or 'quick' (default)
      method: quick
      # dataSource indicate where to get random bytes from to write on the disk
      # possible choices are 'zero' (default) or 'random'
      # using random sources will consume entropy from the system and will take much more time then the zero source
      dataSource: zero
      # iteration overwrite N times instead of the default (1)
      # takes an integer value
      iteration: 1
    # allowUninstallWithVolumes defines how the uninstall should be performed
    # If set to true, cephCluster deletion does not wait for the PVs to be deleted.
    allowUninstallWithVolumes: false
  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
  placement:
    # all:
    #   nodeAffinity:
    #     requiredDuringSchedulingIgnoredDuringExecution:
    #       nodeSelectorTerms:
    #         - matchExpressions:
    #             - key: role
    #               operator: In
    #               values:
    #                 - ceph
    #   podAffinity:
    #   podAntiAffinity:
    #   topologySpreadConstraints:
    #   tolerations:
  # The above placement information can also be specified for mon, osd, and mgr components
  #   mon:
  # Monitor deployments may contain an anti-affinity rule for avoiding monitor
  # collocation on the same node. This is a required rule when host network is used
  # or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
  # preferred rule with weight: 50.
  #   osd:
  #    prepareosd:
  #    mgr:
  #    cleanup:
  annotations:
  #   all:
  #   mon:
  #   mgr:
  #   osd:
  #   exporter:
  #   crashcollector:
  #   cleanup:
  #   prepareosd:
  # cmdreporter is for jobs to detect ceph and csi versions, and check network status
  #   cmdreporter:
  # clusterMetadata annotations will be applied to only `rook-ceph-mon-endpoints` configmap and the `rook-ceph-mon` and `rook-ceph-admin-keyring` secrets.
  # And clusterMetadata annotations will not be merged with `all` annotations.
  #    clusterMetadata:
  #       kubed.appscode.com/sync: "true"
  # If no mgr annotations are set, prometheus scrape annotations will be set by default.
  #   mgr:
  labels:
  #   all:
  #   mon:
  #   osd:
  #   cleanup:
  #   mgr:
  #   prepareosd:
  # These labels are applied to ceph-exporter servicemonitor only
  #   exporter:
  # monitoring is a list of key-value pairs. It is injected into all the monitoring resources created by operator.
  # These labels can be passed as LabelSelector to Prometheus
  #   monitoring:
  #   crashcollector:
  resources:
  #The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
  #   mgr:
  #     limits:
  #       memory: "1024Mi"
  #     requests:
  #       cpu: "500m"
  #       memory: "1024Mi"
  # The above example requests/limits can also be added to the other components
  #   mon:
  #   osd:
  # For OSD it also is a possible to specify requests/limits based on device class
  #   osd-hdd:
  #   osd-ssd:
  #   osd-nvme:
  #   prepareosd:
  #   mgr-sidecar:
  #   crashcollector:
  #   logcollector:
  #   cleanup:
  #   exporter:
  #   cmd-reporter:
  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false
  priorityClassNames:
    #all: rook-ceph-default-priority-class
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
    #crashcollector: rook-ceph-crashcollector-priority-class
  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false
    #deviceFilter:
    config:
      # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
      # deviceClass: "myclass" # specify a device class for OSDs in the cluster
    allowDeviceClassUpdate: false # whether to allow changing the device class of an OSD after it is created
    allowOsdCrushWeightUpdate: false # whether to allow resizing the OSD crush weight after osd pvc is increased
    # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
    # nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    nodes:
      - name: pve-ubuntu
        devices:
          - name: vda
      - name: pve-gmk-ubuntu
        devices:
          - name: vda
      - name: nhan-ubuntu
        devices:
          - name: vdb
    # nodes:
    #   - name: "172.17.4.201"
    #     devices: # specific devices to use for storage can be specified for each node
    #       - name: "sdb"
    #       - name: "nvme01" # multiple osds can be created on high performance devices
    #         config:
    #           osdsPerDevice: "5"
    #       - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
    #     config: # configuration can be specified at the node level which overrides the cluster level config
    #   - name: "172.17.4.301"
    #     deviceFilter: "^sd."
    # Whether to always schedule OSD pods on nodes declared explicitly in the "nodes" section, even if they are
    # temporarily not schedulable. If set to true, consider adding placement tolerations for unschedulable nodes.
    scheduleAlways: false
    # when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
    onlyApplyOSDPlacement: false
    # Time for which an OSD pod will sleep before restarting, if it stopped due to flapping
    # flappingRestartIntervalHours: 24
    # The ratio at which Ceph should block IO if the OSDs are too full. The default is 0.95.
    # fullRatio: 0.95
    # The ratio at which Ceph should stop backfilling data if the OSDs are too full. The default is 0.90.
    # backfillFullRatio: 0.90
    # The ratio at which Ceph should raise a health warning if the OSDs are almost full. The default is 0.85.
    # nearFullRatio: 0.85
  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: true
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when  `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # A duration in minutes that the operator will wait for the placement groups to become healthy (active+clean) after a drain was completed and OSDs came back up.
    # Operator will continue with the next drain if the timeout exceeds. It only works if `managePodBudgets` is `true`.
    # No values or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
    pgHealthCheckTimeout: 0

  # csi defines CSI Driver settings applied per cluster.
  csi:
    readAffinity:
      # Enable read affinity to enable clients to optimize reads from an OSD in the same topology.
      # Enabling the read affinity may cause the OSDs to consume some extra memory.
      # For more details see this doc:
      # https://rook.io/docs/rook/latest/Storage-Configuration/Ceph-CSI/ceph-csi-drivers/#enable-read-affinity-for-rbd-volumes
      enabled: false

    # cephfs driver specific settings.
    cephfs:
      # Set CephFS Kernel mount options to use https://docs.ceph.com/en/latest/man/8/mount.ceph/#options.
      # kernelMountOptions: ""
      # Set CephFS Fuse mount options to use https://docs.ceph.com/en/latest/man/8/ceph-fuse/#options.
      # fuseMountOptions: ""

  # healthChecks
  # Valid values for daemons are 'mon', 'osd', 'status'
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    # Change pod liveness probe timing or threshold values. Works for all mon,mgr,osd daemons.
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
    # Change pod startup probe timing or threshold values. Works for all mon,mgr,osd daemons.
    startupProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
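
As mentioned above, this is how I picked the device names listed under storage.nodes: Rook only creates OSDs on raw devices that have no partitions and no filesystem, so I checked each node with lsblk first (a minimal sketch; the node and device names are specific to my setup):

# An eligible disk shows an empty FSTYPE and no child partitions
$ lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
# After applying cluster.yaml, the osd-prepare job logs explain why a device was accepted or skipped
$ kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --tail=50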

Verify the installation:

ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl get pod -n rook-ceph
NAME                                                     READY   STATUS      RESTARTS        AGE
csi-cephfsplugin-29kd9                                   3/3     Running     2 (3d16h ago)   3d18h
csi-cephfsplugin-k572j                                   3/3     Running     0               3d18h
csi-cephfsplugin-provisioner-74b95c5758-6k7w9            6/6     Running     0               3d18h
csi-cephfsplugin-provisioner-74b95c5758-ktxhw            6/6     Running     0               3d18h
csi-cephfsplugin-s77bw                                   3/3     Running     0               3d18h
csi-rbdplugin-9jshj                                      3/3     Running     0               3d18h
csi-rbdplugin-kl4sv                                      3/3     Running     1 (3d18h ago)   3d18h
csi-rbdplugin-provisioner-5fd9fbf6f8-225sl               6/6     Running     0               3d18h
csi-rbdplugin-provisioner-5fd9fbf6f8-w9r5s               6/6     Running     0               3d18h
csi-rbdplugin-skjc8                                      3/3     Running     0               3d18h
rook-ceph-crashcollector-nhan-ubuntu-ddbddd8d9-2frpp     1/1     Running     0               3d17h
rook-ceph-crashcollector-pve-gmk-ubuntu-89bf6dcf-cwcmg   1/1     Running     0               3d17h
rook-ceph-crashcollector-pve-ubuntu-68cff9fd47-dnbpf     1/1     Running     0               3d17h
rook-ceph-exporter-nhan-ubuntu-745dbcf58c-hqcq4          1/1     Running     0               3d17h
rook-ceph-exporter-pve-gmk-ubuntu-6f777d855c-fd54v       1/1     Running     0               3d17h
rook-ceph-exporter-pve-ubuntu-76f44c66fc-2hlw7           1/1     Running     0               3d17h
rook-ceph-mgr-a-c69f44568-bnvlh                          3/3     Running     0               3d17h
rook-ceph-mgr-b-d66bcd6d7-6xqqg                          3/3     Running     0               3d17h
rook-ceph-mon-a-54c677947c-4bhnf                         2/2     Running     1 (31h ago)     3d18h
rook-ceph-mon-b-fdf46d6f8-nb9jb                          2/2     Running     0               3d17h
rook-ceph-mon-c-6498cbd495-lfvp4                         2/2     Running     0               3d17h
rook-ceph-operator-68d9f8b984-w7tmj                      1/1     Running     0               3d18h
rook-ceph-osd-0-6dccbf576-2vq6q                          2/2     Running     0               3d17h
rook-ceph-osd-1-79cd54f5bd-ppkpr                         2/2     Running     0               3d17h
rook-ceph-osd-2-6d75f79768-rkmrb                         2/2     Running     0               3d17h
rook-ceph-osd-prepare-nhan-ubuntu-hgrqf                  0/1     Completed   0               3d17h
rook-ceph-osd-prepare-pve-gmk-ubuntu-v4gwg               0/1     Completed   0               3d17h
rook-ceph-osd-prepare-pve-ubuntu-x5vz4                   0/1     Completed   0               3d17h
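
Besides the pod list, the CephCluster resource itself reports the deployment phase and health (a small sketch; the printed columns vary by Rook version):

$ kubectl -n rook-ceph get cephcluster rook-ceph
# Expect PHASE Ready and HEALTH HEALTH_OK (or HEALTH_WARN with a reason)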

Finally, install the toolbox and expose the dashboard:

$ kubectl apply -f dashboard-ingress.yaml
$ kubectl apply -f toolbox.yaml

The contents of dashboard-ingress.yaml are as follows:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rook-ceph-mgr-dashboard
  namespace: rook-ceph
spec:
  ingressClassName: nginx
  rules:
  - host: rook-ceph-dashboard.ziyuan360.host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rook-ceph-mgr-dashboard
            port:
              number: 7000
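
The Ingress targets port 7000 because ssl: false is set in cluster.yaml, which is the dashboard's plain-HTTP port. The admin password is generated by the operator and, per the Rook docs, can be read from a secret (a sketch):

$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
    -o jsonpath="{['data']['password']}" | base64 --decode && echo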

The contents of toolbox.yaml are as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: rook-ceph-default
      containers:
        - name: rook-ceph-tools
          image: quay.io/ceph/ceph:v19
          command:
            - /bin/bash
            - -c
            - |
              # Replicate the script from toolbox.sh inline so the ceph image
              # can be run directly, instead of requiring the rook toolbox
              CEPH_CONFIG="/etc/ceph/ceph.conf"
              MON_CONFIG="/etc/rook/mon-endpoints"
              KEYRING_FILE="/etc/ceph/keyring"

              # create a ceph config file in its default location so ceph/rados tools can be used
              # without specifying any arguments
              write_endpoints() {
                endpoints=$(cat ${MON_CONFIG})

                # filter out the mon names
                # external cluster can have numbers or hyphens in mon names, handling them in regex
                # shellcheck disable=SC2001
                mon_endpoints=$(echo "${endpoints}"| sed 's/[a-z0-9_-]\+=//g')

                DATE=$(date)
                echo "$DATE writing mon endpoints to ${CEPH_CONFIG}: ${endpoints}"
                  cat <<EOF > ${CEPH_CONFIG}
              [global]
              mon_host = ${mon_endpoints}

              [client.admin]
              keyring = ${KEYRING_FILE}
              EOF
              }

              # watch the endpoints config file and update if the mon endpoints ever change
              watch_endpoints() {
                # get the timestamp for the target of the soft link
                real_path=$(realpath ${MON_CONFIG})
                initial_time=$(stat -c %Z "${real_path}")
                while true; do
                  real_path=$(realpath ${MON_CONFIG})
                  latest_time=$(stat -c %Z "${real_path}")

                  if [[ "${latest_time}" != "${initial_time}" ]]; then
                    write_endpoints
                    initial_time=${latest_time}
                  fi

                  sleep 10
                done
              }

              # read the secret from an env var (for backward compatibility), or from the secret file
              ceph_secret=${ROOK_CEPH_SECRET}
              if [[ "$ceph_secret" == "" ]]; then
                ceph_secret=$(cat /var/lib/rook-ceph-mon/secret.keyring)
              fi

              # create the keyring file
              cat <<EOF > ${KEYRING_FILE}
              [${ROOK_CEPH_USERNAME}]
              key = ${ceph_secret}
              EOF

              # write the initial config file
              write_endpoints

              # continuously update the mon endpoints if they fail over
              watch_endpoints
          imagePullPolicy: IfNotPresent
          tty: true
          securityContext:
            runAsNonRoot: true
            runAsUser: 2016
            runAsGroup: 2016
            capabilities:
              drop: ["ALL"]
          env:
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  name: rook-ceph-mon
                  key: ceph-username
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-config
            - name: mon-endpoint-volume
              mountPath: /etc/rook
            - name: ceph-admin-secret
              mountPath: /var/lib/rook-ceph-mon
              readOnly: true
      volumes:
        - name: ceph-admin-secret
          secret:
            secretName: rook-ceph-mon
            optional: false
            items:
              - key: ceph-secret
                path: secret.keyring
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
              - key: data
                path: mon-endpoints
        - name: ceph-config
          emptyDir: {}
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 5

Use the toolbox to verify that the cluster is healthy:

ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
bash-5.1$ ceph status
  cluster:
    id:     87554414-e7da-449e-bc2e-549a11a92c1c
    health: HEALTH_WARN
            mon a is low on available space
 
  services:
    mon: 3 daemons, quorum b,a,c (age 4m)
    mgr: a(active, since 3d), standbys: b
    osd: 3 osds: 3 up (since 3d), 3 in (since 3d)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   81 MiB used, 60 GiB / 60 GiB avail
    pgs:     1 active+clean

In my case the WARN came from a node running low on disk space; expanding the disk fixes it.
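
For reference, ceph health detail (run through the toolbox) prints more detail about the warning, i.e. which mon is affected and how much space is left, which is worth checking before resizing anything:

$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail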


Bpazy commented Jan 24, 2025

Create a CephFS for K8s Pods

First, create the filesystem:

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - name: replicated
      replicated:
        size: 3
  # When true, the underlying data is preserved if the filesystem is deleted
  preserveFilesystemOnDelete: true
  metadataServer:
    activeCount: 1
    activeStandby: true
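
Once applied, the operator creates the MDS pods plus the metadata and data pools for the filesystem. To confirm (a sketch; pool names follow the <fsName>-<dataPool name> convention, i.e. myfs-replicated here):

$ kubectl -n rook-ceph get cephfilesystem myfs
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fs ls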

Then create the StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs-monitoring
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-replicated

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

# Retain the data when the PVC is deleted
reclaimPolicy: Retain
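
Note that the pool parameter myfs-replicated is simply the data pool created by the CephFilesystem above (<filesystem name>-<dataPool name>). A quick check that the StorageClass registered (sketch):

$ kubectl get storageclass rook-cephfs-monitoring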

Next, create the PVC; in my case it will be used by Uptime Kuma:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-uptime
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs-monitoring
  resources:
    requests:
      storage: 2Gi
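
The PVC should reach Bound shortly after creation; if it stays Pending, the CephFS provisioner is the place to look (a sketch, using the deployment and container names from the pod listing above):

$ kubectl -n monitoring get pvc cephfs-uptime
$ kubectl -n rook-ceph logs deploy/csi-cephfsplugin-provisioner -c csi-provisioner --tail=20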

Finally, configure the Pod to mount it:

# other configuration omitted ...
    spec:
      containers:
      - name: uptime-kuma
        image: louislam/uptime-kuma:1.23.13
        ports:
        - containerPort: 3001
        volumeMounts:
        - name: uptime-kuma-pvc-local
          mountPath: /app/data
      volumes:
      - name: uptime-kuma-pvc-local
        persistentVolumeClaim:
          claimName: cephfs-uptime
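
Once the pod is running, the CephFS mount can be confirmed from inside it (a sketch, assuming the Deployment is named uptime-kuma):

# /app/data should show a ceph filesystem roughly the size of the PVC
$ kubectl -n monitoring exec deploy/uptime-kuma -- df -h /app/data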


Bpazy commented Jan 24, 2025

Benchmarking Ceph

Create a utility Pod:

apiVersion: v1
kind: Pod
metadata:
  name: migration-pod
  namespace: monitoring
spec:
  volumes:
    - name: cephfs-uptime
      persistentVolumeClaim:
        claimName: cephfs-uptime
  containers:
    - name: ubuntu-container
      image: ubuntu:latest
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 5; done;"]
      volumeMounts:
        - name: cephfs-uptime
          mountPath: /data/uptime2

Start the Pod and exec into it:

ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl -n monitoring exec -it migration-pod -- /bin/bash

Run the benchmark (remember to add oflag=direct to bypass the page cache):

root@migration-pod:/data/uptime2# dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 104.89 s, 10.2 MB/s

root@migration-pod:/data/uptime2# dd if=/dev/zero of=512 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 66.6174 s, 7.7 kB/s

The performance is abysmal; the next step is to investigate why.
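
For some perspective on the numbers: with oflag=direct every dd write is a single synchronous round trip to the cluster, so 7.7 kB/s at 512-byte blocks is roughly 15 writes per second, i.e. about 65 ms per write, which points at per-request latency rather than raw bandwidth. A fio run with some queue depth would separate the two (a sketch; fio has to be installed in the pod first, e.g. apt-get update && apt-get install -y fio):

# 4k random writes, direct I/O, queue depth 16, 60 seconds
root@migration-pod:/data/uptime2# fio --name=cephfs-randwrite --directory=/data/uptime2 \
    --rw=randwrite --bs=4k --size=256m --direct=1 --ioengine=libaio \
    --iodepth=16 --numjobs=1 --runtime=60 --time_based --group_reporting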
