Mastering Kubernetes Storage
Storing and retrieving data is crucial for most real-world applications. Kubernetes' persistent volume subsystem allows you to connect to enterprise-grade storage systems that provide advanced data management services such as backup and recovery, replication, and snapshots.
Overview
Kubernetes supports a variety of storage systems, including those from major cloud providers and enterprise-class solutions like EMC and NetApp. This section will cover:
- The big picture of Kubernetes storage
- Various storage providers
- The Container Storage Interface (CSI)
- Kubernetes persistent volume subsystem
- Dynamic provisioning with Storage Classes
- Hands-on examples
The Big Picture
Kubernetes supports different types of storage, such as block, file, and object storage, from various external systems, either in the cloud or on-premises.
High-Level Architecture
Storage providers connect to Kubernetes through a plugin layer, often using the Container Storage Interface (CSI). This standardized interface simplifies integrating external storage resources with Kubernetes.
Key Components
- Storage Providers: External systems providing storage services, like EMC, NetApp, or cloud providers.
- Plugin Layer: Connects external storage systems with Kubernetes, typically using CSI plugins.
- Kubernetes Persistent Volume Subsystem: Standardized API objects that allow applications to consume storage easily.
Storage Providers
Kubernetes supports a wide range of external storage systems, each typically providing its own CSI plugin. These plugins are usually installed via Helm charts or YAML installers and run as Pods in the kube-system Namespace.
Restrictions
- Cloud-Specific: You can't provision and mount GCP volumes if your cluster is on Microsoft Azure.
- Locality: Pods often need to be in the same region or zone as the storage backend.
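One place locality shows up in practice is the allowedTopologies field of a StorageClass, which restricts where volumes may be provisioned. The sketch below assumes the Google Cloud persistent disk CSI plugin and its topology.gke.io/zone key; the class name and zone values are illustrative and not taken from a real cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-zonal                 # hypothetical name
provisioner: pd.csi.storage.gke.io
# Delay provisioning until a Pod is scheduled, so the volume is created
# in a zone where the Pod can actually run.
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone # assumed topology key for this plugin
        values:
          - us-central1-a         # illustrative zones
          - us-central1-b
parameters:
  type: pd-ssd
```

Other plugins use their own topology keys, so check the plugin's documentation before copying this.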
Container Storage Interface (CSI)
The Container Storage Interface (CSI) is a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes. CSI allows for the consistent configuration and management of storage solutions across various container orchestration systems.
CSI enables storage providers to develop a standardized plugin once and have it work across a multitude of container orchestration systems without requiring changes. This simplifies the process of adding new storage capabilities to Kubernetes clusters and ensures compatibility and extensibility.
While CSI is a critical piece of getting storage working in Kubernetes, unless you explicitly work on writing storage plugins you'll likely never interact with it directly. Most of your interaction with CSI will simply be referencing your relevant CSI plugin in YAML files.
Benefits of CSI
- Decoupled Updates: CSI plugins can be updated independently of Kubernetes releases.
- Broad Compatibility: CSI plugins work across different orchestration platforms.
Installing CSI Plugins
Most cloud platforms pre-install CSI plugins for native storage services. Third-party storage systems require manual installation, often available as Helm charts or YAML files.
Kubernetes Persistent Volume Subsystem
The Persistent Volume Subsystem uses several key resources to manage storage:
- PersistentVolumes (PV): Represent external storage volumes.
- PersistentVolumeClaims (PVC): Requests for storage by applications.
- StorageClasses (SC): Define different classes of storage for dynamic provisioning.
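To make the relationship between these resources concrete, here is a minimal sketch of a manually defined PV that points at a disk already present on the backend, plus a PVC that binds to it by name. It assumes the Google Cloud persistent disk CSI plugin; the object names and the PROJECT_ID, ZONE, and DISK_NAME placeholders are hypothetical and must be replaced with real values.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-existing-disk            # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""              # empty: not tied to any StorageClass
  csi:
    driver: pd.csi.storage.gke.io
    # Placeholder identifier of a disk that already exists on the backend.
    volumeHandle: projects/PROJECT_ID/zones/ZONE/disks/DISK_NAME
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-existing-disk           # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""              # matches the PV and disables dynamic provisioning
  volumeName: pv-existing-disk      # bind to the PV above by name
  resources:
    requests:
      storage: 50Gi
```

With a StorageClass in place you normally write only the PVC, and the PV is created for you.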
Workflow Example
- Pod Requests Storage: Via a PersistentVolumeClaim (PVC).
- PVC Requests Creation: PVC asks the StorageClass (SC) to create a new PV on the storage backend.
- CSI Plugin Interaction: The SC uses the CSI plugin to provision the volume.
- Volume Creation: The external volume is created and reported back to Kubernetes.
- PV and PVC Binding: A PV representing the new volume is created and bound to the PVC, and the Pod mounts the volume through the PVC.
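At the end of this workflow the cluster holds a PV that was created on your behalf. The excerpt below is an illustrative sketch of what such an object might look like for a claim named mypvc against a class named fast; the PV name, namespace, and volume handle are made up, and real objects carry more fields.

```yaml
# Illustrative excerpt of a dynamically provisioned PV; all values are made up.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-3f2a9c1e-7b64-4d0e-9a55-1c2d3e4f5a6b   # generated name (made up here)
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  persistentVolumeReclaimPolicy: Delete   # inherited from the StorageClass
  claimRef:                               # the PVC this PV is bound to
    kind: PersistentVolumeClaim
    namespace: default
    name: mypvc
  csi:
    driver: pd.csi.storage.gke.io
    # Placeholder for the identifier the backend returned for the new volume.
    volumeHandle: projects/PROJECT_ID/zones/ZONE/disks/pvc-3f2a9c1e-7b64-4d0e-9a55-1c2d3e4f5a6b
status:
  phase: Bound
```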
Dynamic Provisioning with Storage Classes
StorageClasses (SCs) allow you to define different types of storage. How they are defined depends on the type of storage you're using. For example, if you're using Google Cloud Storage you have classes such as Standard, Nearline, Coldline, and Archive. You may also have more straightforward classes at your disposal, such as SSD and HDD. When you create an SC, you map it to one of these backend definitions so that Pods in your cluster can request the type of storage they need.
Example YAML for a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io   # Google Cloud CSI plugin
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-ssd                       # Google Cloud SSD drives
  provisioned-iops-on-create: '10000'
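A claim against this class could look like the sketch below. The claim name and size are hypothetical; only the storageClassName comes from the example above.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data                # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd        # the StorageClass defined above
  resources:
    requests:
      storage: 100Gi           # illustrative size
```

Because the class uses WaitForFirstConsumer, this claim stays Pending until a Pod that references it is scheduled. Because allowVolumeExpansion is true, raising the storage request later asks the plugin to grow the volume, provided the plugin supports expansion.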
Key Points
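- Immutability: Key fields of a StorageClass, such as the provisioner and parameters, cannot be modified once it is created. To change them, delete the object and create a new one.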
- Meaningful Names: Use descriptive names for easy reference.
- Provisioner-Specific Parameters: The parameters block varies between different plugins.
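To illustrate how the parameters block differs between plugins, the sketch below defines one class for the AWS EBS CSI plugin and one for the Azure Disk CSI plugin. The class names are hypothetical, and each plugin accepts more parameters than shown here.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: aws-gp3                 # hypothetical name
provisioner: ebs.csi.aws.com    # AWS EBS CSI plugin
parameters:
  type: gp3                     # EBS volume type
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium           # hypothetical name
provisioner: disk.csi.azure.com # Azure Disk CSI plugin
parameters:
  skuName: Premium_LRS          # Azure managed disk SKU
```

Neither parameter key means anything to the other plugin, which is why a StorageClass is always written for one specific provisioner.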
Example
Below is the high-level flow for creating and using StorageClasses:
- Ensure you have a storage backend (cloud, on-prem, etc.)
- Have a running Kubernetes cluster
- Install and set up the CSI storage plugin that connects the backend to Kubernetes
- Create at least one StorageClass on Kubernetes
- Deploy Pods with PVCs that reference those StorageClasses
The following YAML defines a Pod, the PVC it mounts, and the StorageClass the PVC references:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mypvc
  containers:
    - name: my-container
      image: myimage
      volumeMounts:
        - name: data
          mountPath: /data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
Additional Volume Settings
Access Modes
- ReadWriteOnce (RWO): The volume can be mounted read-write by a single node.
- ReadWriteMany (RWX): The volume can be mounted read-write by many nodes.
- ReadOnlyMany (ROX): The volume can be mounted read-only by many nodes.
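An access mode is requested in the PVC. The sketch below asks for a volume that can be mounted read-write from several nodes; the claim name, size, and the shared-files StorageClass are hypothetical, and the class is assumed to be backed by a file storage plugin that supports this mode. Block storage backends typically support only ReadWriteOnce.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data               # hypothetical name
spec:
  accessModes:
    - ReadWriteMany               # mountable read-write by many nodes
  storageClassName: shared-files  # hypothetical class backed by file storage
  resources:
    requests:
      storage: 100Gi              # illustrative size
```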
Reclaim Policy
- Delete: Deletes the PV and the external storage when the PVC is deleted. This is the default for dynamically provisioned volumes.
- Retain: Keeps the PV and the external storage when the PVC is deleted, requiring manual cleanup.
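For dynamically provisioned volumes, the reclaim policy is set on the StorageClass. A minimal sketch, assuming the Google Cloud persistent disk CSI plugin and a hypothetical class name:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-retain              # hypothetical name
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Retain           # omit this field to get the default, Delete
parameters:
  type: pd-ssd
```

PVs created through this class keep their data after the claim is deleted, so someone has to clean up the PV and the backend volume by hand.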
Summary
Kubernetes provides a robust storage subsystem that allows applications to dynamically provision and manage storage from various external systems. By leveraging CSI plugins and StorageClasses, you can create flexible and scalable storage solutions tailored to your application's needs.