# Backup
Two independent backup strategies:

1. etcd snapshots — cluster state (all k8s objects, Flux config, etc.)
2. Longhorn volume backups — persistent data (application data stored in PVCs)
Garage (self-hosted S3) on the Debian server is the remote destination for Longhorn backups; etcd snapshots are copied to the same server via rsync.
## etcd Snapshots

### What gets backed up
The etcd database contains all Kubernetes object state — every Deployment, Secret, ConfigMap, Service, etc. Restoring from an etcd snapshot brings the cluster back to the exact state it was in when the snapshot was taken.
What it doesn't cover: the actual data in PVCs (that's Longhorn's job).
### Automatic snapshots (k3s built-in)
k3s takes etcd snapshots automatically:
- Default schedule: daily at 12:00 UTC
- Default retention: 5 snapshots
- Location: `/var/lib/rancher/k3s/server/db/snapshots/`
- Typical size: ~5-7 MB (grows with cluster size)
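Schedule and retention can be overridden in the k3s server config. The fragment below is a sketch for `/etc/rancher/k3s/config.yaml` (values illustrative; restart k3s to apply):

```yaml
# Illustrative overrides — keys mirror the k3s server flags
etcd-snapshot-schedule-cron: "0 12 * * *"  # daily at 12:00
etcd-snapshot-retention: 5                 # keep 5 snapshots
```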
```bash
# List existing snapshots
sudo k3s etcd-snapshot ls

# Manual snapshot (before risky operations)
sudo k3s etcd-snapshot save --name pre-cilium-migration
```
### Automated offsite backup via rsync
A daily cron job on the Debian server pulls the latest snapshots from the master node via rsync.
SSH key setup: The backup user on the Debian server needs passwordless SSH access to master. Use a dedicated key:
```bash
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_for_master_rsa -N ""
ssh-copy-id -i ~/.ssh/id_for_master_rsa.pub user@master
```
sudoers on master (allow rsync without full sudo):
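A minimal entry (e.g. in `/etc/sudoers.d/rsync-backup`) could look like the sketch below; the username `user` and the rsync path are assumptions — check `which rsync` on master:

```
# Hypothetical: allow the backup user to run rsync as root in server mode
# only; the glob covers the option string rsync appends when invoked remotely.
user ALL=(root) NOPASSWD: /usr/bin/rsync --server *
```

Pinning the full server-mode argument string instead of a glob is stricter, but breaks whenever rsync changes its option encoding.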
The --server flag limits rsync to server mode only — it can't run arbitrary commands.
Backup script on Debian server:
```bash
#!/bin/bash

BACKUP_BASE_DIR="/home/kcn/k3s_etcd_backup"
CURRENT_DATE=$(date +%Y-%m-%d_%H%M)
TARGET_DIR="$BACKUP_BASE_DIR/$CURRENT_DATE"
REMOTE_NODE="master"
REMOTE_PATH="/var/lib/rancher/k3s/server/db/snapshots/"

mkdir -p "$TARGET_DIR"

if ! rsync -avz \
    -e "ssh -i /home/kcn/.ssh/id_for_master_rsa" \
    --rsync-path="sudo rsync" \
    "$REMOTE_NODE:$REMOTE_PATH" "$TARGET_DIR"; then
  echo "ERROR: rsync failed!" | logger -t etcd-backup
  exit 1
fi

echo "Backup finished: $CURRENT_DATE" | logger -t etcd-backup

# Retention — remove directories older than 30 days
find "$BACKUP_BASE_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
```
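The retention expression can be sanity-checked against throwaway directories before trusting it with real backups — the directory names below are purely illustrative:

```shell
# Scratch base dir with one "old" and one "recent" backup directory
base=$(mktemp -d)
mkdir -p "$base/2024-01-01_1300" "$base/recent"
# Backdate the old directory's mtime so -mtime +30 matches it
touch -d '40 days ago' "$base/2024-01-01_1300"

# Same expression as the script: removes only the backdated directory
find "$base" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

ls "$base"   # expect only "recent" to remain
rm -rf "$base"
```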
Cron job (runs 1 hour after k3s auto-snapshot at 12:00 UTC):
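A crontab entry for this could look like the sketch below — the script path is hypothetical, and it assumes the server clock is UTC:

```
# m  h  dom mon dow  command
0 13 * * * /home/kcn/backup-etcd.sh
```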
Important: always pass an explicit SSH key path (`-i`) in cron jobs — cron runs with a minimal environment, so don't rely on an ssh-agent or other environment-dependent defaults.
Verify runs via the system journal with `journalctl -t etcd-backup` (the script tags its output through `logger -t etcd-backup`).
### Snapshot before risky operations

Always take a manual snapshot before:

- Changing TLS SAN configuration
- Migrating CNI (Flannel → Cilium)
- Upgrading k3s or any core component
- Making changes to etcd-level configuration
```bash
ssh master "sudo k3s etcd-snapshot save --name pre-<description>"
ssh master "sudo k3s etcd-snapshot ls"
```
### 3-2-1 rule
| Copy | Location |
|---|---|
| 1 | Live in k3s (etcd) |
| 2 | Local on master node (/var/lib/rancher/k3s/server/db/snapshots/) |
| 3 | Offsite on Debian server (/home/kcn/k3s_etcd_backup/) |
## Garage — Self-Hosted S3

### What it is
Garage is a lightweight distributed S3-compatible object store written in Rust. It runs on the Debian server and serves as the S3 backend for both Longhorn volume backups and Loki log storage.
It is designed for exactly this kind of homelab/small-deployment use case: minimal resource footprint and single-node operation, with optional clustering.
### `docker-compose.yml`
```yaml
services:
  garage:
    image: dxflrs/garage:v2.2.0
    restart: unless-stopped
    ports:
      - "3900:3900" # S3 API
      - "3901:3901" # RPC
      - "3902:3902" # Admin API
    volumes:
      - /home/kcn/k8s/garage/data:/var/lib/garage/data
      - /home/kcn/k8s/garage/meta:/var/lib/garage/meta
      - /home/kcn/k8s/garage.toml:/etc/garage.toml:ro
    environment:
      - RUST_LOG=garage=info
```
### `garage.toml` (v2.x)
```toml
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
db_engine = "lmdb"

replication_factor = 1

rpc_secret = "GENERATE_WITH: openssl rand -hex 32"
rpc_bind_addr = "[::]:3901"
rpc_public_addr = "192.168.0.46:3901"

[s3_api]
api_bind_addr = "[::]:3900"
s3_region = "garage"

[admin]
api_bind_addr = "0.0.0.0:3902"
admin_token = "GENERATE_A_RANDOM_TOKEN"
```
### Initial setup
```bash
# Get node ID
docker exec -it garage-garage-1 /garage node id

# Assign capacity (25GB)
docker exec -it garage-garage-1 /garage layout assign \
  <NODE_ID> -z dc1 -c 25G

# Apply layout
docker exec -it garage-garage-1 /garage layout apply --version 1

# Create access key
docker exec -it garage-garage-1 /garage key create k3s-homelab

# Create buckets
docker exec -it garage-garage-1 /garage bucket create longhorn-backup
docker exec -it garage-garage-1 /garage bucket create loki-logs

# Grant permissions
docker exec -it garage-garage-1 /garage bucket allow longhorn-backup \
  --read --write --owner --key k3s-homelab
docker exec -it garage-garage-1 /garage bucket allow loki-logs \
  --read --write --owner --key k3s-homelab
```
### Test S3 access from the cluster
```bash
kubectl run s3-test --image=amazon/aws-cli --rm -it --restart=Never \
  --env="AWS_ACCESS_KEY_ID=YOUR_KEY_ID" \
  --env="AWS_SECRET_ACCESS_KEY=YOUR_SECRET" \
  -- s3 ls --endpoint-url http://192.168.0.46:3900 --region garage
```
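A write round-trip confirms the bucket also accepts data, not just listings. A sketch runnable from any machine with the AWS CLI, assuming the key pair is exported in the environment:

```shell
# Assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are exported
echo probe > /tmp/s3-probe.txt
aws s3 cp /tmp/s3-probe.txt s3://longhorn-backup/probe.txt \
  --endpoint-url http://192.168.0.46:3900 --region garage
# Clean up the probe object afterwards
aws s3 rm s3://longhorn-backup/probe.txt \
  --endpoint-url http://192.168.0.46:3900 --region garage
```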
## Longhorn Volume Backups

### BackupTarget (Longhorn v1.11+)
In Longhorn v1.11, the backup target configuration moved from a Setting to a proper CRD:
```yaml
apiVersion: longhorn.io/v1beta2
kind: BackupTarget
metadata:
  name: default
  namespace: longhorn-system
spec:
  backupTargetURL: s3://longhorn-backup@garage/
  credentialSecret: longhorn-s3-secret
  pollInterval: 300s # note: must end with "s"
```
### S3 credentials as SealedSecret
```bash
kubectl create secret generic longhorn-s3-secret \
  --namespace longhorn-system \
  --from-literal=AWS_ACCESS_KEY_ID=YOUR_KEY_ID \
  --from-literal=AWS_SECRET_ACCESS_KEY=YOUR_SECRET \
  --from-literal=AWS_ENDPOINTS=http://192.168.0.46:3900 \
  --from-literal=AWS_CERT="" \
  --dry-run=client -o yaml | \
kubeseal --format yaml \
  --cert ~/.config/kubeseal/pub-sealed-secrets.pem \
  > apps/base/longhorn/longhorn-s3-secret-sealed.yaml
```
### RecurringJob — automated daily backups
```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 11 * * *" # 11:00 UTC daily
  task: backup
  groups:
    - default # applies to all volumes in the default group
  retain: 2 # keep last 2 backups per volume
  concurrency: 1
  labels:
    backup: daily
```
### Verifying backup target status
```bash
kubectl -n longhorn-system describe backuptarget default
# Look for: Status.Available: true

# Verify data in Garage
docker exec -it garage-garage-1 /garage bucket info longhorn-backup
# Size > 0 means data is flowing
```
## Useful Commands
```bash
# etcd
sudo k3s etcd-snapshot ls
sudo k3s etcd-snapshot save --name <name>
journalctl -t etcd-backup # backup script logs

# Garage
docker exec -it garage-garage-1 /garage status
docker exec -it garage-garage-1 /garage bucket info longhorn-backup
docker exec -it garage-garage-1 /garage bucket info loki-logs

# Longhorn
kubectl -n longhorn-system describe backuptarget default
kubectl get recurringjob -n longhorn-system
```