From 717fcb9c81d2bc3cc7a84a3ebea6572d7ff0f5cf Mon Sep 17 00:00:00 2001
From: doc
Date: Mon, 30 Jun 2025 20:06:28 +0000
Subject: uploading documentation

---
 casestudies/chaosmonkey.md   | 90 ++++++++++++++++++++++++++++++++++++++++
 casestudies/genesissynccs.md | 99 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 189 insertions(+)
 create mode 100644 casestudies/chaosmonkey.md
 create mode 100644 casestudies/genesissynccs.md

(limited to 'casestudies')

diff --git a/casestudies/chaosmonkey.md b/casestudies/chaosmonkey.md
new file mode 100644
index 0000000..4b64906
--- /dev/null
+++ b/casestudies/chaosmonkey.md
@@ -0,0 +1,90 @@
+# 🛡️ Case Study: Bulletproofing Genesis Infrastructure with ChaosMonkey DR Drills
+
+**Date:** May 10, 2025
+**Organization:** Genesis Hosting Technologies
+**Lead Engineer:** Doc (Genesis Radio, Infrastructure Director)
+
+---
+
+## 🎯 Objective
+
+Design and validate a robust, automated disaster recovery (DR) system for Genesis infrastructure, including PostgreSQL, MinIO object storage, and ZFS-backed media, with an external testbed (Linode-hosted) named **ChaosMonkey**.
+
+---
+
+## 🧩 Infrastructure Overview
+
+| Component        | Role                                 | Location                    |
+|------------------|--------------------------------------|-----------------------------|
+| PostgreSQL       | Primary/replica database nodes       | zcluster.technodrome1/2     |
+| MinIO            | S3-compatible object storage         | shredder                    |
+| ZFS              | Primary media storage backend        | minioraid5, thevault        |
+| GenesisSync      | Hybrid mirroring and integrity check | Deployed to all asset nodes |
+| ChaosMonkey      | DR simulation and restore target     | Linode                      |
+
+---
+
+## 🧰 Tools Developed
+
+### `genesis_sync.sh`
+- Mirrors local ZFS to MinIO and vice versa
+- Supports verification, dry-run, and audit modes
+- Alerts via KrangBot on error or drift
+
+### `run_dr_failover.sh` & `run_dr_failback.sh`
+- Safely fail over and restore PostgreSQL + GenesisSync
+- Auto-promote DB nodes
+- Send alerts via Telegram
+
+### `genesis_clone_manager_multihost.sh`
+- Clones live systems (DB, ZFS, MinIO) from prod to ChaosMonkey
+- Supports a dry-run preview mode
+- Orchestrates multiple hosts via SSH
+
+### `genesis_clone_validator.sh`
+- Runs on ChaosMonkey
+- Verifies PostgreSQL snapshot, ZFS datasets, and MinIO content
+- Can optionally trigger a GenesisSync `--verify`
+
+---
+
+## 🧪 DR Drill Process (Stage 3 - Controlled Live Test)
+
+1. 🔒 Freeze writes on production nodes
+2. 📤 Snapshot and clone the entire stack to ChaosMonkey
+3. 🔁 Promote standby PostgreSQL and redirect test traffic
+4. 🧪 Validate application behavior and data consistency
+5. 📩 Alert via KrangBot with sync/report logs
+6. ✅ Trigger safe failback using snapshot + delta sync
+
+---
+
+## 🚨 Results
+
+- **Recovery time (RTO)**: PostgreSQL in 3 min, full application in under 10 min
+- **Zero data loss** using base backups and WAL
+- **GenesisSync** completed with verified parity between ZFS and MinIO
+- **Repeatable**: The same scripts are reused weekly for validation
+
+---
+
+## 💡 Key Takeaways
+
+- **Scripts are smarter than sleepy admins**: guardrails matter
+- **ZFS + WAL + GitOps-style orchestration = rock-solid DR**
+- **Testing DR live on ChaosMonkey builds real confidence**
+- **Failure Friday is not a risk; it's a training ground**
+
+---
+
+## 🌟 Final Thoughts
+
+By taking DR out of theory and into action, Genesis Hosting Technologies ensures that data is not only safe but also recoverable, testable, and fully verified on demand. With ChaosMonkey in the mix, Genesis now embraces disaster… on its own terms.
+
+
+
+---
+
+## 📝 A Note on Naming
+
+"ChaosMonkey" is inspired by the original [Chaos Monkey](https://github.com/Netflix/chaosmonkey) tool created by Netflix, which tests the resilience of their infrastructure by randomly terminating instances. Our use of the name pays homage to the same principles of reliability, failover testing, and engineering with failure in mind. No affiliation with or endorsement by Netflix is implied.
diff --git a/casestudies/genesissynccs.md b/casestudies/genesissynccs.md
new file mode 100644
index 0000000..0eeb23e
--- /dev/null
+++ b/casestudies/genesissynccs.md
@@ -0,0 +1,99 @@
+# GenesisSync: Hybrid Object–Block Media Architecture for Broadcast Reliability and Scalable Archiving
+
+## Executive Summary
+
+GenesisSync is a hybrid storage architecture developed by Genesis Hosting Technologies to solve a persistent challenge in modern broadcast environments: enabling fast, local access for traditional DJ software while simultaneously ensuring secure, scalable, and redundant storage using object-based infrastructure.
+
+The system has been implemented in a live production environment, integrating StationPlaylist (SPL), AzuraCast, Mastodon, and MinIO object storage with ZFS-backed block storage. GenesisSync enables near-real-time file synchronization, integrity checking, and disaster recovery with no vendor lock-in or reliance on fragile mount hacks.
+
+---
+
+## The Problem
+
+- **SPL and similar DJ automation systems** require low-latency, POSIX-style file access for real-time media playback and cue-point accuracy.
+- **Web-native applications** (like Mastodon and AzuraCast) operate more efficiently using scalable object storage (e.g., S3, MinIO).
+- Legacy systems often can't interface directly with object storage without middleware or fragile FUSE mounts.
+- Previous attempts to unify object and block storage often led to file locking issues, broken workflows, or manual copy loops.
+
+---
+
+## The GenesisSync Architecture
+
+### Components
+
+- **Primary Storage**: ZFS-backed local block volumes (ext4 or ZFS)
+- **Backup Target**: MinIO object storage with S3-compatible APIs
+- **Apps**: StationPlaylist (Windows via SMB), AzuraCast (Docker), Mastodon
+- **Sync Tooling**: `rsync` for local sync, `mc mirror` for object sync
+
+### Sync Strategy
+
+- Local paths like `/mnt/azuracast` and `/mnt/stations` serve as the source of truth
+- An hourly cron job or systemd timer mirrors data to MinIO using:
+  ```bash
+  mc mirror --overwrite --remove /mnt/azuracast localminio/azuracast-backup
+  ```
+- Optionally, `rsync` is used for internal ZFS → block migrations
+
+### Benefits
+
+- 🎧 Local-first for performance-sensitive apps
+- ☁️ Cloud-capable for redundancy and long-term archiving
+- 🔁 Resilient to network blips, container restarts, or media sync delays
+
+---
+
+## Real-World Implementation
+
+| Component        | Role                                             |
+|------------------|--------------------------------------------------|
+| SPL              | Reads from ZFS mirror via SMB                    |
+| AzuraCast        | Writes directly to MinIO via S3 API              |
+| MinIO            | Remote object store for backups                  |
+| ZFS              | Local resilience, snapshots, and fast access     |
+| `mc`             | Handles object sync from local storage           |
+| `rsync`          | Handles safe internal migration and deduplication |
+
+### Recovery Drill
+
+- Snapshot-based rollback with ZFS for quick recovery
+- Verified `mc mirror` restore from MinIO to cold-boot a new environment
+
+---
+
+## Results
+
+| Metric                        | Value                                  |
+|-------------------------------|----------------------------------------|
+| Playback latency (SPL)        | <10 ms via local ZFS                   |
+| Average mirror time (100 MB)  | ~12 seconds                            |
+| Recovery time (5 GB)          | <2 minutes                             |
+| Deployment size               | ~4.8 TB usable                         |
+| Interruption events           | 0 file-level issues since deployment   |
+
+---
+
+## Lessons Learned
+
+- Object storage is powerful, but it's not a filesystem; don't pretend it is.
+- Legacy apps need real disk paths, even if the data lives in the cloud.
+- Syncing on your terms (with tools like `rsync` and `mc`) beats fighting with FUSE.
+- Snapshot + mirror = peace of mind.
+
+---
+
+## Future Roadmap
+
+- 📦 Add bidirectional sync detection for selective restores
+- ✅ Build in sync integrity verification (hash/diff-based)
+- 🔔 Hook Telegram alerts for failed syncs or staleness
+- 🌐 Publish GenesisSync as an open-source utility
+- 📄 Full documentation for third-party station adoption
+
+---
+
+## About Genesis Hosting Technologies
+
+Genesis Hosting Technologies operates media infrastructure for Genesis Radio and affiliated stations. With a focus on low-latency access, hybrid cloud flexibility, and disaster resilience, GenesisSync represents a foundational step toward a smarter, mirrored media future.
+
+_"Fast on the air, safe on the backend."_
-- 
cgit v1.2.3