September 2024 – Kalles blog

I gave up on RAID for high availability

I now use Ceph instead. When I needed to restart a server holding a RAID volume, I first had to stop every VM that used it. With Ceph I can restart a node without stopping anything, because the other nodes still hold copies of its data. Ceph’s design and architecture make it an excellent choice for high availability (HA).

Why Ceph Excels at Availability

Here are some key reasons why Ceph is well-suited for HA:

Distributed Architecture: Ceph splits data into objects and replicates (or erasure-codes) them across multiple nodes, which makes it more resilient to failures. If one node fails, the remaining nodes keep operating and still serve the data.
Self-healing: Ceph detects failed disks and nodes and automatically re-creates the lost copies from the surviving replicas. The cluster returns to full redundancy without manual intervention.
No Single Point of Failure (SPOF): Ceph distributes data across multiple nodes and runs its monitors as a quorum, so no single disk or node is essential. If one node fails, the others take over without impacting availability.
Scalability: Ceph scales horizontally. Adding nodes or disks adds both capacity and throughput, so your HA setup keeps performing as your data grows.
Multi-site Replication: Ceph supports multi-site replication, which enables you to maintain a copy of your data at a secondary site for disaster recovery or load balancing purposes.
Synchronous replication: Within a cluster, Ceph acknowledges a write only after every OSD in the placement group’s acting set has stored it, so replicas never fall behind. Multi-site replication, by contrast, is asynchronous.
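The Python toy model below makes the synchronous write path concrete. It is a minimal sketch that assumes a replicated pool with the common `size`/`min_size` settings of 3 and 2. The `OSD` and `PlacementGroup` classes are hypothetical stand-ins, not Ceph’s code. A write counts as acknowledged only once every OSD that is currently up in the placement group has stored it, and I/O stops if fewer than `min_size` copies are online.

```python
from dataclasses import dataclass, field


@dataclass
class OSD:
    """Hypothetical in-memory stand-in for an object storage daemon."""

    name: str
    up: bool = True
    objects: dict = field(default_factory=dict)

    def store(self, key: str, value: bytes) -> bool:
        # A down OSD cannot persist anything.
        if not self.up:
            return False
        self.objects[key] = value
        return True


@dataclass
class PlacementGroup:
    """Toy replicated PG: acting[0] is the primary, the rest are replicas."""

    acting: list
    min_size: int = 2

    def write(self, key: str, value: bytes) -> bool:
        live = [osd for osd in self.acting if osd.up]
        # Like Ceph's min_size: refuse I/O when too few copies are online.
        if len(live) < self.min_size:
            return False
        # Synchronous replication: acknowledge only if every live replica stored it.
        results = [osd.store(key, value) for osd in live]
        return all(results)

    def read(self, key: str):
        # The first live OSD acts as primary and serves the read.
        for osd in self.acting:
            if osd.up:
                return osd.objects.get(key)
        return None


# Usage: three replicas, then one node goes down for a reboot.
osds = [OSD("osd.0"), OSD("osd.1"), OSD("osd.2")]
pg = PlacementGroup(acting=osds)
assert pg.write("vm-disk-block", b"v1")
osds[0].up = False  # e.g. rebooting the node that hosts osd.0
assert pg.read("vm-disk-block") == b"v1"  # still readable
assert pg.write("vm-disk-block", b"v2")  # still writable with 2 of 3 copies
```

The toy does not model what happens when osd.0 comes back. At that point it would still hold the stale v1, and real Ceph brings it up to date from the placement group’s log before it serves data again.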

How Ceph Achieves High Availability

Ceph achieves HA through several mechanisms:

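OSD (Object Storage Daemon) failures: Each placement group has a primary OSD that handles its I/O. If that OSD fails, another OSD holding a replica of the same placement group becomes primary and keeps serving the data.
PG (Placement Group) rebalancing: If a failed OSD stays down (by default for 10 minutes), Ceph marks it out. It then uses the CRUSH algorithm to recalculate where the affected placement groups should live and backfills the missing copies onto the remaining OSDs.
Monitors: OSDs exchange heartbeats and report unresponsive peers to the monitors. The monitors maintain the authoritative cluster map and run as a quorum. They mark unresponsive OSDs down, which triggers the failover and recovery described above.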
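Ceph decides where data lives in two steps. It hashes an object’s name to a placement group, and then the CRUSH algorithm maps each placement group to a set of OSDs. CRUSH also handles device weights and a hierarchy of failure domains. To keep things simple, the sketch below swaps in plain rendezvous (highest-random-weight) hashing as a stand-in. The helper names and the 4-host cluster are hypothetical. The sketch shows the property that matters for recovery: when an OSD is removed, only the placement groups that used it get a new OSD, and every placement group keeps its copies on different hosts.

```python
import hashlib


def _score(*parts: str) -> int:
    """Deterministic pseudo-random score derived from the given strings."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pg_for_object(obj_name: str, pg_num: int) -> int:
    # Step 1: hash the object name into one of pg_num placement groups.
    return _score(obj_name) % pg_num


def place_pg(pg: int, osds: dict, size: int = 3) -> list:
    """Step 2 (simplified): pick `size` OSDs for a PG, at most one per host.

    `osds` maps OSD name -> host name. Rendezvous hashing stands in for
    CRUSH: every OSD gets a score for this PG, and the best-scoring OSDs
    win, skipping hosts that already hold a copy.
    """
    ranked = sorted(osds, key=lambda name: _score(str(pg), name), reverse=True)
    chosen, used_hosts = [], set()
    for name in ranked:
        if osds[name] not in used_hosts:
            chosen.append(name)
            used_hosts.add(osds[name])
        if len(chosen) == size:
            break
    return chosen


# Usage: 4 hosts with 2 OSDs each, 64 PGs, then osd.3 fails for good.
osds = {f"osd.{i}": f"host{i // 2}" for i in range(8)}
before = {pg: place_pg(pg, osds) for pg in range(64)}
survivors = {name: host for name, host in osds.items() if name != "osd.3"}
after = {pg: place_pg(pg, survivors) for pg in range(64)}
moved = [pg for pg in range(64) if before[pg] != after[pg]]
print(f"{len(moved)} of 64 PGs remapped")  # only those that used osd.3
```

Placement groups that never had a copy on osd.3 keep exactly the same OSDs. Recovery therefore only has to backfill the copies that were actually lost, rather than reshuffling the whole cluster.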

Benefits of Using Ceph for HA

By using Ceph for your storage needs, you can enjoy:

High uptime: Ceph’s design ensures that data remains accessible even in the event of node failures.
Scalability: Ceph scales horizontally, so capacity and throughput grow as you add nodes.
Non-disruptive maintenance: You can patch, upgrade, or reboot storage nodes one at a time while your applications keep running, instead of scheduling downtime for every storage change.

In Conclusion

Ceph’s distributed architecture, self-healing capabilities, and multi-site replication make it an ideal choice for high availability. By leveraging Ceph for your storage needs, you can ensure that data remains accessible even in the face of hardware failures or outages.

Why rsync is bad for backups

While rsync is an excellent tool for transferring files, it has real limitations when it comes to creating consistent backups. You want backups that are at least crash-consistent: every file should reflect the same instant, as if the machine had lost power at that moment. That requires some kind of filesystem snapshot.

Why Rsync Can’t Do Consistent Backups

Here are the main reasons why rsync can’t do consistent backups:

File system snapshots: A consistent backup is a copy of the file system at one point in time. rsync instead walks the directory tree and copies files one after another. That can take minutes or hours, so files copied early and late reflect different moments. rsync uses metadata such as size and modification time to decide which files changed, but nothing freezes the file system while it runs.
Transaction logs: Modern databases use transaction (write-ahead) logs to maintain consistency. These logs record changes before they are applied to the data files. rsync can copy the log files, but it copies them at a different moment than the data files and knows nothing about their meaning, so the copied database may not be recoverable.
Locking and concurrency: In a multi-user environment, several users or processes may modify files at the same time. rsync takes no locks, so it cannot tell whether a file changed while it was being read, or whether it captured a file before or after a related change elsewhere.
Partial writes: Many applications update a file with several smaller writes rather than one. If rsync reads the file between those writes, the backup holds a half-updated file, and rsync has no way of detecting it.
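The problem is easy to reproduce without rsync, since any tool that copies one file at a time has it. The sketch below is a deterministic simulation. It assumes a hypothetical application that keeps two balances in `a.txt` and `b.txt`, which must always add up to 100. The copier copies `a.txt`, the application then moves 30 from one file to the other, and finally the copier copies `b.txt`.

```python
import shutil
import tempfile
from pathlib import Path


def read_balance(path: Path) -> int:
    return int(path.read_text())


def transfer(live: Path, amount: int) -> None:
    """Hypothetical app: move `amount` from a.txt to b.txt (sum stays 100)."""
    a, b = read_balance(live / "a.txt"), read_balance(live / "b.txt")
    (live / "a.txt").write_text(str(a - amount))
    (live / "b.txt").write_text(str(b + amount))


def file_by_file_backup(live: Path, backup: Path) -> None:
    """Copy files one at a time, with the app running between the copies."""
    shutil.copy2(live / "a.txt", backup / "a.txt")
    transfer(live, 30)  # the application keeps working mid-backup
    shutil.copy2(live / "b.txt", backup / "b.txt")


with tempfile.TemporaryDirectory() as tmp:
    live, backup = Path(tmp, "live"), Path(tmp, "backup")
    live.mkdir()
    backup.mkdir()
    (live / "a.txt").write_text("100")
    (live / "b.txt").write_text("0")
    file_by_file_backup(live, backup)
    live_sum = read_balance(live / "a.txt") + read_balance(live / "b.txt")
    backup_sum = read_balance(backup / "a.txt") + read_balance(backup / "b.txt")
    print(live_sum, backup_sum)  # 100 130
```

The live data is consistent at every moment, but the backup combines a.txt from before the transfer with b.txt from after it. A snapshot avoids this by letting the copier read both files as they were at one instant.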

What Rsync Does Instead

While rsync can’t create consistent backups like some other tools (e.g., snapshotting software), it excels at:

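Incremental backups: rsync compares source and destination (by default by size and modification time) and skips unchanged files. Over the network, its delta-transfer algorithm sends only the changed parts of modified files. Both significantly reduce backup time. With --link-dest, unchanged files can also be hard-linked to a previous backup, which saves space.
Transfer integrity: rsync verifies each transferred file with a checksum and, by default, writes it to a temporary file that is renamed into place, so the destination never holds a half-transferred file. This does not make a file internally consistent if it was changing while rsync read it, let alone make the whole system consistent.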
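The space saving behind --link-dest can be sketched in a few lines of Python. This is an illustration, not rsync’s implementation. `incremental_backup` and `unchanged` are hypothetical helpers, and the sketch handles only a flat directory. It uses a size-and-mtime check similar to rsync’s default quick check, and it always copies changed files whole instead of using delta transfer.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def unchanged(src: Path, prev: Path) -> bool:
    """rsync-style quick check: same size and same mtime (whole seconds)."""
    if not prev.exists():
        return False
    s, p = src.stat(), prev.stat()
    return s.st_size == p.st_size and int(s.st_mtime) == int(p.st_mtime)


def incremental_backup(src_dir: Path, dest_dir: Path, prev_dir: Optional[Path]) -> None:
    """Copy changed files into dest_dir; hard-link unchanged ones from prev_dir."""
    dest_dir.mkdir(parents=True)
    for src in src_dir.iterdir():
        if not src.is_file():
            continue  # this sketch skips subdirectories
        dest = dest_dir / src.name
        if prev_dir is not None and unchanged(src, prev_dir / src.name):
            os.link(prev_dir / src.name, dest)  # shares the old copy's data blocks
        else:
            shutil.copy2(src, dest)  # copy2 keeps mtime for the next quick check
```

Because an unchanged file shares one inode across backups, every backup directory looks like a full copy while only changed files take up new space. This also means the backups must be treated as read-only, since editing a hard-linked file changes it in every backup that shares it.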

Alternatives for Consistent Backups

If you need consistent backups, consider using other tools specifically designed for this purpose:

Snapshots: Take regular snapshots of your file systems or volumes using software like LVM (Logical Volume Manager) or ZFS, and back up from the snapshot rather than from the live data. rsync works well for that last step.
Database backup solutions: Use the database’s own backup tools, such as PostgreSQL’s pg_dump or MySQL’s mysqldump (with --single-transaction for InnoDB tables), to capture the entire database state at a single point in time.
Backup software with consistency features: Use backup software that includes consistency features, such as Veeam Backup & Replication, which can create consistent backups by taking snapshots of file systems and capturing transaction logs.
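The database tools listed above share one idea: ask the database engine for a copy that reflects a single point in time, instead of copying its files out from underneath it. Python’s standard library exposes the same idea for SQLite through `sqlite3.Connection.backup`. The sketch below uses SQLite only because it needs no server, and the `accounts` table is a hypothetical example. For PostgreSQL or MySQL, the corresponding step is running pg_dump or mysqldump.

```python
import sqlite3


def consistent_copy(src: sqlite3.Connection) -> sqlite3.Connection:
    """Return an in-memory copy of `src` taken through SQLite's backup API."""
    dst = sqlite3.connect(":memory:")
    src.backup(dst)  # the engine produces a consistent copy of the database
    return dst


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
live.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
# Both halves of the transfer commit together, so a backup sees both or neither.
with live:
    live.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'a'")
    live.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'b'")

copy = consistent_copy(live)
total = copy.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
print(total)  # 100
```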