Dung (Donny) Nguyen

Senior Software Engineer

Backup and Recovery Strategy for an Online Game Database System

This is a comprehensive backup and recovery strategy for an online game database system, which needs to handle player data, game states, transactions, and maintain high availability.

Example Scenario:

Backup Strategy:

  1. Primary Backups (On-Premise):
  1. Secondary Backups (Off-site):
    • Full Database Backup Replication
      • Copied to cloud storage within 6 hours
      • Encrypted during transit and at rest
      • Geo-redundant storage across multiple regions
      • Retained for 3 months
  2. Real-time Protection:
    • Always-On Availability Groups
      • Synchronous replica in the same datacenter
      • Asynchronous replica in different geographical location
      • Automatic failover capability
      • Used for read-scaling during peak times

Recovery Procedures:

  1. Minor Issues (Corrupt Data/Accidental Changes):
-- Example point-in-time recovery
RESTORE DATABASE GameDB 
FROM DISK = 'E:\Backup\GameDB_Full.bak'
WITH NORECOVERY;

RESTORE DATABASE GameDB 
FROM DISK = 'E:\Backup\GameDB_Diff.bak'
WITH NORECOVERY;

RESTORE LOG GameDB 
FROM DISK = 'E:\Backup\GameDB_Log.trn'
WITH RECOVERY,
STOPAT = '2024-12-11 14:30:00';
  1. Major Failure (Primary Server Down):
    • Automatic failover to synchronous replica
    • RTO: Less than 1 minute
    • RPO: Zero data loss
  2. Disaster Recovery (Data Center Loss):
    • Failover to asynchronous replica
    • RTO: Less than 15 minutes
    • RPO: Potential loss of last few transactions

Monitoring and Maintenance:

  1. Automated Health Checks
    • Backup success/failure monitoring
    • Replica synchronization status
    • Storage space monitoring
    • Backup file integrity checks
  2. Regular Testing
    • Monthly restore tests of full backup
    • Quarterly disaster recovery drills
    • Validation of restored data integrity
  3. Performance Optimization
    • Regular index maintenance
    • Statistics updates
    • Backup compression optimization
    • Storage cleanup procedures

Example Recovery Time Estimates:

Scenario 1: Single table corruption
- Recovery time: 10-15 minutes
- Uses point-in-time recovery
- Minimal impact on players

Scenario 2: Complete server failure
- Recovery time: <1 minute
- Automatic failover to replica
- Players experience brief disconnection

Scenario 3: Data center disaster
- Recovery time: 10-15 minutes
- Manual failover to DR site
- Some data loss possible

Documentation:

  1. Maintain detailed recovery procedures
  2. Keep configuration scripts in version control
  3. Document all backup schedules and retention policies
  4. Regular updates to emergency contact list

Special Considerations for Game Data:

  1. Player Economy Protection
    • Transaction log backup frequency increased during special events
    • Additional verification steps for high-value item transactions
    • Separate backup chain for economy-related tables
  2. Character Data Safety
    • More frequent backups of character progress tables
    • Hash verification of character data integrity
    • Special restore procedures for character rollbacks