Business Continuity Plan
Effective Date: December 2025
Review Date: December 2026
1. Purpose
This plan ensures Acrux Education can maintain critical operations and recover from disruptions while minimising the impact on schools and educators who rely on our platform.
2. Scope
This plan covers the Acrux Education platform (Acrux Mark, Acrux Analyse) and supporting infrastructure hosted on Google Cloud Platform.
3. Recovery Objectives
Metric | Target | Description |
|---|---|---|
Recovery Time Objective (RTO) | 4 hours | Maximum time to restore service following a major incident |
Recovery Point Objective (RPO) | 24 hours (standard) / Minutes (with PITR) | Maximum acceptable data loss |
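As a rough illustration of how these targets might be checked after an incident, the sketch below compares measured downtime and data loss against the RTO and the standard RPO. The `Incident` record, its field names and the `met_objectives` helper are assumptions made for this example and are not part of the plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RTO = timedelta(hours=4)             # target from the table above
RPO_STANDARD = timedelta(hours=24)   # standard (daily backup) target


@dataclass
class Incident:
    # Hypothetical record; the field names are assumptions for illustration.
    outage_start: datetime
    service_restored: datetime
    last_recoverable_point: datetime  # latest data state that could be restored


def met_objectives(incident: Incident, rpo: timedelta = RPO_STANDARD) -> dict:
    """Compare measured downtime and data loss with the RTO and RPO."""
    downtime = incident.service_restored - incident.outage_start
    data_loss_window = incident.outage_start - incident.last_recoverable_point
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= RTO,
        "rpo_met": data_loss_window <= rpo,
    }


example = Incident(
    outage_start=datetime(2026, 3, 1, 9, 0),
    service_restored=datetime(2026, 3, 1, 12, 30),
    last_recoverable_point=datetime(2026, 3, 1, 2, 0),
)
result = met_objectives(example)
print(result["rto_met"], result["rpo_met"])
```

Passing a smaller `rpo` value would model the tighter point-in-time recovery target; the plan states that target only as "minutes", so no specific figure is assumed here.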
4. Infrastructure Overview
4.1 Production Environment
Platform: Google Cloud Platform (GCP)
Region: australia-southeast1 (Sydney) for Australian customers
Region: europe-west2 (London) for United Kingdom customers
Compute: Cloud Run (multi-zone within region)
Database: Cloud SQL PostgreSQL 16 (Enterprise Edition)
Storage: Google Cloud Storage (regional redundancy)
4.2 Backup Strategy
Component | Backup Frequency | Retention | Recovery Method |
|---|---|---|---|
Database | Daily (02:00 UTC) | 30 days | Automated restore |
Point-in-Time Recovery | Continuous | 7 days | Transaction log replay |
Application Code | Continuous (Bitbucket) | Indefinite | Git deployment |
Configuration | Version controlled | Indefinite | Infrastructure as Code |
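The retention windows above determine which recovery method can reach a given moment in time. The following is a minimal sketch of that reasoning, assuming daily backups complete at exactly 02:00 UTC and that all timestamps are timezone-aware UTC values; the function names are hypothetical and the code does not call any GCP API.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=7)        # transaction log retention
DAILY_RETENTION = timedelta(days=30)   # daily backup retention


def nearest_daily_backup(target: datetime) -> datetime:
    """Most recent 02:00 UTC backup taken at or before `target`."""
    candidate = target.replace(hour=2, minute=0, second=0, microsecond=0)
    if candidate > target:
        candidate -= timedelta(days=1)
    return candidate


def restore_option(target: datetime, now: datetime):
    """Return (method, restore_point) for the state wanted at `target`."""
    if target > now:
        raise ValueError("target is in the future")
    if now - target <= PITR_WINDOW:
        # Transaction log replay can reach the exact moment requested.
        return ("point-in-time recovery", target)
    backup = nearest_daily_backup(target)
    if now - backup <= DAILY_RETENTION:
        # Only the preceding daily snapshot is available.
        return ("daily backup", backup)
    return ("outside retention", None)


now = datetime(2026, 3, 20, 12, 0, tzinfo=timezone.utc)
print(restore_option(datetime(2026, 3, 18, 10, 15, tzinfo=timezone.utc), now))
print(restore_option(datetime(2026, 3, 5, 1, 0, tzinfo=timezone.utc), now))
```

The second call falls outside the 7-day window, so the sketch falls back to the last daily backup taken before the requested time, which means up to 24 hours of changes would not be recovered.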
5. Risk Assessment
5.1 Failure Scenarios
Scenario | Likelihood | Impact | Recovery Time |
|---|---|---|---|
Single zone outage | Low | Minimal | Minutes (auto-failover) |
Database zone failure | Low | Service degradation | Hours |
Full region outage | Very Low | Full outage | 4+ hours |
Data corruption | Very Low | Data loss risk | 1–4 hours |
Cyber attack | Low | Variable | Hours to days |
5.2 Mitigation Controls
Cloud Run automatically distributes instances across multiple zones within the Sydney region
GCS buckets replicated across zones within the region
Database backups stored in GCS with regional redundancy
Point-in-time recovery enabled for granular data restoration
6. Incident Response Procedures
6.1 Detection & Assessment
Automated monitoring alerts via GCP Monitoring, Sentry, and internal dashboards
On-call personnel assess severity and impact
Incident classified as Critical, High, Medium, or Low
6.2 Response Actions
Critical Incident (Complete service outage):
Immediate notification to all technical personnel
Assess root cause and estimated recovery time
Communicate status to affected customers within 1 hour
Execute recovery procedures
Provide regular updates until resolution
High Incident (Partial service degradation):
Technical personnel notified and begin investigation
Implement workarounds where possible
Communicate to affected customers if impact exceeds 30 minutes
Execute remediation
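The customer communication rules in the response actions above could be encoded roughly as follows. This is a sketch only: the `Severity` enum and `must_notify_customers` function are hypothetical names, and the behaviour for Medium and Low incidents is an assumption because the plan does not define it.

```python
from datetime import timedelta
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# High incidents are communicated once impact exceeds 30 minutes.
HIGH_IMPACT_THRESHOLD = timedelta(minutes=30)


def must_notify_customers(severity: Severity, impact_duration: timedelta) -> bool:
    """Decide whether affected customers need to be told about an incident."""
    if severity is Severity.CRITICAL:
        # Critical incidents are always communicated (within 1 hour).
        return True
    if severity is Severity.HIGH:
        return impact_duration > HIGH_IMPACT_THRESHOLD
    # Assumption: the plan sets no customer notice rule for Medium or Low.
    return False


print(must_notify_customers(Severity.HIGH, timedelta(minutes=45)))
```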
6.3 Recovery Procedures
Database Recovery:
Identify point of failure
If data corruption: Restore from point-in-time recovery
If zone failure: Restore from daily backup to alternate zone
Verify data integrity
Resume service
Application Recovery:
Cloud Run automatically handles zone failures
If region-wide: Deploy to alternate region from Bitbucket
Update DNS/routing as required
Verify functionality
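The recovery procedures above amount to a small decision table keyed on the type of failure. One way to sketch it, assuming a hypothetical `Failure` enum and `recovery_steps` helper, is shown below; the step text is taken from the lists above, and the function only returns a checklist without performing any recovery action.

```python
from enum import Enum


class Failure(Enum):
    DATA_CORRUPTION = "data corruption"
    DATABASE_ZONE_FAILURE = "database zone failure"
    APPLICATION_ZONE_FAILURE = "application zone failure"
    REGION_OUTAGE = "region-wide outage"


def recovery_steps(failure: Failure) -> list[str]:
    """Return the ordered checklist for a given failure type."""
    if failure in (Failure.DATA_CORRUPTION, Failure.DATABASE_ZONE_FAILURE):
        restore = (
            "Restore from point-in-time recovery"
            if failure is Failure.DATA_CORRUPTION
            else "Restore from daily backup to alternate zone"
        )
        return [
            "Identify point of failure",
            restore,
            "Verify data integrity",
            "Resume service",
        ]
    if failure is Failure.APPLICATION_ZONE_FAILURE:
        # Cloud Run handles zone failures automatically, so only verify.
        return ["Verify functionality"]
    return [
        "Deploy to alternate region from Bitbucket",
        "Update DNS/routing as required",
        "Verify functionality",
    ]


for step in recovery_steps(Failure.DATA_CORRUPTION):
    print("-", step)
```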
7. Communication Plan
7.1 Internal Communication
Role | Notification Method | Timeframe |
|---|---|---|
Technical Team | Automated alerts + Phone | Immediate |
Leadership | Phone + Email | Within 15 minutes |
7.2 External Communication
Audience | Method | Timeframe |
|---|---|---|
Affected Schools | Email + Status Page | Within 1 hour |
Distribution Partners | Direct contact | Within 2 hours |
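To make the internal and external timeframes concrete, the sketch below turns them into absolute deadlines from the moment an incident is detected. The dictionary and function names are hypothetical, and "Immediate" is modelled as a zero delay for the purposes of the example.

```python
from datetime import datetime, timedelta

# Timeframes from sections 7.1 and 7.2; "Immediate" is treated as zero delay.
NOTIFY_WITHIN = {
    "Technical Team": timedelta(0),
    "Leadership": timedelta(minutes=15),
    "Affected Schools": timedelta(hours=1),
    "Distribution Partners": timedelta(hours=2),
}


def notification_deadlines(detected_at: datetime) -> list[tuple[str, datetime]]:
    """Return (audience, deadline) pairs ordered by deadline, earliest first."""
    pairs = [(audience, detected_at + delay) for audience, delay in NOTIFY_WITHIN.items()]
    return sorted(pairs, key=lambda pair: pair[1])


deadlines = notification_deadlines(datetime(2026, 1, 5, 9, 0))
for audience, deadline in deadlines:
    print(f"{deadline:%H:%M}  {audience}")
```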
7.3 Status Page
Service status updates published at: https://status.acrux.education
8. Testing & Maintenance
8.1 Testing Schedule
Test Type | Frequency | Description |
|---|---|---|
Backup Restoration | Quarterly | Verify backups can be restored successfully |
Failover Simulation | Annually | Test zone failover procedures |
Communication Test | Annually | Verify contact lists and notification systems |
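A simple way to track this schedule is to compute the next due date from the date each test was last run. The sketch below assumes "Quarterly" means every 3 calendar months and "Annually" every 12; the helper names are illustrative, not an existing tool.

```python
import calendar
from datetime import date

# Interval in months for each test type in the table above.
FREQUENCY_MONTHS = {
    "Backup Restoration": 3,
    "Failover Simulation": 12,
    "Communication Test": 12,
}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of shorter months."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_due(test_type: str, last_run: date) -> date:
    """Date by which the given test should next be performed."""
    return add_months(last_run, FREQUENCY_MONTHS[test_type])


print(next_due("Backup Restoration", date(2025, 11, 30)))
print(next_due("Failover Simulation", date(2025, 12, 1)))
```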
8.2 Plan Maintenance
This plan is reviewed annually
Updated following any significant incident
Updated when infrastructure changes occur
All personnel informed of material changes
9. Dependencies
9.1 Third-Party Services
Service | Purpose | Criticality | Fallback |
|---|---|---|---|
Google Cloud Platform | Infrastructure (Cloud Run, Cloud SQL, GCS) | Critical | None (primary platform) |
Cloudflare | DNS, CDN, DDoS protection, domain registrar | Critical | Manual DNS / registrar recovery |
Wonde | UK school MIS connectivity | High (UK only) | Manual roster import |
Google Workspace | Email, Drive, internal comms | High | Personal email contingency |
Praxis (self-hosted) | CRM, support, status page, KB | High | GCP recovery procedures apply |
Sentry | Error monitoring | Medium | GCP native monitoring |
GitHub | Source control | Medium | Local copies |
9.2 GCP Service Level Agreement
GCP provides a 99.95% monthly uptime SLA for Cloud Run and Cloud SQL, with service credits for failures to meet this target.
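For a sense of scale, a 99.95% monthly target corresponds to roughly 21.6 minutes of downtime in a 30-day month. The sketch below shows the arithmetic using a plain calendar-time approximation; Google's SLA documents define how downtime is actually measured, and the function name is an assumption for illustration.

```python
from decimal import Decimal


def allowed_downtime_minutes(sla_percent: str, days_in_month: int) -> Decimal:
    """Minutes of downtime per month that still meet the uptime percentage."""
    total_minutes = Decimal(days_in_month * 24 * 60)
    unavailable_percent = Decimal(100) - Decimal(sla_percent)
    return total_minutes * unavailable_percent / Decimal(100)


# 30 days = 43,200 minutes; 0.05% of that is the monthly downtime budget.
print(allowed_downtime_minutes("99.95", 30))
```

`Decimal` is used here so the percentage arithmetic stays exact rather than picking up binary floating-point rounding.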
10. Roles & Responsibilities
Role | Responsibility |
|---|---|
CTO (Paul Kamarudin) | Overall BCP ownership, technical recovery decisions |
CEO (Dr. Leanne Russell) | Customer communication, business decisions |
Emergency Contact:
Email: [email protected]
Website: https://www.acrux.education