Aurora Global Database: Sub-Second Cross-Region Replication for MySQL and PostgreSQL

Written by Bits Lovers

Aurora Global Database replicates your MySQL or PostgreSQL data to up to five secondary AWS Regions with typical replication lag under one second. Writes happen in a single primary region; secondary regions receive data through storage-layer replication, not binlog-based logical replication. This distinction matters: storage replication operates below the database engine, which is how Aurora achieves sub-second lag even under heavy write load and why a secondary cluster can take over within minutes during a regional failure.

This is the right architecture when you need local read capacity in multiple regions, sub-second RPO for disaster recovery, or when your Aurora cluster is approaching the single-region read scaling limit. It’s different from DynamoDB Global Tables (which is multi-writer) — Aurora Global Database has one writable primary region, with secondaries handling reads only unless you promote them during failover.

How Storage-Level Replication Works

Standard RDS Multi-AZ replication sends transaction log records from the primary instance to the standby across the same region. Aurora Global Database operates differently: the Aurora storage layer replicates data pages directly from the primary region’s storage to secondary region storage nodes using dedicated replication infrastructure. The database engines in secondary regions read from their local storage layer — they don’t receive or process log streams.

This is why secondary regions can serve reads with extremely low latency (same as single-region Aurora) and why promoting a secondary during failover takes under a minute rather than the several minutes it takes to replay logs on a traditional replica.

The trade-off: writes go exclusively to the primary region. If your primary region goes down entirely, you manually promote a secondary to become the new primary. There’s no automatic write failover — this is intentional, to prevent split-brain scenarios.

Creating an Aurora Global Database

# Step 1: Create a primary Aurora cluster in us-east-1
aws rds create-db-cluster \
  --db-cluster-identifier my-global-primary \
  --engine aurora-postgresql \
  --engine-version 15.4 \
  --master-username admin \
  --master-user-password your-secure-password \
  --db-subnet-group-name my-subnet-group \
  --vpc-security-group-ids sg-primary \
  --storage-encrypted \
  --region us-east-1

# Add primary instance
aws rds create-db-instance \
  --db-instance-identifier my-global-primary-instance \
  --db-cluster-identifier my-global-primary \
  --engine aurora-postgresql \
  --db-instance-class db.r6g.large \
  --region us-east-1

# Wait for cluster to be available
aws rds wait db-cluster-available \
  --db-cluster-identifier my-global-primary \
  --region us-east-1

# Step 2: Create the Global Database from the primary cluster
GLOBAL_ARN=$(aws rds create-global-cluster \
  --global-cluster-identifier my-aurora-global \
  --source-db-cluster-identifier arn:aws:rds:us-east-1:123456789012:cluster:my-global-primary \
  --query 'GlobalCluster.GlobalClusterArn' \
  --output text \
  --region us-east-1)

echo "Global Cluster: $GLOBAL_ARN"

# Step 3: Add secondary region (eu-west-1)
aws rds create-db-cluster \
  --db-cluster-identifier my-global-secondary-eu \
  --engine aurora-postgresql \
  --engine-version 15.4 \
  --global-cluster-identifier my-aurora-global \
  --db-subnet-group-name my-eu-subnet-group \
  --vpc-security-group-ids sg-secondary-eu \
  --region eu-west-1

# Add secondary instance in eu-west-1
aws rds create-db-instance \
  --db-instance-identifier my-global-secondary-eu-instance \
  --db-cluster-identifier my-global-secondary-eu \
  --engine aurora-postgresql \
  --db-instance-class db.r6g.large \
  --region eu-west-1

# Check global cluster status and replication lag
aws rds describe-global-clusters \
  --global-cluster-identifier my-aurora-global \
  --query 'GlobalClusters[0].GlobalClusterMembers[].{Cluster:DBClusterArn,Writer:IsWriter}'

The secondary cluster in eu-west-1 receives a copy of all data from the primary and keeps it in sync automatically. You can add up to 5 secondary regions.

Your application uses different connection endpoints for writes vs reads:

# Get endpoints
# Writes: always the primary cluster endpoint
aws rds describe-db-clusters \
  --db-cluster-identifier my-global-primary \
  --query 'DBClusters[0].Endpoint' \
  --region us-east-1

# Reads in eu-west-1: use the secondary cluster reader endpoint
aws rds describe-db-clusters \
  --db-cluster-identifier my-global-secondary-eu \
  --query 'DBClusters[0].ReaderEndpoint' \
  --region eu-west-1

An application in Europe reads from the eu-west-1 secondary (local, fast) but sends writes to the us-east-1 primary (cross-region, adds ~80ms latency). Whether that write latency is acceptable depends on your workload. Read-heavy applications with relatively few writes — content platforms, analytics frontends, customer portals — benefit from Global Database without feeling the write latency. Write-heavy applications may need a different design.
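In application code, this split-endpoint pattern comes down to a small routing decision. A minimal sketch — the endpoint hostnames are placeholders for the values returned by the describe-db-clusters commands above, and a real application would load them from configuration and connect with its database driver:

```python
# Route writes to the primary's cluster endpoint and reads to the nearest
# reader endpoint. Hostnames below are illustrative placeholders.
WRITE_ENDPOINT = "my-global-primary.cluster-abc123.us-east-1.rds.amazonaws.com"
READ_ENDPOINTS = {
    "us-east-1": "my-global-primary.cluster-ro-abc123.us-east-1.rds.amazonaws.com",
    "eu-west-1": "my-global-secondary-eu.cluster-ro-xyz789.eu-west-1.rds.amazonaws.com",
}

def endpoint_for(operation: str, app_region: str) -> str:
    """Writes always go to the primary; reads go to the local reader endpoint,
    falling back to the primary if no secondary exists in the app's region."""
    if operation == "write":
        return WRITE_ENDPOINT  # cross-region unless the app runs in us-east-1
    return READ_ENDPOINTS.get(app_region, WRITE_ENDPOINT)

print(endpoint_for("read", "eu-west-1"))
```

An app in eu-west-1 gets its local reader endpoint for reads but still pays the cross-region round trip on every write.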

Write Forwarding

Write forwarding lets secondary clusters accept writes and forward them to the primary transparently. Reads are served locally; writes travel to the primary and are committed there before the secondary gets the replication update.

# Enable write forwarding on a secondary cluster
aws rds modify-db-cluster \
  --db-cluster-identifier my-global-secondary-eu \
  --enable-global-write-forwarding \
  --region eu-west-1

# Application connections with write forwarding enabled work for both
# reads (served locally) and writes (forwarded to the primary).
# Read-after-write behavior is controlled per session:
#   Aurora MySQL:      aurora_replica_read_consistency
#   Aurora PostgreSQL: apg_write_forward.consistency_mode

Write forwarding simplifies application architecture — you don’t need to maintain separate write and read endpoints in your connection layer. But writes still incur cross-region latency, and transactions that mix reads and writes can be slower due to the forwarding round trip. Benchmark before relying on write forwarding in latency-sensitive write paths.
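The read-after-write behavior is set per session with an engine-specific variable. A small helper sketch — the session variable names are from Aurora's write forwarding feature (MySQL and PostgreSQL use different ones), while the helper function itself is hypothetical:

```python
# Hypothetical helper: build the per-session consistency statement for
# Aurora write forwarding. The variable differs by engine:
#   Aurora MySQL:      aurora_replica_read_consistency (EVENTUAL | SESSION | GLOBAL)
#   Aurora PostgreSQL: apg_write_forward.consistency_mode (eventual | session | global)
def consistency_statement(engine: str, level: str = "session") -> str:
    if engine == "aurora-mysql":
        return f"SET aurora_replica_read_consistency = '{level.upper()}'"
    if engine == "aurora-postgresql":
        return f"SET apg_write_forward.consistency_mode = '{level.lower()}'"
    raise ValueError(f"unknown engine: {engine}")

print(consistency_statement("aurora-postgresql"))
```

Session-level consistency makes your own forwarded writes visible to your subsequent reads; global consistency waits for full catch-up and is correspondingly slower.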

Monitoring Replication Lag

The most important metric for Global Database is AuroraGlobalDBReplicationLag — how far behind the secondary is from the primary.

# Check replication lag on secondary cluster
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name AuroraGlobalDBReplicationLag \
  --dimensions \
    Name=DBClusterIdentifier,Value=my-global-secondary-eu \
  --statistics Average Maximum \
  --period 60 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ)

# Set alarm on replication lag > 5 seconds
aws cloudwatch put-metric-alarm \
  --alarm-name aurora-global-lag-high \
  --metric-name AuroraGlobalDBReplicationLag \
  --namespace AWS/RDS \
  --dimensions Name=DBClusterIdentifier,Value=my-global-secondary-eu \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 5000 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:ops-alerts

Lag is measured in milliseconds. Under normal conditions, expect 1-500ms. Lag spikes during primary write bursts and recovers quickly. Sustained lag above 1 second warrants investigation — usually indicates a primary write rate the replication infrastructure is struggling to keep up with, or network congestion between regions.
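The alarm above fires only when the maximum lag stays above 5,000 ms for three consecutive one-minute periods, so transient spikes don't page anyone. The same evaluation logic in plain Python — a sketch for reasoning about spiky versus sustained lag, not a CloudWatch implementation:

```python
def alarm_breaching(lag_samples_ms, threshold_ms=5000, evaluation_periods=3):
    """True only if the most recent `evaluation_periods` datapoints all exceed
    the threshold, mirroring GreaterThanThreshold with 3 evaluation periods."""
    recent = lag_samples_ms[-evaluation_periods:]
    return len(recent) == evaluation_periods and all(s > threshold_ms for s in recent)

# A brief spike that recovers does not fire; sustained lag does.
print(alarm_breaching([200, 300, 9000, 400]))   # False: spike recovered
print(alarm_breaching([6000, 7000, 8000]))      # True: sustained breach
```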

Planned Failover (Switchover)

Planned failover (called “switchover” in Aurora) moves the write primary to a different region gracefully. The current primary stops accepting writes, waits for the target secondary to be fully caught up (zero lag), then promotes the secondary to primary. Total time: typically under a minute.

# Initiate a planned failover to eu-west-1
aws rds switchover-global-cluster \
  --global-cluster-identifier my-aurora-global \
  --target-db-cluster-identifier arn:aws:rds:eu-west-1:123456789012:cluster:my-global-secondary-eu

# Monitor switchover progress
aws rds describe-global-clusters \
  --global-cluster-identifier my-aurora-global \
  --query 'GlobalClusters[0].{Status:Status,Members:GlobalClusterMembers[].{Cluster:DBClusterArn,Writer:IsWriter}}'

Use planned failover for: region migrations, disaster recovery drills (practice makes the real thing faster), and maintenance where you want to move the write primary away from a region you’re updating.

After the switchover, the old primary becomes a secondary. Update your application’s write endpoint to point at the new primary’s cluster endpoint. If you’re using Route 53 CNAMEs to abstract database endpoints, this is where that investment pays off — update the CNAME rather than touching every application config.
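If that CNAME lives in Route 53, the post-switchover flip is a single record update. A sketch of building the change batch — the record name, endpoint, and hosted zone ID are placeholders, while the UPSERT structure is Route 53's standard change-batch shape:

```python
# Build a Route 53 UPSERT that repoints a stable CNAME (the name applications
# connect to) at the new primary's cluster endpoint. All names are placeholders.
def cname_upsert(record_name: str, target_endpoint: str, ttl: int = 60) -> dict:
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,  # keep the TTL low so a switchover propagates quickly
                "ResourceRecords": [{"Value": target_endpoint}],
            },
        }]
    }

batch = cname_upsert(
    "db-writer.example.internal",
    "my-global-secondary-eu.cluster-xyz789.eu-west-1.rds.amazonaws.com",
)
# Applied with: boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z0EXAMPLE", ChangeBatch=batch)
print(batch["Changes"][0]["ResourceRecordSet"]["Name"])
```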

Unplanned Failover (Regional Failure)

If the primary region becomes unavailable (not a graceful switchover), you manually promote a secondary. The secondary is detached from the global cluster and promoted to a standalone primary.

# Remove the secondary from the global cluster (detach it)
aws rds remove-from-global-cluster \
  --global-cluster-identifier my-aurora-global \
  --db-cluster-identifier arn:aws:rds:eu-west-1:123456789012:cluster:my-global-secondary-eu \
  --region eu-west-1

# The cluster is now standalone — promote it by modifying it to be writable
# (it becomes writable automatically after removal from global cluster)

# Verify it's now the writer
aws rds describe-db-clusters \
  --db-cluster-identifier my-global-secondary-eu \
  --query 'DBClusters[0].{Status:Status,Endpoint:Endpoint}' \
  --region eu-west-1

After detachment, the promoted cluster’s endpoint becomes the new write endpoint. Data loss depends on the replication lag at the moment of failure. With typical lag under one second, RPO is usually under one second for well-monitored deployments.

Once the primary region recovers, you re-attach its cluster as a secondary of the new primary; Aurora rebuilds it by replicating from the new primary back to the recovered region. You can then switch over to the original region again if desired.

Aurora Serverless v2 with Global Database

Aurora Serverless v2 works with Global Database. Serverless instances scale ACUs (Aurora Capacity Units) automatically based on load, which is useful for secondary regions that handle variable read traffic.

# Set the Serverless v2 scaling range on the secondary cluster first —
# the cluster needs a capacity range before a db.serverless instance can be added
aws rds modify-db-cluster \
  --db-cluster-identifier my-global-secondary-eu \
  --serverless-v2-scaling-configuration '{"MinCapacity": 0.5, "MaxCapacity": 32}' \
  --region eu-west-1

# Then create the secondary instance as Serverless v2
aws rds create-db-instance \
  --db-instance-identifier my-global-secondary-serverless \
  --db-cluster-identifier my-global-secondary-eu \
  --engine aurora-postgresql \
  --db-instance-class db.serverless \
  --region eu-west-1

A secondary cluster running Serverless v2 scales down to 0.5 ACU during low-traffic hours and scales up automatically when reads increase. This makes secondary regions cost-effective for workloads where read traffic varies significantly by time of day.

Pricing

Aurora Global Database pricing adds to the base Aurora cluster costs. The key additions:

Replicated write I/O: Every write to the primary cluster is replicated to each secondary region. You pay the primary region’s standard Aurora I/O rate, plus a replicated write I/O charge of $0.20 per million I/Os for each secondary region (the same per-million rate as standard provisioned Aurora I/O).

Data transfer: Replication traffic between regions uses AWS backbone network. Typical charges are the standard cross-region data transfer rates for your region pair — roughly $0.02/GB for US to EU.

Secondary cluster compute: Each secondary region runs its own cluster instances. A db.r6g.large in eu-west-1 costs the same as in us-east-1 ($0.26/hour). Secondary instances consume compute even when serving zero reads.

For a two-region Global Database with a db.r6g.large primary and secondary: ~$0.52/hour in compute, of which the secondary’s share (~$190/month) is cost added by going global. At moderate write volume (100 million writes/month), replication adds roughly $20/month in replicated write I/O and $5-10 in data transfer. Total additional cost over single-region at this scale: roughly $215-220/month, dominated by the secondary’s compute.
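Summing those components in a small estimator makes the sensitivity to write volume explicit. This uses the example rates from this article ($0.26/hour per db.r6g.large, $0.20 per million replicated write I/Os, $0.02/GB transfer) and assumes a 730-hour month and 300 GB of monthly replication transfer — verify current pricing for your region pair:

```python
def monthly_global_db_overhead(writes_per_month, transfer_gb, secondary_regions=1,
                               instance_hourly=0.26, repl_io_per_million=0.20,
                               transfer_per_gb=0.02, hours=730):
    """Estimate the monthly cost a Global Database adds over a single-region
    cluster: secondary compute + replicated write I/O + cross-region transfer."""
    secondary_compute = secondary_regions * instance_hourly * hours
    replicated_io = secondary_regions * (writes_per_month / 1e6) * repl_io_per_million
    transfer = transfer_gb * transfer_per_gb
    return secondary_compute + replicated_io + transfer

# 100M writes/month, one secondary region, 300 GB assumed transfer:
# compute dominates at ~$190, I/O adds $20, transfer ~$6 — roughly $216/month.
print(round(monthly_global_db_overhead(100e6, 300), 2))
```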

Global Database is the right choice when you need SQL semantics, complex transactions, and relational integrity across regions — use cases where DynamoDB Global Tables’ eventual consistency model doesn’t fit. For simpler key-value patterns with higher write throughput requirements, the DynamoDB Streams and Global Tables guide covers the managed NoSQL approach to multi-region data. For the connection management layer in front of your Aurora clusters, RDS Proxy handles connection pooling between your applications and Aurora instances regardless of which region the primary is in.
