DynamoDB Streams and Global Tables: Event-Driven Patterns and Multi-Region Replication

Written by Bits Lovers on

DynamoDB Streams is the feature that turns DynamoDB from a storage layer into an event source. Every write to a table — every put, update, and delete — produces a stream record that captures what changed. Lambda reads those records in near real-time (typically under a second) and triggers whatever processing you need: replicate to a search index, invalidate a cache, send a notification, write to an audit log. The stream is ordered within a partition key, so events for the same item arrive in sequence.

Global Tables builds on Streams internally to replicate data across AWS regions. You create a table once, add replica regions, and DynamoDB handles bidirectional replication automatically. Writes in any region propagate to all others within a second or two. This is the architecture for active-active multi-region applications — not just DR, but live traffic in multiple regions simultaneously.

This guide covers Stream configuration and view types, Lambda trigger patterns with proper error handling, and Global Tables setup with the tradeoffs around conflict resolution and consistency.

DynamoDB Streams: How They Work

When you enable a stream on a table, DynamoDB captures a time-ordered sequence of item-level modifications. The stream is divided into shards, one per partition in the underlying table. Each shard holds records for up to 24 hours — data older than 24 hours is gone.

# Enable streams on an existing table
aws dynamodb update-table \
  --table-name Orders \
  --stream-specification '{
    "StreamEnabled": true,
    "StreamViewType": "NEW_AND_OLD_IMAGES"
  }'

# Get the stream ARN (needed for Lambda trigger)
STREAM_ARN=$(aws dynamodb describe-table \
  --table-name Orders \
  --query 'Table.LatestStreamArn' \
  --output text)

echo "Stream ARN: $STREAM_ARN"

The StreamViewType controls what data each stream record contains:

  • KEYS_ONLY — only the partition key and sort key of the changed item. Smallest records, useful when you only need to know which item changed, not what it changed to.
  • NEW_IMAGE — the entire item after the change. Useful for replication — you have the full new state.
  • OLD_IMAGE — the entire item before the change. Useful for auditing — what did it look like before the write.
  • NEW_AND_OLD_IMAGES — both before and after. Most complete, largest records. Use this for change-tracking use cases where you need to compute diffs.

Choose the minimal view type your use case requires. NEW_AND_OLD_IMAGES doubles the storage and bandwidth compared to just one image.
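A quick way to see the difference is to inspect what a record carries under each view type. The helper below is illustrative (summarize_record is not an AWS API): Keys is always present, while NewImage and OldImage appear only if the view type includes them.

```python
# Hypothetical helper showing which fields a stream record carries
# depending on the table's StreamViewType.

def summarize_record(record):
    ddb = record["dynamodb"]
    return {
        "event": record["eventName"],   # INSERT, MODIFY, or REMOVE
        "keys": ddb["Keys"],            # present for every view type
        "new": ddb.get("NewImage"),     # NEW_IMAGE or NEW_AND_OLD_IMAGES only
        "old": ddb.get("OldImage"),     # OLD_IMAGE or NEW_AND_OLD_IMAGES only
    }

# Under KEYS_ONLY, a record carries nothing but the keys:
keys_only = {
    "eventName": "MODIFY",
    "dynamodb": {"Keys": {"orderId": {"S": "order-1"}}},
}
summary = summarize_record(keys_only)
print(summary["new"])  # None
```

If your consumer only needs summary["keys"], KEYS_ONLY is the cheapest choice; switch to a richer view type only when the consumer actually reads the images.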

Lambda Trigger from DynamoDB Streams

Lambda polls stream shards continuously when registered as a trigger. You pay for Lambda invocations, not for the polling itself.

# Create Lambda event source mapping to the stream
aws lambda create-event-source-mapping \
  --event-source-arn $STREAM_ARN \
  --function-name process-orders-stream \
  --starting-position TRIM_HORIZON \
  --batch-size 100 \
  --maximum-batching-window-in-seconds 5 \
  --bisect-batch-on-function-error true \
  --destination-config '{
    "OnFailure": {
      "Destination": "arn:aws:sqs:us-east-1:123456789012:orders-stream-dlq"
    }
  }' \
  --filter-criteria '{
    "Filters": [{
      "Pattern": "{\"dynamodb\": {\"NewImage\": {\"status\": {\"S\": [\"COMPLETED\"]}}}}"
    }]
  }'

TRIM_HORIZON starts processing from the oldest available record. Use LATEST if you only want records going forward from the moment the trigger is created.

BisectBatchOnFunctionError: true is essential for poison pill handling. If a batch fails, Lambda splits it in half and retries each half separately. This isolates the specific record causing the failure rather than blocking the entire shard on one bad record indefinitely.
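The mechanics are easy to see in a toy sketch (Lambda implements this internally; you never write it yourself): a failing batch is split in half and each half retried until the poison record sits alone in a batch of one. Note that records before the failure point get reprocessed along the way, which is one reason stream consumers should be idempotent.

```python
# Illustrative simulation of bisect-on-error for a batch of stream records.

def process_batch(batch, process_one, poisoned=None):
    """Return records that still fail after bisecting down to batches of one."""
    poisoned = [] if poisoned is None else poisoned
    try:
        for record in batch:
            process_one(record)
    except Exception:
        if len(batch) == 1:
            poisoned.append(batch[0])  # isolated poison pill, would go to DLQ
        else:
            mid = len(batch) // 2
            process_batch(batch[:mid], process_one, poisoned)
            process_batch(batch[mid:], process_one, poisoned)
    return poisoned

def handle(record):
    if record == "POISON":
        raise ValueError("bad record")

bad = process_batch(["a", "b", "POISON", "d"], handle)
print(bad)  # ['POISON']
```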

FilterCriteria reduces the number of Lambda invocations by filtering records at the stream level before Lambda is called. The example processes only orders where status equals COMPLETED. Filtering happens before the Lambda invocation, so you don’t pay for processing records you’d discard anyway.
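A local stand-in for that filter is handy in unit tests to check which records would actually invoke the function. Real event filtering supports richer EventBridge-style patterns; this sketch checks only the single condition used in the example. Note that KEYS_ONLY records can never match a NewImage filter, since they carry no NewImage at all.

```python
# Local predicate mirroring the example filter pattern:
# {"dynamodb": {"NewImage": {"status": {"S": ["COMPLETED"]}}}}

def passes_filter(record):
    status = (
        record.get("dynamodb", {})
        .get("NewImage", {})
        .get("status", {})
        .get("S")
    )
    return status == "COMPLETED"

completed = {"dynamodb": {"NewImage": {"status": {"S": "COMPLETED"}}}}
pending = {"dynamodb": {"NewImage": {"status": {"S": "pending"}}}}
keys_only = {"dynamodb": {"Keys": {"orderId": {"S": "o1"}}}}

print(passes_filter(completed), passes_filter(pending), passes_filter(keys_only))
# True False False
```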

The Lambda function receives batches of stream records:

import json
import boto3
from decimal import Decimal

dynamodb = boto3.resource('dynamodb')
# Document indexing/deletion would go through an OpenSearch client
# (e.g. opensearch-py); stubbed below to keep the focus on stream handling

def handler(event, context):
    processed = 0

    for record in event['Records']:
        try:
            process_record(record)
            processed += 1
        except Exception as e:
            print(f"Failed to process record {record['dynamodb']['SequenceNumber']}: {e}")
            # Re-raise so bisect-batch-on-function-error can isolate the bad
            # record; already-processed records may be retried in the smaller
            # batches, so processing must be idempotent
            raise

    print(f"Processed {processed} records")

def process_record(record):
    event_name = record['eventName']  # INSERT, MODIFY, REMOVE
    
    if event_name == 'REMOVE':
        # Item was deleted — remove from search index
        keys = record['dynamodb']['Keys']
        order_id = keys['orderId']['S']
        delete_from_search(order_id)
        return
    
    # For INSERT and MODIFY, get the new image
    new_image = record['dynamodb'].get('NewImage', {})
    old_image = record['dynamodb'].get('OldImage', {})
    
    # Deserialize DynamoDB typed JSON to Python dict
    order = deserialize(new_image)
    
    if event_name == 'MODIFY':
        old_order = deserialize(old_image)
        # Log which fields changed (the union also catches attributes
        # that were added or removed by the update)
        changed_fields = {
            k for k in set(order) | set(old_order)
            if order.get(k) != old_order.get(k)
        }
        print(f"Order {order['orderId']} modified: {changed_fields}")
    
    # Index in OpenSearch for full-text search
    index_in_search(order)

def deserialize(dynamodb_json):
    """Convert DynamoDB JSON format to regular Python dict"""
    from boto3.dynamodb.types import TypeDeserializer
    deserializer = TypeDeserializer()
    return {k: deserializer.deserialize(v) for k, v in dynamodb_json.items()}

def delete_from_search(order_id):
    # OpenSearch delete call
    pass

def index_in_search(order):
    # OpenSearch index call
    pass

The eventName field tells you whether each record is an INSERT, MODIFY, or REMOVE. This lets a single Lambda function handle all three cases differently. REMOVE records have keys and (if you used OLD_IMAGE or NEW_AND_OLD_IMAGES) the deleted item’s data, but no NewImage.
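For the auditing case, a REMOVE record with an OldImage is enough to reconstruct the deleted item. The sketch below assumes the table uses OLD_IMAGE or NEW_AND_OLD_IMAGES; the simplified to_plain handles only S and N attribute types to keep the example dependency-free, where a real handler would use boto3's TypeDeserializer as shown above.

```python
# Turn a REMOVE stream record into an audit-log entry.
from decimal import Decimal

def to_plain(image):
    """Minimal DynamoDB-JSON converter for S and N types only."""
    out = {}
    for key, typed in image.items():
        if "S" in typed:
            out[key] = typed["S"]
        elif "N" in typed:
            out[key] = Decimal(typed["N"])
    return out

def audit_entry(record):
    assert record["eventName"] == "REMOVE"
    ddb = record["dynamodb"]
    return {
        "action": "DELETED",
        "keys": to_plain(ddb["Keys"]),
        "item": to_plain(ddb.get("OldImage", {})),  # empty under KEYS_ONLY/NEW_IMAGE
    }

remove_record = {
    "eventName": "REMOVE",
    "dynamodb": {
        "Keys": {"orderId": {"S": "order-123"}},
        "OldImage": {"orderId": {"S": "order-123"}, "total": {"N": "42.50"}},
    },
}
entry = audit_entry(remove_record)
```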

Error Handling and Dead Letter Queues

Stream processing fails silently if you don’t set up proper error handling. Without a DLQ, failed records retry until they age out of the 24-hour window or get bisected down to a single record that keeps failing — blocking all subsequent records on that shard.

# Create DLQ for failed stream records
DLQ_ARN=$(aws sqs create-queue \
  --queue-name orders-stream-dlq \
  --attributes '{"MessageRetentionPeriod": "1209600"}' \
  --query 'QueueUrl' \
  --output text | xargs -I {} aws sqs get-queue-attributes \
    --queue-url {} --attribute-names QueueArn \
    --query 'Attributes.QueueArn' --output text)

# Update event source mapping with retry settings
aws lambda update-event-source-mapping \
  --uuid your-mapping-uuid \
  --maximum-retry-attempts 3 \
  --maximum-record-age-in-seconds 3600 \
  --destination-config "{
    \"OnFailure\": {\"Destination\": \"$DLQ_ARN\"}
  }"

MaximumRetryAttempts: 3 retries a failed batch up to three times before giving up and sending it to the DLQ. MaximumRecordAgeInSeconds: 3600 discards records older than an hour, which prevents processing stale data after a long Lambda outage. Together these stop a brief Lambda failure from clogging the stream with retries.
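One detail worth knowing: for stream event sources, the OnFailure destination receives metadata about the failed batch (a DDBStreamBatchInfo object with the shard ID and sequence-number range), not the stream records themselves. Redriving means re-reading the stream at that position before the 24-hour retention expires. A sketch of parsing the DLQ message; the refetch calls in the comments require AWS credentials and a live stream.

```python
import json

# Parse the failure-destination payload into a pointer back into the stream.
def batch_pointer(dlq_body):
    info = json.loads(dlq_body)["DDBStreamBatchInfo"]
    return {
        "stream_arn": info["streamArn"],
        "shard_id": info["shardId"],
        "start": info["startSequenceNumber"],
        "count": info["batchSize"],
    }

sample = json.dumps({
    "DDBStreamBatchInfo": {
        "shardId": "shardId-00000001",
        "startSequenceNumber": "100",
        "endSequenceNumber": "190",
        "batchSize": 10,
        "streamArn": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2024",
    }
})
pointer = batch_pointer(sample)

# To refetch the actual records (requires AWS credentials):
#   streams = boto3.client("dynamodbstreams")
#   it = streams.get_shard_iterator(
#       StreamArn=pointer["stream_arn"], ShardId=pointer["shard_id"],
#       ShardIteratorType="AT_SEQUENCE_NUMBER",
#       SequenceNumber=pointer["start"])["ShardIterator"]
#   records = streams.get_records(ShardIterator=it)["Records"]
```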

Global Tables: Multi-Region Active-Active

Global Tables version 2019.11.21 (the current version) uses streams internally to replicate writes between regions. When you write an item in us-east-1, DynamoDB captures the write in the stream, replicates it to eu-west-1, and the replica applies it — typically within one second under normal conditions.

# Create a table with Global Tables enabled (new table)
aws dynamodb create-table \
  --table-name GlobalOrders \
  --attribute-definitions \
    AttributeName=orderId,AttributeType=S \
    AttributeName=customerId,AttributeType=S \
  --key-schema \
    AttributeName=orderId,KeyType=HASH \
    AttributeName=customerId,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
  --region us-east-1

# Add replica regions
aws dynamodb update-table \
  --table-name GlobalOrders \
  --replica-updates '[
    {"Create": {"RegionName": "eu-west-1"}},
    {"Create": {"RegionName": "ap-southeast-1"}}
  ]' \
  --region us-east-1

# Monitor replica status
aws dynamodb describe-table \
  --table-name GlobalOrders \
  --region us-east-1 \
  --query 'Table.Replicas[].{Region:RegionName,Status:ReplicaStatus}'

StreamViewType: NEW_AND_OLD_IMAGES is required for Global Tables — the replication mechanism uses streams and needs both images for conflict resolution. Don’t change the stream view type on a Global Table.

Each replica region is a full copy of the table. Reads and writes work in any region. There’s no primary — all regions are peers.

# Write in us-east-1
aws dynamodb put-item \
  --table-name GlobalOrders \
  --item '{"orderId": {"S": "order-123"}, "customerId": {"S": "cust-456"}, "status": {"S": "pending"}}' \
  --region us-east-1

# Read from eu-west-1 (should appear within ~1 second)
aws dynamodb get-item \
  --table-name GlobalOrders \
  --key '{"orderId": {"S": "order-123"}, "customerId": {"S": "cust-456"}}' \
  --region eu-west-1

Conflict Resolution

The key limitation of active-active replication: two regions can write the same item simultaneously. DynamoDB resolves conflicts using “last writer wins” based on wall clock time with microsecond precision. The write with the later timestamp survives; the other is discarded.

This is fine for many use cases — user profile updates, session data, preference settings. It’s a problem for anything requiring strict consistency: financial ledgers, inventory counts, anything where concurrent updates must be serialized.
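A toy illustration makes the hazard concrete. DynamoDB resolves the conflict internally; the timestamps below stand in for its internal replication timestamps, and the function names are illustrative.

```python
# Two regions update the same item concurrently; last-writer-wins keeps
# only the version with the later timestamp.

def resolve(version_a, version_b):
    """Return the surviving version under last-writer-wins."""
    return version_a if version_a["ts"] >= version_b["ts"] else version_b

us_east = {"status": "shipped", "ts": 1700000000.000002}   # written 2 µs later
eu_west = {"status": "cancelled", "ts": 1700000000.000000}

survivor = resolve(us_east, eu_west)
print(survivor["status"])  # shipped
```

The eu-west-1 "cancelled" write is silently discarded, which is exactly the kind of lost update that makes last-writer-wins unsuitable for ledgers and inventory counts.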

For use cases that can’t tolerate lost writes, restrict writes to a single region and use other regions for reads only. Your routing layer (Route 53 geolocation or latency routing) sends all writes to one region, all reads to the nearest region.
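In application code, the single-writer rule can be as simple as a routing function. This is a sketch assuming us-east-1 as the designated write region; WRITE_REGION, WRITE_OPS, and region_for are illustrative names, not an AWS feature.

```python
# Route all writes to one region; keep reads in the caller's local region.

WRITE_REGION = "us-east-1"
WRITE_OPS = {"put_item", "update_item", "delete_item", "batch_write_item"}

def region_for(operation, caller_region):
    """Pin writes to the single write region; reads stay local for low latency."""
    return WRITE_REGION if operation in WRITE_OPS else caller_region

print(region_for("put_item", "eu-west-1"))  # us-east-1
print(region_for("get_item", "eu-west-1"))  # eu-west-1
```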

# Force reads to be strongly consistent within the local replica
aws dynamodb get-item \
  --table-name GlobalOrders \
  --key '{"orderId": {"S": "order-123"}, "customerId": {"S": "cust-456"}}' \
  --consistent-read \
  --region eu-west-1

--consistent-read on a Global Table replica returns the latest data written to that replica — not necessarily the latest data globally. If eu-west-1 is 500ms behind us-east-1, strongly consistent reads from eu-west-1 return data that’s up to 500ms stale. “Strongly consistent” means consistent within the replica, not globally consistent across replicas.

Pricing

Global Tables adds a replication cost on top of standard DynamoDB read/write unit costs. Each write to a Global Table is billed as a replicated write in every region the table spans, including the region that received it, and replicated write units cost 1.5x standard write units.

Standard write: $1.25 per million write request units. Global Table replicated write: $1.875 per million write request units. Each replicated region adds another $1.875 per million. At 3 regions total: $1.875 × 3 = $5.625 per million writes — 4.5x the cost of a single-region table.

This is the most important cost consideration for Global Tables. A write-heavy application in 3 regions pays significantly more than single-region. Budget accordingly.
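A back-of-envelope calculator makes the math above easy to rerun for your own write volume, using the on-demand rates quoted in this section (verify current pricing before budgeting):

```python
# On-demand rates as quoted above; check current AWS pricing before use.
STANDARD_WRU_COST = 1.25 / 1_000_000     # $ per standard write request unit
REPLICATED_WRU_COST = 1.875 / 1_000_000  # $ per replicated write request unit

def write_cost(writes, regions):
    """Dollar cost of `writes` writes to a table spanning `regions` regions."""
    if regions == 1:
        return writes * STANDARD_WRU_COST
    # A Global Table write is billed as a replicated write in every region
    return writes * REPLICATED_WRU_COST * regions

single = write_cost(1_000_000, 1)  # 1.25
triple = write_cost(1_000_000, 3)  # 5.625
print(f"{triple / single:.1f}x")   # 4.5x
```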

Stream reads are priced at $0.02 per 100,000 GetRecords requests, but GetRecords calls made by AWS Lambda triggers (and by Global Tables replication) are free. You pay for stream reads only when your own consumers, such as a Kinesis Client Library application, poll the stream directly. With Lambda triggers, the cost shows up as Lambda invocations and duration rather than stream reads.

For the single-table design patterns that make DynamoDB tables efficient before adding streams, the DynamoDB single-table design guide covers access patterns and key design. For event-driven architectures that extend beyond DynamoDB Streams, AWS EventBridge Pipes can route DynamoDB stream events through transformation steps before they reach downstream consumers.
