DynamoDB Single-Table Design: Practical Patterns for AWS

Bits Lovers
Written by Bits Lovers on
DynamoDB Single-Table Design: Practical Patterns for AWS

Rick Houlihan built the methodology for DynamoDB at Amazon before turning it into what the community calls single-table design. The idea is genuinely odd if you’re coming from SQL: put all your application data in one table. Users, orders, products, sessions — not one table per entity, but one table total. DynamoDB’s pricing and performance model rewards this in ways a relational database never could.

This guide covers the practical implementation. Not the theory slides, not the happy-path examples — the actual design decisions, the key overloading patterns, and the situations where single-table design is the wrong choice despite what the conference talks say.

Why DynamoDB Works Differently

Relational databases earn their flexibility at query time. You model your data to represent reality (third normal form, relationships, constraints), then write whatever queries you need against that model. The query engine handles joins, filters, and aggregations at read time. This works beautifully until your dataset gets large enough that the query engine becomes the bottleneck.

DynamoDB flips this. All query capability must be designed in at modeling time. You can only query by partition key, optionally with sort key conditions. You can only filter on non-key attributes after DynamoDB reads the items from the partition — and you pay for those reads regardless of whether they pass the filter. There are no joins. The only way to query data efficiently is to have planned for that access pattern when you designed the table.

This sounds like a constraint, and it is. But it’s also what makes DynamoDB capable of single-millisecond response times at any scale. There’s no query planner overhead, no index scans, no joins — just a direct key lookup into a B-tree partition. If you’ve modeled your access patterns correctly, every query is efficient by construction.

Core Concepts

Two attributes do all the heavy lifting. The partition key (PK) routes each item to a specific storage partition. Items sharing the same PK land together physically. Add a sort key (SK) and items within that partition are stored in sorted order — which opens up range queries: give me PK = “USER#123” and everything between SK = “ORDER#2024-01” and “ORDER#2024-12”. Without an SK, you get a single item back per PK value. With one, you get a sorted collection that you can slice.

PK + SK together must be unique per item — that pair is the primary key. No two items in the table can share the same PK and SK. If you need to query data by something other than the primary key, you need a secondary index:

Global Secondary Indexes (GSI) have their own PK and SK that are completely independent of the table’s PK/SK. They’re eventually consistent copies of the data stored in a different key structure. You can have up to 20 GSIs per table. They have their own read/write capacity and billing.

Local Secondary Indexes (LSI) share the table’s PK but have a different SK. They must be defined at table creation time (unlike GSIs, which you can add later). LSIs have a 10GB per-partition-key limit. For most designs, GSIs are more flexible.

Storage runs $0.25 per GB per month. The reads and writes are where it gets complicated. Provisioned mode has you pre-declare read and write capacity units and pay per hour for whatever you reserved. On-demand flips that: $1.25 per million writes, $0.25 per million reads, no forecasting required. For traffic that spikes unpredictably, on-demand typically wins — you don’t get throttled because you under-provisioned, and you don’t waste money on capacity that sits idle at 2am.

The Access-Pattern-First Methodology

The single-table design process starts with access patterns, not entities. Before writing any code or creating any table, list every query your application needs to make:

Access Patterns for an E-commerce Application:
1. Get user by userId
2. Get all orders for a user
3. Get order by orderId
4. Get all items in an order
5. Get all orders with status=PENDING (for fulfillment)
6. Get product by productId
7. Get all products in a category

Each of these access patterns maps to a specific key structure. Design the key structure to satisfy all patterns, then figure out how to store the data to fit.

For the e-commerce example, a single-table key structure might look like:

Entity    | PK              | SK
----------|-----------------|------------------
User      | USER#userId     | USER#userId
Order     | USER#userId     | ORDER#orderId
OrderItem | ORDER#orderId   | ITEM#itemId
Product   | PRODUCT#pid     | PRODUCT#pid

The # separator is convention, not required — just a readable delimiter for composite key values.

Key Overloading in Practice

Here’s the part that confuses people at first. Your table has one PK column and one SK column. A user record stores USER#123 in both. An order record stores USER#123 in PK but ORDER#456 in SK. An order item stores something different again. Same column name, completely different content per row, depending on entity type. That’s key overloading — the key attributes aren’t locked to one meaning.

Fetching a user is a point lookup: PK = "USER#123", SK = "USER#123". One item back, no scan.

Getting all orders for that user uses the same partition: PK = "USER#123", SK begins_with "ORDER#". DynamoDB returns every item under that PK with a sort key starting with ORDER#, sorted by the SK value. No second query, no join.

Order items work the same way. Set PK = "ORDER#456" and use SK begins_with "ITEM#" to pull back every line item for that order. The items come back sorted by item ID, because that’s how they’re stored.

This is what makes single-table design feel strange at first. The same attribute PK holds USER#123 for one item and ORDER#456 for another. There’s no type enforcement at the DynamoDB level — your application code reads the EntityType attribute (or infers it from the key prefix) to know what kind of item it received.

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('AppTable')

def get_user_with_orders(user_id: str):
    response = table.query(
        KeyConditionExpression='PK = :pk',
        ExpressionAttributeValues={
            ':pk': f'USER#{user_id}'
        }
    )
    
    items = response['Items']
    user = next((i for i in items if i['SK'].startswith('USER#')), None)
    orders = [i for i in items if i['SK'].startswith('ORDER#')]
    
    return {'user': user, 'orders': orders}

One query returns both the user record and all their orders. No join. No second query. The data physically lives in the same partition, so DynamoDB reads it in one operation.

GSI Overloading for Cross-Entity Queries

Access pattern 5 (get all orders with status=PENDING) doesn’t fit the base table structure — there’s no single partition that contains all pending orders. This is where GSIs come in.

Add GSI1PK and GSI1SK attributes to whichever items should appear in that secondary index. Items without those attributes simply won’t show up in GSI queries — DynamoDB only indexes items that have values for the index’s key attributes. This is called a sparse index, and it’s intentional. Only the items you care about appear in the GSI, which keeps index size down and query costs low.

For pending order access, add GSI attributes to order items:

Order item when PENDING:
  PK = "USER#123"
  SK = "ORDER#456"
  GSI1PK = "STATUS#PENDING"
  GSI1SK = "2024-11-15T10:30:00Z"  # order creation time
  EntityType = "ORDER"
  ... other order attributes ...

Query all pending orders:

response = table.query(
    IndexName='GSI1',
    KeyConditionExpression='GSI1PK = :pk',
    ExpressionAttributeValues={
        ':pk': 'STATUS#PENDING'
    }
)

The issue with this: if you have a high volume of pending orders, you’ve created a hot partition in the GSI. STATUS#PENDING as a partition key concentrates all write and read traffic into one partition. DynamoDB partitions have throughput limits (3,000 RCUs and 1,000 WCUs per second per partition). For high-volume status queries, shard the key: STATUS#PENDING#0 through STATUS#PENDING#9 with a random or hash-based shard assignment, then query all 10 shards.

A Worked Example: Blog Platform

A blog platform needs to handle these access patterns:

  • Get post by postId
  • Get all posts by an author
  • Get all comments on a post
  • Get all posts in a category, sorted by publish date

Table design:

Post    | PK = POST#postId       | SK = POST#postId
Author  | PK = AUTHOR#authorId   | SK = AUTHOR#authorId
Post←   | PK = AUTHOR#authorId   | SK = POST#publishDate#postId
Comment | PK = POST#postId       | SK = COMMENT#timestamp#commentId

GSI1:
Category posts | GSI1PK = CAT#categorySlug | GSI1SK = POST#publishDate#postId

The Post← items (relationship items) store duplicate post data under the author’s partition key. This is denormalization — you’re storing the same data twice so that you can query it two different ways. DynamoDB applications trade storage for query performance. It’s intentional and expected.

Queries:

  1. Get post: PK = "POST#abc", SK = "POST#abc" — point lookup
  2. Get author’s posts: PK = "AUTHOR#xyz", SK begins_with "POST#" — range query
  3. Get post’s comments: PK = "POST#abc", SK begins_with "COMMENT#" — range query
  4. Get posts in category: GSI1 query with GSI1PK = "CAT#tech", sorted by GSI1SK

All four access patterns resolve in one query each. No application-level joins. The trade-off is that publishing a post requires two writes: one for the POST#postId item and one for the AUTHOR#authorId relationship item. DynamoDB transactions (TransactWriteItems) let you make multiple writes atomically — if one fails, all fail — which is how you keep these denormalized copies in sync.

When Single-Table Is the Wrong Answer

Single-table design is not universally correct. There are legitimate cases where multiple tables are the right choice.

If different entities have radically different access patterns and throughput requirements, separate tables let you provision capacity independently. A read-heavy product catalog and a write-heavy order log have no business sharing throughput capacity.

If your team is unfamiliar with DynamoDB modeling, single-table design adds significant cognitive overhead. The key overloading and GSI patterns require everyone touching the data layer to understand the model. A team of engineers comfortable with SQL but new to DynamoDB will make fewer mistakes with a simpler multi-table design, even if it’s less efficient.

If your access patterns are genuinely unknown at design time, start with a table-per-entity structure and migrate later. Single-table design requires knowing your access patterns upfront. Designing a key structure for access patterns you discover later means restructuring your data and backfilling GSI attributes — painful at scale.

If you’re building an internal tool or MVP with a short lifespan, the modeling overhead isn’t worth it. Use a simpler structure or a different database entirely.

Operational Patterns

A few things that trip up DynamoDB implementations in production:

Consumed capacity and monitoring: DynamoDB’s ReturnConsumedCapacity parameter tells you how many RCUs and WCUs a request consumed. Log this in your application. A query that reads 50KB of data before filtering to 1KB result consumed 50KB worth of reads — and you paid for all of it. The CloudWatch alarms guide covers how to set up alarms on ConsumedReadCapacityUnits and ThrottledRequests before they become outages.

Pagination: DynamoDB returns at most 1MB per query call. If your result set is larger, the response includes LastEvaluatedKey. Paginate by passing that key as ExclusiveStartKey on the next call. Forgetting to paginate means silently missing data when your dataset grows past 1MB.

Condition expressions for writes: Use ConditionExpression on writes to implement optimistic locking or uniqueness constraints. attribute_not_exists(PK) on a write prevents accidental overwrites. Without conditions, DynamoDB writes are unconditional — it’ll happily overwrite an existing item with no error.

IAM scoping: DynamoDB IAM policies support condition keys on dynamodb:LeadingKeys — you can restrict a role to only read or write items where the partition key matches a specific value or pattern. This matters for multi-tenant applications where one Lambda function serves multiple customers. The AWS IAM roles and policies guide covers how to write these conditions correctly.

Getting DynamoDB right requires more upfront design work than spinning up a relational database and normalizing your schema. But once the access patterns are modeled correctly, it delivers consistent single-digit millisecond performance at any scale — and that performance doesn’t degrade as your data volume grows.

Bits Lovers

Bits Lovers

Professional writer and blogger. Focus on Cloud Computing.

Comments

comments powered by Disqus