Unlock The Power of Amazon S3 With Amazon S3 Inventory

Written by Bits Lovers on

If you’ve been working with S3 for a while, you’ve probably hit a point where you need to audit what’s actually in your buckets. Maybe you’re dealing with compliance requirements, or you need to check replication status across regions. That’s where S3 Inventory comes in handy.

I’ve been using it alongside S3 Batch Operations for some automated cleanup tasks, and it’s saved me a bunch of time compared to writing scripts that call the List API synchronously.

What is Amazon S3 Inventory?

S3 Inventory gives you scheduled reports of everything in your bucket, or of everything under a prefix you choose. Instead of writing code to list objects yourself, you can have AWS generate CSV, ORC, or Parquet files containing your objects and their metadata on a daily or weekly schedule. The reports land in an S3 bucket you specify.

The ORC and Parquet formats are particularly useful if you’re working with large datasets and want faster query performance when you later analyze the output with tools like Athena. Both are columnar formats, so scanning specific fields is more efficient than parsing CSV rows.

Benefits of S3 Inventory

The main reasons I reach for S3 Inventory:

  • Audit replication and encryption status without building custom tooling
  • Generate compliance reports on a schedule
  • Feed the data into Athena for quick SQL-based analysis
  • Process large-scale data jobs through S3 Batch Operations

Since the reports run on a schedule, you avoid hammering the List API with synchronous calls. AWS handles the heavy lifting and drops the results in your bucket.

How It Works

You configure the inventory by specifying a source bucket, a destination bucket for the reports, the output format (CSV, ORC, or Parquet), and how often you want the report generated (daily or weekly).

Once configured, S3 Inventory runs on your schedule and writes two kinds of files to the destination bucket: the inventory data files, which list your objects along with their metadata, and a manifest that lists the data files belonging to that report. The reports don’t affect your bucket’s request rate, so your normal operations keep running without interruption.
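
As a rough illustration of those settings, here is a minimal sketch that builds the configuration as a plain Python dict in the shape that boto3’s put_bucket_inventory_configuration call accepts. The bucket names, account ID, and configuration ID are made-up placeholders, and the helper name build_inventory_config is my own, not part of any AWS library.

```python
def build_inventory_config(config_id, dest_bucket_arn, dest_account_id,
                           fmt="Parquet", frequency="Daily", prefix="inventory"):
    """Build an InventoryConfiguration dict (hypothetical helper, not an AWS API)."""
    if fmt not in ("CSV", "ORC", "Parquet"):
        raise ValueError(f"unsupported format: {fmt}")
    if frequency not in ("Daily", "Weekly"):
        raise ValueError(f"unsupported frequency: {frequency}")
    return {
        "Id": config_id,
        "IsEnabled": True,
        # "All" includes noncurrent versions and delete markers; "Current" skips them.
        "IncludedObjectVersions": "All",
        "Schedule": {"Frequency": frequency},
        # Most metadata fields are optional and must be requested explicitly.
        "OptionalFields": [
            "Size", "LastModifiedDate", "StorageClass", "ETag",
            "ReplicationStatus", "EncryptionStatus",
        ],
        "Destination": {
            "S3BucketDestination": {
                "AccountId": dest_account_id,
                "Bucket": dest_bucket_arn,  # an ARN, not a bare bucket name
                "Format": fmt,
                "Prefix": prefix,
            }
        },
    }


# Placeholder values for illustration only.
config = build_inventory_config(
    "daily-audit", "arn:aws:s3:::example-inventory-reports", "111122223333"
)

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_inventory_configuration(
#     Bucket="example-source-bucket",
#     Id=config["Id"],
#     InventoryConfiguration=config,
# )
```

Keep in mind that the destination bucket also needs a bucket policy that lets the S3 service write the report files into it.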

Inventory List Metadata

Each inventory report includes metadata for every object in the bucket:

  • Key - the object’s name
  • Version ID - useful if you have versioning enabled
  • Last modified date
  • ETag - a tag that changes when the object content changes (not always an MD5 of the content, for example with multipart uploads)
  • Size
  • Storage class
  • Replication status
  • Encryption status - SSE-S3, SSE-C, SSE-KMS, or NOT-SSE
  • Owner information - the AWS account that owns the object

The encryption status field holds one of those four values rather than a true/false flag, so filtering for non-compliant objects comes down to matching NOT-SSE.
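
To make the filtering concrete, here is a small sketch that parses CSV inventory rows and picks out unencrypted objects. CSV inventory files carry no header row; the column order comes from the fileSchema entry in the report’s manifest. The schema subset and sample rows below are invented for illustration, and the sketch assumes the .csv.gz data file has already been downloaded and decompressed.

```python
import csv
import io

# Column order as it might appear in manifest.json's "fileSchema" (assumed subset).
FILE_SCHEMA = "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, Size, EncryptionStatus"

# Invented sample rows; real data files are gzip-compressed CSV without a header.
SAMPLE_CSV = (
    '"example-bucket","logs/a.txt","","true","false","120","SSE-S3"\n'
    '"example-bucket","logs/b.txt","","true","false","98","NOT-SSE"\n'
    '"example-bucket","data/c.bin","","true","false","4096","SSE-KMS"\n'
)


def read_inventory_rows(csv_text, file_schema):
    """Turn headerless inventory CSV text into a list of dicts keyed by column name."""
    columns = [name.strip() for name in file_schema.split(",")]
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(columns, row)) for row in reader]


def unencrypted_keys(rows):
    """Return the keys of objects whose encryption status is NOT-SSE."""
    return [row["Key"] for row in rows if row["EncryptionStatus"] == "NOT-SSE"]


rows = read_inventory_rows(SAMPLE_CSV, FILE_SCHEMA)
print(unencrypted_keys(rows))  # ['logs/b.txt']
```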

Inventory Consistency

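An inventory report is a rolling snapshot of your bucket rather than an exact point-in-time listing. The list is eventually consistent for PUTs and DELETEs, so a report may miss objects that were added or deleted shortly before it was generated. If an action depends on an object’s current state, check it with a HEAD request first. If you have versioning enabled and configure the inventory to include all versions, noncurrent versions and delete markers show up in the report. You can also have S3 encrypt the inventory files with SSE-S3 or SSE-KMS when it writes them to your destination bucket.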

Querying S3 Inventory with Athena

One of the most practical setups is connecting your inventory reports to Athena. Create an Athena table over the inventory output in the destination bucket, matching the report format (CSV, ORC, or Parquet), and you can run SQL queries against the data.

This is useful for finding objects that don’t have encryption enabled, identifying objects in the wrong storage class, or generating compliance reports for auditors. Much faster than writing custom parsers.
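
Here is a sketch of what such a query might look like, built as a Python string so it can be handed to whatever Athena client you use. The table name s3_inventory, the column names, and the dt partition column are assumptions about how the table was defined; adjust them to match your own DDL.

```python
import re


def unencrypted_objects_query(table, partition):
    """Build an Athena query listing unencrypted objects in one inventory partition.

    `table` and the column names are assumptions about your table definition.
    """
    # Validate inputs because they are interpolated into the SQL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"invalid table name: {table!r}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}-\d{2}-\d{2}", partition):
        raise ValueError(f"expected YYYY-MM-DD-HH-MM, got: {partition!r}")
    return (
        "SELECT key, size, storage_class\n"
        f"FROM {table}\n"
        f"WHERE dt = '{partition}'\n"
        "  AND encryption_status = 'NOT-SSE'"
    )


query = unencrypted_objects_query("s3_inventory", "2024-01-01-00-00")
print(query)
```

Restricting the query to a single dt partition keeps Athena from scanning every report that has ever been delivered.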

Where Your Inventory List Goes

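When S3 publishes a new inventory, the manifest files appear at this location in your destination bucket:

<destination-prefix>/<source-bucket>/<config-ID>/<YYYY-MM-DDTHH-MMZ>/

The data files that the manifest refers to are written under <destination-prefix>/<source-bucket>/<config-ID>/data/.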

Navigate there to download the files, or access them programmatically through the AWS SDKs or CLI.
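
If you go the programmatic route, the first step is usually reading manifest.json to find out which data files belong to the report. Below is a minimal sketch that works on manifest text you have already fetched. The sample manifest is invented (including the checksum), though its field names follow the layout S3 uses for inventory manifests.

```python
import json

# Invented sample manifest for illustration; bucket names and checksum are placeholders.
SAMPLE_MANIFEST = """
{
  "sourceBucket": "example-source-bucket",
  "destinationBucket": "arn:aws:s3:::example-inventory-reports",
  "version": "2016-11-30",
  "creationTimestamp": "1514944800000",
  "fileFormat": "CSV",
  "fileSchema": "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, Size",
  "files": [
    {
      "key": "inventory/example-source-bucket/daily-audit/data/part-0001.csv.gz",
      "size": 2147,
      "MD5checksum": "f11166069f1990abeb9c97ace9cdfabc"
    }
  ]
}
"""


def summarize_manifest(manifest_text):
    """Extract the pieces needed to download and parse one inventory report."""
    manifest = json.loads(manifest_text)
    return {
        # The destination is given as an ARN; the bucket name follows ":::".
        "bucket": manifest["destinationBucket"].split(":::")[-1],
        "format": manifest["fileFormat"],
        "columns": [c.strip() for c in manifest["fileSchema"].split(",")],
        "data_keys": [entry["key"] for entry in manifest["files"]],
    }


summary = summarize_manifest(SAMPLE_MANIFEST)
print(summary["bucket"])     # example-inventory-reports
print(summary["data_keys"])
```

Each key in data_keys can then be fetched from the destination bucket with the SDK or CLI of your choice.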

Integrations

S3 Inventory works well with several other AWS services:

  • Athena - SQL queries on inventory data
  • Amazon EMR - large-scale data processing
  • AWS Glue - data catalog and ETL
  • Amazon Redshift - data warehousing
  • Amazon S3 Analytics - storage analysis
  • Lambda - trigger workflows based on inventory results

You can also feed the inventory into S3 Batch Operations to take action on thousands of objects at once, like changing storage classes or updating encryption.
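
Batch Operations can take an inventory report’s manifest.json directly, but when you only want to act on a subset of objects, a filtered CSV manifest of bucket and key pairs is a common alternative. The sketch below builds one from already-parsed inventory rows; the rows, the filter, and the helper name are invented for illustration. It assumes the keys are still URL-encoded as they appear in CSV inventory output, which is also what a Batch Operations CSV manifest expects.

```python
import csv
import io

# Invented rows, shaped like parsed CSV inventory records.
INVENTORY_ROWS = [
    {"Bucket": "example-bucket", "Key": "logs/a.txt", "VersionId": "", "StorageClass": "STANDARD"},
    {"Bucket": "example-bucket", "Key": "logs/b.txt", "VersionId": "", "StorageClass": "STANDARD"},
    {"Bucket": "example-bucket", "Key": "data/c.bin", "VersionId": "", "StorageClass": "GLACIER"},
]


def to_batch_manifest(rows, storage_class):
    """Write a bucket,key CSV manifest for rows in the given storage class."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        if row["StorageClass"] == storage_class:
            # Keys are passed through unchanged (assumed to be URL-encoded already).
            writer.writerow([row["Bucket"], row["Key"]])
    return out.getvalue()


manifest_csv = to_batch_manifest(INVENTORY_ROWS, "STANDARD")
print(manifest_csv, end="")
# example-bucket,logs/a.txt
# example-bucket,logs/b.txt
```

Upload the resulting file to S3 and reference it when creating the Batch Operations job, for example one that moves those objects to a cheaper storage class.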

Empty Version ID Strings

One quirk I’ve run into: if you’re using versioned buckets, objects created before versioning was enabled will have empty version IDs represented as "". Some downstream tools don’t handle this gracefully.

The fix is to use Athena to convert those empty strings to NULL values in your queries. This makes the data cleaner for applications that expect proper NULL handling.
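
One way to express that conversion is the SQL NULLIF function, which returns NULL when its two arguments are equal. The sketch below holds such a query as a Python string and shows the equivalent cleanup for rows you parse yourself. The table name s3_inventory and the column names are assumptions about your Athena table, not fixed names.

```python
# Athena query text; the table and column names are assumptions about your DDL.
VERSION_ID_QUERY = (
    "SELECT key, NULLIF(version_id, '') AS version_id\n"
    "FROM s3_inventory"
)


def normalize_version_id(row):
    """Return a copy of a parsed inventory row with an empty VersionId set to None."""
    cleaned = dict(row)
    if cleaned.get("VersionId") == "":
        cleaned["VersionId"] = None
    return cleaned


# Invented sample rows: one without a version ID, one with a made-up ID.
before_versioning = {"Key": "old/report.csv", "VersionId": ""}
after_versioning = {"Key": "new/report.csv", "VersionId": "3HL4kqtJlcpXroDTDmJrmSpXd3dIbrHY"}

print(normalize_version_id(before_versioning))  # {'Key': 'old/report.csv', 'VersionId': None}
print(normalize_version_id(after_versioning)["VersionId"])  # 3HL4kqtJlcpXroDTDmJrmSpXd3dIbrHY
```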

Wrapping Up

S3 Inventory won’t magically solve all your storage auditing problems, but it’s a solid, low-effort way to get regular reports on what’s in your buckets. For compliance work, automation pipelines, or just understanding your data at scale, it’s worth setting up.

If you’re preparing for audits or need to regularly report on encryption and replication status, configure an inventory and point it at Athena. You’ll have SQL-level visibility without building custom listing infrastructure.
