AWS CloudTrail Deep Dive: Audit Logging and Security Monitoring

Written by Bits Lovers

Every API call made to AWS — from the console, CLI, SDK, or another service — generates a CloudTrail event. Who created that security group rule? When was that IAM role modified? Which Lambda function called DeleteItem on that DynamoDB table at 3am? CloudTrail has the answer, if you’ve set it up right and you know where to look.

Out of the box you get CloudTrail Event history: 90 days of management events, per region, with no alerting. For real security monitoring you need a multi-region trail, data events on critical resources, CloudWatch Logs integration for real-time alarms, and either Athena or CloudTrail Lake for ad hoc investigation. This guide covers all of it.

Event Types

CloudTrail captures three categories of events, with very different pricing:

Management events record control plane operations — IAM changes, security group updates, EC2 instance starts, S3 bucket creation. These are always the most useful for security. AWS includes management events in the first trail in each region at no charge. Additional trails pay $2 per 100,000 management events.

Data events record data plane operations — S3 GetObject/PutObject/DeleteObject, Lambda Invoke, DynamoDB GetItem/PutItem. These generate massive volume for busy services: an S3 bucket serving even a few requests per second produces millions of data events per month. Pricing is $0.10 per 100,000 data events, so be selective about which resources to enable data events on. Amazon Bedrock InvokeModel calls are also recorded as data events — enabling them lets you audit exactly which IAM principal called which model and when, which complements the IAM-based Bedrock cost allocation via CUR 2.0 for a complete picture of AI spend and access.
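The cost decision is simple arithmetic. This sketch plugs made-up request volumes into the $0.10 per 100,000 data events price quoted above:

```python
# Back-of-envelope CloudTrail data event cost. The price is the
# published $0.10 per 100,000 data events; the request volumes
# below are hypothetical examples.

DATA_EVENT_PRICE_PER_100K = 0.10

def monthly_data_event_cost(requests_per_day: float, days: int = 30) -> float:
    """Cost of recording every request as a CloudTrail data event."""
    events = requests_per_day * days
    return events / 100_000 * DATA_EVENT_PRICE_PER_100K

# A quiet bucket: 5,000 requests/day -> 150,000 events/month
print(f"${monthly_data_event_cost(5_000):.2f}")        # $0.15
# A busy API-backed bucket: 20M requests/day -> 600M events/month
print(f"${monthly_data_event_cost(20_000_000):.2f}")   # $600.00
```

The quiet bucket is effectively free to audit; the busy one is a real line item, which is why read-heavy buckets are where you reach for the readOnly filtering shown later.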

CloudTrail Insights detects unusual API call rates and error rates relative to historical baselines. When a runaway deployment script (an accidental aws s3 rm --recursive loop, say) drives write call rates to many times their normal level, Insights fires. It costs $0.35 per 100,000 write management events analyzed.

Creating a Proper Trail

The free Event history isn't enough for security. Create an organization-level or multi-region trail manually with proper controls:

# Create an S3 bucket for trail logs with versioning and access logging
aws s3 mb s3://my-cloudtrail-logs-${ACCOUNT_ID}

# Enable versioning (protects against accidental deletion)
aws s3api put-bucket-versioning \
  --bucket my-cloudtrail-logs-${ACCOUNT_ID} \
  --versioning-configuration Status=Enabled

# Bucket policy required by CloudTrail
cat > /tmp/cloudtrail-bucket-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSCloudTrailAclCheck",
      "Effect": "Allow",
      "Principal": {"Service": "cloudtrail.amazonaws.com"},
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::my-cloudtrail-logs-${ACCOUNT_ID}"
    },
    {
      "Sid": "AWSCloudTrailWrite",
      "Effect": "Allow",
      "Principal": {"Service": "cloudtrail.amazonaws.com"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-cloudtrail-logs-${ACCOUNT_ID}/AWSLogs/${ACCOUNT_ID}/*",
      "Condition": {
        "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
      }
    }
  ]
}
EOF

aws s3api put-bucket-policy \
  --bucket my-cloudtrail-logs-${ACCOUNT_ID} \
  --policy file:///tmp/cloudtrail-bucket-policy.json

# Create the trail
aws cloudtrail create-trail \
  --name main-trail \
  --s3-bucket-name my-cloudtrail-logs-${ACCOUNT_ID} \
  --is-multi-region-trail \
  --include-global-service-events \
  --enable-log-file-validation \
  --kms-key-id arn:aws:kms:us-east-1:${ACCOUNT_ID}:key/YOUR-KEY-ID \
  --cloud-watch-logs-log-group-arn arn:aws:logs:us-east-1:${ACCOUNT_ID}:log-group:CloudTrail/main-trail:* \
  --cloud-watch-logs-role-arn arn:aws:iam::${ACCOUNT_ID}:role/CloudTrailCWLogsRole

# Start logging
aws cloudtrail start-logging --name main-trail

--is-multi-region-trail captures events from all regions in a single trail. Without it, API calls in us-west-2 aren't captured if your trail is in us-east-1. --include-global-service-events captures IAM, STS, and CloudFront events, which are global and not tied to a specific region.

--enable-log-file-validation creates a digest file every hour with SHA-256 hashes of all log files delivered. This proves log files haven’t been tampered with — critical for compliance audits where you need to demonstrate the integrity of audit logs.

Adding Data Events

Enable data events only for resources where the audit trail is worth the cost:

# Enable data events for a specific S3 bucket (high-value data)
aws cloudtrail put-event-selectors \
  --trail-name main-trail \
  --event-selectors '[
    {
      "ReadWriteType": "All",
      "IncludeManagementEvents": true,
      "DataResources": [
        {
          "Type": "AWS::S3::Object",
          "Values": [
            "arn:aws:s3:::my-sensitive-data-bucket/",
            "arn:aws:s3:::my-pci-data-bucket/"
          ]
        },
        {
          "Type": "AWS::Lambda::Function",
          "Values": ["arn:aws:lambda:us-east-1:123456789012:function:payment-processor"]
        }
      ]
    }
  ]'

# Advanced event selectors give more control (filter out noisy S3 health checks).
# Note: applying them replaces any basic event selectors already on the trail.
aws cloudtrail put-event-selectors \
  --trail-name main-trail \
  --advanced-event-selectors '[
    {
      "Name": "S3-sensitive-buckets-write-only",
      "FieldSelectors": [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
        {"Field": "readOnly", "Equals": ["false"]},
        {"Field": "resources.ARN", "StartsWith": [
          "arn:aws:s3:::my-sensitive-data-bucket/",
          "arn:aws:s3:::my-pci-data-bucket/"
        ]}
      ]
    }
  ]'

Advanced event selectors (the second form) let you filter by readOnly, eventName, and other fields. Excluding readOnly: true cuts data event volume roughly in half for S3 buckets with heavy GET traffic, while still capturing all writes and deletes.
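The selector logic is easier to reason about as a plain predicate. This sketch mirrors the advanced selector above on flattened, invented sample records (real CloudTrail events carry the ARN under resources[].ARN; the field names here are simplified for illustration):

```python
# Simulate the advanced event selector: keep only write (readOnly=false)
# S3 object-level events under the watched bucket prefixes.
# Sample events and their field names are invented for illustration.

WATCHED_PREFIXES = (
    "arn:aws:s3:::my-sensitive-data-bucket/",
    "arn:aws:s3:::my-pci-data-bucket/",
)

def selector_matches(event: dict) -> bool:
    return (
        event.get("eventCategory") == "Data"
        and event.get("resourceType") == "AWS::S3::Object"
        and event.get("readOnly") is False
        and event.get("resourceArn", "").startswith(WATCHED_PREFIXES)
    )

events = [
    {"eventCategory": "Data", "resourceType": "AWS::S3::Object",
     "readOnly": False,
     "resourceArn": "arn:aws:s3:::my-pci-data-bucket/reports/q3.csv"},
    {"eventCategory": "Data", "resourceType": "AWS::S3::Object",
     "readOnly": True,   # a GET: filtered out, saving event volume
     "resourceArn": "arn:aws:s3:::my-pci-data-bucket/reports/q3.csv"},
    {"eventCategory": "Data", "resourceType": "AWS::S3::Object",
     "readOnly": False,
     "resourceArn": "arn:aws:s3:::some-other-bucket/file.txt"},
]

print([selector_matches(e) for e in events])  # [True, False, False]
```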

CloudWatch Logs: Real-Time Alerting

CloudTrail alone is reactive — you search after something happens. Integrating with CloudWatch Logs lets you build metric filters that fire alarms on security events as they happen.

# Create CloudWatch Log Group
aws logs create-log-group --log-group-name CloudTrail/main-trail

# IAM role allowing CloudTrail to write to CloudWatch Logs
aws iam create-role \
  --role-name CloudTrailCWLogsRole \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"cloudtrail.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

aws iam put-role-policy \
  --role-name CloudTrailCWLogsRole \
  --policy-name CloudTrailCWLogsPolicy \
  --policy-document '{
    "Version":"2012-10-17",
    "Statement":[{
      "Effect":"Allow",
      "Action":["logs:CreateLogStream","logs:PutLogEvents"],
      "Resource":"arn:aws:logs:us-east-1:*:log-group:CloudTrail/main-trail:*"
    }]
  }'

With CloudTrail writing to CloudWatch Logs, create metric filters for critical events:

# 1. Root account usage
aws logs put-metric-filter \
  --log-group-name CloudTrail/main-trail \
  --filter-name RootAccountUsage \
  --filter-pattern '{ $.userIdentity.type = "Root" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != "AwsServiceEvent" }' \
  --metric-transformations metricName=RootAccountUsageCount,metricNamespace=CloudTrailMetrics,metricValue=1

# 2. Unauthorized API calls
aws logs put-metric-filter \
  --log-group-name CloudTrail/main-trail \
  --filter-name UnauthorizedAPICalls \
  --filter-pattern '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }' \
  --metric-transformations metricName=UnauthorizedAPICallsCount,metricNamespace=CloudTrailMetrics,metricValue=1

# 3. IAM policy changes
aws logs put-metric-filter \
  --log-group-name CloudTrail/main-trail \
  --filter-name IAMPolicyChanges \
  --filter-pattern '{ ($.eventName=DeleteGroupPolicy) || ($.eventName=DeleteRolePolicy) || ($.eventName=DeleteUserPolicy) || ($.eventName=PutGroupPolicy) || ($.eventName=PutRolePolicy) || ($.eventName=PutUserPolicy) || ($.eventName=CreatePolicy) || ($.eventName=DeletePolicy) || ($.eventName=AttachRolePolicy) || ($.eventName=DetachRolePolicy) }' \
  --metric-transformations metricName=IAMPolicyChangesCount,metricNamespace=CloudTrailMetrics,metricValue=1

# 4. Security group changes
aws logs put-metric-filter \
  --log-group-name CloudTrail/main-trail \
  --filter-name SecurityGroupChanges \
  --filter-pattern '{ ($.eventName = AuthorizeSecurityGroupIngress) || ($.eventName = RevokeSecurityGroupIngress) || ($.eventName = AuthorizeSecurityGroupEgress) || ($.eventName = CreateSecurityGroup) || ($.eventName = DeleteSecurityGroup) }' \
  --metric-transformations metricName=SecurityGroupChangesCount,metricNamespace=CloudTrailMetrics,metricValue=1

# Create alarm for root usage (should NEVER happen in production)
aws cloudwatch put-metric-alarm \
  --alarm-name RootAccountUsageAlarm \
  --alarm-description "Root account was used — investigate immediately" \
  --metric-name RootAccountUsageCount \
  --namespace CloudTrailMetrics \
  --statistic Sum \
  --period 300 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:security-alerts \
  --treat-missing-data notBreaching

The root account alarm threshold is 1 — any root usage fires an alert. In a well-managed AWS account, the root user should never be used operationally. A root login is either a compromise or a procedural violation; both warrant immediate investigation.
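It's worth understanding exactly what the RootAccountUsage filter pattern matches. Expressed as a Python predicate over a CloudTrail record (the sample records below are invented):

```python
# The RootAccountUsage metric filter, as a predicate: root identity,
# not invoked by an AWS service on your behalf, and not a service event.

def is_interactive_root_usage(record: dict) -> bool:
    ident = record.get("userIdentity", {})
    return (
        ident.get("type") == "Root"
        and "invokedBy" not in ident              # excludes AWS services acting as root
        and record.get("eventType") != "AwsServiceEvent"
    )

console_login = {
    "userIdentity": {"type": "Root", "arn": "arn:aws:iam::123456789012:root"},
    "eventType": "AwsConsoleSignIn",
    "eventName": "ConsoleLogin",
}
service_event = {
    "userIdentity": {"type": "Root", "invokedBy": "support.amazonaws.com"},
    "eventType": "AwsApiCall",
}

print(is_interactive_root_usage(console_login))  # True  -> alarm fires
print(is_interactive_root_usage(service_event))  # False -> no alarm
```

The invokedBy and eventType exclusions are what keep AWS's own background activity under the root identity from paging you at 3am.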

Querying with Athena

CloudTrail logs land in S3 as gzipped JSON. Athena lets you query them with SQL without downloading anything:
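Before writing any SQL, it helps to see the raw shape Athena will be parsing: each delivered .json.gz file holds one JSON object with a Records array. A minimal local round-trip (the record and path are invented):

```python
# CloudTrail delivers files like
#   AWSLogs/<account>/CloudTrail/<region>/<yyyy>/<mm>/<dd>/<file>.json.gz
# each containing a single object with a "Records" array.

import gzip
import json
import os
import tempfile

sample = {"Records": [{
    "eventTime": "2026-07-08T03:12:45Z",
    "eventSource": "dynamodb.amazonaws.com",
    "eventName": "DeleteItem",
    "userIdentity": {"type": "AssumedRole",
                     "arn": "arn:aws:sts::123456789012:assumed-role/app/worker"},
}]}

# Write and re-read a gzipped log file, as CloudTrail would deliver it
path = os.path.join(tempfile.mkdtemp(), "trail.json.gz")
with gzip.open(path, "wt") as f:
    json.dump(sample, f)

with gzip.open(path, "rt") as f:
    records = json.load(f)["Records"]

print(records[0]["eventName"])  # DeleteItem
```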

-- Create Athena table over CloudTrail S3 logs
CREATE EXTERNAL TABLE cloudtrail_logs (
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        userName: STRING
    >,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIPAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<ARN: STRING, accountId: STRING, type: STRING>>,
    eventType STRING,
    recipientAccountId STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://my-cloudtrail-logs-123456789012/AWSLogs/123456789012/CloudTrail/';

-- Find all IAM changes in the last 7 days
SELECT eventTime, userIdentity.userName, eventName, requestParameters
FROM cloudtrail_logs
WHERE eventSource = 'iam.amazonaws.com'
  AND eventTime > date_format(date_add('day', -7, current_date), '%Y-%m-%dT%H:%i:%sZ')
ORDER BY eventTime DESC;

-- Find who deleted something in S3 (data events required)
SELECT eventTime, userIdentity.arn, requestParameters
FROM cloudtrail_logs
WHERE eventName = 'DeleteObject'
  AND eventTime > '2026-07-01T00:00:00Z'
ORDER BY eventTime DESC;

-- Find failed API calls by IP address (potential scanning/bruteforce)
SELECT sourceIPAddress, COUNT(*) as failed_calls, array_agg(DISTINCT eventName) as events
FROM cloudtrail_logs
WHERE errorCode IN ('AccessDenied', 'AuthFailure', 'UnauthorizedOperation')
  AND eventTime > date_format(date_add('day', -1, current_date), '%Y-%m-%dT%H:%i:%sZ')
GROUP BY sourceIPAddress
HAVING COUNT(*) > 50
ORDER BY failed_calls DESC;

Athena charges $5 per TB of data scanned. CloudTrail logs compress well — a year of management events for a medium account is typically 5-20 GB compressed. Partition the table by year/month/day to avoid scanning the full history:

-- Partitioned table (more efficient)
CREATE EXTERNAL TABLE cloudtrail_logs_partitioned (
    -- same columns as above
)
PARTITIONED BY (region string, year string, month string, day string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
...

-- Add partition for a specific date (or use MSCK REPAIR TABLE to auto-discover)
ALTER TABLE cloudtrail_logs_partitioned ADD PARTITION (
    region='us-east-1', year='2026', month='07', day='08'
) LOCATION 's3://my-cloudtrail-logs-123456789012/AWSLogs/123456789012/CloudTrail/us-east-1/2026/07/08/';
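Hand-writing one ADD PARTITION per day gets old fast. A small generator covering a date range (bucket and account are the same placeholders used above):

```python
# Generate ALTER TABLE ADD PARTITION statements for a range of days,
# matching CloudTrail's region/year/month/day S3 key layout.

from datetime import date, timedelta

BASE = "s3://my-cloudtrail-logs-123456789012/AWSLogs/123456789012/CloudTrail"

def partition_ddl(region: str, start: date, days: int) -> list[str]:
    stmts = []
    for i in range(days):
        d = start + timedelta(days=i)
        y, m, dd = f"{d.year}", f"{d.month:02d}", f"{d.day:02d}"
        stmts.append(
            "ALTER TABLE cloudtrail_logs_partitioned ADD IF NOT EXISTS PARTITION "
            f"(region='{region}', year='{y}', month='{m}', day='{dd}') "
            f"LOCATION '{BASE}/{region}/{y}/{m}/{dd}/';"
        )
    return stmts

for stmt in partition_ddl("us-east-1", date(2026, 7, 8), 2):
    print(stmt)
```

Pipe the output into the Athena console or aws athena start-query-execution, or run it daily from a scheduled job so the current day's partition always exists.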

CloudTrail Lake

CloudTrail Lake (launched in 2022) is an alternative to S3+Athena. It ingests events into a managed data store optimized for SQL queries, with 7-year retention available and no need to manage S3, Athena tables, or partitioning.

# Create an event data store
aws cloudtrail create-event-data-store \
  --name my-event-store \
  --retention-period 365 \
  --multi-region-enabled \
  --organization-enabled \
  --termination-protection-enabled

# Query with SQL
aws cloudtrail start-query \
  --query-statement "
    SELECT eventTime, userIdentity.principalId, eventName, awsRegion, errorCode
    FROM my-event-store-id
    WHERE eventTime > '2026-07-01 00:00:00'
      AND eventSource = 'iam.amazonaws.com'
    ORDER BY eventTime DESC
    LIMIT 100
  "

CloudTrail Lake's SQL is more capable than Athena's for CloudTrail queries: it understands nested fields like userIdentity.principalId directly, without the STRUCT syntax. The trade-off is cost: CloudTrail Lake charges per GB ingested ($2.50 under seven-year retention pricing, $0.75 under one-year extendable retention pricing) plus $0.005 per GB scanned by queries. For large accounts with heavy event volume, S3+Athena is considerably cheaper.
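A rough break-even sketch makes the gap concrete. Treat the per-GB prices below as placeholders to verify against current AWS pricing pages; the ingest and scan volumes are invented:

```python
# Rough monthly cost: Athena-over-S3 vs CloudTrail Lake.
# Prices are placeholder assumptions -- check current AWS pricing.
# S3 storage and PUT costs for the Athena path are ignored here.

ATHENA_PER_TB_SCANNED = 5.00    # $/TB scanned by queries
LAKE_PER_GB_INGESTED = 2.50     # $/GB ingested (retention tier dependent)
LAKE_PER_GB_SCANNED = 0.005     # $/GB scanned by queries

def athena_monthly(gb_scanned: float) -> float:
    return gb_scanned / 1024 * ATHENA_PER_TB_SCANNED

def lake_monthly(gb_ingested: float, gb_scanned: float) -> float:
    return gb_ingested * LAKE_PER_GB_INGESTED + gb_scanned * LAKE_PER_GB_SCANNED

# Example: 50 GB of new events/month, 200 GB scanned by investigations
print(f"Athena: ${athena_monthly(200):.2f}")
print(f"Lake:   ${lake_monthly(50, 200):.2f}")
```

Under these assumptions the ingestion charge dominates: Lake's cost scales with how much you log, Athena's with how much you query, which is why low-query, high-volume accounts favor S3+Athena.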

Log Integrity Validation

The log file validation feature creates digest files every hour. Each digest contains the SHA-256 of every log file delivered in that hour plus a signature using CloudTrail’s private key. Validating integrity proves the logs weren’t modified after delivery:

# Validate logs for a specific time range
aws cloudtrail validate-logs \
  --trail-arn arn:aws:cloudtrail:us-east-1:123456789012:trail/main-trail \
  --start-time 2026-07-01T00:00:00Z \
  --end-time 2026-07-08T00:00:00Z \
  --verbose

# Output shows each file: VALID or INVALID
# An INVALID file means tampering occurred

This is the feature compliance auditors actually care about. “Prove your audit logs haven’t been modified” is a common SOC 2 and PCI requirement. The digest chain validation is cryptographic proof.
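The heart of what validate-logs checks can be sketched locally: each delivered file's SHA-256 must equal the hash recorded in the hourly digest. (The real mechanism also signs each digest and chains it to the previous one, which this sketch omits; the file contents are invented.)

```python
# Core of log file integrity validation: recompute SHA-256 and compare
# to the hash recorded in the digest at delivery time.

import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

log_file = b'{"Records": [...]}'            # delivered log file contents
digest_entry = {                            # entry from the hourly digest file
    "s3Object": "AWSLogs/.../trail.json.gz",
    "hashAlgorithm": "SHA-256",
    "hashValue": sha256_hex(log_file),      # recorded at delivery time
}

tampered = log_file + b" "                  # a single byte changed afterward

print(sha256_hex(log_file) == digest_entry["hashValue"])   # True  -> VALID
print(sha256_hex(tampered) == digest_entry["hashValue"])   # False -> INVALID
```

Any post-delivery modification, however small, changes the hash and flips the file to INVALID, which is exactly the property auditors want demonstrated.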

For the broader AWS security and compliance picture, the AWS IAM best practices guide covers the IAM policy changes you’d be alerting on here. The CloudWatch deep dive covers the metrics and alarms side — CloudTrail feeds the security events while CloudWatch handles application performance.
