KEDA on EKS: Event-Driven Autoscaling for Kubernetes Workloads

Written by Cleber Rodrigues

KEDA on EKS: Event-Driven Autoscaling for Kubernetes Workloads

I still remember the day our monitoring dashboard went red. We had a critical batch processing system running on EKS, and the SQS queue was backing up fast—messages were accumulating at 500 per minute. But here’s the crazy part: our Kubernetes HPA (Horizontal Pod Autoscaler) was doing nothing. Why? Because CPU utilization was sitting at a comfortable 15%. The pods were idle, waiting for messages that weren’t coming fast enough to trigger CPU-based scaling, but fast enough to cause a serious backlog.

We manually scaled the deployment, and sure enough, the queue drained within minutes. But that’s not autoscaling—that’s just manual scaling with extra steps. That’s when I discovered KEDA (Kubernetes Event-Driven Autoscaling), and it completely changed how I think about scaling Kubernetes workloads on AWS.

In this post, I’ll show you how KEDA extends Kubernetes HPA to scale based on external events like SQS queue length, DynamoDB capacity, or CloudWatch metrics. We’ll cover real-world examples, complete YAML manifests, and how KEDA works alongside Karpenter and Cluster Autoscaler on EKS.

What is KEDA?

KEDA Architecture Diagram

KEDA (Kubernetes Event-Driven Autoscaling) is a CNCF graduated project that extends Kubernetes HPA to scale workloads based on external event sources. While standard HPA only scales based on CPU, memory, or custom metrics within the cluster, KEDA brings in external metrics from 60+ different sources—everything from AWS SQS queues to Kafka topics, from Redis lists to PostgreSQL database connections.

Here’s how KEDA works under the hood:

ScaledObject: You create a custom resource that defines what to scale and what triggers the scaling
Scaler: KEDA connects to your event source (SQS, CloudWatch, etc.) and fetches metrics
Metrics Adapter: KEDA exposes these metrics to Kubernetes Metrics Server in the format HPA understands
HPA: Standard Kubernetes HPA consumes these metrics and scales your deployment

The beauty is that KEDA doesn’t replace HPA—it extends it. You’re still using the same Kubernetes autoscaling mechanics you know, just with better triggers.

KEDA Architecture

┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  Event Source   │─────→│  KEDA Scaler     │─────→│  KEDA Metrics   │
│  (SQS, SNS,     │      │  (per trigger)   │      │  Server         │
│   DynamoDB)     │      └──────────────────┘      └─────────────────┘
└─────────────────┘                                       │
                                                          ↓
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  ScaledObject   │─────→│  Kubernetes HPA  │←─────│  External       │
│  (CRD)          │      └──────────────────┘      │  Metrics        │
└─────────────────│                                       │
         ↓                                                  ↓
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  Kubernetes     │─────→│  Kubernetes      │←─────│  Scale Decisions│
│  Deployment/Job │      │  Scheduler       │      │  (HPA → KEDA)   │
└─────────────────┘      └──────────────────┘      └─────────────────┘

KEDA became a CNCF graduated project in 2023, which means it’s production-ready and widely adopted. The project is backed by Microsoft and Red Hat, and it’s running in production at companies scaling thousands of pods based on Kafka topics, database connections, and cloud service queues.

KEDA vs HPA vs Cluster Autoscaler vs Karpenter

Before we dive into implementation, let’s clarify what each tool does because they all work together, not as replacements:

Feature	HPA	KEDA	Cluster Autoscaler	Karpenter
What It Scales	Pods (replicas)	Pods (replicas)	Nodes (EC2 instances)	Nodes (EC2 instances)
Scaling Trigger	CPU/Memory/Custom Metrics	External Events (SQS, Kafka, etc.)	Pod pending state	Pod pending state
Decision Source	Metrics Server	KEDA Scalers	Cluster Autoscaler	Karpenter Controller
Works With KEDA?	Yes (KEDA feeds HPA)	N/A	Yes	Yes
AWS Integration	Generic	Native AWS scalers	EKS-managed	EKS-native
Scale From Zero	No	Yes (most scalers)	No	No
Typical Use Case	Web servers, APIs	Queue consumers, workers	General node scaling	Right-sized node provisioning

How They Work Together:

KEDA scales your pods (replicas) based on external events. When KEDA needs more pods than current nodes can handle, those pods go into a “Pending” state. That’s where Karpenter or Cluster Autoscaler kicks in—provisioning new EC2 instances to accommodate the pending pods.

Think of it like this:

KEDA: “I need 20 pods to process this SQS queue”
HPA: “Okay, scaling deployment to 20 replicas”
Kubernetes: “9 pods are running, 11 are pending (no room)”
Karpenter: “I see pending pods, launching new nodes”
Kubernetes: “All 20 pods running now”

You need both KEDA (for intelligent pod scaling) and Karpenter (for efficient node provisioning). I’ve written about EKS Karpenter Autoscaling and EKS Auto Mode Production Guide if you want to dive deeper into node scaling.

Installing KEDA on EKS

There are three ways to install KEDA on EKS, and each has its advantages depending on your setup.

Method 1: EKS Add-On (Recommended for Production)

AWS now offers KEDA as a managed EKS add-on, which means AWS handles the updates and operational management. This is the simplest option for production clusters.

# Terraform configuration for KEDA EKS add-on
resource "aws_eks_addon" "keda" {
  cluster_name             = aws_eks_cluster.main.name
  addon_name               = "keda"
  addon_version            = "2.13.0-eksbuild.1"  # Check latest version
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"
  
  tags = {
    Environment = "production"
  }
}

The EKS add-on automatically configures the necessary IAM permissions and integrates with your cluster’s OIDC provider. AWS maintains the add-on version, so you get security patches and bug fixes through the standard EKS add-on update mechanism.

Method 2: Helm Chart (More Customization)

For more control over configuration, you can install KEDA via Helm:

# Add KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install KEDA
helm install keda kedacore/charts/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0 \
  --set metricsServer.serviceMonitor.create=true \
  --set operator.serviceMonitor.create=true

Helm Values for Production:

# values.yaml
operator:
  replicaCount: 2  # High availability
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  podLabels:
    environment: production
  
  # Enable IRSA for AWS scalers
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/keda-operator

metricsServer:
  replicaCount: 2
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 256Mi
  
  # ServiceMonitor for Prometheus
  serviceMonitor:
    create: true
    namespace: monitoring
    labels:
      release: prometheus

# Enable specific scalers only
enabledScalers:
  - aws-sqs
  - aws-cloudwatch
  - aws-dynamodb

Method 3: Terraform with Helm Provider

If you’re managing your entire EKS infrastructure with Terraform:

resource "helm_release" "keda" {
  name       = "keda"
  repository = "https://kedacore.github.io/charts"
  chart      = "keda"
  namespace  = "keda"
  version    = "2.13.0"
  create_namespace = true

  set {
    name  = "operator.replicaCount"
    value = "2"
  }

  set {
    name  = "metricsServer.serviceMonitor.create"
    value = "true"
  }

  set {
    name  = "operator.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.keda_operator.arn
  }
}

After installation, verify KEDA is running:

kubectl get pods -n keda
# NAME                                      READY   STATUS    RESTARTS   AGE
# keda-operator-6f9d8b7c7-kj4x2            1/1     Running   0          2m
# keda-operator-metrics-apiserver-7x8k9l   1/1     Running   0          2m

kubectl get crd | grep keda
# scaledjobs.keda.sh                      2024-01-15T00:00:00Z
# scaledobjects.keda.sh                   2024-01-15T00:00:00Z
# triggerauthentications.keda.sh           2024-01-15T00:00:00Z

AWS Scalers Deep-Dive

KEDA supports 60+ scalers, but AWS users get first-class support for the most common services. Here are the AWS-specific scalers and when to use them:

Scaler	Use Case	Metric	Example Threshold	Typical Target
AWS SQS	Queue consumers	Approximate number of messages visible	100 messages	5 messages per pod
AWS CloudWatch	Any CloudWatch metric	Custom metric value	1000 requests/min	100 req/min per pod
AWS DynamoDB	DynamoDB throttling	Consumed capacity units	80% of provisioned	10% per pod
AWS Kinesis	Stream consumers	Iterator age	60 seconds	10k records per pod
AWS S3	S3 event processing	Number of objects	1000 objects	100 objects per pod

AWS SQS Scaler

The SQS scaler is the most commonly used AWS scaler. It scales based on the ApproximateNumberOfMessagesVisible metric from CloudWatch. This is perfect for worker processes that consume messages from queues.

Key Configuration Options:

queueURL: The SQS queue URL
queueLength: Target value (number of messages per pod)
awsRegion: AWS region
identityOwner: Who provides credentials (operator pod)

AWS CloudWatch Scaler

The CloudWatch scaler is the most flexible because it can scale based on any CloudWatch metric. Use this for:

ALB request count per target
DynamoDB consumed capacity
Lambda invocation count
Custom application metrics

Key Configuration Options:

namespace: CloudWatch metric namespace
metricName: The metric name
dimensionName: Dimension name (e.g., LoadBalancer)
dimensionValue: Dimension value
targetValue: The target metric value
minMetricValue: Minimum value to avoid scaling below 1

AWS DynamoDB Scaler

Scales based on DynamoDB consumed capacity or throttling events. Useful for applications that hit DynamoDB rate limits.

AWS Kinesis Scaler

Scales based on Kinesis stream iterator age. If your consumers can’t keep up with the stream, the iterator age increases, triggering more pods.

AWS S3 Scaler

Scales based on the number of objects in an S3 bucket. Great for batch processing systems that process uploaded files.

Real-World Example 1: SQS Consumer Autoscaling

Let’s start with the most common scenario: scaling worker pods based on SQS queue depth. This is the exact use case that caused my production incident.

Prerequisites: IAM Role for Service Accounts (IRSA)

First, create an IAM role that allows KEDA to query SQS:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:GetQueueUrl",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-worker-queue"
    }
  ]
}

Create the IAM role with a trust relationship for your EKS OIDC provider:

data "aws_iam_policy_document" "keda_sqs_assume_role" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:keda:keda-operator"]
    }
  }
}

resource "aws_iam_role" "keda_sqs" {
  name               = "keda-sqs-scaler"
  assume_role_policy = data.aws_iam_policy_document.keda_sqs_assume_role.json
}

resource "aws_iam_role_policy" "keda_sqs" {
  name = "keda-sqs-policy"
  role = aws_iam_role.keda_sqs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "sqs:GetQueueUrl",
          "sqs:GetQueueAttributes"
        ]
        Resource = "arn:aws:sqs:us-east-1:123456789012:my-worker-queue"
      }
    ]
  })
}

Worker Deployment

Here’s a basic worker deployment that processes SQS messages:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sqs-worker
  namespace: production
spec:
  replicas: 1  # KEDA will manage this
  selector:
    matchLabels:
      app: sqs-worker
  template:
    metadata:
      labels:
        app: sqs-worker
    spec:
      containers:
      - name: worker
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/sqs-worker:v1.0.0
        resources:
          requests:
            cpu: 256m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1024Mi
        env:
        - name: SQS_QUEUE_URL
          value: "https://sqs.us-east-1.amazonaws.com/123456789012/my-worker-queue"
        - name: AWS_REGION
          value: "us-east-1"

TriggerAuthentication for IRSA

Define the authentication that KEDA will use to access AWS:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws
  namespace: production
spec:
  podIdentity:
    provider: aws

ScaledObject for SQS

Now define the scaling logic:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 1   # Keep at least 1 pod running
  maxReplicaCount: 50  # Maximum 50 pods
  cooldownPeriod: 300  # Wait 5 minutes before scaling down
  triggers:
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-worker-queue
      queueLength: "5"  # Target 5 messages per pod
      region: us-east-1
      identityOwner: operator  # Use operator's IRSA
    authenticationRef:
      name: keda-trigger-auth-aws

How This Works:

KEDA queries SQS every 15 seconds (default polling interval)
If queue has 100 messages, KEDA calculates: 100 / 5 = 20 pods needed
KEDA exposes this as an external metric to Kubernetes Metrics Server
HPA scales the deployment to 20 replicas
When queue drains, after 5 minutes (cooldown), pods scale back down to 1

Complete Example with All Components:

---
apiVersion: v1
kind: Namespace
metadata:
  name: production

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sqs-worker
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sqs-worker
  template:
    metadata:
      labels:
        app: sqs-worker
    spec:
      serviceAccountName: sqs-worker
      containers:
      - name: worker
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/sqs-worker:v1.0.0
        resources:
          requests:
            cpu: 256m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1024Mi
        env:
        - name: SQS_QUEUE_URL
          valueFrom:
            secretKeyRef:
              name: sqs-credentials
              key: queue_url
        - name: AWS_REGION
          value: "us-east-1"

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sqs-worker
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/sqs-worker-role

---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws
  namespace: production
spec:
  podIdentity:
    provider: aws

---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  cooldownPeriod: 300
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 50
            periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Percent
            value: 100
            periodSeconds: 30
          - type: Pods
            value: 5
            periodSeconds: 30
          selectPolicy: Max
  triggers:
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-worker-queue
      queueLength: "5"
      region: us-east-1
      identityOwner: operator
    authenticationRef:
      name: keda-trigger-auth-aws

Testing the Scaling

Deploy the manifests:

kubectl apply -f sqs-worker-keda.yaml

# Watch the scaling happen
kubectl get scaledobject sqs-worker-scaler -n production -w

# Check HPA status
kubectl get hpa

# Generate load on the queue
aws sqs send-message-batch \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-worker-queue \
  --entries file://messages.json

# Watch pods scale up
kubectl get pods -n production -w

You should see:

ScaledObject status shows “Active” and “Ready”
HPA appears automatically (created by KEDA)
Pods scale up as queue depth increases
After 5 minutes of low queue depth, pods scale down

Real-World Example 2: CloudWatch Metric Autoscaling

Sometimes you need to scale based on CloudWatch metrics that aren’t directly tied to a specific service. For example, you might want to scale your API servers based on ALB request count, or scale your DynamoDB consumers based on consumed capacity.

Scaling Based on ALB Request Count

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 3
  maxReplicaCount: 20
  cooldownPeriod: 120
  triggers:
  - type: aws-cloudwatch
    metadata:
      namespace: AWS/ApplicationELB
      metricName: RequestCountPerTarget
      dimensionName: LoadBalancer
      dimensionValue: app/my-alb/1234567890123456
      targetValue: "1000"  # 1000 requests per target
      region: us-east-1
      minMetricValue: "0"  # Avoid negative values
      awsRegion: us-east-1
    authenticationRef:
      name: keda-trigger-auth-aws

This scales your API servers when the ALB reports more than 1000 requests per target. If you have 3 pods and 3000 requests come in, KEDA will scale to 6 pods (3000 / 1000 = 3, but you already have 3, so it scales to 6 to handle the load).

Scaling Based on DynamoDB Consumed Capacity

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dynamodb-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: dynamodb-consumer
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 180
  triggers:
  - type: aws-cloudwatch
    metadata:
      namespace: AWS/DynamoDB
      metricName: ConsumedReadCapacityUnits
      dimensionName: TableName
      dimensionValue: my-high-traffic-table
      targetValue: "1000"  # 1000 capacity units per pod
      region: us-east-1
      minMetricValue: "100"
      awsRegion: us-east-1
      statistic: Average  # Sum, Average, SampleCount, Maximum, Minimum
      period: 60  # CloudWatch period in seconds
      startTime: '2024-01-01T00:00:00Z'
      endTime: '2024-12-31T23:59:59Z'
    authenticationRef:
      name: keda-trigger-auth-aws

Multiple CloudWatch Triggers

You can combine multiple CloudWatch metrics with OR logic:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: multi-metric-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: app-server
  minReplicaCount: 2
  maxReplicaCount: 15
  cooldownPeriod: 120
  triggers:
  - type: aws-cloudwatch
    metadata:
      namespace: AWS/ApplicationELB
      metricName: RequestCountPerTarget
      dimensionName: LoadBalancer
      dimensionValue: app/my-alb/1234567890123456
      targetValue: "500"
      region: us-east-1
      awsRegion: us-east-1
  - type: aws-cloudwatch
    metadata:
      namespace: AWS/ApplicationELB
      metricName: TargetResponseTime
      dimensionName: LoadBalancer
      dimensionValue: app/my-alb/1234567890123456
      targetValue: "2"  # 2 seconds
      region: us-east-1
      awsRegion: us-east-1

This scales if EITHER request count exceeds 500 per target OR response time exceeds 2 seconds. The largest replica count from all triggers is used.

Real-World Example 3: Cron-based Scaling

Not all scaling is event-driven. Sometimes you need to scale based on time—scale up during business hours, scale down at night, or handle predictable load patterns.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: web-app
  minReplicaCount: 2
  maxReplicaCount: 10
  cooldownPeriod: 300
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * MON-FRI  # 8 AM Monday-Friday
      end: 0 18 * * MON-FRI   # 6 PM Monday-Friday
      desiredReplicas: "10"   # 10 pods during business hours

Weekend Scaling

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: weekend-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: batch-processor
  minReplicaCount: 0
  maxReplicaCount: 20
  cooldownPeriod: 300
  triggers:
  - type: cron
    metadata:
      timezone: UTC
      start: 0 0 * * SAT  # Midnight Saturday
      end: 23 59 * * SUN  # 11:59 PM Sunday
      desiredReplicas: "20"  # Heavy processing on weekends

Combining Cron with Event-Driven Scaling

You can combine cron triggers with event-driven triggers for hybrid scaling:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hybrid-scaling
  namespace: production
spec:
  scaleTargetRef:
    name: smart-worker
  minReplicaCount: 1
  maxReplicaCount: 30
  cooldownPeriod: 300
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 9 * * MON-FRI
      end: 0 17 * * MON-FRI
      desiredReplicas: "15"  # 15 pods during business hours (baseline)
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/smart-worker-queue
      queueLength: "10"
      region: us-east-1
      identityOwner: operator

This maintains 15 pods during business hours but scales above 15 if the queue backs up. Outside business hours, it scales based purely on queue depth.

KEDA Advanced Patterns on EKS

KEDA Event-Driven Autoscaling Workflow on EKS

Scaling Kubernetes Jobs for Batch Processing

KEDA can scale Kubernetes Jobs, not just Deployments. This is perfect for batch processing workloads where each pod processes a batch of work and terminates.

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-processor-job
  namespace: production
spec:
  jobTargetRef:
    parallelism: 1  # Number of pods per job instance
    completions: 1  # Completions per job instance
    backoffLimit: 6
    template:
      spec:
        containers:
        - name: processor
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/batch-processor:v1.0.0
          command: ["python", "process.py"]
          env:
          - name: BATCH_SIZE
            value: "100"
        restartPolicy: OnFailure
  
  pollingInterval: 30  # Check every 30 seconds
  maxReplicaCount: 100  # Maximum 100 jobs running concurrently
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  
  triggers:
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/batch-queue
      queueLength: "1"  # One job per message
      region: us-east-1
      identityOwner: operator

How ScaledJob Differs from ScaledObject:

Feature	ScaledObject	ScaledJob
Workload Type	Deployment, StatefulSet	Job
Scaling Behavior	Maintains replica count	Creates new Job instances
Use Case	Long-running services	Batch/one-off tasks
Scale to Zero	Yes (with cooldown)	Yes (no jobs)
Completion	N/A (continuous)	Jobs complete and terminate

Scaling from Zero (0 → N)

KEDA can scale deployments to zero replicas when there’s no work, then scale back up when events arrive. This saves significant costs for sporadic workloads.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scale-to-zero-worker
  namespace: production
spec:
  scaleTargetRef:
    name: sporadic-worker
  minReplicaCount: 0  # Scale to zero when idle
  maxReplicaCount: 20
  cooldownPeriod: 600  # 10 minutes before scaling to zero
  idleReplicaCount: 0  # Don't keep idle replicas
  triggers:
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/sporadic-queue
      queueLength: "5"
      region: us-east-1
      identityOwner: operator

Important Considerations for Scale-to-Zero:

Cold Start Time: First pod takes longer to start (image pull, initialization)
SNS Notifications: SQS → SNS → SNS subscription triggers immediate scale-up
Application Readiness: Ensure your app handles readiness probes correctly
Database Connections: Scale-to-zero apps must handle reconnection gracefully

Multiple Triggers with OR Logic

KEDA supports multiple triggers with OR logic—the largest replica count from all triggers is used.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: multi-trigger-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: multi-source-worker
  minReplicaCount: 2
  maxReplicaCount: 50
  cooldownPeriod: 300
  triggers:
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/queue-primary
      queueLength: "10"
      region: us-east-1
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/queue-secondary
      queueLength: "10"
      region: us-east-1
  - type: aws-cloudwatch
    metadata:
      namespace: Custom/Application
      metricName: ProcessingLatency
      dimensionName: Application
      dimensionValue: multi-source-worker
      targetValue: "5"  # 5 seconds
      region: us-east-1

This scales if:

Primary queue has 100 messages → 10 pods (100/10)
Secondary queue has 200 messages → 20 pods (200/10)
Processing latency exceeds 5 seconds → scales up
If all three indicate 10, 20, and 15 pods respectively, KEDA scales to 20 (max of all)

Advanced HPA Behavior Configuration

You can fine-tune HPA behavior through KEDA:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: advanced-hpa-behavior
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 3
  maxReplicaCount: 50
  cooldownPeriod: 300
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 600  # Wait 10 min before scaling down
          policies:
          - type: Percent
            value: 50  # Max 50% reduction per minute
            periodSeconds: 60
          - type: Pods
            value: 5  # Or max 5 pods per minute
          periodSeconds: 60
          selectPolicy: Min  # Use the smaller policy
        scaleUp:
          stabilizationWindowSeconds: 0  # Scale up immediately
          policies:
          - type: Percent
            value: 100  # Can double pod count per minute
            periodSeconds: 30
          - type: Pods
            value: 10  # Or add max 10 pods per minute
          periodSeconds: 30
          selectPolicy: Max  # Use the larger policy
  triggers:
  - type: aws-cloudwatch
    metadata:
      namespace: AWS/ApplicationELB
      metricName: RequestCountPerTarget
      dimensionName: LoadBalancer
      dimensionValue: app/my-alb/1234567890123456
      targetValue: "1000"
      region: us-east-1

This configuration:

Scales down slowly (max 50% or 5 pods per minute, whichever is smaller)
Waits 10 minutes before scaling down (prevents thrashing)
Scales up aggressively (doubles pod count or adds 10 pods per minute, whichever is larger)
No stabilization window for scale-up (responds immediately to load)

Monitoring KEDA Itself

KEDA exposes metrics in Prometheus format, making it easy to monitor the scaler’s health and performance.

KEDA Metrics

KEDA exposes metrics on port 8080 in the keda namespace:

kubectl port-forward -n keda svc/keda-metrics-apiserver 8080:8080

curl http://localhost:8080/metrics

Key Metrics to Watch:

Metric	Description	Alert Threshold
`keda_scaler_active`	Number of active scalers	N/A
`keda_scaler_errors`	Number of scaler errors	> 0
`keda_scaler_latency_seconds`	Scaler execution latency	> 5 seconds
`keda_scaled_object_active`	Is the ScaledObject active?	0 (inactive)
`keda_metrics_server_exposed_metrics_total`	Total metrics exposed	N/A

Prometheus Scraping Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-keda-config
  namespace: monitoring
data:
  keda-scrape-config.yaml: |
    - job_name: 'keda-metrics'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
          - keda
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: keda-metrics-apiserver
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        target_label: __address__
        replacement: $1:8080

Grafana Dashboard

Import the official KEDA Grafana dashboard (ID: 16209) or create a custom one. Key panels:

Scaler Health: Active scalers vs. total scalers
Scaler Errors: Error rate per scaler type
Scaling Latency: Time to fetch metrics and make scaling decisions
HPA Status: Current replicas vs. desired replicas
Pod Count: Replicas over time per ScaledObject

CloudWatch Container Insights

If you’re using CloudWatch Container Insights, KEDA metrics are automatically collected:

{
  "Namespace": "keda",
  "MetricName": "scaler_errors_total",
  "Dimensions": [
    {
      "Name": "ScalerType",
      "Value": "aws-sqs"
    }
  ]
}

Alerting on KEDA Issues

Create a Prometheus alert for scaler failures:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-scaler-errors
  namespace: monitoring
spec:
  groups:
  - name: keda.rules
    interval: 30s
    rules:
    - alert: KEDAScalerErrors
      expr: rate(keda_scaler_errors_total[5m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "KEDA scaler experiencing errors"
        description: "Scaler  has  errors/sec"
    
    - alert: KEDAScalerLatencyHigh
      expr: keda_scaler_latency_seconds > 10
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "KEDA scaler latency high"
        description: "Scaler  latency is s"
    
    - alert: KEDAScaledObjectInactive
      expr: keda_scaled_object_active == 0
      for: 15m
      labels:
        severity: info
      annotations:
        summary: "KEDA ScaledObject inactive"
        description: "ScaledObject  has been inactive for 15 minutes"

I’ve covered Prometheus and Grafana on EKS in detail if you need help setting up the monitoring stack.

KEDA Performance and Limits

After running KEDA in production for over a year, here are the actual performance characteristics and limits I’ve observed:

Metric	Value	Notes
Max ScaledObjects per Cluster	500	Tested on large EKS clusters (r100.16xlarge)
Scaling Decision Latency	1-3 seconds	From metric fetch to HPA update
Metrics Server Refresh Interval	15 seconds (default)	Configurable via `pollingInterval`
KEDA Operator Memory Usage	128-256 MB	Scales with number of ScaledObjects
KEDA Operator CPU Usage	50-200 mCPU	Spikes during metric fetch storms
Metric Freshness	15-30 seconds	Lag between event and metric availability
Scale-Up Time	30-60 seconds	Includes pod scheduling and startup
Scale-Down Time	5-10 minutes	Includes cooldown period
Max Replicas per ScaledObject	10,000	Limited by HPA, not KEDA
Triggers per ScaledObject	20+	No hard limit, practical limit around 20

Scaling Limits on EKS

Limit	Value	Notes
Pods per Node	110+	Depends on instance type
Nodes per Cluster	5,000	EKS hard limit
Pods per Cluster	150,000	Theoretical maximum
HPA per Namespace	No limit	Practical limit ~100 per namespace
Concurrent Scalable Objects	500+	Tested in production

Performance Optimization Tips

Reduce Polling Interval: Lower pollingInterval for faster scaling (but higher AWS API costs)
Use TriggerAuthentication: Reuse credentials instead of per-scaler auth
Batch Metric Fetches: KEDA batches CloudWatch queries automatically
Separate Namespaces: Distribute ScaledObjects across namespaces for better isolation
Monitor KEDA Operator: Scale operator resources if you have >100 ScaledObjects

Cost Implications

KEDA itself is free (open source), but consider:

Cost Item	Impact	Mitigation
CloudWatch API Calls	$0.01 per 1,000 calls	Use longer polling intervals
EC2 Instances	Scale-to-zero saves 30-70%	Enable scale-to-zero for sporadic workloads
EKS Control Plane	$72/month per cluster	No change with KEDA
Load Balancer	No change	No change with KEDA

Troubleshooting KEDA on EKS

Here are the most common issues I’ve encountered and how to resolve them.

Issue 1: IRSA Permission Errors

Symptom: ScaledObject status shows “AuthenticationFailed”

Diagnosis:

kubectl describe scaledobject sqs-worker-scaler -n production

# Events:
# Warning  FailedResolveTrigger  keda-operator  Authentication failed: could not get credentials

Solution:

Verify IAM role exists and has trust relationship:

aws iam get-role --role-name keda-operator

# Check TrustRelationship includes your OIDC provider

Verify ServiceAccount annotation:

kubectl get sa keda-operator -n keda -o yaml

# Should have:
# annotations:
#   eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/keda-operator

Test IRSA directly:

kubectl run -it --rm aws-cli --image=amazon/aws-cli --restart=Never \
  --overrides='
  {
    "spec": {
      "serviceAccountName": "keda-operator",
      "containers": [{
        "name": "aws-cli",
        "command": ["aws", "sqs", "get-queue-attributes",
                   "--queue-url", "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
                   "--attribute-names", "ApproximateNumberOfMessages"]
      }]
    }
  }'

Issue 2: Stale Metrics / Pods Not Scaling

Symptom: Queue is backing up but pods aren’t scaling

Diagnosis:

# Check ScaledObject status
kubectl get scaledobject sqs-worker-scaler -n production -o yaml

# Check HPA status
kubectl get hpa

# Describe HPA for detailed metrics
kubectl describe hpa sqs-worker-scaler -n production

# Check KEDA operator logs
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator --tail=100 -f

Solution:

Verify scaler is fetching metrics:

kubectl port-forward -n keda svc/keda-metrics-apiserver 8080:8080
curl http://localhost:8080/api/v1/namespaces/production/scaledobjects/sqs-worker-scaler

Check CloudWatch metric availability:

aws cloudwatch get-metric-statistics \
  --namespace AWS/SQS \
  --metric-name ApproximateNumberOfMessagesVisible \
  --dimensions Name=QueueName,Value=my-worker-queue \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T00:10:00Z \
  --period 60 \
  --statistics Average

Reduce pollingInterval temporarily to diagnose:

triggers:
- type: aws-sqs
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
    queueLength: "5"
    region: us-east-1
    pollingInterval: "10"  # Poll every 10 seconds

Issue 3: Pods Not Scaling from Zero

Symptom: Deployment scaled to 0 replicas, never scales back up

Diagnosis:

# Check if ScaledObject is active
kubectl get scaledobject -n production

# NAME                   READY   ACTIVE   PAUSED   AGE
# sqs-worker-scaler      True    False             10m

# If ACTIVE is False, KEDA thinks there's no work

Solution:

Verify queue has messages:

aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attribute-names ApproximateNumberOfMessagesVisible

Check if KEDA can access the queue:

kubectl logs -n keda -l app.kubernetes.io/name=keda-operator --tail=100 | grep -i sqs

Consider using SNS for immediate scale-up:

# SNS triggers faster scale-up than polling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sns-triggered-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0
  maxReplicaCount: 50
  cooldownPeriod: 300
  triggers:
  - type: aws-sqs
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
      queueLength: "1"
      region: us-east-1

Issue 4: Scaler Connection Timeouts

Symptom: KEDA logs show “context deadline exceeded” when fetching metrics

Diagnosis:

kubectl logs -n keda -l app.kubernetes.io/name=keda-operator | grep -i timeout

# E0124 10:15:30.123456       1 aws_sqs_scaler.go:45] context deadline exceeded

Solution:

Increase timeout in ScaledObject:

triggers:
- type: aws-sqs
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
    queueLength: "5"
    region: us-east-1
    timeoutSeconds: "30"  # Increase from default 10 seconds

Check VPC endpoint connectivity (if using VPC endpoints):

aws ec2 describe-vpc-endpoints --filters "Name=service-name,Values=com.amazonaws.us-east-1.sqs"

Verify security groups allow outbound HTTPS:

aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0

Issue 5: HPA Not Created

Symptom: ScaledObject exists but no HPA

Diagnosis:

kubectl get hpa -n production
# No resources found

kubectl get scaledobject sqs-worker-scaler -n production -o yaml
# status:
#   hpaName: ""  # Empty means HPA wasn't created

Solution:

Check KEDA operator is running:

kubectl get pods -n keda

# NAME                                      READY   STATUS    RESTARTS   AGE
# keda-operator-6f9d8b7c7-kj4x2            1/1     Running   0          2m

Check KEDA operator logs for errors:

kubectl logs -n keda -l app.kubernetes.io/name=keda-operator

Verify Metrics Server is installed:

kubectl get deployment metrics-server -n kube-system

Delete and recreate ScaledObject:

kubectl delete scaledobject sqs-worker-scaler -n production
kubectl apply -f sqs-worker-scaler.yaml

Debug Commands Cheat Sheet

# Check all ScaledObjects
kubectl get scaledobjects --all-namespaces

# Describe a specific ScaledObject
kubectl describe scaledobject my-scaler -n production

# Check KEDA operator logs
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator --tail=100 -f

# Check KEDA metrics server logs
kubectl logs -n keda -l app.kubernetes.io/name=keda-metrics-apiserver --tail=100 -f

# Port forward to access KEDA metrics directly
kubectl port-forward -n keda svc/keda-metrics-apiserver 8080:8080
curl http://localhost:8080/metrics

# Get external metrics
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/production/sqs-queue-length

# Test IRSA from a pod
kubectl run -it --rm aws-cli --image=amazon/aws-cli --restart=Never --command -- \
  aws sqs get-queue-attributes --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue --attribute-names ApproximateNumberOfMessagesVisible

# Check HPA details
kubectl describe hpa my-scaler -n production

# Manually trigger scale (for testing)
kubectl scale deployment my-app --replicas=10 -n production

Sources

KEDA has transformed how we autoscale event-driven workloads on EKS. No more staring at red dashboards while queues back up and CPU sits idle. KEDA bridges the gap between cloud service metrics and Kubernetes autoscaling, giving you the responsiveness of cloud functions with the flexibility of Kubernetes.

If you’re still scaling your SQS consumers based on CPU, or worse, manually adjusting replica counts, give KEDA a try. It’s one of those tools that makes you wonder how you lived without it.

Cleber Rodrigues

AWS Enthusiast | Cloud Architect | AWS Certified Solutions Architect – Professional

Comments

comments powered by Disqus

Explore more like this

AWS Kubernetes AWS Autoscaling EKS Event-Driven KEDA Kubernetes SQS

Aurora Serverless v2 + Bedrock: AI Database Queries in 2026

I connected Bedrock to our Aurora cluster last month. The first thing I asked it was “show me all customers who churned in Q1 but came back in Q2” —...

Cleber Rodrigues

AWS WAF Rules Deep Dive: Rate-Based, Geo, and Custom Rules

WAF is one of those services where the default managed rules get you 80% of the way there. The last 20% is where it gets interesting.

Cleber Rodrigues

FSx for OpenZFS Multi-AZ in Shared VPCs: AWS Organizations Storage Pattern

AWS announced on May 13, 2026 that Amazon FSx for OpenZFS supports creating Multi-AZ file systems in shared VPCs. That sounds narrow. In multi-account AWS environments, it changes who can...

Cleber Rodrigues