Prometheus and Grafana on EKS: Kubernetes Monitoring from Scratch
The kube-prometheus-stack Helm chart installs Prometheus, Alertmanager, Grafana, and a collection of default Kubernetes dashboards in about five minutes. That’s the fastest path to useful EKS monitoring. The harder part is what comes after: understanding what the metrics mean, configuring scraping for your own services, setting up alerts that fire on real problems rather than noise, and deciding whether Amazon Managed Prometheus and Managed Grafana are worth the cost over self-managed.
This guide covers installing kube-prometheus-stack, writing ServiceMonitors for your applications, building PodMonitor and PrometheusRule resources, configuring Alertmanager for SNS and Slack, using IRSA with Amazon Managed Prometheus, and the key metrics every EKS cluster should alert on.
Installing kube-prometheus-stack
The kube-prometheus-stack Helm chart from the prometheus-community repository is the standard starting point. It bundles everything — don’t install Prometheus and Grafana separately.
# Add the Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install with persistent storage enabled
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=gp3 \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
--set grafana.persistence.enabled=true \
--set grafana.persistence.storageClassName=gp3 \
--set grafana.persistence.size=10Gi \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=gp3 \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
--version 58.x.x
By default kube-prometheus-stack uses emptyDir storage — every pod restart loses all metrics history. For any real use, specify a storage class. On EKS, gp3 requires the EBS CSI driver add-on to be installed on the cluster. If it’s not installed yet:
# Install EBS CSI driver add-on
aws eks create-addon \
--cluster-name my-cluster \
--addon-name aws-ebs-csi-driver \
--service-account-role-arn arn:aws:iam::123456789012:role/AmazonEKS_EBS_CSI_DriverRole
The CSI driver needs an IAM role with the AmazonEBSCSIDriverPolicy managed policy, created with IRSA (IAM Roles for Service Accounts).
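EKS clusters ship with only the legacy gp2 StorageClass by default, so the gp3 class referenced in the Helm command has to exist before the PVCs can bind. A minimal sketch (the class name must match what you pass to Helm; the parameters are a reasonable starting point, not the only valid choice):

```shell
# Minimal gp3 StorageClass backed by the EBS CSI driver.
# WaitForFirstConsumer delays volume creation until a pod is scheduled,
# so the EBS volume lands in the same AZ as the pod.
cat > gp3-storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
EOF
```

Apply it with kubectl apply -f gp3-storageclass.yaml before installing the chart.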
After installing, check all pods are running:
kubectl get pods -n monitoring
# Access Grafana locally
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Default credentials: admin / prom-operator
# Retrieve the generated admin password (then change it):
# kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 -d
The default install includes dashboards for node resource usage, pod CPU/memory, namespace quotas, persistent volumes, API server latency, and kubelet metrics. These cover the infrastructure layer. You need ServiceMonitors for your application metrics.
ServiceMonitors: Scraping Your Applications
A ServiceMonitor tells Prometheus which Services to scrape and on what path/port. The kube-prometheus-stack installs a Prometheus Operator that watches for ServiceMonitor resources and automatically updates Prometheus’s scrape configuration.
# my-api-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: my-api
namespace: my-api # Same namespace as the Service
labels:
release: kube-prometheus-stack # Must match the Prometheus selector
spec:
selector:
matchLabels:
app: my-api # Matches the Service's labels
endpoints:
    - port: metrics # The Service's named metrics port (see the Service below)
      path: /metrics # Default Prometheus metrics endpoint
      interval: 30s # Scrape every 30 seconds
      scrapeTimeout: 10s
namespaceSelector:
matchNames:
- my-api
The critical part that catches everyone: the ServiceMonitor's release label must match the Prometheus Operator's serviceMonitorSelector. With this chart, the selector expects release: <Helm release name>, which is kube-prometheus-stack if you used the install command above. If Prometheus isn't picking up your ServiceMonitor, this label mismatch is usually the cause.
# Check what selector the Prometheus Operator is using
kubectl get prometheus -n monitoring kube-prometheus-stack-prometheus \
-o jsonpath='{.spec.serviceMonitorSelector}' | jq
# Verify your ServiceMonitor is being discovered
kubectl get servicemonitor -n my-api
kubectl describe servicemonitor my-api -n my-api
Your application must expose a /metrics endpoint in Prometheus text format. For Python services, the prometheus_client library handles this:
# Python app with Prometheus metrics
from prometheus_client import Counter, Histogram, Gauge, start_http_server
REQUEST_COUNT = Counter(
'http_requests_total',
'Total HTTP requests',
['method', 'endpoint', 'status_code']
)
REQUEST_LATENCY = Histogram(
'http_request_duration_seconds',
'HTTP request latency',
['method', 'endpoint'],
buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)
ACTIVE_CONNECTIONS = Gauge(
'active_connections',
'Currently active connections'
)
# In your request handler:
@REQUEST_LATENCY.labels(method='GET', endpoint='/orders').time()
def get_orders():
REQUEST_COUNT.labels(method='GET', endpoint='/orders', status_code='200').inc()
return orders
# Start metrics server on port 8081 (separate from app port)
start_http_server(8081)
Expose the metrics port in the Service so the ServiceMonitor can reach it:
apiVersion: v1
kind: Service
metadata:
name: my-api
labels:
app: my-api
spec:
ports:
- name: http
port: 8080
targetPort: 8080
- name: metrics
port: 8081
targetPort: 8081
PrometheusRules: Alerting
PrometheusRule resources define recording rules and alerting rules. Recording rules pre-compute expensive queries so dashboards load fast. Alerting rules fire when conditions are met.
# my-api-prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: my-api-alerts
namespace: my-api
labels:
release: kube-prometheus-stack
spec:
groups:
- name: my-api.rules
interval: 30s
rules:
# Recording rule: pre-compute request rate
- record: job:http_requests_total:rate5m
expr: rate(http_requests_total[5m])
# Alert: high error rate
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (job)
/
sum(rate(http_requests_total[5m])) by (job)
> 0.05
for: 5m
labels:
severity: critical
annotations:
            summary: "High error rate on {{ $labels.job }}"
            description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}"
runbook: "https://runbook.example.com/high-error-rate"
# Alert: high latency (p99 > 2 seconds)
- alert: HighP99Latency
expr: |
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le)
) > 2.0
for: 10m
labels:
severity: warning
annotations:
            summary: "P99 latency high for {{ $labels.job }}"
            description: "P99 latency is {{ $value }}s for {{ $labels.job }}"
# Alert: pod restarts
- alert: PodCrashLooping
expr: |
rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 0
for: 5m
labels:
severity: warning
annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting"
The for: 5m clause means the alert must be true for 5 consecutive minutes before firing. Without it, a single blip fires an alert. The for window prevents false alarms from transient spikes.
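Before applying the CRD, you can sanity-check the PromQL with promtool, which ships in Prometheus release tarballs. promtool expects a plain rules file rather than the Kubernetes wrapper, so extract the spec.groups section into standalone form first. A sketch, with one rule inlined for brevity:

```shell
# Extract spec.groups from the PrometheusRule into a plain rules file
# that promtool understands (shown inline here for brevity)
cat > my-api-rules-check.yaml <<'EOF'
groups:
  - name: my-api.rules
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (job)
            / sum(rate(http_requests_total[5m])) by (job) > 0.05
        for: 5m
        labels:
          severity: critical
EOF
# Validate syntax and semantics if promtool is on the PATH
if command -v promtool >/dev/null 2>&1; then
  promtool check rules my-api-rules-check.yaml
fi
```

This catches typos in metric names' label matchers and malformed expressions before the operator silently rejects the rule.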
Alertmanager: Routing to SNS and Slack
Alertmanager receives alerts from Prometheus and routes them based on labels. Configure it in the Helm values:
# alertmanager-values.yaml
alertmanager:
config:
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'job', 'severity']
group_wait: 30s
group_interval: 5m
repeat_interval: 12h
receiver: 'default'
routes:
- match:
severity: critical
receiver: pagerduty
continue: true
- match:
severity: critical
receiver: slack-critical
- match:
severity: warning
receiver: slack-warning
receivers:
- name: default
slack_configs:
- api_url: 'https://hooks.slack.com/services/T00000/B00000/XXXXXXXX'
channel: '#alerts'
text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
- name: slack-critical
slack_configs:
          - api_url: 'https://hooks.slack.com/services/T00000/B00000/YYYYYYYY' # webhook for the critical channel
channel: '#alerts-critical'
title: '[CRITICAL] {{ .GroupLabels.alertname }}'
text: |
{{ range .Alerts }}
*Alert:* {{ .Annotations.summary }}
*Description:* {{ .Annotations.description }}
*Runbook:* {{ .Annotations.runbook }}
{{ end }}
- name: pagerduty
pagerduty_configs:
          - routing_key: '<your-pagerduty-integration-key>' # from the PagerDuty service integration
description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
# Apply the values
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
-n monitoring \
-f alertmanager-values.yaml
# Test Alertmanager config
kubectl exec -n monitoring \
$(kubectl get pods -n monitoring -l app.kubernetes.io/name=alertmanager -o name | head -1) \
-- amtool check-config /etc/alertmanager/config_out/alertmanager.env.yaml
For SNS instead of Slack: Alertmanager v0.23 and later has a native sns_configs receiver, so you can publish to an SNS topic directly (the Alertmanager pod needs sns:Publish permissions, typically granted via IRSA). On older versions, use a webhook receiver pointing to a Lambda that publishes to SNS: Alertmanager sends a JSON POST, the Lambda publishes the formatted message to an SNS topic, and SNS delivers to email or other subscribers.
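A sketch of a native SNS receiver, assuming Alertmanager v0.23 or newer; the topic ARN, region, and receiver name are placeholders:

```shell
# Standalone Alertmanager config with an SNS receiver (placeholder ARN).
# A minimal route is included so amtool can validate the file.
cat > sns-receiver-check.yaml <<'EOF'
route:
  receiver: sns-critical
receivers:
  - name: sns-critical
    sns_configs:
      - topic_arn: arn:aws:sns:us-east-1:123456789012:alerts-critical
        sigv4:
          region: us-east-1
        subject: '{{ .GroupLabels.alertname }}'
EOF
# Validate if amtool is available (ships with Alertmanager releases)
if command -v amtool >/dev/null 2>&1; then
  amtool check-config sns-receiver-check.yaml
fi
```

The Alertmanager pod still needs sns:Publish on the topic, typically through IRSA on its service account.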
Amazon Managed Prometheus (AMP) with IRSA
Self-managed Prometheus on EKS works fine for single clusters. When you have multiple clusters, or when you need years of retention without managing storage, Amazon Managed Prometheus makes more sense. AMP handles storage, scaling, and high availability — you keep Prometheus as a data collection layer but remote-write to AMP instead of storing locally.
# Create an AMP workspace
aws amp create-workspace --alias my-eks-metrics
# Get the workspace ID and remote write URL
WORKSPACE_ID=$(aws amp list-workspaces --alias my-eks-metrics \
--query 'workspaces[0].workspaceId' --output text)
REMOTE_WRITE_URL="https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write"
Prometheus needs an IAM role to authenticate to AMP. Use IRSA:
# Create the IRSA role for Prometheus
eksctl create iamserviceaccount \
--cluster my-cluster \
--namespace monitoring \
--name prometheus-amp \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
--approve
# Get the role ARN from the service account annotation (eksctl generates
# a CloudFormation-managed role name, so read it off the annotation)
ROLE_ARN=$(kubectl get sa prometheus-amp -n monitoring \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}')
Configure remote write in the Helm values:
# amp-values.yaml
prometheus:
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/prometheus-amp-role"
prometheusSpec:
retention: 2h # Keep only 2h locally — AMP stores the rest
remoteWrite:
- url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-XXXXX/api/v1/remote_write"
sigv4:
region: us-east-1
roleArn: "arn:aws:iam::123456789012:role/prometheus-amp-role"
queueConfig:
capacity: 2500
maxSamplesPerSend: 1000
batchSendDeadline: 5s
With this config, Prometheus scrapes metrics locally and ships them to AMP. Local retention of 2 hours means you keep only recent data on the EBS volume while AMP retains the full history. Amazon Managed Grafana can then query AMP directly without needing the Prometheus pod at all.
Grafana Dashboards and Data Sources
The kube-prometheus-stack installs Grafana with the Prometheus data source pre-configured. The default dashboards cover most Kubernetes infrastructure metrics. Add custom dashboards as ConfigMaps — Grafana’s sidecar watches for ConfigMaps with the grafana_dashboard: "1" label and imports them automatically:
# custom-dashboard-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: my-api-dashboard
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
my-api-dashboard.json: |
{
"title": "My API",
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [{
"expr": "sum(rate(http_requests_total[5m])) by (endpoint)",
        "legendFormat": "{{endpoint}}"
}]
},
{
"title": "P99 Latency",
"type": "graph",
"targets": [{
"expr": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (endpoint, le))",
        "legendFormat": "p99 {{endpoint}}"
}]
},
{
"title": "Error Rate",
        "type": "stat",
"targets": [{
"expr": "sum(rate(http_requests_total{status_code=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))"
}]
}
]
}
For production-grade dashboards, import from Grafana’s community library using the dashboard ID. The Kubernetes cluster overview (ID: 15760), node exporter (ID: 1860), and namespace resource usage (ID: 13770) are the most useful starting points. Import them via Grafana’s UI (Dashboards → Import → Enter ID) or as ConfigMaps in CI.
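For the CI route, the downloaded JSON just needs to be wrapped in a ConfigMap carrying the grafana_dashboard label. A sketch, using a stub JSON file in place of a real download (file and dashboard names are illustrative):

```shell
# Stub standing in for a dashboard JSON downloaded from grafana.com
cat > node-exporter.json <<'EOF'
{"title": "Node Exporter Full", "panels": []}
EOF
# Wrap it in a ConfigMap the Grafana sidecar will import; the JSON body
# is indented four spaces to sit inside the YAML block scalar
cat > node-exporter-dashboard-cm.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-exporter-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  node-exporter.json: |
$(sed 's/^/    /' node-exporter.json)
EOF
```

Commit the generated manifest and apply it like any other resource; the sidecar picks it up without a Grafana restart.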
Key EKS Metrics to Alert On
These are the alerts that catch real problems before users notice:
Node-level:
- node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10 — node running out of memory (under 10% available)
- 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85 — node CPU sustained above 85%
- (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.15 — disk getting full (under 15% free)
Kubernetes workload:
- kube_deployment_status_replicas_unavailable > 0 — deployment has unavailable replicas
- kube_horizontalpodautoscaler_status_current_replicas >= kube_horizontalpodautoscaler_spec_max_replicas — HPA at max, can't scale further
- kube_pod_status_phase{phase="Pending"} > 0 for more than 10 minutes — pod stuck pending (often a node scheduling issue)
API server health:
- histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb="GET"}[5m])) by (le)) > 1 — API server slow on reads (cascades to everything)
- rate(apiserver_request_total{code=~"5.."}[5m]) above baseline — API server returning errors
The HPA max replicas alert is particularly important. When a deployment hits its HPA maximum, it can’t scale further regardless of load. Traffic keeps arriving, the pods get overwhelmed, and you get elevated latency or errors — all without any immediately obvious alarm. This alert surfaces the problem before it becomes an incident.
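As a concrete sketch, that alert as a PrometheusRule the operator will pick up (metric names come from kube-state-metrics, which the chart installs; the release label assumes the Helm release name used earlier):

```shell
# HPA-at-max alert as a PrometheusRule (sketch); both metrics carry
# matching namespace/horizontalpodautoscaler labels, so the >= filter
# compares each HPA against its own configured maximum
cat > hpa-at-max-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-at-max
  namespace: monitoring
  labels:
    release: kube-prometheus-stack # must match the Prometheus selector
spec:
  groups:
    - name: autoscaling.rules
      rules:
        - alert: HPAMaxedOut
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              >= kube_horizontalpodautoscaler_spec_max_replicas
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} is at max replicas"
EOF
```

Apply with kubectl apply -f hpa-at-max-rule.yaml.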
Self-Managed vs Amazon Managed
Self-managed kube-prometheus-stack costs only EBS storage and compute (roughly $15-30/month for a medium cluster). Amazon Managed Prometheus charges per metric sample ingested (tiered, starting around $0.90 per 10 million samples) plus roughly $0.03 per GB-month of storage. A cluster with 50,000 active series scraped every 30 seconds ingests about 4.3 billion samples a month, which lands in the low hundreds of dollars before storage and query charges.
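Whatever the exact pricing dimensions, the bill scales with ingestion volume, which is active series times scrape frequency. A back-of-envelope sketch with illustrative numbers:

```shell
# Estimate monthly sample volume: series x samples/minute x minutes/month
SERIES=50000
SAMPLES_PER_MIN=2                  # one sample per series every 30s
MINUTES_PER_MONTH=$((60 * 24 * 30))
TOTAL=$((SERIES * SAMPLES_PER_MIN * MINUTES_PER_MONTH))
echo "$TOTAL samples per month"    # prints "4320000000 samples per month"
```

Halving the scrape interval doubles this number, so relaxing the interval on low-value targets is the cheapest lever for cutting AMP costs.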
The break-even point is roughly when operational overhead and high-availability requirements outweigh storage cost. For a single team with one or two clusters, self-managed is almost always cheaper. Multi-cluster environments, or teams without Kubernetes operations expertise, benefit from AMP’s managed availability and cross-cluster aggregation.
For teams already deep in AWS observability, the AWS X-Ray distributed tracing guide covers the request tracing side of the observability stack — Prometheus handles aggregate metrics while X-Ray handles individual request traces. The ArgoCD on EKS guide covers GitOps workflows for deploying the ServiceMonitors and PrometheusRules described here as part of your application’s Helm chart.