Prometheus and Grafana on EKS: Kubernetes Monitoring from Scratch
The kube-prometheus-stack Helm chart installs Prometheus, Alertmanager, Grafana, and a collection of default Kubernetes dashboards in about five minutes. That’s the fastest path to useful EKS monitoring. The harder part is what comes after: understanding what the metrics mean, configuring scraping for your own services, setting up alerts that fire on real problems rather than noise, and deciding whether Amazon Managed Prometheus and Managed Grafana are worth the cost over self-managed.
This guide covers installing kube-prometheus-stack, writing ServiceMonitors for your applications, building PodMonitor and PrometheusRule resources, configuring Alertmanager for SNS and Slack, using IRSA with Amazon Managed Prometheus, and the key metrics every EKS cluster should alert on.
Installing kube-prometheus-stack
The kube-prometheus-stack Helm chart from the prometheus-community repository is the standard starting point. It bundles everything — don’t install Prometheus and Grafana separately.
# Add the Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install with persistent storage enabled
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=gp3 \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
--set grafana.persistence.enabled=true \
--set grafana.persistence.storageClassName=gp3 \
--set grafana.persistence.size=10Gi \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=gp3 \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
--version 58.x.x
By default kube-prometheus-stack uses emptyDir storage — every pod restart loses all metrics history. For any real use, specify a storage class. On EKS, gp3 requires the EBS CSI driver add-on to be installed on the cluster. If it’s not installed yet:
# Install EBS CSI driver add-on
aws eks create-addon \
--cluster-name my-cluster \
--addon-name aws-ebs-csi-driver \
--service-account-role-arn arn:aws:iam::123456789012:role/AmazonEKS_EBS_CSI_DriverRole
The CSI driver needs an IAM role with the AmazonEBSCSIDriverPolicy managed policy, created with IRSA (IAM Roles for Service Accounts).
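EKS clusters ship with only the legacy gp2 StorageClass by default, so the gp3 class referenced in the Helm command has to exist before the PVCs can bind. A minimal sketch (the class name must match what you pass to Helm; the parameters are a reasonable starting point, not the only valid choice):

```shell
# Minimal gp3 StorageClass backed by the EBS CSI driver.
# WaitForFirstConsumer delays volume creation until a pod is scheduled,
# so the EBS volume lands in the same AZ as the pod.
cat > gp3-storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
EOF
```

Apply it with kubectl apply -f gp3-storageclass.yaml before installing the chart.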
After installing, check all pods are running:
kubectl get pods -n monitoring
# Access Grafana locally
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Default credentials: admin / prom-operator
# Retrieve the generated admin password (then change it):
# kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 -d
The default install includes dashboards for node resource usage, pod CPU/memory, namespace quotas, persistent volumes, API server latency, and kubelet metrics. These cover the infrastructure layer. You need ServiceMonitors for your application metrics.
ServiceMonitors: Scraping Your Applications
A ServiceMonitor tells Prometheus which Services to scrape and on what path/port. The kube-prometheus-stack installs a Prometheus Operator that watches for ServiceMonitor resources and automatically updates Prometheus’s scrape configuration.
# my-api-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: my-api
namespace: my-api # Same namespace as the Service
labels:
release: kube-prometheus-stack # Must match the Prometheus selector
spec:
selector:
matchLabels:
app: my-api # Matches the Service's labels
endpoints:
    - port: metrics # The Service's named metrics port (see the Service below)
      path: /metrics # Default Prometheus metrics endpoint
      interval: 30s # Scrape every 30 seconds
      scrapeTimeout: 10s
namespaceSelector:
matchNames:
- my-api
The critical part that catches everyone: the ServiceMonitor's release label must match the Prometheus Operator's serviceMonitorSelector. With this chart, the selector expects release: <Helm release name>, which is kube-prometheus-stack if you used the install command above. If Prometheus isn't picking up your ServiceMonitor, this label mismatch is usually the cause.
# Check what selector the Prometheus Operator is using
kubectl get prometheus -n monitoring kube-prometheus-stack-prometheus \
-o jsonpath='{.spec.serviceMonitorSelector}' | jq
# Verify your ServiceMonitor is being discovered
kubectl get servicemonitor -n my-api
kubectl describe servicemonitor my-api -n my-api
Your application must expose a /metrics endpoint in Prometheus text format. For Python services, the prometheus_client library handles this:
# Python app with Prometheus metrics
from prometheus_client import Counter, Histogram, Gauge, start_http_server
REQUEST_COUNT = Counter(
'http_requests_total',
'Total HTTP requests',
['method', 'endpoint', 'status_code']
)
REQUEST_LATENCY = Histogram(
'http_request_duration_seconds',
'HTTP request latency',
['method', 'endpoint'],
buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)
ACTIVE_CONNECTIONS = Gauge(
'active_connections',
'Currently active connections'
)
# In your request handler:
@REQUEST_LATENCY.labels(method='GET', endpoint='/orders').time()
def get_orders():
REQUEST_COUNT.labels(method='GET', endpoint='/orders', status_code='200').inc()
return orders
# Start metrics server on port 8081 (separate from app port)
start_http_server(8081)
Expose the metrics port in the Service so the ServiceMonitor can reach it:
apiVersion: v1
kind: Service
metadata:
name: my-api
labels:
app: my-api
spec:
ports:
- name: http
port: 8080
targetPort: 8080
- name: metrics
port: 8081
targetPort: 8081
PrometheusRules: Alerting
PrometheusRule resources define recording rules and alerting rules. Recording rules pre-compute expensive queries so dashboards load fast. Alerting rules fire when conditions are met.
# my-api-prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: my-api-alerts
namespace: my-api
labels:
release: kube-prometheus-stack
spec:
groups:
- name: my-api.rules
interval: 30s
rules:
# Recording rule: pre-compute request rate
- record: job:http_requests_total:rate5m
expr: rate(http_requests_total[5m])
# Alert: high error rate
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (job)
/
sum(rate(http_requests_total[5m])) by (job)
> 0.05
for: 5m
labels:
severity: critical
annotations:
            summary: "High error rate on {{ $labels.job }}"
            description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}"
runbook: "https://runbook.example.com/high-error-rate"
# Alert: high latency (p99 > 2 seconds)
- alert: HighP99Latency
expr: |
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le)
) > 2.0
for: 10m
labels:
severity: warning
annotations:
            summary: "P99 latency high for {{ $labels.job }}"
            description: "P99 latency is {{ $value }}s for {{ $labels.job }}"
# Alert: pod restarts
- alert: PodCrashLooping
expr: |
rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 0
for: 5m
labels:
severity: warning
annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting"
The for: 5m clause means the alert must be true for 5 consecutive minutes before firing. Without it, a single blip fires an alert. The for window prevents false alarms from transient spikes.
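Before applying the CRD, you can sanity-check the PromQL with promtool, which ships in Prometheus release tarballs. promtool expects a plain rules file rather than the Kubernetes wrapper, so extract the spec.groups section into standalone form first. A sketch, with one rule inlined for brevity:

```shell
# Extract spec.groups from the PrometheusRule into a plain rules file
# that promtool understands (shown inline here for brevity)
cat > my-api-rules-check.yaml <<'EOF'
groups:
  - name: my-api.rules
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (job)
            / sum(rate(http_requests_total[5m])) by (job) > 0.05
        for: 5m
        labels:
          severity: critical
EOF
# Validate syntax and semantics if promtool is on the PATH
if command -v promtool >/dev/null 2>&1; then
  promtool check rules my-api-rules-check.yaml
fi
```

This catches typos in metric names' label matchers and malformed expressions before the operator silently rejects the rule.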
Alertmanager: Routing to SNS and Slack
Alertmanager receives alerts from Prometheus and routes them based on labels. Configure it in the Helm values:
# alertmanager-values.yaml
alertmanager:
config:
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'job', 'severity']
group_wait: 30s
group_interval: 5m
repeat_interval: 12h
receiver: 'default'
routes:
- match:
severity: critical
receiver: pagerduty
continue: true
- match:
severity: critical
receiver: slack-critical
- match:
severity: warning
receiver: slack-warning
receivers:
- name: default
slack_configs:
- api_url: 'https://hooks.slack.com/services/T00000/B00000/XXXXXXXX'
channel: '#alerts'
text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
- name: slack-critical
slack_configs:
          - api_url: 'https://hooks.slack.com/services/T00000/B00000/YYYYYYYY' # webhook for the critical channel
channel: '#alerts-critical'
title: '[CRITICAL] {{ .GroupLabels.alertname }}'
text: |
{{ range .Alerts }}
*Alert:* {{ .Annotations.summary }}
*Description:* {{ .Annotations.description }}
*Runbook:* {{ .Annotations.runbook }}
{{ end }}
- name: pagerduty
pagerduty_configs:
          - routing_key: '<your-pagerduty-integration-key>' # from the PagerDuty service integration
description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
# Apply the values
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
-n monitoring \
-f alertmanager-values.yaml
# Test Alertmanager config
kubectl exec -n monitoring \
$(kubectl get pods -n monitoring -l app.kubernetes.io/name=alertmanager -o name | head -1) \
-- amtool check-config /etc/alertmanager/config_out/alertmanager.env.yaml
For SNS instead of Slack: Alertmanager v0.23 and later has a native sns_configs receiver, so you can publish to an SNS topic directly (the Alertmanager pod needs sns:Publish permissions, typically granted via IRSA). On older versions, use a webhook receiver pointing to a Lambda that publishes to SNS: Alertmanager sends a JSON POST, the Lambda publishes the formatted message to an SNS topic, and SNS delivers to email or other subscribers.
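A sketch of a native SNS receiver, assuming Alertmanager v0.23 or newer; the topic ARN, region, and receiver name are placeholders:

```shell
# Standalone Alertmanager config with an SNS receiver (placeholder ARN).
# A minimal route is included so amtool can validate the file.
cat > sns-receiver-check.yaml <<'EOF'
route:
  receiver: sns-critical
receivers:
  - name: sns-critical
    sns_configs:
      - topic_arn: arn:aws:sns:us-east-1:123456789012:alerts-critical
        sigv4:
          region: us-east-1
        subject: '{{ .GroupLabels.alertname }}'
EOF
# Validate if amtool is available (ships with Alertmanager releases)
if command -v amtool >/dev/null 2>&1; then
  amtool check-config sns-receiver-check.yaml
fi
```

The Alertmanager pod still needs sns:Publish on the topic, typically through IRSA on its service account.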
Amazon Managed Prometheus (AMP) with IRSA
Self-managed Prometheus on EKS works fine for single clusters. When you have multiple clusters, or when you need years of retention without managing storage, Amazon Managed Prometheus makes more sense. AMP handles storage, scaling, and high availability — you keep Prometheus as a data collection layer but remote-write to AMP instead of storing locally.
# Create an AMP workspace
aws amp create-workspace --alias my-eks-metrics
# Get the workspace ID and remote write URL
WORKSPACE_ID=$(aws amp list-workspaces --alias my-eks-metrics \
--query 'workspaces[0].workspaceId' --output text)
REMOTE_WRITE_URL="https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write"
Prometheus needs an IAM role to authenticate to AMP. Use IRSA:
# Create the IRSA role for Prometheus
eksctl create iamserviceaccount \
--cluster my-cluster \
--namespace monitoring \
--name prometheus-amp \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
--approve
# Get the role ARN from the service account annotation (eksctl generates
# a CloudFormation-managed role name, so read it off the annotation)
ROLE_ARN=$(kubectl get sa prometheus-amp -n monitoring \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}')
Configure remote write in the Helm values:
# amp-values.yaml
prometheus:
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/prometheus-amp-role"
prometheusSpec:
retention: 2h # Keep only 2h locally — AMP stores the rest
remoteWrite:
- url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-XXXXX/api/v1/remote_write"
sigv4:
region: us-east-1
roleArn: "arn:aws:iam::123456789012:role/prometheus-amp-role"
queueConfig:
capacity: 2500
maxSamplesPerSend: 1000
batchSendDeadline: 5s
With this config, Prometheus scrapes metrics locally and ships them to AMP. Local retention of 2 hours means you keep only recent data on the EBS volume while AMP retains the full history. Amazon Managed Grafana can then query AMP directly without needing the Prometheus pod at all.
Grafana Dashboards and Data Sources
The kube-prometheus-stack installs Grafana with the Prometheus data source pre-configured. The default dashboards cover most Kubernetes infrastructure metrics. Add custom dashboards as ConfigMaps — Grafana’s sidecar watches for ConfigMaps with the grafana_dashboard: "1" label and imports them automatically:
# custom-dashboard-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: my-api-dashboard
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
my-api-dashboard.json: |
{
"title": "My API",
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [{
"expr": "sum(rate(http_requests_total[5m])) by (endpoint)",
        "legendFormat": "{{endpoint}}"
}]
},
{
"title": "P99 Latency",
"type": "graph",
"targets": [{
"expr": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (endpoint, le))",
        "legendFormat": "p99 {{endpoint}}"
}]
},
{
"title": "Error Rate",
        "type": "stat",
"targets": [{
"expr": "sum(rate(http_requests_total{status_code=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))"
}]
}
]
}
For production-grade dashboards, import from Grafana’s community library using the dashboard ID. The Kubernetes cluster overview (ID: 15760), node exporter (ID: 1860), and namespace resource usage (ID: 13770) are the most useful starting points. Import them via Grafana’s UI (Dashboards → Import → Enter ID) or as ConfigMaps in CI.
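For the CI route, the downloaded JSON just needs to be wrapped in a ConfigMap carrying the grafana_dashboard label. A sketch, using a stub JSON file in place of a real download (file and dashboard names are illustrative):

```shell
# Stub standing in for a dashboard JSON downloaded from grafana.com
cat > node-exporter.json <<'EOF'
{"title": "Node Exporter Full", "panels": []}
EOF
# Wrap it in a ConfigMap the Grafana sidecar will import; the JSON body
# is indented four spaces to sit inside the YAML block scalar
cat > node-exporter-dashboard-cm.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-exporter-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  node-exporter.json: |
$(sed 's/^/    /' node-exporter.json)
EOF
```

Commit the generated manifest and apply it like any other resource; the sidecar picks it up without a Grafana restart.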
Key EKS Metrics to Alert On
These are the alerts that catch real problems before users notice:
Node-level:
- node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10 — node running out of memory (under 10% available)
- 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85 — node CPU sustained above 85%
- (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.15 — disk getting full (under 15% free)
Kubernetes workload:
- kube_deployment_status_replicas_unavailable > 0 — deployment has unavailable replicas
- kube_horizontalpodautoscaler_status_current_replicas >= kube_horizontalpodautoscaler_spec_max_replicas — HPA at max, can't scale further
- kube_pod_status_phase{phase="Pending"} > 0 for more than 10 minutes — pod stuck pending (often a node scheduling issue)
API server health:
- histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb="GET"}[5m])) by (le)) > 1 — API server slow on reads (cascades to everything)
- rate(apiserver_request_total{code=~"5.."}[5m]) above baseline — API server returning errors
The HPA max replicas alert is particularly important. When a deployment hits its HPA maximum, it can’t scale further regardless of load. Traffic keeps arriving, the pods get overwhelmed, and you get elevated latency or errors — all without any immediately obvious alarm. This alert surfaces the problem before it becomes an incident.
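As a concrete sketch, that alert as a PrometheusRule the operator will pick up (metric names come from kube-state-metrics, which the chart installs; the release label assumes the Helm release name used earlier):

```shell
# HPA-at-max alert as a PrometheusRule (sketch); both metrics carry
# matching namespace/horizontalpodautoscaler labels, so the >= filter
# compares each HPA against its own configured maximum
cat > hpa-at-max-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-at-max
  namespace: monitoring
  labels:
    release: kube-prometheus-stack # must match the Prometheus selector
spec:
  groups:
    - name: autoscaling.rules
      rules:
        - alert: HPAMaxedOut
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              >= kube_horizontalpodautoscaler_spec_max_replicas
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} is at max replicas"
EOF
```

Apply with kubectl apply -f hpa-at-max-rule.yaml.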
Self-Managed vs Amazon Managed
Self-managed kube-prometheus-stack costs only EBS storage and compute (roughly $15-30/month for a medium cluster). Amazon Managed Prometheus charges per metric sample ingested (tiered, starting around $0.90 per 10 million samples) plus roughly $0.03 per GB-month of storage. A cluster with 50,000 active series scraped every 30 seconds ingests about 4.3 billion samples a month, which lands in the low hundreds of dollars before storage and query charges.
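Whatever the exact pricing dimensions, the bill scales with ingestion volume, which is active series times scrape frequency. A back-of-envelope sketch with illustrative numbers:

```shell
# Estimate monthly sample volume: series x samples/minute x minutes/month
SERIES=50000
SAMPLES_PER_MIN=2                  # one sample per series every 30s
MINUTES_PER_MONTH=$((60 * 24 * 30))
TOTAL=$((SERIES * SAMPLES_PER_MIN * MINUTES_PER_MONTH))
echo "$TOTAL samples per month"    # prints "4320000000 samples per month"
```

Halving the scrape interval doubles this number, so relaxing the interval on low-value targets is the cheapest lever for cutting AMP costs.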
The break-even point is roughly when operational overhead and high-availability requirements outweigh storage cost. For a single team with one or two clusters, self-managed is almost always cheaper. Multi-cluster environments, or teams without Kubernetes operations expertise, benefit from AMP’s managed availability and cross-cluster aggregation.
For teams already deep in AWS observability, the AWS X-Ray distributed tracing guide covers the request tracing side of the observability stack — Prometheus handles aggregate metrics while X-Ray handles individual request traces. The ArgoCD on EKS guide covers GitOps workflows for deploying the ServiceMonitors and PrometheusRules described here as part of your application’s Helm chart.