AWS App Mesh Is Dead on September 30, 2026 — Your ECS Service Connect Migration Guide

Written by Bits Lovers

September 30, 2026. That’s when AWS App Mesh stops running.

Not “reaches end of standard support.” Not “enters maintenance mode.” Stops. After that date, App Mesh resources stop working. AWS has blocked new customers since September 24, 2024, and will continue providing security patches for existing users until the shutdown date. But after September 30, if you haven’t migrated, your service-to-service communication breaks.

Most teams know App Mesh is being shut down. Fewer have actually started the migration. That’s a problem, because this isn’t an in-place upgrade — every ECS service in your App Mesh mesh needs to be recreated from scratch.

You have a few months. Start now.

Why App Mesh Is Being Shut Down

App Mesh was AWS’s attempt at a managed service mesh. Deploy an Envoy proxy as a sidecar in every task. Configure virtual services, virtual nodes, virtual routers, and routes to define how services communicate. Get mutual TLS, circuit breaking, retries, and observability for free.

The problem was the operational overhead. Getting App Mesh configured correctly meant understanding four resource types just to express “service A can call service B.” The Envoy sidecar configuration was complex — the mesh concept was straightforward, but the IAM permissions, proxy configuration blocks, and Cloud Map integration created a real learning curve. Debugging Envoy internals when something went wrong required deep familiarity with a system most ECS teams didn’t want to be experts in.
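For a sense of that overhead, here is roughly what a single virtual node looked like in App Mesh's JSON (a trimmed sketch; the mesh, node, and namespace names are illustrative). And this is only one of the four resource types: a virtual service, virtual router, and route were still needed on top of it.

```json
{
  "meshName": "my-mesh",
  "virtualNodeName": "payment-service-node",
  "spec": {
    "listeners": [
      {"portMapping": {"port": 8080, "protocol": "http"}}
    ],
    "serviceDiscovery": {
      "awsCloudMap": {
        "namespaceName": "my-namespace",
        "serviceName": "payment-service"
      }
    }
  }
}
```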

The replacement is simpler by design. ECS Service Connect solves the same problem — service discovery and reliable service-to-service communication — without requiring you to manage the proxy configuration.

Your Two Paths

For ECS workloads, the path is Amazon ECS Service Connect. AWS introduced it at re:Invent 2022 specifically as the ECS-native replacement. The proxy is fully managed: AWS injects and configures Envoy for you, and you never touch the sidecar.

For EKS workloads, the path is Amazon VPC Lattice. It handles service-to-service networking across VPCs and accounts. The migration approach is similar conceptually but uses different tooling. AWS published a separate migration guide for EKS → VPC Lattice.

This guide focuses on ECS → Service Connect. That’s where most App Mesh users are.

How Service Connect Works

Service Connect reduces App Mesh’s four resource types to two concepts: Client services and Server services. A Server service exposes an endpoint. A Client service calls it. A service that does both is a Client/Server. That’s the entire abstraction model.

Under the hood, Service Connect uses a managed Envoy proxy — the Service Connect Agent — that AWS injects and configures automatically. You don’t create virtual nodes, virtual services, or routes. You define the service’s port and DNS name in the ECS service configuration, and ECS handles the proxy setup.

Service Connect uses an HTTP-Only Cloud Map namespace. Service addresses are registered in Cloud Map but not published to Route 53 DNS. This is a key difference from App Mesh, which could use a Private DNS namespace that published to Route 53. If you have services outside your ECS cluster resolving service names via Route 53, you’ll need to account for that during migration.
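Creating the namespace itself is simple. As a sketch, the CloudFormation resource would look like this (the namespace name is illustrative):

```json
{
  "Resources": {
    "ServiceConnectNamespace": {
      "Type": "AWS::ServiceDiscovery::HttpNamespace",
      "Properties": {
        "Name": "my-production-namespace"
      }
    }
  }
}
```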

Services within a Service Connect namespace find each other automatically using the names you configure. Your order service calls http://payment-service:8080. No virtual service ARN, no manual Route 53 record, just the name and port.

Built-in health checks, outlier detection, and retry mechanisms are managed by the Service Connect Agent. You configure the behavior in the service definition; the agent enforces it.
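As one example, per-request and idle timeouts can be declared per service entry in the Service Connect configuration. A sketch (the values are illustrative; verify the field names against the current ECS API reference):

```json
{
  "portName": "http",
  "clientAliases": [{"port": 8080, "dnsName": "payment-service"}],
  "timeout": {
    "idleTimeoutSeconds": 300,
    "perRequestTimeoutSeconds": 15
  }
}
```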

The Hard Part: You Must Recreate Every Service

An ECS service can’t be in both an App Mesh mesh and a Service Connect namespace at the same time. There’s no migration mode, no gradual shift within the same service. You can’t update an existing ECS service from App Mesh to Service Connect — the configurations are mutually exclusive and require recreation.

This means you’ll be running two parallel environments during the cutover: your existing App Mesh services and the new Service Connect versions. Plan for that in your infrastructure sizing and your cost estimates for the migration window.

The recommended approach is blue/green migration:

  1. Create a Service Connect namespace in Cloud Map
  2. For each ECS service, write a new task definition without the App Mesh proxy configuration block, with Service Connect port mappings added
  3. Create a new ECS service using the new task definition, with Service Connect enabled pointing at your namespace
  4. Validate the new service is healthy before touching traffic
  5. Gradually shift traffic using Route 53 weighted routing, ALB weighted target groups, or CloudFront continuous deployment
  6. Once you’re confident in the Service Connect version, cut traffic fully and decommission the App Mesh services
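Step 3 maps to a single aws ecs create-service call. A trimmed --cli-input-json sketch, assuming Fargate (the cluster, task definition, subnet, and security group values are placeholders):

```json
{
  "cluster": "my-cluster",
  "serviceName": "payment-service-sc",
  "taskDefinition": "payment-service-sc:1",
  "desiredCount": 2,
  "launchType": "FARGATE",
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-PLACEHOLDER"],
      "securityGroups": ["sg-PLACEHOLDER"]
    }
  },
  "serviceConnectConfiguration": {
    "enabled": true,
    "namespace": "my-production-namespace",
    "services": [
      {
        "portName": "http",
        "clientAliases": [{"port": 8080, "dnsName": "payment-service"}]
      }
    ]
  }
}
```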

ALB weighted target groups are the most practical approach for most teams. Create two target groups — one pointing at your App Mesh ECS service, one at the Service Connect version — and start at a 90/10 split. Watch your error rates and latency for 24 hours, move to 50/50, watch again, then 100/0. If anything goes wrong, flip back immediately.
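In ALB terms, the split is a weighted forward action on the listener. A sketch of the input to aws elbv2 modify-listener (the listener and target group ARNs are placeholders):

```json
{
  "ListenerArn": "arn:aws:elasticloadbalancing:REGION:ACCOUNT:listener/app/my-alb/PLACEHOLDER",
  "DefaultActions": [
    {
      "Type": "forward",
      "ForwardConfig": {
        "TargetGroups": [
          {"TargetGroupArn": "arn:...:targetgroup/appmesh-svc/PLACEHOLDER", "Weight": 90},
          {"TargetGroupArn": "arn:...:targetgroup/service-connect-svc/PLACEHOLDER", "Weight": 10}
        ]
      }
    }
  ]
}
```

Flipping back is just swapping the weights in the same call.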

Before planning the migration order, confirm you understand your service graph. The microservices vs monolithic guide covers the architecture patterns and service-dependency thinking that matter when sequencing the migration.

What the Task Definition Change Looks Like

Your current App Mesh task definition includes a proxy configuration block:

{
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      {"name": "IgnoredUID", "value": "1337"},
      {"name": "ProxyIngressPort", "value": "15000"},
      {"name": "ProxyEgressPort", "value": "15001"},
      {"name": "AppPorts", "value": "8080"},
      {"name": "EgressIgnoredIPs", "value": "169.254.170.2,169.254.169.254"}
    ]
  }
}

You also have an Envoy sidecar container definition alongside your application container. Remove all of that.

The Service Connect configuration goes on the ECS service, not the task definition:

{
  "serviceConnectConfiguration": {
    "enabled": true,
    "namespace": "my-production-namespace",
    "services": [
      {
        "portName": "http",
        "clientAliases": [
          {
            "port": 8080,
            "dnsName": "payment-service"
          }
        ]
      }
    ]
  }
}

Other services in the namespace call this service at http://payment-service:8080. No virtual service. No virtual node. No route. Just the name and the port.

The port mapping in your container definition needs a name field matching the portName in Service Connect:

{
  "portMappings": [
    {
      "containerPort": 8080,
      "protocol": "tcp",
      "name": "http",
      "appProtocol": "http"
    }
  ]
}

The Real Migration Timeline

A documented production migration of a real App Mesh ECS environment gives useful benchmarks: plan on 4 to 6 hours for the first service you migrate. About 2 hours of that is active migration work. The remaining time is debugging.

The two most common debugging targets:

Security groups. App Mesh routes traffic through the Envoy proxy, so inbound connections to your containers came from the proxy’s IP. Service Connect removes the proxy from the inbound path — traffic arrives from your load balancer or from other tasks directly. Security groups configured to allow traffic from the proxy need updating for direct access. This is the issue that eats most of the troubleshooting time.
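A sketch of the fix: allow the ALB's security group direct access to the application port on the task security group (both group IDs are placeholders), as input to aws ec2 authorize-security-group-ingress:

```json
{
  "GroupId": "sg-TASK-PLACEHOLDER",
  "IpPermissions": [
    {
      "IpProtocol": "tcp",
      "FromPort": 8080,
      "ToPort": 8080,
      "UserIdGroupPairs": [{"GroupId": "sg-ALB-PLACEHOLDER"}]
    }
  ]
}
```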

Health checks. If your ALB health checks were pointing at a proxy endpoint rather than your application’s health route, they’ll fail after migration because there’s no proxy to answer them. Update health check paths to hit your application’s actual health endpoint directly.
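A sketch with aws elbv2 modify-target-group in --cli-input-json form (the ARN is a placeholder and /healthz is an assumption; substitute your application's real health route):

```json
{
  "TargetGroupArn": "arn:...:targetgroup/service-connect-svc/PLACEHOLDER",
  "HealthCheckPath": "/healthz",
  "HealthCheckPort": "8080",
  "Matcher": {"HttpCode": "200"}
}
```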

After the first service, you’ll have debugged these patterns. Subsequent migrations take 1 to 2 hours each.

CloudWatch Metrics Included

One thing that genuinely improves with Service Connect: observability comes built in, with no extra configuration. The Service Connect Agent automatically generates CloudWatch metrics:

  • ActiveConnectionCount — current open connections per service
  • NewConnectionCount — new connections per minute
  • ProcessedBytes — traffic through the proxy
  • RequestCount — requests per minute
  • HTTPCode_Target_2XX_Count — successful responses
  • HTTPCode_Target_4XX_Count and 5XX_Count — client and server errors

In App Mesh, getting equivalent metrics required configuring Envoy’s stats endpoint and either scraping it or setting up custom CloudWatch metric filters. With Service Connect, they’re available in CloudWatch immediately.
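These metrics are natural alarm targets during the weighted cutover. A sketch of a 5XX alarm as input to aws cloudwatch put-metric-alarm (the namespace and dimension names here are assumptions, not confirmed; find the metric in the CloudWatch console for your cluster and copy its exact namespace and dimensions before using this):

```json
{
  "AlarmName": "payment-service-sc-5xx",
  "Namespace": "AWS/ECS",
  "MetricName": "HTTPCode_Target_5XX_Count",
  "Dimensions": [
    {"Name": "ClusterName", "Value": "my-cluster"},
    {"Name": "DiscoveryName", "Value": "payment-service"}
  ],
  "Statistic": "Sum",
  "Period": 60,
  "EvaluationPeriods": 5,
  "Threshold": 10,
  "ComparisonOperator": "GreaterThanThreshold"
}
```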

Pair these with the observability patterns for distributed systems in your stack. For teams running OpenTelemetry alongside AWS native tooling, the OpenTelemetry and CloudWatch observability guide covers how to correlate these metrics with traces.

Fargate Simplifies the Networking

If your ECS workloads run on Fargate rather than EC2 launch type, the security group situation is somewhat simpler to reason about. Fargate tasks each get their own elastic network interface and security group. Traffic from the ALB hits the task’s ENI directly.

The Fargate autoscaling with GitLab CI post covers the Fargate networking model in detail. The key point during App Mesh migration: with Fargate, you don’t have the EC2 instance security group layer to consider — only the task security group. That’s one fewer place to check when debugging connection failures during the cutover.

Migration Order

Migrate services from the outside in. Start at the edges — services with no downstream dependencies within your mesh. These are the safest to migrate first because their failure doesn’t cascade.

Work inward toward your most critical shared services last. By the time you reach your core services, you’ll have debugged your security group and health check patterns on the less critical ones, and your team will know exactly what to do.

If you have services that sit behind an API Gateway layer, the AWS API Gateway, WAF, and Nginx zero-trust setup is worth reviewing alongside this migration — the mTLS and authentication patterns between layers may need adjustment when moving from App Mesh’s mutual TLS to Service Connect’s configuration.

For VPC and subnet design considerations during the migration — particularly if you’re running services across multiple AZs or using private subnets — the AWS VPC design patterns guide covers the networking foundation that underpins both the old and new setup.

Don’t Wait Until August

Six months feels like a long time. It isn’t, once you account for planning, staging environment migration, testing, production migration per service, and inevitable debugging delays.

The teams that start now will migrate their staging environment in May and June, run their first production service in July, and be fully migrated in August with a month of buffer. The teams that start in August will be doing emergency migrations in September.

App Mesh has worked reliably. Service Connect is genuinely simpler to operate. This migration is an improvement, not just a deadline-forced disruption. Take the time to do it properly.

Bits Lovers

Professional writer and blogger. Focus on Cloud Computing.