The moment two engineers run terraform apply at the same time without state locking, you have a race condition that can corrupt your entire infrastructure state. Both processes read the...
Review apps changed how my team does code review. Instead of reading diffs, reviewers click a link and see the actual change running. The designer can verify spacing on the...
I worked for three years at a company that spent $4 million on “DevOps transformation.” New tools, new cloud infrastructure, training budgets, the works. The velocity of the platform stayed flat....
The first time someone accidentally created a p4d.24xlarge instance in production, we started taking policy-as-code seriously. No one meant to. The Terraform code was correct, the pipeline ran fine, the...
I shipped Terraform code without tests for years. Then a terraform apply deleted a production database because a conditional flipped. The resource had a lifecycle { prevent_destroy = true }...
The VPC decisions you make on day one will follow you for years. I’ve lived through the consequences—redesigning a network that was built without proper CIDR planning, watching a simple...
Our monorepo pipeline used to take 15 minutes. Every commit ran tests for the API, the background worker, and the frontend — in sequence, regardless of what changed. A one-line...
Two years ago, SBOMs were a checkbox on a compliance spreadsheet. In 2026, they’re a hard requirement. The US Executive Order 14028 mandated that any software sold to federal agencies...
I watched a backend engineer spend two hours yesterday trying to figure out which CloudFormation template to use for their new service. They had three options in a Confluence page....
I’ve been using Terraform MCP for three months now, and it’s the most significant shift in how I interact with infrastructure since Terraform itself. That’s not hyperbole. I can ask...
The first time I tried running integration tests in GitLab CI, I hardcoded a database connection to localhost and wondered why nothing worked. The job would spin up, find no...
I’ve made the mistake of using count where I should have used for_each. Most people have. You end up with a Terraform state that looks reasonable until you need to...
I used to instrument AWS services the hard way. AWS X-Ray SDK here, CloudWatch Logs there, custom metrics scattered across a dozen boto3 calls. Each service had its own observability...
Most tutorials show you how to run terraform apply on a git push and call it a day. I’ve inherited infrastructure built that way. It’s chaos. Drift accumulates silently. Rollbacks...
When HashiCorp changed Terraform’s license in August 2023, it forced a reckoning across the infrastructure-as-code community. The shift to the Business Source License (BSL) sent shockwaves through organizations that had...
Java teams already have enough framework churn. Most of them are not looking for a new agent platform. They want to keep Spring Boot, add model access, expose a few...
On March 10, 2026, AWS added stateful MCP server features to Amazon Bedrock AgentCore Runtime. If you only read the headline, it sounds like a protocol update. It is more...
On April 9, 2026, AWS launched AWS Agent Registry in preview inside Amazon Bedrock AgentCore. That launch matters because most teams are no longer struggling with a single agent demo....
AWS announced Amazon EKS Auto Mode on December 1, 2024. The deeper “under the hood” explanation followed on March 31, 2025. On February 10, 2026, AWS added CloudWatch Vended Logs...
On March 1, 2024, AWS added hybrid search to Knowledge Bases for Amazon Bedrock for Amazon OpenSearch Serverless. On March 27, 2025, AWS added Amazon OpenSearch Managed Cluster as a...
AWS made Amazon Bedrock AgentCore Evaluations generally available on March 31, 2026. That launch matters because it answers the first serious production question every agent team eventually hits: how do...
Amazon Bedrock AgentCore got two features in March 2026 that matter far more than the marketing language around them. On March 17, 2026, AWS launched shell command execution in AgentCore...
Amazon ECS Service Connect and Amazon VPC Lattice both improve service-to-service connectivity on AWS, but they do not solve the same boundary. Amazon ECS Service Connect launched on November 27,...
Terraform workspaces seemed like the solution to multi-environment management — one configuration, many states. Then teams discovered the problems: workspace sprawl, no isolation between environments at the module level, and...
Terraform and Pulumi solve the same problem — declaring cloud infrastructure and tracking its state — but with fundamentally different approaches to how you express that declaration. Terraform uses HCL,...
The kube-prometheus-stack Helm chart installs Prometheus, Alertmanager, Grafana, and a collection of default Kubernetes dashboards in about five minutes. That’s the fastest path to useful EKS monitoring. The harder part...
LocalStack built something genuinely useful. A local emulator for AWS services that let you test Lambdas, S3 buckets, SQS queues, and DynamoDB tables without touching a real AWS account. For...
Kubernetes RBAC controls who can do what, but it doesn’t control whether the things they do are safe. A developer with namespace-level deploy access can create a Pod without resource...
Kubernetes v1.36 shipped April 22, 2026, with 64 enhancements across the release: 17 graduating to stable, 18 moving to beta, and 24 entering alpha. The headline is sidecar containers reaching...
ingress-nginx is End of Life. CVE-2026-4342 — a configuration injection vulnerability enabling potential code execution — was disclosed in April 2026 against all versions below v1.13.9, v1.14.5, and v1.15.1. The...
AWS launched Kiro on July 14, 2025. It’s an agentic IDE built on Code OSS (the open-source foundation of VS Code) and it makes a specific bet: the biggest problem...
Helm is the package manager for Kubernetes. Raw YAML manifests work fine for a single deployment in one environment. Once you need the same application in staging, production, and three...
At some point in every GitLab CI/CD setup, the single shared runner stops being enough. Backend tests queue behind someone’s slow frontend build. GPU jobs wait on the same runner...
I spent three years pushing changes to Kubernetes with kubectl apply inside CI/CD pipelines. Every deployment required cluster credentials in GitLab. Every pipeline failure left the cluster in an unknown...
Both platforms started at essentially the same place and have converged to a point where the pipeline YAML looks almost identical. The real differences are in pricing model, ecosystem integration,...
The manual Terraform workflow — terraform plan on your laptop, peer-review the output in Slack, terraform apply if it looks right — breaks down around the time your team hits...
In 2021, GitHub released OIDC support for Actions — and quietly made static AWS access keys in CI/CD pipelines obsolete. The old approach required storing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as GitHub...
HashiCorp switched Terraform to the Business Source License in August 2023. Within weeks, the OpenTofu fork was announced under the Linux Foundation and had a...
Running out of IP addresses in production at 2 AM is a specific kind of bad. It happens in EKS clusters when the VPC CNI plugin has allocated every available...
Karpenter hit v1.0 in late 2024, and for most EKS clusters it’s now the better choice over Cluster Autoscaler. The performance difference alone is enough to justify the switch: Cluster...
AWS re:Invent 2023 had a stat that keeps coming up in job postings: EKS adoption grew 88% year-over-year among enterprise AWS customers. That number isn’t surprising if you’ve been watching...
A batch job that runs for eight minutes, three times a day. A CI pipeline that spins up test pods on every commit. An API that handles zero traffic on...
AWS EKS standard support ends 14 months after a Kubernetes version’s upstream release. Extended support adds another 12 months but costs $0.60 per cluster per hour on top of normal...
EC2 Auto Scaling has been around since 2009, but teams still misconfigure it in ways that cost them money or reliability. The most common mistake: using simple scaling policies instead...
A Node.js application shipped as a Docker image with all development dependencies included: node_modules with Jest, ESLint, TypeScript compiler, and hundreds of transitive dev dependencies baked in. The image weighs...
Running Kubernetes on EKS without Container Insights is like flying without instruments. You can see your pods are running, but when a node is memory-pressured and pods start getting OOMKilled,...
On February 24, 2026, AWS announced server-side tool execution for Amazon Bedrock through Amazon Bedrock AgentCore Gateway integration with the Responses API. That launch changes a stubborn problem in agent...
X-Ray answers the question that CloudWatch logs and metrics can’t: why is this specific request slow? Logs tell you something happened. Metrics tell you how often. X-Ray tells you exactly...
At five VPCs, full-mesh VPC peering still feels manageable. At ten it’s annoying. At twenty, you have 190 peering connections to maintain, each with its own route table entries,...
Every bastion host in your architecture is a maintenance burden and an attack surface. You need to keep the AMI patched, manage SSH keys across the team, control security group...
The biggest bill shock teams get on AWS isn’t from accidental services left running or an exposed S3 bucket. It’s from paying On-Demand rates for workloads that run 24/7. A...
Most engineers use Route 53 for one thing: create an A record pointing to a load balancer and move on. But Route 53 has seven routing policies, each solving a...
The problem RDS Proxy solves is simple to describe and expensive to ignore: Lambda functions don’t maintain persistent connections. Every cold start opens a new database connection. At moderate scale...
The default path for a private EC2 instance to reach an AWS service like S3, Secrets Manager, or SSM is through a NAT gateway — $0.045/hour plus $0.045 per GB...
A Lambda cold start is a tax you pay every time AWS needs to create a new execution environment for your function. For a Python function with minimal dependencies, that...
I’ve watched too many teams misunderstand FinOps. They think it means shutting down instances at night or buying bigger discounts. That’s not FinOps. That’s panic cost-cutting. Real FinOps is about...
Before EventBridge Pipes launched in December 2022, connecting an SQS queue to a Step Functions state machine meant writing a Lambda function that polled the queue, parsed the payload, and...
On March 31, 2026, AWS made the DevOps Agent generally available. The announcement tweet from @awscloud got 3.3 million views in a week. The reaction from the DevOps community ranged...
Most AWS accounts run EC2 instances that are the wrong size. Not dramatically wrong — nobody runs an m5.24xlarge for a blog — but quietly, consistently over-provisioned. An instance that...
AWS CodePipeline and CodeBuild give you a CI/CD stack that stays entirely within AWS — no Jenkins to maintain, no GitHub Actions runner infrastructure, no CircleCI seat costs. CodeBuild runs...
A tweet that reached 17,105 people last January listed the seven AWS services you need to know to get hired. CloudWatch was on it alongside EC2, S3, IAM, Lambda, RDS,...
GitOps is the practice of using a Git repository as the single source of truth for what should run in your Kubernetes cluster. ArgoCD implements this by watching a Git...
Amazon EKS Capabilities is one of the more consequential EKS launches for platform teams because it moves beyond “managed Kubernetes control plane” and starts managing common platform controllers around the...
AWS App Mesh is end-of-life as of September 30, 2026. If you run ECS services that communicate via App Mesh, migration is required. The AWS-recommended replacement for ECS workloads is...
The infrastructure-as-code tooling market looks different in 2026 than it did three years ago. HashiCorp’s 2023 license change from MPL to BSL fractured the Terraform community, triggered the OpenTofu fork...
Every few months someone on my team asks whether we should migrate from GitLab to GitHub, or vice versa. In 2026 that question is harder to answer than it was...
Every infrastructure team hits this wall eventually. The AWS account already has hundreds of resources — VPCs, security groups, RDS clusters, S3 buckets — that predate any Terraform adoption. Someone...
GitLab Runner is one of those tools that sits at the heart of GitLab CI/CD. It picks up the jobs you define in your pipeline and runs them, reporting results...
I’ve been deploying to AWS from GitLab CI for years. The patterns have shifted. In 2021 the answer was almost always Elastic Beanstalk — it was the lowest-friction path from...
Most teams do not have a testing problem. They have a feedback-latency problem. Code gets written, pushed, and the first signal that something is wrong arrives from a production alert...
Startups face a choice: build a monolith and tear it apart later, or start with microservices and add DevOps practices from day one. Most teams that pick the second path...
Picking an architecture style matters. A lot. You either go with a monolith, which is basically one big codebase where everything lives together, or you split things into microservices, where...
Can traditional systems handle the expectations we have now for instant responses and real-time engagement? Imagine a system that reacts immediately to what users do, and can handle thousands or...
Think of it this way: what if the servers running your app could scale up automatically when traffic spikes, and scale down when it’s quiet, without you touching anything? That’s...
As cloud usage grows, data spreads across servers everywhere. This creates a real problem: traditional security tools cannot keep up with cybercriminals who move fast and adapt faster. AI and...
In software development, security and efficiency matter. DevOps has changed how teams build, test, and deploy software, enabling faster delivery and collaboration between development and operations. However, with evolving security...
Welcome to DevSecOps and Artificial Intelligence (AI) in software development. This post explores how AI fits into the DevSecOps landscape and how teams handle modern software development challenges.
Cloud services run fast, and when they don’t, customers leave. That’s the reality of running anything online today. Downtime costs money. Latency costs customers. If you’ve ever watched your error...
This article compares GitLab and Jenkins, two popular DevOps tools. We’ll explore their strengths and weaknesses to help you decide which fits your needs.
If you’ve been watching software teams for any length of time, you know the old way of doing things: developers finish their code, hand it off to testers, who then...
DevOps combines software development and IT operations, which shortens system development cycles and enables continuous delivery. Machine learning needs significant computational resources to process large amounts of data quickly. This...
Cloud computing changes how businesses work. But if you’re thinking about moving away from managing your own IT, you need to know what options are actually available.
I’ve watched three cloud migration projects fall apart. Not because the technology failed — the tech almost never fails. They failed because nobody planned for the human and process side...
In this tutorial, we’ll walk through a real project that needs a GitLab CI/CD pipeline. We’ll look at actual working examples and explain why .gitlab-ci.yml examples matter in...
A DevOps team at a growing company needed to handle automation and event-driven responses across multiple applications. Managing numerous Lambda functions individually became unwieldy. Terraform provided a way to solve...
When you want to use IP replication between the recovery site and the on-premises production site, you must configure a site-to-site VPN connection. Before establishing the connection, there are some...
AWS Enhanced Networking improves how your EC2 instances talk to each other. It uses technologies like the Elastic Network Adapter (ENA) and Single Root I/O Virtualization (SR-IOV) to deliver faster,...
Software development has evolved quickly over the years. Businesses now face pressure to deliver high-quality products faster due to increasing demand for software and apps. DevOps and Site Reliability Engineering...
Terraform lets you manage cloud infrastructure through code instead of clicking around in web consoles. Define what you want, apply it, and Terraform figures out how to make it happen....
If you have spent any time in education over the past decade, you have probably noticed that the way people share files, collaborate on projects, and access course materials has...
As a DevOps engineer, managing infrastructure eats up a lot of my time. Keeping track of dozens of components, making sure everything talks to each other correctly - it adds...
The lookup function in Terraform is one of those tools that seems trivial until you’re staring at an error at 11pm and realizing you’ve been using it wrong for six...
So you’re trying to decide between DevOps and Software Engineering. I get it—these roles blur together more than most job postings let on, and the advice out there is usually...
I ran into an interesting architecture problem recently. We had multiple web applications running on EC2 instances behind AWS API Gateway, and we needed to add a WAF without breaking...
If you’ve worked with Terraform for a while, you’ve probably hit situations where you need to run something that doesn’t fit neatly into a cloud resource. Maybe you need to...
If you’re working with GitLab, you’ve probably noticed that managing who can do what gets complicated fast. This post walks through the built-in roles GitLab gives you, what each one...
I’ve gotten quite a few requests to write about Terraform Modules. The topic comes up a lot because people get confused about where modules end and resources begin. Let me...
Serverless sounds like a new thing, but it’s actually been brewing for decades. Back in the 1950s, computing cost an arm and a leg — we’re talking hundreds of dollars...
With the evolution of cloud computing, the way we access applications and databases has changed. We now access these things over the internet, which has pushed the cloud computing providers...
Last year I spent two days debugging a build pipeline because our CI system was reading the wrong version from a Maven POM. The XPath query looked correct, but it...
AWS tags let you attach custom key-value pairs to just about any resource in your account. If you’ve ever tried managing tags manually across dozens of resources, you know it...
When you start learning Terraform, the first thing you’ll run is terraform plan. It sounds simple, but understanding what it does will save you from costly mistakes later.
Here’s the setup: you need to provision infrastructure and then configure it. Terraform does the first part beautifully. Ansible does the second part beautifully. The moment you try to make...
If you are moving to the cloud, infrastructure as code (IaC) should be part of your toolkit. It helps teams ship faster and keeps environments consistent. But you need the...
I want to walk you through a real project I worked on. The ask was straightforward: go through all our applications and yank out any passwords that were hardcoded in...
Here’s a quick way to generate random passwords with Terraform. This comes in handy when you’re setting up RDS, AWS Secrets Manager, MSK, or anything else that needs authentication. The...
When you run terraform apply without any flags, Terraform applies all the changes in your plan at once. If you’ve ever worked on a large Terraform project, you know how...
Terraform lets you manage a lot of infrastructure declaratively, but sometimes you need to repeat the same nested block configuration multiple times – with slight variations. That’s where dynamic blocks...
AWS created Secrets Manager after hearing from customers that managing secrets was critical but difficult. IAM Roles help because they provide temporary credentials automatically. Attach a role to an EC2...
Terraform is a declarative language. That means you describe the desired state, and Terraform figures out how to get there. Unlike procedural languages, you don’t write step-by-step instructions.
If you have been working with Terraform for a while, you probably already know that environment variables can make your life easier, especially when running Terraform in CI/CD pipelines. You...
When Terraform does not do what you expect, you need to figure out why. This post covers the debugging tools Terraform gives you and how I use them in practice....
I want to walk you through Terraform variable types. If you’ve worked with other programming languages, you’ll find Terraform’s approach familiar. Variables hold your data, and you need to know...
Webhooks let GitLab push HTTP requests to your app when something happens. You can use this to get notified or trigger automation without polling an API.
If you run the same pipeline over and over, waiting for npm install or bundle install every time, you start wondering if there’s a better way. There is. GitLab CI...
If you’ve used Terraform for any serious infrastructure work, you’ve probably felt the pain of managing separate state files for dev, staging, and production. That’s exactly what workspaces solve.
Artifacts let you persist files between CI/CD jobs. If your pipeline produces build outputs, test reports, or any other files you need later, GitLab stores them as artifacts. You can...
Terraform needs to track state about your infrastructure. This state tells Terraform how your configuration maps to real resources already running in the cloud, stores metadata about those resources, and...
Sometimes a resource goes sideways and Terraform loses track. Maybe an application inside a VM crashed while the VM itself keeps running. Or someone manually patched a database server outside...
GitLab’s rules keyword gives you control over whether a job runs or gets skipped. You build these rules from conditions that check variables and events.
Let’s talk about how to decouple applications using poll-based messaging. I’ll walk you through what SQS does, the key settings you’ll touch in practice, and how visibility timeout keeps your...
If you haven’t read it yet, check out our post on horizontal vs vertical scaling. Now let’s talk about what decoupling your applications actually means and how to design a...
If you have launched EC2 instances through the wizard, you know it involves a fair amount of clicking. Image ID, instance type, network, security groups, storage – it adds up....
There are two ways to scale in AWS: vertical and horizontal. I want to start with vertical scaling because it’s the approach most of us learned first. Then we’ll get...
I have been running workloads on AWS for years, and one of the easiest ways to cut your EC2 bill is Spot Instances. In this post I will walk through...
S3 is fast out of the box, but there’s a difference between “works fine” and “handles serious traffic.” This post covers how to push S3 harder without resorting to Transfer...
I had to clean up a bunch of old projects on GitLab recently, and figured I’d write this down while it’s fresh. If your GitLab instance has too many abandoned...
Terraform is a solid tool for describing your infrastructure as code. But if you need to create multiple resources that are nearly identical, copying and pasting the same block gets...
Terraform outputs are how you get data out of your infrastructure. If you have ever run terraform apply and seen those printed values at the end, those are outputs. They...
Terraform manages cloud infrastructure as code. You describe what you want, and it figures out how to make it happen. Like any programming language, Terraform has features that aren’t obvious...
If you have spent any time writing Terraform, you know that your configurations can get messy fast. You end up repeating the same expressions, hard-coding the same values, and before...
If you work with GitLab, you probably type your username and password every time you push code. It gets old fast. SSH keys fix that: once set up, GitLab authenticates...
Terraform has a handy way to render configuration files dynamically by injecting variables into templates. If you have ever needed to generate a user-data script, a config file, or a...
I work with AWS KMS regularly, and in this post I want to share what I’ve learned about the key management service and how to use it from the command...
GitLab is more than a code repo. You can build, test, and deploy straight from it. If you are already working with Infrastructure as Code, you probably use Terraform locally....
GitLab CI is a solid choice for building and deploying applications. You get automation, full change tracking, and a pipeline system that handles the heavy lifting.
I wanted to share how I set up CloudFormation templates to run through GitLab CI/CD. If you’ve been writing templates and running them manually from your terminal, moving the whole...
I have been running GitLab CI at scale for a while now, and one thing I keep running into is the need for more hardware as applications get more complex....
Building a Docker image on GitLab sounds simple, and it usually is – until you hit caching problems or try to push to a remote registry. I ran into these...
If you are building Java applications, you need GitLab Runner and Maven in your CI/CD pipeline. This post walks through everything required to get your Java project building on GitLab,...
If you want to analyze a JavaScript project with SonarQube but don’t want to install Java, Node.js, and a bunch of other tools on your machine, Docker is the way...
I’ve broken a production server twice by creating users wrong. Once by assigning the wrong UID. Once by not understanding how the primary group assignment works. Neither time was obvious...
I’ve been using SonarQube with Docker and Maven for years, and it’s still my go-to setup for local development. Let me walk you through how I run it without spending...
Teams sometimes assume their infrastructure-as-code templates are the final word on what’s running. That’s rarely true for long. Configuration drift — the gap between what your code says and what’s...