The $18,400 Monthly AWS Bill
A client came to us with a problem that's increasingly common: their AWS bill was growing faster than their revenue. At $18,400/month for a SaaS product with 12,000 active users, they were spending about $1.53 per user per month on infrastructure alone. For a product charging $29/month, that's a 5.3% infrastructure cost ratio — not catastrophic, but much higher than the 2-3% benchmark for efficient SaaS operations.
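The unit economics above reduce to two divisions; a quick sanity check in a few lines of Python (all figures are from this article):

```python
monthly_bill = 18_400      # USD/month AWS spend
active_users = 12_000
price_per_user = 29        # USD/month subscription price

infra_cost_per_user = monthly_bill / active_users          # ~$1.53/user/month
infra_cost_ratio = infra_cost_per_user / price_per_user    # ~5.3% of revenue

print(f"${infra_cost_per_user:.2f}/user/month, {infra_cost_ratio:.1%} of revenue")
```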
We spent two weeks auditing their infrastructure and implementing changes. The final bill: $10,700/month. A 42% reduction with zero downtime and no user-facing performance degradation. Here's every change we made, with the specific savings from each.
Right-Sizing EC2 Instances: -$2,800/month
The classic. They were running three m5.2xlarge instances (8 vCPU, 32GB RAM each) for their API servers. Average CPU utilization: 12%. Average memory usage: 8GB per instance. They'd sized for peak traffic that happened once (a product launch) and never scaled back down.
We switched to m6i.xlarge instances (4 vCPU, 16GB RAM) behind an auto-scaling group that scales from 2 to 6 instances based on CPU utilization (target: 65%). Normal traffic runs on 2 instances. Peak traffic auto-scales to 3-4. We've never seen it need more than 4. The m6i generation is also about 15% cheaper per vCPU than m5, so we got a double benefit.
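The scaling setup above can be sketched as boto3 request payloads. Resource names (`api-asg`, `api-launch-template`) and subnet IDs are hypothetical; the numbers match the configuration described:

```python
# Auto-scaling group sized for normal traffic, with headroom for peaks.
asg_params = {
    "AutoScalingGroupName": "api-asg",                      # hypothetical name
    "LaunchTemplate": {"LaunchTemplateName": "api-launch-template"},
    "MinSize": 2,                    # normal traffic runs on 2 instances
    "MaxSize": 6,                    # headroom beyond the observed peak of 4
    "DesiredCapacity": 2,
    "VPCZoneIdentifier": "subnet-aaa,subnet-bbb",           # placeholder IDs
}

# Target-tracking policy: add/remove instances to hold average CPU near 65%.
scaling_policy_params = {
    "AutoScalingGroupName": "api-asg",
    "PolicyName": "target-cpu-65",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 65.0,
    },
}

# With credentials configured, these would be passed to boto3:
#   autoscaling = boto3.client("autoscaling")
#   autoscaling.create_auto_scaling_group(**asg_params)
#   autoscaling.put_scaling_policy(**scaling_policy_params)
```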
Reserved Instances + Savings Plans: -$1,900/month
The client was running everything on on-demand pricing. For workloads that run 24/7 (databases, cache, baseline API servers), reserved instances or savings plans typically save 30-40%. We purchased 1-year no-upfront reserved instances for their RDS database and ElastiCache cluster, and a Compute Savings Plan for the baseline EC2 capacity. The Savings Plan commits to a minimum hourly spend, not specific instances, so we retain flexibility to change instance types later.
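The commitment math is simple enough to sketch. Both numbers below are assumptions for illustration: a hypothetical $5,400/month of 24/7 on-demand baseline, and a 35% discount picked from the middle of the 30-40% range quoted above (check current AWS pricing for real rates):

```python
baseline_on_demand = 5_400   # USD/month of always-on capacity (hypothetical)
assumed_discount = 0.35      # mid-point of the 30-40% range (assumption)

committed_spend = baseline_on_demand * (1 - assumed_discount)
monthly_savings = baseline_on_demand - committed_spend

print(f"commit ${committed_spend:,.0f}/month, save ${monthly_savings:,.0f}/month")
```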
S3 Lifecycle Policies: -$600/month
They stored user-uploaded files and generated reports in S3 Standard. Files older than 30 days were rarely accessed. We implemented lifecycle policies: files move to S3 Infrequent Access after 30 days and to S3 Glacier Instant Retrieval after 90 days. The access pattern supported this — only 3% of requests were for files older than 30 days. We also enabled S3 Intelligent-Tiering for the subset of data with unpredictable access patterns.
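The transitions above map directly onto an S3 lifecycle configuration. This is a sketch with a hypothetical bucket and rule name; the payload shape is what boto3's `put_bucket_lifecycle_configuration` expects:

```python
# Tier objects down to cheaper storage classes as they age.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-down-uploads",        # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},         # apply to the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # Infrequent Access
                {"Days": 90, "StorageClass": "GLACIER_IR"},   # Glacier Instant Retrieval
            ],
        }
    ]
}

# With credentials configured:
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="uploads-bucket", LifecycleConfiguration=lifecycle_config)
```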
Database Optimization: -$1,400/month
The PostgreSQL RDS instance was a db.r5.2xlarge (8 vCPU, 64GB RAM). Connection count was averaging 45 out of a maximum of 150. Query analysis showed that 30% of database load was from a single reporting query that ran every 15 minutes, doing a full table scan on a 50M row table. We added a composite index (reduced query time from 12 seconds to 200ms) and moved the reporting queries to a read replica. The primary instance was downsized to db.r6g.xlarge (4 vCPU, 32GB) using Graviton processors (20% cheaper than equivalent Intel).
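The two database changes can be sketched as follows. The table, column, and instance names are invented for illustration (the article doesn't give them); the replica parameters are a payload for boto3's `create_db_instance_read_replica`:

```python
# Hypothetical composite index matching the reporting query's filter + sort
# columns; CONCURRENTLY avoids locking the 50M-row table during the build.
create_index_sql = """
CREATE INDEX CONCURRENTLY idx_events_account_created
    ON events (account_id, created_at);
"""

# Read replica to absorb the reporting load, on cheaper Graviton hardware.
replica_params = {
    "DBInstanceIdentifier": "app-db-reporting",      # hypothetical name
    "SourceDBInstanceIdentifier": "app-db-primary",  # hypothetical name
    "DBInstanceClass": "db.r6g.xlarge",              # Graviton, 4 vCPU / 32GB
}

# With credentials configured:
#   rds = boto3.client("rds")
#   rds.create_db_instance_read_replica(**replica_params)
```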
The Hidden NAT Gateway Cost: -$800/month
This one surprises everyone. NAT Gateway charges $0.045 per GB of data processed, on top of an hourly charge per gateway. Their application servers in private subnets were making external API calls (payment processing, email sending, analytics) through the NAT Gateway, processing about 500GB/month. We moved the external API call services to public subnets (with proper security groups) and eliminated 80% of NAT Gateway data processing. For services that must stay in private subnets, we set up VPC endpoints for AWS services — Gateway endpoints for S3 and DynamoDB, which are free, and an Interface endpoint for SQS, which is cheap but metered — all of which bypass the NAT Gateway entirely.
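A Gateway endpoint for S3 looks like this as a boto3 payload sketch (the VPC and route table IDs are placeholders); once the route tables are associated, S3 traffic from the private subnets no longer transits the NAT Gateway:

```python
# Free Gateway endpoint: adds an S3 route to the private subnets' route tables.
endpoint_params = {
    "VpcId": "vpc-0123456789abcdef0",                 # placeholder
    "ServiceName": "com.amazonaws.us-east-1.s3",      # region-specific
    "VpcEndpointType": "Gateway",
    "RouteTableIds": ["rtb-0123456789abcdef0"],       # placeholder
}

# With credentials configured:
#   ec2 = boto3.client("ec2")
#   ec2.create_vpc_endpoint(**endpoint_params)
```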
Container Optimization: -$900/month
Their background workers ran on ECS Fargate with 2 vCPU and 4GB RAM per task, running 24/7. Actual utilization: 0.3 vCPU and 800MB RAM average. We switched to ECS Fargate Spot for non-critical workers (acceptable for background jobs where a 2-minute interruption is fine) and right-sized to 0.5 vCPU and 1GB RAM. Fargate Spot is 70% cheaper than standard Fargate.
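The right-sizing and Spot switch can be sketched as ECS payloads. The family, cluster, and image names are hypothetical; note that Fargate task sizes are strings, with CPU expressed in units of 1/1024 vCPU:

```python
# Right-sized task definition: 0.5 vCPU / 1GB instead of 2 vCPU / 4GB.
task_def_params = {
    "family": "background-worker",          # hypothetical name
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "512",                           # 0.5 vCPU
    "memory": "1024",                       # 1 GB
    "containerDefinitions": [
        {"name": "worker", "image": "example/worker:latest", "essential": True}
    ],
}

# Run non-critical workers on Spot capacity (interruptible, up to 70% cheaper).
run_task_params = {
    "cluster": "workers",                   # hypothetical name
    "taskDefinition": "background-worker",
    "capacityProviderStrategy": [
        {"capacityProvider": "FARGATE_SPOT", "weight": 1}
    ],
}

# With credentials configured:
#   ecs = boto3.client("ecs")
#   ecs.register_task_definition(**task_def_params)
#   ecs.run_task(**run_task_params)
```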
The Process
We used AWS Cost Explorer's rightsizing recommendations as a starting point, then validated with actual CloudWatch metrics over a 30-day period. We made changes incrementally — one service at a time, with monitoring, never all at once. Each change was tested in a staging environment for at least 48 hours under synthetic load before applying to production. The entire optimization took two weeks of active work, spread over three weeks calendar time to allow for monitoring between changes.
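The same rightsizing recommendations can be pulled programmatically; this payload sketch targets Cost Explorer's `get_rightsizing_recommendation` API, with the results cross-checked against CloudWatch metrics as described above:

```python
# Fetch EC2 rightsizing recommendations (the same data the console shows).
recommendation_params = {
    "Service": "AmazonEC2",
    "Configuration": {
        "RecommendationTarget": "CROSS_INSTANCE_FAMILY",  # allow m5 -> m6i moves
        "BenefitsConsidered": False,   # size on usage, not RI/Savings Plan coverage
    },
}

# With credentials configured:
#   ce = boto3.client("ce")
#   resp = ce.get_rightsizing_recommendation(**recommendation_params)
```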