The Board Exam Traffic Pattern
If you haven't worked in Indian EdTech, you might not appreciate how extreme the traffic spikes are. For context: a client's platform averages 2,500 concurrent users on a normal day. On CBSE/ICSE board exam result day, that number hits 100,000+ within the first 30 minutes. That's a 40x spike with virtually no ramp-up time. It's like Black Friday, except it happens on specific dates every year, and your users are extremely anxious students and parents who will not tolerate a slow page.
We've been through four board exam seasons with this client, and each one taught us something. Here's the architecture that survives it.
The CDN Layer: Serve What You Can Statically
On result day, about 70% of the traffic is to static or semi-static pages: the homepage, FAQs, "how to check your result" guides, and previous year content. All of this gets served from CloudFront edge cache with a 5-minute TTL. The origin servers never see these requests. This single decision handles 70K of the 100K concurrent users.
For the result-checking feature itself, we pre-generate result PDFs and cache them on S3 behind CloudFront. Once a student's result is fetched the first time, subsequent requests for the same result serve from cache. By the 30-minute mark of result day the hit rate is typically 60-70%, because students share their result link with family and every subsequent view serves from the edge.
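The caching path above hinges on two things: a deterministic object key, so every viewer of a shared link resolves to the same cached object, and cache-friendly response headers. A minimal sketch, with a hypothetical key scheme and bucket layout (not the client's actual values):

```python
# Illustrative sketch of the result-PDF caching path. The key scheme,
# TTL, and header values are assumptions, not the production config.

def result_pdf_key(exam_year: int, board: str, roll_number: str) -> str:
    """Deterministic S3 key: every share of the same result maps to one object."""
    return f"results/{exam_year}/{board.lower()}/{roll_number}.pdf"

def cache_headers(ttl_seconds: int = 300) -> dict:
    """Headers that let CloudFront serve repeat views without touching the origin."""
    return {
        "Cache-Control": f"public, max-age={ttl_seconds}",
        "Content-Type": "application/pdf",
    }
```

Because the key contains no session or timestamp component, a link forwarded to family members is a guaranteed cache hit rather than a fresh origin fetch.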
The API Layer: Auto-Scaling with Pre-Warming
We run the API on ECS Fargate behind an Application Load Balancer. Auto-scaling is configured to respond to CPU utilization (target: 60%) and request count per target (target: 500 req/s per container). But auto-scaling has a lag — it takes 2-3 minutes to spin up new containers. During a 40x traffic spike, 2-3 minutes of under-provisioning means thousands of failed requests.
Our solution: pre-warming. The night before a known traffic event (board exam results are announced in advance), we manually scale up to 10x our normal capacity. We set the minimum task count high, pre-warm the database connection pool, and pre-populate the cache with commonly accessed data. This gives us a warm platform when the spike hits. After the event subsides (typically 6-8 hours), we scale back down. Yes, we pay for the extra capacity during the pre-warm period, but the cost ($200-300 for a day of over-provisioning) is trivial compared to the reputational cost of being down during results.
The Database Layer: Read Replicas and Connection Pooling
The primary database (PostgreSQL on RDS) handles writes — storing new user registrations, result queries, and analytics events. All read queries go to two read replicas. We use PgBouncer in front of both primary and replica for connection pooling, which keeps the connection count manageable even at 100K concurrent users.
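A PgBouncer setup of this shape can be sketched as follows; the hostnames and pool sizes below are illustrative, not the client's actual values:

```ini
; Minimal PgBouncer sketch (illustrative hosts and sizes)
[databases]
app_rw = host=primary.internal port=5432 dbname=app
app_ro = host=replica.internal port=5432 dbname=app

[pgbouncer]
pool_mode = transaction     ; multiplex many clients over few server connections
max_client_conn = 20000     ; client connections PgBouncer will accept
default_pool_size = 50      ; actual Postgres connections per database/user pair
```

Transaction pooling is what makes the math work: tens of thousands of client connections share a pool of a few dozen real Postgres connections, so the database never sees the concurrency of the front end.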
For the result lookup specifically, we moved it off the main database entirely. Results are imported into a DynamoDB table (keyed by roll number) the night before. DynamoDB handles the read traffic trivially — it auto-scales to any read load and delivers single-digit-millisecond response times at any scale. The cost for 100K reads in a burst: about $0.25. This was probably the single best architectural decision we made.
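Because the table is keyed by roll number, the lookup is a single GetItem. A sketch of the request, with assumed table and attribute names rather than the client's actual schema (the boto3 call is commented out since it needs AWS credentials):

```python
# Sketch of the roll-number lookup against the pre-loaded DynamoDB table.
# "board-results" and "roll_number" are assumed names, not the real schema.

def result_lookup_params(roll_number: str, table: str = "board-results") -> dict:
    """Build the GetItem request for a single-key read."""
    return {
        "TableName": table,
        "Key": {"roll_number": {"S": roll_number}},
        "ConsistentRead": False,  # eventually consistent reads cost half as much
    }

# Executing it (shown for shape only):
# import boto3
# item = boto3.client("dynamodb").get_item(**result_lookup_params("1234567"))
```

Eventually consistent reads are fine here because the data is loaded once the night before and never mutated on result day.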
The Queue Layer: Don't Let Spikes Kill Background Jobs
The traffic spike doesn't just affect the web tier. Every result lookup triggers background jobs: sending the result via email, generating PDF certificates, updating analytics dashboards. Without protection, these background jobs would overwhelm the system. We use SQS with a controlled concurrency consumer — the job processor pulls at most 50 messages per second, regardless of queue depth. The queue absorbs the spike, and background processing continues at a steady rate. Some jobs are delayed by minutes during peak, but that's acceptable for email delivery and PDF generation.
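The controlled-concurrency consumer boils down to a rate limiter in front of the SQS polling loop. The limiter below is a runnable sketch of that idea; the polling loop it would wrap is only indicated in comments, and the 50/s cap is the figure from the text:

```python
import time

# Sketch of the steady-rate consumer pattern: the queue absorbs the burst,
# and this limiter caps how fast the worker drains it.

class RateLimiter:
    """Allow at most `rate` operations per second, regardless of backlog."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.next_allowed = 0.0  # monotonic timestamp of the next permitted op

    def acquire(self) -> None:
        """Block until the next operation is permitted."""
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval

# Wrapping the (hypothetical) SQS poll loop:
# limiter = RateLimiter(rate=50)   # 50 messages/second cap from the text
# while True:
#     limiter.acquire()
#     # msg = sqs.receive_message(QueueUrl=..., MaxNumberOfMessages=1)
#     # process(msg)
```

The important property is that the cap is independent of queue depth: a million queued messages still drain at 50/s, so the downstream email and PDF services see a flat load instead of the spike.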
Lessons From Four Seasons
Test at 2x your expected peak, not 1x. We use k6 for load testing and simulate 200K concurrent users (double our expected peak) in staging. We discovered that at 120K concurrent users, the ALB's connection draining was too slow, causing 502 errors during auto-scaling events. We increased the deregistration delay and the issue disappeared.

Document your runbook. On result day, three engineers are on standby with a detailed runbook covering: pre-warming procedures, monitoring dashboards to watch, escalation paths, and rollback procedures for each component. It's not glamorous engineering, but it's the difference between a smooth result day and a panicked one.