The Problem: Human Inspectors Can't Scale
Our client manufactures automotive components at their Chennai factory. They had 12 human inspectors checking parts for surface defects: scratches, dents, discoloration, and dimensional deviations. At peak production, the line runs 8,000 parts per shift. The inspectors caught about 87% of defects, which sounds fine until you realize the 13% they missed was costing ₹2.3 crore per year in warranty claims and recalls.
They asked us to build an automated visual inspection system. The constraint: it had to run at line speed (one part every 4.5 seconds) and achieve at least 95% defect detection rate. We delivered 99.2%.
Data Collection Was the Hardest Part
This is the part nobody warns you about with computer vision projects. The model is only as good as your training data, and getting good training data in a manufacturing environment is painful. We set up four industrial cameras (Basler ace 2 with 12MP sensors) at the inspection station and captured images for three weeks before training a single model. We needed images of every defect type under every lighting condition the factory experiences.
The challenge: defective parts are rare. Out of 8,000 parts per shift, maybe 40-60 have defects. That's less than 1%. So we had a massive class imbalance problem. We used a combination of data augmentation (rotation, flip, brightness variation) and synthetic defect generation to balance the dataset. We also worked with the quality team to intentionally introduce defects on scrap parts for photography. The final training set was 45,000 images: 25,000 good parts and 20,000 defective parts across 8 defect categories.
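To make the augmentation step concrete, here is a minimal sketch of the transforms mentioned above (flips, rotations, brightness variation) using plain NumPy. The specific parameters and pipeline structure are assumptions for illustration, not our production code, which also included synthetic defect generation.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> list[np.ndarray]:
    """Generate simple augmented variants of one part image."""
    variants = []
    # Horizontal and vertical flips preserve surface-defect geometry.
    variants.append(np.flip(image, axis=1))
    variants.append(np.flip(image, axis=0))
    # 90-degree rotations avoid interpolation artifacts on fine scratches.
    for k in (1, 2, 3):
        variants.append(np.rot90(image, k))
    # Brightness jitter simulates lighting drift across a shift.
    factor = rng.uniform(0.8, 1.2)
    bright = np.clip(image.astype(np.float32) * factor, 0, 255)
    variants.append(bright.astype(np.uint8))
    return variants

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
aug = augment(img, rng)
print(len(aug))  # 6 variants per source image
```

Each defective-part image yields several variants this way, which is how a few thousand real defect photos can be stretched toward a balanced 45,000-image training set.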
Model Architecture and Training
We started with YOLOv8 for defect detection and it worked surprisingly well for large, obvious defects like dents and deep scratches. But it struggled with subtle surface defects — hairline scratches and slight discoloration. We switched to a two-stage approach: a ResNet-50 classifier for overall good/bad classification, followed by a YOLOv8 detector that localizes the specific defect on flagged parts. This two-stage approach increased our detection rate from 91% to 99.2%.
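The control flow of the cascade is simple enough to sketch. In this sketch the classifier and detector are injected as callables (in reality they wrap the ResNet-50 and YOLOv8 models); the threshold and result shape are illustrative assumptions, not our production API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class InspectionResult:
    is_defective: bool
    confidence: float
    defects: list  # (label, bounding box) pairs, populated only if stage 2 ran

def inspect(
    image,
    classify: Callable,   # stage 1: good/bad classifier (e.g. ResNet-50)
    detect: Callable,     # stage 2: defect localizer (e.g. YOLOv8)
    reject_threshold: float = 0.5,
) -> InspectionResult:
    """Two-stage cascade: every part pays for the classifier pass,
    but only flagged parts pay for the detector pass."""
    p_defect = classify(image)  # probability the part is defective
    if p_defect < reject_threshold:
        return InspectionResult(False, p_defect, [])
    return InspectionResult(True, p_defect, detect(image))

# Stub models to show the wiring; real models would wrap torch/ultralytics.
result = inspect(
    "part_001",
    classify=lambda img: 0.92,
    detect=lambda img: [("hairline_scratch", (120, 40, 180, 55))],
)
print(result.is_defective, result.defects[0][0])
```

The cascade also helps latency: the detector, the more expensive stage, runs on only the small fraction of parts the classifier flags.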
Training was done on 2x NVIDIA A100 GPUs on AWS. Total training time: about 14 hours for the final model. We used PyTorch with the Ultralytics library for YOLO and torchvision for ResNet. Nothing exotic — the secret was in the data quality, not the model architecture.
Edge Deployment: The Real Engineering Challenge
The model needed to run on-site, on the factory floor, with no cloud dependency (the factory's internet is unreliable). We deployed on an NVIDIA Jetson AGX Orin, which gives us enough GPU power for real-time inference. We converted the PyTorch models to TensorRT for optimized inference — this brought processing time from 800ms per image down to 120ms, well within our 4.5-second cycle time.
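As a sanity check on those latency numbers: assuming the four inspection cameras each contribute one image per part and images are scored sequentially (the worst case; the exact per-part image count is an assumption here), the cycle-time headroom works out as follows.

```python
# Back-of-envelope check that inference fits the 4.5-second cycle time.
CYCLE_MS = 4500      # one part every 4.5 seconds
PYTORCH_MS = 800     # per-image latency before TensorRT conversion
TENSORRT_MS = 120    # per-image latency after conversion
CAMERAS = 4          # assumes all four cameras fire once per part

def headroom_ms(per_image_ms: float, cameras: int = CAMERAS) -> float:
    # Cycle time left for capture, I/O, and reject actuation after inference.
    return CYCLE_MS - per_image_ms * cameras

print(headroom_ms(PYTORCH_MS))   # 1300.0 ms spare with raw PyTorch
print(headroom_ms(TENSORRT_MS))  # 4020.0 ms spare with TensorRT
```

Raw PyTorch would nominally fit, but 1.3 seconds of headroom for capture, preprocessing, and rejection leaves no margin; the TensorRT engine makes the budget comfortable.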
The deployment environment is harsh: temperature swings, vibration from nearby machinery, dust. We learned that camera calibration drifts over time due to vibration, so we built an automated recalibration routine that runs during shift changes. Lighting consistency was another nightmare — we ended up installing dedicated LED panels with fixed color temperature and brightness, because the factory's overhead lights change throughout the day as the sun moves.
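The trigger logic for the recalibration routine can be sketched as below. The drift threshold, shift boundaries, and window length are illustrative site-specific assumptions, not the deployed values; the sketch also ignores the midnight wrap-around for brevity.

```python
from datetime import datetime, time

MAX_DRIFT_PX = 1.5                              # tolerated marker drift
SHIFT_CHANGES = [time(6, 0), time(14, 0), time(22, 0)]
WINDOW_MIN = 15                                 # minutes around a shift change

def in_shift_change_window(now: datetime) -> bool:
    minutes = now.hour * 60 + now.minute
    return any(
        abs(minutes - (b.hour * 60 + b.minute)) <= WINDOW_MIN
        for b in SHIFT_CHANGES
    )

def should_recalibrate(measured_drift_px: float, now: datetime) -> bool:
    """Recalibrate when fixed reference markers have drifted in the image
    AND the line is pausing anyway for a shift change."""
    return measured_drift_px > MAX_DRIFT_PX and in_shift_change_window(now)

print(should_recalibrate(2.1, datetime(2024, 3, 1, 14, 5)))   # True
print(should_recalibrate(2.1, datetime(2024, 3, 1, 11, 0)))   # False
```

Gating on the shift change matters: recalibrating mid-shift would stall the line, so drift detected mid-shift just raises an alert until the next window.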
Results After 6 Months
Defect detection rate: 99.2% (up from 87% with human inspectors). False positive rate: 1.8% (parts flagged as defective that were actually fine). The false positives are reviewed by one human inspector, which is much more manageable than inspecting every part. Annual savings: roughly ₹1.8 crore in reduced warranty claims, plus ₹45 lakh in labor cost reduction. The system paid for itself in under five months.
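The payback arithmetic follows directly from those figures (the system's actual cost isn't stated here, so the last line is only the ceiling that "under five months" implies):

```python
# Savings figures from the results above, in lakh (1 crore = 100 lakh).
warranty_savings = 180.0   # ₹1.8 crore/year in reduced warranty claims
labor_savings = 45.0       # ₹45 lakh/year in labor cost reduction

monthly_savings = (warranty_savings + labor_savings) / 12
print(monthly_savings)       # 18.75 lakh per month

# Payback in under five months bounds the total system cost from above.
print(5 * monthly_savings)   # 93.75 lakh implied cost ceiling
```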