
Deep Learning Projects
The objective of this project is to develop a Proof of Concept (POC) for car damage detection that can classify the condition of a car's front and rear into six predefined categories. The solution will be delivered as a trained deep learning model integrated into a Streamlit app. This POC will serve as a foundation for evaluating the viability of an automated damage detection system for VROOM Cars.
Vehicle Damage Detection
1. Project Summary
- An end-to-end deep learning pipeline to classify vehicle images into 6 damage categories (Front Breakage, Front Crushed, Front Normal, Rear Breakage, Rear Crushed, Rear Normal).
- The project includes dataset preparation and augmentation, training and comparing 5 different models (from-scratch CNNs to transfer learning with ResNet), hyperparameter tuning with Optuna, evaluation with confusion matrices and classification reports, model saving, and a Streamlit app for easy end-user inference.
- The final ResNet-based model achieves ~80.5% accuracy on the held-out validation set.
2. Why this problem matters
- Automated vehicle damage classification speeds up insurance claim triaging, helps repair shops estimate repairs, and can be integrated into mobile apps for customers.
- The 6-class split models a realistic use case where both location (front/rear) and severity (normal/breakage/crushed) matter.
3. What I built (high-level)
- Collected and organized the image dataset in the ImageFolder layout: dataset/<class_name>/*.jpg.
- Built an augmentation and preprocessing pipeline appropriate for ConvNets and pretrained backbones.
- Trained multiple model families: a simple CNN, a CNN with regularization, EfficientNet-B0 (transfer learning), ResNet50 (transfer learning), and a fine-tuned ResNet50 with hyperparameters chosen via Optuna.
- Evaluated each model on the same validation split and compared metrics (accuracy, precision, recall, F1 per class) and confusion matrices.
- Selected the best model, exported its weights, and wrapped inference inside a Streamlit app for an end-to-end demo.
- Documented reproducible steps, saved model artifacts, and created a small guide for deploying the app.
4. Dataset & Setup
- Source: local folder organized for ImageFolder with one subfolder per class.
- Counts: 2300 images total (example), split 75% train / 25% validation (1725 / 575). Batch size 32.
- Classes (example): ['F_Breakage', 'F_Crushed', 'F_Normal', 'R_Breakage', 'R_Crushed', 'R_Normal'] (6 classes).
- Why this split? 75/25 is a common choice that gives enough data to train while retaining a reasonable validation set to measure generalization. A minimal loading sketch follows this list.
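A minimal loading sketch, assuming the ImageFolder layout above; the dataset root path is a placeholder:

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Placeholder transform; the full augmentation pipeline is in section 5.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("dataset", transform=transform)
train_size = int(0.75 * len(dataset))   # 1725 of 2300 images
val_size = len(dataset) - train_size    # 575 images
train_ds, val_ds = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)
print(dataset.classes)  # ['F_Breakage', 'F_Crushed', ..., 'R_Normal']
```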
5. Preprocessing & Augmentation — reasoning and choices

- Why each step?
  - RandomHorizontalFlip() — provides invariance to left-right orientation; useful because damage on the left vs right of a vehicle can be symmetric.
  - RandomRotation(10) — small rotations model camera-tilt differences in real-world captures.
  - ColorJitter(...) — simulates lighting and exposure variation across images (important for field-captured photos).
  - Resize(224,224) + standard ImageNet normalization — compatible with pretrained backbones (ResNet/EfficientNet) and ensures stable optimization. The full pipeline is sketched below.
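A sketch of the transform pipeline described above; the exact ColorJitter strengths are assumptions, since the source does not list them:

```python
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),                     # left/right invariance
    transforms.RandomRotation(10),                         # small camera tilt
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # lighting variation (assumed values)
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

# Validation keeps only the deterministic steps.
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])
```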
6. Models, design choices & intuition
Model 1 — Simple CNN (from scratch)
- Architecture: 3 convolutional blocks (16, 32, 64 filters) with ReLU and MaxPool, then fully-connected layers 64*28*28 -> 512 -> num_classes. (A sketch follows this list.)
- Why try it? A baseline to confirm the dataset is learnable and to practice designing from-scratch networks.
- Results observed: validation accuracy improved from ~37% to ~57%. The model can learn but is likely underpowered for complex visual patterns and prone to overfitting on a small dataset.
- Key intuition: small networks require careful capacity tuning — too small => underfit; too big => overfit. Training from scratch with limited images is also harder.
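A minimal sketch of the described architecture; the kernel sizes are assumptions, while the filter counts and FC dimensions match the text:

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 112 -> 56
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```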
Model 2 — CNN with Regularization (BatchNorm + Dropout + weight decay)
- Additions: BatchNorm after the convolutions, Dropout before the final FC layer, and L2 weight decay in the optimizer (a sketch follows this list).
- Why: BatchNorm stabilizes training and speeds convergence; Dropout and weight decay reduce overfitting by discouraging co-adaptation.
- Observations: training became noisier at first, and validation accuracy dropped to ~50% vs the baseline. Interpretation: heavy regularization on a small-capacity network can harm learning because it restricts capacity the model needs to fit the patterns. This is a useful diagnostic: regularization only helps when capacity and dataset size are balanced.
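A sketch of the additions relative to SimpleCNN above; the dropout probability and weight_decay value are illustrative, not the exact values used:

```python
import torch.nn as nn
import torch.optim as optim

class RegularizedCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 512), nn.ReLU(),
            nn.Dropout(0.5),                    # assumed probability
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = RegularizedCNN()
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # L2 penalty
```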
Model 3 — Transfer Learning with EfficientNet-B0
- Approach: load pretrained efficientnet_b0 (ImageNet weights), freeze all parameters, replace the classifier, and train only the classifier head (sketched below).
- Why try it? Transfer learning brings strong ImageNet visual features that often generalize well on small datasets.
- Results observed: validation accuracy improved to ~65.7% after training the classifier head.
- Takeaway: transfer learning provides a large boost because low/mid-level features (edges, textures) are reused; however, freezing too many layers may limit adaptation to domain-specific features (car damage patterns).
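A sketch of the frozen-backbone setup, assuming torchvision's efficientnet_b0:

```python
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False   # freeze the pretrained backbone

# efficientnet_b0's classifier is Sequential(Dropout, Linear(1280, 1000)),
# so swap the final Linear; the new layer trains by default.
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 6)
```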
Model 4 — Transfer Learning with ResNet50 (layer4 unfrozen)
- Approach: load resnet50 pretrained weights, freeze all layers, then unfreeze layer4 and fc. Replace fc with Dropout(dropout_rate) + Linear (sketched below).
- Why unfreeze layer4? layer4 contains higher-level features (object parts and their combinations). Fine-tuning these helps adapt to domain specifics while keeping earlier layers stable.
- Results observed: substantial improvement — validation accuracy reached ~76–79% depending on the run and hyperparameters. This became the strong contender.
- Intuition: fine-tuning deeper layers lets the network learn domain-specific high-level cues (e.g., crushed vs breakage patterns) while retaining generic low-level features.
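A sketch of this setup; the helper name build_resnet is my own:

```python
import torch.nn as nn
from torchvision import models

def build_resnet(num_classes=6, dropout_rate=0.2):
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False         # freeze everything first
    for param in model.layer4.parameters():
        param.requires_grad = True          # adapt high-level features
    in_features = model.fc.in_features      # 2048 for ResNet50
    model.fc = nn.Sequential(               # new head trains by default
        nn.Dropout(dropout_rate),
        nn.Linear(in_features, num_classes),
    )
    return model
```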
Model 5 — Hyperparameter Tuning with Optuna (ResNet50)
- Hyperparameters tuned:
  - Learning rate (lr) in [1e-5, 1e-2] (log scale)
  - Dropout rate in [0.2, 0.7]
- Setup: run Optuna with n_trials=20. For each trial:
  - Instantiate ResNet50 with the suggested dropout.
  - Unfreeze layer4 and fc; keep the other layers frozen.
  - Train for a few epochs for a quick evaluation and allow Optuna pruning.
  - Report validation accuracy as the objective to maximize.
- Best trials: several trials reached ~80% validation accuracy during the quick search; the best recorded trial gave ~80.17% accuracy.
- Final training: re-trained ResNet50 with the chosen hyperparameters (lr=0.005, dropout=0.2) for several epochs and obtained ~80.52% validation accuracy. The objective function is sketched below.
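A sketch of the objective, assuming build_resnet() from the previous sketch; train_one_epoch, evaluate, device, and the loaders are assumed helpers, and the per-trial epoch count is illustrative:

```python
import optuna
import torch.nn as nn
import torch.optim as optim

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout_rate = trial.suggest_float("dropout_rate", 0.2, 0.7)

    model = build_resnet(num_classes=6, dropout_rate=dropout_rate).to(device)
    optimizer = optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    criterion = nn.CrossEntropyLoss()

    for epoch in range(3):                    # short runs for a quick search
        train_one_epoch(model, train_loader, criterion, optimizer)
        val_acc = evaluate(model, val_loader)
        trial.report(val_acc, epoch)
        if trial.should_prune():              # let Optuna stop weak trials early
            raise optuna.TrialPruned()
    return val_acc

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_trial.params)                # best run: lr=0.005, dropout=0.2
```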
7. Training details
- Device: CUDA (GPU) used during training.
- Loss: nn.CrossEntropyLoss() for multi-class classification.
- Optimizer: optim.Adam(...) for both the from-scratch and transfer learning experiments.
- Batch size: 32
- Image size: 224x224
- Data split: 75% train / 25% validation
- Model saving: torch.save(model.state_dict(), 'saved_model.pth'). A minimal loop with these settings is sketched below.
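A minimal sketch of the loop; the loaders come from the earlier dataset sketch, and the learning rate and epoch count are illustrative:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = build_resnet().to(device)            # from the Model 4 sketch
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

for epoch in range(10):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    model.eval()
    correct = total = 0
    with torch.no_grad():                     # no gradients during validation
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: loss={running_loss / len(train_loader):.4f} "
          f"val_acc={correct / total:.4f}")

torch.save(model.state_dict(), "saved_model.pth")
```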
8. Results (final model)
- Final chosen model: ResNet50 (layer4 + fc unfrozen) with tuned hyperparameters.
- After selecting hyperparameters, I retrained the ResNet model and obtained ~80.5% validation accuracy.
- Validation metrics (example classification report): the per-class precision/recall/F1 report shows good performance across the 6 classes, with macro-avg F1 ≈ 0.79 and weighted-avg F1 ≈ 0.80.
- Confusion matrix: the matrix highlights classes where the model confuses severity (e.g., breakage vs crushed) — useful for targeted data augmentation. Analyzing confusable class pairs (e.g., F_Crushed vs F_Breakage, or R_Normal vs R_Breakage) reveals which augmentations or additional data could help. The reporting code is sketched below.
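A sketch of the reporting step, assuming a trained model and the val_loader and device from the earlier sketches:

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

class_names = ['F_Breakage', 'F_Crushed', 'F_Normal',
               'R_Breakage', 'R_Crushed', 'R_Normal']
print(classification_report(all_labels, all_preds, target_names=class_names))
print(confusion_matrix(all_labels, all_preds))   # rows = true, cols = predicted
```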
9. Streamlit app — inference & UX
- Files: app.py, model_helper.py, and saved weights saved_model.pth.
- What it does:
  - Provides a drag-and-drop image uploader.
  - Loads the saved ResNet model and performs inference.
  - Displays the predicted class label (one of the six) along with the uploaded image.
- Key inference details in model_helper.py (sketched below):
  - Rebuilds the model architecture exactly as in training (ResNet50 with dropout in fc).
  - Loads saved_model.pth via load_state_dict.
  - Applies the same preprocessing (Resize 224, ToTensor, Normalize with ImageNet mean/std).
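A sketch of what the model_helper.py prediction path could look like; the predict function name is an assumption, and build_resnet comes from the Model 4 sketch:

```python
import torch
from PIL import Image
from torchvision import transforms

CLASS_NAMES = ['F_Breakage', 'F_Crushed', 'F_Normal',
               'R_Breakage', 'R_Crushed', 'R_Normal']

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Rebuild the training-time architecture, then load the saved weights.
model = build_resnet(num_classes=6, dropout_rate=0.2)
model.load_state_dict(torch.load("saved_model.pth", map_location="cpu"))
model.eval()

def predict(image_file):
    image = Image.open(image_file).convert("RGB")
    batch = preprocess(image).unsqueeze(0)    # add a batch dimension
    with torch.no_grad():
        pred = model(batch).argmax(dim=1).item()
    return CLASS_NAMES[pred]
```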
10. Deployment
- Deployment: the complete project (training code, saved model, and inference helpers) is hosted on GitHub and deployed as a Streamlit web app. Users can upload a car image from any device and get a real-time prediction of the damage category, making the model globally accessible without any local setup.
11. Implementation & engineering details
- Frameworks: PyTorch, torchvision, Optuna, scikit-learn (metrics), matplotlib (plots).
- Hardware: GPU (cuda) for faster training.
- Training loop details:
  - CrossEntropyLoss for multi-class classification.
  - Adam optimizer; weight_decay used when experimenting with regularization.
  - Separate training and validation loops; validation performed at epoch end under torch.no_grad().
  - Batch-level loss, epoch-level average loss, and validation accuracy printed for monitoring.
- Model saving: torch.save(model.state_dict(), 'saved_model.pth') for deployment or later inference.
12. Reproducibility & best practices followed
- Fixed train/val split and recorded dataset sizes.
- Used ImageNet normalization for transfer learning compatibility.
- Kept experiments modular: the same train_model() function was reused across models for a fair comparison.
- Used Optuna with pruning to search hyperparameters efficiently and logged the best trials.
- Saved final model weights and logged the confusion matrix + classification report for auditability.
