
Deep Learning Projects
The objective of this project is to develop a Proof of Concept (POC) for car damage detection that can classify the condition of a car's front and rear into six predefined categories. The solution will be delivered as a trained deep learning model integrated into a Streamlit app. This POC will serve as a foundation for evaluating the viability of an automated damage detection system for VROOM Cars.
Vehicle Damage Detection
1. Project Summary
- An end-to-end deep learning pipeline to classify vehicle images into 6 damage categories (Front Breakage, Front Crushed, Front Normal, Rear Breakage, Rear Crushed, Rear Normal).
- The project includes dataset preparation and augmentation, training and comparing 5 different models (from-scratch CNNs to transfer learning with ResNet), hyperparameter tuning with Optuna, evaluation with confusion matrices and classification reports, model saving, and a Streamlit app for easy end-user inference.
- The final ResNet-based model achieves ~80.5% accuracy on the held-out validation set.
2. Why this problem matters
- Automated vehicle damage classification speeds up insurance claim triaging, helps repair shops estimate repairs, and can be integrated into mobile apps for customers.
- The 6-class split models a realistic use case where both location (front/rear) and severity (normal/breakage/crushed) matter.
3. What I built (high-level)
- Collected and organized the image dataset in the ImageFolder layout: dataset/<class_name>/*.jpg.
- Built an augmentation and preprocessing pipeline appropriate for ConvNets and pretrained backbones.
- Trained multiple model families: a simple CNN, a CNN with regularization, EfficientNet-B0 (transfer learning), ResNet50 (transfer learning), and a fine-tuned ResNet50 with hyperparameters chosen via Optuna.
- Evaluated each model on the same validation split and compared metrics (accuracy, precision, recall, F1 per class) and confusion matrices.
- Selected the best model, exported its weights, and wrapped inference inside a Streamlit app for an end-to-end demo.
- Documented reproducible steps, saved model artifacts, and created a small guide for deploying the app.
4. Dataset & Setup
- Source: local folder organized for ImageFolder with one subfolder per class.
- Counts: 2300 images total (example), split 75% train / 25% validation (1725 / 575). Batch size 32.
- Classes (example): ['F_Breakage', 'F_Crushed', 'F_Normal', 'R_Breakage', 'R_Crushed', 'R_Normal'] (6 classes).
- Why this split? 75/25 is a common choice that gives enough data to train while retaining a reasonable validation set to measure generalization. A minimal loading sketch follows this list.
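A minimal loading sketch, assuming the ImageFolder layout above; the dataset root path is a placeholder:

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Placeholder transform; the full augmentation pipeline is in section 5.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("dataset", transform=transform)
train_size = int(0.75 * len(dataset))   # 1725 of 2300 images
val_size = len(dataset) - train_size    # 575 images
train_ds, val_ds = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)
print(dataset.classes)  # ['F_Breakage', 'F_Crushed', ..., 'R_Normal']
```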
5. Preprocessing & Augmentation — reasoning and choices

- Why each step?
  - RandomHorizontalFlip() — provides invariance to left-right orientation; useful because damage on the left vs right of a vehicle can be symmetric.
  - RandomRotation(10) — small rotations model camera-tilt differences in real-world captures.
  - ColorJitter(...) — simulates lighting and exposure variation across images (important for field-captured photos).
  - Resize(224,224) + standard ImageNet normalization — compatible with pretrained backbones (ResNet/EfficientNet) and ensures stable optimization. The full pipeline is sketched below.
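A sketch of the transform pipeline described above; the exact ColorJitter strengths are assumptions, since the source does not list them:

```python
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),                     # left/right invariance
    transforms.RandomRotation(10),                         # small camera tilt
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # lighting variation (assumed values)
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

# Validation keeps only the deterministic steps.
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])
```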
6. Models, design choices & intuition
Model 1 — Simple CNN (from scratch)
- Architecture: 3 convolutional blocks (16, 32, 64 filters) with ReLU and MaxPool, then fully-connected layers 64*28*28 -> 512 -> num_classes. (A sketch follows this list.)
- Why try it? A baseline to confirm the dataset is learnable and to practice designing from-scratch networks.
- Results observed: validation accuracy improved from ~37% to ~57%. The model can learn but is likely underpowered for complex visual patterns and prone to overfitting on a small dataset.
- Key intuition: small networks require careful capacity tuning — too small => underfit; too big => overfit. Training from scratch with limited images is also harder.
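A minimal sketch of the described architecture; the kernel sizes are assumptions, while the filter counts and FC dimensions match the text:

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 112 -> 56
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```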
Model 2 — CNN with Regularization (BatchNorm + Dropout + weight decay)
- Additions: BatchNorm after the convolutions, Dropout before the final FC layer, and L2 weight decay in the optimizer (a sketch follows this list).
- Why: BatchNorm stabilizes training and speeds convergence; Dropout and weight decay reduce overfitting by discouraging co-adaptation.
- Observations: training became noisier at first, and validation accuracy dropped to ~50% vs the baseline. Interpretation: heavy regularization on a small-capacity network can harm learning because it restricts capacity the model needs to fit the patterns. This is a useful diagnostic: regularization only helps when capacity and dataset size are balanced.
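A sketch of the additions relative to SimpleCNN above; the dropout probability and weight_decay value are illustrative, not the exact values used:

```python
import torch.nn as nn
import torch.optim as optim

class RegularizedCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 512), nn.ReLU(),
            nn.Dropout(0.5),                    # assumed probability
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = RegularizedCNN()
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # L2 penalty
```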
Model 3 — Transfer Learning with EfficientNet-B0
- Approach: load pretrained efficientnet_b0 (ImageNet weights), freeze all parameters, replace the classifier, and train only the classifier head (sketched below).
- Why try it? Transfer learning brings strong ImageNet visual features that often generalize well on small datasets.
- Results observed: validation accuracy improved to ~65.7% after training the classifier head.
- Takeaway: transfer learning provides a large boost because low/mid-level features (edges, textures) are reused; however, freezing too many layers may limit adaptation to domain-specific features (car damage patterns).
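A sketch of the frozen-backbone setup, assuming torchvision's efficientnet_b0:

```python
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False   # freeze the pretrained backbone

# efficientnet_b0's classifier is Sequential(Dropout, Linear(1280, 1000)),
# so swap the final Linear; the new layer trains by default.
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 6)
```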
Model 4 — Transfer Learning with ResNet50 (layer4 unfrozen)
- Approach: load resnet50 pretrained weights, freeze all layers, then unfreeze layer4 and fc. Replace fc with Dropout(dropout_rate) + Linear (sketched below).
- Why unfreeze layer4? layer4 contains higher-level features (object parts and their combinations). Fine-tuning these helps adapt to domain specifics while keeping earlier layers stable.
- Results observed: substantial improvement — validation accuracy reached ~76–79% depending on the run and hyperparameters. This became the strong contender.
- Intuition: fine-tuning deeper layers lets the network learn domain-specific high-level cues (e.g., crushed vs breakage patterns) while retaining generic low-level features.
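A sketch of this setup; the helper name build_resnet is my own:

```python
import torch.nn as nn
from torchvision import models

def build_resnet(num_classes=6, dropout_rate=0.2):
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False         # freeze everything first
    for param in model.layer4.parameters():
        param.requires_grad = True          # adapt high-level features
    in_features = model.fc.in_features      # 2048 for ResNet50
    model.fc = nn.Sequential(               # new head trains by default
        nn.Dropout(dropout_rate),
        nn.Linear(in_features, num_classes),
    )
    return model
```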
Model 5 — Hyperparameter Tuning with Optuna (ResNet50)
- Hyperparameters tuned:
  - Learning rate (lr) in [1e-5, 1e-2] (log scale)
  - Dropout rate in [0.2, 0.7]
- Setup: run Optuna with n_trials=20. For each trial:
  - Instantiate ResNet50 with the suggested dropout.
  - Unfreeze layer4 and fc; keep the other layers frozen.
  - Train for a few epochs for a quick evaluation and allow Optuna pruning.
  - Report validation accuracy as the objective to maximize.
- Best trials: several trials reached ~80% validation accuracy during the quick search; the best recorded trial gave ~80.17% accuracy.
- Final training: re-trained ResNet50 with the chosen hyperparameters (lr=0.005, dropout=0.2) for several epochs and obtained ~80.52% validation accuracy. The objective function is sketched below.
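A sketch of the objective, assuming build_resnet() from the previous sketch; train_one_epoch, evaluate, device, and the loaders are assumed helpers, and the per-trial epoch count is illustrative:

```python
import optuna
import torch.nn as nn
import torch.optim as optim

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout_rate = trial.suggest_float("dropout_rate", 0.2, 0.7)

    model = build_resnet(num_classes=6, dropout_rate=dropout_rate).to(device)
    optimizer = optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    criterion = nn.CrossEntropyLoss()

    for epoch in range(3):                    # short runs for a quick search
        train_one_epoch(model, train_loader, criterion, optimizer)
        val_acc = evaluate(model, val_loader)
        trial.report(val_acc, epoch)
        if trial.should_prune():              # let Optuna stop weak trials early
            raise optuna.TrialPruned()
    return val_acc

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_trial.params)                # best run: lr=0.005, dropout=0.2
```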
7. Training details
- Device: CUDA (GPU) used during training.
- Loss: nn.CrossEntropyLoss() for multi-class classification.
- Optimizer: optim.Adam(...) for both the from-scratch and transfer learning experiments.
- Batch size: 32
- Image size: 224x224
- Data split: 75% train / 25% validation
- Model saving: torch.save(model.state_dict(), 'saved_model.pth'). A minimal loop with these settings is sketched below.
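A minimal sketch of the loop; the loaders come from the earlier dataset sketch, and the learning rate and epoch count are illustrative:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = build_resnet().to(device)            # from the Model 4 sketch
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

for epoch in range(10):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    model.eval()
    correct = total = 0
    with torch.no_grad():                     # no gradients during validation
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: loss={running_loss / len(train_loader):.4f} "
          f"val_acc={correct / total:.4f}")

torch.save(model.state_dict(), "saved_model.pth")
```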
8. Results (final model)
- Final chosen model: ResNet50 (layer4 + fc unfrozen) with tuned hyperparameters.
- After selecting hyperparameters, I retrained the ResNet model and obtained ~80.5% validation accuracy.
- Validation metrics (example classification report): the per-class precision/recall/F1 report shows good performance across the 6 classes, with macro-avg F1 ≈ 0.79 and weighted-avg F1 ≈ 0.80.
- Confusion matrix: the matrix highlights classes where the model confuses severity (e.g., breakage vs crushed) — useful for targeted data augmentation. Analyzing confusable class pairs (e.g., F_Crushed vs F_Breakage, or R_Normal vs R_Breakage) reveals which augmentations or additional data could help. The reporting code is sketched below.
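A sketch of the reporting step, assuming a trained model and the val_loader and device from the earlier sketches:

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

class_names = ['F_Breakage', 'F_Crushed', 'F_Normal',
               'R_Breakage', 'R_Crushed', 'R_Normal']
print(classification_report(all_labels, all_preds, target_names=class_names))
print(confusion_matrix(all_labels, all_preds))   # rows = true, cols = predicted
```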
9. Streamlit app — inference & UX
- Files: app.py, model_helper.py, and saved weights saved_model.pth.
- What it does:
  - Provides a drag-and-drop image uploader.
  - Loads the saved ResNet model and performs inference.
  - Displays the predicted class label (one of the six) along with the uploaded image.
- Key inference details in model_helper.py (sketched below):
  - Rebuilds the model architecture exactly as in training (ResNet50 with dropout in fc).
  - Loads saved_model.pth via load_state_dict.
  - Applies the same preprocessing (Resize 224, ToTensor, Normalize with ImageNet mean/std).
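A sketch of what the model_helper.py prediction path could look like; the predict function name is an assumption, and build_resnet comes from the Model 4 sketch:

```python
import torch
from PIL import Image
from torchvision import transforms

CLASS_NAMES = ['F_Breakage', 'F_Crushed', 'F_Normal',
               'R_Breakage', 'R_Crushed', 'R_Normal']

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Rebuild the training-time architecture, then load the saved weights.
model = build_resnet(num_classes=6, dropout_rate=0.2)
model.load_state_dict(torch.load("saved_model.pth", map_location="cpu"))
model.eval()

def predict(image_file):
    image = Image.open(image_file).convert("RGB")
    batch = preprocess(image).unsqueeze(0)    # add a batch dimension
    with torch.no_grad():
        pred = model(batch).argmax(dim=1).item()
    return CLASS_NAMES[pred]
```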
10. Deployment
- Deployment: the complete project (training code, saved model, and inference helpers) is hosted on GitHub and deployed as a Streamlit web app. Users can upload a car image from any device and get a real-time prediction of the damage category, making the model globally accessible without any local setup.
11. Implementation & engineering details
- Frameworks: PyTorch, torchvision, Optuna, scikit-learn (metrics), matplotlib (plots).
- Hardware: GPU (cuda) for faster training.
- Training loop details:
  - CrossEntropyLoss for multi-class classification.
  - Adam optimizer; weight_decay used when experimenting with regularization.
  - Separate training and validation loops; validation performed at epoch end under torch.no_grad().
  - Batch-level loss, epoch-level average loss, and validation accuracy printed for monitoring.
- Model saving: torch.save(model.state_dict(), 'saved_model.pth') for deployment or later inference.
12. Reproducibility & best practices followed
- Fixed train/val split and recorded dataset sizes.
- Used ImageNet normalization for transfer learning compatibility.
- Kept experiments modular: the same train_model() function was reused across models for a fair comparison.
- Used Optuna with pruning to search hyperparameters efficiently and logged the best trials.
- Saved final model weights and logged the confusion matrix + classification report for auditability.
