Deep Learning Projects

The objective of this project is to develop a Proof of Concept (POC) for car damage detection that can classify the condition of a car's front and rear into six predefined categories. The solution will be delivered as a trained deep learning model integrated into a Streamlit app. This POC will serve as a foundation for evaluating the viability of an automated damage detection system for VROOM Cars.

Vehicle Damage Detection

1. Project Summary

  • An end-to-end deep learning pipeline to classify vehicle images into 6 damage categories (Front Breakage, Front Crushed, Front Normal, Rear Breakage, Rear Crushed, Rear Normal).

  • The project includes dataset preparation and augmentation, training and comparing 5 different models (spanning from-scratch CNNs to transfer learning with ResNet), hyperparameter tuning with Optuna, evaluation with a confusion matrix and classification reports, model saving, and a Streamlit app for easy end-user inference.

  • The final ResNet-based model achieves ~80.5% accuracy on the held-out validation set.

2. Why this problem matters

  • Automated vehicle damage classification speeds up insurance claim triaging, helps repair shops estimate repairs, and can be integrated into mobile apps for customers.

  • The 6-class split models a realistic use-case where both location (front/rear) and severity (normal/breakage/crushed) matter.

3. What I built (high-level)

  • Collected/organized the image dataset and used ImageFolder layout: dataset/<class_name>/*.jpg.

  • Built augmentation and preprocessing pipeline appropriate for ConvNets and pretrained backbones.

  • Trained multiple model families: a simple CNN, a CNN with regularization, EfficientNet-B0 (transfer learning), ResNet50 (transfer learning), and a ResNet50 fine-tuned with hyperparameters found by Optuna.

  • Evaluated each model on the same validation split and compared metrics (accuracy, precision, recall, F1 per class) and confusion matrix.

  • Selected the best model, exported weights, and wrapped inference inside a Streamlit app for end-to-end demo.

  • Documented reproducible steps, saved model artifacts, and created a small guide for deploying the app.

4. Dataset & Setup

  • Source: Local folder organized for ImageFolder with subfolders per class.

  • Counts: 2300 images total (example), split 75% train / 25% val (1725 / 575). Batch size 32.

  • Classes (example): ['F_Breakage','F_Crushed','F_Normal','R_Breakage','R_Crushed','R_Normal'] (6 classes).

  • Why this split? 75/25 is a common choice giving enough data to train while retaining a reasonable validation set to measure generalization.
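
A minimal sketch of this setup, assuming the folder layout above (the basic transform here is a placeholder; the full augmentation pipeline is in section 5):

```python
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split

# Basic transform for illustration; the full augmentation pipeline is in section 5.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# ImageFolder infers the 6 class labels from the subfolder names.
dataset = datasets.ImageFolder("dataset", transform=transform)

# 75% train / 25% validation (1725 / 575 for 2300 images).
train_size = int(0.75 * len(dataset))
val_size = len(dataset) - train_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)

print(dataset.classes)
# ['F_Breakage', 'F_Crushed', 'F_Normal', 'R_Breakage', 'R_Crushed', 'R_Normal']
```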

5. Preprocessing & Augmentation — reasoning and choices

  • Why each step?

    • RandomHorizontalFlip() — provides invariance to left-right orientation; useful because the same type of damage can appear on either side of a vehicle.

    • RandomRotation(10) — small rotations model camera tilt differences from real-world captures.

    • ColorJitter(...) — simulates lighting and exposure variation across images (important for field-captured photos).

    • Resize(224,224) + standard ImageNet normalization — compatible with pretrained backbones (ResNet/EfficientNet) and ensures stable optimization.
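
Put together, the training-time pipeline looks roughly like the sketch below; the exact ColorJitter strengths are assumptions, not the project's recorded values:

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),              # input size expected by the backbones
    transforms.RandomHorizontalFlip(),          # left-right invariance
    transforms.RandomRotation(10),              # small camera-tilt variation
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # lighting variation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```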

6. Models, design choices & intuition

Model 1 — Simple CNN (from scratch)
  • Architecture: 3 convolutional blocks (16, 32, 64 filters), ReLU, MaxPool, then fully-connected layers 64*28*28 -> 512 -> num_classes.

  • Why try it? Baseline to confirm dataset is learnable and to practice designing from-scratch networks.

  • Results observed: Validation accuracy improved from ~37% to ~57%. This shows the model can learn, but it is likely underpowered for complex visual patterns and prone to overfitting on a small dataset.

  • Key intuition: Small networks require careful capacity tuning — too small => underfit; too big => overfit. Also, training from scratch with limited images is harder.
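
A sketch of this baseline, assuming 3x3 kernels with padding 1 (a choice that yields the 64*28*28 flatten size for 224x224 inputs):

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 112 -> 56
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```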

Model 2 — CNN with Regularization (BatchNorm + Dropout + weight decay)
  • Additions: BatchNorm after convs, Dropout before final FC, L2 weight decay in optimizer.

  • Why: BatchNorm stabilizes training and speeds convergence; Dropout and weight decay reduce overfitting by discouraging co-adaptations.

  • Observations: Training became noisier at first, and validation accuracy dropped to ~50%, below the ~57% baseline. Interpretation: heavy regularization on a small-capacity network can hurt learning because it restricts the capacity the model needs to fit real patterns. This is a useful diagnostic step: regularization only helps when capacity and dataset size are balanced.
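
A sketch of these additions on top of the baseline; the dropout rate (0.5) and weight_decay value (1e-4) are assumptions:

```python
import torch.nn as nn
import torch.optim as optim

class RegularizedCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 512),
            nn.ReLU(),
            nn.Dropout(0.5),                 # dropout before the final FC (rate assumed)
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = RegularizedCNN()
# weight_decay adds the L2 penalty; the value here is an assumption.
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```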

Model 3 — Transfer Learning with EfficientNet-B0
  • Approach: Load pretrained efficientnet_b0 (ImageNet weights), freeze all parameters, replace classifier, train classifier head.

  • Why try it? Transfer learning brings strong ImageNet visual features that often generalize well for small datasets.

  • Results observed: Validation accuracy improved to ~65.7% after training the classifier head.

  • Takeaway: Transfer learning provides a large boost because low/mid-level features (edges, textures) are reused; however, freezing too many layers may limit adaptation to domain-specific features (car damage patterns).
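
A minimal sketch of this setup using torchvision's pretrained-weights API:

```python
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False   # freeze the whole pretrained backbone

# efficientnet_b0's classifier is Sequential(Dropout, Linear); swap the final
# Linear for a 6-class head. The new layer's parameters remain trainable.
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 6)
```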

Model 4 — Transfer Learning with ResNet50 (layer4 unfrozen)
  • Approach: Load resnet50 pretrained weights, freeze all layers, then unfreeze layer4 and fc. Replace fc with Dropout(dropout_rate) + Linear.

  • Why unfreeze layer4? layer4 contains higher-level features (object parts, combinations). Fine-tuning these helps adapt to domain specifics while keeping earlier layers stable.

  • Results observed: Substantial improvement — validation accuracy reached ~76–79% depending on the run and hyperparameters, making this the strongest contender.

  • Intuition: Fine-tuning deeper layers enables the network to learn domain-specific high-level cues (e.g., crushed vs breakage visual patterns) while retaining generic low-level features.
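
A sketch of this configuration; dropout_rate is the value later tuned by Optuna:

```python
import torch.nn as nn
from torchvision import models

def build_resnet50(num_classes=6, dropout_rate=0.2):
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False          # freeze everything first
    for param in model.layer4.parameters():
        param.requires_grad = True           # fine-tune high-level features
    model.fc = nn.Sequential(                # new head; trainable by default
        nn.Dropout(dropout_rate),
        nn.Linear(model.fc.in_features, num_classes),
    )
    return model
```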

Model 5 — Hyperparameter Tuning with Optuna (ResNet50)
  • Hyperparameters tuned:

    • Learning rate (lr) in [1e-5, 1e-2] (log scale)

    • Dropout rate in [0.2, 0.7]

  • Setup: Use Optuna to run n_trials=20 (a minimal sketch appears at the end of this section). For each trial we:

    • Instantiate ResNet50 with the suggested dropout rate.

    • Unfreeze layer4 and fc, keeping all other layers frozen.

    • Train for a few epochs for a quick evaluation, allowing Optuna to prune unpromising trials.

    • Report validation accuracy as the objective to maximize.

  • Best trials:

    • Several trials reached ~80% validation accuracy during quick optimization. The best recorded trial gave ~80.17% accuracy with certain lr/dropout combos.

  • Final training: Re-trained ResNet50 with the chosen hyperparameters (lr=0.005, dropout=0.2) for several epochs and obtained ~80.52% validation accuracy.
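
A minimal sketch of the search, assuming the build_resnet50 helper from the Model 4 sketch and a train_model(model, lr, epochs) helper that returns validation accuracy (the project's actual train_model signature may differ):

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout_rate = trial.suggest_float("dropout_rate", 0.2, 0.7)

    model = build_resnet50(num_classes=6, dropout_rate=dropout_rate)
    for epoch in range(3):                   # short budget per trial (assumed)
        val_acc = train_model(model, lr=lr, epochs=1)
        trial.report(val_acc, epoch)         # lets Optuna prune weak trials
        if trial.should_prune():
            raise optuna.TrialPruned()
    return val_acc                           # validation accuracy to maximize

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_trial.params)               # e.g. lr=0.005, dropout_rate=0.2
```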

7. Training details

  • Device: CUDA (GPU) used during training.

  • Loss: nn.CrossEntropyLoss() for multi-class classification.

  • Optimizer: optim.Adam(...) (used Adam for both from-scratch and transfer learning experiments).

  • Batch size: 32

  • Image size: 224x224

  • Data split: 75% train / 25% validation

  • Model saving: torch.save(model.state_dict(), 'saved_model.pth')
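
A condensed sketch of the training/validation loop under these settings, reusing model, train_loader, and val_loader from the earlier snippets; the learning rate and epoch count here are placeholders:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
# Only optimize parameters left trainable (relevant for the transfer models).
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():                    # validation at epoch end
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: val_acc={correct / total:.4f}")

torch.save(model.state_dict(), "saved_model.pth")
```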

8. Results (final model)

  • Final chosen model: ResNet50 (layer4 + fc unfrozen), tuned hyperparams.

  • Validation Metrics (Example Classification Report):

[Figure: per-class classification report for the final model]
  • Confusion matrix:  The confusion matrix highlights classes where the model confuses severity (e.g., breakage vs crushed) — useful for targeted data augmentation.

[Figure: confusion matrix for the final model]
  • After selecting hyperparameters, I retrained the ResNet model and obtained ~80.5% validation accuracy.

  • The classification report (precision/recall/F1 per class) shows good performance across the 6 classes, with macro-avg F1 ≈ 0.79 and weighted-avg F1 ≈ 0.80.

  • Confusion matrix analysis helps identify confusable class pairs (e.g., F_Breakage vs F_Crushed, or R_Breakage vs R_Normal); these insights reveal which data augmentations or additional data could help.
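
A sketch of how these reports can be produced with sklearn, reusing model, device, val_loader, and dataset.classes from the earlier snippets:

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

print(classification_report(all_labels, all_preds, target_names=dataset.classes))
print(confusion_matrix(all_labels, all_preds))
```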

9. Streamlit app — inference & UX

  • Files: app.py, model_helper.py, saved weights saved_model.pth.

  • What it does:

    • Provides a drag-and-drop image uploader.

    • Loads the saved ResNet model and performs inference.

    • Displays predicted class label (one of the six), along with the uploaded image.

  • Key inference details in model_helper.py:

    • Rebuilds the model architecture exactly as in training (ResNet50 with dropout in the fc head).

    • Loads saved_model.pth via load_state_dict.

    • Applies the same preprocessing (Resize 224, ToTensor, Normalize with ImageNet mean/std).
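
A minimal sketch of app.py; the predict helper in model_helper.py (name assumed) is expected to rebuild the model, load saved_model.pth, apply the preprocessing above, and return a class label:

```python
# app.py
import streamlit as st
from model_helper import predict   # helper name assumed

st.title("Vehicle Damage Detection")
uploaded = st.file_uploader("Upload a car image", type=["jpg", "jpeg", "png"])

if uploaded is not None:
    # Persist the upload so the helper can open it by path.
    with open("temp.jpg", "wb") as f:
        f.write(uploaded.getbuffer())
    st.image(uploaded, caption="Uploaded image")
    st.success(f"Predicted class: {predict('temp.jpg')}")
```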

[Figure: Streamlit app screenshot]

10. Deployment

  • The complete project (training code, saved model, and inference helpers) is hosted on GitHub and deployed as a Streamlit web app. Users can upload a car image from any device and get a real-time prediction of the damage category, making the model accessible without any local setup.

11. Implementation & engineering details

  • Frameworks: PyTorch, torchvision, Optuna, sklearn (metrics), matplotlib (plots).

  • Hardware: GPU (CUDA) for faster training.

  • Training loop details:

    • CrossEntropyLoss for multi-class classification.

    • Adam optimizer; weight_decay used when experimenting with regularization.

    • Training/Val loops separated; validation performed at epoch end with torch.no_grad().

    • Batch-level logs for loss, epoch-level average loss, and validation accuracy printed for monitoring.

  • Model saving: torch.save(model.state_dict(), 'saved_model.pth') for deployment or inference later.

12. Reproducibility & best practices followed

  • Fixed train/val split and recorded dataset sizes.

  • Used ImageNet normalization for transfer learning compatibility.

  • Kept experiments modular: same train_model() function reused across models for fair comparison.

  • Used Optuna with pruning to efficiently search hyperparameters and logged best trials.

  • Saved final model weights and logged confusion matrix + classification report for auditability.
