Volleyball Detection with Deep Learning #1005 #1091

Open
ash-heinz wants to merge 1 commit into abhisheks008:main from ash-heinz:volleyball-detection-with-deep-learning

@ash-heinz

Pull Request for DL-Simplified 💡

Issue Title : Volleyball Detection using Deep Learning

  • Info about the related issue (Aim of the project) : The primary objective of this project is to build, train, and benchmark multiple object detection architectures to determine the optimal algorithm for tracking a fast-moving, small object (a volleyball) in video footage.
  • Name: Ashwast Gupta
  • GitHub ID: ash-heinz
  • Email ID: ashwast.gupta@gmail.com
  • Identify yourself: GSSoC'26 Participant

Closes: #1005

Models Evaluated

  1. YOLOv8 (Nano & Medium): State-of-the-art single-stage detector.
  2. Faster R-CNN (ResNet50 FPN): Heavyweight, highly accurate two-stage detector.
  3. SSD300 (VGG16): Single Shot MultiBox Detector, a single-stage detector with a VGG16 backbone and a fixed 300x300 input, included as a midpoint between speed and architectural complexity.
  4. RetinaNet (ResNet50 FPN): Single-stage detector utilizing Focal Loss to handle extreme foreground-background class imbalance (critical for finding a small ball on a large court).
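To illustrate why Focal Loss helps with the imbalance mentioned for RetinaNet, here is a minimal pure-Python sketch of the binary focal loss for a single prediction. The function name `binary_focal_loss` is hypothetical, and `alpha=0.25`, `gamma=2.0` are the commonly used defaults, assumed here rather than taken from this PR's training code.

```python
import math


def binary_focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction (illustrative sketch).

    p      -- predicted probability of the foreground class, 0 < p < 1
    target -- 1 for foreground (volleyball), 0 for background
    alpha  -- class-balancing weight for the foreground class (assumed default)
    gamma  -- focusing parameter; larger values down-weight easy examples more
    """
    # Probability the model assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background location (model says 1% ball) versus a hard one (60% ball).
easy_background = binary_focal_loss(0.01, target=0)
hard_background = binary_focal_loss(0.60, target=0)
print(easy_background, hard_background)
```

Because most locations in a frame are easy background, the modulating factor keeps their combined loss from swamping the few locations that actually contain the ball.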

Methodology & Approach

  1. Baseline Establishment: Trained YOLOv8 Nano as a fast baseline that was easily confused, then scaled up to YOLOv8 Medium to address motion blur and tracking inconsistencies.
  2. Custom Data Pipeline: Engineered a PyTorch DataLoader to bridge YOLO text annotations into tensor dictionaries required by torchvision models.
  3. Model Head Modification: Loaded pre-trained backbones (ResNet50/VGG16) and manually modified the prediction heads to output exactly 2 classes: Background (0) and Volleyball (1).
  4. Training Stability: Applied gradient clipping (max_norm=1.0) and a reduced learning rate (0.0005) to prevent exploding gradients (NaN loss) during the volatile early epochs of SSD and RetinaNet training.
  5. Evaluation Engine: Wrote a custom local Python script (benchmark.py) using OpenCV to run exactly 100 frames of a test video through all four architectures back-to-back and measure inference speed (FPS). Used torchmetrics to calculate Mean Average Precision (mAP) for the PyTorch models.
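The annotation bridging in step 2 can be sketched as follows. This is not the PR's DataLoader, only a minimal pure-Python illustration under stated assumptions: YOLO label lines have the form `class x_center y_center width height` with coordinates normalized to [0, 1], torchvision detection models expect absolute `[x1, y1, x2, y2]` boxes, and label 0 is reserved for background, so YOLO class 0 becomes label 1. The function names are hypothetical.

```python
def yolo_line_to_box(line, img_w, img_h):
    """Convert one YOLO label line to an absolute-pixel box and a label."""
    cls, xc, yc, w, h = line.split()
    # De-normalize the center and size to pixels.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Center/size format -> corner format.
    box = [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]
    # Shift class ids by one because 0 is the background class.
    return box, int(cls) + 1


def yolo_to_target(label_text, img_w, img_h):
    """Build a target dictionary from the contents of a YOLO .txt file."""
    boxes, labels = [], []
    for line in label_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        box, label = yolo_line_to_box(line, img_w, img_h)
        boxes.append(box)
        labels.append(label)
    # In a real Dataset these lists would be wrapped in float32 / int64 tensors.
    return {"boxes": boxes, "labels": labels}


target = yolo_to_target("0 0.5 0.5 0.2 0.4\n", img_w=100, img_h=200)
print(target)
```

Frames with no ball produce empty `boxes` and `labels` lists, which a real pipeline would need to turn into correctly shaped empty tensors.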

How Has This Been Tested? ⚙️

Testing used a clip from a different volleyball match, taken from YouTube. The best checkpoint of each model, obtained from training on Google Colab, was run on this test video. A Python script ran inference and displayed the output frame by frame, then saved the annotated output video for review.
Inference efficiency was measured by running a benchmark on every model to obtain its FPS.
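A minimal sketch of how such an FPS measurement can be structured is shown below. It is not the contents of benchmark.py: the function name `measure_fps`, the `warmup` parameter, and the stand-in model are all assumptions made for illustration, and the frames here are plain Python lists rather than OpenCV images.

```python
import time


def measure_fps(infer, frames, warmup=5):
    """Return frames per second for calling infer(frame) on every frame.

    infer  -- any callable that runs one model on one frame
    frames -- iterable of frames (in practice, arrays read with OpenCV)
    warmup -- untimed calls made first so one-time setup costs are excluded
    """
    frames = list(frames)
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    # Note: on a GPU, pending work must be synchronized before reading the
    # clock (for example with torch.cuda.synchronize()), or FPS is overstated.
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return len(frames) / elapsed


# Stand-in "model" and 100 dummy frames, mirroring the 100-frame benchmark.
dummy_frames = [[0] * 1000 for _ in range(100)]
fps = measure_fps(lambda frame: sum(frame), dummy_frames)
print(f"{fps:.1f} FPS")
```

Running every model over the same frames with the same harness keeps the comparison fair, since decoding and warm-up costs are treated identically for each.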

Checklist: ☑️

  • My code follows the guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly wherever it was hard to understand.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added things that prove my fix is effective or that my feature works.
  • Any dependent changes have been merged and published in downstream modules.

@github-actions

github-actions Bot commented Jun 1, 2026

Our team will soon review your PR. Thanks @ash-heinz :)
