SoccerVision
PeterDrury uses computer vision and NLP to provide real-time player tracking, stats, and human-like commentary for football matches.
Video
https://drive.google.com/file/d/1VUyNu5yKXSJby9WmLS8THQpklfR_VVRZ/view?usp=drive_link
Project Description
Project Concept — SoccerVision: AI Football Video Analyst
SoccerVision is a multimodal AI system that processes football (soccer) match clips to automatically analyze, annotate, and describe what happens on the pitch.
Core Features
Player & Object Detection: Detects and labels players, referees, goalkeepers, and the ball using YOLO-based models.
Pitch Recognition: Identifies the football field and tracks player movement.
Event Description: Generates structured chronological descriptions of key actions (passes, shots, goals) using a multimodal model.
Annotated Video Output: Overlays the AI-generated commentary as subtitles on the annotated video, so the description can be read alongside the action.
Technical Overview
Processing Pipeline:
Extract video frames.
Run detection models (players, ball, pitch).
Track objects over time and assign player IDs.
Send selected frames to a multimodal LLM (Gemini) for contextual event description.
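The tracking step above (keeping a stable ID per player across frames) can be illustrated with a minimal sketch. The project does not say which tracker it uses, so this example assumes a simple greedy IoU matcher. The names `IouTracker`, `iou`, and `min_iou` are hypothetical and exist only for illustration. Boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates, as a YOLO-style detector would typically produce.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IouTracker:
    """Hypothetical greedy tracker, not the project's actual implementation."""
    min_iou: float = 0.3          # assumed threshold for accepting a match
    next_id: int = 1
    tracks: dict[int, Box] = field(default_factory=dict)

    def update(self, detections: list[Box]) -> list[int]:
        """Return one track ID per detection, in input order."""
        # Score every (track, detection) pair, best overlap first.
        candidates = sorted(
            ((iou(box, det), tid, i)
             for tid, box in self.tracks.items()
             for i, det in enumerate(detections)),
            reverse=True,
        )
        assigned: dict[int, int] = {}  # detection index -> track ID
        used: set[int] = set()
        for score, tid, i in candidates:
            if score < self.min_iou:
                break  # remaining pairs overlap too little to match
            if tid in used or i in assigned:
                continue
            assigned[i] = tid
            used.add(tid)
        # Unmatched detections start new tracks.
        for i in range(len(detections)):
            if i not in assigned:
                assigned[i] = self.next_id
                self.next_id += 1
        # Simplification: tracks not seen in this frame are dropped.
        self.tracks = {assigned[i]: det for i, det in enumerate(detections)}
        return [assigned[i] for i in range(len(detections))]


tracker = IouTracker()
first = tracker.update([(0, 0, 10, 10), (100, 100, 110, 110)])
second = tracker.update([(101, 100, 111, 110), (1, 0, 11, 10)])
print(first, second)
```

Because this sketch drops a track as soon as it misses one frame, a player who is briefly occluded would get a new ID. A production tracker would usually keep lost tracks alive for a few frames and add motion or appearance cues.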
Infrastructure:
Deployed on Google Cloud Run for scalable video analysis.
Uses Cloud Storage for frame management and Cloud Functions to trigger analysis.
Outputs both a JSON description file and an annotated video with subtitles.
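As an illustration of how the JSON description could feed the subtitle track, the sketch below converts a list of timed events into SubRip (SRT) text. The JSON schema shown here (`start`, `end`, `text`, with times in seconds) is an assumption, since the project's actual output format is not documented, and the function names `to_timestamp` and `events_to_srt` are hypothetical.

```python
import json


def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def events_to_srt(events_json: str) -> str:
    """Turn a JSON array of {start, end, text} events into SRT text.

    The schema is assumed for illustration, not taken from the project.
    """
    # Sort by start time so the cues are chronological.
    events = sorted(json.loads(events_json), key=lambda e: e["start"])
    blocks = []
    for index, event in enumerate(events, start=1):
        blocks.append(
            f"{index}\n"
            f"{to_timestamp(event['start'])} --> {to_timestamp(event['end'])}\n"
            f"{event['text']}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


sample = json.dumps([
    {"start": 4.0, "end": 6.5, "text": "Shot on goal"},
    {"start": 0.0, "end": 2.0, "text": "Kick-off"},
])
print(events_to_srt(sample))
```

An SRT file produced this way can be attached to the video as a separate track or rendered into the frames by a video tool, depending on how the annotated output is built.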
Current Progress
Implemented the full detection-to-description pipeline and subtitle overlay on videos.
Audio commentary generation (text-to-speech) is in progress and will not be part of the live demo.
Future Work
Audio Commentary Integration: Add real-time spoken commentary synchronized with video.
Statistical Insights: Compute metrics such as possession rate, pass accuracy, and player heatmaps.
Interactive Chat: Enable natural language queries on the clip (e.g., “What happened at the beginning?” or “Who made the assist?”).
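The possession rate metric listed above could be approximated from tracking output alone. The sketch below is one possible approach, not a planned implementation: it credits each frame to the team of the player nearest the ball, provided that player is within an assumed `max_distance` in pixels. The frame layout (`ball`, `players`) and the function name `possession_rate` are hypothetical.

```python
import math
from collections import Counter

Point = tuple[float, float]


def possession_rate(frames: list[dict], max_distance: float = 50.0) -> dict[str, float]:
    """Share of frames in which each team's player is closest to the ball.

    Each frame is assumed to look like:
        {"ball": (x, y) or None, "players": [(team, (x, y)), ...]}
    Frames with no ball, no players, or no player within max_distance
    are ignored (the ball is treated as contested or out of play).
    """
    counts: Counter[str] = Counter()
    for frame in frames:
        ball = frame["ball"]
        if ball is None or not frame["players"]:
            continue
        # Nearest player by Euclidean distance in image coordinates.
        team, position = min(frame["players"], key=lambda p: math.dist(p[1], ball))
        if math.dist(position, ball) <= max_distance:
            counts[team] += 1
    total = sum(counts.values())
    return {team: n / total for team, n in counts.items()} if total else {}


frames = [
    {"ball": (10, 10), "players": [("A", (12, 10)), ("B", (40, 40))]},
    {"ball": (20, 10), "players": [("A", (22, 11)), ("B", (60, 40))]},
    {"ball": None, "players": [("A", (30, 10)), ("B", (60, 40))]},
    {"ball": (70, 40), "players": [("A", (30, 10)), ("B", (68, 41))]},
]
print(possession_rate(frames))
```

Distances in pixels are distorted by camera perspective, so a more faithful version would first project positions onto pitch coordinates using the detected field.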
Vision
Build a fully autonomous football analyst capable of detecting, describing, and explaining match events in real time through scalable cloud infrastructure.