Hackathon Showcase 1st Place Winner

Posey

4 members

Project Description

Posey is a mobile AI coaching app that turns the phone in your pocket into a personal strength-training coach. The user records a set inside the app (squat, pushup, or bicep curl today, with more exercises planned), and within seconds Posey returns an annotated playback with the 3D skeleton overlaid on every frame, a 0–100 form score, prioritized per-rep cues (“knees collapsing inward at the bottom of rep 2”), and a longitudinal progress timeline. Under the hood, every clip runs through our pose-intelligence engine, which extracts a 33-point 3D skeleton with MediaPipe BlazePose, smooths the motion with a One-Euro or Kalman filter, segments the video into biomechanical reps and phases, and scores form against a YAML-driven rule library plus optional trainer reference clips. Every cue is traceable to a specific rule, body part, phase, and timestamp.
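To make the smoothing step concrete, here is a minimal sketch of a One-Euro filter for a single landmark coordinate. It follows the published algorithm (an adaptive low-pass filter whose cutoff rises with speed); the class name, parameter defaults, and per-coordinate usage are assumptions for illustration, not Posey's actual implementation.

```python
import math


def smoothing_factor(cutoff_hz: float, dt: float) -> float:
    """Exponential smoothing factor for a low-pass filter at the given cutoff."""
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)
    return 1.0 / (1.0 + tau / dt)


class OneEuroFilter:
    """One-Euro filter for one scalar signal (for example, a landmark's x value).

    min_cutoff: lower values remove more jitter when the joint is slow.
    beta: higher values reduce lag when the joint moves fast.
    The defaults here are illustrative, not tuned values.
    """

    def __init__(self, min_cutoff: float = 1.0, beta: float = 0.0, d_cutoff: float = 1.0):
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.d_cutoff = d_cutoff
        self._x_prev = None
        self._dx_prev = 0.0

    def __call__(self, x: float, dt: float) -> float:
        if self._x_prev is None:
            # The first sample passes through unchanged.
            self._x_prev = x
            return x
        # Smooth the derivative first, then use it to adapt the cutoff.
        dx = (x - self._x_prev) / dt
        a_d = smoothing_factor(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self._dx_prev
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = smoothing_factor(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self._x_prev
        self._x_prev = x_hat
        self._dx_prev = dx_hat
        return x_hat
```

In a pipeline like the one described, one filter instance would be kept per landmark coordinate and called once per frame with dt set to the frame interval (1/30 s for 30 fps video).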

From experimental AI to venture-ready solution
We started the sprint with a research notebook of raw MediaPipe landmarks and ad‑hoc heuristics. Across rapid iterations we hardened it into a mobile-shippable engine:

Stage 1 → typed pipeline. Wrapped MediaPipe in a deterministic extract_pose_sequence with an explicit CPU delegate (avoiding NSOpenGL / headless-GL crashes on macOS and headless machines), a pinned model with cache-on-first-use, and a frame-aligned PoseFrameSequence so the in-app overlay never desynchronizes from the source video.
Stage 2 → meaning layer. pose_sequence_to_meaning converts raw landmarks into a versioned MeaningDocument JSON with metrics, motion segments, and transition events — the wire contract the mobile UI, the report card, and our future LLM coach all consume.
Stage 3 → explainable scorer. A modular pose_evaluation/ package (preprocessing, features, segmentation, scoring, reference, aggregation, feedback) produces an overall score, six sub-scores (posture, ROM, symmetry, stability, tempo, reference similarity), prioritized issues, and frame annotations — with safety caps that hard-clamp the score on severe biomechanical risks (knee valgus, back rounding) so the app never green-lights an injury pattern.
Stage 4 → integration-ready surface. A Gradio demo for judges and pilot users, a CLI, and a clean Python API designed to be wrapped behind a thin REST/gRPC gateway for the mobile client — JSON contracts that map 1:1 to the screens we’re building.
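The Stage 3 idea of combining sub-scores and then hard-clamping on severe risks can be sketched as follows. The six sub-score names come from the description above; the weights, cap values, and function name are hypothetical placeholders chosen only to show the mechanism.

```python
# Hypothetical weights for the six sub-scores named above (they sum to 1.0).
SUB_SCORE_WEIGHTS = {
    "posture": 0.25,
    "rom": 0.20,
    "symmetry": 0.15,
    "stability": 0.15,
    "tempo": 0.10,
    "reference_similarity": 0.15,
}

# Hypothetical safety caps: the highest score allowed when a risk is flagged.
SAFETY_CAPS = {
    "knee_valgus": 60.0,
    "back_rounding": 55.0,
}


def aggregate_score(sub_scores: dict[str, float], severe_risks: set[str]) -> float:
    """Weighted mean of the available sub-scores, clamped by any safety cap.

    Sub-scores that are absent (for example reference_similarity when no
    trainer clip exists) are skipped and the remaining weights renormalized.
    """
    total_weight = sum(SUB_SCORE_WEIGHTS[name] for name in sub_scores)
    weighted = sum(SUB_SCORE_WEIGHTS[name] * value for name, value in sub_scores.items())
    score = weighted / total_weight
    # The strictest cap among the flagged risks wins; no risk means no cap.
    cap = min((SAFETY_CAPS[r] for r in severe_risks if r in SAFETY_CAPS), default=100.0)
    return max(0.0, min(score, cap))
```

With these placeholder numbers, a set that scores 90 on every sub-score still cannot exceed 60 once knee valgus is flagged as severe, which is the "never green-light an injury pattern" behavior described in Stage 3.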

Market problem & traction
Problem: 1-on-1 coaching is the gold standard for safe strength training, but it costs $60–$150 per session and is out of reach for the 200M+ people who train at home or in commercial gyms without supervision. Existing consumer products (e.g., Apple Fitness, Tonal) mostly count reps or stream pre-recorded classes. To our knowledge, none gives rep-level form feedback with auditable reasoning in a phone-native experience.

Sprint traction: the engine runs end-to-end on CPU and processes short clips in real time on a laptop (an encouraging signal for on-device mobile inference); the full pytest suite is green; three exercises are shipped (squat, pushup, bicep curl), each configured purely in YAML, so adding the next one needs no code deploy; a synthetic biomechanical test harness regression-tests new rules without re-recording videos; and a Gradio share link is demo-ready as a stand-in for the mobile UI while the iOS/Android client is wired against the same JSON contracts.
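A synthetic harness of this kind can be illustrated with a generated knee-angle trace and a simple rep counter. This is a sketch under stated assumptions: the cosine-shaped trace, the angle thresholds, and both function names are hypothetical and stand in for whatever the real harness and segmenter do.

```python
import math


def synthetic_squat_knee_angles(
    reps: int,
    frames_per_rep: int = 60,
    top_deg: float = 170.0,
    bottom_deg: float = 90.0,
) -> list[float]:
    """Generate a smooth knee-angle trace: standing, down to the bottom, back up."""
    angles = []
    for i in range(reps * frames_per_rep):
        phase = (i % frames_per_rep) / frames_per_rep
        depth = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))  # 0 at top, 1 at bottom
        angles.append(top_deg - depth * (top_deg - bottom_deg))
    angles.append(top_deg)  # end the clip standing upright
    return angles


def count_reps(angles: list[float], down_below: float = 110.0, up_above: float = 150.0) -> int:
    """Count reps with hysteresis: dip below one threshold, then rise above another."""
    reps = 0
    in_bottom = False
    for angle in angles:
        if not in_bottom and angle < down_below:
            in_bottom = True
        elif in_bottom and angle > up_above:
            in_bottom = False
            reps += 1
    return reps
```

A regression test can then assert that three synthetic reps are counted as three, and that a deliberately shallow set (for example bottom_deg=130.0) is counted as zero, without recording any video.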

Autonomous agents / custom LLM strategy
Today the deterministic rule engine is the agent: a YAML-declared scorer that reads features, applies severity-tiered thresholds with soft penalties and per-rule safety caps (scoring.py + aggregation.py), and emits structured RuleViolation objects. This was a deliberate architectural choice: clinicians and insurers are unlikely to accept a black-box LLM grading a squat. The engine produces grounded, citable evidence, and the planned LLM layer only narrates it.
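As an illustration of severity-tiered thresholds, the sketch below evaluates one rule against a feature value and emits a structured violation. RuleViolation is named in the text, but its fields here, the rule dictionary (shaped like what a YAML entry might parse to), the feature name, and all threshold values are assumptions. Soft penalties and per-rule safety caps are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuleViolation:
    """Hypothetical field layout; the real RuleViolation may differ."""
    rule_id: str
    body_part: str
    phase: str
    rep: int
    severity: str
    penalty: float
    measured: float
    timestamp_s: float


# Hypothetical rule, in the shape a YAML entry might take after parsing.
KNEE_VALGUS_RULE = {
    "id": "squat.knee_valgus",
    "feature": "knee_valgus_deg",
    "body_part": "knees",
    "phase": "bottom",
    "tiers": [  # ordered from most to least severe
        {"severity": "severe", "above": 15.0, "penalty": 30.0},
        {"severity": "moderate", "above": 10.0, "penalty": 15.0},
        {"severity": "minor", "above": 5.0, "penalty": 5.0},
    ],
}


def evaluate_rule(
    rule: dict, features: dict[str, float], phase: str, rep: int, timestamp_s: float
) -> Optional[RuleViolation]:
    """Return the most severe matching tier as a violation, or None if the rule passes."""
    if phase != rule["phase"]:
        return None  # the rule only applies during its declared phase
    measured = features[rule["feature"]]
    for tier in rule["tiers"]:
        if measured > tier["above"]:
            return RuleViolation(
                rule_id=rule["id"],
                body_part=rule["body_part"],
                phase=phase,
                rep=rep,
                severity=tier["severity"],
                penalty=tier["penalty"],
                measured=measured,
                timestamp_s=timestamp_s,
            )
    return None
```

Because every violation carries the rule id, body part, phase, rep, and timestamp, a cue shown to the user can be traced back to the exact rule and measurement that produced it.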

The next-sprint layer (scaffolded in posey_demo/report/) is an in-app coaching agent that consumes the same MeaningDocument + EvaluationResult JSON and generates personalized push notifications, weekly progression plans, and natural-language Q&A inside the app’s chat tab. The deterministic core stays the source of truth and the LLM acts strictly as the explainer, which limits hallucination risk on a surface where bad advice can hurt people.
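One way to keep the LLM in the explainer role is to build its prompt only from the scorer's structured findings. The sketch below assumes a simplified EvaluationResult-like dictionary; the key names, the prompt wording, and the function name are hypothetical, and no specific LLM API is implied.

```python
import json

# Fields passed through to the prompt; anything else in an issue is dropped.
ALLOWED_ISSUE_FIELDS = ("rule_id", "body_part", "phase", "rep", "severity")


def build_explainer_prompt(evaluation: dict) -> str:
    """Turn deterministic findings into a prompt that forbids new claims."""
    facts = {
        "overall_score": evaluation["overall_score"],
        "issues": [
            {key: issue[key] for key in ALLOWED_ISSUE_FIELDS}
            for issue in evaluation["issues"]
        ],
    }
    return (
        "You are a strength coach. Explain ONLY the findings in the JSON below.\n"
        "Do not add new observations, scores, or medical advice.\n"
        "Findings:\n" + json.dumps(facts, indent=2, sort_keys=True)
    )
```

The model never sees raw video or landmarks in this design, so the content of its answer is bounded by what the rule engine already decided; only the phrasing is generated.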

Technical scalability
The pipeline is stateless with CPU-first inference (which keeps an on-device port to Core ML / TFLite open and avoids GPU server cost lock-in), per-job artifact directories keyed by UUID, an env-driven data root, and a pure-Python core that scales horizontally behind any queue when we run heavy reference-comparison jobs server-side. Adding a new exercise is a pure-config change: drop in a YAML file and ship an OTA update.
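The per-job artifact layout could look roughly like this. The environment variable name, the default root, and the jobs/ subdirectory are assumptions made for the sketch; only the ideas of a UUID-keyed directory and an env-driven root come from the description above.

```python
import os
import uuid
from pathlib import Path

# Hypothetical variable name; the real project may use a different one.
DATA_ROOT_ENV = "POSEY_DATA_ROOT"


def create_job_dir(default_root: str = "./data") -> tuple[str, Path]:
    """Create a fresh artifact directory for one processing job.

    The data root is read from the environment on every call, so workers
    stay stateless and can be pointed at different storage per deployment.
    """
    root = Path(os.environ.get(DATA_ROOT_ENV, default_root))
    job_id = uuid.uuid4().hex
    job_dir = root / "jobs" / job_id
    # exist_ok=False: a collision would indicate a bug, so fail loudly.
    job_dir.mkdir(parents=True, exist_ok=False)
    return job_id, job_dir
```

Since each job writes only inside its own directory, any number of workers can run in parallel behind a queue without coordinating with each other.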

Tech stack
Pose & vision: MediaPipe Tasks (BlazePose Lite, 33-landmark 3D), OpenCV (headless build) — both with first-class iOS / Android runtimes for the eventual on-device path.
Numerics & filters: NumPy, SciPy, custom One‑Euro and Kalman temporal smoothers.
Scoring: PyYAML-driven rule engine, dataclass-based domain model, biomechanical feature extraction (joint angles, valgus proxies, trunk angle, symmetry).
Demo & integration surface: Gradio 5 web UI (judge/pilot demos), argparse CLIs (posey-gradio, posey-evaluate), JSON-over-Python API designed for the mobile client to consume.
Tooling & quality: Python 3.13, uv package manager, Hatchling build backend, pytest with synthetic-fixture regression tests, structured logging.
Roadmap (scaffolded): LLM coaching agent in posey_demo/report/, REST gateway for the mobile client, on-device MediaPipe via Core ML / TFLite, Postgres for longitudinal user progress.
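The joint-angle features listed under Scoring reduce to the angle at a middle landmark between two neighbors. A minimal sketch, assuming landmarks arrive as (x, y, z) tuples; the function name is hypothetical.

```python
import math

Point3D = tuple[float, float, float]


def joint_angle_deg(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Angle at point b, in degrees, formed by the segments b->a and b->c.

    For a left knee angle with BlazePose landmarks, a, b, c would be the
    left hip (index 23), left knee (25), and left ankle (27).
    """
    ba = tuple(p - q for p, q in zip(a, b))
    bc = tuple(p - q for p, q in zip(c, b))
    dot = sum(u * v for u, v in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0.0:
        raise ValueError("coincident landmarks: angle is undefined")
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))
```

A fully straight leg gives roughly 180 degrees and a deep squat falls toward 90 or below, which is the kind of value a range-of-motion rule would threshold.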

Rapid iteration → production readiness
We merged eighteen commits across four staged PRs (pose_extractor → feat/pipeline-http-server → feature/gradio-ui-replace-fastapi → pose_evaluation), each landing behind tests. Mid-sprint we pivoted from a FastAPI server to a Gradio demo when pilot users told us they wanted a one-click visual demo, not a JSON endpoint. The engine core stayed untouched because the UI depends only on the typed MeaningDocument contract. The mobile app will consume that same contract, which makes the iOS/Android integration a frontend problem, not a re-architecture. That is the production-readiness signal: we swapped the entire UI in one PR without breaking a single downstream test, and we can do it again when the mobile client lands.
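A versioned, typed contract of the kind that made the UI swap cheap can be sketched with dataclasses and JSON. MeaningDocument is the project's name for the document, but the fields, the version string, and the compatibility rule below are assumptions, and transition events are omitted for brevity.

```python
import json
from dataclasses import asdict, dataclass, field

SCHEMA_VERSION = "1.0"  # hypothetical version string


@dataclass
class MotionSegment:
    label: str
    start_frame: int
    end_frame: int


@dataclass
class MeaningDocument:
    """Simplified stand-in for the real document."""
    exercise: str
    fps: float
    metrics: dict[str, float] = field(default_factory=dict)
    segments: list[MotionSegment] = field(default_factory=list)
    schema_version: str = SCHEMA_VERSION

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "MeaningDocument":
        raw = json.loads(payload)
        # Assumed policy: consumers accept any document with the same major version.
        if raw["schema_version"].split(".")[0] != SCHEMA_VERSION.split(".")[0]:
            raise ValueError(f"unsupported schema_version {raw['schema_version']!r}")
        raw["segments"] = [MotionSegment(**segment) for segment in raw["segments"]]
        return cls(**raw)
```

Any client, whether the Gradio demo or a future mobile app, only needs to parse this JSON, so the engine behind it can change freely as long as the version check still passes.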

Products & Tools

Schoolab SundAI