AI motion tracking pipeline — 3D body mesh, hand tracking, pose estimation, and depth reconstruction from monocular video.
GVHMR — 3D Body Mesh
World-Grounded Human Mesh Recovery (SIGGRAPH Asia 2024). Recovers full 3D SMPL body meshes with world-space positioning and ground-plane attachment. Includes automatic HaMeR hand tracking (MANO finger pose), Savitzky-Golay temporal smoothing, and foot-contact ground pinning.
Outputs: animation.fbx (rigged skeleton) · meshes.zip (OBJ per frame) · joints_3d.zip · smpl_params.zip
Post-processing: Savitzky-Golay smoothing (body + hands) · foot-contact ground pinning · cm-scale OBJ + FBX
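Foot-contact ground pinning can be pictured with a small sketch. It is an illustration under stated assumptions, not the pipeline's implementation: the function name `pin_to_ground`, the velocity threshold, and the rule "a foot that barely moves between frames is planted" are all hypothetical, and heights are assumed to be in metres along the up axis.

```python
from typing import List, Sequence


def pin_to_ground(
    foot_heights: Sequence[float],
    root_heights: Sequence[float],
    contact_velocity: float = 0.01,
) -> List[float]:
    """Shift root heights so that planted feet rest on the floor (height 0).

    foot_heights: lowest foot-joint height per frame.
    root_heights: root (pelvis) height per frame.
    contact_velocity: hypothetical threshold; a foot whose height changes by
        less than this between consecutive frames counts as in contact.
    """
    if len(foot_heights) != len(root_heights):
        raise ValueError("foot_heights and root_heights must have equal length")

    pinned: List[float] = []
    offset = 0.0  # vertical correction carried through airborne frames
    for i, (foot, root) in enumerate(zip(foot_heights, root_heights)):
        # The first frame has no predecessor, so it is treated as a contact frame.
        previous = foot_heights[i - 1] if i > 0 else foot
        in_contact = abs(foot - previous) < contact_velocity
        if in_contact:
            # Recompute the correction so the planted foot lands exactly on 0.
            offset = -foot
        pinned.append(root + offset)
    return pinned
```

Holding the last correction during airborne frames keeps jumps intact instead of flattening them onto the floor. A production version would likely track left and right feet separately and blend corrections over time to avoid pops.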
Depth Point Cloud
Combines SAM3 person segmentation with DepthAnything2 monocular depth estimation to generate 3D point clouds (.ply) of detected subjects. Useful for volumetric capture previews and spatial analysis.
Outputs: pointcloud.zip · depth_video.mp4
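The mask-plus-depth step can be sketched as a pinhole back-projection followed by an ASCII PLY writer. Everything named here is an assumption for illustration: `backproject`, `to_ascii_ply`, the focal length, and the principal point at the image centre are hypothetical, and the depth values are assumed to have already been converted to a usable distance scale (monocular depth models often predict relative rather than metric depth).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    focal_px: float,
) -> List[Point]:
    """Lift masked pixels to 3D camera-space points with a pinhole model."""
    height, width = len(depth), len(depth[0])
    # Assumption: the principal point sits at the image centre.
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    points: List[Point] = []
    for v in range(height):
        for u in range(width):
            z = depth[v][u]
            if not mask[v][u] or z <= 0:
                continue  # skip background pixels and invalid depth
            x = (u - cx) * z / focal_px
            y = (v - cy) * z / focal_px  # image convention: y grows downward
            points.append((x, y, z))
    return points


def to_ascii_ply(points: Sequence[Point]) -> str:
    """Serialize points as an ASCII PLY document with x, y, z floats."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"
```

Writing the returned string to a `.ply` file gives a cloud that common viewers can open. A real exporter would probably use the binary PLY variant and add per-point colour from the video frame.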
DWPose — 2D Body Skeleton
Uses YOLOv8-Pose for real-time 2D skeleton extraction. Detects 17 body keypoints (COCO format) per person per frame, each with a confidence score. Includes Savitzky-Golay temporal smoothing to reduce jitter between frames.
Outputs: pose_overlay.mp4 · skeletons.zip (JSON per frame) · pose_summary.json
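The temporal smoothing step can be illustrated with a small, dependency-free Savitzky-Golay filter applied to one keypoint's trajectory. This is a sketch under assumptions rather than the pipeline's code: the 5-frame window and quadratic fit are picked for illustration (the actual window and polynomial order are not stated), and `savgol5` and `smooth_track` are hypothetical names. With SciPy available, `scipy.signal.savgol_filter` covers the same ground with configurable parameters.

```python
from typing import List, Sequence, Tuple

# Standard Savitzky-Golay convolution weights for a 5-sample window and a
# quadratic (or cubic) polynomial fit: (-3, 12, 17, 12, -3) / 35.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol5(values: Sequence[float]) -> List[float]:
    """Smooth a 1D signal; edges are padded by repeating the end samples."""
    n = len(values)
    half = len(SG_COEFFS) // 2
    smoothed: List[float] = []
    for i in range(n):
        acc = 0.0
        for k, coeff in enumerate(SG_COEFFS):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the borders
            acc += coeff * values[j]
        smoothed.append(acc / SG_NORM)
    return smoothed


def smooth_track(track: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Smooth the x and y coordinates of one keypoint across frames."""
    xs = savgol5([p[0] for p in track])
    ys = savgol5([p[1] for p in track])
    return list(zip(xs, ys))
```

Unlike a plain moving average, this filter reproduces linear and quadratic motion exactly in the interior of the sequence, so steady movement is left alone while single-frame spikes are damped.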
Wholebody — 133-Point
Full COCO-WholeBody keypoint detection: 17 body + 68 face + 42 hand (21 per hand) + 6 foot keypoints. Uses the DWPose model for comprehensive whole-body motion capture, including finger articulation and facial landmark tracking.
Outputs: wholebody_overlay.mp4 · wholebody.zip (JSON per frame)
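To work with one body region at a time, the 133 keypoints can be split by index range. The sketch below assumes the commonly used COCO-WholeBody ordering (body, feet, face, left hand, right hand); the layout of the pipeline's per-frame JSON is not specified here, so the index ranges and the name `split_wholebody` are assumptions to verify against real output.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed COCO-WholeBody ordering as half-open index ranges [start, end).
WHOLEBODY_LAYOUT: Dict[str, Tuple[int, int]] = {
    "body": (0, 17),          # 17 body keypoints
    "feet": (17, 23),         # 6 foot keypoints
    "face": (23, 91),         # 68 face keypoints
    "left_hand": (91, 112),   # 21 left-hand keypoints
    "right_hand": (112, 133), # 21 right-hand keypoints
}


def split_wholebody(keypoints: Sequence) -> Dict[str, List]:
    """Split a flat list of 133 keypoints into named body regions."""
    if len(keypoints) != 133:
        raise ValueError(f"expected 133 keypoints, got {len(keypoints)}")
    return {
        name: list(keypoints[start:end])
        for name, (start, end) in WHOLEBODY_LAYOUT.items()
    }
```

The region sizes add up to the counts listed above: 17 + 6 + 68 + 21 + 21 = 133.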
MediaPipe Hands
Google MediaPipe hand landmark detection. Detects 21 keypoints per hand (fingertips, knuckles, wrist) with real-time performance. Tracks up to 2 hands simultaneously and reports landmarks as sub-pixel (floating-point) coordinates for finger-level gesture recognition.
Outputs: hands_overlay.mp4 · hand_landmarks.zip (JSON per frame)
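As an example of finger-level gesture recognition on the 21 landmarks, a pinch can be detected from the thumb-tip to index-tip distance. The landmark indices follow MediaPipe's hand model (0 wrist, 4 thumb tip, 8 index fingertip, 9 middle-finger knuckle); the function name `is_pinching`, the threshold of 0.3, and the normalization by wrist-to-knuckle length are hypothetical choices for this sketch.

```python
import math
from typing import Sequence

# MediaPipe hand landmark indices used in this sketch.
WRIST = 0
THUMB_TIP = 4
INDEX_TIP = 8
MIDDLE_MCP = 9  # base knuckle of the middle finger


def is_pinching(landmarks: Sequence[Sequence[float]], threshold: float = 0.3) -> bool:
    """Return True when thumb and index fingertips are close together.

    landmarks: 21 points for one hand, each an (x, y) or (x, y, z) sequence.
    threshold: hypothetical cutoff on the tip distance divided by hand size.
    """
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    # Normalizing by hand size makes the test independent of camera distance.
    hand_size = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if hand_size == 0:
        return False  # degenerate detection; nothing sensible to report
    tip_gap = math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP])
    return tip_gap / hand_size < threshold
```

The same pattern extends to other gestures by comparing different landmark pairs, and the threshold would need tuning on real footage.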
🎯 Pipeline
Select one or more methods per job. Each runs independently on Thor's GPU:
- GVHMR — 3D body + hands (HaMeR) → FBX skeleton + OBJ meshes
- Point Cloud — SAM3 mask → DepthAnything2 → 3D .ply export
- DWPose — YOLOv8-Pose 17-point body skeleton with smoothing
- Wholebody — 133-point body + hands + face keypoints
- MediaPipe — 21-point per-hand finger tracking
⚡ Features
- Test Frame — Process a single frame for quick validation before committing to the full video
- Ground Pinning — Automatic foot-contact detection pins the skeleton to the floor plane
- Temporal Smoothing — Savitzky-Golay filter removes jitter from body pose, translation, and 2D keypoints
- FBX Export — Animated skeleton with body + hand poses, cm-scale, ready for Blender/Maya import
- 3D Viewer — Preview meshes and skeletons in-browser via Three.js with playback controls
- Dedup Upload — Files are hashed and cached; re-submitting the same file skips the upload
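The upload deduplication feature can be sketched as a content-addressed cache. The hash algorithm is not named in this description, so SHA-256 is an assumption, and `file_digest` and `UploadCache` are hypothetical names used only for illustration.

```python
import hashlib
from typing import Dict, Tuple


def file_digest(data: bytes) -> str:
    """Return a hex digest identifying the file content (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


class UploadCache:
    """In-memory stand-in for a server-side cache keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}

    def submit(self, data: bytes, filename: str) -> Tuple[str, bool]:
        """Register a file; return (digest, uploaded).

        uploaded is False when identical content was seen before, in which
        case the transfer can be skipped and the cached copy reused.
        """
        digest = file_digest(data)
        if digest in self._by_hash:
            return digest, False
        self._by_hash[digest] = filename
        return digest, True
```

Keying on content rather than filename means a renamed copy of the same video is still recognized as a duplicate. For large videos, a client would typically hash the file in chunks instead of loading it fully into memory.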