Completed

DataGym.ai

DataGym.ai is a web-based workbench for labeling images and videos. I built it to streamline the creation of high-quality training data for machine learning models.

What it does

The platform allows teams to manage annotation projects end-to-end: organize datasets, label data with various annotation types, control quality through built-in review workflows, and export labeled data for model training.

Key Features

  • Multi-format support - Works with images (JPEG, PNG) and videos (MP4), including high-resolution assets
  • Rich annotation tools - Points, lines, bounding boxes, polygons, image segmentation, and video object tracking with linear interpolation
  • Quality control - Integrated review process with task lifecycle management (backlog → in progress → completed → reviewed)
  • Flexible storage - Direct uploads, public URLs, or AWS S3 integration
  • API-first design - Full REST API with Python SDK for integration into ML pipelines
  • Data portability - JSON import/export for labeled data and configurations
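To illustrate the video tracking feature, here is a minimal sketch of linear interpolation between two keyframes. It is not taken from the DataGym.ai codebase: the `Box` structure and the `interpolate_box` function are hypothetical names, and the sketch assumes an annotator places a bounding box on two keyframes and the frames in between are filled in automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical structure, not the tool's schema)."""
    x: float
    y: float
    width: float
    height: float


def interpolate_box(start_frame: int, start_box: Box,
                    end_frame: int, end_box: Box, frame: int) -> Box:
    """Linearly interpolate a box for `frame` between two keyframes."""
    if not start_frame <= frame <= end_frame:
        raise ValueError("frame must lie between the two keyframes")
    if start_frame == end_frame:
        return start_box

    # Fraction of the way from the first keyframe to the second (0.0 to 1.0).
    t = (frame - start_frame) / (end_frame - start_frame)

    def lerp(a: float, b: float) -> float:
        return a + t * (b - a)

    return Box(
        x=lerp(start_box.x, end_box.x),
        y=lerp(start_box.y, end_box.y),
        width=lerp(start_box.width, end_box.width),
        height=lerp(start_box.height, end_box.height),
    )


# Keyframes at frames 10 and 20; frame 15 is exactly halfway.
mid_box = interpolate_box(10, Box(100, 50, 40, 40), 20, Box(200, 100, 60, 40), 15)
print(mid_box)  # Box(x=150.0, y=75.0, width=50.0, height=40.0)
```

With this approach the annotator only corrects the frames where the object's motion changes, and each correction becomes a new keyframe.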
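The task lifecycle from the quality control feature can be pictured as a small state machine. The sketch below is an illustration only: `TaskState`, `ALLOWED_TRANSITIONS`, and `advance` are hypothetical names, and it models just the forward path listed above. Whether the actual tool lets a reviewer send a task back to an earlier state is not covered here.

```python
from enum import Enum


class TaskState(Enum):
    """The four lifecycle states named in the feature list."""
    BACKLOG = "backlog"
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"
    REVIEWED = "reviewed"


# Assumption: only the forward transitions listed above are allowed.
ALLOWED_TRANSITIONS = {
    TaskState.BACKLOG: {TaskState.IN_PROGRESS},
    TaskState.IN_PROGRESS: {TaskState.COMPLETED},
    TaskState.COMPLETED: {TaskState.REVIEWED},
    TaskState.REVIEWED: set(),
}


def advance(current: TaskState, target: TaskState) -> TaskState:
    """Return `target` if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move task from {current.value!r} to {target.value!r}")
    return target


state = TaskState.BACKLOG
for next_state in (TaskState.IN_PROGRESS, TaskState.COMPLETED, TaskState.REVIEWED):
    state = advance(state, next_state)
print(state.value)  # reviewed
```

Keeping the transitions in one table makes it easy to reject skipped steps, such as moving a task straight from backlog to reviewed.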

Tech Stack

The backend is built with Java and Spring Boot, and the frontend with Angular. The entire stack runs via Docker Compose for easy local deployment.

Status

The project is open source under the MIT license and available for self-hosting.