A desktop GUI application for manually annotating video clips. Annotators draw pixel-level segmentation masks over footage and answer structured survey questions defined in a config file.
Edit `config/config.yaml` to set your `data_dir` and `out_dir`, then edit `config/clips.txt` to list the clips you want to annotate. Survey questions are defined in `config/questions.yaml` (committed to the repo; edit to customise). See the [Configuration](#configuration) section for all available options.
By default the tool reads clips from and writes annotations to the local filesystem (`storage: local`). To use an S3-compatible object store instead, set `storage: s3` in `config/config.yaml` and give `data_dir` / `out_dir` as `bucket/prefix` paths:
```yaml
storage: s3
data_dir: my-bucket/clips
out_dir: my-bucket/annotation_results
```
Copy `.env.example` to `.env` and fill in your credentials — the app loads this file automatically at startup:
> **River annotation reference:** If you are annotating river footage, consult the [river annotation guide](https://docs.google.com/document/d/1iPN9JxiDtb60kC0yjO8tTM0XEfDTGM5ysw33WY-BEDQ/edit?usp=sharing) for guidance on how to draw masks correctly.
| `--extras` | off | Also save GIFs and extra PNGs (see Output section). **Note: GIF encoding is slow — saving will freeze the UI for several seconds per clip.** |
Output filenames (`mask.png`, `metadata.json`, etc.) have sensible defaults and can be overridden in the `filenames:` block — see [`config.py`](src/clip_annotator/config.py) for the full list.
Survey questions are defined in `config/questions.yaml` (committed to the repo). Add, remove, or reorder sections and items — the UI rebuilds automatically. `key` is what gets saved in `metadata.json`; `default` selects the pre-checked option (omit or set to `null` to leave unselected).
`config/optical_flow_config.yaml` controls the **Auto Segment** button. When pressed, the tool computes a segmentation mask from the loaded frames and replaces the current mask (undoable). The segmentation combines two criteria:
- **Optical flow magnitude** — pixels where the temporal median of frame-to-frame flow (scaled by FPS) exceeds a fraction of the maximum are considered moving.
`config/clips.txt` lists bare clip filenames (not full paths) to annotate, one per line — the tool prepends `data_dir` automatically. Each entry must be the exact filename of the ZIP archive as it appears in `data_dir` (e.g. `clip_20230501T120000.zip`). Lines starting with `#` are ignored. Clips are processed in order; a clip is considered already-annotated when `<out_dir>/<clip_stem>/mask.png` exists and is skipped automatically. Pass `--no-skip` to include already-annotated clips. When the last clip is reached, a dialog appears and the app exits.
To distribute clips across multiple annotators, create one clips file per annotator (e.g. `config/annotator_A.txt`) listing their assigned clip filenames, then pass it via `--clips`:
Assigning non-overlapping clip lists lets each annotator work independently. Intentionally overlapping a subset of clips across annotators enables inter-annotator agreement checks.
The window is split into two panels: the **video canvas** on the left (~70% of the width) and the **survey panel** on the right. The video auto-plays as a looping preview. Drawing tools and mask controls are arranged above and beside the canvas; navigation buttons (**Previous / Next / Skip**) sit at the top.
| Optical flow first guess | **Auto Segment** — replaces the current mask with an automatic segmentation based on motion and brightness; undoable. Disabled when `enabled: false` in `config/optical_flow_config.yaml`. |
| Save and continue | **Next** — saves current clip and loads the next one. If the clip already has a saved annotation a dialog asks whether to replace it or keep the existing save. The UI freezes during saving; this is normal — wait for it to complete before clicking again. |
Each clip is a ZIP archive containing a video file. The video inside the archive must be named `left.mp4` by default; if your archives use a different internal name, set `filenames.video_in_zip` in `config.yaml`. The ZIP filename becomes the output folder name (e.g. `clip_20230501T120000.zip` → `<out_dir>/clip_20230501T120000/`).
Up to `max_frames` frames are extracted from the video and scaled so the longest side is `display_max` px. This display-resolution copy is what the annotator works on; the full-resolution dimensions are remembered separately so the saved mask is upscaled back to the original size on export.
The mask is a binary array at display resolution. **Brush** strokes stamp a filled circle (draw or erase). **Polygon** shapes are stored as overlays and don't touch the mask until a **Fill** click rasterises them — the innermost polygon containing the click is filled, and any polygon whose centroid falls inside it is punched out as a hole.
Every mask-changing operation is pushed onto an undo stack before it executes. On save, the mask is upscaled to the original video resolution and written as an 8-bit PNG (0 or 255).
When a clip is loaded that already has a saved `mask.png` and `metadata.json`, the mask is restored at display resolution and the survey answers are pre-filled.