A desktop GUI application for manually annotating video clips. Annotators draw pixel-level segmentation masks over footage and answer structured survey questions defined in a config file.
Edit `config/config.yaml` to set your `data_dir` and `out_dir`, then edit `config/clips.txt` to list the clips you want to annotate. Survey questions are defined in `config/questions.yaml` (committed to the repo; edit to customise). See the [Configuration](#configuration) section for all available options.
By default the tool reads clips from and writes annotations to the local filesystem (`storage: local`). To use an S3-compatible object store instead, set `storage: s3` in `config/config.yaml` and give `data_dir` / `out_dir` as `bucket/prefix` paths:
```yaml
storage: s3
data_dir: my-bucket/clips
out_dir: my-bucket/annotation_results
```
Copy `.env.example` to `.env` and fill in your credentials — the app loads this file automatically at startup:
```sh
cp .env.example .env
# edit .env with your credentials
```
| Variable | Description |
|---|---|
| `S3_ACCESS_KEY` | Access key ID |
| `S3_SECRET_ACCESS_KEY` | Secret access key |
| `S3_ENDPOINT_URL` | Endpoint URL (defaults to `https://os.zhdk.cloud.switch.ch` if not set) |
| `AWS_REQUEST_CHECKSUM_CALCULATION` | Set to `when_required` to avoid checksum errors on SwitchEngines/Ceph |
| `AWS_RESPONSE_CHECKSUM_VALIDATION` | Set to `when_required` to avoid checksum errors on SwitchEngines/Ceph |
The `clips_file` (the list of clip filenames to annotate) is always read from the local filesystem even when `storage: s3`.
Output filenames (`mask.png`, `metadata.json`, etc.) have sensible defaults and can be overridden in the `filenames:` block — see [`config.py`](src/clip_annotator/config.py) for the full list.
Survey questions are defined in `config/questions.yaml` (committed to the repo). Add, remove, or reorder sections and items — the UI rebuilds automatically. `key` is what gets saved in `metadata.json`; `default` selects the pre-checked option (omit or set to `null` to leave unselected).
`config/optical_flow_config.yaml` controls the **Auto Segment** button. When pressed, the tool computes a segmentation mask from the loaded frames and replaces the current mask (undoable). The segmentation combines two criteria:
- **Optical flow magnitude** — pixels where the temporal median of frame-to-frame flow (scaled by FPS) exceeds a fraction of the maximum are considered moving.
`config/clips.txt` lists bare clip filenames (not full paths) to annotate, one per line — the tool prepends `data_dir` automatically. Each entry must be the exact filename of the ZIP archive as it appears in `data_dir` (e.g. `clip_20230501T120000.zip`). Lines starting with `#` are ignored. Clips are processed in order; a clip is considered already-annotated when `<out_dir>/<clip_stem>/mask.png` exists and is skipped automatically. Pass `--no-skip` to include already-annotated clips. When the last clip is reached, a dialog appears and the app exits.
To distribute clips across multiple annotators, create one clips file per annotator (e.g. `config/annotator_A.txt`) listing their assigned clip filenames, then pass it via `--clips`:
Assigning non-overlapping clip lists lets each annotator work independently. Intentionally overlapping a subset of clips across annotators enables inter-annotator agreement checks.
The window is split into two panels: the **video canvas** on the left (~70% of the width) and the **survey panel** on the right. The video auto-plays as a looping preview. Drawing tools and mask controls are arranged above and beside the canvas; navigation buttons (**Previous / Next / Skip**) sit at the top.
| Optical flow first guess | **Auto Segment** — replaces the current mask with an automatic segmentation based on motion and brightness; undoable. Disabled when `enabled: false` in `config/optical_flow_config.yaml`. |
| Save and continue | **Next** — saves current clip and loads the next one. If the clip already has a saved annotation a dialog asks whether to replace it or keep the existing save. |
| Go back | **Previous** — saves current clip and returns to the previously viewed clip. Disabled on the first clip. |
| Skip without saving | **Skip** — discards any unsaved changes and loads the next clip without writing anything to disk. |
Each clip is a ZIP archive containing a video file. The video inside the archive must be named `left.mp4` by default; if your archives use a different internal name, set `filenames.video_in_zip` in `config.yaml`. The ZIP filename becomes the output folder name (e.g. `clip_20230501T120000.zip` → `<out_dir>/clip_20230501T120000/`).
Up to `max_frames` frames are extracted from the video and scaled so the longest side is `display_max` px. This display-resolution copy is what the annotator works on; the full-resolution dimensions are remembered separately so the saved mask is upscaled back to the original size on export.
The mask is a binary array at display resolution. **Brush** strokes stamp a filled circle (draw or erase). **Polygon** shapes are stored as overlays and don't touch the mask until a **Fill** click rasterises them — the innermost polygon containing the click is filled, and any polygon whose centroid falls inside it is punched out as a hole.
Every mask-changing operation is pushed onto an undo stack before it executes. On save, the mask is upscaled to the original video resolution and written as an 8-bit PNG (0 or 255).
When a clip is loaded that already has a saved `mask.png` and `metadata.json`, the mask is restored at display resolution and the survey answers are pre-filled.