A desktop GUI application for manually annotating video clips. Annotators draw pixel-level segmentation masks over footage and answer structured survey questions defined in a config file.

Requirements

Python 3.12
uv (recommended) or pip

Quick start

# 1. Clone and install
git clone https://exchange.sensima.ch/ivan.sievering/clip-annotator
cd clip-annotator
uv sync

# 2. Create config and clip list from examples
# macOS/Linux:
cp config/config.example.yaml config/config.yaml
cp config/clips.example.txt config/clips.txt

# Windows (do not paste the comment lines: cmd does not treat # as a comment):
copy config\config.example.yaml config\config.yaml
copy config\clips.example.txt config\clips.txt

# 3. Edit config/config.yaml (set data_dir and out_dir)
#    Edit config/clips.txt (list clips to annotate)
#    Edit config/questions.yaml to customise survey questions (optional)

# 4. Run
uv run python -m clip_annotator

Installation

# Install with uv (creates the virtual environment automatically)
uv sync

# Or with pip
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -e .

Setup

Before running, create your config and clip list from the provided examples:

# macOS/Linux
cp config/config.example.yaml config/config.yaml
cp config/clips.example.txt config/clips.txt

# Windows
copy config\config.example.yaml config\config.yaml
copy config\clips.example.txt config\clips.txt

Edit config/config.yaml to set your data_dir and out_dir, then edit config/clips.txt to list the clips you want to annotate. Survey questions are defined in config/questions.yaml (committed to the repo; edit to customise). See the Configuration section for all available options.

S3 storage (optional)

By default the tool reads clips from and writes annotations to the local filesystem (storage: local). To use an S3-compatible object store instead, set storage: s3 in config/config.yaml and give data_dir / out_dir as bucket/prefix paths:

storage: s3
data_dir: my-bucket/clips
out_dir: my-bucket/annotation_results

Copy .env.example to .env and fill in your credentials — the app loads this file automatically at startup:

# macOS/Linux:
cp .env.example .env
# Windows:
copy .env.example .env
# Then edit .env with your credentials

Variable	Description
`S3_ACCESS_KEY`	Access key ID
`S3_SECRET_ACCESS_KEY`	Secret access key
`S3_ENDPOINT_URL`	Endpoint URL (defaults to `https://os.zhdk.cloud.switch.ch` if not set)
`AWS_REQUEST_CHECKSUM_CALCULATION`	Set to `when_required` to avoid checksum errors on SwitchEngines/Ceph
`AWS_RESPONSE_CHECKSUM_VALIDATION`	Set to `when_required` to avoid checksum errors on SwitchEngines/Ceph

The clips_file (the list of clip filenames to annotate) is always read from the local filesystem even when storage: s3.
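
As a rough illustration of how the variables in the table above could be resolved, the sketch below reads them from a mapping such as os.environ and applies the documented default endpoint. The helper name resolve_s3_settings and the layout of the returned dictionary are hypothetical and are not taken from the tool's code.

```python
import os
from typing import Mapping

# Default endpoint as stated in the variable table.
DEFAULT_ENDPOINT = "https://os.zhdk.cloud.switch.ch"


def resolve_s3_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect S3 settings from environment variables (hypothetical helper)."""
    required = ("S3_ACCESS_KEY", "S3_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing S3 credentials: {', '.join(missing)}")
    return {
        "key": env["S3_ACCESS_KEY"],
        "secret": env["S3_SECRET_ACCESS_KEY"],
        # Fall back to the documented default when unset or empty.
        "endpoint_url": env.get("S3_ENDPOINT_URL") or DEFAULT_ENDPOINT,
    }
```

Failing early on missing credentials gives a clearer message than a later authentication error from the object store.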

Usage

River annotation reference: if you are annotating river footage, consult the river annotation guide to see how masks should be drawn.

uv run python -m clip_annotator
# or, if you have the venv activated:
python -m clip_annotator

Arguments

Argument	Default	Description
`--config`	`config/config.yaml`	Path to the config YAML file
`--data`	(from config)	Override `data_dir` from config
`--out`	(from config)	Override `out_dir` from config
`--clips`	(from config)	Override `clips_file` from config
`--clip`	(first unannotated in list)	Open a specific clip by its stem name (filename without extension, e.g. `clip_20230501T120000`)
`--extras`	off	Also save GIFs and extra PNGs (see Output section). Note: GIF encoding is slow — saving will freeze the UI for several seconds per clip.
`--no-skip`	off	Show already-annotated clips instead of skipping them
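
For readers who want to script around the tool, the argument table above maps naturally onto argparse. The following is only a sketch that mirrors the documented flags and defaults; it is not the contents of the real __main__.py, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented arguments (illustrative only)."""
    p = argparse.ArgumentParser(prog="clip_annotator")
    p.add_argument("--config", default="config/config.yaml", help="path to the config YAML file")
    # None means "use the value from the config file".
    p.add_argument("--data", default=None, help="override data_dir")
    p.add_argument("--out", default=None, help="override out_dir")
    p.add_argument("--clips", default=None, help="override clips_file")
    p.add_argument("--clip", default=None, help="clip stem to open")
    p.add_argument("--extras", action="store_true", help="also save GIFs and extra PNGs")
    p.add_argument("--no-skip", action="store_true", help="show already-annotated clips")
    return p


args = build_parser().parse_args(["--clip", "clip_20230501T120000", "--no-skip"])
print(args.config, args.clip, args.no_skip)
```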

Typical workflows

# Annotate clips listed in config/clips.txt (default)
uv run python -m clip_annotator

# Use a different config file
uv run python -m clip_annotator --config config/my_config.yaml

# Override paths from the command line
uv run python -m clip_annotator --data data/clips --out data/out

# Annotate a single specific clip
uv run python -m clip_annotator --clip clip_20230615T120000

Configuration

Main settings live in config/config.yaml. Copy config/config.example.yaml to get started.

storage: local            # required: 'local' or 's3'

data_dir:                 # required: read-only source of ZIP archives (local path or bucket/prefix for S3)
out_dir:                  # required: write destination for annotations (can be same bucket as data_dir with a different prefix, or a separate location)

clips_file: config/clips.txt
optical_flow_config_file: config/optical_flow_config.yaml
questions_config_file: config/questions.yaml

display_max: 720          # longest side in pixels for display
fps_fallback: 25          # FPS to use if the video header is missing
max_frames: 100           # max frames to extract per clip

# Override input filenames only if your ZIP archives differ from the defaults
filenames:
  video_in_zip: left.mp4
  video_tmp_suffix: .mp4
  zip_extension: .zip

Output filenames (mask.png, metadata.json, etc.) have sensible defaults and can be overridden in the filenames: block — see config.py for the full list.
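
The settings above could be modelled with a dataclass along the following lines. This is a sketch under stated assumptions: AppConfigSketch, Filenames and config_from_dict are hypothetical names, the defaults are copied from the example config rather than from config.py, and the input is assumed to be a mapping that a YAML loader has already parsed.

```python
from dataclasses import dataclass, field


@dataclass
class Filenames:
    video_in_zip: str = "left.mp4"
    video_tmp_suffix: str = ".mp4"
    zip_extension: str = ".zip"


@dataclass
class AppConfigSketch:
    # Required keys have no default, so omitting one raises TypeError.
    storage: str
    data_dir: str
    out_dir: str
    clips_file: str = "config/clips.txt"
    optical_flow_config_file: str = "config/optical_flow_config.yaml"
    questions_config_file: str = "config/questions.yaml"
    display_max: int = 720
    fps_fallback: int = 25
    max_frames: int = 100
    filenames: Filenames = field(default_factory=Filenames)


def config_from_dict(raw: dict) -> AppConfigSketch:
    """Build a config from an already-parsed YAML mapping (hypothetical helper)."""
    raw = dict(raw)  # avoid mutating the caller's mapping
    names = Filenames(**(raw.pop("filenames", None) or {}))
    cfg = AppConfigSketch(filenames=names, **raw)
    if cfg.storage not in ("local", "s3"):
        raise ValueError(f"storage must be 'local' or 's3', got {cfg.storage!r}")
    return cfg
```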

Survey questions

Survey questions are defined in config/questions.yaml (committed to the repo). Add, remove, or reorder sections and items — the UI rebuilds automatically. key is what gets saved in metadata.json; default selects the pre-checked option (omit or set to null to leave unselected).

Optical flow segmentation

config/optical_flow_config.yaml controls the Auto Segment button. When pressed, the tool computes a segmentation mask from the loaded frames and replaces the current mask (undoable). The segmentation combines two criteria:

Optical flow magnitude — pixels where the temporal median of frame-to-frame flow (scaled by FPS) exceeds a fraction of the maximum are considered moving.
Brightness — pixels outside a brightness window are excluded (removes sky, saturated glare, etc.).

# config/optical_flow_config.yaml
enabled: true
norm_squared_threshold: 0.06   # fraction of max flow² that counts as moving
gaussian_kernel: [5, 5]        # blur kernel applied to the reference frame before brightness check
brightness_range: [2, 253]     # [min, max] greyscale brightness to keep

enabled: false disables the button without removing the config file.
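
To make the two criteria concrete, here is a minimal sketch of how they might be combined once per-pixel flow magnitudes are available. It does not compute optical flow itself, it uses plain nested lists so that it runs without extra dependencies, and the order of operations (temporal median, scale by FPS, square, compare with a fraction of the maximum) is an assumption based on the description above. The function name auto_segment is hypothetical.

```python
from statistics import median


def auto_segment(flow_mag, reference, fps, norm_squared_threshold=0.06, brightness_range=(2, 253)):
    """Combine a motion test and a brightness test into a binary mask.

    flow_mag:  list of T grids (H x W) of per-pixel flow magnitude in px/frame
    reference: one H x W greyscale grid (0-255), assumed already blurred
    """
    h, w = len(reference), len(reference[0])
    # Temporal median of the flow magnitude, scaled to px/s, then squared.
    norm_sq = [
        [(median(frame[y][x] for frame in flow_mag) * fps) ** 2 for x in range(w)]
        for y in range(h)
    ]
    peak = max(max(row) for row in norm_sq)
    lo, hi = brightness_range
    return [
        [
            1 if norm_sq[y][x] > norm_squared_threshold * peak and lo <= reference[y][x] <= hi else 0
            for x in range(w)
        ]
        for y in range(h)
    ]
```

Because the comparison is strict, a clip with no motion at all (peak of zero) yields an empty mask rather than a full one.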

Clip list file

config/clips.txt lists the clips to annotate, one bare filename (not a full path) per line; the tool prepends data_dir automatically. Each entry must be the exact filename of the ZIP archive as it appears in data_dir (e.g. clip_20230501T120000.zip). Lines starting with # are ignored.

Clips are processed in order. A clip counts as already annotated when <out_dir>/<clip_stem>/mask.png exists, and such clips are skipped automatically; pass --no-skip to include them. When the last clip is reached, a dialog appears and the app exits.

# Example clips.txt
clip_20230501T120000.zip
clip_20230502T120000.zip

Copy config/clips.example.txt as a starting point.
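
The skip rule can be sketched in a few lines for local storage. The helper below, pending_clips, is hypothetical and only approximates what clip_selector.py is described as doing; treating blank lines as ignorable is an added assumption.

```python
from pathlib import Path


def pending_clips(clips_file: Path, out_dir: Path, skip_done: bool = True) -> list[str]:
    """Return clip filenames from the list, in order, optionally dropping annotated ones."""
    names = []
    for line in clips_file.read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # blank line (assumed ignorable) or comment
        stem = Path(entry).stem  # clip_X.zip -> clip_X
        if skip_done and (out_dir / stem / "mask.png").exists():
            continue  # already annotated
        names.append(entry)
    return names
```

Calling it with skip_done=False corresponds to passing --no-skip.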

Multi-annotator setup

To distribute clips across multiple annotators, create one clips file per annotator (e.g. config/annotator_A.txt) listing their assigned clip filenames, then pass it via --clips:

uv run python -m clip_annotator --clips config/annotator_A.txt

Assigning non-overlapping clip lists lets each annotator work independently. Intentionally overlapping a subset of clips across annotators enables inter-annotator agreement checks.
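
One possible way to produce such lists is a round-robin split with an optional shared subset. The function split_clips and its n_shared parameter are hypothetical; the tool itself only consumes the resulting files.

```python
def split_clips(clips, annotators, n_shared=0):
    """Assign clips round-robin; the first n_shared clips go to every annotator."""
    shared, rest = clips[:n_shared], clips[n_shared:]
    assignment = {name: list(shared) for name in annotators}
    for i, clip in enumerate(rest):
        assignment[annotators[i % len(annotators)]].append(clip)
    return assignment


clips = [f"clip_{i:02d}.zip" for i in range(5)]
for name, items in split_clips(clips, ["A", "B"], n_shared=1).items():
    # Each list could then be written to config/annotator_<name>.txt, one name per line.
    print(name, items)
```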

Controls

The window is split into two panels: the video canvas on the left (~70% of the width) and the survey panel on the right. The video auto-plays as a looping preview. Drawing tools and mask controls are arranged above and beside the canvas; navigation buttons (Previous / Next / Skip) sit at the top.

Tool modes

Three drawing tools are available in the tool row. The active tool is highlighted in blue.

Tool	How to activate	Description
Brush	Click Brush	Click and drag to paint the mask with a circular brush (default)
Polygon	Click Polygon	Click to place vertices and build closed shapes; use Fill mode to commit them
Fill	Click Fill	Click inside a closed polygon to fill it onto the mask

Brush tool

Action	How
Draw mask	Click and drag on the video
Erase mask	Toggle Eraser button (turns orange when active), then drag
Brush preview	A white circle follows the cursor showing the current brush size
Adjust brush size	Brush size slider (2–50 px, default 5); click ↺ to reset

Polygon tool

Polygons are drawn as overlays and do not affect the mask until you use Fill mode.

Action	How
Add vertex	Left-click on the canvas
Remove last vertex	Right-click
Close a shape	Left-click near the first vertex (red dot) when ≥ 3 vertices are placed; completed shapes turn bold cyan
Draw multiple shapes	Each closed shape is kept independently; draw as many as needed
Cancel in-progress polygon	Cancel Current Poly — discards the unfinished polygon, keeps completed shapes
Delete last completed shape	Del Shape

Fill tool

Action	How
Fill a shape	Left-click anywhere inside a closed polygon; that shape's interior is painted onto the mask
Nested shapes	If a closed polygon lies entirely inside the target, its interior is left unfilled (acts as a hole)
Innermost shape	Clicking inside nested shapes always fills the innermost (smallest) polygon containing the click
Undo fill	Undo — each fill is a single undoable step

Mask editing

Action	How
Undo last action	Undo
Undo 10 actions	Undo×10
Redo	Redo
Clear entire mask	Clear
Toggle mask overlay	Hide Mask / Show Mask — button turns red when hidden; does not affect mask data
Mask transparency	Mask Alpha slider (0–100%, default 15%); click ↺ to reset

Starting-point shortcuts

Action	How
Load mask from previous clip	Load Prev Mask — copies the saved mask of the previous clip onto the current one; undoable
Optical flow first guess	Auto Segment — replaces the current mask with an automatic segmentation based on motion and brightness; undoable. Disabled when `enabled: false` in `config/optical_flow_config.yaml`.

Image display adjustments

Three vertical sliders sit to the left of the video and affect display only — they do not change what is saved.

Slider	Effect	Range
Brightness	Shifts all pixel values up or down	−100 to +100
Contrast	Scales pixel values around the midpoint	−100 to +100
Gamma	Applies a power-law correction (higher = brighter)	0.1× to 3.0×

Click ↺ below any slider to restore its default value.
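
The table describes the sliders only qualitatively. The sketch below shows one plausible set of formulas for a single greyscale value; the exact formulas, their order, and the name adjust_pixel are assumptions, not the tool's implementation.

```python
def adjust_pixel(value, brightness=0, contrast=0, gamma=1.0):
    """Apply display-only adjustments to one 0-255 greyscale value (assumed formulas)."""
    v = value + brightness                          # shift up or down
    v = (v - 127.5) * (1 + contrast / 100) + 127.5  # scale around the midpoint
    v = min(max(v, 0.0), 255.0) / 255.0             # clamp, then normalise to 0-1
    v = v ** (1.0 / gamma)                          # power law; gamma > 1 brightens
    return round(v * 255)
```

With all sliders at their neutral values (0, 0, 1.0) the input is returned unchanged.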

Navigation

Action	How
Save and continue	Next — saves current clip and loads the next one. If the clip already has a saved annotation a dialog asks whether to replace it or keep the existing save. The UI freezes during saving; this is normal — wait for it to complete before clicking again.
Go back	Previous — saves current clip and returns to the previously viewed clip. Disabled on the first clip.
Skip without saving	Skip — discards any unsaved changes and loads the next clip without writing anything to disk.

Output

Each annotated clip produces a folder <out_dir>/<clip_stem>/ with:

mask.png          # Binary segmentation mask at full source resolution (always)
metadata.json     # Survey answers as JSON (always)
frame.png         # Middle frame of the clip (always)
overlay.png       # That frame with the mask blended in green (always)

# Only with --extras:
mask_vis.png               # Mask rendered as a greyscale PNG
video_original_hires.gif   # All frames at display resolution
video_original_lowres.gif  # All frames at 50% of display resolution
video_overlay_hires.gif    # Overlay GIF at display resolution
video_overlay_lowres.gif   # Overlay GIF at 50% of display resolution

All output filenames can be overridden via the filenames: section in config/config.yaml.

Survey answers (`metadata.json`)

Keys and values are determined by config/questions.yaml. With the default questions:

{
  "flow":          "Turbulent | Laminar | Uncertain",
  "shadows":       "Yes | No | Uncertain",
  "artifacts":     "Yes | No | Uncertain",
  "lighting":      "Day | Night | Uncertain",
  "exposure":      "Overexposed | Underexposed | Both | Normal | Uncertain",
  "snowing":       "Yes | No | Uncertain",
  "snow_on_ground":"Yes | No | Uncertain"
}
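
When several annotators have labelled the same clip, their metadata.json files can be compared directly. The helper below is a hypothetical sketch of such a check: it reports the fraction of shared keys with identical answers and makes no assumption about which questions are configured.

```python
import json
from pathlib import Path


def answer_agreement(meta_a: Path, meta_b: Path) -> float:
    """Fraction of shared survey keys on which two metadata.json files agree."""
    a = json.loads(meta_a.read_text(encoding="utf-8"))
    b = json.loads(meta_b.read_text(encoding="utf-8"))
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[k] == b[k] for k in shared) / len(shared)
```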

Internals

Clip format

Each clip is a ZIP archive containing a video file. The video inside the archive must be named left.mp4 by default; if your archives use a different internal name, set filenames.video_in_zip in config.yaml. The ZIP filename becomes the output folder name (e.g. clip_20230501T120000.zip → <out_dir>/clip_20230501T120000/).
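
A minimal sketch of reading such an archive with the standard library is shown below. The function extract_video is hypothetical; its defaults mirror filenames.video_in_zip and filenames.video_tmp_suffix from the example config, and it reads the whole video into memory, which a real loader might avoid.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_video(zip_path: Path, video_in_zip: str = "left.mp4",
                  tmp_suffix: str = ".mp4") -> tuple[str, Path]:
    """Return (clip_stem, path to a temporary copy of the video inside the archive)."""
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(video_in_zip)  # raises KeyError if the member is missing
    with tempfile.NamedTemporaryFile(suffix=tmp_suffix, delete=False) as tmp:
        tmp.write(data)
    # The caller is responsible for deleting the temporary file when done.
    return zip_path.stem, Path(tmp.name)
```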

Frame loading

Up to max_frames frames are extracted from the video and scaled so the longest side is display_max px. This display-resolution copy is what the annotator works on; the full-resolution dimensions are remembered separately so the saved mask is upscaled back to the original size on export.
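
The scaling rule can be written as a small helper. This sketch assumes the aspect ratio is preserved and that frames smaller than display_max are scaled up as well; the source does not say whether small videos are enlarged, and display_size is a hypothetical name.

```python
def display_size(width: int, height: int, display_max: int = 720) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals display_max, keeping aspect ratio."""
    scale = display_max / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(display_size(1920, 1080))  # (720, 405)
```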

Mask drawing

The mask is a binary array at display resolution. Brush strokes stamp a filled circle (draw or erase). Polygon shapes are stored as overlays and don't touch the mask until a Fill click rasterises them — the innermost polygon containing the click is filled, and any polygon whose centroid falls inside it is punched out as a hole.
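
The selection logic for a Fill click could look roughly like this. It is a sketch, not the code in mask_canvas.py: the function names are hypothetical, the centroid is approximated by the vertex average, and the extra condition that a hole must be smaller than the target is an added assumption to keep an enclosing shape from being treated as a hole.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside the polygon given as a list of (x, y)."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def area(poly):
    """Absolute polygon area via the shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2


def centroid(poly):
    """Vertex average, used here as a simple stand-in for the centroid."""
    return sum(x for x, _ in poly) / len(poly), sum(y for _, y in poly) / len(poly)


def pick_fill(click, shapes):
    """Return (target, holes): the smallest shape containing the click and shapes to punch out."""
    hits = [s for s in shapes if point_in_polygon(click, s)]
    if not hits:
        return None, []
    target = min(hits, key=area)
    holes = [
        s for s in shapes
        if s is not target and area(s) < area(target) and point_in_polygon(centroid(s), target)
    ]
    return target, holes
```

Rasterising the target and subtracting the holes would then produce the pixels that are painted onto the mask.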

Every mask-changing operation is pushed onto an undo stack before it executes. On save, the mask is upscaled to the original video resolution and written as an 8-bit PNG (0 or 255).
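
A snapshot-based history is one simple way to get this behaviour, including the multi-step undo offered by Undo×10. The class below is a hypothetical sketch using nested lists; the real widget may store its history differently.

```python
import copy


class MaskHistory:
    """Snapshot-based undo/redo for a small mask stored as a list of rows."""

    def __init__(self, mask):
        self.mask = mask
        self._undo, self._redo = [], []

    def apply(self, operation):
        """Save the current mask, then let operation(mask) modify it in place."""
        self._undo.append(copy.deepcopy(self.mask))
        self._redo.clear()  # a new edit invalidates the redo history
        operation(self.mask)

    def undo(self, steps=1):
        """Step back up to `steps` snapshots, stopping at the oldest one."""
        for _ in range(min(steps, len(self._undo))):
            self._redo.append(self.mask)
            self.mask = self._undo.pop()

    def redo(self):
        """Reapply the most recently undone state, if any."""
        if self._redo:
            self._undo.append(self.mask)
            self.mask = self._redo.pop()
```

Storing full snapshots costs memory for large masks, so a production implementation might cap the stack depth or store only the changed region.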

Resuming

When a clip is loaded that already has a saved mask.png and metadata.json, the mask is restored at display resolution and the survey answers are pre-filled.

Repository structure

.env.example                        # S3 credential template (copy to .env and fill in)
config/
    config.yaml                     # Your local config (git-ignored, copy from example)
    config.example.yaml             # Example config to copy and edit
    clips.txt                       # Your clip list (git-ignored, copy from example)
    clips.example.txt               # Example clip list
    questions.yaml                  # Survey question definitions
    optical_flow_config.yaml        # Optical flow parameters (set enabled: false to disable Auto Segment)
src/clip_annotator/
    __main__.py                     # Entry point — argument parsing and app launch
    annotator.py                    # Main QMainWindow — orchestrates all components
    clip_selector.py                # Reads the clip list and picks the next clip
    filesystem.py                   # Storage backend — local passthrough or S3 via s3fs
    mask_canvas.py                  # Drawing widget — brush, undo, erase, mouse events
    video_loader.py                 # ZIP extraction and frame resizing
    compute_optical_flow.py         # Optical flow segmentation (Auto Segment button)
    config.py                       # AppConfig dataclass and YAML loader
    __init__.py                     # Package version
pyproject.toml                      # Project metadata and dependencies

Development

# Install pre-commit hooks
uv run pre-commit install
uv run pre-commit run --all-files   # Run manually once

# Add a dependency
uv add <package>
uv add --dev <package>              # Development-only


Video Annotation Tool