Video Annotation Tool
A desktop GUI application for manually annotating video clips. Annotators draw pixel-level segmentation masks over footage and answer structured survey questions defined in a config file.
Requirements
- Python 3.12
- uv (recommended) or pip
Quick start
# 1. Clone and install
git clone https://exchange.sensima.ch/ivan.sievering/clip-annotator
cd clip-annotator
uv sync
# 2. Create config and clip list from examples
# macOS/Linux
cp config/config.example.yaml config/config.yaml
cp config/clips.example.txt config/clips.txt
# Windows
copy config\config.example.yaml config\config.yaml
copy config\clips.example.txt config\clips.txt
# 3. Edit config/config.yaml (set data_dir and out_dir)
# Edit config/clips.txt (list clips to annotate)
# Edit config/questions.yaml to customise survey questions (optional)
# 4. Run
uv run python -m clip_annotator
Installation
# Install with uv (creates the virtual environment automatically)
uv sync
# Or with pip
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # macOS/Linux
pip install -e .
Setup
Before running, create your config and clip list from the provided examples:
# macOS/Linux
cp config/config.example.yaml config/config.yaml
cp config/clips.example.txt config/clips.txt
# Windows
copy config\config.example.yaml config\config.yaml
copy config\clips.example.txt config\clips.txt
Edit config/config.yaml to set your data_dir and out_dir, then edit config/clips.txt to list the clips you want to annotate. Survey questions are defined in config/questions.yaml (committed to the repo; edit to customise). See the Configuration section for all available options.
S3 storage (optional)
By default the tool reads clips from and writes annotations to the local filesystem (storage: local). To use an S3-compatible object store instead, set storage: s3 in config/config.yaml and give data_dir / out_dir as bucket/prefix paths:
storage: s3
data_dir: my-bucket/clips
out_dir: my-bucket/annotation_results
Copy .env.example to .env and fill in your credentials — the app loads this file automatically at startup:
cp .env.example .env # macOS/Linux
copy .env.example .env # Windows
# edit .env with your credentials
| Variable | Description |
|---|---|
| S3_ACCESS_KEY | Access key ID |
| S3_SECRET_ACCESS_KEY | Secret access key |
| S3_ENDPOINT_URL | Endpoint URL (defaults to https://os.zhdk.cloud.switch.ch if not set) |
| AWS_REQUEST_CHECKSUM_CALCULATION | Set to when_required to avoid checksum errors on SwitchEngines/Ceph |
| AWS_RESPONSE_CHECKSUM_VALIDATION | Set to when_required to avoid checksum errors on SwitchEngines/Ceph |
The clips_file (the list of clip filenames to annotate) is always read from the local filesystem even when storage: s3.
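As a rough illustration of how the variables in the table could be consumed, the sketch below collects them into a settings dictionary and applies the documented default endpoint. The function name `read_s3_settings` and the dictionary keys are hypothetical; the tool's own filesystem.py may read them differently.

```python
import os
from typing import Mapping, Optional

# Default endpoint as stated in the table above.
DEFAULT_ENDPOINT = "https://os.zhdk.cloud.switch.ch"


def read_s3_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect S3 settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in ("S3_ACCESS_KEY", "S3_SECRET_ACCESS_KEY")
               if not env.get(name)]
    if missing:
        raise KeyError("missing S3 credentials: " + ", ".join(missing))
    return {
        # Key names below are illustrative, not the tool's actual schema.
        "key": env["S3_ACCESS_KEY"],
        "secret": env["S3_SECRET_ACCESS_KEY"],
        "endpoint_url": env.get("S3_ENDPOINT_URL") or DEFAULT_ENDPOINT,
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching real credentials.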
Usage
River annotation reference: If you are annotating river footage, consult the river annotation guide to learn how to draw masks correctly.
uv run python -m clip_annotator
# or, if you have the venv activated:
python -m clip_annotator
Arguments
| Argument | Default | Description |
|---|---|---|
| --config | config/config.yaml | Path to the config YAML file |
| --data | (from config) | Override data_dir from config |
| --out | (from config) | Override out_dir from config |
| --clips | (from config) | Override clips_file from config |
| --clip | (first unannotated in list) | Open a specific clip by its stem name (filename without extension, e.g. clip_20230501T120000) |
| --extras | off | Also save GIFs and extra PNGs (see Output section). Note: GIF encoding is slow; saving will freeze the UI for several seconds per clip. |
| --no-skip | off | Show already-annotated clips instead of skipping them |
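The table above maps naturally onto an argparse parser. The following is a minimal sketch under that assumption; `build_parser` and `apply_overrides` are hypothetical names, and the real `__main__.py` may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags listed in the Arguments table."""
    parser = argparse.ArgumentParser(prog="clip_annotator")
    parser.add_argument("--config", default="config/config.yaml",
                        help="Path to the config YAML file")
    # None means "fall back to the value from the config file".
    parser.add_argument("--data", default=None, help="Override data_dir")
    parser.add_argument("--out", default=None, help="Override out_dir")
    parser.add_argument("--clips", default=None, help="Override clips_file")
    parser.add_argument("--clip", default=None,
                        help="Clip stem to open (filename without extension)")
    parser.add_argument("--extras", action="store_true",
                        help="Also save GIFs and extra PNGs")
    parser.add_argument("--no-skip", dest="no_skip", action="store_true",
                        help="Show already-annotated clips")
    return parser


def apply_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of config with command-line overrides applied."""
    merged = dict(config)
    for arg_name, key in (("data", "data_dir"), ("out", "out_dir"),
                          ("clips", "clips_file")):
        value = getattr(args, arg_name)
        if value is not None:
            merged[key] = value
    return merged
```

Keeping the override defaults at `None` lets the config file remain the single source of truth unless a flag is given explicitly.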
Typical workflows
# Annotate clips listed in config/clips.txt (default)
uv run python -m clip_annotator
# Use a different config file
uv run python -m clip_annotator --config config/my_config.yaml
# Override paths from the command line
uv run python -m clip_annotator --data data/clips --out data/out
# Annotate a single specific clip
uv run python -m clip_annotator --clip clip_20230615T120000
Configuration
Main settings live in config/config.yaml. Copy config/config.example.yaml to get started.
storage: local # required: 'local' or 's3'
data_dir: # required: read-only source of ZIP archives (local path or bucket/prefix for S3)
out_dir: # required: write destination for annotations (can be same bucket as data_dir with a different prefix, or a separate location)
clips_file: config/clips.txt
optical_flow_config_file: config/optical_flow_config.yaml
questions_config_file: config/questions.yaml
display_max: 720 # longest side in pixels for display
fps_fallback: 25 # FPS to use if the video header is missing
max_frames: 100 # max frames to extract per clip
# Override input filenames only if your ZIP archives differ from the defaults
filenames:
video_in_zip: left.mp4
video_tmp_suffix: .mp4
zip_extension: .zip
Output filenames (mask.png, metadata.json, etc.) have sensible defaults and can be overridden in the filenames: block — see config.py for the full list.
Survey questions
Survey questions are defined in config/questions.yaml (committed to the repo). Add, remove, or reorder sections and items — the UI rebuilds automatically. key is what gets saved in metadata.json; default selects the pre-checked option (omit or set to null to leave unselected).
Optical flow segmentation
config/optical_flow_config.yaml controls the Auto Segment button. When pressed, the tool computes a segmentation mask from the loaded frames and replaces the current mask (undoable). The segmentation combines two criteria:
- Optical flow magnitude — pixels where the temporal median of frame-to-frame flow (scaled by FPS) exceeds a fraction of the maximum are considered moving.
- Brightness — pixels outside a brightness window are excluded (removes sky, saturated glare, etc.).
# config/optical_flow_config.yaml
enabled: true
norm_squared_threshold: 0.06 # fraction of max flow² that counts as moving
gaussian_kernel: [5, 5] # blur kernel applied to the reference frame before brightness check
brightness_range: [2, 253] # [min, max] greyscale brightness to keep
enabled: false disables the button without removing the config file.
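To make the two criteria concrete, here is a small pure-Python sketch of one plausible reading of them. It assumes the squared flow magnitudes per frame pair are already computed (and already scaled by FPS), that the reference frame is already blurred, that the maximum is taken over the median map, and that the brightness bounds are inclusive. The function name `auto_segment` and the nested-list layout are hypothetical; the tool's compute_optical_flow.py presumably works on arrays instead.

```python
from statistics import median


def auto_segment(flow_sq, reference, norm_squared_threshold=0.06,
                 brightness_range=(2, 253)):
    """Combine a motion test and a brightness test into a 0/255 mask.

    flow_sq:   list of 2D lists, squared flow magnitude per frame pair.
    reference: 2D list of greyscale values (0-255) for the reference frame.
    """
    height, width = len(reference), len(reference[0])
    # Temporal median of the squared flow at each pixel.
    med = [[median(frame[y][x] for frame in flow_sq) for x in range(width)]
           for y in range(height)]
    peak = max(max(row) for row in med)
    low, high = brightness_range
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            moving = peak > 0 and med[y][x] > norm_squared_threshold * peak
            brightness_ok = low <= reference[y][x] <= high
            row.append(255 if moving and brightness_ok else 0)
        mask.append(row)
    return mask
```

In this reading a saturated pixel is excluded even when it moves strongly, which matches the stated goal of removing sky and glare.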
Clip list file
config/clips.txt lists the clips to annotate, one bare filename per line (not a full path); the tool prepends data_dir automatically. Each entry must be the exact filename of the ZIP archive as it appears in data_dir (e.g. clip_20230501T120000.zip). Lines starting with # are ignored. Clips are processed in order. A clip counts as already annotated when <out_dir>/<clip_stem>/mask.png exists, and such clips are skipped automatically; pass --no-skip to include them. When the last clip is reached, a dialog appears and the app exits.
# Example clips.txt
clip_20230501T120000.zip
clip_20230502T120000.zip
Copy config/clips.example.txt as a starting point.
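The rules above can be sketched in a few lines of Python. This is an illustration for a local out_dir only, not the tool's clip_selector.py: the function names are hypothetical, blank lines are assumed to be skipped, and mask.png is assumed to keep its default name.

```python
from pathlib import Path


def read_clip_list(clips_file: Path) -> list[str]:
    """Return clip filenames in file order, skipping blanks and # comments."""
    names = []
    for line in clips_file.read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            names.append(entry)
    return names


def is_annotated(out_dir: Path, clip_name: str) -> bool:
    """A clip counts as annotated when <out_dir>/<clip_stem>/mask.png exists."""
    return (out_dir / Path(clip_name).stem / "mask.png").is_file()


def pending_clips(clips_file: Path, out_dir: Path,
                  no_skip: bool = False) -> list[str]:
    """Clips still to show; no_skip=True mirrors the --no-skip flag."""
    clips = read_clip_list(clips_file)
    if no_skip:
        return clips
    return [name for name in clips if not is_annotated(out_dir, name)]
```

A helper like this is also handy outside the GUI, for example to report how many clips an annotator still has left.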
Multi-annotator setup
To distribute clips across multiple annotators, create one clips file per annotator (e.g. config/annotator_A.txt) listing their assigned clip filenames, then pass it via --clips:
uv run python -m clip_annotator --clips config/annotator_A.txt
Assigning non-overlapping clip lists lets each annotator work independently. Intentionally overlapping a subset of clips across annotators enables inter-annotator agreement checks.
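One way to generate such lists is sketched below: the first few clips go to every annotator (for agreement checks) and the rest are dealt out round-robin. `split_clips` and its `shared` parameter are hypothetical and not part of the tool.

```python
def split_clips(clips: list[str], annotators: list[str],
                shared: int = 0) -> dict[str, list[str]]:
    """Assign clips to annotators.

    The first `shared` clips are given to everyone; the remaining clips
    are distributed round-robin so the lists do not overlap.
    """
    if not annotators:
        raise ValueError("at least one annotator is required")
    common, rest = clips[:shared], clips[shared:]
    assignment = {name: list(common) for name in annotators}
    for index, clip in enumerate(rest):
        assignment[annotators[index % len(annotators)]].append(clip)
    return assignment
```

Each resulting list can then be written one filename per line to a file such as config/annotator_A.txt and passed via --clips.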
Controls
The window is split into two panels: the video canvas on the left (~70% of the width) and the survey panel on the right. The video auto-plays as a looping preview. Drawing tools and mask controls are arranged above and beside the canvas; navigation buttons (Previous / Next / Skip) sit at the top.
Tool modes
Three drawing tools are available in the tool row. The active tool is highlighted in blue.
| Tool | How to activate | Description |
|---|---|---|
| Brush | Click Brush | Click and drag to paint the mask with a circular brush (default) |
| Polygon | Click Polygon | Click to place vertices and build closed shapes; use Fill mode to commit them |
| Fill | Click Fill | Click inside a closed polygon to fill it onto the mask |
Brush tool
| Action | How |
|---|---|
| Draw mask | Click and drag on the video |
| Erase mask | Toggle Eraser button (turns orange when active), then drag |
| Brush preview | A white circle follows the cursor showing the current brush size |
| Adjust brush size | Brush size slider (2–50 px, default 5); click ↺ to reset |
Polygon tool
Polygons are drawn as overlays and do not affect the mask until you use Fill mode.
| Action | How |
|---|---|
| Add vertex | Left-click on the canvas |
| Remove last vertex | Right-click |
| Close a shape | Left-click near the first vertex (red dot) when ≥ 3 vertices are placed; completed shapes turn bold cyan |
| Draw multiple shapes | Each closed shape is kept independently; draw as many as needed |
| Cancel in-progress polygon | Cancel Current Poly — discards the unfinished polygon, keeps completed shapes |
| Delete last completed shape | Del Shape |
Fill tool
| Action | How |
|---|---|
| Fill a shape | Left-click anywhere inside a closed polygon; that shape's interior is painted onto the mask |
| Nested shapes | If a closed polygon lies entirely inside the target, its interior is left unfilled (acts as a hole) |
| Innermost shape | Clicking inside nested shapes always fills the innermost (smallest) polygon containing the click |
| Undo fill | Undo — each fill is a single undoable step |
Mask editing
| Action | How |
|---|---|
| Undo last action | Undo |
| Undo 10 actions | Undo×10 |
| Redo | Redo |
| Clear entire mask | Clear |
| Toggle mask overlay | Hide Mask / Show Mask — button turns red when hidden; does not affect mask data |
| Mask transparency | Mask Alpha slider (0–100%, default 15%); click ↺ to reset |
Starting-point shortcuts
| Action | How |
|---|---|
| Load mask from previous clip | Load Prev Mask — copies the saved mask of the previous clip onto the current one; undoable |
| Optical flow first guess | Auto Segment — replaces the current mask with an automatic segmentation based on motion and brightness; undoable. Disabled when enabled: false in config/optical_flow_config.yaml. |
Image display adjustments
Three vertical sliders sit to the left of the video and affect display only — they do not change what is saved.
| Slider | Effect | Range |
|---|---|---|
| Brightness | Shifts all pixel values up or down | −100 to +100 |
| Contrast | Scales pixel values around the midpoint | −100 to +100 |
| Gamma | Applies a power-law correction (higher = brighter) | 0.1× to 3.0× |
Click ↺ below any slider to restore its default value.
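For intuition, the three sliders can be thought of as a per-pixel transform. The sketch below is one plausible formulation, not the tool's actual code: the mapping of the contrast slider to a scale factor, the order of operations, and the name `adjust_pixel` are all assumptions.

```python
def adjust_pixel(value: int, brightness: int = 0, contrast: int = 0,
                 gamma: float = 1.0) -> int:
    """Apply contrast, brightness, then gamma to one 8-bit value.

    brightness: -100..100, added to the value (assumed 1:1 in grey levels).
    contrast:   -100..100, mapped to a scale factor 0..2 around the midpoint.
    gamma:      0.1..3.0, applied as value ** (1 / gamma) so higher = brighter.
    """
    factor = 1.0 + contrast / 100.0
    scaled = (value - 127.5) * factor + 127.5
    shifted = scaled + brightness
    clipped = min(255.0, max(0.0, shifted))
    corrected = 255.0 * (clipped / 255.0) ** (1.0 / gamma)
    return int(round(corrected))
```

Because the transform is applied to a copy used for display, the saved frame.png and mask.png are unaffected, as stated above.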
Navigation
| Action | How |
|---|---|
| Save and continue | Next — saves current clip and loads the next one. If the clip already has a saved annotation a dialog asks whether to replace it or keep the existing save. The UI freezes during saving; this is normal — wait for it to complete before clicking again. |
| Go back | Previous — saves current clip and returns to the previously viewed clip. Disabled on the first clip. |
| Skip without saving | Skip — discards any unsaved changes and loads the next clip without writing anything to disk. |
Output
Each annotated clip produces a folder <out_dir>/<clip_stem>/ with:
mask.png # Binary segmentation mask at full source resolution (always)
metadata.json # Survey answers as JSON (always)
frame.png # Middle frame of the clip (always)
overlay.png # That frame with the mask blended in green (always)
# Only with --extras:
mask_vis.png # Mask rendered as a greyscale PNG
video_original_hires.gif # All frames at display resolution
video_original_lowres.gif # All frames at 50% of display resolution
video_overlay_hires.gif # Overlay GIF at display resolution
video_overlay_lowres.gif # Overlay GIF at 50% of display resolution
All output filenames can be overridden via the filenames: section in config/config.yaml.
Survey answers (metadata.json)
Keys and values are determined by config/questions.yaml. With the default questions:
{
"flow": "Turbulent | Laminar | Uncertain",
"shadows": "Yes | No | Uncertain",
"artifacts": "Yes | No | Uncertain",
"lighting": "Day | Night | Uncertain",
"exposure": "Overexposed | Underexposed | Both | Normal | Uncertain",
"snowing": "Yes | No | Uncertain",
"snow_on_ground":"Yes | No | Uncertain"
}
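Since every annotated clip gets its own metadata.json, the answers are easy to aggregate after the fact. The sketch below counts the answers to one question across a local out_dir; `tally_answers` is a hypothetical helper, and treating a missing or null value as unanswered is an assumption.

```python
import json
from collections import Counter
from pathlib import Path


def tally_answers(out_dir: Path, key: str) -> Counter:
    """Count how often each answer to `key` occurs across annotated clips."""
    counts: Counter = Counter()
    for metadata_path in sorted(out_dir.glob("*/metadata.json")):
        answers = json.loads(metadata_path.read_text(encoding="utf-8"))
        # Assumption: an unanswered question is stored as null or omitted.
        counts[answers.get(key) or "(unanswered)"] += 1
    return counts
```

For example, `tally_answers(Path("data/out"), "flow")` would summarise how many clips were labelled Turbulent, Laminar, or Uncertain.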
Internals
Clip format
Each clip is a ZIP archive containing a video file. The video inside the archive must be named left.mp4 by default; if your archives use a different internal name, set filenames.video_in_zip in config.yaml. The ZIP filename becomes the output folder name (e.g. clip_20230501T120000.zip → <out_dir>/clip_20230501T120000/).
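A minimal sketch of this convention using only the standard library is shown below. It assumes a local ZIP file; the function names are hypothetical, and the defaults mirror the filenames.video_in_zip and filenames.video_tmp_suffix settings shown in the Configuration section.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_video(zip_path: Path, video_in_zip: str = "left.mp4",
                  tmp_suffix: str = ".mp4") -> Path:
    """Copy the video member of the archive to a temporary file.

    Raises KeyError if the archive has no member named `video_in_zip`.
    The caller is responsible for deleting the returned file.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(video_in_zip) as source, \
                tempfile.NamedTemporaryFile(suffix=tmp_suffix,
                                            delete=False) as target:
            shutil.copyfileobj(source, target)
            return Path(target.name)


def output_folder(out_dir: Path, zip_path: Path) -> Path:
    """The ZIP filename without its extension names the output folder."""
    return out_dir / zip_path.stem
```

Extracting to a temporary file rather than into data_dir is consistent with data_dir being described as a read-only source.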
Frame loading
Up to max_frames frames are extracted from the video and scaled so the longest side is display_max px. This display-resolution copy is what the annotator works on; the full-resolution dimensions are remembered separately so the saved mask is upscaled back to the original size on export.
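The size bookkeeping can be illustrated with two small functions. Both are sketches under stated assumptions: rounding to the nearest pixel and nearest-neighbour upscaling (which keeps the mask strictly binary) are guesses, and the names `display_size` and `upscale_mask` are hypothetical.

```python
def display_size(width: int, height: int,
                 display_max: int = 720) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals display_max."""
    scale = display_max / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def upscale_mask(mask: list[list[int]], out_width: int,
                 out_height: int) -> list[list[int]]:
    """Nearest-neighbour resize of a 2D mask back to the source size."""
    in_height, in_width = len(mask), len(mask[0])
    return [
        [mask[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]
```

For a 1920×1080 source and display_max of 720, the annotator would work on a 720×405 copy, and the saved mask would be resized back to 1920×1080.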
Mask drawing
The mask is a binary array at display resolution. Brush strokes stamp a filled circle (draw or erase). Polygon shapes are stored as overlays and don't touch the mask until a Fill click rasterises them — the innermost polygon containing the click is filled, and any polygon whose centroid falls inside it is punched out as a hole.
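The selection logic for a Fill click can be sketched with standard polygon routines (ray casting for containment, the shoelace formula for area and centroid). This is an illustration rather than the code in mask_canvas.py: the function names are hypothetical, and the extra check that a hole must be smaller than the target is an added assumption so that an enclosing outer polygon is never treated as a hole.

```python
from typing import Optional

Point = tuple[float, float]
Polygon = list[Point]


def contains(polygon: Polygon, point: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def _edges(polygon: Polygon):
    return zip(polygon, polygon[1:] + polygon[:1])


def signed_area(polygon: Polygon) -> float:
    """Shoelace formula; the sign depends on vertex order."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in _edges(polygon))


def centroid(polygon: Polygon) -> Point:
    area = signed_area(polygon)
    if area == 0:  # degenerate shape: fall back to the vertex mean
        n = len(polygon)
        return sum(p[0] for p in polygon) / n, sum(p[1] for p in polygon) / n
    cx = sum((x1 + x2) * (x1 * y2 - x2 * y1) for (x1, y1), (x2, y2) in _edges(polygon))
    cy = sum((y1 + y2) * (x1 * y2 - x2 * y1) for (x1, y1), (x2, y2) in _edges(polygon))
    return cx / (6 * area), cy / (6 * area)


def pick_fill_target(shapes: list[Polygon],
                     click: Point) -> Optional[tuple[Polygon, list[Polygon]]]:
    """Return (innermost polygon under the click, polygons to punch out)."""
    hits = [shape for shape in shapes if contains(shape, click)]
    if not hits:
        return None
    target = min(hits, key=lambda shape: abs(signed_area(shape)))
    holes = [shape for shape in shapes
             if shape is not target
             and abs(signed_area(shape)) < abs(signed_area(target))
             and contains(target, centroid(shape))]
    return target, holes
```

With an outer square and a smaller square inside it, a click between the two fills the outer shape with the inner one as a hole, while a click inside the inner square fills only the inner square.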
Every mask-changing operation is pushed onto an undo stack before it executes. On save, the mask is upscaled to the original video resolution and written as an 8-bit PNG (0 or 255).
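A snapshot-based undo stack is one simple way to get this behaviour. The sketch below assumes full copies of the mask are stored (the tool might store something more compact), uses nested lists in place of an image array, and the class name `MaskHistory` is hypothetical.

```python
class MaskHistory:
    """Undo/redo for a 2D mask using full snapshots (illustrative only)."""

    def __init__(self, mask: list[list[int]]):
        self.mask = mask
        self._undo: list[list[list[int]]] = []
        self._redo: list[list[list[int]]] = []

    def apply(self, operation) -> None:
        """Snapshot the mask first, then let operation(mask) edit it in place."""
        self._undo.append([row[:] for row in self.mask])
        self._redo.clear()  # a new edit invalidates the redo history
        operation(self.mask)

    def undo(self, steps: int = 1) -> None:
        """Step back up to `steps` edits (steps=10 mirrors Undo×10)."""
        for _ in range(steps):
            if not self._undo:
                break
            self._redo.append(self.mask)
            self.mask = self._undo.pop()

    def redo(self) -> None:
        if self._redo:
            self._undo.append(self.mask)
            self.mask = self._redo.pop()
```

Taking the snapshot before the operation runs is what makes each brush stroke, fill, or Auto Segment a single undoable step.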
Resuming
When a clip is loaded that already has a saved mask.png and metadata.json, the mask is restored at display resolution and the survey answers are pre-filled.
Repository structure
.env.example # S3 credential template (copy to .env and fill in)
config/
config.yaml # Your local config (git-ignored, copy from example)
config.example.yaml # Example config to copy and edit
clips.txt # Your clip list (git-ignored, copy from example)
clips.example.txt # Example clip list
questions.yaml # Survey question definitions
optical_flow_config.yaml # Optical flow parameters (set enabled: false to disable Auto Segment)
src/clip_annotator/
__main__.py # Entry point — argument parsing and app launch
annotator.py # Main QMainWindow — orchestrates all components
clip_selector.py # Reads the clip list and picks the next clip
filesystem.py # Storage backend — local passthrough or S3 via s3fs
mask_canvas.py # Drawing widget — brush, undo, erase, mouse events
video_loader.py # ZIP extraction and frame resizing
compute_optical_flow.py # Optical flow segmentation (Auto Segment button)
config.py # AppConfig dataclass and YAML loader
__init__.py # Package version
pyproject.toml # Project metadata and dependencies
Development
# Install pre-commit hooks
uv run pre-commit install
uv run pre-commit run --all-files # Run manually once
# Add a dependency
uv add <package>
uv add --dev <package> # Development-only