A desktop application for manually annotating river video clips as part of the [HydroScan](https://github.com/HydroScan) project. Annotators draw pixel-level water masks over river footage and answer structured survey questions about flow conditions, lighting, and scene quality.
```yaml
display_max: 720        # longest side in pixels for display
fps_fallback: 25        # FPS to use if the video header is missing
max_frames: 100         # max frames to extract per clip
data_dir: data/clips    # directory containing ZIP archives
out_dir: data/annotation_results
clips_file: config/clips.txt
questions:
  - section: River
    items:
      - key: flow
        label: "Flow Regime"
        options: [Turbulent, Laminar, Uncertain]
        default: Laminar
  # add more items or sections as needed
```
Add, remove, or reorder questions directly in the YAML; the UI rebuilds automatically. `key` is the name under which the answer is saved in `metadata.json`; `default` selects the pre-checked option (omit it or set it to `null` to leave the question unanswered).
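As a rough illustration of how the `questions` section could drive the survey form, the sketch below derives the initial answers from an already-parsed config. The function name `default_answers` and the extra `lighting` item are hypothetical, and the validation step is an assumption rather than documented behaviour.

```python
# Hypothetical helper: derive the initial survey answers from the parsed
# `questions` section of the config (for example the result of a YAML loader).
def default_answers(questions):
    answers = {}
    for section in questions:
        for item in section.get("items", []):
            default = item.get("default")  # missing or null -> None (unselected)
            # Assumption: a default that is not among the options is an error.
            if default is not None and default not in item.get("options", []):
                raise ValueError(
                    f"default {default!r} is not an option of {item['key']!r}"
                )
            answers[item["key"]] = default
    return answers


questions = [
    {
        "section": "River",
        "items": [
            {"key": "flow", "label": "Flow Regime",
             "options": ["Turbulent", "Laminar", "Uncertain"],
             "default": "Laminar"},
            # Hypothetical extra item with no default.
            {"key": "lighting", "label": "Lighting",
             "options": ["Good", "Poor"]},
        ],
    }
]
print(default_answers(questions))  # {'flow': 'Laminar', 'lighting': None}
```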
`config/clips.txt` lists the clip filenames to annotate, one per line. Lines starting with `#` are ignored. Clips are processed in order; already-annotated clips (those with an existing `mask.png`) are skipped automatically. Pass `--no-skip` to include them. When the last clip is reached, a dialog appears and the app exits.
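The clip-list rules above can be sketched as follows. This is a minimal illustration, not the app's actual loader: the function name `pending_clips`, the `<out_dir>/<clip stem>/mask.png` layout, and the skipping of blank lines are all assumptions.

```python
from pathlib import Path


def pending_clips(clips_file, out_dir, skip_done=True):
    """Return clip names from clips_file, in order, minus annotated ones."""
    names = []
    for line in Path(clips_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Comment lines start with '#'; skipping blank lines is an assumption.
        if not line or line.startswith("#"):
            continue
        names.append(line)
    if not skip_done:  # corresponds to passing --no-skip
        return names
    # Assumption: results are stored as <out_dir>/<clip stem>/mask.png.
    return [
        name for name in names
        if not (Path(out_dir) / Path(name).stem / "mask.png").exists()
    ]
```

Calling `pending_clips("config/clips.txt", "data/annotation_results")` would then yield the clips still to be annotated, in file order.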

| Action | Button |
|---|---|
| Save and continue | **Next**: saves the current clip and loads the next one. If the clip already has a saved annotation, a dialog asks whether to replace it or keep the existing save. |
| Go back | **Previous**: saves the current clip and returns to the previously viewed clip. Disabled on the first clip. |
| Skip without saving | **Skip**: discards any unsaved changes and loads the next clip without writing anything to disk. |

Each clip is a ZIP archive containing a video file (default `left.mp4`, configurable via `filenames.video_in_zip`). The filename encodes the recording timestamp (e.g. `left_20230615T120000.zip`).
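A minimal sketch of reading such an archive with the standard library is shown below. The helper names `clip_timestamp` and `extract_video` are hypothetical, and the timestamp pattern is inferred only from the single example filename above.

```python
import re
import zipfile
from datetime import datetime
from pathlib import Path

# Assumption: the timestamp always looks like YYYYMMDDTHHMMSS.
_STAMP = re.compile(r"(\d{8}T\d{6})")


def clip_timestamp(filename):
    """Parse the recording timestamp embedded in a clip filename."""
    match = _STAMP.search(filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


def extract_video(zip_path, dest_dir, member="left.mp4"):
    """Extract the video member from the clip archive and return its path."""
    with zipfile.ZipFile(zip_path) as archive:
        # Raises KeyError if the archive has no such member.
        return Path(archive.extract(member, path=dest_dir))


print(clip_timestamp("left_20230615T120000.zip"))  # 2023-06-15 12:00:00
```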
Up to `max_frames` frames are extracted from the video and scaled so the longest side is `display_max` px. This display-resolution copy is what the annotator works on; the full-resolution dimensions are remembered separately so the saved mask is upscaled back to the original size on export.
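The scaling arithmetic can be illustrated as follows. `display_size` is a hypothetical helper; it applies the rule exactly as stated, so a video smaller than `display_max` would be enlarged, which may or may not match what the app does.

```python
def display_size(width, height, display_max=720):
    """Return (display_width, display_height, scale) for a source frame."""
    # Scale factor that maps the longest side onto display_max pixels.
    scale = display_max / max(width, height)
    return round(width * scale), round(height * scale), scale


# A 1920x1080 source is shown at 720x405 (scale 0.375); the original
# 1920x1080 size is kept so the mask can be resized back on export.
print(display_size(1920, 1080))  # (720, 405, 0.375)
```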
The mask is a binary NumPy array matching the display frame size.
**Brush:** each stroke stamps a filled circle of the selected radius, setting pixels to 1 (draw) or 0 (erase).
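One way to stamp a filled circle into a binary NumPy mask is sketched below. It uses plain NumPy indexing rather than any particular drawing call, so it is an illustration of the idea and not necessarily how the app draws; `stamp` and its integer-coordinate signature are assumptions.

```python
import numpy as np


def stamp(mask, cx, cy, radius, value=1):
    """Set all pixels within `radius` of (cx, cy) to `value`, in place."""
    h, w = mask.shape
    # Bounding box of the circle, clipped to the mask.
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, w)
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, h)
    if x0 >= x1 or y0 >= y1:
        return  # circle lies entirely outside the mask
    yy, xx = np.ogrid[y0:y1, x0:x1]
    disc = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    # The slice is a view, so this assignment modifies `mask` itself.
    mask[y0:y1, x0:x1][disc] = value


mask = np.zeros((9, 9), dtype=np.uint8)
stamp(mask, 4, 4, 2)            # draw
stamp(mask, 4, 4, 1, value=0)   # erase a smaller circle in the middle
```

A full stroke would call `stamp` at each sampled pointer position along the drag path.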
**Polygon:** vertices are stored as a list of floating-point canvas coordinates. Multiple closed shapes can coexist. Completed shapes are rendered as cyan overlays on the canvas but do not touch the mask until a fill is applied.
**Fill:** clicking inside a closed polygon rasterises it with `cv2.fillPoly` and ORs the result into the mask. Among all shapes containing the click, the innermost (smallest area, determined by `cv2.contourArea`) is selected as the fill target. Any polygon whose centroid lies inside the target is then punched out as a hole.
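The target-selection logic can be sketched in pure Python. Note the substitutions: the shoelace formula stands in for `cv2.contourArea`, a ray-casting test stands in for whatever hit test the app uses, and the centroid is taken as the vertex mean. The extra requirement that a hole be smaller than the target is an assumption added so that an enclosing shape is never treated as a hole; all function names are hypothetical.

```python
def area(poly):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def contains(poly, point):
    """Ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(poly):
    """Vertex mean (an assumption; an area centroid would also work)."""
    n = len(poly)
    return sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n


def pick_fill(polygons, click):
    """Return (target, holes) for a click, or (None, []) if nothing is hit."""
    hits = [p for p in polygons if contains(p, click)]
    if not hits:
        return None, []
    target = min(hits, key=area)  # innermost = smallest area
    holes = [
        p for p in polygons
        if p is not target
        and area(p) < area(target)          # assumption, see lead-in
        and contains(target, centroid(p))
    ]
    return target, holes


outer = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
inner = [(3.0, 3.0), (7.0, 3.0), (7.0, 7.0), (3.0, 7.0)]
target, holes = pick_fill([outer, inner], (1.0, 1.0))
# target is `outer`, holes is [inner]: the fill becomes a square ring.
```

The target would then be rasterised into the mask and each hole cleared again.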
Every mask-changing operation (brush stroke, fill) pushes a copy of the previous mask onto the undo stack before modifying it. On save, the mask is resized to the original video resolution with nearest-neighbour interpolation and written as an 8-bit PNG (0 or 255).
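The snapshot-before-edit pattern and the export step might look roughly like this. `MaskHistory` and `export_mask` are hypothetical names, and the index-based nearest-neighbour resize is a NumPy stand-in whose exact pixel mapping may differ slightly from an image library's resize routine.

```python
import numpy as np


class MaskHistory:
    """Binary mask with a simple undo stack."""

    def __init__(self, mask):
        self.mask = mask
        self._undo = []

    def apply(self, edit):
        """Snapshot the current mask, then let `edit` modify it in place."""
        self._undo.append(self.mask.copy())
        edit(self.mask)

    def undo(self):
        """Restore the most recent snapshot, if there is one."""
        if self._undo:
            self.mask = self._undo.pop()


def export_mask(mask, out_w, out_h):
    """Nearest-neighbour resize to (out_h, out_w), mapping {0, 1} to {0, 255}."""
    h, w = mask.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return (mask[rows[:, None], cols] > 0).astype(np.uint8) * 255


def mark_corner(m):
    m[0, 0] = 1  # stand-in for a brush stroke or fill


history = MaskHistory(np.zeros((2, 2), dtype=np.uint8))
history.apply(mark_corner)
full_res = export_mask(history.mask, out_w=4, out_h=4)  # 4x4 array of 0/255
history.undo()  # the mask is all zeros again
```

Nearest-neighbour interpolation keeps the exported mask strictly binary; a smoothing interpolation would introduce intermediate grey values along the edges.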
When a clip is loaded that already has a saved `mask.png` and `metadata.json`, the mask is restored at display resolution and the survey answers are pre-filled.