Add S3 storage support via s3fs; make storage field required

- New filesystem.py: make_fs() factory (returns s3fs.S3FileSystem or None), plus fsjoin/fsstem/fsname path helpers
- config.py: storage field is now required ('local' or 's3'); load_config raises a clear ValueError when it is missing
- video_loader, clip_selector, annotator: thread fs through all file I/O; local paths unchanged, S3 paths use fs.open/fs.exists/fs.pipe
- annotation_script: load .env via python-dotenv at startup, create fs from config and pass to Annotator
- Add .env.example with SwitchEngines endpoint and AWS checksum env vars
- pyproject.toml: add s3fs and python-dotenv dependencies
- Reduce default mask alpha from 40% to 15%
- Update example clip names to colon-separated timestamps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 16:15:38 +02:00
parent 8579bad2e2
commit dc59b8affb
15 changed files with 1539 additions and 106 deletions
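
The commit message describes `make_fs()` as a factory that returns an `s3fs.S3FileSystem` or `None`, alongside the `fsjoin`/`fsstem`/`fsname` path helpers. The actual `filesystem.py` is not shown on this page, so the following is only a minimal sketch of how such a module might look. The function signatures, the lazy `s3fs` import, and the use of `client_kwargs` for the endpoint are assumptions. The environment variable names and the default endpoint come from the README table in the diff below.

```python
import os
import posixpath
from pathlib import Path

# Default endpoint as documented in the README; everything else here is illustrative.
DEFAULT_ENDPOINT = "https://os.zhdk.cloud.switch.ch"


def make_fs(storage: str):
    """Return None for local storage, or an S3FileSystem for 's3' (assumed signature)."""
    if storage == "local":
        return None
    if storage == "s3":
        import s3fs  # imported lazily so local-only runs never touch s3fs

        return s3fs.S3FileSystem(
            key=os.environ.get("S3_ACCESS_KEY"),
            secret=os.environ.get("S3_SECRET_ACCESS_KEY"),
            client_kwargs={
                "endpoint_url": os.environ.get("S3_ENDPOINT_URL", DEFAULT_ENDPOINT)
            },
        )
    raise ValueError(f"storage must be 'local' or 's3', got {storage!r}")


def fsjoin(fs, *parts: str) -> str:
    """Join path parts: native separators locally, '/' for bucket/prefix paths."""
    if fs is None:
        return str(Path(*parts))
    return posixpath.join(*parts)


def fsname(fs, path: str) -> str:
    """Final path component, e.g. 'left.zip'."""
    return Path(path).name if fs is None else posixpath.basename(path)


def fsstem(fs, path: str) -> str:
    """Final path component without its extension, e.g. 'left'."""
    return posixpath.splitext(fsname(fs, path))[0]
```

Under these assumptions, callers would branch on `fs is None`: local code paths keep using ordinary file operations, while S3 code paths go through `fs.open`, `fs.exists`, and `fs.pipe`, as the commit message states.
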
--- a/README.md
+++ b/README.md
@@ -35,6 +35,33 @@ cp config/clips.example.txt config/clips.txt

 Edit `config/config.yaml` to set your `data_dir` and `out_dir`, then edit `config/clips.txt` to list the clips you want to annotate.

+### S3 storage (optional)
+
+With `storage: local` the tool reads clips from and writes annotations to the local filesystem. To use an S3-compatible object store instead, set `storage: s3` in `config/config.yaml` and give `data_dir` / `out_dir` as `bucket/prefix` paths:
+
+```yaml
+storage: s3
+data_dir: my-bucket/clips
+out_dir: my-bucket/annotation_results
+```
+
+Copy `.env.example` to `.env` and fill in your credentials — the app loads this file automatically at startup:
+
+```sh
+cp .env.example .env
+# edit .env with your credentials
+```
+
+| Variable | Description |
+|---|---|
+| `S3_ACCESS_KEY` | Access key ID |
+| `S3_SECRET_ACCESS_KEY` | Secret access key |
+| `S3_ENDPOINT_URL` | Endpoint URL (defaults to `https://os.zhdk.cloud.switch.ch` if not set) |
+| `AWS_REQUEST_CHECKSUM_CALCULATION` | Set to `when_required` to avoid checksum errors on SwitchEngines/Ceph |
+| `AWS_RESPONSE_CHECKSUM_VALIDATION` | Set to `when_required` to avoid checksum errors on SwitchEngines/Ceph |
+
+The `clips_file` (the list of clip filenames to annotate) is always read from the local filesystem, even when `storage: s3` is set.
+
 ## Usage

 ```sh
@@ -74,6 +101,26 @@ python -m river_annotation_tool.annotation_script --clip left_20230615T120000
 All settings live in `config/config.yaml`. Copy `config/config.example.yaml` to get started.

 ```yaml
+storage: local            # required: 'local' or 's3'
+
+data_dir: data/clips      # directory containing ZIP archives (local path or bucket/prefix for S3)
+out_dir: data/annotation_results
+clips_file: config/clips.txt
+# optical_flow_config_file: config/optical_flow_config.yaml   # optional, enables Auto Segment
+
+display_max: 720          # longest side in pixels for display
+fps_fallback: 25          # FPS to use if the video header is missing
+max_frames: 100           # max frames to extract per clip
+
+questions:
+  - section: River
+    items:
+      - key: flow
+        label: "Flow Regime"
+        options: [Turbulent, Laminar, Uncertain]
+        default: Laminar
+      # add more items or sections as needed
+
 filenames:
  video_in_zip: left.mp4           # video filename inside each ZIP archive
  video_tmp_suffix: .mp4           # suffix for the extraction temp file
@@ -87,24 +134,6 @@ filenames:
  gif_original_lowres: video_original_lowres.gif
  gif_overlay_hires: video_overlay_hires.gif
  gif_overlay_lowres: video_overlay_lowres.gif
-
-display_max: 720          # longest side in pixels for display
-fps_fallback: 25          # FPS to use if the video header is missing
-max_frames: 100           # max frames to extract per clip
-
-data_dir: data/clips      # directory containing ZIP archives
-out_dir: data/annotation_results
-clips_file: config/clips.txt
-# optical_flow_config_file: config/optical_flow_config.yaml   # optional, enables Auto Segment
-
-questions:
-  - section: River
-    items:
-      - key: flow
-        label: "Flow Regime"
-        options: [Turbulent, Laminar, Uncertain]
-        default: Laminar
-      # add more items or sections as needed
 ```

 Add, remove, or reorder questions directly in the YAML — the UI rebuilds automatically. `key` is what gets saved in `metadata.json`; `default` selects the pre-checked option (omit or set to `null` to leave unselected).
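
The `storage` key in the example config above is marked as required, and the commit message says `load_config` raises a clear `ValueError` when it is missing. The real `config.py` is not part of this diff, so the check below is a hedged sketch only: `require_storage` is a hypothetical helper name, and it works on an already-parsed YAML mapping rather than on the file itself.

```python
VALID_STORAGE = ("local", "s3")


def require_storage(raw: dict) -> str:
    """Return the validated storage backend name, or raise ValueError.

    `raw` is assumed to be the dict obtained from parsing config.yaml.
    """
    value = raw.get("storage")
    if value is None:
        raise ValueError(
            "config is missing the required 'storage' field; "
            "set it to 'local' or 's3'"
        )
    storage = str(value).strip().lower()
    if storage not in VALID_STORAGE:
        raise ValueError(
            f"invalid storage {value!r}; expected one of {VALID_STORAGE}"
        )
    return storage
```

Failing early with a message that names the two accepted values means a config written before this commit, which has no `storage` key, produces an actionable error instead of silently falling back to one backend.
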
@@ -192,7 +221,7 @@ Polygons are drawn as overlays and do not affect the mask until you use **Fill**
 | Redo | **Redo** |
 | Clear entire mask | **Clear** |
 | Toggle mask overlay | **Hide Mask / Show Mask** — button turns red when hidden; does not affect mask data |
-| Mask transparency | **Mask Alpha** slider (0–100%, default 40%); click **↺** to reset |
+| Mask transparency | **Mask Alpha** slider (0–100%, default 15%); click **↺** to reset |

 ### Starting-point shortcuts

@@ -280,22 +309,24 @@ When a clip is loaded that already has a saved `mask.png` and `metadata.json`, t
 ## Repository structure

 ```
+.env.example                        # S3 credential template (copy to .env and fill in)
 config/
-    config.yaml                    # Your local config (git-ignored, copy from example)
-    config.example.yaml            # Example config to copy and edit
-    clips.txt                      # Your clip list (git-ignored, copy from example)
-    clips.example.txt              # Example clip list
-    optical_flow_config.yaml       # Optional optical flow parameters (enable via config.yaml)
+    config.yaml                     # Your local config (git-ignored, copy from example)
+    config.example.yaml             # Example config to copy and edit
+    clips.txt                       # Your clip list (git-ignored, copy from example)
+    clips.example.txt               # Example clip list
+    optical_flow_config.yaml        # Optional optical flow parameters (enable via config.yaml)
 src/river_annotation_tool/
-    annotation_script.py           # Entry point — argument parsing and app launch
-    annotator.py                   # Main QMainWindow — orchestrates all components
-    clip_selector.py               # Reads the clip list and picks the next clip
-    mask_canvas.py                 # Drawing widget — brush, undo, erase, mouse events
-    video_loader.py                # ZIP extraction and frame resizing
-    compute_optical_flow.py        # Optical flow river segmentation (Auto Segment button)
-    config.py                      # AppConfig dataclass and YAML loader
-    __init__.py                    # Package version
-pyproject.toml              # Project metadata and dependencies
+    annotation_script.py            # Entry point — argument parsing and app launch
+    annotator.py                    # Main QMainWindow — orchestrates all components
+    clip_selector.py                # Reads the clip list and picks the next clip
+    filesystem.py                   # Storage backend — local passthrough or S3 via s3fs
+    mask_canvas.py                  # Drawing widget — brush, undo, erase, mouse events
+    video_loader.py                 # ZIP extraction and frame resizing
+    compute_optical_flow.py         # Optical flow river segmentation (Auto Segment button)
+    config.py                       # AppConfig dataclass and YAML loader
+    __init__.py                     # Package version
+pyproject.toml                      # Project metadata and dependencies
 ```

 ## Development