CI: route test_game_my_sea*.py to test-FTs-room stage — 49 my-sea FTs DRY-reuse the room-shell hex + sea-cross picker (same Selenium surface as test_game_room_* + test_trinket_*), so they belong w. the heavy room flows instead of bloating test-FTs-non-room. Filename-regex partition stays clean (13 room + 24 non-room = 37 total, no overlap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI: `_retry_failed.sh` wraps both FT steps — single-flake retries cost ~22s instead of a full 35-min step re-run. Parses Django's `FAIL:/ERROR: test_method (full.dotted.path)` lines from stdout, re-runs only those labels (deduped + sorted). Green first runs skip the retry; first-run crashes w. no parseable labels propagate the original exit code without masking infra problems
2026-05-20 13:20:23 -04:00 · 2026-05-20 13:14:26 -04:00
2 changed files with 91 additions and 11 deletions
--- a/.woodpecker/_retry_failed.sh
+++ b/.woodpecker/_retry_failed.sh
@@ -0,0 +1,64 @@
 #!/usr/bin/env bash
 # Usage: bash .woodpecker/_retry_failed.sh <test command args...>
 #
 # Runs `python manage.py test "$@"`. If any tests fail/error, parses the
 # failure labels out of stdout and re-runs ONLY those tests — so a single
 # Selenium flake at test 90/93 costs ~22s on retry instead of the full
 # 35-minute step.
 #
 # Django's unittest-based runner prints failures in a predictable shape:
 #
 #   ERROR: test_method (full.dotted.path.TestClass.test_method)
 #   FAIL:  test_method (full.dotted.path.TestClass.test_method)
 #
 # The dotted path inside the parens is exactly what `manage.py test`
 # accepts as a label. We grep for those lines + re-run that list.
 #
 # Exit semantics:
 #   - First run green → exit 0, no retry.
 #   - First run failed AND label parse found nothing (crashed before any
 #     test reported, e.g. ImportError) → propagate first-run exit code,
 #     no retry. Genuine infra problems shouldn't be silently re-run.
 #   - First run failed AND labels parsed → retry just those; exit with
 #     the retry's exit code. A real (not-flaky) regression fails twice
 #     → step still red, with the focused retry log as the authoritative
 #     report (no need to scroll past the noisy first-run output).
 #
 # Run from inside `src/` (Woodpecker preserves cwd across `commands:`,
 # so the upstream `cd ./src` carries through).
 set +e  # do NOT bail on first failure; we WANT to handle it
 LOG=$(mktemp -t ft-retry.XXXXXX.log)
 trap 'rm -f "$LOG"' EXIT
 echo "──── First run ────"
 python manage.py test "$@" 2>&1 | tee "$LOG"
 FIRST=${PIPESTATUS[0]}
 if [ "$FIRST" -eq 0 ]; then
    exit 0
 fi
 # Parse failure labels. Match both FAIL: and ERROR: lines; the dotted
 # path lives inside the trailing parens. `sort -u` dedupes if a single
 # test produces multiple lines (rare but possible).
 FAILED=$(grep -E '^(FAIL|ERROR): ' "$LOG" \
         | sed -E 's/^.*\(([^)]+)\)[^()]*$/\1/' \
         | sort -u \
         | tr '\n' ' ')
 if [ -z "$FAILED" ]; then
    echo "──── First run failed, but no FAIL/ERROR labels parseable ────"
    echo "──── Not retrying — likely an infra problem, not a test flake ────"
    exit "$FIRST"
 fi
 NUM=$(echo "$FAILED" | wc -w | tr -d ' ')
 echo ""
 echo "──── Retry ($NUM failed test(s) from first run) ────"
 echo "$FAILED" | tr ' ' '\n' | sed 's/^/  /'
 echo "─────────────────────────────────────────────────────"
 echo ""
 python manage.py test $FAILED
--- a/.woodpecker/main.yaml
+++ b/.woodpecker/main.yaml
@@ -107,11 +107,18 @@ steps:
    commands:
      - pip install -r requirements.dev.txt
      - cd ./src
-      # Every FT file EXCEPT test_game_room_* and test_trinket_* — both
-      # clusters run in test-FTs-room. Channels + two-browser tags already
-      # covered upstream. `ls | grep -v | sed` enumerates module dotted-paths
-      # from filenames.
-      - python manage.py test --exclude-tag=channels --exclude-tag=two-browser $(ls functional_tests/test_*.py | grep -vE 'test_(game_room|trinket)_' | sed 's|/|.|g;s|\.py||')
+      # Every FT file EXCEPT test_game_room_*, test_trinket_*, AND
+      # test_game_my_sea* — all three clusters run in test-FTs-room.
+      # Channels + two-browser tags already covered upstream.
+      # `ls | grep -v | sed` enumerates module dotted-paths from
+      # filenames. (No trailing `_` in the my-sea alternative — the
+      # file is `test_game_my_sea.py` w. no further suffix today.)
+      #
+      # Wrapped in `_retry_failed.sh` so a single Selenium flake (browser
+      # hang, gecko-perms blip, login race) at test N/M doesn't cost the
+      # full step wall-clock on retry — the script parses Django's
+      # FAIL:/ERROR: lines from stdout + re-runs only those labels.
+      - bash ../.woodpecker/_retry_failed.sh --exclude-tag=channels --exclude-tag=two-browser $(ls functional_tests/test_*.py | grep -vE 'test_(game_room|trinket)_|test_game_my_sea' | sed 's|/|.|g;s|\.py||')
    when:
      - event: push
        path:
@@ -139,13 +146,22 @@ steps:
      - pip install -r requirements.dev.txt
      - cd ./src
      # Heavy Selenium room flows — test_game_room_* (deck_contrib,
-      # gatekeeper, invite, select_role/sea/sig/sky, tray, tray_tooltip)
-      # AND test_trinket_* (carte_blanche, coin_on_a_string, backstage_pass)
-      # since trinket FTs create rooms + load the room template (where the
-      # table hex SCSS + chair geometry live), so they exercise the same
-      # surface as test_game_room_*. Runs in parallel w. test-FTs-non-room
+      # gatekeeper, invite, select_role/sea/sig/sky, tray, tray_tooltip),
+      # test_trinket_* (carte_blanche, coin_on_a_string, backstage_pass)
+      # since trinket FTs create rooms + load the room template (where
+      # the table hex SCSS + chair geometry live), AND test_game_my_sea*
+      # (49 my-sea FTs that DRY-reuse the room-shell hex + sea-cross
+      # picker — same Selenium surface, so the same parallel-stage
+      # contention concerns apply). Runs in parallel w. test-FTs-non-room
      # (distinct DATABASE_URL paths under /tmp; see split-rationale).
-      - python manage.py test --exclude-tag=channels --exclude-tag=two-browser $(ls functional_tests/test_game_room_*.py functional_tests/test_trinket_*.py | sed 's|/|.|g;s|\.py||')
+      #
+      # `_retry_failed.sh` parses Django FAIL:/ERROR: lines from the first
+      # run's stdout + re-runs just those labels — single-flake retries
+      # cost ~22s instead of the full ~35-min step wall-clock. Genuine
+      # regressions still fail (second run output is the authoritative
+      # report); first-run crashes w. no parseable labels propagate
+      # the original exit code (don't silently mask infra problems).
+      - bash ../.woodpecker/_retry_failed.sh --exclude-tag=channels --exclude-tag=two-browser $(ls functional_tests/test_game_room_*.py functional_tests/test_trinket_*.py functional_tests/test_game_my_sea*.py | sed 's|/|.|g;s|\.py||')
    when:
      - event: push
        path:
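
The "no overlap" claim for the two filename filters can be checked mechanically. The sketch below is an illustration, not part of the commit: it reuses the `grep -vE` pattern and the three globs from `main.yaml`, but the helper names (`to_labels`, `room_labels`, `non_room_labels`, `check_partition`, `make_demo_tree`) and the file names in the demo tree are hypothetical. It assumes bash plus standard `sort`, `comm`, and `mktemp`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the demo file names are hypothetical.
export LC_ALL=C  # keep `sort` and `comm` collation consistent

# Same path-to-label transform as the pipeline commands.
to_labels() { sed 's|/|.|g;s|\.py||'; }

# Both helpers expect to run from the directory that holds functional_tests/.
non_room_labels() {
    ls functional_tests/test_*.py \
        | grep -vE 'test_(game_room|trinket)_|test_game_my_sea' \
        | to_labels | sort
}

room_labels() {
    ls functional_tests/test_game_room_*.py \
       functional_tests/test_trinket_*.py \
       functional_tests/test_game_my_sea*.py 2>/dev/null \
        | to_labels | sort
}

# Prints the two counts, then "partition OK" or "partition BROKEN".
check_partition() {
    local tmp all union overlap
    tmp=$(mktemp -d)
    room_labels > "$tmp/room"
    non_room_labels > "$tmp/non_room"
    all=$(ls functional_tests/test_*.py | to_labels | sort)
    union=$(sort "$tmp/room" "$tmp/non_room")
    overlap=$(comm -12 "$tmp/room" "$tmp/non_room")
    echo "room=$(wc -l < "$tmp/room" | tr -d ' ') non_room=$(wc -l < "$tmp/non_room" | tr -d ' ')"
    rm -rf "$tmp"
    if [ -z "$overlap" ] && [ "$union" = "$all" ]; then
        echo "partition OK"
    else
        echo "partition BROKEN"
        return 1
    fi
}

# Builds a throwaway tree of empty FT files and prints its path.
make_demo_tree() {
    local dir name
    dir=$(mktemp -d)
    mkdir "$dir/functional_tests"
    for name in test_game_room_invite test_trinket_carte_blanche \
                test_game_my_sea test_dashboard test_login; do
        : > "$dir/functional_tests/$name.py"
    done
    echo "$dir"
}

demo_dir=$(make_demo_tree)
(cd "$demo_dir" && check_partition)
rm -rf "$demo_dir"
```

One case this kind of check would catch: the `grep -vE` pattern is unanchored while the globs are anchored to the start of the filename, so a file whose name contains `test_trinket_` somewhere other than the start would be dropped from the non-room list without being picked up by the room list.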
Author	SHA1	Message	Date
Disco DeDisco	31cb8dfc1d	CI: route test_game_my_sea*.py to test-FTs-room stage — 49 my-sea FTs DRY-reuse the room-shell hex + sea-cross picker (same Selenium surface as test_game_room_* + test_trinket_*), so they belong w. the heavy room flows instead of bloating test-FTs-non-room. Filename-regex partition stays clean (13 room + 24 non-room = 37 total, no overlap) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:20:23 -04:00
Disco DeDisco	899e626265	CI: `_retry_failed.sh` wraps both FT steps — single-flake retries cost ~22s instead of a full 35-min step re-run. Parses Django's `FAIL:/ERROR: test_method (full.dotted.path)` lines from stdout, re-runs only those labels (deduped + sorted). Green first runs skip the retry; first-run crashes w. no parseable labels propagate the original exit code without masking infra problems Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:14:26 -04:00
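
The label-parsing step described in 899e626265 can be tried in isolation. The sketch below is an illustration under stated assumptions: it uses the same `grep -E` and `sed -E` expressions as `_retry_failed.sh`, but the function names (`parse_failed_labels`, `sample_log`) are hypothetical and the log lines are invented, including the test class and method names. It assumes the runner prints failures in the `test_method (full.dotted.path.TestClass.test_method)` shape noted in the script header.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the sample log content are invented.
export LC_ALL=C  # deterministic `sort` order

# Reads a test log on stdin and prints the unique failing labels,
# one per line. Prints nothing when no FAIL:/ERROR: line is present.
parse_failed_labels() {
    grep -E '^(FAIL|ERROR): ' \
        | sed -E 's/^.*\(([^)]+)\)[^()]*$/\1/' \
        | sort -u
}

# A made-up log: one passing test, one error reported twice, one failure.
sample_log() {
    cat <<'EOF'
test_invite_link (functional_tests.test_game_room_invite.InviteTest.test_invite_link) ... ok
======================================================================
ERROR: test_tray_opens (functional_tests.test_game_room_tray.TrayTest.test_tray_opens)
----------------------------------------------------------------------
Traceback (most recent call last):
selenium.common.exceptions.TimeoutException: Message:
======================================================================
FAIL: test_login (functional_tests.test_login.LoginTest.test_login)
----------------------------------------------------------------------
AssertionError: 'Log in' not found
======================================================================
ERROR: test_tray_opens (functional_tests.test_game_room_tray.TrayTest.test_tray_opens)
EOF
}

sample_log | parse_failed_labels
# Prints:
#   functional_tests.test_game_room_tray.TrayTest.test_tray_opens
#   functional_tests.test_login.LoginTest.test_login
```

The `Traceback (most recent call last):` line also ends in parentheses, which is why the `grep` for a leading `FAIL: ` or `ERROR: ` has to run before the `sed` extraction. Empty output corresponds to the script's "no parseable labels" branch, where the first run's exit code is propagated and nothing is retried.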