Bisect a flaky GitHub Actions test, identify the regression commit, and produce a minimal repro script for local reproduction.
Pins down which commit made a test flaky by combining git bisect run with a flake-aware test runner that re-runs the test N times at each candidate commit. The output is the offending commit hash plus a minimal local repro.
Inputs:
- repo_dir: a clean checkout of the repo.
- test_id: the flaky test in tool-specific syntax (e.g., tests/foo.test.ts -t 'connects' for Vitest).
- good_ref and bad_ref: the commits where the test passed and where it started failing.
- iterations: how many times to run the test per commit (default 25). A failure rate above 5% counts as flaky-bad.

Steps:
1. Run git -C <repo_dir> checkout <bad_ref> and run the test once to confirm that it actually fails or flakes.
2. Create bisect-run.sh that does: npm install --silent && for i in $(seq 1 <iterations>); do <test cmd> || exit 1; done. Make it executable.
3. Run git bisect start <bad_ref> <good_ref>.
4. Run git bisect run ./bisect-run.sh. Bisect will narrow the range down to the first bad commit.
5. Record git bisect log and the printed first-bad-commit hash.
6. Run git bisect reset to return to the original working tree.
7. Run git show --stat <hash> and isolate the file(s) most likely involved (the test in question, its imports, shared fixtures).
8. Write repro.sh that clones the repo at <hash>^ (good), applies just the relevant hunk(s) from <hash>, runs the test, and observes the flake.
9. Record env | sort and pin the Node/Python version with .nvmrc/.python-version.

Output:
A markdown file bisect-report.md containing the bad commit hash, the diffstat, the bisect log, and a fenced repro.sh. Exit 0 when bisect completes successfully, 1 if the test passed at bad_ref (no flake reproducible), 2 if good_ref already flakes.
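The bisect-run.sh loop and the bisect commands around it can be combined into a single script. What follows is a minimal sketch, not the skill's own implementation. TEST_CMD, INSTALL_CMD, ITERATIONS and OUT_DIR are hypothetical environment variables, and the file name flaky-bisect.sh is only illustrative. The sketch departs from the one-line loop in two ways. First, a failed install exits 125, which git bisect run treats as "skip this commit" instead of "bad". Second, the driver applies the 0/1/2 exit-code contract from the Output description, plus a code 3 for setup errors, which is an assumption of this sketch. Like the loop, it marks a commit bad on the first failure. That is stricter than the 5% threshold, since one failure in 25 runs is 4%.

```shell
#!/usr/bin/env bash
# flaky-bisect.sh: sketch of bisect-run.sh plus a driver for the bisect.
# Hypothetical knobs (not part of the original skill): TEST_CMD, INSTALL_CMD,
# ITERATIONS, OUT_DIR. Keep this file outside the repo being bisected.
set -u
export ITERATIONS="${ITERATIONS:-25}"
export INSTALL_CMD="${INSTALL_CMD:-npm install --silent}"

# Returns 0 as soon as one of ITERATIONS runs of TEST_CMD fails;
# returns 1 if every run passed.
observes_failure() {
  local i=0
  while [ "$i" -lt "$ITERATIONS" ]; do
    bash -c "$TEST_CMD" >/dev/null 2>&1 || return 0
    i=$((i + 1))
  done
  return 1
}

# Entry point for `git bisect run`: 0 = good, 1 = bad, 125 = skip.
bisect_step() {
  : "${TEST_CMD:?TEST_CMD must be set}"
  # An install failure says nothing about the flake, so skip the commit
  # instead of letting it be counted as bad.
  bash -c "$INSTALL_CMD" >/dev/null 2>&1 || exit 125
  if observes_failure; then exit 1; else exit 0; fi
}

# Driver: preflight checks, bisect, log capture, reset. Runs in a subshell so
# the cd does not leak. Exit codes: 0 = bisect completed, 1 = no failure at
# bad_ref, 2 = good_ref already flakes, 3 = setup error (assumption).
bisect_flaky() (
  repo="$1"; good="$2"; bad="$3"
  : "${TEST_CMD:?TEST_CMD must be set}"
  out="$(cd "${OUT_DIR:-.}" && pwd)" || exit 3
  self="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/$(basename "${BASH_SOURCE[0]}")"
  cd "$repo" || exit 3

  git checkout -q "$bad" && bash -c "$INSTALL_CMD" >/dev/null 2>&1 || exit 3
  observes_failure || { echo "no failure at $bad" >&2; exit 1; }
  git checkout -q "$good" && bash -c "$INSTALL_CMD" >/dev/null 2>&1 || exit 3
  observes_failure && { echo "$good already flakes" >&2; exit 2; }

  git bisect start "$bad" "$good" >/dev/null || exit 3
  status=0
  git bisect run bash "$self" bisect_step || status=3
  git bisect log > "$out/bisect.log"
  # After a successful run, refs/bisect/bad points at the first bad commit.
  [ "$status" -eq 0 ] && git rev-parse refs/bisect/bad > "$out/first-bad-commit"
  git bisect reset >/dev/null
  exit "$status"
)

# Subcommand dispatch, e.g. `bash flaky-bisect.sh bisect_flaky <repo> <good> <bad>`.
if [ "$#" -gt 0 ]; then "$@"; exit $?; fi
```

Example invocation, where OUT_DIR must be an existing directory: TEST_CMD="npx vitest run tests/foo.test.ts -t 'connects'" OUT_DIR=/tmp/flake bash flaky-bisect.sh bisect_flaky /path/to/repo <good_ref> <bad_ref>. The script calls itself as the bisect step, so one file serves as both the driver and the runner.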
Verification:
Re-run bisect-run.sh at <hash>^ and confirm zero failures across <iterations> runs; run it at <hash> and confirm at least one failure. If both pass cleanly, the bisect was driven by noise rather than a real regression; increase iterations and rerun. Run the produced repro.sh from a fresh clone to confirm that it reproduces the flake without depending on the local working tree.
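The parent/child comparison can be scripted as follows. This is a minimal sketch that assumes bash and git. TEST_CMD, INSTALL_CMD and ITERATIONS are hypothetical environment variables, and verify-bisect.sh is only an illustrative name. Unlike the early-exit bisect loop, this script counts every failure, so the printed ratios show how strong the signal is. Any result other than "clean parent, failing child" or "both clean" is reported as inconclusive, which is a choice made by this sketch.

```shell
#!/usr/bin/env bash
# verify-bisect.sh: sketch of the verification step. TEST_CMD, INSTALL_CMD
# and ITERATIONS are hypothetical environment variables, not part of the skill.
set -u
ITERATIONS="${ITERATIONS:-25}"
INSTALL_CMD="${INSTALL_CMD:-npm install --silent}"

# Run TEST_CMD ITERATIONS times without stopping early; print the failure count.
count_failures() {
  local fails=0 i=0
  while [ "$i" -lt "$ITERATIONS" ]; do
    bash -c "$TEST_CMD" >/dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails"
}

# Classify (parent failures, child failures). Returns 0 only for a confirmed
# regression: zero failures at <hash>^ and at least one at <hash>.
verdict() {
  if [ "$1" -eq 0 ] && [ "$2" -gt 0 ]; then
    echo "confirmed"; return 0
  elif [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo "noise: raise ITERATIONS and rerun the bisect"; return 1
  fi
  echo "inconclusive: $1 failures at the parent"; return 1
}

# Check out a ref in the repo, install dependencies, print the failure count.
failures_at() {
  git -C "$1" checkout -q "$2" || return 1
  (cd "$1" && bash -c "$INSTALL_CMD" >/dev/null 2>&1 && count_failures)
}

verify_regression() {
  local repo="$1" hash="$2" parent child
  : "${TEST_CMD:?TEST_CMD must be set}"
  parent=$(failures_at "$repo" "$hash^") || { echo "setup failed at $hash^" >&2; return 2; }
  child=$(failures_at "$repo" "$hash") || { echo "setup failed at $hash" >&2; return 2; }
  echo "$hash^: $parent/$ITERATIONS failed; $hash: $child/$ITERATIONS failed"
  verdict "$parent" "$child"
}

# Usage: bash verify-bisect.sh <repo_dir> <hash>
if [ "$#" -eq 2 ]; then verify_regression "$1" "$2"; exit $?; fi
```

The script leaves the repo checked out at <hash>, so run git checkout on your branch afterwards. Its exit status (0 confirmed, 1 not confirmed, 2 setup failure) is also an assumption of this sketch.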
Pitfalls:
- Set TZ=UTC and use faketime if available; flakes caused by time should be tagged but not bisected.
- Use --first-parent if the merge cannot be reverted in isolation.
- Move good_ref further back if it already flakes; the regression may predate the assumed range.
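One way to spot a time-dependent flake before bisecting is to compare normal runs with runs where the time is pinned. This sketch assumes bash and, optionally, the faketime wrapper that ships with libfaketime. TEST_CMD, ITERATIONS and FAKETIME_AT are hypothetical names. The rule "fails normally but never fails with time pinned, so tag it" is a heuristic of this sketch, not a rule stated by the skill.

```shell
#!/usr/bin/env bash
# pin-time.sh: sketch of the time-related pitfall check. TEST_CMD, ITERATIONS
# and FAKETIME_AT are hypothetical names, not part of the original skill.
set -u
ITERATIONS="${ITERATIONS:-25}"
FAKETIME_AT="${FAKETIME_AT:-2024-01-01 12:00:00}"

# Run TEST_CMD once with TZ=UTC, under the faketime wrapper at a fixed
# timestamp when it is installed, otherwise with only the time zone pinned.
run_pinned() {
  if command -v faketime >/dev/null 2>&1; then
    TZ=UTC faketime "$FAKETIME_AT" bash -c "$TEST_CMD"
  else
    TZ=UTC bash -c "$TEST_CMD"
  fi
}

# Run TEST_CMD once with the caller's environment unchanged.
run_plain() { bash -c "$TEST_CMD"; }

# Print how many of ITERATIONS calls to the given runner function failed.
failures_with() {
  local runner="$1" fails=0 i=0
  while [ "$i" -lt "$ITERATIONS" ]; do
    "$runner" >/dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails"
}

# Heuristic: failures that disappear once time is pinned suggest a
# time-dependent flake to tag rather than bisect.
classify_time_dependence() {
  local plain pinned
  plain=$(failures_with run_plain)
  pinned=$(failures_with run_pinned)
  if [ "$plain" -gt 0 ] && [ "$pinned" -eq 0 ]; then
    echo "time-dependent"
  else
    echo "not-time-dependent"
  fi
}
```

A result of "time-dependent" is only a hint. A flaky test can pass every pinned run by chance, so raise ITERATIONS before relying on it.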
Skill: amitte/flaky-test-bisect