# The Agentic Coding Playbook

**How to set up an automated AI coding workflow — from testing sweep to morning review.**

*A free guide by the team behind voxspek.*

---

## What this playbook covers

You've adopted AI coding tools. Your agents ship PRs faster than you can review them. Now what?

This playbook walks you through a complete, automated workflow that turns your human observations into shipped code — overnight, while you sleep. It covers:

1. The architecture (GitHub as the message bus)
2. Collaboration (many humans, many agents, one repo)
3. Setting up Claude Code skills for your project
4. The testing sweep (how to capture observations fast)
5. Configuring Claude Cowork for nightly autonomous runs: the planner routine (clustering issues into coherent sprints), the executor routine (shipping PRs overnight), and the morning briefer
6. The morning review process
7. Optional GitHub Actions glue
8. A day-1 checklist and reference files you can copy into your repo today

**Time to set up:** ~2 hours for a basic loop, ~1 day for the full pipeline.

**Prerequisites:** A GitHub repo, Claude Code (Pro or Max), and a product to test.

---

## The Loop

The agentic coding workflow is a cycle with 7 steps. Four are yours (human judgment). Two are the agent's (execution). One is the bridge (voxspek).

```
  1. IDENTIFY → 2. DEFINE → 3. CAPTURE → 4. PLAN → 5. EXECUTE → 6. REVIEW → 7. TEST
       ↑                                                                          |
       └──────────────────────────── loop back ────────────────────────────────────┘
```

| Step | Who | What happens |
|---|---|---|
| **1. Identify the problem** | Human | Test, observe, notice what's wrong or missing. Only human eyes catch "correct-but-drifting" code. |
| **2. Define scope & spec** | Human | Vision, acceptance criteria, priority. The agent executes specs — you write them from user insight. |
| **3. Capture to GitHub** | voxspek | Voice → structured issue in 3 seconds. Three modes: Capture (bugs), Review (PR feedback), Spec (features). |
| **4. Agent plans sprints** | Agent | Planner routine clusters related issues into coherent sprint plans. Runs nightly. |
| **5. Agent ships PRs** | Agent | Executor routine implements the sprint. Code, tests, docs. 3-5 PRs overnight. |
| **6. Review & accept** | Human | Merge, flag, or follow up. Tiered review: LOW auto-merge, MEDIUM/HIGH human review. 30 minutes. |
| **7. Test the result** | Human | Run a testing sweep on the shipped code. Find new issues. → Back to step 1. |

**GitHub is at the center of every step.** Every step reads from GitHub and writes to GitHub. Issues, PRs, code, labels, comments — it's all in one place. The agent never needs to ask "what should I work on?" because the answer is in the issue queue.

---

## Why GitHub as hub (not just prompting agents directly)

You could skip all of this and just prompt Claude Code or Codex directly: "fix the button on the settings page." Why is the GitHub-centric loop better?

**1. Context persists across sessions.** A GitHub issue lives forever. A prompt in a chat session is gone when the session ends. When your planner routine runs at 9 PM, it reads from a persistent, structured backlog — not from your memory of what you told the agent last Tuesday.

**2. Sprints are coherent.** Direct prompting leads to one-off PRs that conflict on shared files. The planner routine sees ALL open issues and clusters them into coherent sprints — related changes ship together, not separately.

**3. Teams work.** If you add a collaborator, they can file issues, review PRs, and participate in the loop without any special tool setup. GitHub is the universal interface. Chat prompts are single-player.

**4. Audit trail is automatic.** Every observation traces from the issue → to the sprint plan → to the PR → to the merged code. When something breaks, you can walk the chain backward. Direct prompts leave no trail.

**5. Cross-agent compatible.** Your issues work with Claude Code, Cursor, Codex, Windsurf, or any future agent. Prompt-based workflows lock you into one agent's chat interface.

**6. AI reads structured input better.** A well-scoped GitHub issue with metadata, labels, and context produces better agent output than a freeform prompt. The structure IS the context.

**The bottom line:** prompting agents directly is fine for one-off tasks. The GitHub-centric loop is for sustained, compounding workflows where you ship 50-100 PRs a week for months. The overhead of structuring input pays for itself within the first week.

---

## Chapter 1 — The architecture

### GitHub is the message bus

Every observation, feature request, spec note, and bug report flows into GitHub Issues. Not a proprietary database. Not a Notion board. Not a Slack channel. GitHub Issues.

Why:
- Your agents already read from GitHub (issues, PRs, code)
- Full audit trail for free
- No vendor lock-in — you can walk away from any tool and your data stays
- Cross-agent compatible (Claude Code, Cursor, Codex — all read GitHub)

### The label schema

Create these labels in your repo:

| Label | Purpose |
|---|---|
| `source:observation` | Captured during a testing sweep |
| `source:spec` | Feature specification or requirement |
| `source:roadmap` | Planned roadmap item |
| `source:customer` | External feedback |
| `sprint-ready` | Planner may consume this issue |
| `needs-triage` | Requires human attention before execution |
| `needs-spec-expansion` | Stub — needs detail before an agent can work it |
| `blocker` | Priority flag — may trigger immediate action |
| `hold` | Do not work on this yet |
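| `area:*` | Component or surface area (e.g., `area:auth`, `area:dashboard`) |
| `sprint-plan` | Sprint plan issue filed by the planner routine (Chapter 5) |
| `auto-sprint` | PR opened by the executor routine (Chapter 5) |
| `morning-brief` | Daily summary issue filed by the briefer routine (Chapter 5) |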

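If you would rather script the label setup than click through the UI, something like the following Node sketch could work. It targets GitHub's REST endpoint `POST /repos/{owner}/{repo}/labels`. The colors, the two sample `area:*` names, and the `GITHUB_TOKEN` environment variable are assumptions, not part of the playbook's reference files.

```javascript
// Labels from the schema above, plus the three labels the Chapter 5 routines apply.
// Colors are arbitrary placeholders (assumption); swap in your own.
const LABELS = [
  { name: 'source:observation', description: 'Captured during a testing sweep', color: '1d76db' },
  { name: 'source:spec', description: 'Feature specification or requirement', color: '1d76db' },
  { name: 'source:roadmap', description: 'Planned roadmap item', color: '1d76db' },
  { name: 'source:customer', description: 'External feedback', color: '1d76db' },
  { name: 'sprint-ready', description: 'Planner may consume this issue', color: '0e8a16' },
  { name: 'needs-triage', description: 'Requires human attention before execution', color: 'fbca04' },
  { name: 'needs-spec-expansion', description: 'Stub: needs detail before an agent can work it', color: 'fbca04' },
  { name: 'blocker', description: 'Priority flag', color: 'b60205' },
  { name: 'hold', description: 'Do not work on this yet', color: 'cccccc' },
  { name: 'area:auth', description: 'Example area label', color: '5319e7' },
  { name: 'area:dashboard', description: 'Example area label', color: '5319e7' },
  { name: 'sprint-plan', description: 'Sprint plan filed by the planner', color: '006b75' },
  { name: 'auto-sprint', description: 'PR opened by the executor', color: '006b75' },
  { name: 'morning-brief', description: 'Daily summary issue', color: '006b75' },
];

// Build the HTTP request for one label without sending it.
function labelRequest(owner, repo, label, token) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/labels`,
    options: {
      method: 'POST',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(label),
    },
  };
}

// Create every label in turn. A 422 response is treated as "already exists".
async function createLabels(owner, repo, token) {
  for (const label of LABELS) {
    const { url, options } = labelRequest(owner, repo, label, token);
    const res = await fetch(url, options);
    if (!res.ok && res.status !== 422) {
      throw new Error(`Failed to create ${label.name}: HTTP ${res.status}`);
    }
  }
}

// Usage (Node 18+, which ships a global fetch):
// createLabels('your-org', 'your-repo', process.env.GITHUB_TOKEN);
```
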
### The issue format

Every issue created in this workflow includes a machine-readable metadata footer:

```markdown
---
<!-- agentic-workflow-v1 -->
source: observation
area: /dashboard/settings
captured_at: 2026-04-21T14:30:00Z
sweep_id: sweep-2026-04-21
extension_version: 0.7.1
```

This footer lets the planner routine parse issues programmatically. Everything above the `---` is for humans; the footer is for agents.
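
To make "parse issues programmatically" concrete, here is a minimal sketch of a footer parser. It assumes the footer is a flat list of `key: value` lines after the `<!-- agentic-workflow-v1 -->` marker, as in the example above. The function name `parseIssueBody` and the sample issue text are hypothetical.

```javascript
const MARKER = '<!-- agentic-workflow-v1 -->';

// Split an issue body into the human-readable text and the parsed footer.
// Returns { text, meta }, where meta is null when the marker is missing.
function parseIssueBody(body) {
  const at = body.lastIndexOf(MARKER);
  if (at === -1) return { text: body.trim(), meta: null };

  const meta = {};
  for (const line of body.slice(at + MARKER.length).split(/\r?\n/)) {
    // The key stops at the first colon, so ISO timestamps stay intact.
    const match = line.match(/^([a-z_]+):\s*(.+?)\s*$/);
    if (match) meta[match[1]] = match[2];
  }

  // Drop the `---` separator that precedes the marker.
  const text = body.slice(0, at).replace(/\s*-{3,}\s*$/, '').trim();
  return { text, meta };
}

// Hypothetical issue body, using the footer from the example above.
const sample = [
  'The save button stays disabled after editing the display name.',
  '',
  '---',
  MARKER,
  'source: observation',
  'area: /dashboard/settings',
  'captured_at: 2026-04-21T14:30:00Z',
  'sweep_id: sweep-2026-04-21',
  'extension_version: 0.7.1',
].join('\n');

console.log(parseIssueBody(sample).meta);
```

Returning `meta: null` for issues without the marker lets a caller distinguish workflow issues from anything filed by hand.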

---

## Chapter 2 — Collaboration: the many-to-many pattern

*The first time the overnight loop clicks for you, it's solo: one human + your agents. But the loop was designed for teams — and GitHub was designed for teams a decade before agents existed. The two compose naturally.*

### The problem with agent-chat workflows

Most early agentic workflows are single-player. One developer in a Claude Code terminal, one Cursor session, one Windsurf chat. Context lives inside that session. If your co-founder wants to contribute, she opens her own chat — and the two of you are working from different memories, possibly stepping on each other's file edits with no shared coordination layer.

This scales to one person. It does not scale to two.

### Why GitHub was already the answer

GitHub's entire design — issues, labels, milestones, branches, pull requests, reviews, comments, CODEOWNERS, protected branches, merge queues — is a coordination protocol for *n* humans working on the same codebase. Every piece of that protocol translates directly to *n* humans + *m* agents.

| GitHub primitive | Humans-only use | Humans + agents use |
|---|---|---|
| **Issues** | "Here's what needs doing" | Humans *and* agents file; the planner reads them all |
| **Labels** | Triage, priority, component | Routing signal for which agent or human picks up what |
| **Branches** | Parallel work without collisions | Each agent run gets its own branch; never steps on another |
| **Pull requests** | Review before merge | Agent PRs go through the same review gate |
| **Reviews & comments** | Humans debate, approve | Humans comment on agent PRs; voxspek captures those comments back as follow-up issues |
| **Milestones** | Sprint or release grouping | The planner uses these to cluster issues into coherent sprints |
| **CODEOWNERS** | Who's responsible for what | Also: which agent profile handles which area |
| **Actions / Webhooks** | CI, automation | Also the glue around the planner and executor: auto-labeling new issues, closing sprint plans on merge |

You don't need new coordination infrastructure for the multi-agent era. It's already in your repo.

> **Figure 6: The many-to-many flow.** Multiple humans capture observations; multiple agents execute on branches; GitHub is the shared hub where they synchronize.

### The pattern, concretely

Imagine a two-person team shipping a product:

- **Alice** runs Claude Code with a `frontend-specialist` skill profile (tight conventions, Tailwind-aware).
- **Bob** runs Cursor with a `backend-specialist` ruleset (Node, SQL, tighter commit conventions).
- Both run nightly executor routines on the same repo.
- Both use voxspek on the client-facing product to capture observations.

Tuesday evening, Alice runs a testing sweep on the dashboard: 7 observations → 7 issues tagged `area:dashboard`. Bob runs a sweep on the API: 5 observations → 5 issues tagged `area:api`. The planner runs at 9 PM and produces two sprint plans:

- `sprint-dashboard-2026-04-22` — 7 issues, routed to Alice's agent profile.
- `sprint-api-2026-04-22` — 5 issues, routed to Bob's agent profile.

By morning, each has 3–5 PRs waiting. Alice reviews hers, Bob reviews his. If a dashboard issue bleeds into the backend ("this empty state needs a new endpoint"), the planner re-routes it to Bob's sprint because the metadata footer on the issue tagged both `area:dashboard` and `area:api`.

No chat handoffs. No "hey, can you also pick this up" messages. GitHub was the coordination layer. The agents plugged into it.
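
The routing in this example can be sketched as a lookup from area labels to agent profiles. The table, the priority order, and the profile names below are hypothetical. The only behavior taken from the text is that an issue tagged with both areas lands in Bob's sprint.

```javascript
// Hypothetical routing table: area label -> agent profile (assumption).
const ROUTES = {
  'area:api': 'bob/backend-specialist',
  'area:dashboard': 'alice/frontend-specialist',
};

// When an issue carries several areas, the earliest entry here wins.
// Backend first, so a dashboard issue that needs a new endpoint goes to Bob.
const PRIORITY = ['area:api', 'area:dashboard'];

// Return the agent profile for an issue, or null if no known area matches.
function routeIssue(issue) {
  const area = PRIORITY.find((label) => issue.labels.includes(label));
  return area ? ROUTES[area] : null;
}

console.log(routeIssue({ number: 12, labels: ['area:dashboard', 'area:api'] }));
```

An issue that routes to `null` is a natural candidate for the `needs-triage` label.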

### Three rules for multi-human, multi-agent teams

1. **Every observation becomes an issue before it's a discussion.** If Alice mentions something to Bob in Slack, that conversation dies there. If she captures it via voxspek, the issue lives forever and gets routed automatically. The rule: default to issue, exception to chat.
2. **Agent profiles are namespaced.** Each teammate's agent profile gets its own branch prefix (`alice/`, `bob/`), commit conventions, and CODEOWNERS mapping. This makes agent-to-agent collisions as rare as well-coordinated human-to-human collisions.
3. **Review always crosses the human/agent boundary.** Alice reviews Bob's agent PRs. Bob reviews Alice's. Agents don't approve each other's PRs — humans always sit at the approval gate. The *reviewing* is still human-to-human workflow, flowing through the same PR surface as it always did.

### What scales this further

- **CODEOWNERS + auto-assignment** — route review to whichever human owns the area, regardless of which agent opened the PR.
- **Discussion threads on issues** — when a spec is ambiguous, humans debate in the issue comments; the agent waits. (Agents read final state, not in-flight discussion.)
- **Stacked PRs and draft PRs** — when one agent run depends on a prior one, draft PRs are the coordination signal; the next run waits on merge.
- **Merge queue** — GitHub's native merge queue batches multiple agent PRs through CI in order, avoiding the "green when opened, red after merge" race.

### What breaks if you skip this

Teams that try to scale agent workflows via chat-based coordination ("I'll tell my agent to work on X, you tell yours to work on Y") hit walls within a week. Context diverges. The same issue gets worked twice. File edits collide. One agent's convention ("always add JSDoc") contradicts another's ("never add JSDoc"). Without a shared source of truth and a shared coordination protocol, two humans + two agents produce less than one human + one agent.

GitHub is that shared source of truth. It already was — you're just using more of its surface now.

---

## Chapter 3 — Setting up Claude Code skills

A Claude Code skill file tells Claude how to behave in your project. It's a markdown file that gets loaded as context when Claude Code runs.

### Your CLAUDE.md file

Create `CLAUDE.md` in your repo root. This is auto-loaded by Claude Code every session.

```markdown
# CLAUDE.md — [Your Project Name]

## Project overview
[One paragraph: what this project is, what stack it uses]

## Architecture
- Frontend: [framework, location]
- Backend: [framework, location]
- Database: [type, location]

## Git workflow
- Branch: commit to main (solo dev) or feature branches (team)
- Always run tests before committing
- Co-author trailer: Co-Authored-By: Claude Code <noreply@anthropic.com>

## Testing
[How to run tests, what to check before committing]

## Conventions
- [Naming conventions]
- [File organization rules]
- [Error handling approach]
- [Any project-specific rules]

## Known gotchas
- [Things that trip up new contributors or agents]
```

### The skill file

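Create `.claude/skills/agentic-workflow.md`. (Recent Claude Code releases discover project skills as `.claude/skills/<name>/SKILL.md` with `name` and `description` frontmatter, so you may need `.claude/skills/agentic-workflow/SKILL.md` instead; check the docs for your version.)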

```markdown
# Agentic workflow skill

## Reading the backlog
When asked to plan or work on issues:
1. Read open issues labeled `sprint-ready`
2. Group related issues by `area:*` label
3. Propose a sprint plan before executing

## Issue format
Every issue this project creates has a metadata footer (<!-- agentic-workflow-v1 -->).
Parse the `source`, `area`, and `captured_at` fields when planning.

## Sprint composition rules
- Never mix unrelated areas in one PR
- Keep PRs under 500 lines of diff when possible
- If an issue is labeled `needs-spec-expansion`, ask for clarification before executing
- If an issue is labeled `blocker`, prioritize it above all others
- If an issue is labeled `hold`, skip it entirely

## PR conventions
- Title: concise, imperative ("Add settings validation", not "Added settings validation")
- Body: reference the issue(s) being addressed with "Closes #N"
- Always run the test suite before marking ready for review
```
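
The label rules in the skill file amount to a filter and a sort. A minimal sketch, assuming each issue has been reduced to `{ number, labels }` (a simplified shape, not the full GitHub API object); `selectWorkable` is a hypothetical name:

```javascript
// Apply the skill file's label rules to a list of issues.
function selectWorkable(issues) {
  const has = (issue, label) => issue.labels.includes(label);

  const workable = [];
  const needsClarification = [];
  for (const issue of issues) {
    // Only sprint-ready issues are candidates; `hold` is skipped entirely.
    if (!has(issue, 'sprint-ready') || has(issue, 'hold')) continue;
    if (has(issue, 'needs-spec-expansion')) needsClarification.push(issue);
    else workable.push(issue);
  }

  // Blockers first. Array.prototype.sort is stable, so other issues keep their order.
  workable.sort((a, b) => has(b, 'blocker') - has(a, 'blocker'));
  return { workable, needsClarification };
}

const result = selectWorkable([
  { number: 1, labels: ['sprint-ready', 'area:auth'] },
  { number: 2, labels: ['sprint-ready', 'hold'] },
  { number: 3, labels: ['sprint-ready', 'blocker'] },
  { number: 4, labels: ['sprint-ready', 'needs-spec-expansion'] },
]);
console.log(result.workable.map((i) => i.number), result.needsClarification.map((i) => i.number));
```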

---

## Chapter 4 — The testing sweep

The testing sweep is where human value is created. Everything else in the pipeline amplifies it.

### What a sweep looks like

1. Open your app in the browser
2. Open voxspek (or any voice-capture tool) in the side panel
3. Navigate through your app systematically
4. For each observation: press the hotkey, speak what you see, keep moving

A 20-minute sweep typically captures 10-15 observations.

### What to look for

**Visual consistency:** Does this page match the rest of the product? Empty states, loading states, error states — do they use the same patterns?

**Flow integrity:** Does clicking through the intended user path work smoothly? Are there dead ends, confusing redirects, or missing back buttons?

**Context fit:** Does this feature feel integrated, or does it feel grafted on? This is the judgment call agents cannot make.

**Edge cases:** What happens with no data? With a lot of data? With special characters? With slow connections?

**Copy and labels:** Is the text clear? Are buttons labeled correctly? Do headings make sense?

### Sweep cadence

- **Daily (10-20 min):** Quick sweep of what changed overnight. Focus on PRs that were auto-merged.
- **Weekly (30-45 min):** Deep sweep of one area of the product. Rotate areas each week.
- **Pre-demo/pre-release (45-60 min):** Full product walkthrough. No shortcuts.

---

## Chapter 5 — Claude Cowork nightly routines

Claude Cowork (via Claude Code routines) can run scheduled tasks while you sleep. This is the automation layer.

### Setting up a planner routine

In the Claude Code web interface at `claude.ai/code`, create a routine:

**Name:** `nightly-planner`
**Schedule:** Every weekday at 9 PM
**Prompt:**

```
Read all open GitHub issues in [owner/repo] labeled "sprint-ready".

Group related issues into coherent sprints. Each sprint should:
- Address issues in the same area (same area:* label)
- Be implementable as a single PR without conflicting changes
- Contain 2-5 issues maximum

For each sprint, output a plan as a new GitHub issue with:
- Title: "Sprint: [area] — [brief description]"
- Body: list of issue references (#N) with a brief implementation approach
- Labels: sprint-plan

Do NOT execute any code. Only plan.
```

**Connected repo:** `[owner/repo]`
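
The planner makes these grouping decisions with judgment, but the mechanical part of the prompt (same area, at most 5 issues per sprint) can be sketched deterministically. This is an illustration under assumptions, not what the routine runs: it uses only the first `area:*` label, does not enforce the 2-issue minimum, and `planSprints` is a hypothetical name.

```javascript
// Group issues by their first area label, then cut each group into
// chunks of at most `maxSize` issues.
function planSprints(issues, maxSize = 5) {
  const byArea = new Map();
  for (const issue of issues) {
    const area = issue.labels.find((l) => l.startsWith('area:')) ?? 'area:unknown';
    if (!byArea.has(area)) byArea.set(area, []);
    byArea.get(area).push(issue.number);
  }

  const sprints = [];
  for (const [area, numbers] of byArea) {
    for (let i = 0; i < numbers.length; i += maxSize) {
      sprints.push({ area, issues: numbers.slice(i, i + maxSize) });
    }
  }
  return sprints;
}

// Six dashboard issues (#1-#6) and two API issues (#7, #8).
const backlog = [1, 2, 3, 4, 5, 6]
  .map((n) => ({ number: n, labels: ['sprint-ready', 'area:dashboard'] }))
  .concat([7, 8].map((n) => ({ number: n, labels: ['sprint-ready', 'area:api'] })));

console.log(planSprints(backlog));
```

A leftover single-issue chunk (issue #6 here) is where a human or the planner would decide whether to merge it into a neighboring sprint or hold it for the next night.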

### Setting up an executor routine

**Name:** `nightly-executor`
**Schedule:** Every weekday at 10 PM (1 hour after planner)
**Prompt:**

```
Read open GitHub issues in [owner/repo] labeled "sprint-plan" that were created today.

For each sprint plan:
1. Read the referenced issues
2. Implement the changes described
3. Run the test suite
4. Open a PR with:
   - Title matching the sprint plan title
   - Body referencing all addressed issues (Closes #N, Closes #M)
   - Label: auto-sprint

Stop after completing 3 sprint plans or 2 hours, whichever comes first.
```

### Setting up a morning briefer routine

**Name:** `morning-briefer`
**Schedule:** Every weekday at 7 AM
**Prompt:**

```
Review all open PRs in [owner/repo] labeled "auto-sprint" that were created since the previous "morning-brief" issue was filed (on Mondays this includes Friday night's PRs).

Create a single GitHub issue titled "Morning brief — [today's date]" with:
- Summary of each PR (1-2 sentences)
- Risk assessment: LOW (auto-mergeable), MEDIUM (review recommended), HIGH (touches shared files or has failing checks)
- Recommended review order (HIGH risk first)
- Total: N PRs shipped overnight, M issues addressed

Label: morning-brief
```
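
The risk tiers in this prompt can be approximated in code if you want a deterministic cross-check on the brief. The HIGH rule comes from the prompt. The MEDIUM threshold (more than 100 changed lines), the `sharedPaths` prefixes, and the PR shape are assumptions made for this sketch.

```javascript
const RANK = { HIGH: 0, MEDIUM: 1, LOW: 2 };

// HIGH: touches shared files or has failing checks (from the prompt).
// MEDIUM: more than `mediumLines` changed lines (assumed threshold).
function assessRisk(pr, sharedPaths, mediumLines = 100) {
  const touchesShared = pr.files.some((file) =>
    sharedPaths.some((prefix) => file.startsWith(prefix)));
  if (touchesShared || pr.checksFailing) return 'HIGH';
  return pr.changedLines > mediumLines ? 'MEDIUM' : 'LOW';
}

// Attach a risk tier to each PR and sort HIGH first.
function reviewOrder(prs, sharedPaths) {
  return prs
    .map((pr) => ({ ...pr, risk: assessRisk(pr, sharedPaths) }))
    .sort((a, b) => RANK[a.risk] - RANK[b.risk]);
}

// Hypothetical overnight output.
const overnight = [
  { number: 101, files: ['src/pages/settings.tsx'], changedLines: 40, checksFailing: false },
  { number: 102, files: ['src/lib/auth.ts'], changedLines: 20, checksFailing: false },
  { number: 103, files: ['src/pages/home.tsx'], changedLines: 250, checksFailing: false },
];

console.log(reviewOrder(overnight, ['src/lib/']).map((pr) => `#${pr.number} ${pr.risk}`));
```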

### Routine limits

| Plan | Routines/day | Notes |
|---|---|---|
| Pro | 5 | Enough for planner + executor + briefer |
| Max | 15 | Room for additional specialized routines |
| Team | 25 | Per-seat, shared across team |

---

## Chapter 6 — The morning review

You wake up. The morning brief is waiting.

### The review process

1. **Read the morning brief issue.** Scan the risk assessments.
2. **LOW risk PRs:** Quick eyeball, merge if CI passes. 30 seconds each.
3. **MEDIUM risk PRs:** Read the diff, check the approach. 2-5 minutes each.
4. **HIGH risk PRs:** Full review. Check for architectural implications. 5-15 minutes each.
5. **Follow-up observations:** For anything that looks off but isn't blocking, file a new observation. It becomes tomorrow's sprint input.

### Time budget

| Activity | Time |
|---|---|
| Read morning brief | 2 min |
| Review LOW risk PRs (3-5) | 5 min |
| Review MEDIUM risk PRs (1-2) | 10 min |
| Review HIGH risk PRs (0-1) | 10 min |
| File follow-up observations | 3 min |
| **Total** | **~30 min** |

Combined with the 20-minute evening sweep, your total daily time in the loop is ~50 minutes. The agent handles the other 23 hours.
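
To sanity-check the budget against a given night's output, the per-PR times from the review process can be turned into a range. A rough sketch: the fixed 2 and 3 minutes come from the table, the per-PR ranges from the numbered list, and `reviewMinutes` is a hypothetical helper.

```javascript
// Minutes per PR by risk tier, as [fastest, slowest].
const PER_PR = { low: [0.5, 0.5], medium: [2, 5], high: [5, 15] };
const FIXED = 2 + 3; // read the brief + file follow-up observations

// Estimate the morning review time for a count of PRs per tier.
function reviewMinutes(counts) {
  let min = FIXED;
  let max = FIXED;
  for (const tier of Object.keys(PER_PR)) {
    const n = counts[tier] ?? 0;
    min += n * PER_PR[tier][0];
    max += n * PER_PR[tier][1];
  }
  return { min, max };
}

// 4 LOW, 2 MEDIUM, 1 HIGH: between 16 and 32 minutes.
console.log(reviewMinutes({ low: 4, medium: 2, high: 1 }));
```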

---

## Chapter 7 — The GitHub Actions glue

Optional automation to connect the pieces.

### Auto-label new issues

`.github/workflows/auto-label.yml`:

```yaml
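name: Auto-label issues
on:
  issues:
    types: [opened]

# The default token needs write access to add labels
permissions:
  issues: write

jobs:
  label:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            const body = context.payload.issue.body || '';
            const labels = [];

            // Only handle issues that carry the metadata footer. This skips
            // sprint plans and morning briefs, which must not become sprint-ready.
            const marker = '<!-- agentic-workflow-v1 -->';
            const start = body.lastIndexOf(marker);
            if (start === -1) return;
            const footer = body.slice(start);

            // Match whole footer lines so prose elsewhere in the body is ignored
            const source = footer.match(/^source:\s*(\w+)\s*$/m);
            if (source) labels.push(`source:${source[1]}`);

            // Assumption: the first path segment names the area,
            // so "/dashboard/settings" becomes "area:dashboard"
            const area = footer.match(/^area:\s*\/?([\w-]+)/m);
            if (area) labels.push(`area:${area[1]}`);

            // Default: mark as sprint-ready unless it needs triage
            if (!body.includes('needs-triage')) {
              labels.push('sprint-ready');
            }

            if (labels.length > 0) {
              await github.rest.issues.addLabels({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.payload.issue.number,
                labels,
              });
            }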
```

### Auto-close sprint plan issues when PR merges

`.github/workflows/close-sprint.yml`:

```yaml
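name: Close sprint plans on merge
on:
  pull_request:
    types: [closed]

# The default token needs write access to close issues
permissions:
  issues: write

jobs:
  close:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            const pr = context.payload.pull_request;
            const { owner, repo } = context.repo;
            const toClose = new Set();

            // Issues referenced as "Closes #N" in the PR body. GitHub closes these
            // itself only when the PR merges into the default branch.
            for (const m of (pr.body || '').matchAll(/closes\s+#(\d+)/gi)) {
              toClose.add(Number(m[1]));
            }

            // The sprint plan itself: the executor gives the PR the same title
            // as its plan, so match open `sprint-plan` issues by title.
            const plans = await github.paginate(github.rest.issues.listForRepo, {
              owner, repo, state: 'open', labels: 'sprint-plan',
            });
            for (const plan of plans) {
              if (!plan.pull_request && plan.title.trim() === pr.title.trim()) {
                toClose.add(plan.number);
              }
            }

            for (const issue_number of toClose) {
              await github.rest.issues.update({ owner, repo, issue_number, state: 'closed' });
            }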
```

---

## Chapter 8 — Putting it all together

### Day 1 setup checklist

- [ ] Create labels in your GitHub repo (Chapter 1)
- [ ] Add `CLAUDE.md` to your repo root (Chapter 3)
- [ ] Add the skill file at `.claude/skills/agentic-workflow.md` (Chapter 3)
- [ ] Set up the planner routine in Claude Code (Chapter 5)
- [ ] Set up the executor routine in Claude Code (Chapter 5)
- [ ] Set up the morning briefer routine in Claude Code (Chapter 5)
- [ ] Add the GitHub Actions (Chapter 7)
- [ ] Run your first testing sweep (Chapter 4)

### Week 1 rhythm

| Time | What | Duration |
|---|---|---|
| Morning | Read brief, review PRs, merge | 30 min |
| Evening | Testing sweep, capture observations | 20 min |
| Night | Planner → Executor (automated) | 0 min |

### What success looks like after 30 days

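- 50-100 PRs shipped per week (the single nightly executor in Chapter 5 stops after 3 sprint plans, about 15 PRs a week, so this range assumes additional routines or teammates)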
- < 1 hour of human time per day in the loop
- Every PR traces back to a human observation or spec
- Zero "correct-but-drifting" code (because testing density is high enough to catch it)
- A clean, labeled backlog that reflects the actual state of the product

---

## Reference files

All the files from this playbook are available to download:

- [`CLAUDE.md` template](https://voxspek.com/playbook/CLAUDE-template.md)
- [`agentic-workflow.skill.md`](https://voxspek.com/playbook/agentic-workflow-skill.md)
- [`auto-label.yml` GitHub Action](https://voxspek.com/playbook/auto-label.yml)
- [`close-sprint.yml` GitHub Action](https://voxspek.com/playbook/close-sprint.yml)

---

## About voxspek

This playbook describes the workflow. [voxspek](https://voxspek.com) is the capture surface that makes the testing sweep fast — press a key, speak, get a structured GitHub issue in 3 seconds. Three modes: Capture, Review, Spec. Free for individuals.

The playbook works without voxspek. voxspek works without the playbook. Together, they're the complete human input layer for the agentic loop.

---

*© 2026 voxspek. This playbook is free to share, adapt, and build on. Attribution appreciated.*
