Testing Your Skills
A skill that works once is not yet reliable: it needs to produce the same quality of output across different inputs. Test slash-command invocation, auto-detection, and output consistency, then keep a test log so you catch regressions whenever you edit a skill.
Three steps to test any skill
Run through these three phases every time you create or edit a skill.
| Step | Focus | What to verify |
|---|---|---|
| 1 | Test invocation | Slash command activates; auto-detection triggers on natural phrases |
| 2 | Test output quality | Format, length, and content match your SKILL.md spec |
| 3 | Keep a test log | Record inputs, expected outputs, and results for regression checks |
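Step 2 in the table above can be partly automated once you have copied a skill's output into a string. The sketch below is one possible way to do that in Python and is not part of any Claude tooling: check_output, the heading names, and the 50-word limit are hypothetical placeholders for whatever your own SKILL.md specifies.

```python
import re


def check_output(text, required_headings, max_words):
    """Return a list of problems found in one skill output (empty list = pass)."""
    problems = []
    # Collect Markdown headings such as "## Today", without the leading #'s.
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+)$", text, re.MULTILINE)
    }
    for heading in required_headings:
        if heading not in headings:
            problems.append(f"missing heading: {heading}")
    # A simple whitespace split is enough for a rough length check.
    word_count = len(text.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words (limit {max_words})")
    return problems


# Hypothetical standup output pasted from a test conversation.
sample = "## Yesterday\n- Fixed login bug\n\n## Today\n- Review PR\n"
print(check_output(sample, ["Yesterday", "Today", "Blockers"], max_words=50))
# prints ['missing heading: Blockers']
```

Running the same check on several outputs of the same prompt gives a rough signal of consistency across runs.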
Method 1 — “Try in chat” from the skills page
The fastest way to test a skill after creating or editing it is the Try in chat button in the claude.ai skills manager. It opens a fresh conversation with the skill pre-loaded so you can type a test prompt immediately.
Try not just /standup, but something like “I need my standup for today”, to test auto-detection at the same time.
Method 2 — test in Claude Code
For skills in ~/.claude/skills/, start Claude Code in your project and run the slash command directly. Test three things in sequence.
| Test | Prompt | Expected |
|---|---|---|
| A | /code-review | Skill activates immediately |
| B | can you check src/auth.py for me? | Skill activates without the slash |
| C | /code-review src/auth.py | Output matches your format spec |
Run cat ~/.claude/skills/code-review/SKILL.md to verify that the skill file is in place.
What to check on every test run
Use this checklist for each test. If any row fails, the fix usually lives in the corresponding SKILL.md section.
| Check | If it fails |
|---|---|
| Slash command activates the skill | Check the name field in front matter |
| Auto-detection phrase triggers the skill | Add more trigger phrases to the description |
| Output uses the correct format | Add or update ## Output format |
| Output length is appropriate | Set a word or line limit in ## Output format |
| Claude stays within the skill's scope | Add a ## Constraints section |
| File access works when expected | Add Read to allowed-tools in front matter |
| Output is consistent across multiple runs | Make steps and output format more explicit |
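Several rows in the checklist above come down to whether a field or section exists in SKILL.md at all, which can be checked before running any prompt. The following is a minimal sketch under stated assumptions: lint_skill is a hypothetical helper, the front matter is assumed to sit between the first two --- lines, and the parsing is deliberately naive (it is not a YAML parser).

```python
def lint_skill(text):
    """Return checklist warnings for the contents of one SKILL.md file."""
    warnings = []
    lines = text.splitlines()

    # Assumption: front matter is the block between the first two '---' lines.
    front = []
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            front.append(line)

    # Naive "key: value" scan; good enough to see whether a field is present.
    keys = {line.split(":", 1)[0].strip() for line in front if ":" in line}
    for key in ("name", "description"):
        if key not in keys:
            warnings.append(f"front matter has no '{key}' field")

    # Sections the checklist points to when output drifts or goes out of scope.
    body_headings = {line.strip() for line in lines if line.startswith("## ")}
    for heading in ("## Output format", "## Constraints"):
        if heading not in body_headings:
            warnings.append(f"no '{heading}' section")
    return warnings


# Hypothetical skill file that is missing both body sections.
skill = (
    "---\n"
    "name: standup\n"
    "description: Generate a daily standup update.\n"
    "---\n"
    "## Steps\n"
    "1. Ask for yesterday's work.\n"
)
print(lint_skill(skill))
# prints ["no '## Output format' section", "no '## Constraints' section"]
```

A missing section is only a hint: a skill can pass every test without a ## Constraints section, so treat the warnings as places to look when a checklist row fails.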
Common failures and how to fix them
| Symptom | Cause | Fix |
|---|---|---|
| Skill never auto-detects | Trigger phrases are too vague or not listed in the description | Add: Triggered by: "standup", "daily update", "/standup" |
| Output format changes every run | No ## Output format section, or it is too vague | Specify exact structure: headings, bullet depth, max word count |
| Claude rewrites the whole file instead of annotating | No constraint preventing full rewrites | Add to ## Constraints: "Never rewrite the entire file — annotate only." |
| Skill runs but ignores allowed-tools | allowed-tools is missing or misspelled in front matter | Check YAML: allowed-tools: [Read] (capital R, list format) |
| Slash command not in autocomplete | Skill is in the wrong directory or has a YAML parse error | Verify path: ~/.claude/skills/<name>/SKILL.md and validate YAML |
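The last two rows of the table (a wrong path, or a missing or misspelled front matter key) can be caught with a small script. This is a sketch, not an official validator: diagnose_skill and KNOWN_KEYS are hypothetical names, only the three fields mentioned in this lesson are treated as known, and the check looks for near-miss spellings rather than parsing the YAML fully.

```python
import difflib
from pathlib import Path

# Assumption: only the front matter fields named in this lesson are listed.
KNOWN_KEYS = ["name", "description", "allowed-tools"]


def diagnose_skill(name, skills_root=None):
    """Return a list of likely reasons a skill is not being picked up."""
    root = Path(skills_root) if skills_root else Path.home() / ".claude" / "skills"
    path = root / name / "SKILL.md"
    if not path.is_file():
        return [f"not found: {path}"]

    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return ["SKILL.md does not start with a '---' front matter line"]
    closing = [i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"]
    if not closing:
        return ["front matter is never closed with '---'"]

    problems = []
    # Line numbers are 1-based, so the first front matter line is line 2.
    for number, line in enumerate(lines[1:closing[0]], start=2):
        if line.startswith("\t"):
            problems.append(f"line {number}: tab indentation is not valid YAML")
        key = line.split(":", 1)[0].strip()
        if ":" in line and key not in KNOWN_KEYS:
            # Flag keys that are close to a known one, e.g. allowed_tools.
            guess = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=0.8)
            if guess:
                problems.append(
                    f"line {number}: '{key}' looks like a misspelling of '{guess[0]}'"
                )
    return problems
```

Calling diagnose_skill("code-review") checks ~/.claude/skills/code-review/SKILL.md by default; pass skills_root to point it at a different directory. An empty list means none of these particular problems were found, not that the YAML is fully valid.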
Keep a simple test log
Every time you edit a skill, re-run your previous test cases. A plain Markdown file stored alongside the skill is enough — no framework needed. Include the input, expected output, and result (PASS or FAIL).

    # Skill Test Log — /code-review

    ## Test 1 — basic Python file
    Input: sample.py (20 lines, one SQL injection)
    Expect: at least one Critical finding mentioning SQL injection
    Result: PASS — flagged on line 14, provided fix snippet

    ## Test 2 — clean file
    Input: utils.py (no issues)
    Expect: "No critical issues found" or similar clean message
    Result: PASS — responded "Code looks clean. 0 critical, 0 warnings."

    ## Test 3 — trigger phrase auto-detection
    Input: "can you check my auth.py for problems?"
    Expect: skill auto-activates and reviews auth.py
    Result: PASS — auto-detected and ran /code-review

    ## Test 4 — large file (500 lines)
    Input: api_server.py
    Expect: structured findings, does not summarise whole file
    Result: FAIL — rewrote entire file instead of annotating
    Fix: Added "Do NOT rewrite the file" to ## Constraints

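Because the log uses one heading per test and a Result line that starts with PASS or FAIL, a few lines of Python can list the tests that still need attention. This is an optional sketch, assuming the layout of the example log; summarize_log is a hypothetical helper and the sample entries are shortened for illustration.

```python
import re


def summarize_log(text):
    """Map each '## Test ...' heading in a test log to 'PASS', 'FAIL', or None."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            results[current] = None  # no Result line seen for this test yet
        elif current is not None:
            match = re.match(r"Result:\s*(PASS|FAIL)\b", line)
            if match:
                results[current] = match.group(1)
    return results


# Shortened, hypothetical log in the same layout as the example.
log = """\
# Skill Test Log: /code-review
## Test 1: basic Python file
Input: sample.py
Expect: at least one Critical finding
Result: PASS
## Test 2: large file
Input: api_server.py
Expect: structured findings
Result: FAIL
"""
summary = summarize_log(log)
failed = [name for name, result in summary.items() if result != "PASS"]
print(failed)
# prints ['Test 2: large file']
```

Tests with no Result line are reported alongside failures, which makes it easy to spot cases you forgot to re-run after an edit.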
Before you continue
- Test invocation (slash command and auto-detection) before output quality.
- Use Try in chat for quick checks; use Claude Code for project-level skills.
- Failed checks usually map to a specific SKILL.md section — description, steps, or constraints.
- Keep a TEST_LOG.md so regressions are visible after every edit.
- Next lesson: Pass Arguments to Skills.
What's Next
Testing ensures your skills behave correctly. Next: pass arguments so a single skill can handle many different inputs.