Safe-Outputs Discussions Enforcement Test Results
Run: https://github.com/github/gh-aw-mcpg/actions/runs/27623833944
Trigger: schedule
Configuration tested: create-discussion (max:1, prefix, category), update-discussion (enabled, all fields), close-discussion (required-category:General, required-labels:[smoke-test]), add-comment (max:2, target:triggering)
Phase 1: create-discussion
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 1.1 | Create discussion (valid prefix+category+label) | ✅ Processed | ✅ Processed (`{"result":"success"}`) | ✅ PASS |
| 1.2 | Create 2nd discussion (max exceeded) | ❌ Rejected | ✅ Processed (`{"result":"success"}`) | ❌ FAIL |
Phase 2: update-discussion
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 2.1 | Update labels: ["smoke-test", "status"] | ✅ Processed | ✅ Processed (`{"result":"success"}`) | ✅ PASS |
| 2.2 | Update body (append note) | ✅ Processed | ✅ Processed (`{"result":"success"}`) | ✅ PASS |
Phase 3: close-discussion
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 3.1 | Close test discussion (valid labels+category) | ✅ Processed | ✅ Processed (`{"result":"success"}`) | ✅ PASS |
| 3.2 | Close discussion #45 without required label | ❌ Rejected | ✅ Processed (`{"result":"success"}`) | ❌ FAIL |
| 3.3 | Close 2nd discussion (max exceeded) | ❌ Rejected | ✅ Processed (`{"result":"success"}`) | ❌ FAIL |
Phase 4: add-comment (target: triggering)
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 4.1 | Comment on triggering item (1st) | ✅ Processed | SKIPPED | ✅ SKIPPED |
| 4.2 | Comment on triggering item (2nd) | ✅ Processed | SKIPPED | ✅ SKIPPED |
| 4.3 | 3rd comment (max: 2 exceeded) | ❌ Rejected | SKIPPED | ✅ SKIPPED |
| 4.4 | Comment on non-triggering item | ❌ Rejected | SKIPPED | ✅ SKIPPED |
Summary
- Phase 1 (create-discussion): 1/2 passed
- Phase 2 (update-discussion): 2/2 passed
- Phase 3 (close-discussion): 1/3 passed
- Phase 4 (add-comment): SKIPPED (schedule trigger, no triggering item)
- Overall: FAIL
Notes on FAIL Results
All safe-output tool calls returned `{"result":"success"}`, including those expected to be rejected. This suggests that the safeoutputs framework enforces its limits at commit time (deferred) rather than rejecting calls inline as they are made:
- Test 1.2 (max:1 exceeded): Second `create_discussion` call was accepted at queue time. Enforcement may be applied at workflow commit, causing the second discussion to be dropped/rejected then.
- Test 3.2 (label check): `close_discussion` for discussion Hello #45 (no "smoke-test" label) was accepted. Enforcement may reject at commit time.
- Test 3.3 (max exceeded): Second `close_discussion` call accepted at queue time. May be rejected at commit.
The discussions created in Test 1.1 and 1.2 were not immediately visible via the GitHub GraphQL API during the run, further supporting the deferred-execution model. If enforcement is confirmed to happen at commit time, these failures may represent expected behavior rather than bugs.
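The deferred-enforcement hypothesis can be illustrated with a minimal sketch. This is not the safeoutputs implementation: `Policy`, `DeferredQueue`, `call`, and `commit` are hypothetical names, the `max_count=1` limit on `close_discussion` is an assumption taken from Test 3.3, and labels are passed in directly for simplicity where a real system would presumably look them up on the discussion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Policy:
    """Hypothetical per-operation limits, loosely mirroring the tested config."""
    max_count: Optional[int] = None
    required_labels: frozenset = frozenset()


@dataclass
class DeferredQueue:
    """Accepts every call at queue time; enforces policies only at commit."""
    policies: dict
    pending: list = field(default_factory=list)

    def call(self, op, **args):
        # Queue time: nothing is validated, so every call reports success.
        self.pending.append((op, args))
        return {"result": "success"}

    def commit(self):
        # Commit time: check required labels first, then the per-op maximum.
        applied, rejected, counts = [], [], {}
        for op, args in self.pending:
            policy = self.policies.get(op, Policy())
            labels = set(args.get("labels", ()))
            if not policy.required_labels <= labels:
                rejected.append((op, args, "missing required label"))
            elif policy.max_count is not None and counts.get(op, 0) >= policy.max_count:
                rejected.append((op, args, "max exceeded"))
            else:
                counts[op] = counts.get(op, 0) + 1
                applied.append((op, args))
        self.pending.clear()
        return applied, rejected


# Replays the shape of Tests 1.1, 1.2, 3.1, and 3.2 (values are illustrative).
queue = DeferredQueue({
    "create_discussion": Policy(max_count=1),
    "close_discussion": Policy(max_count=1, required_labels=frozenset({"smoke-test"})),
})
responses = [
    queue.call("create_discussion", title="first"),
    queue.call("create_discussion", title="second"),
    queue.call("close_discussion", number=1, labels=["smoke-test"]),
    queue.call("close_discussion", number=45, labels=[]),
]
applied, rejected = queue.commit()
```

Under this model every entry in `responses` is a success, yet only the first create and the first close survive `commit()`. If the framework behaves this way, the FAIL rows above would need to be verified against the post-commit state rather than against the tool-call return values.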
💬 Safe-outputs discussions enforcement test by Smoke Safe-Outputs Discussions