PRD Contract (editable)
Plan Breakdown (JSON, editable)
Save Plan
Module Location (Impact Heatmap)
Target Files
After confirming the plan, click "Module Location" to generate the target files.
Manual Boundary Adjustment
Apply Boundary Adjustment
Evidence Ledger
Run the Official L1 Demo or refresh the evidence ledger for the current requirement to display workflow events.
Observability (P1)
Observability
AI calls not loaded
After a refresh, this panel shows AI call count, fallback, latency, tokens, and cost; the raw log is kept below.
Refresh Observability
Skill Marketplace (Stage 5)
Skill Marketplace
Skills not loaded
After the skill list is refreshed, the system shows reusable delivery skills and dry-run match results.
Refresh Skill List
Context Strategy Lab (Stage 5)
Context Strategy Lab
Strategy comparison not run
After creating a requirement, compare minimal / balanced / defensive: review the recommended strategy first, then inspect the raw JSON.
L1 (frontend-only)
L2 (cross-stack)
Compare Strategies
Click "Compare Strategies" to generate the minimal / balanced / defensive comparison.
Delivery Memory Loop (Stage 5)
Delivery Memory Loop
Memory suggestion not generated
After past delivery experience is written or recalled, the reuse decision, confidence, and applicable scope appear here.
Write Memory (from current req)
Suggest
Apply Suggestion
Note: A suggestion contains suggestedSkill / contextStrategy / selectedFiles / excludedFiles / doNotTouch / testStrategy / confidence / reuseDecision, and is written to workflow events (memory_written / memory_suggested / memory_applied).
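The suggestion fields and event names in the note above could be modeled roughly as follows. This is a minimal sketch: the field and event names come from the note, while the `MemorySuggestion` class, the field types, and the `to_workflow_event` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import List

# Event names taken from the note; the tuple itself is illustrative.
MEMORY_EVENTS = ("memory_written", "memory_suggested", "memory_applied")


@dataclass
class MemorySuggestion:
    # Field names mirror the note; the types are assumed.
    suggestedSkill: str
    contextStrategy: str  # e.g. "minimal", "balanced", or "defensive"
    selectedFiles: List[str] = field(default_factory=list)
    excludedFiles: List[str] = field(default_factory=list)
    doNotTouch: List[str] = field(default_factory=list)
    testStrategy: str = ""
    confidence: float = 0.0
    reuseDecision: str = ""


def to_workflow_event(event_type: str, suggestion: MemorySuggestion) -> dict:
    """Wrap a suggestion in a workflow event (hypothetical event shape)."""
    if event_type not in MEMORY_EVENTS:
        raise ValueError(f"unknown memory event: {event_type}")
    return {"type": event_type, "payload": asdict(suggestion)}
```

Keeping the event type outside the payload lets the same suggestion record be logged at each of the three stages without copying fields by hand.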
Requirement Benchmark Suite (Stage 5)
Refresh Cases
Run All (Quick)
Benchmark Cases
Click Refresh Cases to view the L1 / L2 / L3 standard requirement set.
Scoreboard
Run All Quick to produce average score, by-level scores, and weak areas.
Run Selected
Selected Case Detail
Select a benchmark case to inspect expected capability profile and failures.
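Note: Benchmark v1 aims to evaluate Skill / Clarification / PRD / Plan / Context / Cross-stack Gate (coverImage) capabilities; not every case is required to generate a patch. L3 cases must demonstrate "clarify first, do not patch directly".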
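One way to read the L3 rule in the note above is as an ordering check over the steps a case produced. The sketch below assumes a case run is recorded as an ordered list of step names; that shape, the step names, and the function name `l3_clarify_first_failures` are hypothetical, not the benchmark's actual format.

```python
from typing import List


def l3_clarify_first_failures(steps: List[str]) -> List[str]:
    """Return failure reasons for an L3 case given its ordered step names."""
    if "clarification" not in steps:
        return ["no clarification step"]
    if "patch" in steps and steps.index("patch") < steps.index("clarification"):
        return ["patch generated before clarification"]
    return []
```

An L3 run that stops after clarification passes this check, which matches the note's point that a case need not generate a patch.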
Demo Scenario Runner
Refresh Scenarios
Run Selected
Scenario List
Click Refresh Scenarios, then select the official L1, coverImage review, or memory reuse scenario.
Run Detail
After Run Selected, the scenario run, status, and evidence refs are shown.
Step Timeline
After a scenario runs, each step is shown: requirement, context, patch, test, review.
Evidence Links
Scenario evidence links appear after a run.
Note: Demo Scenario Runner chains Skill, Context, Memory, Benchmark, Review, and Replay into one-click regression/demo scenarios, and outputs ordered steps and evidenceRefs.
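The "ordered steps and evidenceRefs" output described in the note might look something like the sketch below. The step names follow the Step Timeline text (requirement, context, patch, test, review); the `build_scenario_run` function, the input shape, and the evidence ref strings are assumptions for illustration only.

```python
from typing import Dict, List

# Step order taken from the Step Timeline description.
STEP_ORDER = ("requirement", "context", "patch", "test", "review")


def build_scenario_run(results: Dict[str, List[str]]) -> dict:
    """Assemble ordered steps and a flat evidenceRefs list (hypothetical shape).

    `results` maps a step name to the evidence refs that step produced.
    Steps absent from `results` are skipped; the rest keep timeline order.
    """
    names = [name for name in STEP_ORDER if name in results]
    steps = [
        {"index": i, "name": name, "evidenceRefs": list(results[name])}
        for i, name in enumerate(names)
    ]
    evidence = [ref for step in steps for ref in step["evidenceRefs"]]
    return {"steps": steps, "evidenceRefs": evidence}
```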
Human Review Workbench (P1)
Generate Queue (from latest benchmark)
Refresh Queue
Refresh Summary
Review Detail
Failure reasons, evidence, actions, decisions
Replay Panel
Request Only / Dry Run / Execute Downstream / Rollback
Request Only
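Records a manual replay request without changing downstream artifacts.
Dry Run
Generates an Artifact Diffs preview without writing a formal gate.
Execute Downstream
Requires an approval code; executes the downstream replay.
Rollback
Rolls back an executed replay run.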
request_only
dry_run
execute_downstream
Request Replay
Dry Run
Execute
Rollback
Safety Checks
Safety checks are shown before Dry Run or Execute.
Artifact Diffs
Artifact diffs are shown after Dry Run.
Selected Item Detail JSON
Note: Review Workbench centers on the queue of Benchmark failures and weak areas, and supports approve/reject/request-replay/close; Controlled Replay supports request_only, dry_run, execute_downstream, and rollback, and execute requires a human approval code.
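The approval-code requirement in the note above can be sketched as a small gate in front of the replay modes. The mode names come from the note; the `authorize_replay` function and its return shape are hypothetical, and the sketch assumes that only execute_downstream needs a code, since the note does not say whether rollback does.

```python
from typing import Optional

# Mode names taken from the note.
REPLAY_MODES = ("request_only", "dry_run", "execute_downstream", "rollback")


def authorize_replay(mode: str, approval_code: Optional[str] = None) -> dict:
    """Decide whether a replay request may proceed (illustrative only)."""
    if mode not in REPLAY_MODES:
        raise ValueError(f"unknown replay mode: {mode}")
    # Assumption: only execute_downstream is gated on a human approval code.
    if mode == "execute_downstream" and not approval_code:
        return {"mode": mode, "allowed": False, "reason": "approval code required"}
    return {"mode": mode, "allowed": True, "reason": None}
```

This sketch only checks that a code is present; validating the code itself is left out because the source does not describe how codes are issued.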
Evaluation Dashboard (P1)
Refresh Evaluation Dashboard
Benchmark Scoreboard
Refresh Evaluation Dashboard to load benchmark average, cross-stack gate, replay, strict verification, and validation status.
Capability Metrics
Capability metrics load Skill Match, Context Strategy, Memory Reuse, and Review Queue signals.
AI Cost Summary
AI call count, estimated tokens, latency, and cost appear after refresh.
Capability Validation Suite
Real AI Capability Test
fallback
Run L1
Run L2
Run L3
Run All
Failed Cases
Run Capability Suite to inspect failed cases. In public demo mode, real AI live calls remain CLI-only.
Real AI / Cost Status
AI_MODE and ARK_* are read only on the server. Secret values are never returned to the browser.
Note: Evaluation Dashboard aggregates Benchmark Scoreboard, Skill Accuracy, Context Strategy Distribution, Memory Reuse Decisions, Review Queue Open Items, Cross-stack Gate, Replay success rate, Strict verification status, Validation pass rate, and AI latency/usage/cost.
Governance Policy Engine (P1)
Governance Policy Engine
Policies not evaluated
After policies are refreshed, you can view the scope, secret, cross-stack, and human approval gates; after Evaluate, a pass/block summary is shown.
Refresh Policies
Evaluate Sample
Note: Policy Engine unifies scope guard, secret hygiene, cross-stack gate, and human approval into auditable policy decisions, and retains evidenceRefs.
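A pass/block summary over policy decisions, as described above, could be computed along these lines. This is a sketch under assumptions: the `PolicyDecision` class, the policy identifier strings, and the summary shape are hypothetical; only the four gate names, the pass/block outcomes, and the evidenceRefs field come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PolicyDecision:
    policy: str   # e.g. "scope_guard", "secret_hygiene" (identifiers assumed)
    outcome: str  # "pass" or "block"
    evidenceRefs: List[str] = field(default_factory=list)


def summarize(decisions: List[PolicyDecision]) -> dict:
    """Collapse individual decisions into a pass/block summary."""
    blocked = [d.policy for d in decisions if d.outcome == "block"]
    return {
        "pass": len(decisions) - len(blocked),
        "block": len(blocked),
        "blockedPolicies": blocked,
        # Any single block makes the overall result a block.
        "overall": "block" if blocked else "pass",
    }
```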
Final Evidence Center (P1)
Refresh Evidence Center
Export JSON + Markdown + HTML
Coverage Matrix
Refresh Evidence Center to see Requirement / Clarification / Plan / Context / Patch / Test / PR / Review / Replay / Deployment coverage.
Verification Commands
Verification commands appear after evidence refresh.
Export Result
Export JSON + Markdown + HTML after the evidence center is refreshed.
Note: Final Evidence Center consolidates official L1, coverImage cross-stack, Memory, Benchmark, Review, Controlled Replay, Observability, and strict verification evidence, and supports offline JSON/Markdown/HTML export.
Patch Theatre (Stage 3)
Patch Theatre
Patch not generated
Confirm the plan and module boundaries first, then generate the patch. The raw diff is kept below.
Generate Patch
Approve
Reject
Apply
Revert
Spec-to-Diff Trace Matrix
DoNotTouch / Scope Guard
Test Arena (Stage 3)
Test Arena
Tests not run
After the patch is applied, generate a Test Plan, then run the whitelisted tests.
Generate Test Plan
Confirm Test Plan
Run Tests
QA Report (Stage 3)
Refresh QA Report
Mark READY_FOR_QA_LOCAL
PR & CI Manager
Preview Deploy (Stage 4)
Preview Deploy
Preview not deployed
Generate and confirm the Deployment Plan first, then run preview / smoke / release note.
Generate Deployment Plan
Confirm Deployment Plan
Deploy Preview
Stop
Rollback
Deployment / Smoke
Run Smoke Tests
Generate Release Note
Native Conduit Preview Adapter
Native Preview
Native preview not checked
The public demo blocks native start by default; only the real dependency check and blocker reason are shown here.
Start Native Preview
Stop Native Preview
Native Preview Plan / Status
Dependency / Health Check
Note: Contract preview remains the fast, stable path; native preview shows a URL only after a real health check passes. Otherwise it records NATIVE_PREVIEW_BLOCKED and does not fake success.
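The rule in the note above, that a URL is surfaced only after a real health check passes, can be sketched as follows. NATIVE_PREVIEW_BLOCKED comes from the note; the `native_preview_result` function, the NATIVE_PREVIEW_READY status name, and the result keys are assumptions made for illustration.

```python
from typing import Optional


def native_preview_result(
    health_ok: bool, url: Optional[str], blocker_reason: str = ""
) -> dict:
    """Return the status to display for a native preview attempt."""
    if health_ok and url:
        # Hypothetical success status; the source names only the blocked one.
        return {"status": "NATIVE_PREVIEW_READY", "url": url, "blockerReason": None}
    # Never expose a URL when the health check did not pass.
    return {
        "status": "NATIVE_PREVIEW_BLOCKED",
        "url": None,
        "blockerReason": blocker_reason or "health check failed",
    }
```

Treating a missing URL as blocked, even when the health flag is true, keeps the function from ever reporting success without something to show.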
Knowledge Flywheel (Stage 4)
Knowledge Flywheel
Knowledge not written
Once QA evidence is ready, write the knowledge so that later requirements can reuse the playbook.
Write Knowledge
Search
Reuse Replay (Stage 4)
Reuse Replay
Reuse not evaluated
Enter a similar requirement and click Find Similar to verify whether past knowledge can guide the new delivery.
Find Similar
Mark Verified
Evidence Hub (Stage 4)
Evidence Hub
Evidence hub not refreshed
After a refresh, this hub aggregates requirement, PRD, plan, module location, patch, test, deploy, and knowledge evidence.
Refresh Evidence Hub
Export JSON/Markdown