— project argus — agent factory visualizer

Goal in. Governed agent out.

Four services, one pipeline. Hephaestus forges the agent from a goal spec. Nyx is the blueprint every agent inherits: Strands on AWS AgentCore, runnable locally. Proteus tunes the agent's configuration against a dataset, co-creating an adversarial LLM-as-judge when the objective cannot be scored with a metric. Panoptes watches everything, with one view per run and one view across all experiments.

live factory floor · hephaestus → nyx → proteus → panoptes
01 INTAKE · 02 FORGE · 03 TUNE · 04 RUNTIME · 05 OBSERVE

— input 01 · Goal Spec
yaml · objectives 508c · phi=✓ · tier=med

— input 02 · Pattern Library
s3 · templates: claims-triage · doc-classify · 508c

— blueprint · Nyx
the template all agents inherit
targets: AgentCore (aws · prod) · Local (dev · optimize)
Strands: tool gates · breakers · risk tiers

— factory orchestrator · strands · self-hosted
Hephaestus, the forge · goal → deployable agent
build_id bld-0f2e-a7 · phase PROTEUS_AWAIT
phases: intake → pattern_match → scaffold_gen → scaffold_eval → smoke_test → proteus_submit → proteus_await → optim_eval → test_gen → test_exec → test_eval → agent_eval → packaging → package_eval → complete
evaluator verdicts: pass ×5 · retry ×0 · fail ×0 · escalate ×0
judge-creator: armed (see Proteus ↔ LLM-Judge)
GoalSpec.yaml → pattern.match(k=3) → Nyx.blueprint() → AgentScaffold

— autonomous optimization · strands · self-hosted
Proteus, the tuner · genome → winning config
sweep sw-0f2e-a7·01 · gen 42 / 200
genome: model = {sonnet · haiku · 4o} · temp = U(0.0, 1.0) · prompt = {v1 · v2 · v3 · v4} · k = {3 · 5 · 10 · 20} · strategy = bayesian(NSGA-II)
best-so-far F1 over 42 gens: 0.78 → 0.94 · Δ = +0.01 / 10
pareto (cost ↘ vs quality ↗): 5 front · 25 dominated
best: model=sonnet · temp=0.2 · k=5 · cost=$0.018/req · lat=1.2s · converged=false

— LLM-as-Judge · adversarial · co-created
dynamic evaluator for fuzzy objectives
creation: Hephaestus drafts rubric · Proteus tunes judge prompt · calibrated vs. gold set
adversarial loop: candidate ↔ judge ↔ red-team · judge challenges weak outputs · candidate adapts · loop until stable
flow labels: scaffold + genome + dataset · winning_config · rubric.draft() · challenge · verdict

— artifact · Deployment Package
nyx-deployment.yaml · signed
permissions · breakers · gates · briefing

— runtime · governed by nyx
Running Agents · strands · fargate · agentcore · otel
508c-agent v1.2 · aws · runs=142 · cost=$3.2/d
claims-triage v3.0 · aws · runs=891 · cost=$22/d
doc-summarizer v0.9 · local · runs=14 (opt) · cost=$0

— deployment targets · AgentCore · Local
prod runs aws · dev+tune runs local
AWS AgentCore: bedrock routing · ecs fargate · step fns
Local runtime (must-have): same strands agent · for Proteus tune loops
package(cfg*, agent)

— observability · inventory · compliance
Panoptes, the hundred eyes · every run · every decision
events/s 183 · open exp 14 · phi access audit=ok · rai alerts 0

— view 01 · single run · OpenTelemetry
exp=run-a7 · 1.82s · $0.019 · timeline 0 to 1820 ms
spans: retrieve(k=5) · llm.plan(sonnet) · tool.fetch_pdf · gate · llm.generate(sonnet) · judge.score
legend: retrieve · llm · tool · gate · judge

— view 02 · experiment inventory · all runs
agent=508c · 42 generations
| EXP_ID      | SRC | TYPE        | STATUS   | F1      | LAT   | $/req  | WHEN             |
| exp-a7-42   | prt | optim_gen   | running  | 0.942 ★ | 1.18s | $0.018 | just now         |
| exp-a7-41   | prt | optim_gen   | complete | 0.931   | 1.34s | $0.021 | 2m ago           |
| run-a7-182  | nyx | agent_run   | complete | –       | 1.82s | $0.019 | 6m ago           |
| bld-0f2e-a7 | hph | agent_build | running  | –       | –     | $0.84  | 4h · phase 7/14  |
| exp-a6-12   | prt | optim_gen   | failed   | 0.64    | 4.9s  | $0.11  | 2h ago · breaker |
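The LLM-as-Judge panel says the judge prompt is calibrated against a gold set. The source doesn't name the agreement statistic.

This sketch assumes Cohen's kappa on binary pass/fail verdicts, which is a common choice for judge-versus-human agreement. `judge_is_calibrated` and its 0.6 threshold are hypothetical, not part of Proteus.

```python
def cohens_kappa(judge: list[bool], gold: list[bool]) -> float:
    """Chance-corrected agreement between judge verdicts and gold pass/fail labels."""
    if len(judge) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty verdict lists")
    n = len(gold)
    p_o = sum(j == g for j, g in zip(judge, gold)) / n  # observed agreement
    p_judge = sum(judge) / n                             # judge pass rate
    p_gold = sum(gold) / n                               # gold pass rate
    # Agreement expected by chance if both sides labelled independently at these rates.
    p_e = p_judge * p_gold + (1 - p_judge) * (1 - p_gold)
    if p_e == 1.0:  # both sides constant and identical, so agreement is perfect
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def judge_is_calibrated(judge: list[bool], gold: list[bool], min_kappa: float = 0.6) -> bool:
    """Hypothetical gate: accept the tuned judge prompt once kappa clears a threshold."""
    return cohens_kappa(judge, gold) >= min_kappa
```

Kappa is used instead of raw accuracy so that a judge which simply passes everything on a mostly-passing gold set doesn't look calibrated.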
— 01 · hephaestus
The Forge
factory orchestrator
Turns a GoalSpec into a governed agent. 14 phases with evaluator gates (pass / retry / fail / escalate). Loads the Nyx blueprint, picks a pattern, generates Strands code, hands a genome to Proteus, then packages the winning config for deployment. Drafts the adversarial LLM-as-judge when objectives are fuzzy.
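Hephaestus's gated phase loop can be sketched as a small state machine. This is a hypothetical illustration, not Hephaestus's code.

The phase names come from the factory-floor diagram. The diagram shows `complete` after `package_eval`, so the sketch treats it as the terminal state rather than a 15th phase. The `Phase` callables, the retry budget `max_retries`, and the returned status strings are assumptions.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "pass"
    RETRY = "retry"
    FAIL = "fail"
    ESCALATE = "escalate"


# Phase order as shown on the factory floor; "complete" is treated as the end state.
PHASES = [
    "intake", "pattern_match", "scaffold_gen", "scaffold_eval", "smoke_test",
    "proteus_submit", "proteus_await", "optim_eval", "test_gen", "test_exec",
    "test_eval", "agent_eval", "packaging", "package_eval",
]

# Hypothetical: each phase takes a shared build context and returns an evaluator verdict.
Phase = Callable[[dict], Verdict]


def run_build(phases: dict[str, Phase], ctx: dict, max_retries: int = 2) -> str:
    """Run every phase in order, honouring pass / retry / fail / escalate verdicts."""
    for name in PHASES:
        retries = 0
        while True:
            verdict = phases[name](ctx)
            if verdict is Verdict.PASS:
                break                       # gate passed: move to the next phase
            if verdict is Verdict.RETRY and retries < max_retries:
                retries += 1                # rerun the same phase
                continue
            if verdict is Verdict.ESCALATE:
                return f"escalated:{name}"  # hand off, e.g. to a human reviewer
            return f"failed:{name}"         # FAIL, or RETRY with the budget spent
    return "complete"
```

With this ordering, `proteus_await` is phase 7 of 14. That matches the `bld-0f2e-a7` row in the experiment inventory.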
— 02 · nyx
The Blueprint
template + runtime envelope
Every agent inherits Nyx: Strands tools, risk tiers, circuit breakers, gated actions, and morning briefings. Agents target AWS AgentCore in prod and also run locally. Local execution is non-negotiable, because Proteus optimizes agents in local dev loops.
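Here is a minimal sketch of how a risk-tier gate and a circuit breaker might wrap a tool call. Everything in it is hypothetical:

- `GovernedTool`, `RiskTier`, the `approved` flag, and the failure threshold are illustrative names, not Nyx's or Strands' API.
- The breaker omits the half-open/reset timer that a production circuit breaker would normally have.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Callable


class RiskTier(IntEnum):
    LOW = 1
    MED = 2
    HIGH = 3


class GateDenied(Exception):
    """A gated action was attempted without the required approval."""


class BreakerOpen(Exception):
    """The tool failed too often in a row and is being refused."""


@dataclass
class GovernedTool:
    fn: Callable[..., Any]
    tier: RiskTier
    max_failures: int = 3
    failures: int = field(default=0, init=False)  # consecutive failures so far

    def call(self, *args: Any, approved: bool = False, **kwargs: Any) -> Any:
        name = self.fn.__name__
        # Gate: HIGH-tier actions only run with explicit approval.
        if self.tier >= RiskTier.HIGH and not approved:
            raise GateDenied(f"{name} needs approval (tier={self.tier.name})")
        # Breaker: stop calling a tool that keeps failing.
        if self.failures >= self.max_failures:
            raise BreakerOpen(f"{name} open after {self.failures} consecutive failures")
        try:
            result = self.fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the failure streak
        return result
```

Putting both checks in one wrapper means every inherited tool gets the same governance without each agent re-implementing it. That is the point of a shared blueprint.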
— 03 · proteus
The Tuner
genome → winning config
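Searches a genome (models, prompts, sampling temperature, retrieval knobs such as k) against a versioned dataset. Bayesian search plus NSGA-II handles multi-objective Pareto trade-offs such as cost versus quality. When a metric can't score the objective, Proteus co-creates the LLM-as-judge (Hephaestus drafts the rubric, Proteus tunes the judge prompt against a gold set). It then runs an adversarial loop until the judge is stable and the candidate converges.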
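The genome from the factory-floor diagram and the Pareto step can be sketched as below. The sketch makes two simplifications, both assumptions:

- Genomes are drawn uniformly at random instead of being proposed by a Bayesian optimizer.
- Only NSGA-II's first step (keeping the non-dominated front) is shown, without its crowding-distance ranking or later fronts.

`GENOME_SPACE`, `Trial`, `sample_genome`, and `pareto_front` are hypothetical names.

```python
import random
from dataclasses import dataclass

# Discrete genes as listed in the diagram; temperature is sampled separately.
GENOME_SPACE = {
    "model": ["sonnet", "haiku", "4o"],
    "prompt": ["v1", "v2", "v3", "v4"],
    "k": [3, 5, 10, 20],
}


def sample_genome(rng: random.Random) -> dict:
    """Draw one candidate config uniformly (a stand-in for Bayesian proposals)."""
    genome = {gene: rng.choice(options) for gene, options in GENOME_SPACE.items()}
    genome["temp"] = rng.uniform(0.0, 1.0)  # temp = U(0.0, 1.0)
    return genome


@dataclass(frozen=True)
class Trial:
    genome_id: str
    f1: float    # quality: higher is better
    cost: float  # $/req: lower is better


def dominates(a: Trial, b: Trial) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.f1 >= b.f1 and a.cost <= b.cost
    strictly_better = a.f1 > b.f1 or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(trials: list[Trial]) -> list[Trial]:
    """Non-dominated trials: the set NSGA-II ranks as its first front."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]
```

A split like the diagram's "5 front · 25 dominated" is what `pareto_front` separates. The front keeps every config for which no other trial is both cheaper and more accurate.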
— 04 · panoptes
The Hundred Eyes
otel · inventory · compliance
Two views. Single run: an OTel trace of one execution, with spans for retrieval, LLM calls, tool invocations, gates, and judge verdicts. Inventory: every experiment and run, filterable by source (hph / prt / nyx), status, tags, and parent. Exports a self-contained audit package for regulators.
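Here is a sketch of the inventory filter and the audit export. It assumes each experiment is a flat record keyed by the fields Panoptes lists: source, status, tags, and parent.

The `Record` type, the `query` helper, and the JSON layout of the export are hypothetical. The source does not specify Panoptes' storage or audit-package format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    exp_id: str
    src: str                       # "hph" | "prt" | "nyx"
    kind: str                      # TYPE column: optim_gen, agent_run, agent_build
    status: str                    # running | complete | failed
    tags: frozenset = frozenset()
    parent: Optional[str] = None   # e.g. the sweep or build that spawned this record


def query(records: Iterable[Record], *, src=None, status=None, tag=None, parent=None) -> list:
    """Keep records matching every filter given; a filter left as None matches anything."""
    return [
        r for r in records
        if (src is None or r.src == src)
        and (status is None or r.status == status)
        and (tag is None or tag in r.tags)
        and (parent is None or r.parent == parent)
    ]


def export_audit(records: Iterable[Record]) -> str:
    """Bundle records into one JSON document (hypothetical audit-package layout)."""
    rows = [dict(asdict(r), tags=sorted(r.tags)) for r in records]
    return json.dumps({"records": rows}, indent=2, sort_keys=True)
```

Filters compose by keyword, so "failed Proteus runs" is `query(records, src="prt", status="failed")`.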