Hands on
These labs turn the map into code. Each one builds a deliberately small version of a major AI tooling idea, using plain files, shell commands, and simple JSON.
The point
These labs are not the start of the whole journey. They are the start of the tooling journey after you have picked a model-access path. The model access page explains how you get to that point in the first place.
These are not polished products. They are practice pieces. By the end, you should have a tiny local stack that grows from model access into a command-line interface, tools, structured boundaries, memory, coordination, approvals, logs, and evals.
The runnable files live in the repository's labs/ folder.
This page explains the idea; the linked artifacts are the pieces to inspect, run, and change.
Recommended starts
You do not need to absorb the whole hub before choosing a path. Start from your actual situation, then come back to the full sequence when you want the broader build map.
Bootstrap first
The labs assume one thing only: you have chosen how you will reach a model. That choice can be a subscription product, a direct provider API, a managed model platform, a router, or a local host.
Pick the access path. Use starting paths if you are not sure whether you need a product, API, platform, router, or local host.
Reduce it to one boring interface. Before any agent tooling exists, prove you can make one repeatable request and inspect one repeatable response; a minimal sketch follows these steps. If that path uses a provider key, read API key security before you wire it into a host.
Then enter lab 00. From that point on, the labs are about wrapping and extending the model surface, not choosing it.
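A minimal sketch of that one repeatable request, assuming an OpenAI-style chat endpoint and a key held in an environment variable. The URL, model id, and variable name are placeholders for whatever your chosen path actually exposes, not recommendations.

```python
# One boring request against an OpenAI-style chat endpoint.
# The URL, model id, and key variable are assumptions: substitute
# whatever your chosen access path actually exposes.
import json
import os
import urllib.request

ENDPOINT = "https://api.example.com/v1/chat/completions"  # hypothetical
payload = {
    "model": "example-small-instruct",  # hypothetical model id
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['PROVIDER_API_KEY']}",
    },
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Inspect the whole response before trusting any part of it.
print(json.dumps(body, indent=2))
```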
Optional deeper entrance
This is not a separate curriculum. It is the deeper version of the same beginning for people who want to start from an actual local model instead of a toy model surface or hosted endpoint.
Pick a small instruct model with permissive enough terms and modest hardware needs. Optimize for learnability and easy hosting, not prestige.
Run one local host that can expose a stable endpoint. Avoid mixing several runtimes at the same time while learning the boundary.
Pick one artifact format and quantization that the runtime actually supports, then document exactly what was chosen and why.
Before any tooling work begins, show one repeatable prompt and response against the local endpoint; a hedged smoke test follows these steps.
Once the local endpoint is real, wrap it in the same boring interface shape used by lab 00. From there, the main lab path stays the same.
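The smoke test, assuming the local host exposes OpenAI-compatible routes (Ollama and LM Studio both do, on their own default ports). The base URL and model id are assumptions to replace with what your runtime actually reports.

```python
# Smoke-test a local OpenAI-compatible endpoint: list models, then
# make one repeatable request. The base URL is an assumption.
import json
import urllib.request

BASE = "http://localhost:11434/v1"  # assumed Ollama-style default; adjust

def call(url, payload=None):
    data = json.dumps(payload).encode() if payload else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

models = call(f"{BASE}/models")
print("models:", [m["id"] for m in models.get("data", [])])

reply = call(f"{BASE}/chat/completions", {
    "model": "example-local-model",  # replace with an id listed above
    "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
})
print(reply["choices"][0]["message"]["content"])
```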
Artifact: optional real local bootstrap path. Reference pages: model access, local hosting, and model artifacts.
Run and reset
Run all examples with labs/run_all.py.
Restore mutable lab files with labs/reset.py.
```
python3 labs/run_all.py
python3 labs/reset.py
```
Build map
| Lab | You need | You add | Why it matters |
|---|---|---|---|
| Bootstrap | A decision about model access | One chosen path plus one repeatable request or CLI surface | The rest of the lab path only makes sense after the model surface exists. |
| Optional pre-bootstrap | No usable model surface yet | A real local model endpoint | This is the deeper version of the same beginning, not a different journey. |
| 0 | A way to talk to a model | A tiny model CLI | Model access becomes something you can actually use. |
| 1 | A useful action outside chat | A dumb CLI an AI can call | The model can now rely on a deterministic capability. |
| 2 | Machine-readable results | A stable JSON wrapper | Tool calls become easier to validate, log, and replay. |
| 3 | Tool discovery | A tiny protocol adapter | A host can discover and call tools without knowing CLI flags. |
| 4 | Repeatable judgment | A skill/procedure file | The system learns when and how to use the tool well. |
| 5 | Boundary checks | A lifecycle hook | Policy and logging happen without changing the tool. |
| 6 | Multi-step work | A tiny agent loop | The system can observe, decide, act, and evaluate. |
| 7 | Durable state | A memory/task graph | Work survives restarts and dependency order becomes visible. |
| 8 | More than one worker | A workspace coordinator | Claims and handoffs keep parallel work from colliding. |
| 9 | A usable control surface | A host-like CLI | Users can inspect tools, approve calls, and see results. |
| 10 | Trust and repeatability | Governance, evals, and tool-call logs | Actions become auditable and failures become visible. |
| 11 | The whole shape | A capstone flow | The pieces form one small governed workflow. |
Lab rules
A useful agent tool has predictable flags, predictable output, and predictable errors. Fancy comes later.
JSON makes tool results easy to inspect, log, validate, replay, and pass between layers.
If an agent can act, you should be able to see what it tried, what happened, and why the next step was chosen.
Main spine
Each lab now has its own page. Use this hub to keep the sequence in view, then open the dedicated page when you want the fuller teaching copy, runnable command, artifact links, and a real-world analog.
Bootstrap: Choose the path to the model, then reduce it to one stable request or CLI surface before you build any tooling on top.
Real-world analog: curl for proving one boring request/response path.
Lab 0: Turn model access into one repeatable local command so the rest of the stack has something concrete to wrap.
Real-world analog: Ollama CLI.
Lab 1: Build one deterministic capability with stable flags, useful exit codes, and no hidden state.
Real-world analog: Git CLI, especially commands like git status and git grep.
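A sketch of that shape with an invented tool: one required flag, plain output on stdout, errors on stderr, and a distinct exit code for bad input.

```python
#!/usr/bin/env python3
# A deliberately dumb CLI: stable flags, stable output, stable exit
# codes, no hidden state. The tool name and flag are invented;
# assume it is saved as wordcount.py.
import argparse
import pathlib
import sys

def main() -> int:
    parser = argparse.ArgumentParser(prog="wordcount")
    parser.add_argument("--file", required=True, help="path to a text file")
    args = parser.parse_args()

    path = pathlib.Path(args.file)
    if not path.is_file():
        print(f"error: no such file: {path}", file=sys.stderr)
        return 2  # distinct exit code for bad input, like grep or git

    print(len(path.read_text().split()))
    return 0

if __name__ == "__main__":
    sys.exit(main())
```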
Lab 2: Keep the tool, but give it one machine-readable output shape that callers can validate, log, and replay.
Real-world analog: ripgrep's JSON mode.
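The same invented tool with one stable envelope. The ok/result/error field names are an assumption for this sketch, not a standard; what matters is that every call returns the same shape.

```python
#!/usr/bin/env python3
# Same capability, one machine-readable envelope. The ok/result/error
# field names are an assumption for the sketch.
import json
import pathlib
import sys

def run(file_arg: str) -> dict:
    path = pathlib.Path(file_arg)
    if not path.is_file():
        return {"ok": False, "error": f"no such file: {path}"}
    return {"ok": True, "result": {"words": len(path.read_text().split())}}

if __name__ == "__main__":
    if len(sys.argv) == 2:
        envelope = run(sys.argv[1])
    else:
        envelope = {"ok": False, "error": "usage: wordcount_json FILE"}
    print(json.dumps(envelope))
    sys.exit(0 if envelope["ok"] else 1)
```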
Lab 3: Expose discovery and tool calling as a protocol boundary instead of making every host learn raw CLI flags.
Real-world analog: Model Context Protocol.
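A toy version of that boundary: one JSON request per stdin line, one JSON response per stdout line. The method names are loosely protocol-shaped but invented; this is not real MCP.

```python
#!/usr/bin/env python3
# A toy discovery-and-call boundary over stdin/stdout JSON lines.
# Every method and field name here is invented for the sketch.
import json
import sys

TOOLS = {
    "wordcount": {
        "description": "Count words in a text file",
        "params": {"file": "path to a text file"},
    }
}

def handle(request: dict) -> dict:
    if request.get("method") == "tools/list":
        return {"ok": True, "tools": TOOLS}
    if request.get("method") == "tools/call":
        if request.get("tool") != "wordcount":
            return {"ok": False, "error": f"unknown tool: {request.get('tool')}"}
        try:
            text = open(request["args"]["file"]).read()
            return {"ok": True, "result": {"words": len(text.split())}}
        except (OSError, KeyError) as exc:
            return {"ok": False, "error": str(exc)}
    return {"ok": False, "error": "unknown method"}

for line in sys.stdin:
    print(json.dumps(handle(json.loads(line))), flush=True)
```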
Lab 4: Separate the procedure from the tool so the usage pattern is reusable, reviewable, and teachable.
Real-world analog: Make and similar checked-in procedure files.
Lab 5: Add policy and logging around the tool without editing the tool itself.
Real-world analog: Git hooks.
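A sketch of the hook idea around the unmodified lab 1 tool, assuming it was saved as wordcount.py. The policy rule and log path are invented; the point is that neither lives inside the tool.

```python
#!/usr/bin/env python3
# Policy and logging wrapped around an unmodified tool. The wrapped
# command, policy rule, and log path are invented for the sketch.
import json
import subprocess
import sys
import time

LOG = "tool_calls.jsonl"
BLOCKED_PREFIXES = ("/etc", "/var")  # toy policy: refuse system paths

def main() -> int:
    args = sys.argv[1:]
    if any(a.startswith(BLOCKED_PREFIXES) for a in args):
        print("policy: refusing system paths", file=sys.stderr)
        return 3
    started = time.time()
    proc = subprocess.run(
        ["python3", "wordcount.py", *args], capture_output=True, text=True
    )
    with open(LOG, "a") as f:
        f.write(json.dumps({
            "ts": started, "args": args,
            "exit": proc.returncode, "stdout": proc.stdout.strip(),
        }) + "\n")
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    return proc.returncode

if __name__ == "__main__":
    sys.exit(main())
```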
Lab 6: Make the observe-decide-act-evaluate loop visible before you let a real model hide that control flow.
Real-world analog: LangGraph.
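A sketch of that loop with a scripted decide() standing in for the model call, so every step of the control flow stays printable. All names here are invented; in the lab you would swap in your model CLI.

```python
# The observe-decide-act-evaluate loop with every step visible.
# decide() is a scripted stand-in for a model call.
def observe(state):
    return {"done": state["countdown"] == 0, "countdown": state["countdown"]}

def decide(observation):
    return "stop" if observation["done"] else "decrement"

def act(state, action):
    if action == "decrement":
        state["countdown"] -= 1
    return state

def evaluate(state, trace):
    trace.append(dict(state))
    return state["countdown"] >= 0  # invariant check

state, trace = {"countdown": 3}, []
for step in range(10):  # hard step budget: loops must be bounded
    obs = observe(state)
    action = decide(obs)
    print(f"step={step} obs={obs} action={action}")
    if action == "stop":
        break
    state = act(state, action)
    assert evaluate(state, trace), "invariant violated"
```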
Lab 7: Persist task state and dependencies so work survives restarts and the next unblocked task is always visible.
Real-world analog: Taskwarrior.
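A sketch of durable state in one JSON file, with invented file and field names. next_unblocked() is the entire scheduling policy: a task is ready when it is open and all of its dependencies are done.

```python
# Durable task state with dependencies in one JSON file.
import json
import pathlib

STORE = pathlib.Path("tasks.json")

def load():
    return json.loads(STORE.read_text()) if STORE.exists() else {}

def save(tasks):
    STORE.write_text(json.dumps(tasks, indent=2))

def next_unblocked(tasks):
    for name, task in tasks.items():
        if task["status"] != "open":
            continue
        if all(tasks[dep]["status"] == "done" for dep in task["deps"]):
            return name
    return None

tasks = load() or {
    "fetch": {"status": "open", "deps": []},
    "parse": {"status": "open", "deps": ["fetch"]},
    "report": {"status": "open", "deps": ["parse"]},
}
save(tasks)  # state now survives restarts

print("next unblocked task:", next_unblocked(tasks))  # -> fetch
```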
Lab 8: Coordinate multiple workers with a queue, claims, and a readable handoff trail.
Real-world analog: GitHub Actions.
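A sketch of claims via atomic file creation, assuming workers share one directory. open(..., "x") fails if the file already exists, so two workers cannot claim the same task, and the claim file itself is the handoff trail.

```python
# Claims via atomic file creation. Directory layout and names are
# invented for the sketch.
import pathlib

CLAIMS = pathlib.Path("claims")
CLAIMS.mkdir(exist_ok=True)

def claim(task: str, worker: str) -> bool:
    try:
        with open(CLAIMS / f"{task}.claim", "x") as f:
            f.write(worker)  # readable handoff trail: who owns what
        return True
    except FileExistsError:
        return False

print(claim("parse", "worker-a"))  # True: worker-a owns "parse"
print(claim("parse", "worker-b"))  # False: already claimed
```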
Lab 9: Give the user a control surface that can list tools, request approvals, and show a readable history of actions.
Real-world analog: Aider.
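A sketch of the approval gate such a surface needs: show the exact call, require an explicit yes, and record the decision either way. The tool invocation and approvals file are invented.

```python
# A minimal approval gate for a host-like CLI.
import json
import subprocess

def approve_and_run(tool: str, args: list[str]) -> None:
    print(f"agent wants to run: {tool} {' '.join(args)}")
    decision = input("approve? [y/N] ").strip().lower()
    record = {"tool": tool, "args": args, "approved": decision == "y"}
    with open("approvals.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")  # every decision is recorded
    if decision == "y":
        subprocess.run([tool, *args])
    else:
        print("denied; nothing was executed")

approve_and_run("python3", ["wordcount.py", "--file", "notes.txt"])
```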
Lab 10: Surround the stack with audit records, replayable evals, and explicit policy outcomes.
Real-world analog: OpenTelemetry.
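A sketch of one replayable eval, assuming the tool-call log shape from the hook sketch above: re-run every recorded call and flag drift between the logged output and the fresh output.

```python
# Replay the tool-call log as a tiny eval. The log shape matches the
# hook sketch above; it is an assumption, not a standard.
import json
import subprocess

failures = 0
for line in open("tool_calls.jsonl"):
    record = json.loads(line)
    proc = subprocess.run(
        ["python3", "wordcount.py", *record["args"]],
        capture_output=True, text=True,
    )
    if proc.stdout.strip() != record["stdout"]:
        failures += 1
        print(f"drift: args={record['args']} "
              f"was={record['stdout']!r} now={proc.stdout.strip()!r}")

print(f"replayed log; {failures} drifting call(s)")
```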
Lab 11: Combine host, tools, policy, durable state, and evals into one small governed workflow.
Real-world analog: OpenHands.
Optional paths
These are not mandatory next steps in the main spine. They are useful companion tracks when you want to go deeper into persistent assistants or credential boundaries.
This stretch goal turns the late-game comparison into working code: a tiny gateway with channels, durable memory, skills, routing, scheduling, approvals, and logs.
Real-world analogs: OpenClaw and Hermes Agent.
This optional security side path adds a narrow local proxy so a host or agent can use model access without receiving the raw provider key directly.
Real-world analog: internal API gateways and local credential brokers.
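A sketch of such a proxy, assuming an OpenAI-style upstream and a hypothetical environment variable for the key. Error handling, timeouts, and request filtering are omitted; the one idea shown is that only this process ever sees the raw key.

```python
# A narrow localhost proxy: the host talks to this process without a
# provider key; the proxy injects the key and forwards one fixed route.
# Upstream URL, port, and env var name are assumptions for the sketch.
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api.example.com/v1/chat/completions"  # hypothetical

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(
            UPSTREAM,
            data=body,
            headers={
                "Content-Type": "application/json",
                # Only this process ever sees the raw provider key.
                "Authorization": f"Bearer {os.environ['PROVIDER_API_KEY']}",
            },
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("127.0.0.1", 8765), Proxy).serve_forever()
```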
Probe a real local runtime, list models, and prove one boring local response before you rejoin the main lab spine.
Real-world analogs: Ollama, LM Studio, and OpenAI-compatible local servers.
Replace the localhost-only teaching split with a stronger production-shaped boundary where the backend owns the provider secret and the host presents only backend credentials.
Real-world analogs: internal API gateways, backend-for-frontend services, managed-identity-backed apps.
After the labs
Your toy protocol is not MCP, your task graph is not Beads, and your coordinator is not Gas Town. But the boundaries should feel familiar: typed calls, durable state, claims, handoffs, approvals, and logs.
Look at the tooling catalog and ask the same questions: what is open, what is hosted, what runs locally, where does memory live, who approves actions, and what becomes long-running once the system leaves the terminal?