Hands on
These labs turn the map into code. Each one builds a deliberately small version of a major AI tooling idea, using plain files, shell commands, and simple JSON.
The point
These labs are not the start of the whole journey. They are the start of the tooling journey after you have picked a model-access path. The model access page explains how you get to that point in the first place.
These are not polished products. They are practice pieces. By the end, you should have a tiny local stack that grows from model access into a command-line interface, tools, structured boundaries, memory, coordination, approvals, logs, and evals.
The runnable files live in the repository's labs/ folder.
This page explains the idea; the linked artifacts are the pieces to inspect, run, and change.
Recommended starts
You do not need to absorb the whole hub before choosing a path. Start from your actual situation, then come back to the full sequence when you want the broader build map.
Bootstrap first
The labs assume one thing only: you have chosen how you will reach a model. That choice can be a subscription product, a direct provider API, a managed model platform, a router, or a local host.
Pick the access path. Use starting paths if you are not sure whether you need a product, API, platform, router, or local host.
Reduce it to one boring interface. Before any agent tooling exists, prove you can make one repeatable request and inspect one repeatable response; a minimal sketch follows these steps. If that path uses a provider key, read API key security before you wire it into a host.
Then enter lab 00. From that point on, the labs are about wrapping and extending the model surface, not choosing it.
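A minimal sketch of that one repeatable request, assuming an OpenAI-style chat endpoint and a key held in an environment variable. The URL, model id, and variable name are placeholders for whatever your chosen path actually exposes, not recommendations.

```python
# One boring request against an OpenAI-style chat endpoint.
# The URL, model id, and key variable are assumptions: substitute
# whatever your chosen access path actually exposes.
import json
import os
import urllib.request

ENDPOINT = "https://api.example.com/v1/chat/completions"  # hypothetical
payload = {
    "model": "example-small-instruct",  # hypothetical model id
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['PROVIDER_API_KEY']}",
    },
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Inspect the whole response before trusting any part of it.
print(json.dumps(body, indent=2))
```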
Optional deeper entrance
This is not a separate curriculum. It is the deeper version of the same beginning for people who want to start from an actual local model instead of a toy model surface or hosted endpoint.
Pick a small instruct model with permissive enough terms and modest hardware needs. Optimize for learnability and easy hosting, not prestige.
Run one local host that can expose a stable endpoint. Avoid mixing several runtimes at the same time while learning the boundary.
Pick one artifact format and quantization that the runtime actually supports, then document exactly what was chosen and why.
Before any tooling work begins, show one repeatable prompt and response against the local endpoint; a hedged smoke test follows these steps.
Once the local endpoint is real, wrap it in the same boring interface shape used by lab 00. From there, the main lab path stays the same.
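The smoke test, assuming the local host exposes OpenAI-compatible routes (Ollama and LM Studio both do, on their own default ports). The base URL and model id are assumptions to replace with what your runtime actually reports.

```python
# Smoke-test a local OpenAI-compatible endpoint: list models, then
# make one repeatable request. The base URL is an assumption.
import json
import urllib.request

BASE = "http://localhost:11434/v1"  # assumed Ollama-style default; adjust

def call(url, payload=None):
    data = json.dumps(payload).encode() if payload else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

models = call(f"{BASE}/models")
print("models:", [m["id"] for m in models.get("data", [])])

reply = call(f"{BASE}/chat/completions", {
    "model": "example-local-model",  # replace with an id listed above
    "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
})
print(reply["choices"][0]["message"]["content"])
```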
Artifact: optional real local bootstrap path. Reference pages: model access, local hosting, and model artifacts.
Run and reset
Run all examples with labs/run_all.py.
Restore mutable lab files with labs/reset.py.
```
python3 labs/run_all.py
python3 labs/reset.py
```
Build map
| Lab | You need | You add | Why it matters |
|---|---|---|---|
| Bootstrap | A decision about model access | One chosen path plus one repeatable request or CLI surface | The rest of the lab path only makes sense after the model surface exists. |
| Optional pre-bootstrap | No usable model surface yet | A real local model endpoint | This is the deeper version of the same beginning, not a different journey. |
| 0 | A way to talk to a model | A tiny model CLI | Model access becomes something you can actually use. |
| 1 | A useful action outside chat | A dumb CLI an AI can call | The model can now rely on a deterministic capability. |
| 2 | Machine-readable results | A stable JSON wrapper | Tool calls become easier to validate, log, and replay. |
| 3 | Tool discovery | A tiny protocol adapter | A host can discover and call tools without knowing CLI flags. |
| 4 | Repeatable judgment | A skill/procedure file | The system learns when and how to use the tool well. |
| 5 | Boundary checks | A lifecycle hook | Policy and logging happen without changing the tool. |
| 6 | Multi-step work | A tiny agent loop | The system can observe, decide, act, and evaluate. |
| 7 | Durable state | A memory/task graph | Work survives restarts and dependency order becomes visible. |
| 8 | More than one worker | A workspace coordinator | Claims and handoffs keep parallel work from colliding. |
| 9 | A usable control surface | A host-like CLI | Users can inspect tools, approve calls, and see results. |
| 10 | Trust and repeatability | Governance, evals, and tool-call logs | Actions become auditable and failures become visible. |
| 11 | The whole shape | A capstone flow | The pieces form one small governed workflow. |
Lab rules
A useful agent tool has predictable flags, predictable output, and predictable errors. Fancy comes later.
JSON makes tool results easy to inspect, log, validate, replay, and pass between layers.
If an agent can act, you should be able to see what it tried, what happened, and why the next step was chosen.
Main spine
Each lab now has its own page. Use this hub to keep the sequence in view, then open the dedicated page when you want the fuller teaching copy, runnable command, artifact links, and a real-world analog.
Bootstrap: Choose the path to the model, then reduce it to one stable request or CLI surface before you build any tooling on top.
Real-world analog: curl for proving one boring request/response path.
Lab 0: Turn model access into one repeatable local command so the rest of the stack has something concrete to wrap.
Real-world analog: Ollama CLI.
Lab 1: Build one deterministic capability with stable flags, useful exit codes, and no hidden state.
Real-world analog: Git CLI, especially commands like git status and git grep.
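A sketch of that shape with an invented tool: one required flag, plain output on stdout, errors on stderr, and a distinct exit code for bad input.

```python
#!/usr/bin/env python3
# A deliberately dumb CLI: stable flags, stable output, stable exit
# codes, no hidden state. The tool name and flag are invented;
# assume it is saved as wordcount.py.
import argparse
import pathlib
import sys

def main() -> int:
    parser = argparse.ArgumentParser(prog="wordcount")
    parser.add_argument("--file", required=True, help="path to a text file")
    args = parser.parse_args()

    path = pathlib.Path(args.file)
    if not path.is_file():
        print(f"error: no such file: {path}", file=sys.stderr)
        return 2  # distinct exit code for bad input, like grep or git

    print(len(path.read_text().split()))
    return 0

if __name__ == "__main__":
    sys.exit(main())
```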
Lab 2: Keep the tool, but give it one machine-readable output shape that callers can validate, log, and replay.
Real-world analog: ripgrep's JSON mode.
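The same invented tool with one stable envelope. The ok/result/error field names are an assumption for this sketch, not a standard; what matters is that every call returns the same shape.

```python
#!/usr/bin/env python3
# Same capability, one machine-readable envelope. The ok/result/error
# field names are an assumption for the sketch.
import json
import pathlib
import sys

def run(file_arg: str) -> dict:
    path = pathlib.Path(file_arg)
    if not path.is_file():
        return {"ok": False, "error": f"no such file: {path}"}
    return {"ok": True, "result": {"words": len(path.read_text().split())}}

if __name__ == "__main__":
    if len(sys.argv) == 2:
        envelope = run(sys.argv[1])
    else:
        envelope = {"ok": False, "error": "usage: wordcount_json FILE"}
    print(json.dumps(envelope))
    sys.exit(0 if envelope["ok"] else 1)
```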
Lab 3: Expose discovery and tool calling as a protocol boundary instead of making every host learn raw CLI flags.
Real-world analog: Model Context Protocol.
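A toy version of that boundary: one JSON request per stdin line, one JSON response per stdout line. The method names are loosely protocol-shaped but invented; this is not real MCP.

```python
#!/usr/bin/env python3
# A toy discovery-and-call boundary over stdin/stdout JSON lines.
# Every method and field name here is invented for the sketch.
import json
import sys

TOOLS = {
    "wordcount": {
        "description": "Count words in a text file",
        "params": {"file": "path to a text file"},
    }
}

def handle(request: dict) -> dict:
    if request.get("method") == "tools/list":
        return {"ok": True, "tools": TOOLS}
    if request.get("method") == "tools/call":
        if request.get("tool") != "wordcount":
            return {"ok": False, "error": f"unknown tool: {request.get('tool')}"}
        try:
            text = open(request["args"]["file"]).read()
            return {"ok": True, "result": {"words": len(text.split())}}
        except (OSError, KeyError) as exc:
            return {"ok": False, "error": str(exc)}
    return {"ok": False, "error": "unknown method"}

for line in sys.stdin:
    print(json.dumps(handle(json.loads(line))), flush=True)
```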
Lab 4: Separate the procedure from the tool so the usage pattern is reusable, reviewable, and teachable.
Real-world analog: Make and similar checked-in procedure files.
Lab 5: Add policy and logging around the tool without editing the tool itself.
Real-world analog: Git hooks.
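A sketch of the hook idea around the unmodified lab 1 tool, assuming it was saved as wordcount.py. The policy rule and log path are invented; the point is that neither lives inside the tool.

```python
#!/usr/bin/env python3
# Policy and logging wrapped around an unmodified tool. The wrapped
# command, policy rule, and log path are invented for the sketch.
import json
import subprocess
import sys
import time

LOG = "tool_calls.jsonl"
BLOCKED_PREFIXES = ("/etc", "/var")  # toy policy: refuse system paths

def main() -> int:
    args = sys.argv[1:]
    if any(a.startswith(BLOCKED_PREFIXES) for a in args):
        print("policy: refusing system paths", file=sys.stderr)
        return 3
    started = time.time()
    proc = subprocess.run(
        ["python3", "wordcount.py", *args], capture_output=True, text=True
    )
    with open(LOG, "a") as f:
        f.write(json.dumps({
            "ts": started, "args": args,
            "exit": proc.returncode, "stdout": proc.stdout.strip(),
        }) + "\n")
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    return proc.returncode

if __name__ == "__main__":
    sys.exit(main())
```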
Lab 6: Make the observe-decide-act-evaluate loop visible before you let a real model hide that control flow.
Real-world analog: LangGraph.
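A sketch of that loop with a scripted decide() standing in for the model call, so every step of the control flow stays printable. All names here are invented; in the lab you would swap in your model CLI.

```python
# The observe-decide-act-evaluate loop with every step visible.
# decide() is a scripted stand-in for a model call.
def observe(state):
    return {"done": state["countdown"] == 0, "countdown": state["countdown"]}

def decide(observation):
    return "stop" if observation["done"] else "decrement"

def act(state, action):
    if action == "decrement":
        state["countdown"] -= 1
    return state

def evaluate(state, trace):
    trace.append(dict(state))
    return state["countdown"] >= 0  # invariant check

state, trace = {"countdown": 3}, []
for step in range(10):  # hard step budget: loops must be bounded
    obs = observe(state)
    action = decide(obs)
    print(f"step={step} obs={obs} action={action}")
    if action == "stop":
        break
    state = act(state, action)
    assert evaluate(state, trace), "invariant violated"
```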
Lab 7: Persist task state and dependencies so work survives restarts and the next unblocked task is always visible.
Real-world analog: Taskwarrior.
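A sketch of durable state in one JSON file, with invented file and field names. next_unblocked() is the entire scheduling policy: a task is ready when it is open and all of its dependencies are done.

```python
# Durable task state with dependencies in one JSON file.
import json
import pathlib

STORE = pathlib.Path("tasks.json")

def load():
    return json.loads(STORE.read_text()) if STORE.exists() else {}

def save(tasks):
    STORE.write_text(json.dumps(tasks, indent=2))

def next_unblocked(tasks):
    for name, task in tasks.items():
        if task["status"] != "open":
            continue
        if all(tasks[dep]["status"] == "done" for dep in task["deps"]):
            return name
    return None

tasks = load() or {
    "fetch": {"status": "open", "deps": []},
    "parse": {"status": "open", "deps": ["fetch"]},
    "report": {"status": "open", "deps": ["parse"]},
}
save(tasks)  # state now survives restarts

print("next unblocked task:", next_unblocked(tasks))  # -> fetch
```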
Lab 8: Coordinate multiple workers with a queue, claims, and a readable handoff trail.
Real-world analog: GitHub Actions.
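A sketch of claims via atomic file creation, assuming workers share one directory. open(..., "x") fails if the file already exists, so two workers cannot claim the same task, and the claim file itself is the handoff trail.

```python
# Claims via atomic file creation. Directory layout and names are
# invented for the sketch.
import pathlib

CLAIMS = pathlib.Path("claims")
CLAIMS.mkdir(exist_ok=True)

def claim(task: str, worker: str) -> bool:
    try:
        with open(CLAIMS / f"{task}.claim", "x") as f:
            f.write(worker)  # readable handoff trail: who owns what
        return True
    except FileExistsError:
        return False

print(claim("parse", "worker-a"))  # True: worker-a owns "parse"
print(claim("parse", "worker-b"))  # False: already claimed
```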
Lab 9: Give the user a control surface that can list tools, request approvals, and show a readable history of actions.
Real-world analog: Aider.
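A sketch of the approval gate such a surface needs: show the exact call, require an explicit yes, and record the decision either way. The tool invocation and approvals file are invented.

```python
# A minimal approval gate for a host-like CLI.
import json
import subprocess

def approve_and_run(tool: str, args: list[str]) -> None:
    print(f"agent wants to run: {tool} {' '.join(args)}")
    decision = input("approve? [y/N] ").strip().lower()
    record = {"tool": tool, "args": args, "approved": decision == "y"}
    with open("approvals.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")  # every decision is recorded
    if decision == "y":
        subprocess.run([tool, *args])
    else:
        print("denied; nothing was executed")

approve_and_run("python3", ["wordcount.py", "--file", "notes.txt"])
```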
Lab 10: Surround the stack with audit records, replayable evals, and explicit policy outcomes.
Real-world analog: OpenTelemetry.
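A sketch of one replayable eval, assuming the tool-call log shape from the hook sketch above: re-run every recorded call and flag drift between the logged output and the fresh output.

```python
# Replay the tool-call log as a tiny eval. The log shape matches the
# hook sketch above; it is an assumption, not a standard.
import json
import subprocess

failures = 0
for line in open("tool_calls.jsonl"):
    record = json.loads(line)
    proc = subprocess.run(
        ["python3", "wordcount.py", *record["args"]],
        capture_output=True, text=True,
    )
    if proc.stdout.strip() != record["stdout"]:
        failures += 1
        print(f"drift: args={record['args']} "
              f"was={record['stdout']!r} now={proc.stdout.strip()!r}")

print(f"replayed log; {failures} drifting call(s)")
```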
Lab 11: Combine host, tools, policy, durable state, and evals into one small governed workflow.
Real-world analog: OpenHands.
Optional paths
These are not mandatory next steps in the main spine. They are useful companion tracks when you want to go deeper into persistent assistants or credential boundaries.
This stretch goal turns the late-game comparison into working code: a tiny gateway with channels, durable memory, skills, routing, scheduling, approvals, and logs.
Real-world analogs: OpenClaw and Hermes Agent.
This optional security side path adds a narrow local proxy so a host or agent can use model access without receiving the raw provider key directly.
Real-world analog: internal API gateways and local credential brokers.
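A sketch of such a proxy, assuming an OpenAI-style upstream and a hypothetical environment variable for the key. Error handling, timeouts, and request filtering are omitted; the one idea shown is that only this process ever sees the raw key.

```python
# A narrow localhost proxy: the host talks to this process without a
# provider key; the proxy injects the key and forwards one fixed route.
# Upstream URL, port, and env var name are assumptions for the sketch.
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api.example.com/v1/chat/completions"  # hypothetical

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(
            UPSTREAM,
            data=body,
            headers={
                "Content-Type": "application/json",
                # Only this process ever sees the raw provider key.
                "Authorization": f"Bearer {os.environ['PROVIDER_API_KEY']}",
            },
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("127.0.0.1", 8765), Proxy).serve_forever()
```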
Probe a real local runtime, list models, and prove one boring local response before you rejoin the main lab spine.
Real-world analogs: Ollama, LM Studio, and OpenAI-compatible local servers.
Replace the localhost-only teaching split with a stronger production-shaped boundary where the backend owns the provider secret and the host presents only backend credentials.
Real-world analogs: internal API gateways, backend-for-frontend services, managed-identity-backed apps.
After the labs
Your toy protocol is not MCP, your task graph is not Beads, and your coordinator is not Gas Town. But the boundaries should feel familiar: typed calls, durable state, claims, handoffs, approvals, and logs.
Look at the tooling catalog and ask the same questions: what is open, what is hosted, what runs locally, where does memory live, who approves actions, and what becomes long-running once the system leaves the terminal?