The Short Answer
Fine-tuning changes model weights to specialize behavior. Kayba improves agent behavior through in-context learning — better system prompts, built from a Skillbook of learned strategies. No GPUs, no training data pipelines, no model lock-in.
Choose fine-tuning if you need a model that inherently knows domain-specific patterns and you have clean, labeled training data.
Choose Kayba if you want your agent to learn from its own execution traces and improve continuously without the infrastructure overhead of fine-tuning.
How They Work
Fine-Tuning
Fine-tuning adjusts the model's weights using supervised examples. You provide input-output pairs, and the model learns to reproduce patterns from that data.
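Concretely, supervised fine-tuning data is a set of example conversations. The minimal sketch below uses OpenAI's chat fine-tuning JSONL format (one conversation per line); other providers expect different schemas, and the example content is invented for illustration.

```python
import json

# One supervised example in OpenAI's chat fine-tuning JSONL format:
# each line of the training file is a complete conversation whose
# assistant turn is the target output. Content here is illustrative.
record = {
    "messages": [
        {"role": "system", "content": "You are a support agent for Acme."},
        {"role": "user", "content": "Can I return an opened item?"},
        {"role": "assistant", "content": "Opened items can be returned within 30 days with a receipt."},
    ]
}

# Append one JSON object per line to build the training file.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```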
The process:
- Collect and curate training data (hundreds to thousands of examples)
- Format data for the fine-tuning API (or run your own training pipeline)
- Train on GPUs (minutes to hours, depending on model and dataset size)
- Evaluate the fine-tuned model
- Deploy and swap the model endpoint
- Repeat when behavior drifts or new patterns emerge
Kayba
Kayba improves agents by learning at the prompt level. It analyzes what your agent did (execution traces), extracts what worked and what didn't (skills), and generates better system prompts.
The process:
- Pipe in your agent's execution traces (any format)
- The Recursive Reflector analyzes traces via REPL-based code execution
- Skills are extracted into a Skillbook with helpful/harmful counters (a hypothetical entry is sketched after this list)
- You review and approve learned skills
- New system prompts are generated from the Skillbook
- The loop runs continuously: new traces refine existing skills over time
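For intuition, a Skillbook entry can be pictured as a small record that pairs a strategy with evidence counters. This is a hypothetical sketch of the data shape, not the actual ace-framework schema:

```python
from dataclasses import dataclass

# Hypothetical Skillbook entry (illustrative only; the real
# ace-framework schema may differ). Each skill pairs a learned
# strategy with counters tracking how often it helped or hurt,
# plus a pointer back to the trace that produced it.
@dataclass
class Skill:
    name: str                  # short identifier, e.g. "verify-return-eligibility"
    instruction: str           # the strategy injected into the system prompt
    helpful: int = 0           # times the skill was marked helpful in later traces
    harmful: int = 0           # times it was marked harmful
    source_trace_id: str = ""  # execution trace this skill was extracted from

skill = Skill(
    name="verify-return-eligibility",
    instruction="Always verify return eligibility before processing a return.",
    helpful=12,
    harmful=1,
    source_trace_id="trace-0042",
)
```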
Comparison
| Dimension | Fine-Tuning | Kayba |
|---|---|---|
| What changes | Model weights | System prompt (in-context) |
| GPU required | Yes | No |
| Training data needed | Hundreds to thousands of labeled examples | Existing agent traces (no labeling) |
| Time to improve | Hours (training) + deployment | Minutes (analysis + prompt generation) |
| Model lock-in | Yes — fine-tuned model is provider-specific | No — works with any LLM provider |
| Transparency | Opaque (weight changes aren't interpretable) | Transparent (Skillbook is human-readable) |
| Continuous learning | Requires retraining | Incremental (delta updates to Skillbook) |
| Cost | GPU compute + data engineering | LLM API calls for analysis only |
| Rollback | Swap model versions | Remove or adjust individual skills |
Why Teams Are Moving Away from Fine-Tuning for Agents
Fine-tuning was the default answer for "how do I make my model better at X?" But for production agents, several realities have shifted:
Frontier models keep improving. When GPT-5 or Claude 4.5 drops, a fine-tuned GPT-4o is suddenly behind a base model that's better out of the box. Fine-tuning locks you to a snapshot. In-context learning works with whatever model you switch to.
Agent behavior is prompt-shaped, not weight-shaped. Most agent failures aren't about the model lacking knowledge — they're about the system prompt not covering the right edge cases. A support agent that forgets to check return policy eligibility doesn't need weight updates; it needs a skill that says "always verify return eligibility before processing."
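To make that concrete, a learned skill folds into the prompt rather than the weights. The sketch below is plain Python with invented prompt text and a hypothetical helper, not Kayba's actual API:

```python
BASE_PROMPT = "You are a support agent for Acme. Be concise and accurate."

# Learned skills are appended as explicit rules, so the fix ships
# as a prompt update rather than a weight update.
skills = [
    "Always verify return eligibility before processing a return.",
]

def build_system_prompt(base: str, rules: list[str]) -> str:
    """Assemble a system prompt from a base prompt plus learned rules."""
    bullet_list = "\n".join(f"- {rule}" for rule in rules)
    return f"{base}\n\nLearned rules:\n{bullet_list}"

print(build_system_prompt(BASE_PROMPT, skills))
```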
Training data is the bottleneck. Good fine-tuning requires clean, representative examples. For agents, this means manually labeling conversation traces as good or bad — essentially doing the work Kayba automates.
Debugging is harder. When a fine-tuned model behaves unexpectedly, you can't inspect why. With Kayba's Skillbook, every learned behavior links back to the trace that produced it. You can see exactly what the agent learned and remove or adjust individual skills.
When Fine-Tuning Still Makes Sense
Fine-tuning is the right choice when:
- You need the model to learn a new output format or domain-specific language
- You have abundant, clean training data and the infrastructure to use it
- Latency is critical and you want to reduce prompt length (fine-tuning internalizes behaviors that would otherwise consume prompt tokens)
- You're training a specialized model for a narrow, stable task that won't change
When to Use Kayba Instead
Kayba is the better fit when:
- Your agent is already on a strong base model but makes the same mistakes repeatedly
- You don't have labeled training data (just raw execution traces)
- You want to switch LLM providers without losing what the agent learned
- You need transparency — knowing exactly what the agent learned and why
- You want continuous improvement without retraining cycles
- Your team doesn't have GPU infrastructure or ML engineering capacity
- Your agent's domain evolves (new policies, new edge cases, new integrations)
Using Them Together
Kayba and fine-tuning aren't mutually exclusive. Some teams use fine-tuning for baseline domain knowledge and Kayba for continuous improvement on top. The Skillbook captures operational learnings that fine-tuning can't easily address — the difference between knowing the domain and knowing how to handle every edge case in production.
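A sketch of the combined setup, assuming the OpenAI Python SDK, a placeholder fine-tune ID, and a Kayba-generated system prompt exported to a local text file (all names here are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# The fine-tuned model supplies baseline domain knowledge; the
# Kayba-generated system prompt carries operational learnings.
# Both the model ID and the prompt file below are placeholders.
system_prompt = open("kayba_system_prompt.txt").read()

response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:acme::abc123",  # placeholder fine-tune ID
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Can I return an opened item?"},
    ],
)
print(response.choices[0].message.content)
```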
Getting Started
Kayba is open-source and works with any LLM provider — including fine-tuned models.
`pip install ace-framework`
- Documentation — Setup guides and API reference
- GitHub — Source code and examples
- Dashboard — Hosted version with visual Skillbook management