The Short Answer
Fine-tuning changes model weights to specialize behavior. Kayba improves agent behavior through in-context learning — better system prompts, built from a Skillbook of learned strategies. No GPUs, no training data pipelines, no model lock-in.
Choose fine-tuning if you need a model that inherently knows domain-specific patterns and you have clean, labeled training data.
Choose Kayba if you want your agent to learn from its own execution traces and improve continuously without the infrastructure overhead of fine-tuning.
How They Work
Fine-Tuning
Fine-tuning adjusts the model's weights using supervised examples. You provide input-output pairs, and the model learns to reproduce patterns from that data.
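Concretely, supervised fine-tuning data is a set of example conversations. The minimal sketch below uses OpenAI's chat fine-tuning JSONL format (one conversation per line); other providers expect different schemas, and the example content is invented for illustration.

```python
import json

# One supervised example in OpenAI's chat fine-tuning JSONL format:
# each line of the training file is a complete conversation whose
# assistant turn is the target output. Content here is illustrative.
record = {
    "messages": [
        {"role": "system", "content": "You are a support agent for Acme."},
        {"role": "user", "content": "Can I return an opened item?"},
        {"role": "assistant", "content": "Opened items can be returned within 30 days with a receipt."},
    ]
}

# Append one JSON object per line to build the training file.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```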
The process:
- Collect and curate training data (hundreds to thousands of examples)
- Format data for the fine-tuning API (or run your own training pipeline)
- Train on GPUs (minutes to hours, depending on model and dataset size)
- Evaluate the fine-tuned model
- Deploy and swap the model endpoint
- Repeat when behavior drifts or new patterns emerge
Kayba
Kayba improves agents by learning at the prompt level. It analyzes what your agent did (execution traces), extracts what worked and what didn't (skills), and generates better system prompts.
The process:
- Pipe in your agent's execution traces (any format)
- The Recursive Reflector analyzes traces via REPL-based code execution
- Skills are extracted into a Skillbook with helpful/harmful counters (a hypothetical entry is sketched after this list)
- You review and approve learned skills
- New system prompts are generated from the Skillbook
- The loop runs continuously: new traces refine existing skills over time
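For intuition, a Skillbook entry can be pictured as a small record that pairs a strategy with evidence counters. This is a hypothetical sketch of the data shape, not the actual ace-framework schema:

```python
from dataclasses import dataclass

# Hypothetical Skillbook entry (illustrative only; the real
# ace-framework schema may differ). Each skill pairs a learned
# strategy with counters tracking how often it helped or hurt,
# plus a pointer back to the trace that produced it.
@dataclass
class Skill:
    name: str                  # short identifier, e.g. "verify-return-eligibility"
    instruction: str           # the strategy injected into the system prompt
    helpful: int = 0           # times the skill was marked helpful in later traces
    harmful: int = 0           # times it was marked harmful
    source_trace_id: str = ""  # execution trace this skill was extracted from

skill = Skill(
    name="verify-return-eligibility",
    instruction="Always verify return eligibility before processing a return.",
    helpful=12,
    harmful=1,
    source_trace_id="trace-0042",
)
```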
Comparison
| Dimension | Fine-Tuning | Kayba |
|---|---|---|
| What changes | Model weights | System prompt (in-context) |
| GPU required | Yes | No |
| Training data needed | Hundreds to thousands of labeled examples | Existing agent traces (no labeling) |
| Time to improve | Hours (training) + deployment | Minutes (analysis + prompt generation) |
| Model lock-in | Yes — fine-tuned model is provider-specific | No — works with any LLM provider |
| Transparency | Opaque (weight changes aren't interpretable) | Transparent (Skillbook is human-readable) |
| Continuous learning | Requires retraining | Incremental (delta updates to Skillbook) |
| Cost | GPU compute + data engineering | LLM API calls for analysis only |
| Rollback | Swap model versions | Remove or adjust individual skills |
Why Teams Are Moving Away from Fine-Tuning for Agents
Fine-tuning was the default answer for "how do I make my model better at X?" But for production agents, several realities have shifted:
Frontier models keep improving. When GPT-5 or Claude 4.5 drops, a fine-tuned GPT-4o is suddenly behind a base model that's better out of the box. Fine-tuning locks you to a snapshot. In-context learning works with whatever model you switch to.
Agent behavior is prompt-shaped, not weight-shaped. Most agent failures aren't about the model lacking knowledge — they're about the system prompt not covering the right edge cases. A support agent that forgets to check return policy eligibility doesn't need weight updates; it needs a skill that says "always verify return eligibility before processing."
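To make that concrete, a learned skill folds into the prompt rather than the weights. The sketch below is plain Python with invented prompt text and a hypothetical helper, not Kayba's actual API:

```python
BASE_PROMPT = "You are a support agent for Acme. Be concise and accurate."

# Learned skills are appended as explicit rules, so the fix ships
# as a prompt update rather than a weight update.
skills = [
    "Always verify return eligibility before processing a return.",
]

def build_system_prompt(base: str, rules: list[str]) -> str:
    """Assemble a system prompt from a base prompt plus learned rules."""
    bullet_list = "\n".join(f"- {rule}" for rule in rules)
    return f"{base}\n\nLearned rules:\n{bullet_list}"

print(build_system_prompt(BASE_PROMPT, skills))
```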
Training data is the bottleneck. Good fine-tuning requires clean, representative examples. For agents, this means manually labeling conversation traces as good or bad — essentially doing the work Kayba automates.
Debugging is harder. When a fine-tuned model behaves unexpectedly, you can't inspect why. With Kayba's Skillbook, every learned behavior links back to the trace that produced it. You can see exactly what the agent learned and remove or adjust individual skills.
When Fine-Tuning Still Makes Sense
Fine-tuning is the right choice when:
- You need the model to learn a new output format or domain-specific language
- You have abundant, clean training data and the infrastructure to use it
- Latency is critical and you want to reduce prompt length (fine-tuning internalizes behaviors that would otherwise consume prompt tokens)
- You're training a specialized model for a narrow, stable task that won't change
When to Use Kayba Instead
Kayba is the better fit when:
- Your agent is already on a strong base model but makes the same mistakes repeatedly
- You don't have labeled training data (just raw execution traces)
- You want to switch LLM providers without losing what the agent learned
- You need transparency — knowing exactly what the agent learned and why
- You want continuous improvement without retraining cycles
- Your team doesn't have GPU infrastructure or ML engineering capacity
- Your agent's domain evolves (new policies, new edge cases, new integrations)
Using Them Together
Kayba and fine-tuning aren't mutually exclusive. Some teams use fine-tuning for baseline domain knowledge and Kayba for continuous improvement on top. The Skillbook captures operational learnings that fine-tuning can't easily address — the difference between knowing the domain and knowing how to handle every edge case in production.
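A sketch of the combined setup, assuming the OpenAI Python SDK, a placeholder fine-tune ID, and a Kayba-generated system prompt exported to a local text file (all names here are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# The fine-tuned model supplies baseline domain knowledge; the
# Kayba-generated system prompt carries operational learnings.
# Both the model ID and the prompt file below are placeholders.
system_prompt = open("kayba_system_prompt.txt").read()

response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:acme::abc123",  # placeholder fine-tune ID
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Can I return an opened item?"},
    ],
)
print(response.choices[0].message.content)
```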
Getting Started
Kayba is open-source and works with any LLM provider — including fine-tuned models.
`pip install ace-framework`
- Documentation — Setup guides and API reference
- GitHub — Source code and examples
- Dashboard — Hosted version with visual Skillbook management