Why Open-Source Matters for Agent Learning
When you let a tool learn from your agent's execution traces — its failures, its successes, its interaction patterns — you're giving it access to some of the most sensitive data in your product. You need to know exactly what it's doing with that data.
Most agent improvement tools are closed-source. You send your traces to their API, something happens, and you get back suggestions. You can't audit the analysis process, you can't self-host, and you can't verify that your data isn't being used for anything else.
Kayba is fully open-source under the MIT license. You can read every line of code, self-host the entire pipeline, and verify exactly how your traces are analyzed.
What Kayba Is
Kayba is a framework that lets AI agents improve from their own experience. It sits on top of any agent framework and adds a learning layer that analyzes traces, extracts skills, builds a Skillbook, and generates better prompts.
The framework synthesizes three published research streams:
- Agentic Context Engineering (ACE) — Three-agent architecture with delta updates for incremental Skillbook refinement. From Stanford/SambaNova research, published at ICLR 2026 (arXiv:2510.04618).
- Recursive Language Models (RLM) — REPL-based trace introspection that goes deeper than single-pass LLM analysis. From MIT CSAIL (arXiv:2512.24601).
- Dynamic Cheatsheet — Self-curated external memory with usage tracking and persistent learning. From Stanford/Together AI (arXiv:2504.07952).
To our knowledge, Kayba is the only framework that combines these three approaches into a single, production-ready system.
The Open-Source Landscape
Agent Learning / Improvement
| Tool | Open Source | Approach | GitHub Stars |
|---|---|---|---|
| Kayba (ACE) | Yes (MIT) | Trace analysis → Skillbook → prompt generation | 2k+ |
| Lemma | No | Drift detection → prompt optimization | N/A |
| ZeroEval | No | LLM judges → prompt rewriting | N/A |
| Theta | No | Simulation → agent training | N/A |
| Redapto | No | Audit interactions → update SOPs | N/A |
| Poetiq | No | Recursive self-improving reasoning | N/A |
| Modaic | No | DSPy-based optimization | N/A |
None of the direct competitors listed above is open-source. If you need source code access, self-hosting, or the ability to extend the framework, Kayba is the only option among them.
Adjacent Open-Source Tools
| Tool | What It Does | Relationship to Kayba |
|---|---|---|
| LangFuse | Open-source observability (traces, evals) | Complementary — observability, not learning |
| Laminar | Open-source tracing + debugging | Complementary — visibility, not improvement |
| DSPy | Prompt optimization via search | Different approach — optimization vs. learning from experience |
| OptiLLM | Inference-time proxy with optimization techniques | Different — runtime optimization, not persistent learning |
What You Get
Core Framework (pip install ace-framework)
- Recursive Reflector — REPL-based trace analysis engine. Uses a Python sandbox with sub-LLM calls to programmatically explore agent execution traces, catching patterns that surface-level analysis misses.
- SkillManager — Manages the Skillbook via atomic operations (ADD, UPDATE, TAG, REMOVE) with embedding-based deduplication to prevent bloat.
- Prompt Generator — Compiles approved skills into organized system prompts, grouped by section.
- LiteLLM integration — Works with any LLM provider (OpenAI, Anthropic, Google, Mistral, local models).
- Multi-format trace support — Markdown, JSON, plain text. If your agent produces logs, Kayba can process them.
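To make the SkillManager idea above concrete, here is a minimal sketch of atomic Skillbook operations with similarity-based deduplication. It is not the ace-framework API: `ToySkillbook`, `toy_embed`, and the 0.8 threshold are hypothetical names and values chosen for illustration, and the bag-of-words vector is only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Skill:
    text: str
    tags: set = field(default_factory=set)


class ToySkillbook:
    """Hypothetical in-memory Skillbook; not the library's SkillManager."""

    def __init__(self, dedup_threshold: float = 0.8):
        self.skills = {}
        self.dedup_threshold = dedup_threshold  # assumed value, tune per model
        self._next_id = 1

    def add(self, text: str) -> int:
        """ADD: store a skill, or return the id of an existing near-duplicate."""
        vector = toy_embed(text)
        for skill_id, skill in self.skills.items():
            if cosine(vector, toy_embed(skill.text)) >= self.dedup_threshold:
                return skill_id
        skill_id = self._next_id
        self._next_id += 1
        self.skills[skill_id] = Skill(text)
        return skill_id

    def update(self, skill_id: int, text: str) -> None:
        """UPDATE: replace the text of one skill, leaving the others untouched."""
        self.skills[skill_id].text = text

    def tag(self, skill_id: int, label: str) -> None:
        """TAG: attach a label to one skill."""
        self.skills[skill_id].tags.add(label)

    def remove(self, skill_id: int) -> None:
        """REMOVE: delete one skill."""
        del self.skills[skill_id]


book = ToySkillbook()
first = book.add("Always run the test suite before opening a pull request")
duplicate = book.add("Always run the test suite before opening a pull request!")
other = book.add("Escalate refund requests above the policy limit")
book.tag(first, "coding")
book.update(other, "Escalate refund requests above the policy limit to a human")
```

The second `add` call returns the id of the first skill instead of creating a new entry, which is the behavior that keeps a Skillbook from bloating. A real deployment would swap `toy_embed` for an actual embedding model and pick the threshold empirically.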
Key Technical Features
- Delta updates — Incremental Skillbook modifications instead of full rewrites. Prevents information loss during adaptation.
- Provenance tracking — Every skill records which trace produced it, enabling audit and debugging.
- Helpful/harmful counters — Skills track their impact over time. Reinforced when helpful, flagged when harmful.
- Embedding-based deduplication — Semantic similarity detection prevents duplicate skills from accumulating.
- TOON encoding — Tab-delimited Skillbook serialization saving 16-62% tokens vs markdown in production.
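The features above can be sketched in a few lines of plain Python. This is an illustrative model under stated assumptions, not Kayba's implementation: the `Skill` fields, the delta dictionary shape, `record_feedback`, and the column order in `to_tab_delimited` are all hypothetical, and the serializer shows only the general tab-delimited idea rather than the actual TOON format.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    skill_id: str
    text: str
    source_trace: str  # provenance: the trace that produced this skill
    helpful: int = 0
    harmful: int = 0


def apply_delta(skillbook: dict, delta: dict) -> None:
    """Apply one incremental change in place; other skills are never rewritten."""
    op = delta["op"]
    if op == "ADD":
        skillbook[delta["id"]] = Skill(delta["id"], delta["text"], delta["trace"])
    elif op == "UPDATE":
        # Only the text changes; provenance and counters are preserved.
        skillbook[delta["id"]].text = delta["text"]
    elif op == "REMOVE":
        del skillbook[delta["id"]]
    else:
        raise ValueError(f"unknown op: {op}")


def record_feedback(skill: Skill, helpful: bool) -> None:
    """Bump the helpful or harmful counter after the skill was used."""
    if helpful:
        skill.helpful += 1
    else:
        skill.harmful += 1


def to_tab_delimited(skillbook: dict) -> str:
    """Serialize the Skillbook as one header row plus one tab-separated row per skill."""
    rows = ["id\thelpful\tharmful\ttrace\ttext"]
    for s in skillbook.values():
        rows.append(f"{s.skill_id}\t{s.helpful}\t{s.harmful}\t{s.source_trace}\t{s.text}")
    return "\n".join(rows)


book = {}
apply_delta(book, {"op": "ADD", "id": "s1", "trace": "trace-017",
                   "text": "Confirm the order ID before issuing a refund"})
apply_delta(book, {"op": "ADD", "id": "s2", "trace": "trace-042",
                   "text": "Retry form submission once after a timeout"})
apply_delta(book, {"op": "UPDATE", "id": "s2",
                   "text": "Retry form submission once after a timeout, then escalate"})
record_feedback(book["s1"], helpful=True)
record_feedback(book["s1"], helpful=True)
record_feedback(book["s2"], helpful=False)
serialized = to_tab_delimited(book)
print(serialized)
```

Because each delta touches a single skill, the update to `s2` leaves its provenance (`trace-042`) and the whole of `s1` intact, which is the property the delta-update design is meant to protect.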
Hosted Dashboard (Optional)
For teams that want a visual interface, the hosted dashboard at use.kayba.ai provides Skillbook management, analysis pipelines, and prompt generation through a web UI. It costs $29/month, and you bring your own API key.
The framework works entirely standalone — the dashboard is a convenience, not a dependency.
Built on Published Research
Most core concepts in Kayba trace back to published research; the rest are Kayba's own production engineering:
| Concept | Source | What It Contributes |
|---|---|---|
| Three-agent architecture (Generator, Reflector, Curator) | ACE paper (ICLR 2026) | Structured pipeline for agent improvement |
| Delta updates | ACE paper | Incremental learning without information loss |
| REPL-based trace analysis | RLM paper (MIT CSAIL) | Deep, programmatic analysis beyond LLM context limits |
| Self-curated external memory | Dynamic Cheatsheet paper | Persistent skill storage with usage tracking |
| Embedding-based deduplication | Kayba implementation | Production optimization for Skillbook management |
| TOON encoding | Kayba implementation | Token-efficient Skillbook serialization |
This isn't a wrapper around an API. It's a framework built on specific research contributions, extended with production engineering (deduplication, encoding, provenance tracking).
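As a rough illustration of what "programmatic" trace analysis means, the snippet below is the kind of small program that could run inside a REPL-style sandbox to inspect a trace without loading all of it into an LLM context. The trace schema (`step`, `tool`, `status`, `error`) and the `summarize_failures` helper are assumptions made for this example; Kayba's actual trace formats and sandbox interface may differ.

```python
import json
from collections import Counter

# Hypothetical trace: a JSON list of steps with tool, status, and error fields.
raw_trace = json.dumps([
    {"step": 1, "tool": "search", "status": "ok", "error": None},
    {"step": 2, "tool": "click", "status": "error", "error": "element not found"},
    {"step": 3, "tool": "click", "status": "error", "error": "element not found"},
    {"step": 4, "tool": "fill_form", "status": "error", "error": "timeout"},
    {"step": 5, "tool": "click", "status": "ok", "error": None},
])


def summarize_failures(trace_json: str) -> dict:
    """Reduce a long trace to a compact failure summary."""
    steps = json.loads(trace_json)
    failed = [s for s in steps if s["status"] == "error"]
    error_counts = Counter(s["error"] for s in failed)
    return {
        "total_steps": len(steps),
        "failed_steps": [s["step"] for s in failed],
        "failures_by_tool": dict(Counter(s["tool"] for s in failed)),
        "most_common_error": error_counts.most_common(1)[0][0] if failed else None,
    }


summary = summarize_failures(raw_trace)
print(summary)
```

A compact summary like this can then be handed to a sub-LLM call for interpretation, so the model reasons over a few aggregated facts rather than thousands of raw log lines.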
Who Uses It
Kayba is used by teams building:
- Coding agents — Learning from code review failures, codebase conventions, test patterns
- Customer support agents — Learning from policy violations, escalation mistakes, resolution patterns
- Browser/computer-use agents — Learning from navigation failures, form-filling errors (30% → 100% success rate, 82% fewer steps, up to 2x consistency improvement on τ2-bench)
- Internal tooling agents — Learning from operational patterns and team-specific workflows
Kayba is framework-agnostic: it works with LangChain, CrewAI, OpenAI Agents SDK, browser-use, AutoGen, or custom implementations.
Getting Started
pip install ace-framework
The quickest path:
- Install the framework
- Point it at your agent's execution traces
- Run analysis — the Recursive Reflector extracts skills automatically
- Review the Skillbook — approve, edit, or reject skills
- Generate an improved system prompt
- Deploy and repeat
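The review and prompt-generation steps above can be pictured with a small sketch. It does not use the ace-framework API: `CandidateSkill`, its `status` values, and `build_system_prompt` are hypothetical names, and the section headings in the output are just one possible layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CandidateSkill:
    section: str
    text: str
    status: str = "pending"  # assumed states: "pending", "approved", "rejected"


def build_system_prompt(base_prompt: str, skills: list) -> str:
    """Append approved skills to the base prompt, grouped by section."""
    by_section = defaultdict(list)
    for skill in skills:
        if skill.status == "approved":  # pending and rejected skills are skipped
            by_section[skill.section].append(skill.text)

    parts = [base_prompt]
    for section in sorted(by_section):
        parts.append(f"## {section}")
        parts.extend(f"- {text}" for text in by_section[section])
    return "\n".join(parts)


skills = [
    CandidateSkill("Refunds", "Confirm the order ID before issuing a refund", "approved"),
    CandidateSkill("Navigation", "Wait for the page to finish loading before clicking", "approved"),
    CandidateSkill("Refunds", "Always offer store credit first", "rejected"),
]
prompt = build_system_prompt("You are a support agent.", skills)
print(prompt)
```

Only skills a human has approved reach the generated prompt, so the review step acts as the gate between automated extraction and what the deployed agent actually sees.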
Resources:
- Documentation: Setup guides, API reference, examples
- GitHub: Full source code, issues, discussions
- Dashboard: Optional hosted interface
- Discord: Community support