Why Open-Source Matters for Agent Learning

When you let a tool learn from your agent's execution traces — its failures, its successes, its interaction patterns — you're giving it access to some of the most sensitive data in your product. You need to know exactly what it's doing with that data.

Most agent improvement tools are closed-source. You send your traces to their API, something happens, and you get back suggestions. You can't audit the analysis process, you can't self-host, and you can't verify that your data isn't being used for anything else.

Kayba is fully open-source under the MIT license. You can read every line of code, self-host the entire pipeline, and verify exactly how your traces are analyzed.

What Kayba Is

Kayba is a framework that lets AI agents improve themselves from their own experience. It sits on top of any agent framework and adds a learning layer: analyze traces, extract skills, build a Skillbook, and generate better prompts.

The framework synthesizes three published research streams:

Agentic Context Engineering (ACE) — Three-agent architecture with delta updates for incremental Skillbook refinement. From Stanford/SambaNova research, published at ICLR 2026 (arXiv:2510.04618).
Recursive Language Models (RLM) — REPL-based trace introspection that goes deeper than single-pass LLM analysis. From MIT CSAIL (arXiv:2512.24601).
Dynamic Cheatsheet — Self-curated external memory with usage tracking and persistent learning. From Stanford/Together AI (arXiv:2504.07952).

Kayba is the only framework that combines these approaches into a unified, production-ready system.
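The RLM idea behind the Recursive Reflector is that the trace is loaded into a sandbox as data, and the analyzing model writes code against it instead of reading everything in one prompt. The sketch below imitates one such step: ordinary Python filters and groups the trace, and a stubbed `sub_llm` (hypothetical, standing in for a real model call) only ever sees the failing slice. The trace shape and all function names are assumptions for illustration, not Kayba's trace format or API.

```python
import json
from collections import defaultdict

# Hypothetical trace; in a real run this could be far too long for one prompt.
trace = [
    {"step": 1, "tool": "search", "status": "ok", "output": "3 results"},
    {"step": 2, "tool": "fill_form", "status": "error", "output": "field 'email' not found"},
    {"step": 3, "tool": "fill_form", "status": "error", "output": "field 'email' not found"},
    {"step": 4, "tool": "submit", "status": "ok", "output": "200"},
]

prompts: list[str] = []


def sub_llm(prompt: str) -> str:
    """Stub for a sub-LLM call: records the prompt and returns a placeholder lesson."""
    prompts.append(prompt)
    return "lesson: " + prompt.splitlines()[0]


# Code the root model might run in the sandbox: filter and group with plain
# Python, then spend sub-LLM calls only on the slices that matter.
failures = defaultdict(list)
for event in trace:
    if event["status"] == "error":
        failures[event["tool"]].append(event)

lessons = {
    tool: sub_llm(f"Why did {tool} fail?\n" + json.dumps(events, indent=2))
    for tool, events in failures.items()
}
print(lessons)
```

Only one sub-LLM call is made, and its prompt contains just the two failing `fill_form` steps; the successful steps never reach the model.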

The Open-Source Landscape

Agent Learning / Improvement

Tool	Open Source	Approach	GitHub Stars
Kayba (ACE)	Yes (MIT)	Trace analysis → Skillbook → prompt generation	2k+
Lemma	No	Drift detection → prompt optimization	N/A
ZeroEval	No	LLM judges → prompt rewriting	N/A
Theta	No	Simulation → agent training	N/A
Redapto	No	Audit interactions → update SOPs	N/A
Poetiq	No	Recursive self-improving reasoning	N/A
Modaic	No	DSPy-based optimization	N/A

None of the direct competitors listed above is open-source. If you need source code access, self-hosting, or the ability to extend the framework, Kayba is the only option among them.

Adjacent Open-Source Tools

Tool	What It Does	Relationship to Kayba
Langfuse	Open-source observability (traces, evals)	Complementary — observability, not learning
Laminar	Open-source tracing + debugging	Complementary — visibility, not improvement
DSPy	Prompt optimization via search	Different approach — optimization vs. learning from experience
OptiLLM	Inference-time proxy with optimization techniques	Different — runtime optimization, not persistent learning

What You Get

Core Framework (`pip install ace-framework`)

Recursive Reflector — REPL-based trace analysis engine. Uses a Python sandbox with sub-LLM calls to programmatically explore agent execution traces, catching patterns that surface-level analysis misses.
SkillManager — Manages the Skillbook via atomic operations (ADD, UPDATE, TAG, REMOVE) with embedding-based deduplication to prevent bloat.
Prompt Generator — Compiles approved skills into organized system prompts, grouped by section.
LiteLLM integration — Works with any LLM provider (OpenAI, Anthropic, Google, Mistral, local models).
Multi-format trace support — Markdown, JSON, plain text. If your agent produces logs, Kayba can process them.
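To make the SkillManager and Prompt Generator descriptions concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: the `Skill` and `Skillbook` classes, the 0.9 similarity threshold, and the hand-written embedding vectors (a real setup would get them from an embedding model) stand in for whatever Kayba actually uses. It illustrates the idea of atomic operations plus deduplication, not the ace-framework API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical skill record, not the ace-framework schema.
    skill_id: str
    section: str
    content: str
    embedding: list[float]   # in practice produced by an embedding model
    approved: bool = False   # set during human review
    helpful: int = 0
    harmful: int = 0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


@dataclass
class Skillbook:
    # Assumed cut-off: an ADD at least this similar to an existing skill is a duplicate.
    dedup_threshold: float = 0.9
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> bool:
        """ADD, rejecting semantic near-duplicates of existing skills."""
        for existing in self.skills.values():
            if cosine(existing.embedding, skill.embedding) >= self.dedup_threshold:
                return False
        self.skills[skill.skill_id] = skill
        return True

    def update(self, skill_id: str, content: str) -> None:
        """UPDATE: rewrite one skill in place; other skills are untouched."""
        self.skills[skill_id].content = content

    def tag(self, skill_id: str, helpful: bool) -> None:
        """TAG: bump the skill's helpful or harmful counter."""
        skill = self.skills[skill_id]
        if helpful:
            skill.helpful += 1
        else:
            skill.harmful += 1

    def remove(self, skill_id: str) -> None:
        """REMOVE: delete one skill if present."""
        self.skills.pop(skill_id, None)

    def compile_prompt(self) -> str:
        """Render approved skills as a system prompt, grouped by section."""
        sections: dict[str, list[str]] = {}
        for skill in self.skills.values():
            if skill.approved:
                sections.setdefault(skill.section, []).append(skill.content)
        parts = []
        for name in sorted(sections):
            bullets = "\n".join(f"- {line}" for line in sections[name])
            parts.append(f"## {name}\n{bullets}")
        return "\n\n".join(parts)


book = Skillbook()
book.add(Skill("s1", "Testing", "Run the test suite before committing.", [1.0, 0.0, 0.1]))
book.add(Skill("s2", "Testing", "Always run tests before a commit.", [0.98, 0.05, 0.12]))
book.add(Skill("s3", "Style", "Follow the repo's import ordering.", [0.0, 1.0, 0.0]))
book.tag("s1", helpful=True)
for skill in book.skills.values():
    skill.approved = True
print(book.compile_prompt())
```

Deduplication here is a plain threshold on cosine similarity, so the near-duplicate `s2` is rejected at ADD time; a production system might merge it into the existing skill instead.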

Key Technical Features

Delta updates — Incremental Skillbook modifications instead of full rewrites. Prevents information loss during adaptation.
Provenance tracking — Every skill records which trace produced it, enabling audit and debugging.
Helpful/harmful counters — Skills track their impact over time. Reinforced when helpful, flagged when harmful.
Embedding-based deduplication — Semantic similarity detection prevents duplicate skills from accumulating.
TOON encoding — Tab-delimited Skillbook serialization that uses 16-62% fewer tokens than markdown in production.
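As a rough illustration of why a tabular layout saves tokens, the sketch below renders the same two skills twice: once as markdown that labels every field on every skill, and once as a single header row followed by tab-separated rows. It is a plain tab-separated layout, not the TOON specification itself, and the field names (including the provenance and counter columns) are assumptions. Character counts are only a crude proxy for tokens; real savings depend on the tokenizer and the data.

```python
from dataclasses import astuple, dataclass, fields


@dataclass
class SkillRow:
    # Hypothetical fields: provenance and counters travel with each skill.
    skill_id: str
    section: str
    content: str
    source_trace: str
    helpful: int
    harmful: int


def to_markdown(rows: list[SkillRow]) -> str:
    """Verbose rendering: every field is labelled on every skill."""
    blocks = []
    for r in rows:
        blocks.append(
            f"### {r.skill_id}\n"
            f"- **Section:** {r.section}\n"
            f"- **Content:** {r.content}\n"
            f"- **Source trace:** {r.source_trace}\n"
            f"- **Helpful:** {r.helpful}\n"
            f"- **Harmful:** {r.harmful}"
        )
    return "\n\n".join(blocks)


def to_tabular(rows: list[SkillRow]) -> str:
    """Compact rendering: field names once, then one tab-delimited row per skill."""
    # Assumes values contain no tabs or newlines; a real encoder would escape them.
    header = "\t".join(f.name for f in fields(SkillRow))
    lines = ["\t".join(str(v) for v in astuple(r)) for r in rows]
    return "\n".join([header, *lines])


rows = [
    SkillRow("s1", "Testing", "Run the test suite before committing.", "trace-0042", 3, 0),
    SkillRow("s2", "Style", "Follow the repo's import ordering.", "trace-0107", 1, 1),
]
md, tab = to_markdown(rows), to_tabular(rows)
print(f"markdown: {len(md)} chars, tabular: {len(tab)} chars")
print(f"saving: {1 - len(tab) / len(md):.0%}")
```

The saving grows with the number of skills, because the tabular form pays for field names once while the markdown form repeats them per skill.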

Hosted Dashboard (Optional)

For teams that want a visual interface, the hosted dashboard at use.kayba.ai provides Skillbook management, analysis pipelines, and prompt generation through a web UI. It costs $29/month, and you bring your own API key.

The framework works entirely standalone — the dashboard is a convenience, not a dependency.

Built on Published Research

Kayba's core concepts trace back to published research, extended with a few production additions of its own:

Concept	Source	What It Contributes
Three-agent architecture (Generator, Reflector, Curator)	ACE paper (ICLR 2026)	Structured pipeline for agent improvement
Delta updates	ACE paper	Incremental learning without information loss
REPL-based trace analysis	RLM paper (MIT CSAIL)	Deep, programmatic analysis beyond LLM context limits
Self-curated external memory	Dynamic Cheatsheet paper	Persistent skill storage with usage tracking
Embedding-based deduplication	Kayba implementation	Production optimization for Skillbook management
TOON encoding	Kayba implementation	Token-efficient Skillbook serialization

This isn't a wrapper around an API. It's a framework built on specific research contributions, extended with production engineering (deduplication, encoding, provenance tracking).
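The Generator, Reflector, and Curator split from the first table row can be sketched as a loop in which only the Curator proposes changes to the Skillbook, and only as small deltas that ordinary code merges. The three roles below are plain Python stubs with hypothetical names and no LLM calls, so the data flow stays visible; in the ACE paper each role is played by a language model. The `Delta` type and the hard-coded skill id are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    # Hypothetical delta operation emitted by the Curator.
    op: str            # "ADD" or "REMOVE" in this sketch
    skill_id: str
    content: str = ""


def generator(task: str, skillbook: dict[str, str]) -> str:
    """Stub Generator: attempts the task with the current skills in context."""
    if "check-input" in skillbook:
        return f"validated input, then solved {task}"
    return f"solved {task} without validating input"


def reflector(trace: str) -> list[str]:
    """Stub Reflector: turns a trace into lessons (an LLM call in practice)."""
    if "without validating" in trace:
        return ["Validate user input before acting on it."]
    return []


def curator(lessons: list[str], skillbook: dict[str, str]) -> list[Delta]:
    """Stub Curator: proposes deltas instead of rewriting the whole Skillbook."""
    # The skill id is hard-coded because this stub only ever learns one lesson.
    return [Delta("ADD", "check-input", lesson) for lesson in lessons
            if lesson not in skillbook.values()]


def merge(skillbook: dict[str, str], deltas: list[Delta]) -> None:
    """Apply deltas with plain code; skills not named in a delta are untouched."""
    for d in deltas:
        if d.op == "ADD":
            skillbook[d.skill_id] = d.content
        elif d.op == "REMOVE":
            skillbook.pop(d.skill_id, None)


skillbook: dict[str, str] = {"be-brief": "Keep answers short."}
for episode in range(2):
    trace = generator("task-1", skillbook)
    merge(skillbook, curator(reflector(trace), skillbook))
    print(episode, trace, sorted(skillbook))
```

Because the merge step only touches the skills named in each delta, `be-brief` survives every round unchanged, which is the property the delta-update row describes; the second episode produces no lessons, so nothing is rewritten.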

Who Uses It

Kayba is used by teams building:

Coding agents — Learning from code review failures, codebase conventions, test patterns
Customer support agents — Learning from policy violations, escalation mistakes, resolution patterns (up to 2x consistency improvement on τ2-bench)
Browser/computer-use agents — Learning from navigation failures, form-filling errors (30% → 100% success rate, 82% fewer steps)
Internal tooling agents — Learning from operational patterns and team-specific workflows

Kayba is framework-agnostic: it works with LangChain, CrewAI, OpenAI Agents SDK, browser-use, AutoGen, or custom implementations.

Getting Started

pip install ace-framework

The quickest path:

Install the framework
Point it at your agent's execution traces
Run analysis — the Recursive Reflector extracts skills automatically
Review the Skillbook — approve, edit, or reject skills
Generate an improved system prompt
Deploy and repeat
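Step 2 depends on getting traces into a form the analysis can read. Since Kayba accepts Markdown, JSON, and plain text, a hypothetical helper that gathers a directory of trace files might look like the following. The directory layout, the suffix-to-format mapping, and `load_traces` itself are assumptions for illustration, not part of ace-framework.

```python
import json
import tempfile
from pathlib import Path


def load_traces(trace_dir: Path) -> list[dict]:
    """Collect trace files as {'source', 'format', 'text'} records (hypothetical helper)."""
    formats = {".md": "markdown", ".json": "json", ".txt": "text", ".log": "text"}
    records = []
    for path in sorted(trace_dir.iterdir()):
        fmt = formats.get(path.suffix.lower())
        if fmt is None:
            continue  # skip files that are not traces
        text = path.read_text(encoding="utf-8")
        if fmt == "json":
            # Re-serialize so every JSON trace reaches the analyzer in one canonical layout.
            text = json.dumps(json.loads(text), indent=2, sort_keys=True)
        # Keeping the file name lets each extracted skill point back to its trace.
        records.append({"source": path.name, "format": fmt, "text": text})
    return records


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run1.json").write_text('{"status": "failed", "step": 3}', encoding="utf-8")
    (root / "run2.md").write_text("## Run 2\nForm submit timed out.", encoding="utf-8")
    (root / "notes.png").write_bytes(b"\x89PNG")
    traces = load_traces(root)
    for t in traces:
        print(t["source"], t["format"])
```

The image file is skipped, and the two traces come back with their formats and file names, ready to hand to whatever analysis step runs next.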

Documentation — Setup guides, API reference, examples
GitHub — Full source code, issues, discussions
Dashboard — Optional hosted interface
Discord — Community support