Session Data Export
QueryMT Agent records every session interaction in its database (~/.qmt/agent.db). The export system lets you extract this data in standardized formats for analysis, compliance, and — most notably — fine-tuning LLMs on your own agent trajectories.
Export Formats
ATIF (Agent Trajectory Interchange Format)
The ATIF exporter writes sessions as structured ATIF v1.5 agent trajectories with steps, tool calls, observations, and metrics. Useful for interoperability with other agent systems and for trajectory analysis.
See the export_atif example.
SFT Training Data (JSONL)
Exports session data as JSONL files suitable for supervised fine-tuning (SFT). Two output formats are supported:
| Format | Description | Use with |
|---|---|---|
| openai | OpenAI chat completions format | OpenAI fine-tuning API, Axolotl, torchtune |
| sharegpt | ShareGPT conversation format | unsloth, LLaMA-Factory |
Architecture
Both exporters share a common turn materialization layer that walks the event journal and produces structured Turn objects:
graph LR
DB[(agent.db<br/>event_journal)] --> MT[materialize_turns]
MT --> ATIF[ATIF Exporter]
MT --> SFT[SFT Exporter]
SFT --> OAI[OpenAI JSONL]
SFT --> SGT[ShareGPT JSONL]
A Turn captures one complete LLM request/response cycle:
- User message (if any)
- Assistant text content
- Thinking/reasoning content (optional)
- Tool calls with arguments
- Tool results
- Model, provider, usage metrics, and cost
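The Turn record above can be sketched as a plain data structure. This is an illustrative Python sketch only (the actual implementation is Rust, and these field names are assumptions, not QueryMT's real schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ToolCall:
    # One tool invocation made by the assistant during this turn.
    name: str
    arguments: dict
    result: Optional[str] = None  # tool output, correlated back to the call

@dataclass
class Turn:
    # One complete LLM request/response cycle.
    user_message: Optional[str]         # None for tool-continuation turns
    assistant_text: str = ""
    thinking: Optional[str] = None      # reasoning content, if recorded
    tool_calls: list = field(default_factory=list)
    model: str = ""
    provider: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
```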
SFT Export
Quick Start
The fastest way to export is with the export_sft example:
CLI Options
HTTP API
When the agent server is running, the export is also available as a streaming HTTP endpoint:
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | openai | Output format: openai or sharegpt |
| min_turns | int | 1 | Minimum LLM turns per session |
| models | string | all | Comma-separated model filter |
| exclude_errored | bool | false | Exclude sessions with errors |
| max_tool_error_rate | float | 1.0 | Max tool error rate (0.0-1.0) |
| scrub_paths | bool | false | Anonymize home directory paths |
| include_thinking | bool | false | Include reasoning content |
| include_tool_results | bool | true | Include tool result content |
| max_context | int | 40 | Max context messages per example |
| stats_only | bool | false | Return stats JSON instead of JSONL |
The endpoint streams JSONL with Content-Type: application/x-ndjson and sets Content-Disposition: attachment for browser downloads.
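Because each line of the stream is an independent JSON object, a client can process the download incrementally. A minimal Python sketch of consuming such an NDJSON stream (the payload lines here are made up for illustration):

```python
import json

def iter_ndjson(lines):
    """Yield one decoded record per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Illustrative payload lines in the OpenAI chat format.
stream = [
    '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}',
    '',
    '{"messages": [{"role": "user", "content": "bye"}, {"role": "assistant", "content": "later"}]}',
]
records = list(iter_ndjson(stream))
```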
Output Formats
OpenAI Chat Format
Each line is a JSON object with a messages array following the OpenAI fine-tuning format:
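An illustrative example line, shown wrapped for readability (the export emits one object per line). The overall shape is the public OpenAI chat fine-tuning format; the tool name and message contents here are made up:

```json
{"messages": [
  {"role": "system", "content": "You are a coding agent."},
  {"role": "user", "content": "List the files in src/"},
  {"role": "assistant", "content": null, "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "list_dir", "arguments": "{\"path\": \"src/\"}"}}]},
  {"role": "tool", "tool_call_id": "call_1", "content": "main.rs\nlib.rs"},
  {"role": "assistant", "content": "src/ contains main.rs and lib.rs."}
]}
```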
ShareGPT Format
Each line uses the ShareGPT conversation format, compatible with unsloth and LLaMA-Factory:
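An illustrative example line in the same style (wrapped for readability, contents made up; conventions for encoding tool calls in ShareGPT vary by framework, so they are omitted here):

```json
{"conversations": [
  {"from": "system", "value": "You are a coding agent."},
  {"from": "human", "value": "List the files in src/"},
  {"from": "gpt", "value": "src/ contains main.rs and lib.rs."}
]}
```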
Session Filtering
Filters control which sessions are included in the export. All filters are applied at the session level before any data is written.
Minimum turns (--min-turns): Skip sessions with fewer than N LLM turns. Useful for excluding trivial/test sessions.
Model filter (--models): Only include sessions that used specific models. Particularly useful for:
- Distillation: Export only Claude Opus sessions to fine-tune a smaller model on the strongest model's outputs.
- Model-specific training: Fine-tune a model on its own previous outputs (self-distillation/SSD).
Error exclusion (--exclude-errored): Skip sessions that contain error events.
Tool error rate (--max-tool-error-rate): Skip sessions where the proportion of failed tool calls exceeds the threshold. Value between 0.0 (no errors allowed) and 1.0 (all errors allowed).
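The filters above compose into a single session-level predicate. An illustrative Python sketch (the session fields are assumptions for demonstration, not QueryMT's actual schema):

```python
def keep_session(session, min_turns=1, models=None,
                 exclude_errored=False, max_tool_error_rate=1.0):
    """Return True if a session passes all export filters."""
    if session["llm_turns"] < min_turns:
        return False
    if models is not None and session["model"] not in models:
        return False
    if exclude_errored and session["has_errors"]:
        return False
    calls = session["tool_calls"]
    if calls:
        # Proportion of failed tool calls, compared against the threshold.
        if session["tool_errors"] / calls > max_tool_error_rate:
            return False
    return True
```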
Context Windowing
Long sessions (100+ messages) would produce training examples that exceed model context limits. The --max-context option controls how many messages are included as context for each training example.
The algorithm:
- The system message is always preserved (if present).
- For each assistant response, include the last N messages as context.
- Messages are kept on natural boundaries (complete tool call chains are not split).
Set --no-context-limit to include the full conversation history (use only if your training framework handles truncation).
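The windowing rules can be sketched as follows. This is an illustrative Python sketch, not the exporter's actual code; it keeps chains intact by one plausible rule, namely that a window may never begin on a tool-result message:

```python
def window_context(messages, max_context):
    """Build the context for the final assistant response: the system
    message (if any) plus at most ~max_context trailing messages,
    widened backwards so a tool-call chain is never split."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    body = messages[len(system):]
    start = max(0, len(body) - max_context)
    # Widen backwards: starting on a tool result would orphan it
    # from the assistant message that issued the tool call.
    while start > 0 and body[start]["role"] == "tool":
        start -= 1
    return system + body[start:]
```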
Path Scrubbing
The --scrub-paths option replaces home directory paths (e.g., /Users/alice/projects/myapp) with /workspace. This is useful when:
- Sharing training data across users or machines
- Preventing PII leakage in fine-tuned models
- Normalizing paths for consistent model behavior
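A minimal sketch of the substitution (illustrative Python only; the real exporter presumably knows the actual home prefix rather than pattern-matching, and this regex covers only common Linux/macOS layouts):

```python
import re

# Match Unix and macOS home-directory prefixes, e.g. /home/alice or /Users/alice.
HOME_RE = re.compile(r"/(?:home|Users)/[^/\s\"']+")

def scrub_paths(text):
    """Replace home-directory prefixes with /workspace."""
    return HOME_RE.sub("/workspace", text)
```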
Fine-Tuning with Exported Data
The primary use case for the SFT export is distillation — training a smaller or local model on recorded outputs from a stronger model. For example, training a local Qwen 35B model on your Claude Opus session trajectories.
The exported data contains real, verified agent trajectories: multi-step reasoning, tool calls with arguments, tool results, and final responses that produced correct outcomes. This teaches the student model your specific tool-use patterns, coding style, and the agent's operational conventions.
1. Export training data
2. Fine-tune with unsloth
3. Convert to GGUF (for llama.cpp)
4. Use with QueryMT
Point the agent at your fine-tuned model by configuring the llama.cpp provider with the new GGUF file.
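Before fine-tuning, it can help to sanity-check the exported file. A small stdlib-only Python sketch (an assumption of a useful check, not part of QueryMT) that verifies every line parses and every example ends with an assistant message, matching the OpenAI format described above:

```python
import json

def check_sft_file(lines):
    """Return (num_examples, problems) for an OpenAI-format JSONL export."""
    problems = []
    count = 0
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append(f"line {i}: invalid JSON ({e})")
            continue
        msgs = obj.get("messages", [])
        if not msgs or msgs[-1].get("role") != "assistant":
            problems.append(f"line {i}: does not end with an assistant message")
        count += 1
    return count, problems
```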
Programmatic API
The export functionality is available as a Rust library for integration into custom tools:
Working with individual sessions
ATIF Export
The ATIF exporter produces structured JSON trajectories:
ATIF trajectories contain:
- Agent metadata: name, version, model, tool definitions
- Steps: ordered sequence of user, system, and agent actions
- Tool calls and observations: with argument/result correlation
- Metrics: per-step and aggregate token usage and cost
See the ATIF specification for the full schema.