Traces & Evaluations
View agent runs, trace spans, and evaluation results in the dashboard.
The traces and evaluations views let you inspect your agent's execution history and quality assessments.
Viewing Traces
Run List
Each agent shows its recent runs with:
- Status - running, completed, or failed
- Trigger - manual, automatic, or test
- Overall score - the evaluation result (once the evaluation has completed)
- Timestamp - when the run started and completed
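The run-list columns map onto a simple record, which is handy if you export runs and want to filter them yourself. The sketch below assumes a hypothetical `Run` type with these fields. It is not the SDK's actual run object. It shows two common queries: listing failed runs newest first, and finding the most recent run that already has a score.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# Hypothetical record mirroring the run-list columns; not the SDK's actual type.
@dataclass
class Run:
    run_id: str
    status: str                  # "running", "completed", or "failed"
    trigger: str                 # "manual", "automatic", or "test"
    started_at: datetime
    completed_at: Optional[datetime] = None
    overall_score: Optional[float] = None  # None until the evaluation completes


def failed_runs(runs: list[Run]) -> list[Run]:
    """Return failed runs, most recently started first."""
    return sorted(
        (r for r in runs if r.status == "failed"),
        key=lambda r: r.started_at,
        reverse=True,
    )


def latest_scored_run(runs: list[Run]) -> Optional[Run]:
    """Return the most recently started run that has an overall score, if any."""
    scored = [r for r in runs if r.overall_score is not None]
    return max(scored, key=lambda r: r.started_at, default=None)
```

A run that is still `running`, or whose evaluation has not finished, has `overall_score=None`. That run is skipped by `latest_scored_run` rather than treated as a zero score.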
Span Detail
Clicking a run shows its trace spans, the individual function calls recorded by @projectkate.trace(). Each span includes:
- Node name - the function or span name
- Span kind - LLM call, tool use, or custom
- Input - what was passed to the function
- Output - what was returned
- Duration - execution time in milliseconds
- Token count - LLM tokens used (if applicable)
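To make the span fields concrete, here is a toy tracing decorator that records one span per call. This is a minimal sketch and not the implementation of @projectkate.trace(). The `Span` type, the `kind` values, and the in-memory `RECORDED` list are hypothetical stand-ins for whatever the SDK actually sends to the dashboard. The sketch also handles only synchronous functions that return normally.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical span record holding the fields shown in the span detail view.
@dataclass
class Span:
    name: str
    kind: str                        # e.g. "llm", "tool", or "custom"
    input: Any
    output: Any
    duration_ms: float
    token_count: Optional[int] = None  # only meaningful for LLM calls


RECORDED: list[Span] = []  # in-memory stand-in for the trace backend


def trace(kind: str = "custom"):
    """Toy decorator: time each call and append a Span to RECORDED."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            RECORDED.append(Span(
                name=fn.__name__,
                kind=kind,
                input={"args": args, "kwargs": kwargs},
                output=result,
                duration_ms=elapsed_ms,
            ))
            return result
        return wrapper
    return decorator


@trace(kind="tool")
def lookup_order(order_id: str) -> dict:
    # Example "tool" function; a real agent would call an external service here.
    return {"order_id": order_id, "status": "shipped"}
```

Calling `lookup_order("A-17")` returns the order dict as usual and leaves one `Span` in `RECORDED`. That span has `name="lookup_order"`, `kind="tool"`, and the captured arguments and return value.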
Spans are displayed in a waterfall view showing the execution timeline.
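A waterfall places every span on one shared timeline. Each bar is offset by the span's start time relative to the run and sized by its duration. The sketch below renders that layout as text. It assumes each span carries a start offset from the run start (`start_ms`), a field name that is hypothetical and not taken from the SDK.

```python
from dataclasses import dataclass


# Hypothetical minimal span: offset from the run start and duration, both in ms.
@dataclass
class TimedSpan:
    name: str
    start_ms: float
    duration_ms: float


def waterfall(spans: list[TimedSpan], width: int = 40) -> list[str]:
    """Render spans as text bars positioned on a shared timeline of `width` chars."""
    if not spans:
        return []
    # The timeline ends where the last span finishes.
    total = max(s.start_ms + s.duration_ms for s in spans)
    scale = width / total if total > 0 else 0
    lines = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        offset = round(s.start_ms * scale)
        length = max(1, round(s.duration_ms * scale))  # keep very short spans visible
        lines.append(f"{s.name:<12}|{' ' * offset}{'#' * length}")
    return lines
```

For example, `waterfall([TimedSpan("plan", 0, 100), TimedSpan("search", 100, 300), TimedSpan("answer", 400, 400)])` draws three consecutive bars of 5, 15, and 20 characters. Any gap or overlap between spans shows up as horizontal position.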
Evaluations
Intelligence Summary
Each evaluated run produces an intelligence summary:
- Overall score - composite quality score (0.0 to 1.0)
- Natural language summary - what went well and what didn't
- Recommendations - specific suggestions for improvement
- Regression detection - flags runs whose scores dropped relative to previous runs
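This page does not describe how the platform decides that a score has regressed. One simple way to reason about it is to compare the latest overall score with a baseline built from recent runs. The sketch below uses a trailing mean. The `window` and `min_drop` parameters and their defaults are hypothetical, chosen for illustration rather than taken from the product.

```python
from statistics import mean


def detect_regression(scores: list[float], window: int = 5, min_drop: float = 0.1) -> bool:
    """Flag the latest score if it is at least `min_drop` below the mean of the
    preceding `window` scores. `scores` are overall scores in [0.0, 1.0],
    ordered oldest to newest.
    """
    if len(scores) < 2:
        return False  # no earlier runs to compare against
    latest = scores[-1]
    baseline = mean(scores[-1 - window:-1])  # up to `window` runs before the latest
    return baseline - latest >= min_drop
```

A mean-based baseline tolerates one unusually good or bad earlier run better than comparing only against the immediately preceding run would.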
Score Trends
The trends chart plots scores over time, so you can see:
- Whether your agent is improving after knowledge acquisition
- Whether recent changes caused regressions
- Score stability across different types of requests
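Individual run scores can be noisy, so a trend is easier to read once it is smoothed. The sketch below computes a trailing moving average over a list of overall scores. Whether the dashboard chart applies any smoothing is not stated on this page, and the `window` parameter is a hypothetical choice.

```python
def rolling_mean(scores: list[float], window: int = 3) -> list[float]:
    """Trailing moving average of scores ordered oldest to newest.

    The first window-1 points average whatever earlier scores are available.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed
```

For scores `[0.6, 0.7, 0.8, 0.5]` with `window=3`, the smoothed series is roughly `0.6, 0.65, 0.7, 0.667`. The single drop to 0.5 is damped, so one noisy run is easier to tell apart from a sustained regression.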
Per-Node Breakdown
See which parts of your agent perform well and which are weak:
- Each traced function gets individual metrics
- Identify bottlenecks (slow spans) and quality issues (low-scoring spans)
- Compare node performance across runs
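A per-node breakdown groups spans by the function that produced them and aggregates each group. The sketch below computes call count, mean duration, and mean score per node, then picks the slowest node and the lowest-scoring node. It assumes each span carries an optional per-span score. The `NodeSpan` type and its field names are hypothetical, since this page does not name the individual metrics.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


# Hypothetical span view for aggregation; `score` is an assumed per-span metric.
@dataclass
class NodeSpan:
    node: str
    duration_ms: float
    score: Optional[float] = None  # in [0.0, 1.0], or None if the span was not scored


def node_breakdown(spans: list[NodeSpan]) -> dict[str, dict]:
    """Aggregate call count, mean duration, and mean score per node."""
    by_node: dict[str, list[NodeSpan]] = defaultdict(list)
    for s in spans:
        by_node[s.node].append(s)
    report = {}
    for node, group in by_node.items():
        scores = [s.score for s in group if s.score is not None]
        report[node] = {
            "calls": len(group),
            "mean_ms": mean(s.duration_ms for s in group),
            "mean_score": mean(scores) if scores else None,
        }
    return report


def slowest_and_weakest(report: dict[str, dict]) -> tuple[str, Optional[str]]:
    """Return the node with the highest mean duration and the lowest mean score."""
    slowest = max(report, key=lambda n: report[n]["mean_ms"])
    scored = [n for n in report if report[n]["mean_score"] is not None]
    weakest = min(scored, key=lambda n: report[n]["mean_score"], default=None)
    return slowest, weakest
```

The slowest node and the weakest node are tracked separately on purpose. A bottleneck and a quality problem often sit in different nodes.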
Triggering Evaluations
Evaluations run automatically when a run completes. You can also trigger one manually:
- Navigate to your agent's detail page
- Click "Trigger Evaluation"
- Wait for the evaluation to complete (typically 30-60 seconds)
Via the SDK:
# Call from inside an async function; assumes `client` is an initialized SDK client
result = await client.evals.trigger(agent_id="your-agent-id")
Next Steps
- Tracing (SDK) - instrument your agent
- Runs (SDK) - manage runs programmatically
- Evals Client - evaluation API