Chapter 10: Logging & Observability
Understanding what your agent did
When your agent runs in the sandbox, you need to understand what happened: which nodes executed, what the LLM thought, which tools were called, and where things went wrong (if they did). Agentish provides two channels for this: the sandbox execution log and Langfuse.
Sandbox Execution Log
When you submit a bundle to the sandbox, the execution page streams real-time logs from your agent. These include:
- Node transitions — which node is currently executing.
- Tool calls — when the LLM calls a tool, the arguments, and the response.
- Final state — the complete global state after execution finishes.
- Errors — any compilation or runtime errors.
The final state is your most valuable debugging artifact. Read it to see exactly what each agent produced (see Chapter 8: State).
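A quick way to put this into practice is to scan the final state for empty or missing values, since those usually point straight at the node that failed. This is a minimal sketch; the exact shape of the state dump is an assumption, so adjust the keys to match your agent's state schema.

```python
import json

# Hypothetical final-state dump from the sandbox log; the keys below are
# illustrative, not a fixed Agentish schema.
final_state = json.loads("""
{
  "analysis_result": "",
  "final_report": "Report based on empty analysis."
}
""")

# Empty or missing values are the usual first sign of a broken node.
suspect_keys = [k for k, v in final_state.items() if v in ("", None, [], {})]
print(suspect_keys)  # an empty analysis_result points at the node that writes it
```

Here an empty `analysis_result` tells you to inspect the node that produces it before worrying about anything downstream.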
Langfuse
Langfuse is an open-source LLM observability platform. It captures detailed traces of every LLM call your agent makes, giving you a structured view of the entire execution.
What Langfuse Shows You
| Feature | What It Shows | Why It’s Useful |
|---|---|---|
| Traces | A timeline of your agent’s execution from start to finish. | See the big picture: which nodes ran, in what order, and how long each took. |
| Spans | Individual LLM calls within each node, with full input/output. | See exactly what the LLM received (system prompt + messages) and what it produced. |
| Tool calls | Which tools were called, with what arguments, and what was returned. | Verify that tools returned correct data. Spot tool failures. |
| Token usage | Input and output tokens per LLM call. | Identify context window issues (input too long) or unexpected verbosity. |
| Latency | Time spent on each operation. | Find slow nodes or slow tool calls that could be optimized. |
How to Read a Langfuse Trace
- Start at the top-level trace to see the overall execution timeline.
- Click into a span to see the full LLM conversation (messages sent and received).
- Look at tool call spans to verify tool arguments and responses.
- Check token counts — if input tokens are near the model’s context limit, the LLM may be losing earlier information.
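The token-count check in the last step can be automated once you have the per-span numbers in front of you. The sketch below uses hypothetical span records that mirror what a Langfuse trace displays per LLM call; the field names and the context limit are assumptions, not the Langfuse API.

```python
# Hypothetical per-call records copied out of a Langfuse trace.
spans = [
    {"node": "Analyzer", "input_tokens": 3_200, "output_tokens": 450},
    {"node": "Finalizer", "input_tokens": 126_500, "output_tokens": 900},
]

CONTEXT_LIMIT = 128_000  # assumed model limit; check your model's actual value

# Flag calls whose input is within 10% of the context window.
near_limit = [s["node"] for s in spans
              if s["input_tokens"] > 0.9 * CONTEXT_LIMIT]
print(near_limit)  # a node this close to the limit may be losing early messages
```

A node that shows up in `near_limit` is a candidate for trimming its input state or summarizing upstream output.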
Debugging Workflow
When your agent doesn’t produce the right result, follow this process:
- Read the final state in the sandbox log. Which variables have unexpected values?
- Open Langfuse and find the trace for your run.
- Trace the problem backwards. If `final_report` is wrong, look at the Finalizer's LLM call. Was its input (`analysis_result`) correct?
- Go upstream. If the input was wrong, look at the node that produced it. Check its LLM conversation and tool calls.
- Fix and re-submit. Adjust prompts, state keys, or topology based on what you found.
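The "trace backwards, go upstream" steps amount to walking a producer graph from the bad state key to the nodes that fed it. A minimal sketch, assuming a hypothetical map from each state key to the node that writes it and that node's inputs (derive yours from your agent's topology; see Chapter 8: State):

```python
# Hypothetical producer map: state key -> (node that writes it, its input keys).
PRODUCERS = {
    "final_report": ("Finalizer", ["analysis_result"]),
    "analysis_result": ("Analyzer", ["raw_data"]),
    "raw_data": ("Fetcher", []),
}

def upstream_chain(key):
    """Walk backwards from a bad state key to every node that fed it."""
    chain = []
    frontier = [key]
    while frontier:
        k = frontier.pop()
        if k in PRODUCERS:
            node, inputs = PRODUCERS[k]
            chain.append(node)
            frontier.extend(inputs)
    return chain

print(upstream_chain("final_report"))  # ['Finalizer', 'Analyzer', 'Fetcher']
```

The resulting chain is the order in which to open Langfuse spans: check the Finalizer's conversation first, then the Analyzer's, and so on until you find the node whose output first went wrong.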
Chapter Summary
- The sandbox streams execution logs with node transitions, tool calls, and final state.
- Langfuse provides detailed LLM traces — full conversations, tool calls, token usage, and timing.
- Langfuse credentials will be provided to you during the competition.
- Debug by reading the final state first, then tracing problems backwards through Langfuse spans.