Structured Output: YAML vs JSON

"Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering" is often cited for the following diagram:

Image of token count differences between JSON and YAML for the same input data — From the appendix of "Code Generation with AlphaCodium"

which pushed "Flow Engineering" into the mainstream. However, the appendix of the paper contains a perhaps unexpected assertion that "YAML output is far better for code generation".

The intuition is reasonable: JSON has strict rules about where its special symbols (brackets, quotes, etc.) can and cannot go, whereas YAML is fairly permissive as long as the indentation is correct. Given this, it should take less inference time to produce syntactically correct YAML than syntactically correct JSON.
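To make that intuition concrete, here is a minimal sketch that serializes the same small payload both ways and counts the escape characters each format needs. The payload, the `to_yaml_block` helper, and the field names are all hypothetical and made up for illustration. The helper is a tiny hand-rolled emitter for a flat dict of simple strings, not a real YAML library, and it counts characters rather than tokens, so treat it as a rough proxy only.

```python
import json

# Hypothetical model output: a short explanation plus a code snippet.
code = 'def greet(name):\n    print("Hello, " + name)\n'
payload = {"explanation": "Prints a greeting.", "code": code}

# JSON: the code must be squeezed into one quoted string,
# so every newline and double quote has to be escaped.
as_json = json.dumps(payload, indent=2)


def to_yaml_block(data: dict[str, str]) -> str:
    """Emit a flat dict of strings as YAML (illustrative helper only).

    Multi-line values become block scalars (`|`), which hold the text
    verbatim. Assumes single-line values need no quoting, which is not
    true for arbitrary strings.
    """
    lines = []
    for key, value in data.items():
        if "\n" in value:
            lines.append(f"{key}: |")
            lines.extend("  " + line for line in value.splitlines())
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


as_yaml = to_yaml_block(payload)

print(as_json)
print(as_yaml)
print("JSON escape characters:", as_json.count("\\"))
print("YAML escape characters:", as_yaml.count("\\"))
```

In the JSON version the model has to escape every newline and double quote inside the code, while the YAML block scalar carries the code as written and only has to get the indentation right.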

"YAML output is far better for code generation"
AlphaCodium Flow Engineering Paper

The takeaway is that 'best practices' and 'standards' are evolving rapidly, which implies the assumptions we make about the way LLMs interface with our applications should be evolving as well.

Looking forward to where MCP takes us. The specification is JSON-heavy, following existing work around Structured Outputs. That makes sense given that we're dealing with REST APIs where `application/json` is king, but there does seem to be room in the future for a more efficient translation layer between JSON and 'some representation that is better for the LLM' (YAML? Something even better?).

Unalarming
