# OpenAI's "Structured Outputs" Are Really Useful
Written Dec 26, 2024.
----
The traditional ChatGPT API worked in a very unstructured way. You provided some text, called a "prompt", and you got back some response generated by the LLM. That response could have any structure or any form – there were no explicit constraints on what the LLM would generate.
But this changed a few months back when OpenAI added "[Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs)" to their ChatGPT API. What Structured Outputs means in practical terms is that you can provide a JSON schema, which might be something like this (pseudocode):
```json
{
"given_name": String,
"family_name": String
}
```
And the response is guaranteed to conform to that schema.
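In the real API, that pseudocode becomes a standard JSON Schema. For the example above it would look roughly like this (note that OpenAI's strict mode wants every field listed in `required` and `additionalProperties` set to `false`):
```json
{
  "type": "object",
  "properties": {
    "given_name": { "type": "string" },
    "family_name": { "type": "string" }
  },
  "required": ["given_name", "family_name"],
  "additionalProperties": false
}
```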
If you want to parse ChatGPT's output, this is very useful. For example, let's say you want to make a website where someone can enter the cooking ingredients they have and get back a nicely formatted recipe using those ingredients. You could imagine providing a schema like this:
```json
{
"dish": String,
"ingredients": [String],
"steps": [{
"step_title": String,
"detailed_instructions": String,
"step_time_minutes": Int
}]
}
```
And then you could parse the response and render it as a nicely-formatted recipe page.
## This is amazing for prompt engineering
Structured Outputs solve many issues with respect to prompt engineering. (What I mean by "prompt engineering" is getting the LLM to do what you want by crafting the perfect input. This input is also known as a prompt.)
For example, suppose you want the recipe's step times to always be reported in minutes. Without Structured Outputs, this is the kind of thing that low-power models like `gpt-4o-mini` easily get wrong. Even if you tell the model in the system prompt "always give me the step time in minutes", by the time it actually starts writing the time, it may have forgotten the instruction and write "2 hours" instead.
But Structured Outputs open up a whole new world of prompt engineering. Normally we think of prompt engineering the same way we think of a writing prompt - a piece of text we put at the beginning to serve as a source of inspiration or direction for what the LLM does. But with Structured Outputs, we can guarantee that `"step_time_minutes":` appears directly before the model writes the step time, which makes it very unlikely for any model to forget that we want the answer in minutes. Essentially, at some point while responding, the model will see this:
```
User: generate a recipe for pasta.
```
```json
Bot: {
"dish": "Pasta",
"ingredients": ["Spaghetti", "Salt"],
"steps": [{
"step_title": "Boil water",
"detailed_instructions": "Put water in a pot and bring it to boil",
"step_time_minutes":
```
And then the model needs to predict what will come next. And OpenAI will constrain it to only write a number here.[^1] Any intelligent person could see that what comes next is going to be a number of minutes, and LLMs seem to be able to see this too.
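The footnote has more detail, but the core trick is just masking the model's next-token probabilities so that only schema-valid tokens can be picked. Here is a toy sketch of that idea (an illustration only, not OpenAI's actual implementation):
```rust
/// Toy illustration of schema-constrained decoding (not OpenAI's real code).
/// `logits` holds the model's raw score for every token in its vocabulary;
/// `allowed_by_schema` marks the tokens that would keep the output valid -
/// here, only tokens that continue a number.
fn pick_next_token(logits: &[f32], allowed_by_schema: &[bool]) -> usize {
    logits
        .iter()
        .enumerate()
        // Throw away every token the schema forbids...
        .filter(|&(i, _)| allowed_by_schema[i])
        // ...and greedily take the highest-scoring token that remains.
        .max_by(|&(_, a), &(_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
        .expect("a valid schema always leaves at least one allowed token")
}
```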
I'm going to use this all the time. As a recent example, I was lazy, so I used `gpt-4o-mini` to extract some information from a website (instead of doing it myself with BeautifulSoup). Among other things, the website had lots of relative links to other parts of the site (like `<a href="../contact"></a>`) that I wanted to pull out.
This is very easy to do without an LLM, but as I said, I was feeling lazy. So I just put the page URL in the prompt, and had a `"relevant_absolute_links": [String]` field in my schema. I didn't even need to adjust my prompt to tell it to give me the links - it was guaranteed to have `"relevant_absolute_links": [` in its output, and at that point the most reasonable completion would be to find any relevant-seeming links in the page, convert them to absolute links, and write them.
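The schema I sent looked roughly like this (same pseudocode style as before; the other field is made up for illustration):
```json
{
  "summary_of_page": String,
  "relevant_absolute_links": [String]
}
```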
![[scraping-cheat-code.png]]
[follow me on Twitter!](https://x.com/ChadNauseam/status/1872502541243801797)
This approach to converting relative links to absolute links is an example of how I'd like to use ChatGPT's API more. Hear me out! Often, when prototyping an app, I don't feel like implementing something "the right way", so I just put in a call to ChatGPT. Later, once I have some validation that my whole idea is workable, I go through those places where I called ChatGPT and change my code so it does things the right way. That's what ended up happening with my absolute links thing - my code no longer uses ChatGPT's API, but it was nice at the beginning when I was just focused on getting a prototype working in the least amount of time possible.
[^1]: The technical details of how this works are fairly interesting. LLMs are "autoregressive", meaning they work by repeatedly feeding their previous output back in as their new input. This is why they appear to generate responses a few characters at a time. At each step, the model's output is not actually one token - instead it is a probability for each of the many thousands of tokens it knows about. This probability is the LLM's estimate of the likelihood of that token appearing next (simplifying somewhat). Normally, you just take the most likely token, stick it on the end of the input, then repeat the process to generate yet another token. What OpenAI does with your schema is "mask" the output so that only tokens that could conform to your schema are considered.
## Why don't more people use this?
I think part of the problem might be that you have to write a JSON schema yourself. Writing JSON schemas is more technical and tedious than regular prompt engineering. Speaking from experience, it's actually a bit of a pain.
I write most of my OpenAI wrappers in Rust, so I created [tysm](https://github.com/not-pizza/tysm). It's a Rust crate that writes the schema for you from your Rust type. The developer experience is pretty slick if I may say so myself. Here's what it looks like:
```rust
use tysm::ChatClient;
/// We want names separated into `first` and `last`.
#[derive(serde::Deserialize, schemars::JsonSchema)]
struct Name {
    first: String,
    last: String,
}

async fn get_president_name() {
    // Create a client.
    let client = ChatClient::from_env("gpt-4o").unwrap();

    // Request a chat completion from OpenAI and
    // parse the response into our `Name` struct.
    let name: Name = client
        .chat("Who was the first US president?")
        .await
        .unwrap();

    assert_eq!(name.first, "George");
    assert_eq!(name.last, "Washington");
}
```
Come on, that's pretty good, right? So how does this work? Notice the `name: Name` type annotation here:
```rust
let name: Name = client
.chat("Who was the first US president?")
.await
.unwrap();
```
That tells Rust that the output of the `chat` function needs to involve the `Name` struct. Knowing this, the `chat` function creates a JSON schema for the `Name` struct and includes it in the API request. Then the API response is deserialized into a `Name`.
I just think it's so cool. Imagine you added an `age_of_death: Option<u64>` field to the `Name` struct. It would just magically get populated without you making any other code or prompt changes.
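Concretely, that change is just one extra field on the struct from above (a sketch; the doc comment is optional, but schemars turns doc comments into schema descriptions, so the model can see it):
```rust
/// We want names separated into `first` and `last`.
#[derive(serde::Deserialize, schemars::JsonSchema)]
struct Name {
    first: String,
    last: String,
    /// `None` if the person is still alive.
    age_of_death: Option<u64>,
}
```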
## A new programming paradigm?
Probably not, but I'm still excited. In addition to being useful in the places where you actually need to call ChatGPT, I'm also hopeful that this will let me use ChatGPT in more places in my prototypes, as I touched on earlier. With my library I can easily do a ChatGPT call any time I need a struct populated, and I don't have to fiddle around with parsing its output myself.
Here's an example: imagine you're making an app that needs to know the distance in miles between two cities. Not hard, but an annoying thing to implement when you just want to quickly prototype your app to see if it's even useful. So in the prototyping stage, I'm thinking you could just do this:
```rust
#[derive(serde::Deserialize, schemars::JsonSchema)]
struct Distance {
    distance_in_miles: u64,
}

// The `Distance` annotation tells `chat` which schema to request.
let distance: Distance = client
    .chat(format!("What is the distance between {city1} and {city2}?"))
    .await
    .unwrap();
let miles = distance.distance_in_miles;
```
You wouldn't want to do this in your finished product if you didn't have to, because it would be much more expensive, slower, and less reliable than writing code to do it directly. But I think for prototyping it could be super cool.[^2]
[^2]: I don't anticipate this would be useful for [my game](the-good-and-bad-of-cpp-as-a-rust-dev) or my other more hardcore projects, but I've found it handy for some things already.
## Why the name "tysm"?
The name stands for "thank you so much", which is what I say when I ask ChatGPT a question and get a great answer! If you prefer, it could also stand for "**Ty**ped **S**chema **M**agic".
## Pairs well with `victor-db`
If `tysm` satisfies your chat-completion needs, I hope that [`victor`](https://crates.io/crates/victor-db) will also satisfy your vector database needs. Victor was created to be a super-simple vector database that can work in-memory, on the native filesystem, or in a browser. It's great for simple projects when you need a vector store for a reasonable number of vectors and want something lightweight. (Both tysm and victor are 100% free and open source and not monetized by me in any way.)
## Disclaimer
Structured Outputs are still a little rough around the edges. For example, OpenAI doesn't support the JSON Schema equivalent of `HashMap<String, _>`. It just gives you a confusing error message instead of a response. There are a number of quirks in what OpenAI expects, but I've tried to monkey-patch around the ones I noticed in `tysm`.
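For example, a hypothetical struct like this one currently fails: schemars represents the map with `additionalProperties`, which OpenAI's strict mode rejects:
```rust
use std::collections::HashMap;

// Hypothetical example: arbitrary map keys mean the schema has to use
// `additionalProperties`, which OpenAI's strict mode doesn't allow.
#[derive(serde::Deserialize, schemars::JsonSchema)]
struct NutritionFacts {
    grams_per_nutrient: HashMap<String, f64>,
}
```
If you hit this, restructuring the type into a `Vec` of key-value structs is usually the easiest workaround.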