AI & Machine Learning • iOS, macOS, visionOS • 21:43
Learn how to take your intelligence features further with Foundation Models framework primitives for dynamic context and agentic workflows. We’ll walk through engineering shared context, setting up privacy boundaries, and managing key-value caching. Discover how to orchestrate smooth handoffs between local and server models.
Speakers: Erik Hornberger, Oliver O'Neill
Transcript
Hello everyone! And thank you for joining us! My name is Erik. And I’m Oliver. Today, we’re going to dig into a new set of APIs that open up whole new possibilities for your apps: dynamic profiles! But before we dive into the code, we want to lay the groundwork by identifying the problems these APIs solve, and our philosophy behind their design.
The first challenge these APIs solve is context management. In long running sessions, dynamic profiles let you trim or summarize the transcript to keep it within the model’s context window. The second problem these APIs solve is establishing boundaries. When using multiple models, you should design around capability and cost considerations. Dynamic profiles give you that option.
This field is changing week-to-week. The primitives that we’re introducing are designed to be flexible, ensuring it’s possible to build today’s abstractions, and tomorrow’s. Exactly! Dynamic profiles enable context engineering, defining model boundaries, and can be scaffolded into just about any architecture. And it’s in that spirit today that we’re announcing a new package: Foundation Models framework utilities.
Utilities is an open source Swift package that houses components helpful for building agentic experiences. It will be updated in between OS releases and give you access to emerging or experimental patterns, all backed by dynamic profiles. So now that we’ve set the stage, let’s jump into our agenda. In the first half of this video, Oliver is going to teach you about the mechanics of dynamic profiles. In the second half, I’ll rejoin to cover some advanced topics related to orchestration patterns.
Finally, we’ll wrap up with a foray into performance and accuracy considerations. So with that, it’s over to you Oliver! Thanks Erik. With the introduction of the LanguageModel protocol and PrivateCloudComputeLanguageModel, you now have more models than ever to choose from. DynamicProfile is a new API that gives you the ability to switch models within your LanguageModelSession, providing you with the flexibility to select the best configuration for the task at hand. DynamicProfile is the foundation on which you can build many useful abstractions, such as agents or skills. Today, I’ll give you a tour of the API starting with leveraging multiple models, before diving into transcript considerations and finishing with session lifecycle events. Let’s start by looking at an example.
I’m working on a craft app called Origami which can produce both origami and crochet tutorials. Here, the user will upload images and our app will help them brainstorm ideas using the image as inspiration. The user can provide feedback on the shortlist of ideas before a tutorial is generated for the selected concept. While the user works through the tutorial, they can upload in-progress photos and get advice on their technique.
Each stage in the app requires shared context, but each one has its own set of priorities. They may benefit from a diverse set of models, with different instructions and generation options. These configurations are agents: they act on your app’s behalf, and are configured with a particular goal and set of capabilities in mind.
DynamicProfile allows you to declare individual Profiles, which represent a configuration state or agent in your LanguageModelSession. A Profile is made up of instructions, tools, and modifiers for configuring things like the model, temperature, samplingMode and more. So let’s start by declaring a DynamicProfile for our craft experience. Here, we have an Observable class called CraftOrchestrator that will track the different phases of the app. We’ll focus on the brainstorming phase first, which is used for presenting different craft project ideas to the user.
Here, our new profile has some instructions explaining its goal and a tool for generating titles. Because origami is a complicated craft, let’s also include some additional instructions and tools but only when the user is working on an origami project. OrigamiExpert makes use of another new type called DynamicInstructions.
DynamicInstructions enables grouping of relevant tools and instructions together into a single component that can be reused throughout your codebase. OrigamiExpert contains knowledge and tools that can be reused every time we’re prompting a model about origami. DynamicInstructions are also composable so nesting OrigamiExpert inside another DynamicInstructions body will concatenate the instructions and tools together.
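The declarations for this step appear on screen in the session and are not captured in the transcript. As a rough, framework-independent sketch of the composition behavior just described, the following uses a hypothetical `InstructionBundle` type and made-up instruction strings and tool names; none of it is Foundation Models API. It only illustrates that nesting one component inside another concatenates instructions and tools, and that a component can be included conditionally.

```swift
// Hypothetical stand-in for a reusable group of instructions and tools.
// This is NOT the framework's DynamicInstructions type.
struct InstructionBundle {
    var instructions: [String]
    var toolNames: [String]

    init(instructions: [String] = [], toolNames: [String] = []) {
        self.instructions = instructions
        self.toolNames = toolNames
    }

    /// Nesting a bundle concatenates its instructions and tools after our own.
    func including(_ nested: InstructionBundle) -> InstructionBundle {
        InstructionBundle(
            instructions: instructions + nested.instructions,
            toolNames: toolNames + nested.toolNames
        )
    }
}

/// Builds the brainstorming configuration; origami guidance is added only when relevant.
func makeBrainstormBundle(isOrigamiProject: Bool) -> InstructionBundle {
    let origamiExpert = InstructionBundle(
        instructions: ["Prefer folds that a beginner can follow."],  // illustrative text
        toolNames: ["lookUpFold"]                                    // illustrative tool name
    )
    let facilitator = InstructionBundle(
        instructions: ["Help the user brainstorm craft project ideas."],
        toolNames: ["generateTitle"]
    )
    return isOrigamiProject ? facilitator.including(origamiExpert) : facilitator
}
```

Keeping the origami knowledge in its own value means the same component can be reused wherever a model is prompted about origami, which is the reuse benefit the speakers describe.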
Here, we’ve created BrainstormFacilitator to hold our profile’s instructions. Now we can clean up our brainstorming profile using the new declaration. Since brainstorming requires both a broad knowledge of crafts and creative thinking, this profile will use PrivateCloudComputeLanguageModel, which is a new model available in Foundation Models. Be sure to check out the talk from Louis on PCC in Foundation Models to learn more about what this model has to offer.
We’ll also set the temperature to 1, to allow the model to produce more creative responses. We’ve just defined our first agent using DynamicProfiles. Let’s move on to the next mode in our profile: planning. The “planning” profile is responsible for creating directions for an agreed-upon craft project. Again, we’ll use PCCLanguageModel since this requires in-depth knowledge of crafts.
We’ll also configure reasoningLevel, which is a capability available to most server models. This controls the model’s capacity to think through the problem before responding. Since generating a tutorial is complex, we’ll set it to deep. Lastly, the “reviewing” phase provides advice and guidance as the user works through the tutorial. To save on unnecessary server calls, this makes use of SystemLanguageModel. And just like that, we’ve finished defining our crafting DynamicProfile.
To make use of DynamicProfile in your session, it’s as simple as using the new LanguageModelSession initializer. Note that the body of a DynamicProfile is re-evaluated each time the model is prompted, so as the app moves between each mode, the persona of the LanguageModelSession changes. You can think of this as swapping hats, or switching agents. You can move from brainstorming to planning, to reviewing. All by changing the mode. You’ve now seen how you can route between different models using DynamicProfile. But it’s important to consider that each model may have different context size limits.
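To picture the "swapping hats" behavior, here is a minimal sketch that models a profile as a plain value recomputed from the current mode before every prompt. `CraftMode`, `ProfileConfiguration`, and `CraftOrchestratorSketch` are hypothetical stand-ins, the model names are placeholder strings, and the instruction text is invented; the actual DynamicProfile declaration is shown on screen and may look quite different.

```swift
// Hypothetical stand-ins; none of these types come from the Foundation Models framework.
enum CraftMode { case brainstorming, planning, reviewing }

struct ProfileConfiguration: Equatable {
    var modelName: String        // placeholder string instead of a real model object
    var instructions: String
    var temperature: Double?
    var reasoningLevel: String?
}

final class CraftOrchestratorSketch {
    var mode: CraftMode = .brainstorming

    /// Re-evaluated before every prompt, so a mode change applies to the next request.
    func currentProfile() -> ProfileConfiguration {
        switch mode {
        case .brainstorming:
            // Creative phase: server model with a higher temperature.
            return ProfileConfiguration(modelName: "server",
                                        instructions: "Suggest craft project ideas.",
                                        temperature: 1.0,
                                        reasoningLevel: nil)
        case .planning:
            // Tutorial generation: server model with deeper reasoning.
            return ProfileConfiguration(modelName: "server",
                                        instructions: "Write step-by-step directions.",
                                        temperature: nil,
                                        reasoningLevel: "deep")
        case .reviewing:
            // Feedback phase: on-device model to avoid unnecessary server calls.
            return ProfileConfiguration(modelName: "on-device",
                                        instructions: "Give advice on the user's technique.",
                                        temperature: nil,
                                        reasoningLevel: nil)
        }
    }
}
```

Because the configuration is derived from `mode` each time rather than stored, changing the mode is the only step needed to move from one agent to the next.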
Our craft example switches between PCCLanguageModel and SystemLanguageModel. When moving between models, you may need to trim unnecessary entries to stay within the context size. But that’s not the only reason for adjusting the model’s context. You can also improve the model’s focus by removing irrelevant entries, or redact private information from existing entries when moving to a less private model.
The transcript is LanguageModelSession’s representation of the model’s context. DynamicInstructions offers one way to modify the transcript. More specifically, it allows modifying the instructions entry. For updating the remaining entries, we’ll use a window into the transcript called “history”. Dropping tool calls is one easy way to trim history. Let’s take a look at how you’d implement this.
historyTransform can be applied to a profile to transform the history prior to prompting the model. This is the opportune time to filter out entries that may not be necessary for the request. Applying a transformation on our “reviewing” profile helps keep the transcript within the on-device model’s context size. Transforms don’t permanently mutate the session’s transcript. Instead, they’re local transformations applied prior to prompting the model. This means you don’t need to worry about losing context that may become relevant at a later point.
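The key property here is that the transform produces a trimmed view and leaves the stored history alone. A minimal sketch of that idea, assuming a simplified, hypothetical `HistoryEntry` enum in place of the framework's transcript entries:

```swift
// Hypothetical, simplified history entries; the framework's transcript types differ.
enum HistoryEntry: Equatable {
    case prompt(String)
    case response(String)
    case toolCall(name: String)
    case toolOutput(name: String, text: String)
}

/// Returns a copy of the history without tool calls and tool outputs.
/// The caller's stored history is not modified.
func droppingToolTraffic(from history: [HistoryEntry]) -> [HistoryEntry] {
    history.filter { entry in
        switch entry {
        case .toolCall, .toolOutput:
            return false
        case .prompt, .response:
            return true
        }
    }
}
```

Since the function returns a new array, the full history stays available for a later profile that does want the tool traffic.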
Our historyTransform has a lot going on. Let me show you how we can use custom modifiers to hide the complexity of our transform. First, we’ll declare a new type that conforms to DynamicProfileModifier and apply our historyTransform. We can then make it available for reuse by implementing an extension on DynamicProfile. Any new Profiles that would benefit from reducing context can now utilize the new modifier.
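The pattern of wrapping a transform in a named modifier and exposing it through an extension can be sketched without the framework. Everything below is hypothetical: `SketchProfile`, `SketchProfileModifier`, and `KeepRecentEntries` are illustrative names, history is reduced to an array of strings, and the real DynamicProfileModifier requirements are not shown in the transcript.

```swift
// Hypothetical stand-in for a profile that carries history transforms.
struct SketchProfile {
    var instructions: String
    var historyTransforms: [([String]) -> [String]] = []

    /// Applies every registered transform, in order, to a copy of the history.
    func prepareHistory(_ history: [String]) -> [String] {
        historyTransforms.reduce(history) { current, transform in transform(current) }
    }
}

// Hypothetical modifier protocol: takes a profile and returns a configured copy.
protocol SketchProfileModifier {
    func apply(to profile: SketchProfile) -> SketchProfile
}

/// Hides the details of one transform behind a reusable, named type.
struct KeepRecentEntries: SketchProfileModifier {
    let limit: Int

    func apply(to profile: SketchProfile) -> SketchProfile {
        var copy = profile
        let limit = self.limit
        copy.historyTransforms.append { history in Array(history.suffix(limit)) }
        return copy
    }
}

extension SketchProfile {
    /// Convenience so call sites read like any other modifier.
    func keepingRecentEntries(_ limit: Int) -> SketchProfile {
        KeepRecentEntries(limit: limit).apply(to: self)
    }
}
```

A call site then reads `SketchProfile(instructions: "Review the photo.").keepingRecentEntries(2)`, with the filtering logic kept in one place.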
We’ve made a number of useful modifiers available in the new Foundation Models framework utilities package. We encourage you to take a look. Custom modifiers are a great way to build reusable configuration for your declarations. But transforms aren’t the only way that you can influence the transcript. Let’s take a look at another more stateful approach.
At certain points in the session, you may need to summarize earlier entries from the existing transcript to reclaim context. Doing this after each model’s response provides a clear boundary in the session’s lifecycle. Let’s take a look at how we can perform our summarize operation after each response using a new set of modifiers.
Lifecycle modifiers provide access to your profile’s progress by giving you the opportunity to run imperative code directly in your profile declaration. This can be useful for updating state external to your session, like reflecting progress in UI. But it’s also useful for internal state updates, like changing the mode in our craft profile or modifying the session’s history.
Let’s use the onResponse modifier to mutate the history at the response boundary that I mentioned earlier. You’ll notice this is also making use of another new concept: session properties. Session properties allow you to define state that’s accessible from any Tool or Profile. The history property that we just used is a built-in property provided by the framework. It captures the session’s history and can be used as an alternative to historyTransform for updating the transcript.
Keep in mind that the history property is lossy and its changes will be reflected across all profiles in the session. For lossless transformations targeted to specific profiles, you should prefer historyTransform. In addition to history, you can also create your own session properties. Let’s create a new property to store our conversation summary when onResponse is called.
You can declare properties using the @SessionPropertyEntry macro within an extension on SessionPropertyValues. All session properties are mutable and must have an initial value. Here, we’ve declared our summary as an optional string. Each Profile can now read the value of the summary by accessing the session property that we just declared. We’ll include the summary in our profile’s instructions to ensure they have the context on the transcript entries that were dropped.
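The summarize-after-response flow can be sketched as shared state plus a hook that runs once a response completes. This is only an illustration under stated assumptions: `SketchSessionState` is a hypothetical stand-in for session properties, history is an array of strings, and the "summary" is a simple join where a real app would ask a model to summarize.

```swift
// Hypothetical shared state, standing in for session properties.
final class SketchSessionState {
    var history: [String] = []
    var summary: String? = nil   // custom property with an initial value
}

/// Stand-in for a response lifecycle hook. When history grows past `limit`,
/// the oldest entries are folded into the summary and dropped for every profile.
func summarizeAfterResponse(_ state: SketchSessionState, keepingLast limit: Int) {
    guard state.history.count > limit else { return }
    let dropped = state.history.dropLast(limit)
    // Placeholder: a real implementation would generate this text with a model.
    let folded = dropped.joined(separator: " / ")
    let parts: [String?] = [state.summary, folded]
    state.summary = parts.compactMap { $0 }.joined(separator: " / ")
    state.history = Array(state.history.suffix(limit))
}

/// Any profile can read the summary and include it in its instructions.
func reviewingInstructions(for state: SketchSessionState) -> String {
    let base = "Give advice on the user's technique."
    guard let summary = state.summary else { return base }
    return base + " Earlier conversation summary: " + summary
}
```

Unlike the non-mutating transform, this change is lossy: the dropped entries are gone for every profile, which is why the summary is carried forward in the instructions.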
Any profile can write to the property and changes will be visible across the session. Now let me produce a conversation summary for you. Use lifecycle modifiers to run code at specific points in the session. Use the history property to update the session’s history for all profiles. And use custom session properties for storing state that’s shared by all session components. And with that, I’ll hand it back to Erik to teach you about agent orchestration. Thanks Oliver!
Hopefully, you’re starting to develop an intuition for how profiles can be used to build things like agents. Let’s take a look at two common patterns for orchestrating agentic experiences. We like to refer to these patterns as baton-pass and phone-a-friend. Baton-pass is a collaboration and phone-a-friend is a consultation. Let’s look at baton-pass first.
In this pattern, there are two or more profiles, typically each leveraging different models. There also needs to be a variable that controls which profile is active. Finally, we give each profile a tool that allows the model to set that variable. Together, these pieces make up the baton-pass pattern.
If we’re currently brainstorming and ask how to fold a crane, the brainstorm profile will call a tool to pass the baton to the tutorial profile. A tool output signals a successful handoff, and the tutorial profile produces the final answer. The most important attributes of the baton-pass pattern are that the full transcript history is visible to both profiles, and that the profile that receives the baton can carry it across the finish line and provide the final response. Both of those attributes will be in contrast to the next pattern we look at: phone-a-friend.
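The three pieces of baton-pass (several profiles, a variable that selects the active one, and a tool that sets that variable) can be sketched as follows. `BatonPassSession` and `ActiveProfile` are hypothetical, and the routing rule inside `respond(to:)` is a placeholder for a decision the model itself would make.

```swift
// Hypothetical stand-ins; the model's own decision to call the tool is simulated.
enum ActiveProfile { case brainstorm, tutorial }

final class BatonPassSession {
    private(set) var active: ActiveProfile = .brainstorm
    private(set) var transcript: [String] = []   // one transcript, visible to both profiles

    /// The tool given to each profile: it only flips the controlling variable.
    func passBaton(to next: ActiveProfile) -> String {
        active = next
        return "handoff to \(next) succeeded"
    }

    func respond(to prompt: String) -> String {
        transcript.append("user: \(prompt)")
        // Placeholder routing rule standing in for the model choosing to hand off.
        if active == .brainstorm && prompt.hasPrefix("How do I") {
            transcript.append("tool: " + passBaton(to: .tutorial))
        }
        // Whichever profile now holds the baton produces the final answer.
        let answer = "[\(active)] answer using \(transcript.count) shared entries"
        transcript.append(answer)
        return answer
    }
}
```

Note that the receiving profile answers within the same call and sees every earlier entry, the two attributes the speakers single out.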
In the phone-a-friend pattern, you also rely on tool calling. The key difference is that instead of toggling a variable, the tool spawns a short-lived session. If we ask for a fun project for kids, the model may reason that it needs a title for the project, and call its phone-a-friend tool to consult with the title profile. The phone-a-friend tool spawns a new session with an independent transcript, prompts it, and then delivers the response back as tool output.
The child session disappears, and the parent session produces the final response. The most important attributes of the phone-a-friend pattern are that the transcripts for each profile are isolated, and that the parent profile is always responsible for giving the final answer. Baton-pass and phone-a-friend are good tools to have in your belt, but there are other options as well.
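For contrast with baton-pass, here is a minimal phone-a-friend sketch. `ScratchSession` is a hypothetical stand-in for a session, and its replies are placeholder strings rather than model output; the point is only that the child has its own transcript and that the parent gives the final answer.

```swift
// Hypothetical stand-in for a session with its own private transcript.
final class ScratchSession {
    let instructions: String
    private(set) var transcript: [String] = []

    init(instructions: String) {
        self.instructions = instructions
    }

    func respond(to prompt: String) -> String {
        transcript.append("user: \(prompt)")
        let reply = "reply(\(instructions)): \(prompt)"   // placeholder for model output
        transcript.append(reply)
        return reply
    }
}

/// The tool: spawn a short-lived child, prompt it, and hand back its response
/// as tool output. The child is discarded when this function returns.
func phoneAFriend(question: String) -> String {
    let child = ScratchSession(instructions: "title expert")
    return child.respond(to: question)
}

/// The parent consults the child and then gives the final answer itself.
func demoPhoneAFriend() -> (parentTranscript: [String], answer: String) {
    let parent = ScratchSession(instructions: "brainstormer")
    let toolOutput = phoneAFriend(question: "Title for a paper boat project?")
    let answer = parent.respond(to: "Fun project for kids? (tool output: \(toolOutput))")
    return (parent.transcript, answer)
}
```

The child's prompt and reply never enter the parent's transcript directly; only the returned tool output does.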
For example, the Foundation Models framework utilities package houses a Skills type, which you may be familiar with as a popular pattern for procedural context loading. So now that you’ve got a grasp on the many ways tools can be used for orchestration, we’re going to look at a new knob you can use to exert control over when tool calls happen: tool calling mode.
Tool calling mode has three options: allowed, disallowed, and required. The default value is “allowed”, which is the existing behavior. The model may produce a tool call or it may respond directly. This is the option to use when you just don’t know if tools will be necessary or not, which is the most common case.
“disallowed” prevents the model from calling tools. This can be helpful if the user navigates into a part of your app where the session’s tools are known to be irrelevant. Finally, “required” means that the model can only call tools. And this can be particularly useful in agentic systems that represent all actions as tool calls.
If you’re using profiles, you can specify tool calling mode with a modifier. If you’re not using a profile, tool calling mode can be set via GenerationOptions when calling respond(to:). Here’s the most important thing to remember. When tool calling is required, the model is essentially in a while loop, and it is your job to ensure that there is an exit condition of some kind. One good option is to make the tool calling mode conditional on a variable. Here, we’re requiring tool calls until the model calls the database tool.
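The exit condition can be sketched by deriving the mode from a piece of state. `SketchToolCallingMode`, `LookupState`, and the tool names are hypothetical; the list of planned calls stands in for whatever tool calls a model would choose, and the step cap is an extra safeguard added for this sketch rather than something the session describes.

```swift
// Hypothetical mirror of the three modes described above.
enum SketchToolCallingMode { case allowed, disallowed, required }

final class LookupState {
    var didQueryDatabase = false

    /// Required until the database tool has run, then relaxed so the loop can end.
    var toolCallingMode: SketchToolCallingMode {
        didQueryDatabase ? .allowed : .required
    }
}

/// Simulates the tool-calling loop. `plannedCalls` stands in for the model's choices.
func runToolLoop(state: LookupState, plannedCalls: [String], maxSteps: Int = 8) -> [String] {
    var executed: [String] = []
    for call in plannedCalls.prefix(maxSteps) {
        // Exit as soon as tool calls are no longer required.
        guard state.toolCallingMode == .required else { break }
        executed.append(call)
        if call == "queryDatabase" {
            state.didQueryDatabase = true
        }
    }
    return executed
}
```

Because the mode is computed from `didQueryDatabase`, the loop cannot keep forcing tool calls once the database tool has run.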
A second, more forceful option is to equip your model with a final answer tool that throws an error. Throwing an error aborts the tool calling loop and immediately returns control flow to you. By default, when you throw an error from a tool, or when you cancel a response, your session’s transcript will roll back to its previous state.
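The "final answer tool" idea relies on ordinary Swift error propagation: throwing unwinds the loop and hands control back to the caller. A minimal sketch, in which `FinalAnswer`, `finalAnswerTool`, and the `final:` prefix convention are all hypothetical and the step list stands in for the model's tool calls:

```swift
// Hypothetical error type that carries the answer out of the loop.
struct FinalAnswer: Error {
    let text: String
}

/// A tool whose only job is to abort the loop by throwing.
func finalAnswerTool(_ text: String) throws {
    throw FinalAnswer(text: text)
}

/// Simulates a required-tool-call loop. Steps starting with "final:" invoke the
/// final answer tool; anything else stands in for an ordinary tool call.
func runUntilFinalAnswer(steps: [String]) -> String? {
    do {
        for step in steps {
            if step.hasPrefix("final:") {
                try finalAnswerTool(String(step.dropFirst("final:".count)))
            }
            // Other tool calls would run here.
        }
        return nil   // the loop ended without a final answer
    } catch let answer as FinalAnswer {
        return answer.text   // control returns to the caller immediately
    } catch {
        return nil
    }
}
```

Steps after the throwing tool never run, which is what makes this option more forceful than relaxing the mode.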
For advanced use cases where you want to allow cancelling part way through a response and then resuming again, you need to keep your transcript as it stood at the point of the error, rather than rolling it back. We’ve added new API to enable this. If you’re using profiles, you can now set “transcriptErrorHandlingPolicy” using a modifier. If you’re not using a profile, you can set it directly on your session.
The two options are “.revertTranscript” and “.preserveTranscript”. When using “.preserveTranscript”, the onus is on you to put your transcript back into a good state if you intend to continue using your session. To facilitate that, the “transcript” property on session is now mutable. Remember though, you can only modify the transcript when the session’s “isResponding” property is false. Attempting to mutate the transcript during a response is a programmer error.
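The difference between the two policies can be sketched with a snapshot taken at the start of a response. `PolicySession`, `SketchErrorPolicy`, and `Cancelled` are hypothetical stand-ins, and the failure is simulated with a flag; only the case names mirror the options mentioned above.

```swift
// Hypothetical mirror of the two policies named above.
enum SketchErrorPolicy { case revertTranscript, preserveTranscript }

struct Cancelled: Error {}

final class PolicySession {
    var transcript: [String] = []
    let policy: SketchErrorPolicy
    private(set) var isResponding = false

    init(policy: SketchErrorPolicy) {
        self.policy = policy
    }

    /// `failAfterPartial` simulates a cancellation or a tool error mid-response.
    func respond(to prompt: String, failAfterPartial: Bool) throws -> String {
        let snapshot = transcript          // state before this response began
        isResponding = true
        defer { isResponding = false }

        transcript.append("user: \(prompt)")
        if failAfterPartial {
            transcript.append("partial response")
            if policy == .revertTranscript {
                transcript = snapshot      // roll back to the previous state
            }
            throw Cancelled()
        }
        transcript.append("full response")
        return "full response"
    }
}
```

With the preserving policy the partial entries remain, so the caller is responsible for repairing the transcript, for example by removing the partial response, before prompting again.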
Now that we’ve taken a look at our new APIs, we need to talk about the implications of mutating the transcript on performance and accuracy. Key-value, or KV, caches are an important optimization mechanism in large language models and they can be invalidated by transcript mutations. Generally, appending to the transcript preserves the KV cache, and minimizes the time-to-first-token.
If you rewrite history by removing entries, changing the attached tools, or updating the instructions, that will typically trigger a cache invalidation, and can increase latency. Now, we didn’t talk about this last year because we intentionally shaped LanguageModelSession APIs to be append only. By default, they ensured optimal cache use. But this year, we’re taking the training wheels off, so to speak.
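One way to build intuition for why appending is cheap and rewriting is not is to treat the cache as reusable only for the longest unchanged prefix of the context. The function below is a deliberately simplified model under that assumption, with entries reduced to strings; actual caching behavior varies by model and should be measured.

```swift
/// Counts how many leading entries are identical between the previously
/// processed context and the next one. In this simplified model, only that
/// shared prefix could be served from a cache; everything after it is recomputed.
func reusablePrefixLength(cached: [String], next: [String]) -> Int {
    var count = 0
    for (old, new) in zip(cached, next) {
        guard old == new else { break }
        count += 1
    }
    return count
}
```

Under this model, appending a prompt to `["instructions", "p1", "r1"]` keeps all three cached entries reusable, while editing the instructions at the front leaves nothing reusable, and removing an entry from the middle keeps only what came before it.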
It’s important to understand that different models have different caching behavior and the only way to be certain is by measuring. The best way to do that is the upgraded Foundation Models Instrument in Xcode. For more about detecting cache invalidations with Instruments, make sure to check out our video on debugging and profiling.
In addition to performance implications, the other thing you have to be careful about when rewriting history is accuracy, because it’s possible to confuse the model. Let’s say I have a session where I asked the model to think of fun origami project names. And then let’s say I add a generate title tool to the session, and prompt it for more ideas. What do you expect will happen next? If we’re lucky, the model will use the tool like we want.
But it’s also possible that the model will notice it previously generated titles without the tool, and may think it’s supposed to do that again. That’s not what we want. Our history modification confused the model. When you start to get into nuanced transcript modifications like this, it becomes even more important to use the Evaluations framework to create eval sets and quantify the effect of context engineering strategies. Data-driven optimization is the only way to be confident. I highly recommend watching all of our videos about the Evaluations framework.
Alright, that brings us to the end of our section on performance and accuracy. That was a lot! Are you ready to bring it home Oliver? You know it. We’ve shown you how dynamic profiles allow you to steer model behavior and manage your session’s transcript. We talked through patterns like phone-a-friend and baton-pass, tool calling mode, manual transcript management, and even KV caches. And we hope you’re as enthusiastic about Foundation Models framework utilities as we are! Next, try playing around with the sample app. Or test out PCC together with the revamped Xcode instrument. Until next time, thanks for watching. Thank you!