I don’t have any insider knowledge—unless you count the Information article that just leaked about a possible Her-like assistant—but I think I have a good feeling for what’s coming.
How? Or what am I basing my hunches on? Two things:
- Building actively in AI since the end of 2022, and
- ~~Stalking~~ Watching Sam Altman’s comments very closely
What I anticipate
So here’s what I think is going to happen.
Here’s my thought process…
He’s basically sandbagging us, meaning he’s telling us not to anticipate much, that it’ll be incremental, and that it’ll slowly build up over time.
- He’s been telling us to not expect great things in the short term
- He keeps prepping us for incremental gains
- This tells me he’s planning to under-promise and over-deliver in a way that surprises and delights
- He’s also been telling us that the capabilities they’ll release won’t always come from the models themselves, but often from supplemental and stacking (think D&D) capabilities. Like 2 + 1 + 7 + 1 = 1,249, where the combination yields far more than the sum of its parts.
- To me, that means the next functionality isn’t a 4.5 or 5 model release
- Instead, it means things like cooperation between models, additional sensors on models, additional integrations with models, etc.
Basically, ecosystem things that magnify models in extraordinary ways.
And I think agents are one of those things.
All about agents
Sam has talked a lot about agents—kind of in passing and in the same humble way that sets off my alarm bells. It makes me think he’s working hard on them.
It’s precisely the type of thing that could amaze people the way the last live event did, but without announcing GPT-5.
So here’s what I’m thinking (some of this will be for future releases and not necessarily on Monday).
I think this will soon seem silly and antiquated.
❝ Prompting is the most natural interface to AI—not counting brain links.
Talking to agents (vocally or in writing) is THE PENULTIMATE (next to brain link) interface to AI. Prompting is explaining yourself clearly, and that’s the thing AI needs most.
So I think that we’ll soon be able to simply describe what we want and the model will figure out:
- What needs to be a zero-shot to the model
- What needs stored state
- What needs live internet lookups
- And how to combine all those into the answer
It’ll also be able to figure out how fast you need it based on the context. And if it needs additional information, it’ll just ask you.
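To make that concrete, here’s a minimal, purely hypothetical sketch of what that routing step might look like if you had to write it by hand today. Every class, function, and heuristic below is invented for illustration; the point is just the shape of the decision: zero-shot vs. stored state vs. live lookup, plus an optional clarifying question.

```python
# Purely hypothetical sketch: a hand-written stand-in for the routing the model
# would do on its own. Class/function names and heuristics are invented here.
from dataclasses import dataclass

@dataclass
class Plan:
    zero_shot: bool = False                  # answerable straight from the model's weights
    needs_memory: bool = False               # needs stored state from earlier interactions
    needs_web: bool = False                  # needs live internet lookups
    clarifying_question: str | None = None   # ask the user if key details are missing

def route(request: str) -> Plan:
    """Toy heuristic router; the real thing would be the model deciding this itself."""
    text = request.lower()
    plan = Plan()
    if any(word in text for word in ("latest", "current", "stock", "today")):
        plan.needs_web = True                # fresh data -> live lookup
    if any(word in text for word in ("my ", "our ", "remember")):
        plan.needs_memory = True             # personal context -> stored state
    if not (plan.needs_web or plan.needs_memory):
        plan.zero_shot = True                # nothing external needed
    if len(request.split()) < 4:
        plan.clarifying_question = "Can you tell me a bit more about what you need?"
    return plan

print(route("What's the current stock trend for my partner company?"))
# -> Plan(zero_shot=False, needs_memory=True, needs_web=True, clarifying_question=None)
```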
So instead of defining agents with LangChain, LangGraph, AutoGen, or CrewAI, you’ll just say something human and rambling and flawed, like:
When a request comes in, validate it’s safe, and then see if it’s business related or personal, and then figure out what kind of business task it is. Once you know what kind of business task it is, if it’s a document for review then have a team of people look at it with different backgrounds. It needs to be pristine when it comes out the other side. Like spellcheck, grammar, but also that it uses the proper language for that profession. If it’s a request to do research on a company, go gather tons of data on the company, from mergers and acquisitions to financial performance, to what people are saying about the company leadership, to stock trends, whatever. That should output a super clean one-pager with all that stuff along with current data in graphs and infographics.
How we’ll soon create agent tasks via prompting
From there, the model will break that into multiple pieces—by itself:
- Security check on initial input (we’ll use an agent that has 91 different prompt injection and trust/safety checks).
- Categorize for business or personal
- Categorize within business
- Team of writer, proofreader, and editor agents specialized in different professions
- Team of agents for researching the performance of companies by pulling X and Reddit conversations, Google results, Bloomberg dashboard analysis
- An agent team that creates the financial report
- An agent team that creates insanely beautiful infographics specialized towards finance
And it’ll figure out how big and expensive those agent teams need to be by either knowing the person asking, and the projects they’re working on, or by asking a few clarifying questions.
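As a thought experiment, here’s one hypothetical way the plan the model derives could be represented. None of this is a real OpenAI or framework API; the names are invented, and the teams simply mirror the list above.

```python
# Hypothetical representation of the plan the model might derive, by itself, from
# that rambling instruction. Every name is invented; this is not a real API.
from dataclasses import dataclass

@dataclass
class AgentTeam:
    name: str
    members: list[str]   # roles; team size would scale with the requester and project
    instructions: str

derived_plan = [
    AgentTeam("safety-check", ["injection-screener"],
              "Validate the incoming request against prompt-injection and trust/safety checks."),
    AgentTeam("triage", ["classifier"],
              "Categorize business vs. personal, then the kind of business task."),
    AgentTeam("document-review", ["writer", "proofreader", "editor"],
              "Polish spelling, grammar, and profession-specific language until pristine."),
    AgentTeam("company-research", ["social-miner", "search-analyst", "finance-analyst"],
              "Gather M&A history, financials, leadership sentiment, and stock trends."),
    AgentTeam("reporting", ["report-writer", "infographic-designer"],
              "Produce a clean one-pager with current data, graphs, and infographics."),
]

for team in derived_plan:
    print(f"{team.name}: {len(team.members)} agent(s) -> {team.instructions}")
```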
A personal DA
There’s a rumor that they’ll be launching what I have been calling a Digital Assistant since 2016, which at this point everyone is talking about. The best version I’ve seen of this was in the movie Her, where Scarlett Johansson was the voice of the AI.
If that’s happening—which it looks like it is—that could be the whole event and it would also feel like a GPT-5-level event.
And that would definitely be agent themed as well, but I am hoping it’ll be more of what I talked about above.
A mix of agent stuff
One possibility is that they basically have an Agent-themed event.
- They launch the Digital Assistant
- They launch native agents in a new GPT-4 model, which allows you to create and control agents through direct instructions (prompting)
- They give you the ability to call the agents in the prompt, like I’m talking about above, and like you currently can with Custom GPTs
If they go that route, I think they’ll likely sweeten the event with a couple extra goodies:
- Increased context windows
- Better needle-in-a-haystack performance
- Updated knowledge cutoff dates
- A slight intelligence improvement in the new version of 4
Summary
- Sam is sandbagging us in order to under-promise and over-deliver
- This ensures we’ll be delighted whenever he releases something
- He’s been hinting at agents for a long time now
- Prompting is the natural interface to AI
- Agent instructions will merge into prompts
- Eventually we’re heading towards DAs, which he might start at the event
I’ll be watching this event with as much anticipation as for WWDC this year. No meetings. No work. Just watching and cheering. Can’t wait.
And if you’re curious about where I see this all going in the longer term, here’s my hour-long video explaining the vision…