Using Reflections to Compress LLM Context Data


As I’ve discussed in previous posts, traditional software is about to be replaced by LLM-based software using SPQA. In short, rather than having traditional databases and traditional queries, we’ll just have LLMs that consume context about entities.

If it’s HR software, it’ll be people and employment context. If it’s security software, it’ll be context about systems and people and identities and permissions. And if it’s sales software, it’ll be context around prospects and opportunities.

LLMs thrive on context, but they are easily overwhelmed by too much prompt and embedding input.

One challenge companies will face along their SPQA journey, however, is how to go from giga/tera/peta-bytes of data into something that can be consumed by LLMs. You can’t simply pipe your entire AWS state, or endpoint logs, into an LLM. It’ll fall over. And that’s assuming you can afford the LLM processing time.

Reflections

Complex behaviors can be guided by agents’ recursive synthesis of recordings into higher-level observations. The agent’s memory stream is a database that contains a complete account of the agent’s prior experiences. To adapt to its shifting surroundings, the agent can access relevant data from its memory stream, process this knowledge, and formulate an action plan.

This is extraordinary. Let me try to translate, or at least give my interpretation.

Lots of things are happening to a given agent. They see things. They have interactions. Events take place around them. That’s one bucket, which is basically observations.


Then there is another bucket which is Reflections. These are periodic reviews of those observations that culminate in a thought, or a, um…reflection…about what was seen or experienced.

Let’s say a neighbor has a dog that keeps pooping in your yard. And they play loud music past 10pm on a regular basis. But the one time you tried to have a party in your backyard, during the day on a Saturday, they called the cops.

You’d have observations:

  1. Their dog pooped on my lawn the first time

  2. Their dog pooped on my lawn the second time sometime later

  3. Then a third and a fourth time

  4. Then there’s the first time my kid got woken up on a school night from music being too loud

  5. And the second and third times

  6. And then they called the cops on us three weeks later

These can all be turned into a Reflection.

My neighbor is an asshole.

That might not seem useful, but humans do something similar. As Kahneman and others have talked about extensively, humans often use shortcuts when remembering things. When you think about whether you like your neighbor, you don’t recall every incident—good and bad—from the last 14 years they lived next to you. Instead you use an emotional heuristic that gives you kind of a thumbs-up or thumbs-down based on all the interactions.

We often can’t remember what happened, but we can remember what we thought about it and how it made us feel.

I’m not an expert on human memory, but it feels a lot like a compression mechanism that saves space and processing power. Is this person dangerous? Should I do business with them? Etc. You often don’t have time in those moments to rehash the entire history. You need a heuristic.
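
To make that concrete, here’s a minimal sketch of the same idea in code. The `complete()` helper is a placeholder for whatever LLM API you’d actually call, and the observation strings are just the neighbor example from above.

```python
def complete(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API you use."""
    raise NotImplementedError

# Raw observations accumulated over time (the neighbor example from above)
observations = [
    "Their dog pooped on my lawn (first time)",
    "Their dog pooped on my lawn again (second time)",
    "Their dog pooped on my lawn (third and fourth times)",
    "Loud music woke my kid on a school night (first time)",
    "Loud music woke my kid again (second and third times)",
    "They called the cops on our Saturday-afternoon backyard party",
]

def reflect(observations: list[str]) -> str:
    """Compress a pile of raw observations into one higher-level Reflection."""
    prompt = (
        "Here are recent observations about my neighbor:\n"
        + "\n".join(f"- {o}" for o in observations)
        + "\n\nIn one sentence, what is the high-level takeaway?"
    )
    return complete(prompt)

# reflection = reflect(observations)
# -> something like: "My neighbor is inconsiderate and hostile."
```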

Applying Reflections to real-world applications

Ok, cool. So what’s the analog to LLMs, and LLM-based software?

Easy: Event compression. Log compression. Data compression. SPQA State compression. Like going from a massive time series to extracting the meaning from it and sending that to an LLM.

We can’t send the state of AWS to an LLM. It’s a maze. And it’s changing constantly. We can’t monitor OSQuery for all hosts in the environment and send that to an LLM continuously. It’s this way with the entire business. The data are too numerous and too chatty. We need a way to compress those raw events down to something usable for LLMs.

Reflections will be a major part of that story, at least early on.
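
As a rough sketch of what that could look like for something like an OSQuery feed: assume events arrive as (timestamp, host, message) tuples, and `complete()` is again a placeholder for your LLM call. Instead of streaming every raw event, you periodically roll a window of them up into one short Reflection.

```python
from itertools import islice

def complete(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API you use."""
    raise NotImplementedError

WINDOW_SIZE = 500  # raw events per Reflection; tune to your cost and latency budget

def rollup(events, window_size: int = WINDOW_SIZE):
    """Yield one small Reflection per window of raw events instead of streaming everything."""
    events = iter(events)
    while True:
        window = list(islice(events, window_size))
        if not window:
            break
        prompt = (
            "Summarize the notable patterns, anomalies, and changes in these "
            f"{len(window)} host events as a few bullet points:\n"
            + "\n".join(f"{ts} {host}: {msg}" for ts, host, msg in window)
        )
        yield complete(prompt)  # one short Reflection replaces hundreds of raw events
```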

Examples


Here are some examples of this:

  • Characterizing a market

    • Our competitors seem to be pivoting to small microservices vs. large product launches, and that’s catching on

    • Perhaps we should release some of our internal tooling as a microservice that customers can use?

  • Characterizing a culture

    • Employees seem to be taking advantage of the open leave policy in the Portland office

    • We should let managers there know to slightly tighten their tolerances on what gets approved vs. not

    • We should watch for abuses of policies that benefit the whole that end up costing the company money and lost trust with employees

  • Characterizing a trend

    • We’re seeing a trend of less-developed documentation prior to releases, and PRDs that are less vetted with the broader team

    • This seems associated with more customer complaints about quality issues, and more rework being done within Github

    • We should consider raising the bar for PRD quality and reviews

These are just a few hasty examples. The point is that computers are good at looking at lots of events and distilling them into something smaller and more useful.

Imagine a multi-phased process where you go from gigabytes or terabytes of data, into various legacy-tech consolidation/compression processes that prune duplicates and look for needles.

Think of Reflections as LLMs looking down at everything and saying to themselves, “Hmm, that’s interesting”…and then writing it down somewhere for us.

Those can then be put into a second/third process that classifies them using more classical ML. And perhaps the final step takes a much smaller number of highly-refined events and gets them to the LLM for analysis.
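
Here’s a hedged sketch of that funnel. The phase boundaries and the `is_interesting()` heuristic are stand-ins for whatever consolidation logic and classical ML models you’d actually use; the point is that only the small, pre-filtered remainder ever reaches the LLM.

```python
def complete(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API you use."""
    raise NotImplementedError

def dedupe(events: list[str]) -> list[str]:
    """Phase 1: cheap, classical consolidation -- drop exact duplicates, keep order."""
    seen: set[str] = set()
    unique = []
    for event in events:
        if event not in seen:
            seen.add(event)
            unique.append(event)
    return unique

def is_interesting(event: str) -> bool:
    """Phase 2: toy stand-in for a classical ML classifier that scores events."""
    return "denied" in event.lower() or "error" in event.lower()

def pipeline(raw_events: list[str]) -> str:
    """Phase 3: terabytes in (conceptually), a handful of Reflections out."""
    candidates = [e for e in dedupe(raw_events) if is_interesting(e)]
    prompt = (
        "Given these pre-filtered events, write the two or three most important "
        "takeaways (Reflections):\n"
        + "\n".join(f"- {e}" for e in candidates)
    )
    return complete(prompt)
```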


Like most compression, there is loss here. We can lose the original events in the process in a way that makes it difficult to go backward and do attribution. So that’s something to think about. There will be use cases that are horrible for Reflections, but I think there will be many more where they’re incredibly useful.
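
One way to soften that loss, sketched below with illustrative field names (not any standard schema), is to store each Reflection with pointers back to the raw events it was derived from, so attribution is still possible as long as the originals live somewhere cheap.

```python
from dataclasses import dataclass, field

@dataclass
class Reflection:
    text: str                                                    # the compressed takeaway
    source_event_ids: list[str] = field(default_factory=list)   # breadcrumbs back to the raw events
    created_at: str = ""                                         # when the Reflection was synthesized

# Hypothetical example; the event IDs are made up for illustration
example = Reflection(
    text="My neighbor is an asshole.",
    source_event_ids=["evt-0012", "evt-0013", "evt-0014", "evt-0021", "evt-0037"],
    created_at="2024-01-15T10:00:00Z",
)
```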

How do you know what to keep and what to discard?

Well, that’s the question, isn’t it?

Expect hundreds of companies to spring up to work on this problem. Companies are in the world of terabytes and petabytes, and LLMs are in the world of kilobytes and megabytes.

In order to make full use of LLM-based software, that gap needs to be closed.

Summary

Eventually this will turn into permanent pipelines flowing into continuous custom model training (and fine-tuning). And Reflections will be somewhat less useful when you don’t need that compression.

But that’ll be a while. Continuous training of company-scale custom models will be cost-prohibitive for most companies for many years, requiring us to continue to rely on large-context prompting, vector embeddings, and Reflections.

At least that’s how I’m seeing it. Let me know if I’ve missed something.


