Why I believe in SOTA models over custom ones


I’ve never been a big believer in training custom models. I’ve also never believed in fine-tuning.

Going all the way back to 2023, my intuition has always pushed me towards the best SOTA model possible, combined with context management.

I’ve finally crystallized my reasoning for this:

Any time you think you’re using a small model for a small task, there’s usually a lot more going into each decision than that one narrow area of expertise.

For example: labeling emails, writing reports, processing security events, or searching for threats on a network.

On one hand these tasks are specialized, but the fact is that the smarter and more experienced the human with that expertise is, the better the job they will do.

This is because most specialized tasks still benefit from the general life experience of the person carrying them out.

This is why I think the future is not a whole bunch of extremely small specialized models throughout the enterprise.

I think what’s far more likely is something like the Opus / Sonnet / Haiku tier structure, where the best of the best just keeps coming down in price, including in open-source form.

And those smaller models are used in conjunction with context to perform all the different tasks in an organization at much lower cost.

But they will still be extremely general models, not tiny and narrow custom ones.
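
To make that concrete, here is a minimal sketch of "one general model plus context" rather than one custom model per task. Everything in it is hypothetical and only for illustration: the `Task` class, the tier names, and `echo_model`, which stands in for a real model call. No real API is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical tier names, loosely mirroring a large/medium/small lineup.
TIERS = ("large", "medium", "small")


@dataclass(frozen=True)
class Task:
    name: str
    instructions: str    # task-specific context, not task-specific weights
    tier: str = "small"  # which general-purpose tier handles this task


def build_prompt(task: Task, item: str) -> str:
    """Combine the task's context with the input for a general model."""
    return f"Task: {task.name}\nInstructions: {task.instructions}\nInput: {item}"


def run(task: Task, item: str, models: Dict[str, Callable[[str], str]]) -> str:
    """Send the prompt to whichever general model serves the task's tier."""
    return models[task.tier](build_prompt(task, item))


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: swap in any client here)."""
    return f"[model saw {len(prompt)} chars]"


if __name__ == "__main__":
    models = {tier: echo_model for tier in TIERS}
    label_email = Task("label-email", "Pick one label: billing, support, spam.")
    triage_event = Task("triage-security-event", "Rate severity 1-5.", tier="medium")
    print(run(label_email, "Your invoice is attached.", models))
    print(run(triage_event, "Failed login from new country.", models))
```

The point of the sketch is that adding a new task means writing new context, not training a new model; the only per-task decision about the model itself is which general tier is cheap enough and good enough.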


