AI Inference: Let's Peek Under the Hood

AI inference in the cloud, on a GPU, or on a mobile device.

You know how in our [last chat](https://junod.dev/everyone-should-understand-ai-inference/), we talked about AI inference being as simple as putting something into an AI model and getting something back? Well, today I want to share some more interesting bits about this process.

Think of AI Models Like Video Games

You know how some video games can run on your phone, while others need a powerful gaming PC? AI models are exactly like that. Let me share a story that might help explain this better.

The other day, my phone recognized my face to unlock it - that's a small AI model running right on my phone. But when I ask ChatGPT a question, that's like playing a super-demanding game that needs powerful computers in the cloud to run.

Where Should Your AI Live?

Remember when we were kids and had to figure out whether a game could run on our Nintendo or needed a more powerful console? The same thing happens with AI. Here's how I think about it:

  1. On Your Device (Like Your Phone)

    • When you need things to happen really fast
    • When you want to keep your data private
    • When you don't need the AI to be super smart
  2. In the Cloud (Like on the Internet)

    • When you need the AI to be really clever
    • When you don't mind waiting a tiny bit
    • When you need to process lots of stuff at once
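If it helps, the checklist above can be sketched as a toy decision helper. The criteria names here are mine, invented for illustration, not any real framework's API:

```python
def choose_inference_location(needs_low_latency: bool,
                              data_is_private: bool,
                              needs_frontier_quality: bool) -> str:
    """Toy helper mirroring the checklist above (illustrative only)."""
    if needs_frontier_quality:
        # The biggest, cleverest models need big cloud hardware.
        return "cloud"
    if needs_low_latency or data_is_private:
        # Fast responses and private data favor running on-device.
        return "on-device"
    return "cloud"

# Face unlock: fast, private, doesn't need a frontier model.
print(choose_inference_location(True, True, False))
```

In practice this decision is fuzzier (model size, battery, cost all play in), but the two big forks really are quality versus latency-and-privacy.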

The Cool Part About Modern AI

Here's something that gets me excited: Remember how phones used to be really bad at games, and now they can run amazing graphics? The same thing is happening with AI.

It's amazing to see how small AI models are getting better and better at doing things we thought only big models could do, and how quickly it's happening. It's like how your phone can now run games that used to need a massive computer.

Where and how do I run inference?

The AAA games of inference are the big products everyone is talking about: ChatGPT, Claude, Midjourney, and so on. They have the best experience, but come at a cost.

The studio games are the cloud APIs: almost (or just) as capable as their AAA counterparts, but with rough edges, and some technical expertise is required. The cost comes down here; you can do a LOT for a LITTLE with the APIs.
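To make "cloud API" concrete, here's roughly what a call involves, sketched against OpenAI's chat-completions endpoint. The model name and key are placeholders, and this only builds the request rather than sending it:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"  # OpenAI chat endpoint

def build_request(prompt: str, api_key: str,
                  model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Build (but don't send) a chat-completion request: the whole
    'cloud inference' exchange is just JSON over HTTPS."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually run inference (needs a real key and a network connection):
# with urllib.request.urlopen(build_request("Hello!", "sk-...")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

That's the rough edge and the bargain in one picture: you handle auth, payloads, and errors yourself, but each call costs a fraction of a cent.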

The indie games are local models. With a GPU or an Apple M4 you can run pretty capable models, and your mobile phone can run more streamlined, single-purpose models.
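And at its core, local inference is nothing mystical: it's just the model's arithmetic running on your own hardware. Here's a toy two-layer network in pure Python to make that concrete (the weights are made up; a real model has billions of them, run through optimized GPU kernels):

```python
# A toy "local model": inference is just arithmetic on your own device.

def relu(x: float) -> float:
    """Standard activation: zero out negative values."""
    return max(0.0, x)

def forward(inputs, w1, w2):
    """Two-layer network: hidden = relu(W1 @ x), output = W2 @ hidden."""
    hidden = [relu(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

# Made-up weights for illustration.
W1 = [[0.5, -0.2], [0.1, 0.3]]
W2 = [[1.0, -1.0]]

print(forward([1.0, 2.0], W1, W2))
```

Scale the same idea up a few billion parameters and you have the forward pass that your phone runs for face unlock, or that a GGUF model runs on an M4.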

What This Means for Everyone

If you're using AI (and these days, who isn't?), here's what matters:

  • Small doesn't always mean weak anymore; a small, targeted model can be super powerful
  • Your phone is getting smarter every day
  • Sometimes the best inference strategy isn't the easiest

Looking Forward

I've spent my career watching technology evolve, and this is one of the most exciting times I can remember. We're moving from a world where all the smart stuff had to happen in big data centers to one where intelligence is everywhere - in your phone, your watch, maybe even your coffee maker soon!

The really interesting part? This is just the beginning. Just like how phones went from simple calling devices to pocket computers, we're going to see AI move from big server rooms right into our everyday devices.
