AI Paradox: Smarter Than Us, Dumber Than Cats

An article we liked from Thought Leader Daniel Jeffries:

Why LLMs are Much Smarter than You and Dumber than Your Cat

And What We Can Do About It - A Blueprint for AGI

These two things are true at the same time:

  • GPT-4/4o, Claude 3.5 Sonnet and Llama 3.1 are infinitely smarter than you.
  • GPT-4/4o, Claude 3.5 Sonnet and Llama 3.1 are much, much dumber than your cat.

How is that even possible?

How can we have a machine that can pass the bar exam, write code, speak 100 languages fluently and yet can't figure out new patterns, forgets what it was doing five minutes ago, and rewrites the exact same function in your code with a different name despite the other function sitting right there under its nose in the same damn file?

The answer will take us on an epic journey across the bleeding edge of modern AI, the entire multi-billion year evolution of life on this planet, and deep into that wonderful lump of grey matter in your skull, the human brain.

It's easy to get swept up in the accelerationist/doomer narrative that the Singularity is near and Artificial Super Intelligence (ASI) is just around the corner and it will give us all runaway superpowers or kill us all.

I love the systems we have today. They're limited but they're incredibly useful. They've helped me learn to code at the ripe old age of 48, they've helped me learn A1 Spanish, and they rapidly find answers to hard questions for me via RAG- and search-enabled LLMs. I used to reach for Google first for everything; now I use Perplexity 80% of the time and only do deep searches on Google when I need to crawl through results manually, using my human common sense and intuition to find the answer.

But they're still limited.

If you work with the modern marvels of Claude 3.5 Sonnet, GPT-4o, or Qwen 2 (and don't get me wrong, these little machines are marvelous), it doesn't take long for the illusion of super intelligence to wear off.

If you want to understand the limits of AI today, try to build an agent.

And I mean a real agent, one that does complex, open-ended tasks in the real world, not one that fetches a few PDFs and summarizes them.

You can tell your human co-worker: "I need you to go off and put together a complete plan for outreach to our potential customers, craft a list of needed assets, set up an outreach sequence and come up with a novel way to make those interactions feel more natural rather than doing cold email outreach."

They'll figure out how to do it and all the complex intermediate steps to take along the way, asking questions or getting the help they need if they don't understand a step.

An agent will hallucinate five to twenty steps into such an open-ended task and go completely off the rails.

It will never figure out the right assets or where to look. It won't figure out how to find and set up the right outreach software and craft the message based on theory of mind (aka understanding other people) and how they will respond. It will fail in countless ways big and small. It just can't do this yet. Not with today's technology and probably not with tomorrow's either.

Open-ended agents are what we want and what every advanced team is working towards.

We really want digital people or digital super people working closely with us as partners and friends.

That would mean intelligent machines that can go out with minimal instruction and do open-ended tasks: plan ahead, simulate outcomes, learn as they go, and adjust in real time to the ever-changing circumstances and endless variation of the real world. They'd overcome obstacles and reshape their own mental structure with an advanced neuroplasticity modeled on the brain, constantly updating their understanding of the world, their place in it, and what they did "right" or "wrong" in the ever shifting arena of life.

But that is not what we have today and anyone who thinks ASI is just around the corner is likely getting high on their own supply.

At best we have narrow agents that can fetch things from the web and summarize them, score them, do some outreach if the systems are already set up for them, automate parts of some tasks if those tasks are scoped down really, really tightly and compartmentalized with a lot of glue code.

My team is building agents. And every team I've talked to that's building agents has pivoted from "we'll build a general purpose agent" to building a "narrow agent" where they've drastically pruned the scope of their ambitions and surrounded a Multimodal Large Language Model (MLLM) with expert heuristics and guardrails and smaller models to...
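The narrow-agent pattern described above can be sketched in a few lines: an LLM call fenced in by an explicit task whitelist and heuristic output checks. This is a minimal illustration, not any team's actual implementation; every name here (`call_model`, `guardrail`, `narrow_agent`) is hypothetical, and the model call is a stub standing in for a real MLLM API request.

```python
# Hypothetical sketch of a "narrow agent": an LLM wrapped in expert
# heuristics, guardrails, and glue code that keeps its scope tight.

ALLOWED_TASKS = {"summarize", "score"}  # the only task types this agent accepts

def call_model(prompt: str) -> str:
    # Stand-in for a real MLLM call (e.g., an API request). Returns canned text.
    return f"model output for: {prompt}"

def guardrail(output: str, max_len: int = 500) -> str:
    # Heuristic check: reject empty outputs and truncate runaway ones.
    if not output.strip():
        raise ValueError("empty model output")
    return output[:max_len]

def narrow_agent(task: str, payload: str) -> str:
    # Glue code: dispatch only the tasks the agent is explicitly scoped for,
    # rather than trusting the model with open-ended instructions.
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task {task!r} is out of scope for this agent")
    prompt = f"{task}: {payload}"
    return guardrail(call_model(prompt))
```

The point of the pattern is that the intelligence lives as much in the scoping and the guardrails as in the model itself: anything outside the whitelist is refused up front instead of being left for the model to hallucinate through.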

Read the rest of this article at danieljeffries.substack.com...

Thanks for this article excerpt and its graphics to Daniel Jeffries.

Want to share your advice for startup entrepreneurs? Submit a Guest Post here.