Now that we all have latency in our inference, we’re going to call it thinking

If you are a regular of TTS, you will know that I do a weekly roundup of all major developments in the land of AI models and stuff. What you will have noticed is that every week, there’s someone who has just updated their model with the latest gizmos. And with every shiny new model, they promise us that this one will revolutionize the way we think, work, and procrastinate on social media.

Well, the latest brigade of AI gladiators are the reasoning models, like ChatGPT o1, Alibaba’s Marco-o1, Claude from Anthropic, and Meta’s own Frankenstein creation.

And each of these titans boasts about its “reasoning” capabilities, promising deep and structured thought processes that rival, well, not quite human brains, but hey, close enough for now.

And in this article, I will try to explain why “reasoning” causes so much headache…


If you like this article and you want to support me:

  1. Comment, or share the article; that will really help spread the word 🙌
  2. Connect with me on LinkedIn 🙏
  3. Subscribe to TechTonic Shifts to get your daily dose of tech 📰


What on earth is “reasoning” anyway

Let’s break it down into little pieces, for you unfortunate uninitiated individuals.

In the labyrinthine corridors of AI jargon, reasoning refers to the ability of the AI model to perform what is called (brace-brace-brace!): multi-step logical deductions.

DON’T YOU RUN AWAY NOW !!

Cause I’ll explain.

You can compare it to the detective who solves a whodunit (well, in the case of an AI with reasoning “capabilities”, the detective took 12 hours to think through each clue and still got distracted by a squirrel). Simply speaking, reasoning involves the complicated dance of data parsing, pattern recognition, and some algorithmic wizardry to derive conclusions from given inputs. In theory it should not be about just spitting out pre-learned responses, but about constructing answers through a semblance of logical progression.

I said, in theory….

Going a little deeper, reasoning in AI models works because of complex architectures like the transformer (the T in GPT), attention mechanisms, and sometimes, what we like to call a whiff of “magic”. These models should simulate a form of cognitive processing, but without things like pesky emotions or existential crises that plague us hoomans.

GPT models create responses by predicting one token (a word or a piece of a word) at a time, each time picking a likely next token based on statistical probabilities over the input and everything generated so far. Every token costs a full pass through the model, and reasoning models churn out a long trail of intermediate “thinking” tokens before the answer, which causes the delay you experience while waiting for the response to load.
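To make that one-piece-at-a-time loop a bit more tangible, here is a minimal sketch. Everything in it is made up for illustration: the tiny `BIGRAMS` table stands in for a real model, and `next_token` and `generate` are hypothetical helper names, not anyone's actual API. The only point is that every generated piece costs one more trip through the "model", so a longer output means a longer wait.

```python
import random

# Hypothetical toy "model": for each token, the possible next tokens and their weights.
# A real LLM computes these probabilities with a neural network instead of a lookup table.
BIGRAMS = {
    "<start>": [("the", 1.0)],
    "the": [("cat", 0.6), ("dog", 0.4)],
    "cat": [("sat", 1.0)],
    "dog": [("sat", 1.0)],
    "sat": [("<end>", 1.0)],
}


def next_token(token, rng):
    """Sample one next token according to the toy probabilities."""
    candidates, weights = zip(*BIGRAMS[token])
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(rng, max_tokens=10):
    """Generate tokens one at a time and count how many model calls that took."""
    tokens = []
    steps = 0
    current = "<start>"
    while steps < max_tokens:
        current = next_token(current, rng)
        steps += 1  # one "model call" per generated token
        if current == "<end>":
            break
        tokens.append(current)
    return tokens, steps


if __name__ == "__main__":
    tokens, steps = generate(random.Random(0))
    print(" ".join(tokens), "| model calls:", steps)
```

The number of model calls grows with the number of tokens produced, which is why an answer preceded by a long hidden train of thought takes so much longer to show up.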

In more swagger terms, reasoning is all about deductive and inductive processes.

Deductive reasoning is the AI’s ability to do what you call “applying general rules to specific cases”. For instance, you know that all bachelors are unmarried men and thus you can deduce that your mate John, being a bachelor, is unmarried. Inductive reasoning is all about “generalizing from specific instances”: the first conscious man alive noticed that the sun has risen every day and so he concluded that it probably will again tomorrow. That is kind of the difference between being a know-it-all and a jack-of-all-trades in the digital realm.
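If you like your bachelors and sunrises in code, here is a rough sketch of the two flavours. The function names are hypothetical, and for the inductive part I am assuming Laplace's rule of succession, (n + 1) / (n + 2), as one simple way to put a number on "it probably will again tomorrow". It is not how any LLM does it.

```python
def deduce_unmarried(person, bachelors):
    """Deduction: the general rule 'every bachelor is an unmarried man' applied to one case.

    If the premise holds (person is a bachelor), the conclusion is certain.
    """
    return person in bachelors


def induce_sunrise_probability(days_observed):
    """Induction: generalize from repeated observations to a likely, never certain, guess.

    Assumes Laplace's rule of succession: after n sunrises in a row,
    estimate tomorrow's chance as (n + 1) / (n + 2).
    """
    return (days_observed + 1) / (days_observed + 2)


if __name__ == "__main__":
    print(deduce_unmarried("John", {"John", "Pete"}))
    print(induce_sunrise_probability(10_000))
```

Note the difference in the outputs: deduction hands you a flat yes or no, while induction only ever creeps toward 1 and never quite gets there.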

And there’s also something that is called abductive reasoning.

And no, it is not your redneck cousin’s explanation of how he got a second butt-hole.

Abductive reasoning is the AI’s way of making the best guess based on incomplete information. You probably will have seen your phone autocorrect things like “duck tape” to “duct tape” and you wonder if it’s got your back or if it’s just trolling you.

This triad of reasoning types is what these big AI tech firms tout as their claim to intellectual superiority.


Reasoning. Step two on the road to AGI

Reasoning is the second step on the grand journey toward AGI according to the omnipotent Uncle Sam.

In my previous article—remember that gem?—I outlined this five-step roadmap where reasoning is the eager sophomore: it is full of potential but not quite the valedictorian yet. Read: The Future of ChatGPT – according to Sam (not about AGI) | LinkedIn

Step one in the grand scheme of things is Data Ingestion. And this is where these little electronic oompa loompas gorge themselves on every scrap of hooman knowledge on the internet (legal or not-so-legal), which turns them into walking (well, typing) encyclopedias. That is where we are now, folks.

Step two is where they want to be: Reasoning. And that is the AI’s ability to actually think things through. Kinda like your chatbot thinking about its reason for existing while you are stuck waiting for it to respond. And step three is Creativity, because what else can you do with AGI if you don’t have the power to create your next meme or, heaven forbid, write a sonnet that doesn’t make you cringe.

Then we stumble into step four, which is Emotional Intelligence. And that is where the AI is supposed to develop a form of empathy, which is perfect for those moments when your virtual assistant pretends to care about your daily dread and schedules your next dentist appointment. And last, there is step five: Autonomy. Supposedly the pièce de résistance, where these AIs become independent entities that make decisions on their own.


The sluggish side of “reasoning”

AI models that claim they are able to reason have apparently decided to take the scenic route.

Latency is the uninvited guest at our reasoning party. If you frequently use an o1 model, you probably have developed repetitive strain injury because you have been twiddling your thumbs for so long, and you’re now wearing computer glasses because you watched the loading bar inch forward like it’s contemplating the meaning of life.

What was once a snappy, responsive tool now seems like waiting for a sloth to finish a crossword puzzle.

And that is the reason why I never use a model that reasons.

And to be clear, it does not produce any better results.

This newfound slowness is a full-blown crisis.

If you dare to ask your AI for a simple calculation, you are being treated to the suspense of a season finale cliffhanger. This delay stands in the way of us being productive with the tools, and it also makes interactions feel more like a therapy session than an efficient problem-solving endeavor.

It’s as if these AI models decided that “thinking deeply” meant taking a nap halfway through your query.

So slow, in fact, that some users are starting to suspect these companies are just hitting the snooze button on performance enhancements. The promise of deep reasoning has been hijacked by the need to process these convoluted algorithms, which is resulting in a trade-off where depth comes at the expense of speed.

It’s a classic case of selling a high-latency product masquerading as groundbreaking reasoning.

Bravo, big AI companies, bravo!


The great latency conspiracy

Now, let’s peel back the layers of this sluggish onion.

How exactly has latency become the star of the reasoning show?

It starts with model complexity.

The deeper and more intricate the architecture, the longer it takes to churn out a response. You can compare it to a simple manual typewriter and a high-end printing press. The latter can produce more complex documents, but the setup and the operation are painstakingly slow for everyday tasks.

And then there’s the computational overhead.

Reasoning requires more processing power, and more processing power equals more time. These models are cruising in the hyper car lane of our data centers, where every microsecond counts. The sheer volume of calculations needed to simulate reasoning processes bogs down response times. And this is making interactions feel like dial-up connections in our mostly fiber-optic world.

There’s also something called the resource allocation debacle. Well, ok, it was me who added the “debacle” thingy to it. Big AI companies have to juggle countless requests from all over the world. And all of these users want to have a piece of the reasoning pie. This congestion leads to bottlenecks, where your query gets stuck behind the latest request to predict the stock market.


The technical anatomy of latency

Next up, I’ll drag you through the nitty-gritty of how this latency magic trick is performed.

The layered architecture of these models has multiple stages of processing. And each of these steps adds milliseconds (hahaha… in the case of the current models, minutes) to the response time. Each layer is designed to parse, to analyze, and to generate text with increasing levels of complexity.

But the thing is that with each addition comes more of the dreaded lag.

The attention mechanisms I mentioned before are what make these models so “intelligent”, but they are computationally expensive. They require the model to weigh the importance of each word in the input against every other word, a process that scales quadratically with input length.
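Here is a back-of-the-napkin sketch of what "quadratically" means in practice. The helper names are hypothetical and the dot-product scoring is a stripped-down stand-in for real attention (no scaling, no softmax, no multiple heads), but the counting holds: one score for every pair of words.

```python
def pairwise_scores(vectors):
    """Naive attention-style scoring: one dot product per (query, key) pair."""
    return [
        [sum(q * k for q, k in zip(query, key)) for key in vectors]
        for query in vectors
    ]


def score_count(num_tokens):
    """Number of pairwise scores needed for an input of num_tokens tokens."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Three toy word vectors give a 3 x 3 grid of scores.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in pairwise_scores(toy):
        print(row)

    # Doubling the input length quadruples the work.
    for n in (1_000, 2_000, 4_000):
        print(n, "tokens ->", score_count(n), "scores")
```

So a prompt that is twice as long does not cost twice as much attention work, it costs four times as much.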

Think of it this way… I am always having trouble conversing at parties. The last time I really did my best to fit in, I was trying to give every person in the crowded room my undivided attention…. simultaneously. Now for an introvert, that is impressive, but not exactly efficient nor fast (like simultaneous chess playing).

These models also rely heavily on distributed computing resources: they are spread out across many servers and data centers around the world. This means that response times are also subject to network latency and server load. And even with state-of-the-art infrastructure, the physical distance between data centers and users will introduce unavoidable delays.
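For a feel of the distance part, here is a rough sketch of the physical floor on a round trip. It assumes light travels through optical fiber at roughly 200,000 km per second (about two-thirds of its speed in a vacuum), so about 200 km per millisecond. The function name is hypothetical, and real routes add routing, queuing, and server time on top of this.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s, i.e. 200 km per ms.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fiber for a given one-way distance."""
    return 2 * distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    for km in (100, 1_000, 10_000):
        print(km, "km ->", min_round_trip_ms(km), "ms minimum round trip")
```

No amount of clever engineering gets you below that floor, short of moving the data center closer to the user.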


The Apple researchers call BS

As you have probably learnt by now, researchers at Apple have thrown a wrench into this “reasoning” charade. Read: We should all start spelling AI as Ai because LLM’s are full of shit (according to Tim) | LinkedIn

They have done some serious digging into these models and run a few tests, and their findings suggest that what the industry calls “reasoning” is nothing more than a series of parlor tricks, held together by duct tape and a lot of wishful thinking. According to these scientists, the supposed “reasoning capabilities” are merely sophisticated pattern-matching games that parrot us, with no true understanding and no cognitive depth at all. They say that the latency is NOT a sign of deep thought but rather a symptom of overcomplicated algorithms pretending to mimic human behavior, just to get the hype going and keep the investors happy.


My personal take on all-o-this

I think that LLMs are not even as intelligent as our pets, let alone capable of AGI.

“My Dachshund, after all, has a mental model of the physical world, persistent memory, some good reasoning ability, and a capacity for planning. And none of these qualities are present in today’s ‘frontier’ AIs, including those made by the bigguns”.

A key distinction between LLMs and humans (or even animals) is our ability to understand and interpret the physical world, or “context.” When humans learn, we rely not only on data but also on experiences that provide context. We understand spatial relationships, physical objects, cause-and-effect scenarios, and more. For example, a child learns not to touch a hot stove after being told once or by experiencing the sensation of heat. We are continuously processing sensory information and updating our mental model of the world accordingly.

In AI, however, context is much more limited. Current models can take documents, pictures, or videos as context, but they are far from interpreting and responding to the real-world physical environment in a meaningful way. Even in advanced applications like robotics, AI struggles with adapting to real-world complexities without extensive retraining. This is because the world is not a static dataset, and our experiences shape our understanding over time.

Human reasoning involves the ability to think critically, connect disparate pieces of information, and come to conclusions that aren’t explicitly stated. For instance, if my dog encounters a snake, it immediately reacts with caution, even if it has never seen a snake before. It can reason that the snake is a potential threat based on visual cues and its prior experiences with danger.

And the day an AI can do that, I will cheer not only that it can reason, but also that it has achieved AGI.

Signing off – Marco


