The State of AI: Are We Missing Its Purpose While Overshooting Our Expectations?

Artificial intelligence has entered that strange and familiar stage of technological evolution where everyone agrees it is important, but very few people agree on what it is actually for.

To some, AI is the beginning of a new industrial revolution. To others, it is a speculative bubble dressed up in a hoodie and a GPU cluster. To the true believers, artificial general intelligence is just around the corner. To the skeptics, large language models are impressive statistical machines that can imitate intelligence without truly understanding the world.

The truth, as usual, is less emotionally satisfying than either extreme.

AI is not nothing. That much is obvious to anyone who has used it properly. It can summarize, classify, translate, write, compare, analyze, code, brainstorm, generate images, examine documents, and automate pieces of knowledge work that used to consume hours. Used carefully, it can be extraordinary.

But that does not mean it is magic.

And that may be the core problem. We are not only overestimating AI. We may be misunderstanding its purpose entirely.

We Have Confused Capability With Intelligence

The current AI boom has been driven by a fairly simple idea: make the models bigger, feed them more data, give them more computing power, and eventually something like general intelligence will emerge.

That assumption is now being questioned more seriously.

In the Steve Eisman discussion with Gary Marcus, Marcus describes the roots of the modern deep learning boom as a hardware breakthrough. GPUs, originally designed for video games, turned out to be remarkably good at handling the parallel computation required for neural networks. That hardware shift allowed researchers to train models on enormous amounts of data, creating the conditions for today’s large language models. But Marcus argues that this also gave rise to what he calls the scaling fallacy: the belief that if adding more data initially makes a model better, then adding vastly more data indefinitely will eventually produce general intelligence. That is the issue.

The jump from early models to today’s systems has been impressive. No honest person should deny that. But impressive does not automatically mean intelligent in the deeper sense. Large language models are very good at pattern recognition. They are very good at generating plausible language. They can mimic style, structure, tone, and logic. They can produce answers that feel intelligent. But feeling intelligent is not the same as being intelligent.

Marcus characterizes LLMs as “autocomplete on steroids,” designed to predict the next item in a sequence based on statistical patterns. The concern is that they operate primarily like fast, reflexive pattern systems rather than slow, deliberate reasoning systems. In Daniel Kahneman’s terms, they are closer to System 1 than System 2. They react. They pattern-match. They generate. But they do not reliably reason in the way humans do when we stop, test, question, doubt, and reconsider.
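
For readers who want to see what "predicting the next item from statistical patterns" means at its simplest, here is a toy sketch. It is not how a real LLM is built: production models use neural networks over tokens, while this illustration only counts which word follows which in a made-up corpus. The corpus and the function names are assumptions invented for the example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration.
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat on the sofa ."

def train_bigrams(text):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

model = train_bigrams(corpus)
print(predict_next(model, "cat"))  # sat
print(predict_next(model, "dog"))  # None (the word never appeared)
```

The toy model answers "sat" after "cat" only because that pairing was the most frequent in its data. It has no idea what a cat is. Real systems are vastly more capable, but the sketch shows why fluent output and grounded understanding are different things.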

This distinction matters because much of the AI hype assumes that fluency equals understanding. It does not.

Hallucination Is Not a Minor Glitch

One of the great mistakes in the public conversation is treating AI hallucination as though it is a small bug that will eventually be ironed out with a few more updates.

Maybe some of it will improve. But hallucination is not merely a typo problem. It is connected to how these systems work.

The Eisman-Marcus summary explains that LLMs break information into small statistical fragments and reassemble them based on probability. In doing so, they can lose contextual connections and produce false information with complete confidence. Marcus gives the example of an LLM falsely claiming that actor Harry Shearer was a British voiceover artist, apparently blending together statistical associations around comedians, voice actors, and British performers while ignoring basic biographical facts.

That is not a harmless quirk when AI is used casually.

It becomes a serious problem when AI is placed inside business processes, customer service systems, legal workflows, medical tools, procurement reviews, financial analysis, or public administration.

This is where the conversation needs to mature. The question is not, “Can AI produce a good answer?” Of course it can. The better question is, “What happens when AI produces a bad answer that sounds good?” That is where the danger lives.

The Real World Is Not a Closed-Domain Test

AI often performs best where the domain is bounded, the rules are strict, and the answer can be verified. Coding, math, geometry, language translation, data extraction, classification, and structured document review can be excellent use cases. But the open world is messier.

The Eisman-Marcus document draws a useful distinction between closed domains and open-ended real-world scenarios. AI can do well in areas where rules are clear and datasets are verifiable. It struggles more when novelty, ambiguity, judgment, politics, human behaviour, or unexpected circumstances enter the picture. The document gives the example of pattern-recognition systems identifying known hazards, such as bicycles or pedestrians, but failing with unexpected objects or scenarios that fall outside the system’s learned expectations. This is a crucial point.

The real world is not a benchmark.

The real world has exceptions, edge cases, bad data, incomplete instructions, emotional customers, vague policies, conflicting rules, changing laws, poor documentation, and people who do things that make absolutely no sense, usually five minutes before a deadline. AI can help with that world. But it should not be mistaken for a complete understanding of that world.

The Great Pullback: When AI Meets Operations

A second source, The Great AI Pullback, adds the missing business reality. It argues that many companies that aggressively pushed AI into operations are now quietly walking back those decisions.

The examples are telling.

Starbucks reportedly rolled out an AI-powered inventory system across 11,000 stores to replace manual counting. It was promoted as much faster and highly accurate, but according to the document, the system missed items and even confused milk types. Nine months later, employees were instructed to return to manual counting. Pizza Hut faced a lawsuit involving a mandated AI dispatch system that allegedly harmed delivery performance and customer satisfaction. Air Canada lost a tribunal case after its chatbot invented a non-existent bereavement fare policy. Builder.ai, once valued at over $1 billion, allegedly claimed AI could automatically build software when much of the work was actually being done by human engineers.

This is where the rubber meets the road. The problem is not that AI cannot help companies. It clearly can. The problem is that companies often treat AI as if it can simply replace operational knowledge without consequences. That is fantasy.

Inventory management is not just counting. Delivery dispatch is not just routing. Customer service is not just answering. Software development is not just code generation. These are human systems with context, trade-offs, exceptions, experience, and judgment embedded inside them.

When companies rip out that human layer too quickly, they often discover that what looked like inefficiency was actually institutional memory. And institutional memory, inconvenient as it may be, is often what keeps the wheels from flying off.

The Workforce Reversal

The same pattern appears in workforce decisions.

The Great AI Pullback document describes companies that reduced staff or contractors after embracing AI, only to reverse course later. Klarna reportedly reduced its workforce after claiming an AI chatbot could perform the work of hundreds of agents, then later acknowledged damage to customer service quality and began rehiring. Salesforce reduced customer support staffing while using AI agents, then later shifted tone and announced new graduate hiring. Duolingo declared itself “AI first,” phased out contractors, faced backlash, and later softened its position. This should not surprise us.

Customer service is not only a transaction. It is trust management.

People do not contact support because they are having a lovely day and want to celebrate capitalism. They contact support because something has gone wrong. They are confused, annoyed, worried, or already irritated. In that moment, speed matters, but so does judgment. So does empathy. So does escalation. So does the ability to understand that a technically correct answer may still be a terrible answer.

AI can support customer service. It can summarize tickets. It can suggest responses. It can identify patterns. It can speed up resolution. It can help agents do better work.

But when companies use AI primarily as a headcount reduction weapon, they often confuse cost-cutting with improvement. Those are not the same thing.

The ROI Problem: AI Is Not Free Just Because It Feels Effortless

Another under-discussed problem is cost. To the end user, AI can feel almost weightless. You type a question, and an answer appears. It feels like magic. But behind the curtain are massive computing costs, data centres, GPUs, energy consumption, licensing fees, integration costs, security reviews, governance work, and the very real expense of cleaning up bad implementations.

The Great AI Pullback document highlights several cost and return-on-investment concerns. Uber is described as having widespread engineering use of AI tools, with AI accounting for a large share of committed code, while also burning through its AI budget faster than expected and struggling to connect usage to useful customer-facing outcomes. Microsoft is described as pledging major AI data-centre investment, then reportedly cancelling or delaying some capacity. The document also mentions enterprise AI tool costs spiralling when employees are given broad, uncapped access.

This points to a deeper issue.

AI adoption is easy to measure.

AI value is harder to measure.

A company can count how many employees used an AI tool. It can count prompts. It can count generated code. It can count documents summarized. It can count hours supposedly saved.

But that is not the same as measuring better decisions, better products, better customer satisfaction, lower risk, improved compliance, fewer errors, or higher profitability. Usage is not value. Activity is not progress.

And a dashboard full of AI metrics can still be a very expensive way of avoiding the harder question: did this actually improve the business?

The Bubble Question

The financial side of the AI boom is now impossible to ignore.

The Eisman-Marcus document argues that hyperscalers spent roughly $500 billion on GPUs in 2025 alone and suggests the industry may be reaching diminishing returns. It also raises the concern that if the entire market is pursuing the same scaling strategy, the technology risks becoming commoditized, with less durable competitive advantage than many investors assume.

The Great AI Pullback document also notes that major technology leaders and financial figures have begun moderating earlier expectations. It summarizes claims that some executives have walked back predictions about job displacement, while financial voices have warned about overvaluation and comparisons to previous infrastructure or dot-com style bubbles.

This does not mean AI is useless. That would be the wrong conclusion. The dot-com bubble did not mean the internet was useless. It meant the market temporarily lost its mind about what the internet could do, how quickly it could do it, and which companies would survive. The same may be true here. AI may be transformative and overhyped at the same time. That is not a contradiction. That is usually how major technology cycles behave.

The Mistake: Replacing People Instead of Expanding Them

The most important issue is not whether AI is useful.

The question is whether we are using it for the right purpose.

Too much of the current AI conversation is built around replacement. Replace workers. Replace writers. Replace lawyers. Replace analysts. Replace customer service. Replace developers. Replace teachers. Replace managers. Replace thinking.

That framing is not only socially dangerous; it is operationally naive.

AI works best when it expands human capability. It works best when it reduces the burden of repetitive cognitive labour. It works best when it helps people see patterns, summarize complexity, compare documents, identify risk, draft first versions, challenge assumptions, and move faster through information-heavy work. In other words, AI is not at its best when it removes the human. AI is at its best when it gives the human better leverage. That is a very different philosophy.

A calculator did not replace mathematics. Excel did not replace financial judgment. Search engines did not replace research. Power BI did not replace management. They changed the nature of the work. They allowed humans to operate at a different level.

AI should be seen in that same tradition. It is a cognitive power tool. But a power tool still requires an operator.

Human-in-the-Loop Is Not a Temporary Inconvenience

Some AI advocates treat human oversight as a temporary limitation. The assumption is that once models improve enough, humans can step aside. That is the wrong lesson.

In many domains, the human-in-the-loop is not a weakness. It is the safety system. Law, medicine, public procurement, finance, engineering, municipal government, education, mental health, compliance, and executive decision-making all require more than pattern recognition. They require accountability. They require judgment. They require values. They require someone who can say, “This may be technically correct, but it is not appropriate here.”

AI can help identify a risky contract clause, but it does not live with the lawsuit. AI can summarize medical notes, but it does not sit across from the patient. AI can recommend a hiring decision, but it does not bear moral responsibility for bias. AI can draft a procurement report, but it does not answer to taxpayers, councils, vendors, auditors, or courts. That matters.

The human-in-the-loop is not just a compliance checkbox. It is where responsibility lives.

The Better Future: AI as a Thinking Partner, Not a Thinking Replacement

The better future for AI is not a world where humans stop thinking, a theme I write about in another post.

It is a world where humans are freed from some of the administrative sludge that prevents them from thinking well. This is where AI can be genuinely powerful.

It can help a lawyer review a contract faster. It can help a procurement officer detect compliance risks earlier. It can help a project manager identify weak assumptions. It can help a doctor summarize a patient history. It can help a researcher compare evidence. It can help a small business owner produce professional documentation. It can help an analyst move from raw data to useful insight. It can help overworked people rise above the waterline.

That last point matters. In many organizations, people are not failing because they lack intelligence. They are failing because they are buried. Buried in emails. Buried in documents. Buried in approvals. Buried in systems that do not talk to each other. Buried in meetings about meetings. Buried in work that feels urgent but is often not important.

AI’s real value may be giving people back enough room to do the work that actually requires them.

We Need World Models, Not Just Word Models

The Eisman-Marcus document ends with an important technical argument: if AI is going to move beyond fluent pattern generation, it likely needs something closer to world models and neuro-symbolic reasoning. In simple terms, AI needs internal representations of how the world works, not just statistical representations of how words tend to appear together. The document argues that future systems may need to combine neural networks’ pattern-recognition strengths with more rule-based, logical, symbolic reasoning. That is a crucial point.

A system that can generate a paragraph about gravity does not necessarily understand gravity.

A system that can summarize a contract does not necessarily understand business risk.

A system that can describe ethics does not necessarily behave ethically.

A system that can write about common sense does not necessarily have common sense.

That does not make the system useless. It means we need to understand what kind of machine we are dealing with.

Right now, much of AI is closer to a language engine than a judgment engine. It can help us with words, patterns, summaries, structures, associations, and possibilities. That is incredibly useful. But it is not the same as grounded understanding.

The Next Phase Must Be Less Theatrical

The next phase of AI needs to be more disciplined and less theatrical.

Less “AI will replace everything.”

More “AI can improve this specific process under proper supervision.”

Less “the model knows.”

More “the model suggests, and the human verifies.”

Less “we need an AI strategy.”

More “we need a business strategy that uses AI where it actually makes sense.”

Less “look what the demo can do.”

More “show me what happens after six months in production.”

That may sound less exciting, but it is far more useful.

The real state of AI is not failure. It is not destiny. It is not salvation. It is not a hoax.

It is a powerful, immature, uneven, rapidly evolving technology being pushed into the world faster than many institutions, businesses, laws, and habits can properly absorb it. That does not mean we should reject it. It means we should grow up about it.

The Other Problem: Many People Are Still Using AI Like a Fancy Search Box

There is another problem in the AI conversation that does not get enough attention. Many people using AI are not really using AI. They are using a tiny surface layer of it.

They ask it to summarize a PDF. They ask it to prepare a slide deck. They ask it to write an email, create an executive summary, clean up a paragraph, or turn a long document into bullet points.

Those are useful tasks. There is nothing wrong with them. In fact, for many people, that is the first “aha” moment with AI. You give it a 70-page document, and seconds later it gives you the main points. It feels impressive because, frankly, it is impressive.

But that is not where the real power is. That is entry-level AI use. The deeper value comes when you stop treating AI like a vending machine and start treating it like a directed thinking partner.

There is a major difference between asking, “Create a presentation from this document,” and asking, “Research the audience I am presenting to. Understand their culture, pressures, vocabulary, public priorities, political environment, business concerns, and likely objections. Then help me build a presentation that speaks their language, respects their reality, and makes the strongest possible case without sounding like a generic vendor pitch.”

That is a completely different use of the tool. One is output generation. The other is strategic alignment. And that is where AI starts to become genuinely powerful.

Prompting Is Not Typing. It Is Directing.

The best use of AI is not simply asking a question and accepting the answer.

It is directing the model. You can tell it what mindset to use. You can tell it to think like a lawyer, a procurement officer, a CFO, a skeptical journalist, a policy analyst, a systems architect, a municipal CAO, a professor, a customer, or an irritated board member who has seen too many bad presentations and has no interest in being dazzled by buzzwords.

You can tell it to be factual and stripped of emotion. You can ask it to remove all sales language and write as if it were preparing evidence for a boardroom discussion. You can ask it to be blunt, clinical, and diagnostic — something closer to how Dr. House might present a case: no fluff, no dancing around the obvious, just the uncomfortable truth with the edges left on.

Or you can ask it to articulate an argument with the structured psychological precision of Jordan Peterson and the skeptical wit of Bill Maher. Not to imitate them lazily, but to borrow the useful qualities: clarity, hierarchy, sharpness, rhythm, and a refusal to pretend weak ideas are strong ones.

That kind of prompting changes the output dramatically. AI responds to direction. The better the direction, the better the result.

Context Is the Real Fuel

The real power of AI comes from context.

A weak prompt gives the model almost nothing to work with.

A strong prompt gives it a world.

That world may include background documents, PDFs, transcripts, research notes, personal observations, audience profiles, company history, competitive positioning, prior drafts, objections, tone preferences, legal constraints, industry realities, and the specific purpose of the final output.

This is where many users fall short. They ask for a high-quality answer but provide low-quality context. Then they blame the model.

That is like giving a chef a potato, a paper plate, and a microwave, then complaining they did not produce fine dining.

For serious work, I tend to give AI what amounts to a small library. I provide background documents, PDFs, notes, examples, prior thinking, concerns, audience details, and my own interpretation of the issue. I tell it what matters, what not to overstate, what tone to avoid, what outcome I want, and what assumptions should be challenged. At that point, AI is not guessing in the dark. It is working inside a structured field of information. That is when the quality changes.
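
As a rough illustration of what "a structured field of information" can look like in practice, the sketch below assembles a task, an audience, tone guidance, things to avoid, and background documents into one labelled prompt. The function name, the section labels, and the sample values are all hypothetical. No particular AI product requires this layout, and it is one of many reasonable ways to organize context.

```python
def build_prompt(task, audience, tone, avoid, documents):
    """Assemble a labelled prompt from a task and its surrounding context."""
    sections = [
        f"TASK:\n{task}",
        f"AUDIENCE:\n{audience}",
        f"TONE:\n{tone}",
        "AVOID:\n" + "\n".join(f"- {item}" for item in avoid),
    ]
    # Each background document gets its own clearly labelled section.
    for title, text in documents.items():
        sections.append(f"BACKGROUND ({title}):\n{text}")
    return "\n\n".join(sections)

# Hypothetical sample values, invented for the example.
prompt = build_prompt(
    task="Draft a ten-minute briefing on our AI pilot results.",
    audience="Municipal council members with limited technical background.",
    tone="Factual and plain, with no sales language.",
    avoid=["Overstating savings", "Generic vendor phrasing"],
    documents={"Pilot notes": "Three departments took part over six months."},
)
print(prompt)
```

The value is less in the code than in the discipline it enforces: every section is a question the user has to answer before asking the model for anything.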

You Can Bring Experts Into the Room

One of the most underused strengths of AI is the ability to simulate expert perspectives. Not perfectly. Not as a replacement for real expertise. But as a structured thinking exercise, it can be extremely useful. You can ask the model to review an idea from the perspective of a lawyer, an economist, a psychologist, a compliance officer, a cybersecurity expert, a procurement specialist, a project manager, a taxpayer, a customer, or a skeptical executive.

You can ask each perspective to identify risks, weak assumptions, missing evidence, likely objections, ethical concerns, implementation issues, or communication problems. This does not mean the AI has become those people. It means you are forcing the model to examine the work through different lenses.

That matters because most of us naturally think from our own position. AI can help widen that field of view. It can reveal blind spots. It can show where an argument is too emotional, too technical, too thin, too self-congratulatory, or too disconnected from the audience. In that sense, AI becomes a structured opposition system. It helps you argue with yourself before the world gets the chance.
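
For anyone who wants to make this exercise repeatable, one simple approach is to generate the same review request once per perspective. The sketch below only builds the prompt text. It does not call any AI service, and the wording of the instructions and the names used are assumptions made for illustration.

```python
# Perspectives drawn from the examples above; swap in whatever fits the work.
PERSPECTIVES = ["lawyer", "compliance officer", "skeptical executive"]

def review_prompts(draft, perspectives):
    """Build one review prompt per perspective for the same draft."""
    instructions = (
        "Identify risks, weak assumptions, missing evidence, "
        "and likely objections."
    )
    return {
        role: (
            f"Review the draft below from the perspective of: {role}.\n"
            f"{instructions}\n\nDRAFT:\n{draft}"
        )
        for role in perspectives
    }

prompts = review_prompts("We propose replacing manual intake with AI.", PERSPECTIVES)
for role, text in prompts.items():
    print(f"--- {role} ---")
    print(text)
```

Each prompt would then be sent to the model separately, so that one lens does not blur into another. The responses remain simulations of those viewpoints, not substitutes for the real experts.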

The Final Step: Make AI Critique Its Own Work

The first answer AI gives you is rarely the best answer. That is another mistake many users make. They prompt once, receive an output, and call it done. That is not how serious AI use works.

Once the model produces something, you can turn it around and ask it to review the work harshly. You can ask it to identify weak logic, unsupported claims, vague language, tone problems, missing context, audience mismatch, overstatement, factual risk, or sections that sound too much like generic AI writing. You can even ask it to be brutally honest — the Simon Cowell stage of the process. Not cruel for the sake of it, but direct. No polite applause. No participation trophy. Tell me where this fails. Tell me what sounds fake. Tell me what a skeptical reader would reject. Tell me what needs more evidence. Tell me what should be cut.

That review process is often where the real improvement happens. AI should not just help produce the work. It should help interrogate the work. That is the difference between using AI as a shortcut and using AI as a quality system.
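
The draft, critique, and revise cycle described above can be sketched as a short loop. In this sketch `ask_model` is a placeholder for whatever function sends text to an AI tool and returns its reply. It is an assumption for illustration, not a real library call, and the prompt templates are invented wording. A stand-in model is used here so the example runs on its own.

```python
CRITIQUE_TEMPLATE = (
    "Review the draft below harshly. Identify weak logic, unsupported claims, "
    "vague language, and overstatement.\n\nDRAFT:\n{draft}"
)
REVISE_TEMPLATE = (
    "Revise the draft to address the critique.\n\n"
    "DRAFT:\n{draft}\n\nCRITIQUE:\n{critique}"
)

def refine(ask_model, task, rounds=2):
    """Produce a draft, then alternate critique and revision for `rounds` passes."""
    draft = ask_model(task)
    for _ in range(rounds):
        critique = ask_model(CRITIQUE_TEMPLATE.format(draft=draft))
        draft = ask_model(REVISE_TEMPLATE.format(draft=draft, critique=critique))
    return draft

# Stand-in for a real model: records each prompt and returns a numbered reply.
calls = []

def fake_model(prompt):
    calls.append(prompt)
    return f"response {len(calls)}"

final = refine(fake_model, "Write a one-page briefing.", rounds=2)
print(len(calls))  # 5 (one draft, then two critique-and-revise passes)
print(final)       # response 5
```

The loop automates the asking, not the judging. Someone still has to read the final draft and decide whether the critique was right.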

The Skill Gap Nobody Wants to Admit

This is why the question “Can AI do good work?” is incomplete.

A better question is: “Can the person using AI direct it well enough to get good work?” That is the uncomfortable part.

AI exposes the quality of the user’s thinking. If the user gives vague instructions, thin context, poor direction, and accepts the first answer uncritically, the result will often be average. Maybe polished. Maybe useful. But average. The model may be powerful, but the human still has to frame the task. The human still has to know what good looks like.

The human still has to judge whether the output is accurate, appropriate, ethical, persuasive, and useful. In that sense, AI does not eliminate the need for intelligence. It rewards it.

The best users of AI will not be the people who simply know how to type prompts. They will be the people who know how to think clearly, ask better questions, provide better context, challenge weak answers, and direct the model toward a specific outcome.

That is where the gap will form. Not between people who use AI and people who do not. But between people who use AI casually and people who use it seriously.

AI Is Not the Shortcut. It Is the Amplifier.

This may be the most important point. AI does not automatically make weak thinking strong. It amplifies the thinking behind the prompt. If the prompt is shallow, the answer will often be shallow. If the prompt is strategic, contextual, disciplined, and demanding, the answer can become dramatically better.

That is why using AI well is not about asking it to do your thinking for you. It is about building a better thinking process around it. The real power is not in saying, “Write this for me.”

The real power is in saying, “Help me think through this properly. Challenge my assumptions. Study the audience. Bring in relevant perspectives. Organize the argument. Remove the fluff. Strengthen the evidence. Find the weak spots. Make it sharper. Then review it like a critic who owes me nothing.”

That is a very different relationship with AI. And it may be where the true value is hiding.

Final Thought: The Purpose Was Never to Stop Thinking

The most dangerous misunderstanding about AI is the idea that it should think for us. That is exactly backwards.

AI should force us to think better. It should help us ask sharper questions, test assumptions, compare evidence, reduce cognitive overload, and see what we might otherwise miss. It should help us do the first draft, the second look, the risk scan, the pattern search, the summary, the comparison, and the sanity check.

But it should not become a substitute for human responsibility. The future of AI should not be humans stepping aside while machines take over judgment. The better future is humans using machines to reclaim judgment from the administrative fog that has buried so much modern work. That may be the real purpose of AI. Not artificial intelligence replacing human intelligence.

But artificial intelligence giving human intelligence room to breathe.